Journal of Statistical Planning and Inference 27 (1991) 105-123
North-Holland

Efficient estimation of the stationary distribution for exponentially ergodic Markov chains

Spiridon Penev *

Institute of Applied Mathematics & Informatics, 1000 Sofia, Bulgaria

Received 4 April 1989; revised manuscript received 21 September 1989
Recommended by J. Pfanzagl

Abstract: In a classical paper by Dvoretsky, Kiefer and Wolfowitz the asymptotic minimaxity of the empirical distribution function in case of i.i.d. observations $X_1, X_2, \dots, X_n$ has been shown. If $X_1, X_2, \dots, X_n, \dots$ is only a stationary homogeneous Markov sequence, we could still use the empirical distribution function as an estimator of the (continuous) stationary distribution function $F$, but the question of its asymptotic efficiency arises in this case. Under some additional assumptions (the observations arise from an exponentially ergodic Markov sequence) we show that the empirical distribution function is an efficient estimator in a local asymptotic minimax sense. Using the bounded subconvex loss function $g(\sup_t \sqrt n\,|F_n(t)-F(t)|)$ with $g$ bounded, increasing, the local asymptotic minimax bound equals $E\,g(\sup_t |Y(t)|)$, where $Y(t)$ is a certain Gaussian process.

AMS Subject Classification: Primary 62G20, 62M05; secondary 62G05, 62G30.

Key words and phrases: Local asymptotic minimaxity; empirical distribution function; Markov sequences; stationary distribution; stability.

1. Introduction

The problem of asymptotic minimaxity of the empirical distribution function (EDF) has attracted the attention of many statisticians. In a pioneering paper of Dvoretsky, Kiefer and Wolfowitz (1956) it was shown that in case of i.i.d. observations the EDF is asymptotically minimax among the collection of all continuous distributions. As Millar (1979) notes, "This paper has stood for over 20 years as one of the pivotal achievements of nonparametric decision theory". One direction for generalizing this result was to show the asymptotic minimax character of the EDF, in the i.i.d. case, among smaller classes of DF's (such as the class of the concave distributions, the distributions having a decreasing density with respect to Lebesgue measure, the IFR-distributions and so on).

* Research partially supported by the Ministry of Culture, Science and Education in Bulgaria; Contract 1035.

0378-3758/91/$03.50 © 1991, Elsevier Science Publishers B.V. (North-Holland)

Kiefer and Wolfowitz (1976) proved the asymptotic minimaxity of the EDF in the class of all concave distributions. In the papers of Millar (1979, 1983), using the modern technique of convergence of experiments and the general formulation of the asymptotic minimax theorem of Le Cam (1972), the asymptotic minimaxity of the EDF among each of the above mentioned (and also other) classes was shown.

On the other hand it was natural to try to generalize the results of Dvoretsky, Kiefer and Wolfowitz in another direction, namely to avoid the i.i.d. assumption. Indeed this makes the problem harder, but there exists a result of Billingsley (1968) in the literature, showing the existence of a limit distribution for the EDF of a stationary $\varphi$-mixing sequence of observations. This bolstered our feeling that it could be done similarly for weakly dependent observations. Also there was the book of Roussas (1972), showing the possibility to prove local asymptotic minimax optimality of estimators and tests in parametric situations also in case of observations arising from stationary ergodic Markov sequences. As far as we know, not much has been done in applying this in nonparametric situations. Our contribution here is, using the theory of convergence of experiments, to show that the (piecewise linear and continuous version of the) EDF for a special class of stationary ergodic Markov sequences possesses a local asymptotic minimax (LAM) optimality property. We do not strive for the utmost generality in the assumptions because this would make the proofs more involved. Also the discussion will be heuristic in some part.

Let us start with a concise outline of the probabilistic setting we deal with. We consider a homogeneous Markov chain $X=(X_n)_{n\ge0}$ taking values in $(E,\mathcal B)$. Here $E=[0,1]$ and $\mathcal B$ is its Borel $\sigma$-field. The chain has a regular transition probability kernel $P(x,A)$, $x\in[0,1]$, $A\in\mathcal B$, and (to begin with) arbitrary initial distribution $\mathcal L(X_0)$. We assume that the following condition holds:

Condition (A). Existence of a bounded Lebesgue density, i.e. a bounded function $p(y\mid x)$ on the unit square such that
$$P(x,A)=\int_A p(y\mid x)\,dy$$
for all $A\in\mathcal B$, all $x\in[0,1]$, and, moreover, $\inf_x p(y\mid x)\ge\delta>0$ for all $y$ in a set $S$ with $\lambda(S)>0$.

This condition has important consequences:
(i) Doeblin's condition holds.
(ii) There is a uniquely defined invariant probability measure $\pi$ for $P(\cdot,\cdot)$ and moreover exponential convergence holds, i.e. there exist $q\in(0,1)$, $a>0$ such that
$$\sup_x\sup_B|P^n(x,B)-\pi(B)|\le a q^n\qquad\text{for all } n.$$
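Consequence (ii) is easy to observe numerically. The sketch below is only an illustration under assumptions of our own choosing (grid size, kernel shape and mixing weight are all hypothetical, not from the paper): a discretized transition density with a uniform component playing the role of the lower bound in Condition (A), for which the sup-distance to the stationary distribution decays geometrically.

```python
import numpy as np

# Hypothetical discretized chain on m cells of [0,1]: a local (Gaussian-like)
# kernel mixed with a uniform component; the uniform part plays the role of
# the lower bound inf_x p(y|x) >= delta > 0 in Condition (A).
m = 50
x = (np.arange(m) + 0.5) / m
K = np.exp(-20.0 * (x[None, :] - x[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)
P = 0.7 * K + 0.3 / m                       # row-stochastic transition matrix

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

# sup_x sup_B |P^n(x,B) - pi(B)| equals half the worst-row L1 distance.
def sup_dist(Pn, pi):
    return 0.5 * np.abs(Pn - pi).sum(axis=1).max()

Pn = np.eye(m)
dists = []
for _ in range(20):
    Pn = Pn @ P
    dists.append(sup_dist(Pn, pi))
dists = np.array(dists)

# Geometric decay a*q^n: successive ratios stay below a fixed q < 1.
print(dists[0], dists[-1], (dists[1:] / dists[:-1]).max())
```

The 0.3-uniform component bounds the Dobrushin contraction coefficient by 0.7, so the observed ratios never exceed that value; this is exactly the mechanism behind Doeblin's condition in (i).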

(Loève (1960, Chapter VII, 27.3), Doob (1956, p. 197)). Here $P^n(\cdot,\cdot)$ denotes the $n$-step transition probability kernel.
(iii) If $\mathcal L(X_0):=\pi$ then the sequence $X=(X_n)_{n\ge0}$ is stationary and $\varphi$-mixing with $\varphi(n)=aq^n$. Here we use the definition of a $\varphi$-mixing sequence given in Billingsley (1968, Chapter 20, 20.2).
(iv) $\pi\ll\lambda|_E$. This follows easily from the equality $\pi=\pi P$ and the fact that $p(\cdot,\cdot)$ is bounded.

Additionally to Condition (A) we assume:

Condition (B). $\lambda|_E\ll\pi$.

After these probabilistic preliminaries let us introduce the statistical model we consider. Suppose we have weakly dependent observations $x_0,x_1,x_2,\dots,x_n$ from a stationary Markov chain $X$ with (unknown) transition probability kernel $P(\cdot,\cdot)$ and initial law $\mathcal L(X_0)=\pi$ satisfying Conditions (A) and (B). The problem is to estimate the stationary distribution function $F$. Of course we could still use the EDF, as we would certainly do (relying on the result of Dvoretsky, Kiefer and Wolfowitz) if the observations were i.i.d. But now the question of the asymptotic efficiency of this estimator arises. We shall see (Theorem 5.1 and Corollary 5.1) that the (piecewise linear and continuous version of the) EDF preserves its optimality in a local asymptotic minimax sense.

2. Perturbations and stability

Now we would like to discuss the difficulties which arise when we try to describe LAM lower bounds in non-i.i.d. situations. The complexity here is of a qualitative nature. Let us explain it in a few words. In order to describe LAM lower bounds one has to consider perturbations of a given probability structure in a neighbourhood of this structure. In the i.i.d. case describing such neighbourhoods is an easy job, because once one has perturbed the density of one observation, one has already perturbed the whole (product-density) structure. In case of dependency there are many more possibilities for perturbation. But they also cannot be too many, because one has to preserve the main properties of the structure (e.g. stationarity, ergodicity) after the perturbation. That means that the structure has to possess some kind of stability. To describe this property we first have to define the perturbations of the chain in a suitable form. Let $H$ be the set of all measurable functions $h(x,y)$ on the unit square with $Eh^2(X_0,X_1)<\infty$ and $E(h(X_0,X_1)\mid X_0)=0$ almost surely. $H$ is a Hilbert space with respect to the scalar product
$$(h_1,h_2)=E\,h_1(X_0,X_1)h_2(X_0,X_1).$$

Let $p(\cdot)$ denote the density of $\pi$ with respect to $\lambda|_E$. We denote the corresponding norm by
$$\|h\|_H^2=\int_0^1\!\!\int_0^1 h^2(x,y)\,p(y\mid x)p(x)\,dx\,dy.$$

Let $H_0$ be the subset of all bounded (sup-norm) $h\in H$. Then $H_0$ is dense in $H$, which follows for example from Strasser (1985, Lemma 75.5). For $h\in H_0$ and sufficiently large $n$ define the (perturbed) transition kernel $P_{h/\sqrt n}$ by:
$$P_{h/\sqrt n}(x,A)=\int_A p(y\mid x)\big(1+h(x,y)/\sqrt n\big)\,dy=\int_A p_{h/\sqrt n}(y\mid x)\,dy.$$
Now we shall see that under small perturbations of the kernel $P(\cdot,\cdot)$ of the form prescribed, the chain $X$ remains geometrically ergodic with invariant probability $\pi_{h/\sqrt n}\sim\pi$. It is obvious that Condition (A) remains valid under small perturbations using kernels $P_{h/\sqrt n}$ if $h\in H_0$ and $n$ is large enough, for there will exist a positive constant $\delta_1\le\delta$ such that
$$\inf_x p_{h/\sqrt n}(y\mid x)\ge\delta_1>0\quad\text{if}\quad\inf_x p(y\mid x)\ge\delta>0.$$
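The centering condition $E(h(X_0,X_1)\mid X_0)=0$ defining $H$ is exactly what keeps $P_{h/\sqrt n}$ a transition kernel. A minimal discretized sketch (the grid, the base kernel and the direction $g$ below are hypothetical choices for illustration only):

```python
import numpy as np

m, n = 50, 400
x = (np.arange(m) + 0.5) / m
K = np.exp(-5.0 * (x[None, :] - x[:, None]) ** 2)
P = K / K.sum(axis=1, keepdims=True)     # discretized p(y|x), strictly positive

# Start from an arbitrary bounded direction g(x,y) (hypothetical) and center
# it row-wise so that sum_y h(x,y) P(x,y) = 0, i.e. E(h(X0,X1)|X0) = 0 --
# the condition defining the Hilbert space H.
g = np.sin(2 * np.pi * x)[None, :] * np.cos(np.pi * x)[:, None]
h = g - (P * g).sum(axis=1, keepdims=True)

Ph = P * (1.0 + h / np.sqrt(n))          # perturbed kernel P_{h/sqrt(n)}

print(np.abs(Ph.sum(axis=1) - 1).max(), Ph.min())
```

Rows of `Ph` still sum to one because the correction integrates to zero against $p(\cdot\mid x)$, and since $|h|/\sqrt n<1$ for $n$ large the perturbed density stays positive, mirroring the constant $\delta_1$ above.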

For $n$ large enough we get transition probability kernels $P_{h/\sqrt n}(\cdot,\cdot)$ satisfying Condition (A) for all $h\in H_0$. Hence (cf. (ii) after Condition (A)), a unique invariant probability $\pi_{h/\sqrt n}(\cdot)$ for $P_{h/\sqrt n}(\cdot,\cdot)$ exists with $\pi_{h/\sqrt n}\ll\lambda$ for all $h\in H_0$. The following lemma is true:

Lemma 2.1. For $n$ large enough, under the Conditions (A) and (B), it holds $\pi_{h/\sqrt n}\sim\lambda|_E$ for all $h\in H_0$.

This lemma shows that also Condition (B) remains valid under the small perturbations we consider.

Corollary 2.1. For $n$ large enough, $\pi_{h/\sqrt n}\sim\pi$ for all $h\in H_0$.

Our next step is to see that not only do Conditions (A) and (B) remain valid for the small perturbations described, but also a kind of stability property is true. To describe it, let us denote by $\mathfrak m$ the set of finite signed measures on $[0,1]$ endowed with the variation norm $\|\cdot\|$ (which makes it a Banach space). The kernel $P(\cdot,\cdot)$ defines a linear mapping $\mathfrak m\to\mathfrak m$ by $\mu P(\cdot)=\int\mu(dx)P(x,\cdot)$ for $\mu\in\mathfrak m$. The norm $\|\cdot\|$ defines in a natural way a norm in the space of linear bounded operators $B:\mathfrak m\to\mathfrak m$ by
$$\|B\|=\sup\{\|\mu B\|:\|\mu\|\le1\}.$$
Let us fix some arbitrary $d>0$. Denote
$$K_d=\Big\{h\in H:\ \sup_{x,y}|h(x,y)|<d\Big\}.$$

The stability property means that for all $h\in K_d$ and all $n\ge n_0(d)$ a constant $C(P)$ exists such that
$$\|\pi-\pi_{h/\sqrt n}\|\le C(P)\,\|P_{h/\sqrt n}-P\|.\tag{2.1}$$
In a more general framework and for general norms such stability requirements are studied in Kartashov (1981) and Kartashov (1984), who considers the so-called strongly stable Markov chains. For the variation norm we consider, it was shown in Neveu (1964, Chapter V.3.2) (cf. also Kartashov (1981)) that strong stability, and in particular (2.1), is ensured by Doeblin's condition. Hence (cf. (i) after Condition (A)), (2.1) holds in our case.

For a given $h\in H_0$ and $n$ large enough, $\pi_{h/\sqrt n}\sim\pi$. Write the density $(d\pi_{h/\sqrt n}/d\pi)(x)$ in the form $1+\tilde h_n(x)$. If $\sup_{x,y}|h(x,y)|\le C_1$ and $\sup_{x,y}p(y\mid x)\le C_2$, $C=C_1\cdot C_2$, then $\sup_{\|\mu\|\le1}\|\mu(P_{h/\sqrt n}-P)\|\le C/\sqrt n$. Hence:
$$\|P_{h/\sqrt n}-P\|\le C/\sqrt n.\tag{2.2}$$
Finally, using (2.1) we get:
$$\sqrt n\,\|\pi-\pi_{h/\sqrt n}\|=\sqrt n\int_0^1|\tilde h_n(x)|\,p(x)\,dx\le\sqrt n\,C(P)\,\|P_{h/\sqrt n}-P\|\le C(P)\,C.\tag{2.3}$$
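The scaling in (2.2) and the boundedness of the ratio in (2.1) can be watched directly on a discretized chain. Everything below (grid, kernel, the direction $h$) is a hypothetical illustration; note that on the grid $P_{h/\sqrt n}-P=Ph/\sqrt n$ exactly, so $\sqrt n\,\|P_{h/\sqrt n}-P\|$ is literally constant in $n$.

```python
import numpy as np

def stationary(P):
    # left eigenvector of P for eigenvalue 1, normalized to a probability
    w, v = np.linalg.eig(P.T)
    p = np.real(v[:, np.argmax(np.real(w))])
    return p / p.sum()

m = 40
x = (np.arange(m) + 0.5) / m
K = np.exp(-5.0 * (x[None, :] - x[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)
P = 0.7 * K + 0.3 / m
pi = stationary(P)

g = np.cos(np.pi * x)[None, :] * np.sin(2 * np.pi * x)[:, None]
h = g - (P * g).sum(axis=1, keepdims=True)     # centered direction, h in H

ratios = []
for n in [100, 400, 1600]:
    Pn = P * (1.0 + h / np.sqrt(n))
    op = np.abs(Pn - P).sum(axis=1).max()      # variation-operator norm ||P_{h/sqrt n}-P||
    st = np.abs(stationary(Pn) - pi).sum()     # ||pi - pi_{h/sqrt n}||
    ratios.append((np.sqrt(n) * op, st / op))
    print(n, np.sqrt(n) * op, st / op)
```

The second printed column plays the role of the stability constant $C(P)$ of (2.1): it stays bounded as the perturbation shrinks.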

Write $P^{(n)}_{h/\sqrt n}$ for the law of $X_0,X_1,\dots,X_n$ under $P_{h/\sqrt n}(\cdot,\cdot)$ and $\mathcal L(X_0)=\pi_{h/\sqrt n}$. Denote by $\tilde p_{h/\sqrt n}(\cdot)$ the density with respect to $\lambda|_E$ of the measure $\pi_{h/\sqrt n}$.

Lemma 2.2. Under the Conditions (A) and (B) it holds
$$\log\frac{dP^{(n)}_{h/\sqrt n}}{dP^{(n)}}(X_0,X_1,\dots,X_n)=\Delta_{n,h}-\tfrac12\|h\|_H^2+o_{P^{(n)}}(1),$$
where $\Delta_{n,h}\to N(0,\|h\|_H^2)$ under $P^{(n)}$.

3. The construction of the mapping $\tau$

Now we want to introduce the main steps in finding the LAM bound for the estimators of the stationary distribution of the chain. We are going to follow Millar (1983, Chapter VIII). Fix some $h\in H_0$. Write $\nu_n$ for $\pi_{h/\sqrt n}$ and $Q_n(\cdot,\cdot)$ for $P_{h/\sqrt n}(\cdot,\cdot)$. Let us consider $F_{h/\sqrt n}(u)=\int_0^u \tilde p_{h/\sqrt n}(x)\,dx$. It holds:
$$F_{h/\sqrt n}(u)=F_0(u)+\int_0^u \tilde h_n(x)p(x)\,dx.$$
Here, of course, $F_0(u)=F(u)=\int_0^u p(x)\,dx=\pi([0,u])$. We have:
$$\sqrt n\big(F_{h/\sqrt n}(u)-F(u)\big)=\sqrt n\int_0^u \tilde h_n(x)p(x)\,dx=\sqrt n\,(\nu_n-\pi)[0,u].\tag{3.1}$$
Crucial in the sequel is the following presentation given in Kartashov (1981) and Kartashov (1984):
$$\nu_n=\pi\big(I-(Q_n-P)R\big)^{-1}\qquad\text{(valid for large }n).$$
Here $R=(I-P+\Pi)^{-1}=\Pi+\sum_{k=0}^\infty(P^k-\Pi)$ and $\Pi$ is the stationary projector of the transition kernel $P$, i.e. $\Pi(x,dy)=\pi(dy)$. By $I$ we denote the identity mapping $I:\mathfrak m\to\mathfrak m$, and $QP$ stands for $QP(x,A)=\int Q(x,dy)P(y,A)$. The operator $R$ is bounded because of the strong stability property (Kartashov (1981, Theorem 1)). For large $n$ we can present $\nu_n$ as a convergent sum:
$$\nu_n=\pi\big[I+(Q_n-P)R+((Q_n-P)R)^2+\cdots\big]=\pi\big(I+(Q_n-P)R\big)+o(\|Q_n-P\|)=\pi\big(I+(Q_n-P)R\big)+o(1/\sqrt n).$$
For the last equality (2.2) has also been used. Hence
$$\sqrt n(\nu_n-\pi)=\sqrt n\,\pi(Q_n-P)R+o(1).$$
In view of the obvious equality $(Q_n-P)\Pi=0$ we have:
$$\sqrt n\big(F_{h/\sqrt n}(u)-F(u)\big)=\sqrt n\,\pi(Q_n-P)\sum_{k=0}^\infty(P^k-\Pi)[0,u)+o(1).\tag{3.2}$$

Let us denote by $p^{(k)}(y\mid x)$ the $k$-step transition density. Then (3.2) may be written in the form
$$\sqrt n\big(F_{h/\sqrt n}(u)-F(u)\big)=\int_0^1\!\!\int_0^u h(x,y)p(y\mid x)p(x)\,dy\,dx+\sum_{k=1}^\infty\int_0^1\!\!\int_0^1\!\!\int_0^u h(x,y)p(y\mid x)p(x)\,p^{(k)}(z\mid y)\,dz\,dy\,dx+o(1).$$
This gives rise to the following definition of the mapping $\tau_1:H\to B$ ($B$ being the Banach space of continuous functions $x$ on $[0,1]$ with $x(0)=x(1)=0$, endowed with the supremum norm):
$$\tau_1h(u)=\int_0^1\!\!\int_0^u h(x,y)p(y\mid x)p(x)\,dy\,dx+\sum_{k=1}^\infty\int_0^1\!\!\int_0^1\!\!\int_0^u h(x,y)p(y\mid x)p(x)\,p^{(k)}(z\mid y)\,dz\,dy\,dx.\tag{3.3}$$
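On a discretized chain, the limiting relation between (3.1) and (3.3) can be checked by brute force: $\sqrt n(F_{h/\sqrt n}(u)-F(u))$ computed from the perturbed stationary distribution should approach $\tau_1h(u)$. The grid, kernel and direction $h$ below are hypothetical, and the series over $k$ is truncated, which exponential ergodicity justifies.

```python
import numpy as np

def stationary(P):
    w, v = np.linalg.eig(P.T)
    p = np.real(v[:, np.argmax(np.real(w))])
    return p / p.sum()

m = 40
x = (np.arange(m) + 0.5) / m
K = np.exp(-5.0 * (x[None, :] - x[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)
P = 0.7 * K + 0.3 / m
pi = stationary(P)

g = np.sin(2 * np.pi * x)[None, :] * np.cos(np.pi * x)[:, None]
h = g - (P * g).sum(axis=1, keepdims=True)       # E(h(X0,X1)|X0) = 0

# tau_1 h(u): a_y = sum_x pi_x P(x,y) h(x,y) is the 'direct' term in y;
# the k-step terms propagate a through P^k; tau = cumsum of a(I + P + P^2 + ...).
a = (pi[:, None] * P * h).sum(axis=0)
total, ak = a.copy(), a.copy()
for _ in range(200):                              # truncated series over k
    ak = ak @ P
    total += ak
tau = np.cumsum(total)                            # tau_1 h at the cell edges

# left side of (3.1) for a large n
n = 10**8
Pn = P * (1.0 + h / np.sqrt(n))
lhs = np.sqrt(n) * (np.cumsum(stationary(Pn)) - np.cumsum(pi))

print(np.abs(lhs - tau).max(), abs(tau[-1]))
```

That `tau[-1]` vanishes reflects $x(1)=0$ in the definition of $B$: every term of (3.3) integrates $h$ against $p(\cdot\mid x)$ over the whole interval and the centering kills it.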

This mapping could be used for the construction of an abstract Wiener space (Millar (1983)). But at this point we have to overcome an additional difficulty. The problem is that the mapping $\tau_1:H\to B$ lacks the desirable one-to-one property (many kernel densities $p_h(y\mid x)$ with essentially different functions $h$ will yield the same stationary density $\tilde p_h(x)$). In order to make the mapping one-to-one, we decompose the space $H$ into a direct sum of $\ker\tau_1$ and its orthogonal complement $H_1$: $H=\ker\tau_1\oplus H_1$. Now if $h_1,h_2\in H$ and $h_1=h_{1,\ker}+h_{1,\ker^\perp}$, $h_2=h_{2,\ker}+h_{2,\ker^\perp}$ are their corresponding decompositions, then $\tau_1h_1=\tau_1h_2$ iff $h_1-h_2\in\ker\tau_1$ almost surely, and this means $h_{1,\ker^\perp}=h_{2,\ker^\perp}$ almost surely. Hence if we consider the rather narrower parametrization, using only the subspace $H_1$ instead of the space $H$, then the mapping $\tau:H_1\to B$ ($\tau$ being the restriction of $\tau_1$ to the space $H_1$) will be one-to-one.

4. The dual mapping $\tau^*:B^*\to H_1$

The closure of $\tau H_1$ in sup-norm gives the space $B$. The dual space $B^*$ coincides with the set of finite signed measures on $[0,1]$. Denote by $\langle\cdot,\cdot\rangle_B$ the duality relation between the elements of $B^*$ and $B$. For a finite signed measure $m$ on $[0,1]$ and for arbitrary $h\in H_1$ we can write
$$\langle m,\tau h\rangle_B=(\tau^*m,h)_H=\int_0^1\!\!\int_0^1 \tau^*m(s,t)\,h(s,t)\,p(t\mid s)p(s)\,dt\,ds.\tag{4.1}$$
Now remember that the functions $h\in H$ satisfy the property $E(h(X_0,X_1)\mid X_0)=0$ almost surely. Hence for any functions $c(s)$, $c_k(s)$, $k\ge1$, not depending on $t$, we can write:
$$\langle m,\tau h\rangle_B=\int_0^1\!\!\int_0^1\Big[m[t,1]-\bar c(s)+\sum_{k=1}^\infty\int_0^1\big(m[r,1]-\bar c_k(s)\big)p^{(k)}(r\mid t)\,dr\Big]h(s,t)\,p(t\mid s)p(s)\,dt\,ds.\tag{4.2}$$

We have denoted by $\bar c(s)$, $\bar c_k(s)$, $k\ge1$, the results of the integration with respect to $m$ of $c(s)$, $c_k(s)$. The functions $\bar c(s)$ and $\bar c_k(s)$ should be chosen so that $\tau^*m(s,t)\in H_1\subset H$, i.e. $\int\tau^*m(s,t)\,p(t\mid s)\,dt=0$ for all $s$. This will be true if
$$\bar c(s)=\int F(b\mid s)\,m(db)\quad\text{and}\quad\bar c_k(s)=\int F^{(k+1)}(b\mid s)\,m(db),\qquad k=1,2,\dots,$$
where
$$F^{(k)}(t\mid s)=\int_0^t p^{(k)}(b\mid s)\,db$$
(here we have used Fubini's theorem and the integration-by-parts formula). Comparing (4.1) and (4.2) we get:
$$\tau^*m(s,t)=m[t,1]-\int_0^1 F(b\mid s)\,m(db)+\sum_{k=1}^\infty\int_0^1\big(F^{(k)}(r\mid t)-F^{(k+1)}(r\mid s)\big)m(dr)$$
(again we have used Fubini and integration by parts). Hence
$$\tau^*m(s,t)=\int_0^1\Big[I_{[0,u)}(t)-F(u\mid s)+\sum_{k=1}^\infty\big(F^{(k)}(u\mid t)-F^{(k+1)}(u\mid s)\big)\Big]m(du).$$
Now we have to prove that not only $\tau^*m(s,t)\in H$, but even $\tau^*m(s,t)\in H_1$. At first we note that if $Q_{n,h_i}$ has the density $(1+h_i/\sqrt n)p(y\mid x)$, $i=1,2$, then $\tau_1h_1-\tau_1h_2=0$ means, in view of (3.2), that $\sqrt n\,\pi(Q_{n,h_1}-Q_{n,h_2})R=0$. Because of the one-to-one property of the mapping $I-P+\Pi=R^{-1}:\mathfrak m\to\mathfrak m$ (Kartashov (1981)) it follows then that $\pi Q_{n,h_1}=\pi Q_{n,h_2}$. Hence if $h\in\ker\tau_1$, then essentially
$$\int_0^1 h(s,t)\,p(t\mid s)p(s)\,ds=0\ \text{ for all }t\in[0,1]\quad\text{and}\quad\int_0^1 h(s,t)\,p(t\mid s)\,dt=0\ \text{ for all }s\in[0,1]$$
hold. In view of these equalities one can easily see that the equality
$$\int_0^1\!\!\int_0^1\tau^*m(s,t)\,h(s,t)\,p(t\mid s)p(s)\,ds\,dt=0$$
holds, which means $\tau^*m\in H_1$.

Proposition 4.1. It holds
$$\|\tau^*m\|_H^2=\int_0^1\!\!\int_0^1 E\{Y(u)Y(v)\}\,m(du)\,m(dv),$$
where $Y(t)$, $t\in[0,1]$, is the 'Billingsley process' (Billingsley (1968, Theorem 22.1)), i.e. the Gaussian stochastic process with a.s. continuous paths, $EY(u)=0$, $P(Y(0)=Y(1)=0)=1$, and
$$E\{Y(u)Y(v)\}=F(\min(u,v))-F(u)F(v)+\sum_{k=1}^\infty\Big[\int_0^1 I_{[0,u)}(t)F^{(k)}(v\mid t)\,F(dt)-F(u)F(v)\Big]+\sum_{k=1}^\infty\Big[\int_0^1 I_{[0,v)}(t)F^{(k)}(u\mid t)\,F(dt)-F(u)F(v)\Big].\tag{4.3}$$

Remark 4.1. Formula (4.3) is just another version of the formula 22.12 for the covariance function in Billingsley (1968).

5. The local asymptotic minimax bound

Assume the chain satisfies the Conditions (A) and (B). The expansion of $\log(dP^{(n)}_{h/\sqrt n}/dP^{(n)})$ in Lemma 2.2 and the first lemma of Le Cam (1972) show that the measures $P^{(n)}_{h/\sqrt n}$ and $P^{(n)}$ are contiguous. Denote
$$\Delta_{n,h}=\frac1{\sqrt n}\sum_{i=0}^{n-1}h(X_i,X_{i+1}).$$
The Cramér-Wold device, combined with Theorem 20.1 of Billingsley (1968), shows that the vector $(\Delta_{n,h_1},\Delta_{n,h_2},\dots,\Delta_{n,h_k})$ converges to a multivariate normal vector with mean zero and covariance matrix
$$\Sigma=(\sigma_{i,j})_{i,j=1,2,\dots,k},\qquad\sigma_{i,j}=(h_i,h_j).$$
If $\tilde\mu$ is the canonical normal cylinder measure on $H_1$, then its characteristic function is $\varphi(h)=\exp(-\tfrac12\|h\|_H^2)$ for all $h\in H_1^*=H_1$. The crucial fact is that the image $\mathcal R$ of this cylinder measure by the mapping $\tau$ has characteristic function (Millar (1983, Chapter V.1, (1.7))):
$$\exp\{-\tfrac12\|\tau^*m\|_H^2\}=\exp\Big\{-\tfrac12\int_0^1\!\!\int_0^1 E[Y(u)Y(v)]\,m(du)\,m(dv)\Big\},$$
i.e. $\mathcal R$ (on $C[0,1]$) is the law of the process $Y$ of Proposition 4.1. The process $Y(t)$, $t\in[0,1]$, possesses continuous trajectories a.s. and $\mathcal R$ is a $\sigma$-additive measure on the space $B$. Denote by $K_{1d}$ ($d>0$) the set $K_{1d}=\{h\in H_1:\sup_{x,y}|h(x,y)|<d\}$. These facts imply for every $d>0$ the convergence of the experiments $\{P^{(n)}_{h/\sqrt n}:h\in K_{1d}\}$ to the limit experiment $\mathcal E$, the Gaussian shift for the abstract Wiener space $(\tau,H_1,B)$ (see also Millar (1983, Chapters II.2.3, V.2)). We have proved also that $\sqrt n(F_{h/\sqrt n}(u)-F(u))=\tau h(u)+o(1)$. Hence
$$\sqrt n(Y-F_{h/\sqrt n})=\sqrt n(Y-F_0)+\sqrt n(F_0-F_{h/\sqrt n})=Y'-\tau h+o(1).$$
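A useful structural remark behind the convergence of the $\Delta_{n,h}$'s: since $E(h(X_i,X_{i+1})\mid X_i)=0$, the summands are martingale differences, so under stationarity $\operatorname{Var}\Delta_{n,h}=\|h\|_H^2$ exactly for every $n$. The Monte Carlo sketch below illustrates this on a hypothetical discretized chain (all sizes, the kernel and the direction $h$ are our own choices; tolerances are loose because of simulation noise).

```python
import numpy as np

rng = np.random.default_rng(1)
m = 30
x = (np.arange(m) + 0.5) / m
K = np.exp(-5.0 * (x[None, :] - x[:, None]) ** 2)
K /= K.sum(axis=1, keepdims=True)
P = 0.7 * K + 0.3 / m
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

g = np.sin(2 * np.pi * x)[None, :] * np.cos(np.pi * x)[:, None]
h = g - (P * g).sum(axis=1, keepdims=True)        # E(h(X0,X1)|X0) = 0
h_norm2 = (pi[:, None] * P * h ** 2).sum()        # ||h||_H^2 on the grid

cdf = P.cumsum(axis=1)
def delta(n):
    """One realization of Delta_{n,h} for the stationary chain."""
    i = rng.choice(m, p=pi)                        # X_0 ~ pi (stationarity)
    s = 0.0
    for _ in range(n):
        j = np.searchsorted(cdf[i], rng.random())  # one transition step
        s += h[i, j]
        i = j
    return s / np.sqrt(n)

reps, n = 1500, 150
sample = np.array([delta(n) for _ in range(reps)])
print(sample.mean(), sample.var(), h_norm2)
```

The sample mean is near zero and the sample variance near $\|h\|_H^2$, matching the covariance matrix $\Sigma$ with entries $(h_i,h_j)$ in the multivariate limit.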

Here $Y'=\sqrt n(Y-F_0)$ will be considered as an estimator of the 'local parameter' $\tau h$ if $Y$ is an estimator of the 'global parameter' $F_0$. Let $g$ be a bounded increasing function defined on $[0,\infty)$ and $l(x)=g(\sup_t|x(t)|)$, where $x$ is a real continuous function on $[0,1]$. If $F$ is the continuous distribution function of $X_0$ then the loss when estimating $F$ by the function $x$ will be defined to be equal to $l(\sqrt n(x-F))$. Then the same arguments as in Millar (1983, Theorem 1.10(a)) or in Strasser (1985, Chapter 83) lead to the following theorem:

Theorem 5.1. Denote by $b$ any Markov kernel in the decision space. Then under the Conditions (A) and (B) it holds
$$\lim_{d\to\infty}\liminf_{n\to\infty}\inf_b\sup_{h\in K_{1d}}\iint l\big(\sqrt n(y-F_{h/\sqrt n})\big)\,b(x,dy)\,P^{(n)}_{h/\sqrt n}(dx)\ge E\,l(Y_{F_0(P)}).$$

Here $Y_{F_0(P)}$ denotes the 'Billingsley process' with $F=F_0$, $F^{(k)}(v\mid t)=P^k(t,[0,v))$, $F_0(t)=\pi[0,t)$. Note that in Millar's theorem the inf is taken over the so-called generalized procedures (which are a little bit more than the Markov kernels). But taking the inf only over the Markov kernels we preserve, of course, the sign of the inequality.

Corollary 5.1. Let now $K_d=\{h\in H:\sup_{x,y}|h(x,y)|<d\}$. Then under Conditions (A) and (B) it holds
$$\lim_{d\to\infty}\liminf_{n\to\infty}\inf_b\sup_{h\in K_d}\iint l\big(\sqrt n(y-F_{h/\sqrt n})\big)\,b(x,dy)\,P^{(n)}_{h/\sqrt n}(dx)\ge E\,l(Y_{F_0(P)}).$$
This is of course true, because we have enlarged the set over which the supremum has to be taken.

6. The asymptotic efficiency of the EDF

Now we want to show that the lower bound in Corollary 5.1 can actually be attained, and that the efficient estimator attaining it is the (piecewise linear and continuous version of the) EDF (see e.g. Billingsley (1968, Chapter II.13)). In fact we would like to use the 'standard' EDF
$$\hat F_n(t)=\frac1{n+1}\sum_{i=0}^{n}I_{[0,t]}(X_i),$$
but there is a problem, because it does not belong to the class of decision functions we consider. Note however that asymptotically it does not matter whether we take the 'standard' EDF or its continuous version. Abusing notation, we shall denote both of them in the same way. Alternatively one could try to extend Millar's results to the case of $D$-spaces instead of separable Banach spaces, but we do not make such an effort here. We have to show
$$\lim_{n\to\infty}\sup_{h\in K_d}\int l\big(\sqrt n(\hat F_n-F_{h/\sqrt n})\big)\,dP^{(n)}_{h/\sqrt n}=E\,l(Y_{F_0(P)}).\tag{6.1}$$
Our loss function $l$ is bounded. The discussion in Millar (1984) then shows that in order to prove (6.1) it suffices to show that for every fixed $d>0$, under $P^{(n)}_{h_n/\sqrt n}$,
$$\sqrt n\big(\hat F_n-F_{h_n/\sqrt n}\big)\Rightarrow Y_{F_0(P)}$$
for an arbitrary sequence $h_1,h_2,\dots,h_n,\dots$ in $K_d$. So we have to prove a uniform (in shrinking neighbourhoods) variant of Theorem 22.1 of Billingsley (1968). The whole proof is tedious; we skip the details and illustrate only some steps of the proof. To make a start, we introduce the following notations: $E_{h_n/\sqrt n}(\cdot)$ will denote the expected value under $P^{(n)}_{h_n/\sqrt n}$;
$$\varrho_k(u)=E_0\big\{(I_{[0,u)}(X_0)-F_0(u))(I_{[0,u)}(X_k)-F_0(u))\big\},\qquad k\ge0,$$
$$\varrho_{k,h_n/\sqrt n}(u)=E_{h_n/\sqrt n}\big\{(I_{[0,u)}(X_0)-F_{h_n/\sqrt n}(u))(I_{[0,u)}(X_k)-F_{h_n/\sqrt n}(u))\big\},$$
$$\sigma^2(u)=\varrho_0(u)+2\sum_{k=1}^\infty\varrho_k(u).$$
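The claim above that the 'standard' EDF and its piecewise linear continuous version are asymptotically interchangeable rests on their sup-distance being at most $1/n$. A short sketch (the chain below is a hypothetical example with a uniform regeneration component supplying Doeblin's condition; one of several possible continuous versions is used):

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x):
    # with prob 0.3 regenerate uniformly (Doeblin lower bound), otherwise
    # move locally with reflection at the ends of [0,1]
    if rng.random() < 0.3:
        return rng.random()
    y = abs(x + 0.1 * rng.normal())
    return 2 - y if y > 1 else y

n = 4000
xs = np.empty(n)
xs[0] = rng.random()
for i in range(1, n):
    xs[i] = step(xs[i - 1])

t = np.linspace(0.0, 1.0, 201)
edf = (xs[:, None] <= t[None, :]).mean(axis=0)     # 'standard' EDF on a grid

# piecewise linear continuous version: interpolate the points (X_(i), i/n)
xs_sorted = np.sort(xs)
pl = np.interp(t, xs_sorted, np.arange(1, n + 1) / n)

print(np.abs(pl - edf).max(), 1.0 / n)
```

Since both versions pass through the order statistics up to one jump of height $1/n$, any limit theorem for one transfers to the other, which is why the abuse of notation in the text is harmless.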

We have already seen that the $\varphi$-mixing property, the uniqueness of the stationary distribution and the exponential speed of convergence remain valid for small perturbations of the transition kernel. Now we want to show some uniformity of this validity when $h:=h_n/\sqrt n$, $h_n\in K_d$, $d>0$ fixed. First of all, we show the following lemma:

Lemma 6.1. Under the Conditions (A) and (B):
$$\sup_{h_n\in K_d}\sup_x\sup_A\big|P^n_{h_n/\sqrt n}(x,A)-\pi_{h_n/\sqrt n}(A)\big|\le q^n$$
for sufficiently large $n$ (here $P^n_{h_n/\sqrt n}$ denotes the $n$-step transition probability corresponding to the kernel $P_{h_n/\sqrt n}(x,A)=\int_A p_{h_n/\sqrt n}(y\mid x)\,dy$, and $\pi_{h_n/\sqrt n}$ is the stationary probability distribution corresponding to the same kernel; $q\in(0,1)$).

Corollary 6.1. For large $n$ it holds:
$$\sup_{h_n\in K_d}\sum_{i=1}^\infty i^2\sqrt{\psi_{h_n/\sqrt n}(i)}<\infty,$$
where $\psi_{h_n/\sqrt n}(k)=\sup_x\sup_A|P^k_{h_n/\sqrt n}(x,A)-\pi_{h_n/\sqrt n}(A)|$.
- n,,,,&(A)\.

116

S. Penev / Efficient estimation of the stationary distribution

Lemma 6.2.

uniformly in 1.4and in h,eKd. Analogous tedious calculations like in the proof of Lemma 6.2 show that corresponding uniform (in shrinking neighbourhoods) variants of Lemma 4 (Chapter 20), Theorem 20.1 and Lemma 1 (Chapter 22) in Billingsley (1968) hold. This shows the convergence of the finite dimensional distributions in Theorem 22.1 (Billingsley (1968)) is uniform. Now it remains to show that for large n, for all E > 0, q > 0 and 6 E (0,l) and for all h, E Kd the inequality P~~~~(W(r,,h,/\,6)r&)rSrl holds,

(6.2)

where Yn, h,/&i

=

fi@n

-

Fh,/d

~(x.6)=,t~~P6

IW-W)l.

lnequalrty (6.2) means some kind of ‘uniform tightness’ for all h, E Kd. This can be proved in analogous way as in the proof of Theorem 22.1, using on the corresponding places the uniform variants of Lemma 4 (Chapter 20) and Lemma 1 (Chapter 22). Remark 6.1. The condition of boundedness of the loss function 1 can easily be weakened. The most trivial way to do this is to replace 1 by min(a, 1) and then to like I,,F(~) =g(n{ (x(t) - F(t))2F(dt) with g let a--+00. Also other loss functions bounded, increasing and uniformly continuous could alternatively be used. Remark 6.2. We consider in this paper the state space E= [0,11. In fact this is not a severe restriction. It is not difficult to see that Theorem 22.1 of Billingsley can be reformulated for the case that the state space is R’ by conveniently defining the function gt(o) there. Correspondingly our optimality result can be reformulated to cover this case.

7. Appendix This appendix

contains

the proofs

of the main

statements

in the paper.

Proof of Lemma 2.1. For large n the kernels ph/fi(. , .) satisfy Condition (A) if h E HO and hence (cf. (iv)) zh/fi 9A for n large enough. In view of Condition (B) it suffices to show that n Q rr,,,\i;; for n large enough. We shall see even more. Instead of ‘shrinking’ functions h/fi +O let us consider ‘fixed’ functions h but in suitably small neighbourhood (sup-norm) of 0. Of course if h E HO then h/fi will

S. Penev / Efficient estimation of the stationary distribution

117

belong

to any fixed neighbourhood of 0 for n large enough. Denote by Kd= {~EH: su~x,~ lh(x,y)l 0. We shall see that there exists 6 E (0,l) such that if h E K8 then n + nh. At first note that under Condition (A) there exist r~(0, 1) and 6’>0 such that

for n large enough (cf. Loke (1960, p. 369) or study carefully the proof of Lemma 6.1 below). Now take 6 E (0, min(d’, 1 -r)). Assume there exists A E!J~ such that n(A) > 0 but r&I) = 0 for some h E K6. Then r”zP,“(x,A)r

1 -sup (

lh(x,y)I

&Y

‘PN(x,A)Z(l

for all n large enough. Hence (1 - d)n/rn 5 2/r&4). contradiction if n is large enough. Proof of Lemma

-6)“n(A)/2

>

But (l-&)/r>

1 and we get

2.2. It holds:

for a realization x0, x1, . . . , x,, of the random variables X0(o), X,(o), . . . ,X,(o). Here pr;,( . ) denotes the density with respect to A lE of the invariant measure nh,&. Let us denote

is p-mixing with p(n) = 2a. q” ~ ‘, a> 0, q E (0,l). This is a consequence of the exponential convergence (cf. (ii) and (iii) after Condition (A) or rbragimov and Linnik (1965, Chapter XIX)). Then it holds: C,“=, n’j/$($
then l;lo,yll,...,v,,...

(Theorem Eqol;lk=O

Because

20.1 of Billingsley (1968)). But using the definition for k= 1,2, . . . . Hence under Pp),

of the stationarity

it holds also a.s.

of 17i, we have easily

S. Penev / Efficient estimation of the stationary distribution

118

Proof of Proposition for llr*m1];:

4.1. Using Fubini’s

theorem,

(

Let

US

fix some natural s 01 i 01 (ILO, u,(f)

= F(min(u,

number - F(u

0)) -

k=l

1

m(du)m(do)

F(dt 1s)F(ds) N. Then

1 d)(Jo,

- F(u

u,(t)

1s)]F(dt

Z,o,U)(f)F(k)(v I t)F(dt)

F(u

1 s))F(dt

1s)F(ds)

’ F(v I s)F(u 1s)F(ds), s0

Z,o,U~(t)[F(k)(v 1t)-Fck+‘)(v

_

expression

Is)+ i (F’k’(v( t)-F(k+‘)

. z,o,“)(t)-F(v

.

we get the following

- F(u)F(v)

(7.1)

1s)F(ds)

I

+ NF(u)F(o)

I s)Fck+ ‘)(v I s)F(ds).

(7.3) -analogous to (7.2) with ‘exchanged Using the equality

(7.2)

roles’ of u and v.

“1 Fck)(u

1 t)p(t

1 S)

dt = Ftk+ l)(u / s),

;

(F(‘)(o 1t) - Fck+‘)(u 1s))F(dt

1s)F(ds)

= 0,

(7.4)

(Fck’(u 1t) - Fck+ ‘)(u 1s))F(dt

1s)F(ds)

= 0.

(7.5)

!0 we

get: 1

‘1

to i i0 1

‘1

0

0

.I’I

F(u 1s).

F(o

k=l

1 S) . f k=l

It is easy to check the following

equality:

1

$1

(F’k’(~ 1t) - Fck+ ‘)(u 1s))(F(‘)(o 1t) - F(‘+ ‘)(v ) s)) . F(dt 1s)F(dt) LO i 50 1

‘1

F’k’(~ 1t)F(‘)(o 1t)F(dt)

= 0

-

Fck+ “(u / s)F(‘+ ‘)(v 1s)F(ds). 0

119

S. Penev / Efficient estimation of the stationary distribution

Using

this result,

j,

j,

one has:

1 1 t)F(u

1

t)F(dt)

+

F(u 1t).

s

0

f

0

’F(u

+

1s)F(ds)

1 F(u

=

‘)(u 1s))F(dt

1: (F’@(u 1t) -Fck+ ‘)(u / s))(F(‘)(u ) t) -F”+

i;

F(‘)(u 1t)F(dt)

I=2

I FcN+ “(u 1s)FcN+ “(u 1s)F(ds)

I t). f F(k)(~I i’)F(dt) + s0

k=2

s0

1

FcN+ ‘)(u ) s) . ,fl F(‘+ ‘)(u 1s)F(ds)

_ 5 0 1

FcN+ ‘)(u 1s) . ki, Fck+ ‘)(u 1s)F(ds).

s0

Using

(7.1)-(7.5) 1

and the last equality,

we deduce:

1

Z,o,Uj(t)- F(u 1s) + ,E, (F’k’(~ 1t) -Fck+ ‘) (u I s))]

AN=

i 0 i[ 0 Zro,,,(t)-F(u

1s)+

2

(Fck’(u 1t)-Ftk+‘)

(u ( s))] . F(dt j s)F(ds)

k=l

= F(min(u,

u)) + :

1

) W(W

-

F(u)F(o) 1

Ir;

F(N+

+

- F(u)F(u)

ii

Zlo,uj(W(k)(u +k!,

/ OF(W

ILo,uj(OF(k)(u

k=l

l)(u

1 s)~(N+

1)

(u

1

s)F(ds)

F(u 1t)FcN+ ‘)(u 1t)F(dt)

s

_

s

F(u 1t)FcN+ ‘)(u ) t)F(dt)

I

+

+

(u I s) . ,;, F”+ “(u 1s)F(ds)

NF(u)F(o)

-

FcN+ ‘)(u 1s).

f

1

Fck+ ‘)(u 1s)F(ds)

k=l

. 1

Because of the uniform and exponentially fast convergence, the expression in the third brackets on the right side tends uniformly in u and u to -F(u)F(u) as N+ m. Due to the same reason: NF(u)F(u)

i

I

’ FcN+ ‘) (u 1s) . ,!I F”+ ‘)(u 1s)F(ds)

IF(u) - FcN+ ‘) (u 1s) I . ,il F(‘+ ‘)(u 1GF(ds)

S. Penev / Efficient estimation of the stationary distribution

120

sNeaoqN+‘-+O uniformly

as N--*03

in u and v. Hence lim A,-,, = F(min(u,

v)) -F(u)F(v)

N-CC

1

+

1su)w~(k)(vI Jo,

j,

+

0 fl(O - F(u)F(v)

0 1

k!, isZ[O,V)Wk)(UI

0 fl(O -F(uF(v)

0

I I

*

Therefore

(E(Y(u)Y(v)))m(du)m(do). IIT*di = 1;1; Proof of Lemma

6.1. Denote

A h,/di

=

A h&i

=

by

sup sup {P~,/\~;;(x~,A)-P~,,\~;;(x*,A)). XI,XZ A

Then SUP

{P(Y I Xl)

SUP

XI,XZ A

n-“2h,(x~,~))-~(~

. (I+

5sup

sup

xl,xz (we

1 A

1x2)(1 + n-1’2Mx2,~HI

(P(Y j XI)-P(Y

A

dy

j~2W~+2C2d/fi

sA

used that p( y 1x) is bounded and h, E K,J. But the condition (A) ensures that sup sup XI,XZ A

(consult

(P(YI~,)-P(YI~~))~YI~-~ sA

for this inequality

Loeve (1960, Chapter

Hence there exists (for large n) a constant h, E Kd the inequality Ah,,&< q holds. Now using Loke (1960, Chapter VII.27.3.B)

Proof of Lemma

VII.27.3)). q< 1 such that we have

6.2. Obviously

Eh,/\i;l{~(~(u)-Fh,,~(u))}2 =

Eh,/fi(z[O,

+ 2 i k=l

u)(~O)

(l

-

-

Fh,/6i(@)2

(k/n))E,“,~~{(z,O,u)(XO)

-Fh,,du)) ’ cz[O, .)(xk)

-

Fh,/vd~)))~

independently

of

121

S. Penev / Efficient estimation of the stationary distribution

We want to evaluate from above the difference

lb204-e?,,dmElw -4?“/d~))1*1 5

leow-eo,h,/d~)l n-1

+zkEn l@,(U)1

n-1

n-1

+ 2 kg,@k(U) - k;, @k,h,/dK@) + E, Wn)Qk,hn/\lJ2@) .

(7.6)

Now we use Corollary 7.1 of Ibragimov and Linnik (1965, Chapter XIX) and inequality (20.35) of Billingsley (1968) to verify that the series I,“=, ek and C,“=, Q~,~,/G are absolutely convergent uniformly in h, E Kd, u E [0,11. Hence ,tn l&r +O ; 1:; ,r;

as n-+oo,

I@k,h,/dK +o

(7.7) as n-too

(7.8)

and the convergence is uniform in h, E K,,, u E [0,11. But from (7.6) it follows:

Ia*(U)-Eh./\lJI{~(~n(U)-Fh~,~(U))}*l 5 I@&+@0,h,/fi04+g

le,(u)l

n-1 co n-l + WEE Fi l@/&/d~)I+2 k;, k+r(~)-ek,h,/dA~)) * Apparently, in order to complete the proof, we have only to show the uniform convergence of Cil: (ek(u)-@k,h,,~(z.4)) to zero. It is easy to see that

S. Penev / Efficient estimation of the stationary distribution

122

I +

~ro,u,W~~ki;;c~1s)-Ft,,,\l;;@) -F(%

14 +F@WXW.

s0

Here h;, denotes kernel

the density

P/,,\l;;(x,-4) =

of the stationary

distribution,

s A

PLY 1x)(1 +W%Lw9)

corresponding

to the

dy.

As in (2.3) we have 1

s ~~o,u~W[F&x(~

I!

~(42;,(4 dads 5 G . yk/fi

.

-F/,,,d~)l

14

0

0

with some constants Ct > 0 and y E (0, l), independent Apparently then (l/fi)Ciii yk+O as n+oc. Now consider

of h, E Kd.

1 I[o,

.,CW&z@

19

-

F/z,,,\l;;@)

-

Fck)@

) 4

+

.I’0 On the one hand there exist constants

Cz> 0, /3~ (0,l)

F@W’(W. such that

I ~,o,u,(W&i@

14 -

F/zn/fi@M’W

5 G. Bk,

15 0 1 ~[o,&)[F(~)(~ Ii

because constant

1s)

-F(u)lF(W

5

C2.

Pk

0

of the uniform exponential C,>O, such that

convergence.

On the other hand there exists a

(for the last inequality we have used the comment in Theorem 6 of Kartashov (1984) which states that for minor norms I/Q-P 11 also the inequality sup, 11 Qk - Pk 11 I C IIQ- P I(with some positive constant C is valid). Hence

both the inequalities 1

IS

~~o,u~W~%d~

14

-4z,,d~)-F(~)(~

1 ~)+F(u)lF(ds)

r2C2pk,

0

hold for large n and k. For a given n we choose [. ] denotes the integer part). Then

for example

m(n) = [n1’3] (where

S. Penev / Efficient estimation of the stationary distribution

The expression on the right side of the last inequality small for large n. This completes the proof.

123

can be made

arbitrarily

Acknowledgement

The author is indebted to the referee for the critical reading of an earlier version, leading to substantial improvement of both the style and the presentation of this article.

References

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Dvoretsky, A., J. Kiefer and J. Wolfowitz (1956). Asymptotic minimax character of the sample distribution function and the classical multinomial estimator. Ann. Math. Statist. 27, 642-669.
Doob, J.L. (1956). Stochastic Processes. Wiley, New York.
Ibragimov, I.A. and Yu.V. Linnik (1965). Independent and Stationary Sequences (in Russian). Nauka, Moscow.
Kartashov, N.V. (1981). Strongly stable Markov chains. In: V.M. Zolotarev and V.V. Kalishnikov, Eds., Stability Problems for Stochastic Models, Proceedings of Seminar. The Institute for Systems Studies, Moscow, 54-59.
Kartashov, N.V. (1984). Criteria for uniform ergodicity and strong stability of Markov chains with general phase state. Theory Probab. Math. Statist. 30, 65-81.
Kiefer, J. and J. Wolfowitz (1976). Asymptotically minimax estimation of concave and convex distribution functions. Z. Wahrsch. Verw. Gebiete 34, 73-85.
Le Cam, L.M. (1972). Limits of experiments. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley-Los Angeles, 245-261.
Loève, M. (1960). Probability Theory, 2nd ed. D. van Nostrand, Princeton, NJ.
Millar, P.W. (1979). Asymptotic minimax theorems for the sample distribution function. Z. Wahrsch. Verw. Gebiete 48, 233-252.
Millar, P.W. (1983). The minimax principle in asymptotic statistical theory. In: Ecole d'Ete de Probabilités de Saint-Flour XI-1981. Lecture Notes in Mathematics, Vol. 976. Springer-Verlag, Berlin, 76-265.
Millar, P.W. (1984). A general approach to the optimality of minimum distance estimators. Trans. Amer. Math. Soc. 286 (1), 377-418.
Neveu, J. (1964). Bases Mathématiques du Calcul des Probabilités. Masson, Paris.
Roussas, G. (1972). Contiguity of Probability Measures. Cambridge University Press, Cambridge.
Strasser, H. (1985). Mathematical Theory of Statistics. W. de Gruyter, Berlin-New York.