Exponential convergence for sequences of random variables

Statistics & Probability Letters 34 (1997) 159-164

Jiaming Sun ¹
Department of Mathematics, University of British Columbia, Vancouver BC, Canada V6T 1Z2
Received August 1996; revised September 1996

Abstract

Using the compactness in large deviation theory, this note describes a large deviation upper bound by a lower semicontinuous function. It then obtains a characterization for exponential convergence and discusses exponential convergence rates.

Keywords: Compactness; Convex conjugate; Exponential convergence; Large deviations

1. Introduction and results

Let $\{S_n\}_{n \ge 1}$ be a sequence of random variables taking values in $\mathbb{R}^d$ with finite first moments. The exponential convergence (EC) of the probabilities $P(\|S_n - E[S_n]\| \ge n\varepsilon)$ to 0 (as $n \to \infty$, for all $\varepsilon > 0$) has been studied by many authors, such as Baum et al. (1962), Ellis (1984), Petrov (1975) and Schonmann (1989). In this note, we continue these studies and consider EC for general sequences. Let $\mu_n$ be the distribution of $(S_n - E[S_n])/n$, $n \ge 1$. As in large deviation theory, there are two key ingredients in the study of EC:

$$\Phi(t) := \limsup_{n \to \infty} \frac{1}{n} \log \int_{\mathbb{R}^d} e^{n\langle t, x\rangle}\, \mu_n(dx), \qquad t \in \mathbb{R}^d, \tag{1.1}$$

$$I(x) := \Phi^*(x) = \sup\{\langle t, x\rangle - \Phi(t) : t \in \mathbb{R}^d\}, \qquad x \in \mathbb{R}^d. \tag{1.2}$$
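As a concrete illustration of (1.1) and (1.2), the following sketch evaluates both quantities for a case that is not taken from this note: $S_n$ a sum of $n$ i.i.d. fair $\pm 1$ steps, for which $E[S_n] = 0$ and $\frac{1}{n}\log E[\exp(tS_n)] = \log\cosh t$ for every $n$. The function names, the choice of example and the grid used to approximate the supremum are all assumptions made for the illustration.

```python
import math


def scaled_log_mgf(t, n):
    """(1/n) * log E[exp(t * S_n)] for S_n a sum of n i.i.d. fair +/-1 steps.

    S_n = 2*j - n with probability C(n, j) / 2**n. The sum is evaluated
    with a log-sum-exp to avoid overflow for large n or t.
    """
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        - n * math.log(2.0) + t * (2 * j - n)
        for j in range(n + 1)
    ]
    m = max(logs)
    return (m + math.log(sum(math.exp(v - m) for v in logs))) / n


def phi(t):
    """Phi(t) of (1.1) for this illustrative example: log cosh t."""
    return math.log(math.cosh(t))


def legendre(f, x, t_grid):
    """Grid approximation of the convex conjugate sup_t {t*x - f(t)} in (1.2)."""
    return max(t * x - f(t) for t in t_grid)


def rate_closed_form(x):
    """Closed form of the conjugate of log cosh, valid for |x| < 1."""
    return 0.5 * (1 + x) * math.log(1 + x) + 0.5 * (1 - x) * math.log(1 - x)


# Hypothetical grid for t; a finer or wider grid changes only the accuracy.
t_grid = [i / 1000.0 for i in range(-5000, 5001)]

if __name__ == "__main__":
    print(scaled_log_mgf(0.7, 50), phi(0.7))
    print(legendre(phi, 0.3, t_grid), rate_closed_form(0.3))
```

In this example the limit superior in (1.1) is an ordinary limit, so the sketch does not exercise the generality of the definition; it only shows how $\Phi$ and $I$ are related.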

¹ This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.
0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-7152(96)00177-0

These arise naturally as follows. For example, for $d = 1$ and all $\varepsilon > 0$, Chebychev's inequality implies that

$$e(\varepsilon) := \limsup_{n \to \infty} \frac{1}{n} \log P(S_n - E[S_n] \ge n\varepsilon) \le \inf_{t > 0} \limsup_{n \to \infty} \frac{1}{n} \log e^{-nt\varepsilon} E[\exp(t(S_n - E[S_n]))] = \inf_{t > 0}\{-t\varepsilon + \Phi(t)\} = -\sup_{t \in \mathbb{R}}\{t\varepsilon - \Phi(t)\} = -I(\varepsilon), \tag{1.3}$$


where the second-to-last equality follows from the fact that $\Phi(t)$ is a nonnegative function (by Jensen's inequality). The procedure in (1.3) is widely used to prove EC and large deviation upper bounds, but so far it has not been known how much the inequality in (1.3) loses in general. In what follows, we prove that almost no exponential components are lost through this inequality. Specifically, we show that if $e(\varepsilon)$ in (1.3) is strictly negative for all $\varepsilon > 0$ (i.e., EC does occur to the right of 0), then $-I(\varepsilon)$ must be strictly negative for all $\varepsilon > 0$ (Theorem 2), so the EC must be provable from (1.3). Furthermore, the EC rates $I(\varepsilon)$ obtained in (1.3) are very close to optimal near 0 (Theorem 3). So we can claim that Chebychev's inequality is very efficient in deriving EC.

The key tool for proving Theorems 2 and 3 is the following large deviation type estimate (Theorem 1). We derive it for the $\mathbb{R}^d$ case. One can also generalize it to more general spaces, such as separable Banach spaces, as long as exponential tightness is assumed. The interesting parts of Theorem 1 are the lower bound in (1.4) and the partial identification of the rate function $J$. Let $\mathcal{D}$ be the set where $\Phi(t)$ is finite.

Theorem 1. Let $\{\mu_n\}_{n \ge 1}$ be any sequence of probability measures on $\mathbb{R}^d$. Define $\Phi(t)$ and $I(x)$ as in (1.1) and (1.2). If $0 \in \mathcal{D}^\circ$, then there exists a lower semicontinuous (lsc) function $J: \mathbb{R}^d \to [0, \infty]$ such that for any Borel set $B \subset \mathbb{R}^d$,

$$-\inf_{x \in B^\circ} J(x) \le \limsup_{n \to \infty} \frac{1}{n} \log \mu_n(B) \le -\inf_{x \in \bar{B}} J(x), \tag{1.4}$$

where $B^\circ$ and $\bar{B}$ are the interior and closure of $B$, respectively. Furthermore, the convex conjugate $J^*$ of $J$ satisfies $J^*(t) = \Phi(t)$ if $t \in \mathcal{D}^\circ$ and $J^*(t) \le \Phi(t)$ otherwise. In particular, $J \ge I$.

From now on, let us go back to the random variable setting given at the beginning. We use Theorem 1 to obtain a characterization of EC in Theorem 2 and its corollary. Such a characterization was also obtained by Ellis (1984), but under stronger conditions: he assumed that the limit (rather than the superior limit) in (1.1) exists. See Example 1 for a case where the limit in (1.1) does not exist but our result applies. Note that Theorems 2 and 3 are only stated for the positive half-line. Analogous results hold for the negative half-line.

Theorem 2. Let $d = 1$ and $\{S_n\}_{n \ge 1}$ be a sequence of r.v.s with $0 \in \mathcal{D}^\circ$. Then the following three statements are equivalent:
(a) $e(\varepsilon) < 0$, $\forall \varepsilon > 0$.
(b) $-I(\varepsilon) < 0$, $\forall \varepsilon > 0$.
(c) $\Phi'_+(0) = 0$.

Corollary 1. Let $\{S_n\}_{n \ge 1}$ be a sequence of r.v.s taking values in $\mathbb{R}^d$ with $0 \in \mathcal{D}^\circ$. Then the following three statements are equivalent:
(a) $\limsup_{n \to \infty} \frac{1}{n} \log P(\|S_n - E[S_n]\| \ge n\varepsilon) < 0$, $\forall \varepsilon > 0$.
(b) $-I(x) < 0$, $\forall x \ne 0$.
(c) $\Phi'(0) = 0$.
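To make the bound (1.3) and statements (a) and (b) of Theorem 2 concrete, the sketch below compares the exact value of $\frac{1}{n}\log P(S_n \ge n\varepsilon)$ with $-I(\varepsilon)$ for an illustrative case that is not taken from this note: $S_n$ a sum of $n$ i.i.d. fair $\pm 1$ steps and $\varepsilon = 0.2$, where $I$ has the closed form used in `rate`. The function names and the choice of $n$ are assumptions made for the example.

```python
import math


def log_tail(n, k):
    """log P(S_n >= k) for S_n a sum of n i.i.d. fair +/-1 steps (exact).

    S_n = 2*j - n, so S_n >= k means j >= ceil((n + k) / 2).
    """
    j_min = max(0, (n + k + 1) // 2)
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        - n * math.log(2.0)
        for j in range(j_min, n + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))


def rate(x):
    """I(x) for this illustrative example (conjugate of log cosh), |x| < 1."""
    return 0.5 * (1 + x) * math.log(1 + x) + 0.5 * (1 - x) * math.log(1 - x)


def gap(n):
    """Difference between the Chebychev exponent -I(0.2) and the exact exponent."""
    k = n // 5  # threshold n * epsilon with epsilon = 0.2, kept as an integer
    return -rate(0.2) - log_tail(n, k) / n


if __name__ == "__main__":
    for n in (50, 200, 800):
        print(n, log_tail(n, n // 5) / n, -rate(0.2), gap(n))
```

For every $n$ the exact exponent lies below $-I(0.2)$, as (1.3) requires, and in this example the gap shrinks as $n$ grows.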

The next theorem compares the EC rates $I$ obtained in (1.3) with the real rate function $J$ obtained in Theorem 1 by considering $\mu_n$ to be the distribution of $(S_n - E[S_n])/n$, $n \ge 1$.

Theorem 3. Let $d = 1$ and $\{S_n\}_{n \ge 1}$ be a sequence of r.v.s with $0 \in \mathcal{D}^\circ$. If $\Phi'_+(0) = 0$ and $\Phi(t) > 0$ for all $t > 0$, then there exist $\gamma_i \downarrow 0$ as $i \to \infty$ such that for each $i$, $J(\gamma_i) = I(\gamma_i)$.


Under Theorem 3, (1.3) and the lower bound of (1.4) say that for each $\gamma_i$ and all small $\delta > 0$,

$$-I(\gamma_i) = -J(\gamma_i) \le -\inf_{x > \gamma_i - \delta} J(x) \le e(\gamma_i - \delta) \le -I(\gamma_i - \delta). \tag{1.5}$$

This explains why the estimates in (1.3) are very close to optimal near each $\gamma_i$, by the continuity of $I(x)$ near 0.

Example 1. Let $\{S_n\}_{n \ge 1}$ be such that, for each $k \ge 1$, $P(S_{2k} = 2k) = e^{-2k}$, $P(S_{2k} = -1) = 2ke^{-2k}$ and $P(S_{2k-1} = k) = e^{-k}$, $P(S_{2k-1} = -1) = ke^{-k}$, in both cases with the rest of the probability sitting at 0. Then, for $t > 1$, the limit in (1.1) equals $t - 1$ along the even sequence but $(t - 1)/2$ along the odd sequence. So the limit in (1.1) does not exist if $t > 1$. It is easy to see that $\Phi(t) = t - 1$ if $t > 1$ and 0 otherwise. So $\Phi'(0) = 0$ (note that $\Phi(t)$ is not differentiable at $t = 1$) and $I(x) = x$ if $0$ [...] $> 0$ for all $t > 0$ in Theorem 3 cannot be dropped in general.
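The two subsequential limits in Example 1 can be checked numerically. The sketch below evaluates $\frac{1}{n}\log E[\exp(tS_n)]$ exactly for the three-point distributions of the example (note that $E[S_n] = 0$ for every $n$, so no centering is needed) and approximates the conjugate of $\Phi(t) = \max(t - 1, 0)$ on a grid at one sample point. The function names, the sample values of $t$, $n$ and $x$, and the grid are assumptions made for the illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def scaled_log_mgf(t, n):
    """(1/n) * log E[exp(t * S_n)] for the sequence of Example 1."""
    if n % 2 == 0:
        k = n // 2
        jump, log_p_jump = 2 * k, -2.0 * k       # P(S_{2k} = 2k) = e^{-2k}
        log_p_neg = math.log(2 * k) - 2.0 * k    # P(S_{2k} = -1) = 2k e^{-2k}
    else:
        k = (n + 1) // 2
        jump, log_p_jump = k, -float(k)          # P(S_{2k-1} = k) = e^{-k}
        log_p_neg = math.log(k) - k              # P(S_{2k-1} = -1) = k e^{-k}
    # Remaining probability mass sits at 0.
    p_zero = 1.0 - math.exp(log_p_jump) - math.exp(log_p_neg)
    terms = [t * jump + log_p_jump, -t + log_p_neg, math.log(p_zero)]
    return log_sum_exp(terms) / n


def phi_limit(t):
    """Phi(t) as stated in Example 1: t - 1 for t > 1 and 0 otherwise."""
    return max(t - 1.0, 0.0)


def legendre(f, x, t_grid):
    """Grid approximation of sup_t {t*x - f(t)}."""
    return max(t * x - f(t) for t in t_grid)


# Hypothetical grid for t; it contains t = 1, where phi_limit has its kink.
t_grid = [i / 100.0 for i in range(-2000, 2001)]

if __name__ == "__main__":
    t = 3.0
    print("even n:", scaled_log_mgf(t, 2000), "vs t - 1 =", t - 1)
    print("odd n: ", scaled_log_mgf(t, 1999), "vs (t - 1)/2 =", (t - 1) / 2)
    print("conjugate at x = 0.4:", legendre(phi_limit, 0.4, t_grid))
```

For $t \le 1$ both subsequences give values close to 0, in line with $\Phi(t) = 0$ there.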

In order to apply Theorems 2 and 3, the key step is to verify $\Phi'(0) = 0$. One can do so if a good control over $\Phi(t)$ near 0 can be found (as done in Theorem 2 of Baum et al. (1962) for the independent sum $S_n$). To show how this idea works, we give a short proof of $\Phi'(0) = 0$ in Example 2 for bounded $\phi$-mixing stationary sequences. This rederives the EC result of Schonmann (1989).

Example 2. Suppose that $\{X_n\}_{n \in \mathbb{Z}}$ is a bounded $\phi$-mixing stationary sequence with $E(X_1) = 0$. Let $S_n = X_1 + \cdots + X_n$, $n \ge 1$. Then $\Phi'(0) = 0$.

2. The Proofs

Proof of Theorem 1. We will use techniques from large deviation theory in our proof. Since $0 \in \mathcal{D}^\circ$, the sequence $\{\mu_n\}_{n \ge 1}$ is exponentially tight (Dembo and Zeitouni, 1993). Then O'Brien and Vervaat (1991) (also Pukhalskii, 1991) proved the following compactness: for any subsequence $\{\mu_{n'}\}$, there exists a sub-subsequence $\{\mu_{n''}\}$ which satisfies a large deviation principle (LDP) with some rate function $\mathcal{J}: \mathbb{R}^d \to [0, \infty]$ ($\mathcal{J}$ is lsc, in particular). Define $A$ to be the set of all such rate functions of sub-subsequences. Define $J_1(x) := \inf_{\mathcal{J} \in A} \mathcal{J}(x)$, $x \in \mathbb{R}^d$. Let $J(x)$ be the greatest lsc function not exceeding $J_1(x)$. Then it is not difficult to see that for all open sets $G$ and closed sets $F$ in $\mathbb{R}^d$,

$$\inf_{x \in G} J(x) = \inf_{x \in G} J_1(x) \quad \text{and} \quad \inf_{x \in F} J(x) \le \inf_{x \in F} J_1(x). \tag{2.1}$$

Now let $G \subset \mathbb{R}^d$ be an open set. For any $\mathcal{J} \in A$, by the definition of $A$, there exists a subsequence $\{\mu_{n'}\}$ which satisfies an LDP with $\mathcal{J}$ as its rate function. Consequently, by the large deviation lower bound,

$$\liminf_{n' \to \infty} \frac{1}{n'} \log \mu_{n'}(G) \ge -\inf_{x \in G} \mathcal{J}(x).$$

So by (2.1),

$$\limsup_{n \to \infty} \frac{1}{n} \log \mu_n(G) \ge \sup_{\mathcal{J} \in A}\Big(-\inf_{x \in G} \mathcal{J}(x)\Big) = -\inf_{x \in G} J_1(x) = -\inf_{x \in G} J(x).$$

On the other hand, let $F \subset \mathbb{R}^d$ be closed. Choose a subsequence $\{n'\}$ such that

$$\limsup_{n \to \infty} \frac{1}{n} \log \mu_n(F) = \lim_{n' \to \infty} \frac{1}{n'} \log \mu_{n'}(F).$$


By the compactness, there exists a sub-subsequence $\{n''\}$ such that $\{\mu_{n''}\}$ satisfies an LDP with some rate function $\mathcal{J}$, so by the large deviation upper bound and (2.1),

$$\limsup_{n \to \infty} \frac{1}{n} \log \mu_n(F) = \lim_{n'' \to \infty} \frac{1}{n''} \log \mu_{n''}(F) \le -\inf_{x \in F} \mathcal{J}(x) \le -\inf_{x \in F} J_1(x) \le -\inf_{x \in F} J(x). \tag{2.2}$$

So (1.4) is proved. Finally, for any $\mathcal{J} \in A$, $\mathcal{J}$ is the rate function of some subsequence satisfying an LDP, say $\{\mu_{n'}\}$. Then for all $t \in \mathcal{D}^\circ$, by Varadhan's Integral Theorem (Dembo and Zeitouni, 1993),

$$\Phi_{\mathcal{J}}(t) := \lim_{n' \to \infty} \frac{1}{n'} \log \int e^{n'\langle t, x\rangle}\, \mu_{n'}(dx) = \sup\{\langle t, x\rangle - \mathcal{J}(x) : x \in \mathbb{R}^d\}. \tag{2.3}$$

Taking the supremum over $\mathcal{J} \in A$ in the above equation, we see that

$$\Phi(t) = \sup_{\mathcal{J} \in A} \Phi_{\mathcal{J}}(t) = \sup\{\langle t, x\rangle - J_1(x) : x \in \mathbb{R}^d\} = \sup\{\langle t, x\rangle - J(x) : x \in \mathbb{R}^d\} = J^*(t), \qquad \forall t \in \mathcal{D}^\circ. \tag{2.4}$$

Since $\Phi(t)$ is convex and $J^*(t)$ is lsc, (2.4) implies that $J^*(t) \le \Phi(t)$ for all boundary points $t$ of $\mathcal{D}$ and hence for other $t$. In particular, from Ekeland and Temam (1976), $J \ge \mathrm{conv}(J) = J^{**} \ge \Phi^* = I$, where $\mathrm{conv}(J)$ is the greatest lsc convex function not exceeding $J$. □

Proof of Theorem 2. (c) $\Rightarrow$ (b) $\Rightarrow$ (a) follow from the definition of the convex conjugate and (1.3). So we need only prove (a) $\Rightarrow$ (c). Suppose $\Phi'_+(0) > 0$. Then $I(x) = 0$ if $0 < x < \Phi'_+(0)$ (note that $\Phi$ is nonnegative). Let $x_0 := \sup\{x : I(x) = 0\}$. Let $t_0 > 0$ be in $\mathcal{D}^\circ$. Define $u_0 = \Phi(t_0)/t_0$. Then $0$ [...]

$$I(x) = \Phi^*(x) = \sup\{tx - \Phi(t) : t \in (0, t_0)\} = \sup\{tx - J^*(t) : t \in (0, t_0)\} = J^{**}(x) = \mathrm{conv}(J)(x). \tag{2.5}$$

Now if $J(x_0) > I(x_0) = 0$, then by the lower semicontinuity of $J$ and the convexity of $I$, we can construct an lsc convex function which is strictly between $J(x)$ and $I(x)$ for all $x$ near $x_0$ (or for all $x$ near and less than $x_0$) and equals $I(x)$ elsewhere. This says that $I(x) < \mathrm{conv}(J)(x)$ for all $x$ near and less than $x_0$, which contradicts (2.5). So $J(x_0) = 0$. Therefore, $e(\varepsilon) = 0$ for all $0 < \varepsilon < x_0$ by the lower bound in (1.4). Therefore (a) $\Rightarrow$ (c).

Proof of Corollary 1. As above, (c) $\Rightarrow$ (b) $\Rightarrow$ (a) is obvious. Now if (a) holds, then it is not difficult to see that, by Theorem 2, all the directional derivatives of $\Phi(t)$ at 0 are 0. So $\Phi'(0) = 0$, since $\Phi(t)$ is convex (Rockafellar, 1972).

Proof of Theorem 3. Let $t_0 > 0$ be in $\mathcal{D}^\circ$. Define $u_0 = \Phi(t_0)/t_0$. By the hypothesis, $u_0 > 0$. Similarly as in (2.5), we have $I(x) = \mathrm{conv}(J)(x)$, $\forall x \in [0, u_0]$. Then note that if $I$ is strictly convex near an $x_0 \in (0, u_0)$, then $J(x_0) = I(x_0)$. Next suppose that $I(x)$ is linear on some interval $(a, b) \subset (0, u_0)$, say $I(x) = cx + d$, $x \in (a, b)$. Then define $x_1 = \inf\{x > 0 : I(x) = cx + d\}$ and $x_2 = \sup\{x : I(x) = cx + d\}$. Then similarly as in the proof of Theorem 2, $J(x_i) = I(x_i)$ as long as $x_i \in (0, u_0)$, $i = 1, 2$. Now note that Theorem 2 implies $I(x) > 0$ if $x > 0$. If $I'_+(0) > 0$, then $\Phi(t) = \Phi^{**}(t) = I^*(t) = 0$ for all small $t > 0$, which contradicts the hypothesis. So $x_1 > 0$. Therefore, either $I$ is strictly convex in $(0, \varepsilon_0)$ for some $\varepsilon_0 > 0$ or there are infinitely many intervals $[x_1, x_2]$ of the above kind in each $(0, \varepsilon)$, $\varepsilon > 0$. Consequently, in either case, we can choose $\gamma_i \downarrow 0$, as $i \to \infty$, such that $J(\gamma_i) = I(\gamma_i)$ for each $i \ge 1$. □

Proof of Example 2. We only prove $\Phi'_+(0) = 0$. The proof of $\Phi'_-(0) = 0$ is similar. Let $m > 1$ be an integer. Define $T_iS_m = X_{i+1} + \cdots + X_{i+m}$, $m, i \ge 1$. Assume that $|X_i| \le C$. Let $k \ge 1$ and $t > 0$. Then by the Cauchy-Schwarz inequality and the stationary property,

$$E[\exp(tS_{2km})] \le E[\exp(2tS_m + 2tT_{2m}S_m + \cdots + 2tT_{2(k-1)m}S_m)] \le E[\exp(2tS_m + 2tT_{2m}S_m + \cdots + 2tT_{2(k-2)m}S_m)]\,(2\phi(m)e^{2tmC} + E[\exp(2tS_m)]) \le \cdots \le (2\phi(m)e^{2tmC} + E[\exp(2tS_m)])^k, \tag{2.6}$$

where the second inequality follows from a well-known inequality about $\phi$-mixing sequences (Hall and Heyde, 1980, Theorem A.6). Noting that $E[X_1] = 0$, we thus have

$$\Phi(t) \le \frac{1}{2m} \log(2\phi(m)e^{2tmC} + E[\exp(2tS_m)]) \tag{2.7}$$

for all $t > 0$ and $m > 1$. Then letting $m = [1/t]$ in (2.7), we conclude that

$$\limsup_{t \downarrow 0} \frac{\Phi(t)}{t} \le \lim_{t \downarrow 0} \frac{1}{2t[1/t]} \log(2\phi([1/t])e^{2t[1/t]C} + E[\exp(2tS_{[1/t]})]) = \lim_{t \downarrow 0} \frac{1}{2} \log E[\exp(2tS_{[1/t]})] = 0,$$

where the last step follows from the dominated convergence theorem and the fact that $tS_{[1/t]} \to 0$ in probability as $t \to 0$, which is a result of the ergodicity of $\phi$-mixing stationary sequences (Rosenblatt, 1974). □
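The first inequality in (2.6) can be spelled out as follows. This is only a sketch of the step: the symbols $A$ and $B$ and the convention $T_0S_m := S_m$ are introduced here for illustration and are not notation from the proof.

```latex
% A collects the even-numbered blocks of length m, B the odd-numbered ones.
% A, B and the convention T_0 S_m := S_m are illustrative assumptions.
\begin{align*}
A &:= \sum_{j=0}^{k-1} T_{2jm}S_m, \qquad
B := \sum_{j=0}^{k-1} T_{(2j+1)m}S_m, \qquad
S_{2km} = A + B, \\
E[\exp(tS_{2km})] &= E[e^{tA}e^{tB}]
  \le \bigl(E[e^{2tA}]\bigr)^{1/2}\bigl(E[e^{2tB}]\bigr)^{1/2}
  = E[e^{2tA}].
\end{align*}
```

The middle inequality is Cauchy-Schwarz, and the last equality uses stationarity: $B$ is $A$ shifted by $m$ indices, so the two have the same distribution. Consecutive blocks in $A$ are separated by a gap of $m$ indices, which is what lets the $\phi$-mixing inequality peel off one factor at a time in the remaining steps of (2.6).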

Note: After the completion of this paper, Professor A. Dembo pointed out that the constant $n$ in Theorems 1-3 can be replaced by any positive constants $\alpha_n$ with $\alpha_n \to \infty$ as $n \to \infty$. So these results can be applied to study kinds of convergence other than the exponential one.

Acknowledgements

The author is very grateful to G.L. O'Brien for his guidance and many valuable suggestions throughout the preparation of this paper. He also wishes to thank N. Madras and T. Salisbury for their helpful discussions.

References

Baum, L.E., M. Katz and R.R. Read (1962), Exponential convergence rates for the law of large numbers, Trans. AMS 102, 187-199.
Bradley, R. and W. Bryc (1985), Multilinear forms and measures of dependence between random variables, J. Multivariate Anal. 16, 335-367.
Dembo, A. and O. Zeitouni (1993), Large Deviations Techniques and Applications (A.K. Peters, Wellesley, Mass (formerly, Jones and Bartlett, Boston)).
Ekeland, I. and R. Temam (1976), Convex Analysis and Variational Problems (North-Holland, Amsterdam).
Ellis, R.S. (1984), Large deviations for a general class of random vectors, Ann. Probab. 12, 1-12.
Hall, P. and C.C. Heyde (1980), Martingale Limit Theory and its Application (Academic Press, New York).
O'Brien, G.L. and W. Vervaat (1991), Capacities, large deviations and loglog laws, in: S. Cambanis, G. Samorodnitsky and M.S. Taqqu, eds., Stable Processes (Birkhäuser, Boston) pp. 43-84.
Petrov, V.V. (1975), Sums of Independent Random Variables (Translated by A.A. Brown) (Springer, Berlin).
Pukhalskii, A. (1991), On functional principle of large deviations, in: V. Sazonov and T. Shervashidze, eds., New Trends in Probability and Statistics, Vol. 1 (VSP/Mokslas) pp. 198-218.
Rockafellar, R.T. (1972), Convex Analysis (Princeton University Press, Princeton, 2nd printing).
Rosenblatt, M. (1974), Random Processes (Springer, New York).
Schonmann, R.H. (1989), Exponential convergence under mixing, Probab. Theory Related Fields 81, 235-238.