Statistics & Probability Letters 1 (1983) 197-202
North-Holland, June 1983

Convergence Rates of Large Deviations Probabilities for Point Estimators

Herman Rubin * and Andrew L. Rukhin **

Division of Mathematical Sciences, Department of Statistics, Purdue University, Lafayette, IN 47907, U.S.A.

Received August 1982; revised version received December 1982

Abstract. The convergence rates of large deviations probabilities are determined for a class of estimators of a real parameter. We also give a simple upper bound for probabilities of large deviations when the latter are measured in terms of the Chernoff function.

Keywords. Probabilities of large deviations, minimum contrast estimators, M-estimators, Chernoff function.

* Research supported by U.S. Army Research Office, contract No. DA80K0043.
** Research supported by National Science Foundation, Grant MCS-8101670.
1. Introduction and summary

Let $(P_\theta,\ \theta \in \Theta)$ be a family of probability distributions over a space $X$ and assume that the parameter space $\Theta$ is an open subset of the real line. If $x = (x_1, \ldots, x_n)$ is a random sample from this family and $\delta = \delta(x)$ is a consistent estimator of $\theta$, then the rate at which $P_\theta(|\delta - \theta| > \varepsilon)$ tends to zero for a fixed $\varepsilon > 0$ is of importance in large sample estimation theory. In the case of a more general parameter space with a metric $D$ these probabilities of large deviations have the form $P_\theta(D(\delta, \theta) > \varepsilon)$. Useful lower bounds for the limits of these probabilities have been obtained by Bahadur (see Bahadur (1967) and Bahadur, Gupta and Zabell (1980)). Further results concerning the asymptotic behavior of large deviations probabilities for various estimators have been given by Fu (1973, 1975).

In this paper we determine the exact rate of convergence of large deviations probabilities for a class of approximate M-estimators, the consistency of which has been established by Huber (1967). In Section 2 a simple upper bound is obtained for probabilities of large deviations when these are measured by means of the Chernoff function and the parameter takes values in an abstract space.
2. An upper bound for large deviations probabilities

Let $x_1, x_2, \ldots$ be a sequence of i.i.d. random variables with common distribution $P = P_{\theta_0}$, where $\theta_0$ is a fixed parameter value. In this section we consider a class of statistics $\delta = \delta_n(x)$ such that there exists a sequence $\varepsilon_n$, $\varepsilon_n \to 0$, with

$$n^{-1} \sum_{j=1}^{n} W(x_j, \delta) - \inf_{\theta} n^{-1} \sum_{j=1}^{n} W(x_j, \theta) \le \varepsilon_n.$$

Here $W$ is a measurable real function on $X \times \Theta$, and the parameter space $\Theta$ is assumed to be a locally compact Hausdorff topological space. This class of approximate minimum contrast estimators (which of course includes maximum likelihood estimators) was considered by Huber (1967).
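For readers who want to experiment, here is a minimal numerical sketch (ours, not the paper's) of an approximate minimum contrast estimator. It assumes a normal location model with the quadratic contrast $W(x, \theta) = (x - \theta)^2/2$; the sample size, grid, and seed are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch (not from the paper): an approximate minimum contrast
# estimator in an assumed normal location model with
# W(x, theta) = (x - theta)^2 / 2. Minimizing over a finite grid makes the
# defining inequality hold with a tolerance eps_n set by the grid resolution.

rng = np.random.default_rng(0)
theta0 = 1.0
x = rng.normal(theta0, 1.0, size=500)        # i.i.d. sample from P_theta0

def contrast(thetas, x):
    """Empirical contrast n^{-1} sum_j W(x_j, theta) at each grid point."""
    return np.mean(0.5 * (x[:, None] - thetas) ** 2, axis=0)

grid = np.linspace(-5.0, 5.0, 2001)          # candidate parameter values
delta = grid[np.argmin(contrast(grid, x))]   # approximate minimizer

# Here the exact minimizer is the sample mean, so the excess of the contrast
# at delta over its infimum is controlled by the grid spacing (our eps_n).
print(delta, x.mean())
```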
Using the idea of Wald's original proof (1949), Huber proved the consistency of statistics from this class under mild regularity conditions (see also Pfanzagl (1969) and Perlman (1972)). Under somewhat different assumptions we show that the large deviations probabilities for these procedures tend to zero exponentially fast, and we give a simple upper bound for these probabilities.

Denote for any measurable set $C$

$$q(x, C) = \inf_{t \in C} W(x, t)$$

and

$$\rho(C) = \inf_{s>0} E \exp\bigl(s[W(x, \theta_0) - q(x, C)]\bigr).$$

Our assumptions have the following form.

Assumption 1. For all $\theta \ne \theta_0$ we have $E[W(x, \theta_0) - W(x, \theta)] < 0$.

Assumption 2. For any $a$, $0 < a < 1$, there exist a compact set $A$ and sets $D_k$, $k = 1, \ldots, K$, such that $\bigcup_k D_k \supset A^c$ and

$$\rho(D_k) \le a, \qquad k = 1, \ldots, K. \tag{2.1}$$

Assumption 3. For any $\theta$ and positive $\varepsilon$ there exists a neighborhood $B$ of $\theta$ such that

$$\rho(B) \le \rho(\theta) + \varepsilon, \tag{2.2}$$

where

$$\rho(\theta) = \rho(\{\theta\}) = \inf_{s>0} E \exp\bigl(s[W(x, \theta_0) - W(x, \theta)]\bigr)$$

is the Chernoff function.
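For intuition about the Chernoff function, it can be evaluated numerically. The sketch below (ours, not the paper's) again assumes the normal location model with $W(x, \theta) = (x - \theta)^2/2$ and $\theta_0 = 0$, in which case $\rho(\theta) = \exp(-(\theta - \theta_0)^2/8)$ in closed form; the Monte Carlo size and optimizer bounds are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch (not from the paper): numerical evaluation of the Chernoff function
#   rho(theta) = inf_{s>0} E exp(s [W(X, theta0) - W(X, theta)])
# in an assumed normal location model with W(x, t) = (x - t)^2 / 2, where
# rho(theta) = exp(-(theta - theta0)^2 / 8) is available in closed form.

rng = np.random.default_rng(1)
theta0 = 0.0
z = rng.normal(theta0, 1.0, size=200_000)    # Monte Carlo sample from P_theta0

def rho(theta):
    diff = 0.5 * (z - theta0) ** 2 - 0.5 * (z - theta) ** 2
    mgf = lambda s: np.mean(np.exp(s * diff))   # empirical E exp(s * diff)
    return minimize_scalar(mgf, bounds=(1e-6, 10.0), method="bounded").fun

for theta in (0.5, 1.0, 2.0):
    print(theta, rho(theta), np.exp(-((theta - theta0) ** 2) / 8))
```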
Notice that Assumption 3 is quite similar to Assumption 2 of Bahadur (1965), where it was used to prove the asymptotic optimality of the likelihood ratio test.

Theorem 1. Under Assumptions 1-3, for any $a$, $0 < a < 1$,

$$\limsup_{n \to \infty} P^{1/n}\bigl(\rho(\delta) \le a\bigr) \le a.$$

Proof. Since $\rho(\theta_0) = 1 > a$, the point $\theta_0$ does not belong to the set $E = \{\theta:\ \rho(\theta) \le a\}$. Also, since

$$n^{-1} \sum_{j=1}^{n} W(x_j, \delta) - n^{-1} \sum_{j=1}^{n} W(x_j, \theta_0) \le n^{-1} \sum_{j=1}^{n} W(x_j, \delta) - \inf_{\theta} n^{-1} \sum_{j=1}^{n} W(x_j, \theta) \le \varepsilon_n,$$

one concludes that the event $\delta \in E$ implies

$$\inf_{\theta \in E} n^{-1} \sum_{j=1}^{n} W(x_j, \theta) \le n^{-1} \sum_{j=1}^{n} W(x_j, \theta_0) + \varepsilon_n. \tag{2.3}$$

Thus for large $n$

$$P\bigl(\rho(\delta) \le a\bigr) \le P\Bigl\{ \inf_{\theta \in E} n^{-1} \sum_{j=1}^{n} W(x_j, \theta) \le n^{-1} \sum_{j=1}^{n} W(x_j, \theta_0) + \varepsilon_0 \Bigr\}.$$

Let $A$ and $D_k$, $k = 1, \ldots, K$, be the sets such that (2.1) holds, and for a positive $\varepsilon$ let $B_\theta$ be a neighborhood of $\theta$ for which (2.2) is satisfied. Then the compact set $C = A \cap E$ is covered by the open sets $B_\theta$, $\theta \in C$. Therefore there exists a finite subcovering, say $B_1, \ldots, B_m$, of $C$. Denote for $i = 1, \ldots, m$, $p = m+1, \ldots, m+K$,

$$q_i(x) = q(x, B_i), \qquad q_p(x) = q(x, D_{p-m}).$$

One has (with $M = m + K$)

$$\inf_{\theta \in E} n^{-1} \sum_{j=1}^{n} W(x_j, \theta) \ge \min_{1 \le i \le M} n^{-1} \sum_{j=1}^{n} q_i(x_j).$$

It follows from (2.3) that, since for any fixed positive $\varepsilon_0$ and all sufficiently large $n$ one has $\varepsilon_n < \varepsilon_0$,

$$P(\delta \in E) \le \sum_{i=1}^{M} P\Bigl( n^{-1} \sum_{j=1}^{n} [W(x_j, \theta_0) - q_i(x_j)] + \varepsilon_0 > 0 \Bigr) \le M \max_{1 \le i \le M} P\Bigl( n^{-1} \sum_{j=1}^{n} [W(x_j, \theta_0) - q_i(x_j)] + \varepsilon_0 > 0 \Bigr).$$

Because of Chernoff's Theorem (see Chernoff (1952))

$$P^{1/n}\Bigl( n^{-1} \sum_{j=1}^{n} [W(x_j, \theta_0) - q_i(x_j)] + \varepsilon_0 > 0 \Bigr) \to \inf_{s>0} e^{s\varepsilon_0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr),$$

so that

$$\limsup_{n \to \infty} P^{1/n}(\delta \in E) \le \max_{1 \le i \le M} \inf_{s>0} e^{s\varepsilon_0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr). \tag{2.4}$$

Notice that for a fixed $i$, $i = 1, \ldots, M$, and any positive $\varepsilon_1$ there exists $s_1$ such that

$$E \exp\bigl(s_1[W(X, \theta_0) - q_i(X)]\bigr) < \inf_{s>0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr) + \varepsilon_1.$$

Therefore for sufficiently small $\varepsilon_0$

$$\inf_{s>0} e^{s\varepsilon_0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr) \le e^{s_1 \varepsilon_0} E \exp\bigl(s_1[W(X, \theta_0) - q_i(X)]\bigr) \le \inf_{s>0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr) + 2\varepsilon_1.$$

Thus in (2.4) we can let $\varepsilon_0$ tend to zero and obtain

$$\limsup_{n \to \infty} P^{1/n}(\delta \in E) \le \max_{1 \le i \le M} \inf_{s>0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr).$$

Assumption 3 entails for $i = 1, \ldots, m$

$$\inf_{s>0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr) = \rho(B_i) \le \rho(\theta_i) + \varepsilon \le a + \varepsilon$$

(here $B_i$ is a neighborhood of a point $\theta_i \in E$, so that $\rho(\theta_i) \le a$), and because of Assumption 2 for $i = m+1, \ldots, M$

$$\inf_{s>0} E \exp\bigl(s[W(X, \theta_0) - q_i(X)]\bigr) \le a.$$

Since $\varepsilon$ was an arbitrary positive number, Theorem 1 is proven.

Remark 1. For smooth functions $W$ the probability $P^{1/n}(\rho(\delta) \le a)$ for small $a$ typically behaves as $a^c$ with $c \ge 2$. However, examples with a discrete parameter $\theta$ or nondifferentiable functions $W$ show that the bound of Theorem 1 is the best possible without any regularity assumptions (see also Section 3).

Remark 2. Assumptions 2 and 3 can be modified and weakened. For instance, the function $\rho(C)$ can be replaced by

$$\inf_{s>0} \Bigl[ E \exp\Bigl( s \sum_{j=1}^{m} [W(x_j, \theta_0) - q(x_j, C)] \Bigr) \Bigr]^{1/m},$$

where $m$ is a fixed positive integer and $(x_1, \ldots, x_m)$ are i.i.d. random variables with distribution $P$ (see Perlman (1972)). These modifications (with $m \ge 2$) allow one to establish Theorem 1 for location-scale parameter families under mild moment conditions.

3. Convergence rates for approximate M-estimators

In this section we determine the exact convergence rate for some approximate M-estimators of a real parameter $\theta$. These estimators $\delta$ satisfy the following condition: there exists a (nonrandom) sequence $\varepsilon_n$, $\varepsilon_n \to 0$, such that
$$\Bigl| n^{-1} \sum_{j=1}^{n} w(x_j, \delta) \Bigr| \le \varepsilon_n.$$

Here $w(x, \theta)$ is a real function over $X \times \mathbb{R}$. The consistency of approximate M-estimators has also been established by Huber (1967).
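Since $w(x, \cdot)$ will be assumed monotone below, such an estimator can be computed by bisection. The following sketch (ours, not the paper's) uses the Huber score $w(x, \theta) = \max(-k, \min(k, x - \theta))$, which is decreasing in $\theta$; the data model, tuning constant, and tolerance are illustrative assumptions.

```python
import numpy as np

# Sketch (not from the paper): an approximate M-estimator of location with
# the monotone Huber score w(x, theta) = clip(x - theta, -k, k).
# Bisection drives |n^{-1} sum_j w(x_j, delta)| below the tolerance eps_n.

rng = np.random.default_rng(2)
theta0 = 1.0
x = rng.standard_cauchy(300) + theta0        # heavy-tailed sample

def score(theta, x, k=1.345):
    return np.mean(np.clip(x - theta, -k, k))   # n^{-1} sum_j w(x_j, theta)

def approx_m_estimator(x, eps_n=1e-8):
    lo, hi = x.min(), x.max()                # score(lo) >= 0 >= score(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        val = score(mid, x)
        if abs(val) <= eps_n:                # the defining condition above
            break
        lo, hi = (mid, hi) if val > 0 else (lo, mid)
    return mid

print(approx_m_estimator(x))                 # robust estimate of theta0
```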
Theorem 2. Assume that for each fixed $x$, $w(x, \cdot)$ is a decreasing function, and let $\delta = \delta_n(x)$ be the corresponding approximate M-estimator. If for a positive $\varepsilon$

$$P_\theta\bigl(w(X, \theta + \varepsilon) > 0\bigr) > 0, \tag{3.1}$$

then

$$P_\theta^{1/n}(\delta > \theta + \varepsilon) \to \inf_{s>0} E_\theta \exp\bigl(s\, w(X, \theta + \varepsilon)\bigr) = e_1(\theta, \varepsilon), \tag{3.2}$$

$$P_\theta^{1/n}(\delta < \theta - \varepsilon) \to \inf_{s>0} E_\theta \exp\bigl(-s\, w(X, \theta - \varepsilon)\bigr) = e_2(\theta, \varepsilon), \tag{3.3}$$

and

$$P_\theta^{1/n}\bigl(|\delta - \theta| > \varepsilon\bigr) \to \max\bigl(e_1(\theta, \varepsilon), e_2(\theta, \varepsilon)\bigr). \tag{3.4}$$

Proof. One has, for arbitrarily small positive $\varepsilon_0$ and all sufficiently large $n$ (so that $\varepsilon_n \le \varepsilon_0$),

$$P_\theta(\delta \ge \theta + \varepsilon) \le P_\theta\Bigl( n^{-1} \sum_{j=1}^{n} w(x_j, \eta) \ge -\varepsilon_0 \text{ for } \eta \le \theta + \varepsilon \Bigr) = P_\theta\Bigl( n^{-1} \sum_{j=1}^{n} w(x_j, \theta + \varepsilon) \ge -\varepsilon_0 \Bigr).$$

Also, by the monotonicity of $w$,

$$P_\theta\Bigl( n^{-1} \sum_{j=1}^{n} w(x_j, \theta + \varepsilon) > \varepsilon_0 \Bigr) \le P_\theta(\delta > \theta + \varepsilon).$$

Applying Chernoff's Theorem we obtain

$$\inf_{s>0} e^{-s\varepsilon_0} E_\theta \exp\bigl(s\, w(X, \theta + \varepsilon)\bigr) \le \liminf_{n} P_\theta^{1/n}(\delta > \theta + \varepsilon) \le \limsup_{n} P_\theta^{1/n}(\delta > \theta + \varepsilon) \le \inf_{s>0} e^{s\varepsilon_0} E_\theta \exp\bigl(s\, w(X, \theta + \varepsilon)\bigr). \tag{3.5}$$

Because of (3.1) the functions of $\varepsilon_0$ in (3.5) are continuous. Since $\varepsilon_0$ was arbitrary one deduces

$$\lim_{n} P_\theta^{1/n}(\delta > \theta + \varepsilon) = \inf_{s>0} E_\theta \exp\bigl(s\, w(X, \theta + \varepsilon)\bigr).$$

Analogously formula (3.3) is established, which, together with (3.2), implies (3.4).
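As a quick numerical illustration of (3.2) (ours, not the paper's): for $w(x, \theta) = x - \theta$ the M-estimator is the sample mean, and for $X_i \sim N(\theta, 1)$ one gets $e_1(\theta, \varepsilon) = \inf_{s>0} E_\theta \exp(s(X - \theta - \varepsilon)) = \exp(-\varepsilon^2/2)$. The tail probability is available exactly here, so the $n$-th root can be compared with this limit directly.

```python
import numpy as np
from scipy.stats import norm

# Check (not from the paper) of the limit (3.2) for the sample mean:
# with w(x, theta) = x - theta and X_i ~ N(theta, 1),
#   e_1(theta, eps) = inf_{s>0} E exp(s (X - theta - eps)) = exp(-eps^2 / 2),
# and P(mean > theta + eps) = P(Z > eps * sqrt(n)) exactly.

eps = 0.5
e1 = np.exp(-eps ** 2 / 2)                   # limiting rate, about 0.8825

for n in (10, 100, 1000, 10_000):
    log_tail = norm.logsf(eps * np.sqrt(n))  # log P(mean > theta + eps)
    print(n, np.exp(log_tail / n), e1)       # n-th root approaches e_1
```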
Remark 1. It follows from Theorem 2 that the probabilities of large deviations $P_\theta(|\delta - \theta| > \varepsilon)$ tend to zero exponentially fast if and only if both infima in (3.4) are smaller than one, i.e.,

$$E_\theta w(X, \theta + \varepsilon) < 0 \quad \text{and} \quad E_\theta w(X, \theta - \varepsilon) > 0.$$

These conditions are met, of course, when

$$E_\theta w(X, \theta) = 0 \tag{3.6}$$

and $w(x, \theta)$ is strictly monotone. In the case when $w(x, \theta) = \log' p(x, \theta)$, where $p(x, \theta)$ is the density of $P_\theta$ (so that the approximate maximum likelihood estimator obtains), condition (3.6) is satisfied under mild regularity assumptions. Monotonicity of $w(x, \theta)$ means the log-concavity of $p(x, \theta)$.

Remark 2. Notice that Theorem 2 does not hold if the function $w$ is not monotone. This can be seen from the example where $w$ is the logarithmic derivative of the Cauchy density with location parameter $\theta$. In this case the probabilities of large deviations for the maximum likelihood estimator do not have the rate given by the right-hand side of (3.4). However, if the latter is a decreasing function of $\varepsilon$ and the assumptions of Theorem 1 are met, then (3.4) provides an upper bound for the probabilities of large deviations. In the case of the Cauchy distribution this monotonicity does not hold, but Theorem 1 shows that the probability of large deviations decreases exponentially; its precise rate is unknown, and its determination seems to be a difficult problem.

Remark 3. Note that one can derive from Theorem 2 a class of estimators $\delta$ which are asymptotically efficient in the sense of Bahadur:

$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} n^{-1} \varepsilon^{-2} \log P_\theta\bigl(|\delta - \theta| > \varepsilon\bigr) = -\tfrac{1}{2} I(\theta), \tag{3.7}$$

where $I(\theta)$ is the Fisher information. Indeed, (3.7) holds if $\delta$ is an approximate M-estimator with $w(x, \theta) = \log' p(x, \theta)$, which is easy to check by taking the limit, as $\varepsilon$ tends to zero, of the logarithm of the right-hand side of (3.4). An analogue of the result of Fu (1973) concerning the asymptotic efficiency of the generalized maximum likelihood estimator $\delta$,

$$\prod_{j=1}^{n} p(x_j, \delta)\, \lambda(\delta) = \max_{\theta} \prod_{j=1}^{n} p(x_j, \theta)\, \lambda(\theta),$$

can be obtained from this remark, since $\delta$ is an approximate M-estimator with $w(x, \theta) = \log' p(x, \theta)$ (the last function is assumed to be monotone and the function $\log' \lambda$ is assumed to be bounded).
Remark 4. Assume that $\theta$ is a real location parameter, and let $\delta$ be the Pitman estimator in the problem of confidence estimation of $\theta$ by means of an interval of width $2\varepsilon$, i.e.,

$$\prod_{j=1}^{n} p(x_j - \delta - \varepsilon) = \prod_{j=1}^{n} p(x_j - \delta + \varepsilon).$$

If $w(x, \theta) = w(x - \theta)$ and $\delta_1$ is the corresponding equivariant M-estimator, then the comparison of $\delta$ and $\delta_1$ shows that

$$\max\Bigl[\inf_{s>0} \int p^{1-s}(x)\, p^{s}(x + 2\varepsilon)\, dx\Bigr] \le \max\Bigl[\inf_{s>0} \int \exp\bigl(s\, w(x + \varepsilon)\bigr)\, p(x)\, dx\Bigr],$$

the maxima being taken over the two directions of deviation, as in (3.4). It is assumed here that $\log p$ is a concave function (i.e., the family $\{p(\cdot - \theta)\}$ has a monotone likelihood ratio) and that $w$ is a monotone function.

Remark 5. Theorem 2 holds if the approximate M-estimators $\delta$ are defined as follows: for any measurable set $E$ and a positive $\varepsilon_0$,

$$\Bigl\{ n^{-1} \sum_{j=1}^{n} w(x_j, t) \le -\varepsilon_0 \text{ for all } t \in E \Bigr\} \subset \{\delta \notin E\} \subset \Bigl\{ n^{-1} \sum_{j=1}^{n} w(x_j, t) > -\varepsilon_0 \text{ for some } t \notin E \Bigr\}$$

if $n$ is sufficiently large.

As an application of Theorem 2 let us consider the situation when

$$w(x, \theta) = x - \varphi(\theta),$$

where $\varphi$ is an increasing function. Clearly the corresponding M-estimator $\delta$ satisfies the condition

$$\varphi(\delta) = \bar{x} = n^{-1} \sum_{j=1}^{n} x_j,$$

and

$$P_\theta(\delta > \theta + \varepsilon) = P_\theta\bigl(\bar{x} > \varphi(\theta + \varepsilon)\bigr) = P_\theta(\bar{x} \ge a)$$

with $a = \varphi(\theta + \varepsilon)$. Thus Theorem 2 contains Chernoff's Theorem,

$$P^{1/n}(\bar{x} \ge a) \to \inf_{s>0} E \exp\bigl(s(X - a)\bigr),$$

as a particular case.

In another example let $w(x, \theta) = -1$ for $x < \theta$ and $w(x, \theta) = 1$ for $x > \theta$. Then the corresponding M-estimator can be interpreted as the median, $\delta = x_{(1/2)}$, and

$$P_\theta^{1/n}\bigl(x_{(1/2)} > \theta + \varepsilon\bigr) \to \inf_{s>0} \bigl[ e^{s} P_\theta(X > \theta + \varepsilon) + e^{-s} P_\theta(X < \theta + \varepsilon) \bigr] = 2\bigl[ P_\theta(X > \theta + \varepsilon)\, P_\theta(X < \theta + \varepsilon) \bigr]^{1/2}.$$

For instance, let $P$ be the double exponential distribution with $\theta_0 = 0$; then according to this formula

$$P^{1/n}\bigl(x_{(1/2)} > \theta\bigr) \to e^{-|\theta|/2}\bigl(2 - e^{-|\theta|}\bigr)^{1/2} = \sigma(\theta).$$

Since, for

$$W(x, \theta) = |x - \theta| + \log 2,$$

$$\rho(\theta) = \inf_{s>0} E \exp\bigl(s[W(x, 0) - W(x, \theta)]\bigr) = e^{-|\theta|/2}\bigl(1 + \tfrac{1}{2}|\theta|\bigr),$$

Theorem 1 implies that

$$\lim_{n \to \infty} P^{1/n}\bigl(|x_{(1/2)}| > t\bigr) \le \rho(t).$$

This inequality means that

$$\bigl(2 - e^{-|\theta|}\bigr)^{1/2} \le 1 + \tfrac{1}{2}|\theta|.$$

In this example $\log \sigma(\theta)/\log \rho(\theta)$ decreases from 4, as $|\theta| \to 0$, to 1, as $|\theta| \to \infty$. This shows that the bound of Theorem 1 cannot be improved.
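The closing numerical claim is easy to reproduce. The sketch below (ours, not the paper's) evaluates $\sigma(\theta) = e^{-|\theta|/2}(2 - e^{-|\theta|})^{1/2}$ and $\rho(\theta) = e^{-|\theta|/2}(1 + \tfrac{1}{2}|\theta|)$ as displayed above and prints the ratio $\log \sigma(\theta)/\log \rho(\theta)$, which decreases from 4 toward 1.

```python
import numpy as np

# Check (not from the paper's text, but of its displayed formulas): for the
# double exponential distribution the exact rate of the median is
#   sigma(t) = exp(-|t|/2) * sqrt(2 - exp(-|t|)),
# while the Chernoff function is
#   rho(t) = exp(-|t|/2) * (1 + |t|/2).
# The ratio log sigma / log rho decreases from 4 (t -> 0) to 1 (t -> inf).

def sigma(t):
    return np.exp(-np.abs(t) / 2) * np.sqrt(2 - np.exp(-np.abs(t)))

def rho(t):
    return np.exp(-np.abs(t) / 2) * (1 + np.abs(t) / 2)

for t in (0.01, 0.1, 1.0, 5.0, 20.0, 100.0):
    print(t, np.log(sigma(t)) / np.log(rho(t)))
```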
References

Bahadur, R.R. (1965), An optimal property of the likelihood ratio statistic, Proc. 5th Berkeley Symp. Math. Statist. Probab. 1, 13-26.
Bahadur, R.R. (1967), Rates of convergence of estimates and test statistics, Ann. Math. Statist. 38, 303-324.
Bahadur, R.R., J.C. Gupta and S.L. Zabell (1980), Large deviations, tests, and estimates, in: I.M. Chakravarti, ed., Asymptotic Theory of Statistical Tests and Estimation (Academic Press, New York) pp. 33-64.
Chernoff, H. (1952), A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statist. 23, 493-507.
Fu, J.C. (1973), On a theorem of Bahadur on the rate of convergence of point estimators, Ann. Statist. 1, 745-749.
Fu, J.C. (1975), The rate of convergence of consistent point estimators, Ann. Statist. 3, 234-240.
Huber, P.J. (1967), The behavior of maximum likelihood estimates under nonstandard conditions, Proc. 5th Berkeley Symp. Math. Statist. Probab. 1, 221-233.
Perlman, M.D. (1972), On the strong consistency of approximate maximum likelihood estimators, Proc. 6th Berkeley Symp. Math. Statist. Probab. 1, 263-281.
Pfanzagl, J. (1969), On the measurability and consistency of minimum contrast estimators, Metrika 14, 249-272.
Wald, A. (1949), Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20, 595-601.