Statistics & Probability Letters 2 (1984) 39-44, North-Holland
A SIMPLE PROOF FOR THE CHERNOFF-SAVAGE THEOREM
Michael G. AKRITAS

Department of Statistics, University of Rochester, NY 14627, USA

Received April 1983

Supported by NATO Research Grant No. 1465.
Abstract: A proof of the Chernoff-Savage theorem is given, based on differentiable statistical functionals. It employs no second-order differentiability condition.

Keywords and phrases: rank tests, Chernoff-Savage theorem, differentiable statistical functionals.
1. Introduction
The Chernoff-Savage theorem (Chernoff and Savage (1958)) is recognized as a landmark contribution to the theory of rank tests, and it has triggered a lot of interesting research. However, because of the highly technical nature of the original as well as subsequent proofs (Govindarajulu, Le Cam and Raghavachari (1967), Pyke and Shorack (1968)), it still remains inaccessible to most graduate students in statistics. I do not know of any statistics book that proves the theorem, although some advanced books (Hájek and Šidák (1967), Serfling (1980)) state it. The Chernoff-Savage theorem, besides being important in its own right, provides an alternative way of deriving Pitman efficiencies for rank tests and thus facilitates the teaching of this subject.

The purpose of this note is to present a simple proof of the Chernoff-Savage theorem using the (relatively) modern approach of differentiable statistical functionals. The proof consists of two parts: first, derive the two partial Gateaux derivatives and the associated influence functions; then show that the remainder term is $o_p(N^{-1/2})$. The second part is more technical, but it can be presented heuristically even to first-year graduate students (see Remark 1). Indeed, the present proof was devised while the author was teaching the second part of a two-semester sequence in mathematical statistics for beginning graduate students at the University of Rochester.

In the context of the theory of differentiable statistical functionals, this note may be viewed as an additional example of handling the remainder term. It is worth pointing out that no second-derivative condition is imposed on the score function (see also Govindarajulu, Le Cam and Raghavachari (1967)). The next section describes the heuristics while Section 3 contains the proofs.
2. Heuristics
Let $X_1,\ldots,X_m$ be independent identically distributed random variables with distribution function $F$ (iid $F$) and let $Y_1,\ldots,Y_n$ be iid $G$ and independent of the $X$'s. A general linear rank statistic for testing the hypothesis $H_0\colon F = G$ is of the form $m^{-1}\sum_{i=1}^{m} J[R_i/(N+1)]$, where $J$ is some score generating function, $R_i$ is the rank of $X_i$ in the combined sample and $N = m + n$. Noting that $R_i = mF_m(X_i) + nG_n(X_i)$, where $F_m$ ($G_n$) is the empirical distribution function corresponding to $X_1,\ldots,X_m$ ($Y_1,\ldots,Y_n$), the rank statistic may be rewritten as
$$
m^{-1}\sum_{i=1}^{m} J\big[p_N F_m(X_i) + q_N G_n(X_i)\big] = \int J(p_N F_m + q_N G_n)\,dF_m,
$$
where $p_N = m/(N+1)$, $q_N = n/(N+1)$. Thus, if we define the functional $S(F, G) = \int J(p_N F + q_N G)\,dF$, the rank statistic is $S(F_m, G_n)$. Note that the dependence of $S(F, G)$ on $p_N$ is not indicated explicitly; this will also be the case for the other functionals used below.
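This identity can be checked numerically. The following sketch (Python; hypothetical sample sizes, and the Wilcoxon score $J(u) = u$ purely for concreteness) evaluates the statistic both from the ranks and as the plug-in functional $S(F_m, G_n)$; the two numbers agree.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 7, 5                           # hypothetical sample sizes
N = m + n
pN, qN = m / (N + 1), n / (N + 1)
J = lambda u: u                       # Wilcoxon score, chosen only for concreteness

x = rng.normal(size=m)                # X_1,...,X_m iid F
y = rng.normal(loc=0.5, size=n)       # Y_1,...,Y_n iid G

# Rank form: m^{-1} sum_i J(R_i/(N+1)), with R_i the rank of X_i in the pooled sample.
pooled = np.concatenate([x, y])
ranks_x = np.array([np.sum(pooled <= xi) for xi in x])
rank_form = np.mean(J(ranks_x / (N + 1)))

# Functional form: S(F_m, G_n) = int J(pN*F_m + qN*G_n) dF_m.
Fm = lambda t: np.mean(x <= t)
Gn = lambda t: np.mean(y <= t)
functional_form = np.mean([J(pN * Fm(xi) + qN * Gn(xi)) for xi in x])

print(rank_form, functional_form)     # identical, since R_i = m*F_m(X_i) + n*G_n(X_i)
```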
The bivariate von Mises expansion for this functional is
$$
S(\hat F, \hat G) = S(F, G) + S_1(\hat F, F, G) + S_2(F, \hat G, G) + R(\hat F, F, \hat G, G), \tag{1}
$$
where
$$
S_1(\hat F, F, G) = \frac{d}{dt}\,S\big(F + t(\hat F - F),\, G\big)\Big|_{t=0}, \qquad
S_2(F, \hat G, G) = \frac{d}{dt}\,S\big(F,\, G + t(\hat G - G)\big)\Big|_{t=0}
$$
are the two partial Gateaux derivatives and $R(\hat F, F, \hat G, G)$ is the remainder term. The partial derivatives $S_1(\hat F, F, G)$ and $S_2(F, \hat G, G)$ may, formally, be evaluated for any distributions $\hat F$, $\hat G$, but we will be interested in $\hat F = F_m$, $\hat G = G_n$. It is usually the case that
$$
S_1(\hat F, F, G) = \int IC_1(x)\, d\big(\hat F(x) - F(x)\big), \qquad
S_2(F, \hat G, G) = \int IC_2(x)\, d\big(\hat G(x) - G(x)\big), \tag{2}
$$
so that, if one can show that $R(F_m, F, G_n, G) = o_p(N^{-1/2})$, the asymptotic distribution of $N^{1/2}[S(F_m, G_n) - S(F, G)]$ is derived by the usual central limit theorem for triangular arrays. In this section we will derive the influence functions $IC_1$, $IC_2$ appearing in (2), and will show heuristically that $R(F_m, F, G_n, G) = o_p(N^{-1/2})$. Conditions under which the calculations are valid are given in the next section. We have
$$
S_1(F_m, F, G) = \frac{d}{dt} \int J\big[p_N F + p_N t(F_m - F) + q_N G\big]\, d\big[F + t(F_m - F)\big]\Big|_{t=0}
= p_N \int J'(p_N F + q_N G)(F_m - F)\, dF + \int J(p_N F + q_N G)\, d(F_m - F).
$$
Integrating by parts the second term above gives
$$
S_1(F_m, F, G) = -q_N \int J'(p_N F + q_N G)(F_m - F)\, dG = q_N \int A(x)\, d(F_m - F) \tag{3}
$$
by integrating by parts again, where $A(x) = \int_{x_0}^{x} J'(p_N F + q_N G)\, dG$ for any fixed $x_0$. Of course, $IC_1(x) = q_N A(x)$. Similarly,
$$
S_2(F, G_n, G) = \frac{d}{dt} \int J\big[p_N F + q_N G + q_N t(G_n - G)\big]\, dF\Big|_{t=0}
= q_N \int J'(p_N F + q_N G)(G_n - G)\, dF = -q_N \int B(x)\, d(G_n - G) \tag{4}
$$
by integration by parts, where $B(x) = \int_{x_0}^{x} J'(p_N F + q_N G)\, dF$ for any fixed $x_0$, so that $IC_2(x) = -q_N B(x)$. Proceeding with the remainder term, we will show heuristically that
$$
S(F_m, G_n) - S(F, G) = S_1(F_m, F, G) + S_2(F, G_n, G) + o_p(N^{-1/2}). \tag{5}
$$
Indeed, we have
$$
\begin{aligned}
S(F_m, G_n) - S(F, G)
&= \int J(p_N F_m + q_N G_n)\, d(F_m - F) + \int \big[J(p_N F_m + q_N G_n) - J(p_N F + q_N G)\big]\, dF \\
&= \int J(p_N F + q_N G)\, d(F_m - F)
   + \int \big[p_N(F_m - F) + q_N(G_n - G)\big] J'(p_N F + q_N G)\, dF + o_p(N^{-1/2}) \\
&= -\int (F_m - F)\, J'(p_N F + q_N G)\, d\big[p_N F + q_N G\big]
   + \int \big[p_N(F_m - F) + q_N(G_n - G)\big] J'(p_N F + q_N G)\, dF + o_p(N^{-1/2}) \\
&= -q_N \int (F_m - F)\, J'(p_N F + q_N G)\, dG + q_N \int (G_n - G)\, J'(p_N F + q_N G)\, dF + o_p(N^{-1/2}) \\
&= S_1(F_m, F, G) + S_2(F, G_n, G) + o_p(N^{-1/2}), \tag{6}
\end{aligned}
$$
and this completes the heuristic derivation of the Chernoff-Savage theorem.

Remark 1. In deriving the second equality above we used
$$
\int \big[J(p_N F_m + q_N G_n) - J(p_N F + q_N G)\big]\, d(F_m - F) = o_p(N^{-1/2}).
$$
This can be heuristically justified by pointing out that $(F_m - F)$ can 'absorb' $N^{1/2}$ without blowing up, whereas the term in the brackets tends to zero. Similarly for the second term in the same equality.
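The heuristic can also be checked by simulation. The sketch below (Python; Wilcoxon score $J(u) = u$ and $F = G$ uniform on $(0,1)$, so that $S(F,G)$, $S_1$ and $S_2$ have simple closed forms, and hypothetical sample sizes) draws the scaled remainder $N^{1/2}[S(F_m, G_n) - S(F, G) - S_1 - S_2]$ repeatedly; its spread shrinks as $N$ grows, in line with (5).

```python
import numpy as np

rng = np.random.default_rng(1)

def scaled_remainder(m, n, reps=1000):
    """Draws of N^{1/2}*[S(F_m,G_n) - S(F,G) - S_1 - S_2] for J(u)=u and F = G = U(0,1)."""
    N = m + n
    pN, qN = m / (N + 1), n / (N + 1)
    S_FG = (pN + qN) / 2                       # S(F,G) = int (pN*F + qN*G) dF with F = G uniform
    out = np.empty(reps)
    for r in range(reps):
        x, y = rng.uniform(size=m), rng.uniform(size=n)
        Fm = lambda t: np.mean(x <= t)
        Gn = lambda t: np.mean(y <= t)
        S_hat = np.mean([pN * Fm(xi) + qN * Gn(xi) for xi in x])
        S1 = -qN * (np.mean(1 - x) - 0.5)      # -qN * int (F_m - F) dG, since int F_m dG = mean of 1 - G(X_i)
        S2 = qN * (np.mean(1 - y) - 0.5)       #  qN * int (G_n - G) dF
        out[r] = np.sqrt(N) * (S_hat - S_FG - S1 - S2)
    return out

for m, n in [(20, 20), (200, 200)]:
    draws = scaled_remainder(m, n)
    print(m + n, draws.std())                  # the spread shrinks as N grows, i.e. R = o_p(N^{-1/2})
```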
3. Proofs
In the notation introduced in Section 2, consider the assumptions
$$
p_N \text{ remains bounded away from 0 and 1 as } N \to \infty, \tag{A1}
$$
$$
|J(s)| \leq K[s(1-s)]^{-(1/2)+\varepsilon}, \qquad |J'(s)| \leq K[s(1-s)]^{-1-(1/2)+\varepsilon}, \qquad 0 < s < 1, \tag{A2}
$$
for some positive constants $K$ and $\varepsilon$.
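Assumption (A2) is satisfied by the usual scores; for the Wilcoxon score $J(u) = u$ this is immediate, and for the normal-scores function $J(u) = \Phi^{-1}(u)$ it can be checked numerically, as in the following sketch (Python with scipy; the value $\varepsilon = 0.25$ is a hypothetical choice).

```python
import numpy as np
from scipy.stats import norm

# Numerical check of (A2) for the normal-scores function J(u) = Phi^{-1}(u),
# with a hypothetical epsilon = 0.25: both ratios stay bounded near 0 and 1.
eps = 0.25
u = np.concatenate([10.0 ** -np.arange(2, 15), 1 - 10.0 ** -np.arange(2, 15)])
J = norm.ppf(u)
Jprime = 1.0 / norm.pdf(norm.ppf(u))          # J'(u) = 1/phi(Phi^{-1}(u))
print(np.max(np.abs(J) / (u * (1 - u)) ** (-0.5 + eps)))       # bounded
print(np.max(np.abs(Jprime) / (u * (1 - u)) ** (-1.5 + eps)))  # bounded
```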
Theorem. Under assumptions (A1), (A2) we have that
$$
\frac{N^{1/2}\big[S(F_m, G_n) - S(F, G)\big]}{\sigma_N} \Rightarrow N(0, 1),
$$
where $\sigma_N^2 = p_N^{-1}\,\mathrm{Var}_F[IC_1(X)] + q_N^{-1}\,\mathrm{Var}_G[IC_2(Y)]$ and $IC_1$, $IC_2$ are defined in connection with relations (3), (4), respectively.
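As an illustration, take the Wilcoxon score $J(u) = u$ under $H_0\colon F = G$. From (3) and (4), $J' \equiv 1$ gives $IC_1(x) = q_N[F(x) - F(x_0)]$ and $IC_2(x) = -q_N[F(x) - F(x_0)]$, and, since $F(X)$ is uniform on $(0,1)$ under $H_0$,
$$
\sigma_N^2 = p_N^{-1}\,\frac{q_N^2}{12} + q_N^{-1}\,\frac{q_N^2}{12} = \frac{q_N(p_N + q_N)}{12\,p_N} = \frac{nN}{12\,m(N+1)} \approx \frac{n}{12m},
$$
which coincides with $N$ times the exact variance, $n/[12m(N+1)]$, of the statistic $m^{-1}\sum_{i=1}^{m} R_i/(N+1)$ under $H_0$.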
Proof. Choose a number $\delta < \varepsilon$ and set $q(s) = [s(1-s)]^{(1/2)-\delta}$, $0 < s < 1$. First we will justify the integration by parts performed in the second equality of (3). We have
$$
\big[F_m(x) - F(x)\big]\, IC_1(x)\Big|_{-\infty}^{\infty}
= \big[F_m(x) - F(x)\big] \int_{x_0}^{x} J'(p_N F + q_N G)\, d\big(p_N F + q_N G - p_N F\big)\Big|_{-\infty}^{\infty}
$$
$$
= \big[F_m(x) - F(x)\big]\, J\big(p_N F(x) + q_N G(x)\big)\Big|_{-\infty}^{\infty}
- p_N \big[F_m(x) - F(x)\big] \int_{x_0}^{x} J'(p_N F + q_N G)\, dF\Big|_{-\infty}^{\infty}.
$$
Consider first the limits as $x$ tends to $\infty$. The first term is, in absolute value, less than or equal to $K(1 - F(x))(1 - F(x))^{-(1/2)+\varepsilon} \to 0$ as $x \to \infty$, whereas the second term is, in absolute value, less than or equal to
$$
K(1 - F(x)) \int_{x_0}^{x} (1 - F(y))^{-1-(1/2)+\varepsilon}\, dF(y)
\leq K(1 - F(x))(1 - F(x))^{-(1/2)+\varepsilon} \to 0 \quad \text{as } x \to \infty.
$$
The limits as $x \to -\infty$ are handled in the same way, and the integration by parts performed in relation (4) is justified similarly. Next, to show (5), note that the first equality in (6) is actually (here $X_{(1)} \leq \cdots \leq X_{(m)}$ denote the ordered $X$-observations)
$$
S(F_m, G_n) - S(F, G) = \int_{X_{(1)}}^{\infty} J(p_N F_m + q_N G_n)\, d(F_m - F)
+ \int \big[J(p_N F_m + q_N G_n)\, I_{[X_{(1)},\infty)} - J(p_N F + q_N G)\big]\, dF,
$$
and the main step to be justified is the second equality in (6). First we will show
$$
\int \big[J(p_N F_m + q_N G_n)\, I_{[X_{(1)},\infty)} - J(p_N F + q_N G)\big]\, d(F_m - F) = o_p(N^{-1/2}), \tag{7}
$$
and then we will show
$$
\int \big[J(p_N F_m + q_N G_n)\, I_{[X_{(1)},\infty)} - J(p_N F + q_N G)\big]\, dF
= \int \big[p_N(F_m - F) + q_N(G_n - G)\big] J'(p_N F + q_N G)\, dF + o_p(N^{-1/2}). \tag{8}
$$
But (7) may be rewritten as
$$
\int_{X_{(1)}}^{X_{(m)}} \big[J(p_N F_m + q_N G_n) - J(p_N F + q_N G)\big]\, d(F_m - F)
+ \int_{-\infty}^{X_{(1)}} J(p_N F + q_N G)\, dF
+ \int_{X_{(m)}}^{\infty} \big[J(p_N F_m + q_N G_n) - J(p_N F + q_N G)\big]\, d(F_m - F) = o_p(N^{-1/2}), \tag{9}
$$
and it is easy to see that the last two terms on the left-hand side above are $o_p(N^{-1/2})$. Thus it remains to show that the first term is $o_p(N^{-1/2})$; integrating by parts and noting that one of the terms is $o_p(N^{-1/2})$, we have that (9) will follow from
$$
\int_{X_{(1)}}^{X_{(m)}} m^{1/2}(F_m - F)\, d\big[J(p_N F_m + q_N G_n) - J(p_N F + q_N G)\big]
\leq \Big\| \frac{m^{1/2}(F_m - F)}{q(F)} \Big\| \int_{X_{(1)}}^{X_{(m)}} q(F)\, d\big[J(p_N F_m + q_N G_n) - J(p_N F + q_N G)\big] = o_p(1).
$$
Integrating by parts again, and noting that one of the terms is $o_p(1)$, this reduces to showing that
$$
\int_{X_{(1)}}^{X_{(m)}} J(p_N F_m + q_N G_n)\, dq(F) - \int_{X_{(1)}}^{X_{(m)}} J(p_N F + q_N G)\, dq(F) \xrightarrow{\;P\;} 0. \tag{10}
$$
In order to do this, note that by the well-known linear bounds for the empirical distribution function (Shorack (1972)) we have that, with probability as high as desired,
$$
(p_N/K)\, F \leq p_N F_m + q_N G_n \quad \text{on } [X_{(1)}, \infty)
$$
and
$$
(p_N/K)(1 - F) \leq p_N(1 - F_m) + q_N(1 - G_n) = 1 - p_N F_m - q_N G_n - (1 - p_N - q_N) \quad \text{on } (-\infty, X_{(m)}],
$$
where $K$ is large enough ($K$ will also stand for a generic constant from now on). Thus, by (A2), we have that, with probability as high as desired,
$$
\big|J(p_N F_m + q_N G_n)\big| \leq K\big[(p_N F_m + q_N G_n)(1 - p_N F_m - q_N G_n)\big]^{-(1/2)+\varepsilon} \leq K\big[F(1 - F)\big]^{-(1/2)+\varepsilon} \quad \text{on } [X_{(1)}, X_{(m)}],
$$
and $\int [F(1 - F)]^{-(1/2)+\varepsilon}\, dq(F) < \infty$, so that the dominated convergence theorem applies with probability as high as desired, and from this (10) (and thus (7)) follows easily.
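The linear bounds invoked above say that, outside an event of arbitrarily small probability, $F_m \geq F/K$ on $[X_{(1)}, \infty)$ (and symmetrically in the other tail) once $K$ is chosen large enough. A small simulation sketch (Python; hypothetical sample size and $K$ values, $F$ uniform) of the first bound:

```python
import numpy as np

rng = np.random.default_rng(2)
m, reps = 200, 2000                    # hypothetical sample size and replication count
K_grid = [2.0, 4.0, 8.0]
exceed = {K: 0 for K in K_grid}

for _ in range(reps):
    u = np.sort(rng.uniform(size=m))   # F = U(0,1), so F(x) = x
    i = np.arange(1, m + 1)
    # On [X_(1), infinity) the ratio F/F_m is largest just before a jump of F_m,
    # i.e. at x = u_(i+1) with F_m still equal to i/m.
    worst = np.max(u[1:] * m / i[:-1])
    for K in K_grid:
        exceed[K] += worst > K

for K in K_grid:
    print(K, exceed[K] / reps)         # the exceedance probability becomes small as K grows
```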
Next, in order to show (8), it suffices to show
$$
N^{1/2} \int_{X_{(1)}}^{\infty} \big[p_N(F_m - F) + q_N(G_n - G)\big]\big[J'(p_N F + q_N G) - J'(z)\big]\, dF = o_p(1), \tag{11}
$$
where $z$ lies between $p_N F_m + q_N G_n$ and $p_N F + q_N G$ (by the mean value theorem, $J(p_N F_m + q_N G_n) - J(p_N F + q_N G) = [p_N(F_m - F) + q_N(G_n - G)]\, J'(z)$ for such a $z$). Note that
$$
\big|N^{1/2}(F_m - F)\, J'(z)\big| \leq \frac{N^{1/2}|F_m - F|}{q(F)}\, q(F)\, K\big[z(1 - z)\big]^{-1.5+\varepsilon},
$$
so that, using the linear bounds for the empirical distribution function again,
$$
q(F)\big[z(1 - z)\big]^{-1.5+\varepsilon} = \big[F(1 - F)\big]^{(1/2)-\delta}\big[z(1 - z)\big]^{-1.5+\varepsilon}
\leq K\big[F(1 - F)\big]^{(1/2)-\delta}\big[F(1 - F)\big]^{-1.5+\varepsilon} = K\big[F(1 - F)\big]^{-1+\varepsilon-\delta}
$$
with probability as high as desired. Similarly,
$$
\big|N^{1/2}(G_n - G)\, J'(z)\big| \leq \frac{N^{1/2}|G_n - G|}{q(G)}\, q(G)\, K\big[z(1 - z)\big]^{-1.5+\varepsilon}.
$$
Consider first a bound for the above as $x \to \infty$: if $(1 - G)/(1 - F) = O(1)$ as $x \to \infty$, then $(1 - G) \leq K(1 - F)$ and we arrive at the same bound as before (i.e. $K(1 - F)^{-1+\varepsilon-\delta}$) as $x \to \infty$; if $(1 - G)/(1 - F) \to \infty$ as $x \to \infty$, then (take for simplicity $z = p_N F + q_N G$)
$$
\frac{(1 - G)^{(1/2)-\delta}}{\big[1 - p_N F - q_N G\big]^{1.5-\varepsilon}}
= \frac{(1 - G)^{(1/2)-\delta}}{\big[p_N(1 - F) + q_N(1 - G)\big]^{1.5-\varepsilon}}
= (1 - F)^{-1+\varepsilon-\delta}\,
\frac{\big[(1 - G)/(1 - F)\big]^{(1/2)-\delta}}{\big[p_N + q_N(1 - G)/(1 - F)\big]^{1.5-\varepsilon}},
$$
and, since the last ratio tends to zero as $x \to \infty$, the same bound holds even in this case. Similarly for $x \to -\infty$. We have shown now that, with probability as high as desired, the integrand in (11) is bounded by $4CK[F(1 - F)]^{-1+\varepsilon-\delta}$ (which is integrable with respect to $F$), where $C$ is a large enough constant to bound $\|N^{1/2}(F_m - F)/q(F)\|$ and $\|N^{1/2}(G_n - G)/q(G)\|$ with the desired probability. Thus the dominated convergence theorem applies with probability as high as desired, and this shows (11) and hence (8).
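The constant $C$ exists because the weighted empirical processes $N^{1/2}(F_m - F)/q(F)$ and $N^{1/2}(G_n - G)/q(G)$ are stochastically bounded in supremum norm (cf. Pyke and Shorack (1968)). A small simulation sketch (Python; $F$ uniform, hypothetical $\delta = 0.1$ and sample sizes) of the one-sample version:

```python
import numpy as np

rng = np.random.default_rng(3)

def weighted_sup(m, delta=0.1):
    """One draw of sup_x m^{1/2}|F_m(x) - F(x)| / q(F(x)), q(s) = [s(1-s)]^{1/2-delta}, F = U(0,1)."""
    u = np.sort(rng.uniform(size=m))
    i = np.arange(1, m + 1)
    q = (u * (1 - u)) ** (0.5 - delta)
    # Between jumps the ratio is maximized at a jump point of F_m, approached from the
    # left (where F_m = (i-1)/m) or attained from the right (where F_m = i/m).
    cand = np.concatenate([np.abs(i / m - u) / q, np.abs((i - 1) / m - u) / q])
    return np.sqrt(m) * cand.max()

for m in [100, 1000, 10000]:
    draws = [weighted_sup(m) for _ in range(500)]
    print(m, np.quantile(draws, 0.95))   # the 0.95 quantile stays bounded as m grows
```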
Remark 2. The Chernoff-Savage theorem is usually stated in terms of some score function $a_N(i)$, requiring that
$$
N^{-1/2} \sum_{i=1}^{m} \big[a_N(R_i) - \Phi(R_i/N)\big] \to 0
$$
in probability, with $\Phi(u) = \lim a_N(1 + [uN])$, $0 < u < 1$, and $N^{-1/2} a_N(N) \to 0$. Under these conditions, however, the statistic $\sum_{i=1}^{m} a_N(R_i)$ is asymptotically equivalent to the statistic we considered with $J(\cdot) = \Phi(\cdot)$ (cf. Hájek and Šidák (1967, p. 235)).

Remark 3. We have actually shown that the remainder term is $o_p(N^{-1/2})$ uniformly in $F$, $G$. Next, it is easy to show that, for $\delta'$ satisfying $(2 + \delta')(-0.5 + \varepsilon) > -1$, the $2 + \delta'$ moments of $IC_1$ and $IC_2$ are bounded uniformly in $F$, $G$ (see Chernoff and Savage (1958, p. 977)). Thus, from a generalization of the Berry-Esseen theorem (Esseen (1945, p. 43)), we get that the asymptotic normality in the Theorem holds uniformly in $F$, $G$, provided $IC_1$, $IC_2$ have variances bounded away from zero.
References

Chernoff, H. and I.R. Savage (1958), Asymptotic normality and efficiency of certain nonparametric test statistics, Ann. Math. Statist. 29, 972-994.
Esseen, C.G. (1945), Fourier analysis of distribution functions, Acta Math. 77, 1-125.
Govindarajulu, Z., L. Le Cam and M. Raghavachari (1967), Generalizations of theorems of Chernoff and Savage on asymptotic normality of nonparametric test statistics, Proc. 5th Berkeley Symp. on Math. Statist. and Probab. 1, 609-638.
Hájek, J. and Z. Šidák (1967), Theory of Rank Tests, Academic Press, New York.
Pyke, R. and G. Shorack (1968), Weak convergence of a two-sample empirical process and a new approach to Chernoff-Savage theorems, Ann. Math. Statist. 39, 755-771.
Serfling, R.J. (1980), Approximation Theorems of Mathematical Statistics, Wiley, New York.
Shorack, G.R. (1972), Functions of order statistics, Ann. Math. Statist. 43, 412-427.