Dispersive comparison of distributions: a multisample testing problem

Leszek Marzec*, Pawel Marzec

Mathematical Institute, University of Wrocław, 50-384 Wrocław, Poland

Received May 1993; revised November 1993

Statistics & Probability Letters 21 (1994) 237-245

Abstract

A k-sample testing problem for the dispersive comparison of distributions in the fully nonparametric sense of Bickel and Lehmann (1979) and Lewis and Thompson (1981) is considered. A functional which is monotone with respect to the ordering in dispersion is proposed. Asymptotic laws of the resulting test statistics are established. The tests are shown to be asymptotically distribution free and consistent. A result concerning the optimality in this class of tests, under a sequence of local alternatives, is obtained and asymptotic relative efficiencies with respect to other tests for some specific alternatives are given.

Key words: Nonparametric; Dispersive ordering; k-sample scale problem; Test for dispersion; Order statistic; Invariance

1. Introduction

The concept of the dispersive comparison of distributions in the fully nonparametric sense of Bickel and Lehmann (1979) and Lewis and Thompson (1981) is based on the following definition.

Definition 1. The distribution function (d.f.) $G$ is said to be at least as dispersed as the d.f. $F$, written $d(F) \le d(G)$, if and only if

$F^{-1}(\beta) - F^{-1}(\alpha) \le G^{-1}(\beta) - G^{-1}(\alpha)$ whenever $0 < \alpha < \beta < 1$.

Here $F^{-1}$ and $G^{-1}$ denote the left continuous inverses of the corresponding d.f.'s. In the sequel $d(F) = d(G)$ means $d(F) \le d(G)$ and $d(G) \le d(F)$, whereas $d(F) < d(G)$ means $d(F) \le d(G)$ and $d(F) \ne d(G)$. Obviously, $d(F) = d(G)$ is equivalent to $F(x) = G(x + \mu)$ for some $\mu$ and all real $x$. Properties of the ordering in dispersion have been studied by many authors including Bickel and Lehmann (1979), Lewis and Thompson (1981), Oja (1981), Shaked (1982, 1985), Lynch et al. (1983), Deshpande and Kochar (1983), Droste and Wefelmeyer (1985) and Bagai and Kochar (1986). The equivalent version of this ordering has also been discussed by Doksum (1969).

* Corresponding author. © 1994 Elsevier Science B.V. All rights reserved. SSDI 0167-7152(94)00012-W

Let $k \ge 2$ be a given integer and let $F_1, F_2, \dots, F_k$ be d.f.'s. In this paper we are concerned with the problem of testing the null hypothesis

$H_0$: $d(F_1) = \dots = d(F_k)$ against the alternative $H_A$: $d(F_1) \le \dots \le d(F_k)$,   (1)

where at least one of the inequalities is strict. Obviously the two-sample dispersion testing problem, i.e. when $k = 2$, arises as a special case. Note that if $F_i(x) = F((x - \mu_i)/\sigma_i)$, $i = 1, \dots, k$, then $H_0$ states that $\sigma_1 = \dots = \sigma_k$ whereas $H_A$ means $\sigma_1 \le \dots \le \sigma_k$ with at least one strict inequality, and therefore (1) generalizes a classical multisample dispersion testing problem to a fully nonparametric case. Also note that for continuous and increasing d.f.'s which vanish at zero, $H_0$ states that $F_1 = \dots = F_k$ whereas $H_A$ is a subhypothesis of the upward trend alternative $H_1$: $F_1 \ge \dots \ge F_k$ with at least one strict inequality. For the relations with other well-known testing problems see Bagai and Kochar (1986) and Marzec and Marzec (1993).

It should be noted that recently, for the case $k = 2$, the following functionals which measure deviations from $H_0$ towards $H_A$ have been considered:

$\Delta_1 =$

$\int_{-\infty}^{\infty} [f_1^2(x) - f_2^2(x)]\,dx$, where $f_i$ denotes the density of $F_i$ (Ahmad and Kochar, 1989),

$\Delta_2 = \int_0^{\infty} \{F_2(x)[1 - F_2(x)] - F_1(x)[1 - F_1(x)]\}\,dx$, where $F_1$ and $F_2$ are life d.f.'s (Aly, 1990),

$\Delta_3 = \int_0^1\!\int_0^1 \{\alpha B(x) + (1 - \alpha)B(y) - B[\alpha x + (1 - \alpha)y]\}\,dx\,dy$, where $B(t) = \int_0^t [F_2^{-1}(x) - F_1^{-1}(x)]\,dx$ and $\alpha \in (0, 1/2]$ (Marzec and Marzec, 1991b),   (2)

$\Delta_4 = \int_0^1 \{1 - [f_2 F_2^{-1}(x)/f_1 F_1^{-1}(x)]^{1/2}\}\,dp(x)$ (Marzec and Marzec, 1992),

$\Delta_5 = m \iint \{[F_1(F_1^{-1}(x) + u) - x]^{m-1} - [F_2(F_2^{-1}(x) + u) - x]^{m-1}\}\,dx\,du$, where the integer $m \ge 2$ (Marzec and Marzec, 1993).

In the present paper a newly defined functional is used to propose tests for the k-sample, $k \ge 2$, dispersion testing problem (1). In Section 2 a motivation of this choice is given. Section 3 presents a class of tests for the problem (1). The proposed tests are shown to be asymptotically distribution free and consistent. Section 4 is devoted to the Pitman asymptotic relative efficiency comparisons with the Ahmad and Kochar (1989) tests, the Aly (1990) test and the Marzec and Marzec (1991b, 1993) tests for some particular d.f.'s belonging to $H_A$. Moreover, in Section 5, the optimal member in the proposed class of tests is identified by obtaining the weight coefficients which maximize the asymptotic power under a sequence of local alternatives.


The following notation will be used in the sequel. Given a random sample $X = (X_1, \dots, X_n)$, we denote by $X_{1:n} \le \dots \le X_{n:n}$ the order statistics of $X$. If $X$ and $Y$ have d.f.'s $F$ and $G$ then $X \le_{st} Y$ means $F \ge G$. Moreover, $\Phi$ denotes the standard normal $N(0, 1)$ d.f.

2. Preliminaries

In this section we define a functional which is increasing with respect to the ordering in dispersion. After that we provide a useful lemma which will be used in the sequel.

2.1. A functional for dispersive ordering

Let $a(x)$ be a continuous function defined on the interval $[0, 1]$, such that $a(0) = 0$, $a(x) < 0$ for $x \in (0, 1/2)$ and $a(x) + a(1 - x) = 0$ for $x \in [0, 1]$. Given the d.f. $F$, let us define the functional

$\delta_a(F) = \int_0^1 a(x) F^{-1}(x)\,dx$.   (3)

Obviously, from the fact that $\int_0^1 a(x)\,dx = 0$ we obtain $\delta_a(F) = \delta_a(G)$ whenever $d(F) = d(G)$.

Lemma 1. If $d(F) < d(G)$ then $\delta_a(F) < \delta_a(G)$.

Proof. By the symmetry of the function $a(x)$ we obtain that

$\delta_a(G) - \delta_a(F) = -\int_0^{1/2} a(x)\,[G^{-1}(1 - x) - F^{-1}(1 - x) - G^{-1}(x) + F^{-1}(x)]\,dx$.

Since $-a(x) > 0$ and $G^{-1}(1 - x) - G^{-1}(x) - [F^{-1}(1 - x) - F^{-1}(x)] \ge 0$ for $0 < x < 1/2$, and for $x$ in a neighbourhood of zero the above inequality is strict, the required result follows. □
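The monotonicity stated in Lemma 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the weight $a(x) = \sin[\pi(2x - 1)]$ and the exponential scale family are choices made here, and the helper names `exp_quantile` and `delta` are hypothetical. It approximates the functional (3) by a midpoint rule and compares two scales, and also a shifted copy of the first distribution.

```python
import math

def a(x):
    # Skew-symmetric weight: a(0) = 0, a(x) < 0 on (0, 1/2), a(x) + a(1 - x) = 0.
    return math.sin(math.pi * (2.0 * x - 1.0))

def exp_quantile(scale):
    # Quantile function F^{-1}(x) = -scale * log(1 - x) of an exponential d.f.
    return lambda x: -scale * math.log(1.0 - x)

def delta(a, quantile, shift=0.0, grid=20000):
    # Midpoint-rule approximation of the integral of a(x) * F^{-1}(x) over (0, 1).
    h = 1.0 / grid
    return h * sum(a((i + 0.5) * h) * (quantile((i + 0.5) * h) + shift)
                   for i in range(grid))

d1 = delta(a, exp_quantile(1.0))                     # scale 1
d2 = delta(a, exp_quantile(2.0))                     # scale 2, more dispersed
d1_shifted = delta(a, exp_quantile(1.0), shift=5.0)  # same dispersion, shifted
print(d1, d2, d1_shifted)
```

Under these assumptions the more dispersed distribution gives the larger value, and the shifted copy gives the same value up to rounding, in line with $\delta_a(F) = \delta_a(G)$ whenever $d(F) = d(G)$.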

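2.2. An auxiliary lemma

Let

$\sigma_a^2(F) = 2 \iint_{x<y} a[F(x)]\,a[F(y)]\,F(x)[1 - F(y)]\,dx\,dy$.   (4)

For $a(x) = 1$, $x \in [0, 1]$, we use the notation $\sigma_1^2(F)$.

Lemma 2. If the function $a(x)$ is differentiable and $a(x) + a(1 - x) = 0$, $x \in [0, 1]$, then

$|\sigma_a^2(F) - \sigma_a^2(G)| \le [\sigma_1^2(F) + \sigma_1^2(G)] \sup_{t \in R} |F(t) - G(t)| \left\{ \tfrac{1}{2} \sup_{t \in [0,1]} \frac{|a(t)|}{t} \left[ \sup_{t \in [0,1]} \frac{|a(t)|}{t} + \sup_{t \in [0,1]} |a(t)| \right] + \sup_{t \in [0,1]} |a(t)| \sup_{t \in [0,1]} |a'(t)| \right\}$.

For a proof see the appendix.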

3. Tests for dispersive comparison of k-distributions

Given the vector $X = (X^{(1)}, \dots, X^{(k)})$ of $k$, $k \ge 2$, independent samples $X^{(i)} = (X_1^{(i)}, \dots, X_{n_i}^{(i)})$, where $X^{(i)}$ comes from the d.f. $F_i$, $i = 1, \dots, k$, consider the problem of testing $H_0$ against $H_A$ in (1). Let $a(x)$, $x \in [0, 1]$, be the function specified in Section 2 and let $B = (B_2, \dots, B_k)$ be a vector with positive components. We define

$\Delta_{B,a}(F_1, \dots, F_k) = \sum_{i=2}^{k} B_i [\delta_a(F_i) - \delta_a(F_{i-1})]$,   (5)

where $\delta_a$ is given by (3). Obviously, under $H_0$, $\Delta_{B,a} = 0$ whereas by Lemma 1, under $H_A$, $\Delta_{B,a} > 0$. Thus $\Delta_{B,a}$ can be taken as a measure of deviation from $H_0$ towards $H_A$. It should be noted that, in the case $k = 2$, $\Delta_{B,a}$ equals $\Delta_3$ given in (2) provided $B = 1$ and $a(x) = x(x/[2\alpha(1 - \alpha)] - 1)$ for $0 \le x \le \alpha$, $a(x) = (2x - 1)\alpha/[2(1 - \alpha)]$ for $\alpha < x \le 1/2$.

Consider the estimators for $\Delta_{B,a}$ of the following form:

$T_{B,a}(X) = \sum_{i=2}^{k} B_i \left[ \frac{1}{n_i} \sum_{j=1}^{n_i} a(j/n_i) X^{(i)}_{j:n_i} - \frac{1}{n_{i-1}} \sum_{j=1}^{n_{i-1}} a(j/n_{i-1}) X^{(i-1)}_{j:n_{i-1}} \right]$.   (6)

Obviously, $T_{B,a}$ is location invariant. The following lemma shows that $T_{B,a}$ has some isotonic property (see e.g. Barlow and Doksum, 1972; Marzec and Marzec, 1991a). Let $Y = (Y^{(1)}, \dots, Y^{(k)})$ be the vector of independent samples $Y^{(i)} = (Y_1^{(i)}, \dots, Y_{n_i}^{(i)})$, $i = 1, \dots, k$, where $Y^{(i)}$ comes from the d.f. $G_i$. Then for continuous d.f.'s we have the following lemma.

Lemma 3. Let $d(G_1) \le d(F_1)$, $d(F_j) = d(G_j)$ for $j = 2, \dots, k - 1$, $d(F_k) \le d(G_k)$. Then $T_{B,a}(X) \le_{st} T_{B,a}(Y)$.

For a proof see the appendix.

The expression (6) shows that $T_{B,a}(X)$ is not a distribution free statistic. But it appears that the properly normalized $T_{B,a}(X)$ is asymptotically distribution free. Let $N = n_1 + \dots + n_k$. Given the vector $B = (B_2, \dots, B_k)$ and the function $a(x)$, $x \in [0, 1]$, specified in Section 2, let us assume that

(1) $n_i/N \to \rho_i$, $\rho_i \in (0, 1)$, $i = 1, \dots, k$, as $N \to \infty$,
(2) $\sum_{i=1}^{k} E_{F_i} X^2 < \infty$,
(3) $\sup_{t \in [0,1]} |a'(t)| < \infty$,
(4) $\sum_{i=1}^{k} (B_{i+1} - B_i)^2 \sigma_a^2(F_i) > 0$,

where $B_1 = B_{k+1} = 0$ and $\sigma_a^2$ is given by (4). Then under (1)-(4) we have the following theorem.

Theorem 1. $\sqrt{N}\,[T_{B,a}(X) - \Delta_{B,a}(F_1, \dots, F_k)]/V_{B,a}(X) \xrightarrow{d} \Phi$, as $N \to \infty$.   (7)

Here

$V_{B,a}^2(X) = \sum_{i=1}^{k} \frac{N}{n_i} (B_{i+1} - B_i)^2 v_a(X^{(i)})$,

where

$v_a(z_1, \dots, z_\nu) = \frac{1}{\nu^2} \sum_{j=1}^{\nu-1} \sum_{m=1}^{j} \alpha_{jm}\, a(j/\nu)\, a(m/\nu)\, m(\nu - j)(z_{j+1:\nu} - z_{j:\nu})(z_{m+1:\nu} - z_{m:\nu})$,   (8)

with $\alpha_{jm} = 2$ if $j \ne m$ and $\alpha_{jj} = 1$.
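The statistic (6), the variance estimate (8) and the resulting one-sided decision can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the function names (`l_stat`, `t_stat`, `v_a`, `dispersion_test`), the weight $a(x) = \sin[\pi(2x - 1)]$ and the toy data are assumptions made for illustration, and the double sum in `v_a` follows the reading of (8) as $\sigma_a^2$ evaluated at the empirical d.f.

```python
import math
from statistics import NormalDist

def a(x):
    # Example weight (illustrative choice): a(x) = sin(pi * (2x - 1)).
    return math.sin(math.pi * (2.0 * x - 1.0))

def l_stat(sample, a):
    # (1/n) * sum_j a(j/n) * X_{j:n}, the plug-in estimate of delta_a(F).
    z = sorted(sample)
    n = len(z)
    return sum(a(j / n) * z[j - 1] for j in range(1, n + 1)) / n

def t_stat(samples, B, a):
    # T_{B,a}(X): B = (B_2, ..., B_k) weights the consecutive differences.
    est = [l_stat(s, a) for s in samples]
    return sum(b * (est[i] - est[i - 1]) for i, b in enumerate(B, start=1))

def v_a(sample, a):
    # v_a: sigma_a^2 evaluated at the empirical d.f. of the sample.
    z = sorted(sample)
    n = len(z)
    gap = [z[j] - z[j - 1] for j in range(1, n)]  # gap[j-1] = z_{j+1:n} - z_{j:n}
    total = 0.0
    for j in range(1, n):
        for m in range(1, j + 1):
            alpha = 1.0 if m == j else 2.0
            total += (alpha * a(j / n) * a(m / n) * m * (n - j)
                      * gap[j - 1] * gap[m - 1])
    return total / n ** 2

def dispersion_test(samples, B, a, level=0.05):
    # Reject H0 when sqrt(N) * T / V exceeds the normal (1 - level)-quantile.
    N = sum(len(s) for s in samples)
    ext = [0.0] + list(B) + [0.0]  # B_1 = B_{k+1} = 0
    V2 = sum(N / len(s) * (ext[i + 1] - ext[i]) ** 2 * v_a(s, a)
             for i, s in enumerate(samples))
    z = math.sqrt(N) * t_stat(samples, B, a) / math.sqrt(V2)
    return z, z > NormalDist().inv_cdf(1.0 - level)

x1 = [0.1 * j for j in range(1, 41)]  # tightly spaced toy sample
x2 = [0.3 * j for j in range(1, 41)]  # same shape, three times the spread
z, reject = dispersion_test([x1, x2], B=[1.0], a=a)
print(z, reject)
```

With the constant weight $a \equiv 1$ the function `v_a` reduces to the variance of the empirical distribution, which gives a quick sanity check of the double sum.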


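Proof. First observe that in view of Shorack's (1972) result we conclude that

$\sqrt{N} \left( \frac{1}{n_i} \sum_{j=1}^{n_i} a(j/n_i) X^{(i)}_{j:n_i} - \delta_a(F_i) \right)_{i=1,\dots,k} \xrightarrow{d} N(0, \Sigma)$,   (9)

as $N \to \infty$, where the matrix $\Sigma = (\sigma_{ij})_{k \times k}$ is such that $\sigma_{ij} = 0$ for $i \ne j$ and $\sigma_{ii} = \sigma_a^2(F_i)/\rho_i$, $i = 1, \dots, k$. Since for $b = (B_1 - B_2, B_2 - B_3, \dots, B_k - B_{k+1})$ we have

$b \Sigma b' = \sum_{i=1}^{k} (B_{i+1} - B_i)^2 \sigma_a^2(F_i)/\rho_i$,

(9) implies that $\sqrt{N}\,[T_{B,a}(X) - \Delta_{B,a}(F_1, \dots, F_k)]$ is asymptotically normal with mean zero and variance $b \Sigma b'$. Let $\hat F_i$ denote the empirical d.f. based on the sample $X^{(i)}$, $i = 1, \dots, k$. Since $\sigma_1^2(\hat F_i)$ is the variance of the empirical distribution of $X^{(i)}$, it follows from Lemma 2, in view of the law of large numbers and the Glivenko-Cantelli theorem, that $\sigma_a^2(\hat F_i) \xrightarrow{P} \sigma_a^2(F_i)$, as $N \to \infty$, $i = 1, \dots, k$. Moreover, a computation shows that $\sigma_a^2(\hat F_i) = v_a(X^{(i)})$, $i = 1, \dots, k$, where $v_a$ is given in (8). Consequently, Slutsky's theorem completes the proof of (7). □

Corollary. Given the level $\alpha \in (0, 1)$, the test $\psi_{B,a}(X)$ which rejects the null hypothesis $H_0$ in favour of the alternative $H_A$ if $\sqrt{N}\,T_{B,a}(X)/V_{B,a}(X) > \Phi^{-1}(1 - \alpha)$ is consistent.

Proof. Obviously,

$P\{\sqrt{N}\,T_{B,a}(X)/V_{B,a}(X) > \Phi^{-1}(1 - \alpha)\} = P\{\sqrt{N}\,[T_{B,a}(X) - \Delta_{B,a}(F_1, \dots, F_k)]/V_{B,a}(X) > \Phi^{-1}(1 - \alpha) - \sqrt{N}\,\Delta_{B,a}(F_1, \dots, F_k)/V_{B,a}(X)\}$.

Since under $H_A$ we have $\Delta_{B,a}(F_1, \dots, F_k) > 0$, Theorem 1 completes the proof. □

4. Examples and efficiency comparisons

Consider the following skew-symmetric weight functions $a(x)$, $x \in [0, 1]$: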

(1) $a(x) = \sin[\pi(2x - 1)^{2m-1}]$, $m \in N$,
(2) $a(x) = (2x - 1)^{2m-1} |\sin(2\pi x)|$, $m \in N$,
(3) $a(x) = \mathrm{tg}[\beta\pi(x - 1/2)] \sin(\pi x)$, $\beta \in (0, 1)$.

The corresponding tests $\psi_{B,a}(X)$, of the form given in the corollary, are denoted by $A^{(1)}_{m,N}$, $A^{(2)}_{m,N}$, $A^{(3)}_{\beta,N}$, respectively. To see how these tests perform we consider the case $k = 2$ in (1) and study their Pitman asymptotic relative efficiencies (AREs) with respect to tests proposed for the two-sample dispersive testing problem, i.e. to the Ahmad and Kochar (1989) tests (AK), the Aly (1990) test (A), and the $L_{0.5,N}$ and $U_{3,N}$ tests of Marzec and Marzec (1991b, 1993). We consider a sequence of sample sizes $m_\nu$ and $n_\nu$ satisfying $m_\nu/N_\nu \to \rho$, $\rho \in (0, 1)$, as $\nu \to \infty$, where $N_\nu = m_\nu + n_\nu$, and a sequence of alternatives $(F_{\theta_\nu}, G)$, $\nu \ge 1$, from $H_A$, where $\theta_\nu = \theta_0 + C/N_\nu^{1/2}$, $C$ is a constant and $d(F_{\theta_0}) = d(G)$.
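The three weight functions can be checked numerically against the requirements of Section 2 ($a(0) = 0$, $a(x) < 0$ on $(0, 1/2)$, $a(x) + a(1 - x) = 0$). The following is a rough sketch on a finite grid, so it illustrates the conditions rather than proving them; the function names `a1`, `a2`, `a3` and `satisfies_conditions`, the default parameter values and the tolerance are assumptions made here.

```python
import math

def a1(x, m=1):
    # Weight (1): sin[pi * (2x - 1)^(2m - 1)].
    return math.sin(math.pi * (2.0 * x - 1.0) ** (2 * m - 1))

def a2(x, m=1):
    # Weight (2): (2x - 1)^(2m - 1) * |sin(2 * pi * x)|.
    return (2.0 * x - 1.0) ** (2 * m - 1) * abs(math.sin(2.0 * math.pi * x))

def a3(x, beta=0.5):
    # Weight (3): tan[beta * pi * (x - 1/2)] * sin(pi * x), beta in (0, 1).
    return math.tan(beta * math.pi * (x - 0.5)) * math.sin(math.pi * x)

def satisfies_conditions(a, grid=1000, tol=1e-12):
    # Grid check of a(0) = 0, a < 0 on (0, 1/2) and a(x) + a(1 - x) = 0.
    xs = [i / grid for i in range(grid + 1)]
    starts_at_zero = abs(a(0.0)) < tol
    negative_left = all(a(x) < 0.0 for x in xs if 0.0 < x < 0.5)
    skew_symmetric = all(abs(a(x) + a(1.0 - x)) < tol for x in xs)
    return starts_at_zero and negative_left and skew_symmetric

for name, fn in [("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, satisfies_conditions(fn))
```

Other parameter values can be tried by wrapping the functions, for example `satisfies_conditions(lambda x: a3(x, beta=0.9))`.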

Table 1
AREs of the $A^{(\cdot)}_{\cdot,N}$ test relative to the AK, A, $L_{0.5,N}$, $U_{3,N}$ and three other $A^{(\cdot)}_{\cdot,N}$ tests

d.f.    AK      A       L_{0.5,N}  U_{3,N}  A^{(.)}_{.,N}  A^{(.)}_{.,N}  A^{(.)}_{.,N}
F_1     0.9575  0.9575  0.7609     0.9588   1.0998         1.0235         0.9447
F_2     3.7175  0.9290  2.4240     0.9276   1.0107         1.0125         1.0656
F_3     7.0629  0.8548  5.2039     0.8353   0.8824         0.9058         1.2603
F_4     1.1184  -       1.3265     1.0245   0.9950         0.9893         1.0967

Since all the above tests are location invariant, the shift parameters of the considered d.f.'s are taken to be zero. We discuss AREs with respect to the following families of d.f.'s (see e.g. Marzec and Marzec, 1991b):

(a) the exponential proportional hazards family: $F_1(x, \theta) = 1 - \exp[-(1 + \theta)x]$, $\theta \ge 0$, $x \ge 0$,
(b) the Makeham family: $F_2(x, \theta) = 1 - \exp[-x - \theta(x + e^{-x} - 1)]$, $\theta \ge 0$, $x \ge 0$,
(c) the linear failure rate family: $F_3(x, \theta) = 1 - \exp[-x - \theta x^2/2]$, $\theta \ge 0$, $x \ge 0$,
(d) the normal scale family: $F_4(x, \theta) = \Phi(x/\theta)$, $0 < \theta \le 1$, $-\infty < x < +\infty$.

For $F_1$, $F_2$, $F_3$ and $F_4$ let $\theta_0 = 0$ and $\theta_0 = 1$, respectively. In Table 1 AREs of the $A^{(\cdot)}_{\cdot,N}$ test relative to the AK, A, $L_{0.5,N}$, $U_{3,N}$ and three other $A^{(\cdot)}_{\cdot,N}$ tests are displayed. Since the Aly (1990) test (A) is only defined for life d.f.'s, the corresponding ARE value for $F_4$ is not given in Table 1. AREs of the $A^{(\cdot)}_{\cdot,N}$ test with respect to the Ahmad and Kochar (1989) tests (AK) are presented for $\rho = 1/2$. To obtain the corresponding AREs for other $\rho$, $\rho \in (0, 1)$, each value of the first column of the table ought to be multiplied by $2\max(\rho, 1 - \rho)$.

It is clear from the table that the proposed tests generally perform well for the alternatives considered. Moreover, it is seen from Table 1 that the good competitors as compared to the AK, A, $L_{0.5,N}$, $U_{3,N}$ tests are: for $F_1$ the $A^{(\cdot)}_{\cdot,N}$ test, for $F_2$ the $A^{(\cdot)}_{\cdot,N}$ test, for $F_3$ the $A^{(\cdot)}_{\cdot,N}$ test and for $F_4$ the $A^{(\cdot)}_{\cdot,N}$ test. For the four alternatives considered, the tests of types (1), (2) and (3) reported in Table 1 generally perform better than the $A^{(1)}_{m,N}$, $A^{(2)}_{m,N}$, $A^{(3)}_{\beta,N}$ tests, respectively, with other $m \in N$, $\beta \in (0, 1)$.
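The rescaling rule for the AK column is simple arithmetic and can be written out as follows. This is a small illustration only: the helper name `rescale` and the dictionary are hypothetical, and the four values are the first column of Table 1 as read from the extracted text (rows $F_1$ to $F_4$, at $\rho = 1/2$).

```python
# First column of Table 1: ARE relative to the AK tests at rho = 1/2 (rows F1..F4).
are_ak_half = {"F1": 0.9575, "F2": 3.7175, "F3": 7.0629, "F4": 1.1184}

def rescale(are_half, rho):
    # Multiply the rho = 1/2 value by 2 * max(rho, 1 - rho), as stated in the text.
    return are_half * 2.0 * max(rho, 1.0 - rho)

rho = 0.3
for name, value in are_ak_half.items():
    print(name, round(rescale(value, rho), 4))
```

For $\rho = 1/2$ the factor equals one, so the tabulated values are returned unchanged; for any other $\rho$ the factor exceeds one.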

5. Close alternatives and locally optimal test

In this section some asymptotic properties for the tests $\psi_{B,a}(X)$ specified in the corollary of Section 3, under a sequence of alternatives which converge to the null hypothesis $H_0$, are studied. Given $k \ge 2$, $N = n_1 + \dots + n_k$, let $(F_{1,N}, \dots, F_{k,N})$, $N \ge 2$, be the sequence of local alternatives from $H_A$ of the form

$F_{i,N}^{-1}(x) - F_{i-1,N}^{-1}(x) = \frac{1}{\sqrt{N}}\,\xi(x) + c_{i,N}$, $x \in (0, 1)$,   (10)

$i = 2, \dots, k$, where $\xi(x)$ is an arbitrary non-decreasing function, $c_{i,N}$ is a real constant, and assume that

$\sup_{t \in R} |F_{i,N}(t) - F_i(t)| \to 0$, as $N \to \infty$,   (11)

$i = 1, \dots, k$, where $d(F_1) = \dots = d(F_k)$. From now on all considered d.f.'s are assumed to have a finite third absolute moment. Moreover, let $\rho_1 = \dots = \rho_k$, where $\rho_i$ is given by condition (1) in Section 3, and let the function $a(x)$, $x \in [0, 1]$, satisfy the regularity conditions of Sections 2 and 3. We have the following theorem.

Theorem 2. Given the function $a(x)$, $x \in [0, 1]$, the asymptotic power of the test $\psi_{B,a}(X)$ is maximized under the sequence of alternatives given by (10) and (11) for $B = B^0$, where

$B_i^0 = (i - 1)(k - i + 1)$, $i = 2, \dots, k$.   (12)
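The optimal weights (12) can be spot-checked numerically. The sketch below rests on an assumption stated plainly: with equal $\rho_i$ and equal dispersions, the dependence of the asymptotic power on $B$ is taken to be through $\gamma(B) = \sum_{i=2}^{k} B_i / [k \sum_{i=1}^{k} (B_{i+1} - B_i)^2]^{1/2}$ with $B_1 = B_{k+1} = 0$, which is what the limiting variance of Theorem 1 suggests. The function names `gamma`, `b_opt` and `satisfies_stationarity` are hypothetical.

```python
import math
import random

def gamma(B):
    # Assumed power criterion; B = (B_2, ..., B_k), padded with B_1 = B_{k+1} = 0.
    k = len(B) + 1
    ext = [0.0] + list(B) + [0.0]
    denom = k * sum((ext[i + 1] - ext[i]) ** 2 for i in range(k))
    return sum(B) / math.sqrt(denom)

def b_opt(k):
    # Weights B_i^0 = (i - 1)(k - i + 1), i = 2, ..., k.
    return [float((i - 1) * (k - i + 1)) for i in range(2, k + 1)]

def satisfies_stationarity(B, tol=1e-9):
    # First-order condition of gamma: for j = 2, ..., k,
    # sum_i (B_{i+1} - B_i)^2 = (2 B_j - B_{j-1} - B_{j+1}) * sum_i B_i.
    k = len(B) + 1
    ext = [0.0] + list(B) + [0.0]
    lhs = sum((ext[i + 1] - ext[i]) ** 2 for i in range(k))
    total = sum(B)
    return all(abs(lhs - (2 * ext[j] - ext[j - 1] - ext[j + 1]) * total) < tol
               for j in range(1, k))

random.seed(1)
k = 5
best = gamma(b_opt(k))
trials = [gamma([random.uniform(0.1, 10.0) for _ in range(k - 1)])
          for _ in range(1000)]
print(b_opt(k), best, max(trials))
```

Because $\gamma$ is invariant under rescaling of $B$, any positive multiple of $B^0$ attains the same value; the random search is only a sanity check, not a proof of optimality.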


Proof. By assumption $\sqrt{N}\,T_{B,a}(X)/V_{B,a}(X)$ is based on the vector $X = (X_N^{(1)}, \dots, X_N^{(k)})$ of independent samples, where $X_N^{(i)}$ comes from the d.f. $F_{i,N}$, $i = 1, \dots, k$, specified by (10) and (11). Let $\hat F_{i,N}$ denote the empirical d.f. based on $X_N^{(i)}$, $i = 1, \dots, k$. Since

$\sup_{t \in R} |\hat F_{i,N}(t) - F_{i,N}(t)| \le_{st} \sup_{t \in [0,1]} |\hat G_i(t) - G(t)|$,

where $G$ denotes the uniform $(0, 1)$ d.f. and $\hat G_i$ means the empirical d.f. based on the sample of size $n_i$ from $G$, the left-hand side of the inequality converges in probability to zero, as $N \to \infty$. Thus (11) and Lemma 2 imply that

$\sigma_a^2(\hat F_{i,N}) \xrightarrow{P} \sigma_a^2(F_i)$, as $N \to \infty$,

$i = 1, \dots, k$, where $\sigma_a^2$ is given by (4). Consequently, in view of (10) and (11), by applying an extended version of the Berry-Esseen theorem (see Helmers et al., 1990) and Slutsky's theorem we obtain that

$P\{\sqrt{N}\,T_{B,a}(X)/V_{B,a}(X) > \Phi^{-1}(1 - \alpha)\} \to \Phi\left( -\Phi^{-1}(1 - \alpha) + \gamma(B)\, \frac{\int_0^1 a(x)\xi(x)\,dx}{\sigma_a(F_1)} \right)$, as $N \to \infty$,

where

$\gamma(B) = \sum_{i=2}^{k} B_i \Big/ \left[ k \sum_{i=1}^{k} (B_{i+1} - B_i)^2 \right]^{1/2}$.   (13)

Now the problem is to find the vector $B^0$ which maximizes $\gamma(B)$. The necessary condition for $B^0$ to be an extremum of the function $\gamma$ leads to the following system of equations:

$\sum_{i=1}^{k} (B_{i+1}^0 - B_i^0)^2 = (2B_j^0 - B_{j-1}^0 - B_{j+1}^0) \sum_{i=2}^{k} B_i^0$, $j = 2, \dots, k$,   (14)

where $B_1^0 = B_{k+1}^0 = 0$. It can be shown that $B^0$ given by (12) is the unique solution of (14) and is the point of maximum of the function $\gamma$ given by (13). This completes the proof. □

Acknowledgement

The authors thank a referee for his valuable comments and suggestions.

Appendix

Proof of Lemma 2. Since

$|[1 - F(y)]F(x) - [1 - G(y)]G(x)| \le \sup_{t \in R} |F(t) - G(t)|\,[1 - G(y) + F(x)]$

we obtain in view of (4) that

$\tfrac{1}{2} |\sigma_a^2(F) - \sigma_a^2(G)| \le \iint_{x<y} \big| a[F(x)]a[F(y)] \{[1 - F(y)]F(x) - [1 - G(y)]G(x)\} + \{a[F(x)]a[F(y)] - a[G(x)]a[G(y)]\}[1 - G(y)]G(x) \big|\,dx\,dy \le I_1 + I_2 + I_3$,

where

$I_1 = \sup_{t \in R} |F(t) - G(t)| \iint_{x<y} |a[F(x)]|\,|a[F(y)]|\,[1 - G(y) + F(x)]\,dx\,dy$,

$I_2 = \iint_{x<y} |a[F(x)]|\,|a[F(y)] - a[G(y)]|\,[1 - G(y)]G(x)\,dx\,dy$,

and

$I_3 = \iint_{x<y} |a[G(y)]|\,|a[F(x)] - a[G(x)]|\,[1 - G(y)]G(x)\,dx\,dy$.

By the symmetry of $a(x)$ we obtain that

$|I_1| \le \tfrac{1}{2} \sigma_1^2(F) \sup_{t \in R} |F(t) - G(t)| \sup_{t \in [0,1]} \frac{|a(t)|}{t} \left[ \sup_{t \in [0,1]} \frac{|a(t)|}{t} + \sup_{t \in [0,1]} |a(t)| \right]$.

Moreover, by using the Lagrange mean value theorem we have the following inequality

$|I_i| \le \tfrac{1}{2} \sigma_1^2(G) \sup_{t \in R} |F(t) - G(t)| \sup_{t \in [0,1]} |a(t)| \sup_{t \in [0,1]} |a'(t)|$, $i = 2, 3$.

Consequently, by the symmetry of considerations the proof is complete. □

Proof of Lemma 3. Note that by the symmetry of $a(x)$, $x \in [0, 1]$, we obtain

$\frac{1}{n_i} \sum_{j=1}^{n_i} a(j/n_i) X^{(i)}_{j:n_i} = -\frac{1}{n_i} \sum_{j=1}^{p(n_i)} a(j/n_i) \big( X^{(i)}_{n_i - j:n_i} - X^{(i)}_{j:n_i} \big)$, $i = 1, \dots, k$,   (A.1)

where $p(n) = [(n + 1)/2] - 1$ and $[x]$ means the integer part of $x$. Since by assumption $G_k^{-1}F_k(x) - x$ is non-decreasing in $x$, $G_1^{-1}F_1(x) - x$ is non-increasing in $x$, and $G_j^{-1}F_j(x) - x$ is constant in $x$, $j = 2, \dots, k - 1$, we obtain in view of (A.1) and (6) that

$T_{B,a}(X) \le \sum_{i=2}^{k} B_i \left[ \frac{1}{n_i} \sum_{j=1}^{n_i} a(j/n_i)\, G_i^{-1}F_i\big(X^{(i)}_{j:n_i}\big) - \frac{1}{n_{i-1}} \sum_{j=1}^{n_{i-1}} a(j/n_{i-1})\, G_{i-1}^{-1}F_{i-1}\big(X^{(i-1)}_{j:n_{i-1}}\big) \right]$.

Since $\big(G_1^{-1}F_1(X_1^{(1)}), \dots, G_i^{-1}F_i(X_j^{(i)}), \dots, G_k^{-1}F_k(X_{n_k}^{(k)})\big) =_{st} Y$ the proof is complete. □

References

Ahmad, I.A. and S.C. Kochar (1989), Testing for dispersive ordering, Statist. Probab. Lett. 7, 179-185.
Aly, E.-E. (1990), A simple test for dispersive ordering, Statist. Probab. Lett. 9, 323-325.
Bagai, I. and S.C. Kochar (1986), On tail-ordering and comparison of failure rates, Commun. Statist.-Theor. Methods 15, 1377-1388.
Barlow, R.E. and K.A. Doksum (1972), Isotonic tests for convex orderings, in: Proc. 6th Berkeley Symp. Math. Statist. Probab. I (Univ. of California Press, CA) pp. 293-323.
Bickel, P.J. and E.L. Lehmann (1979), Descriptive statistics for nonparametric models, IV. Spread, in: Jureckova, ed., Contributions to Statistics, Acad. Prague, pp. 33-40.
Deshpande, J.V. and S.C. Kochar (1983), Dispersive ordering is the same as tail-ordering, Adv. Appl. Probab. 15, 686-687.
Doksum, K. (1969), Starshaped transformations and the power of rank tests, Ann. Math. Statist. 40, 1167-1176.
Droste, W. and W. Wefelmeyer (1985), A note on strong unimodality and dispersivity, J. Appl. Probab. 22, 235-239.
Helmers, R., P. Janssen and R. Serfling (1990), Berry-Esseen and bootstrap results for generalized L-statistics, Scand. J. Statist. 17, 65-71.
Lewis, T. and J.W. Thompson (1981), Dispersive distributions and the connection between dispersivity and strong unimodality, J. Appl. Probab. 18, 76-90.
Lynch, J., G. Mimmack and F. Proschan (1983), Dispersive ordering results, Adv. Appl. Probab. 15, 889-891.
Marzec, L. and P. Marzec (1991a), Testing the dispersive equivalence of two populations, Statist. Probab. Lett. 12, 233-237.
Marzec, L. and P. Marzec (1991b), On testing the equality in dispersion of two probability distributions, Biometrika 78, 923-925.
Marzec, L. and P. Marzec (1992), A class of tests for the dispersive-equivalence of two probability distributions, Calcutta Statist. Assoc. Bull. 42, 129-134.
Marzec, L. and P. Marzec (1993), Tests for dispersive comparison of two distributions, Commun. Statist.-Theor. Methods 22, 843-851.
Oja, H. (1981), On location, scale, skewness, and kurtosis of univariate distributions, Scand. J. Statist. 8, 154-168.
Shaked, M. (1982), Dispersive ordering of distributions, J. Appl. Probab. 19, 310-320.
Shaked, M. (1985), Ordering distributions in dispersion, in: S. Kotz and N.L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 5 (Wiley, New York).
Shorack, G.R. (1972), Functions of order statistics, Ann. Math. Statist. 43, 412-427.