Note on interpolated order statistics




Statistics & Probability Letters 14 (1992) 129-131
North-Holland
27 May 1992

Jukka Nyblom
Department of Mathematical Sciences/Statistics, University of Tampere, Finland

Received July 1991

Abstract: Confidence intervals for an arbitrary population quantile based on interpolating adjacent order statistics are presented. The obtained interval is shown to have approximately the required coverage probability over continuous distributions. It is a generalization of what Hettmansperger and Sheather (1986) proposed for the median.

Keywords: Confidence interval, distribution-free, nonparametric, quantile.

1. Introduction

Let $x_{(1)} \le \dots \le x_{(n)}$ be an ordered random sample from a continuous population with the distribution function $F$, and let $\xi_p$ denote the $p$th quantile, $F(\xi_p) = p$. Write

$$\pi_r = P\bigl(x_{(r)} > \xi_p\bigr) = \sum_{j=0}^{r-1}\binom{n}{j}p^j(1-p)^{n-j},$$

which is independent of $F$. Thus for $r < s$, $P(x_{(r)} \le \xi_p \le x_{(s)}) = \pi_s - \pi_r$. In practice one tries to find $r$ and $s$ such that $\pi_r = \tfrac12\alpha = 1 - \pi_s$, leading to an equal-tail confidence interval with an approximate confidence coefficient $\gamma = 1 - \alpha$. However, if $n$ is not very large, the exact coefficient $\pi_s - \pi_r$ may be far from the desired one. In the case of the median, Hettmansperger and Sheather (1986) solved this problem by interpolating between the two confidence intervals whose confidence coefficients are closest to $\gamma$ from above and from below. To be specific, suppose that $[x_{(k)}, x_{(n-k+1)}]$ and $[x_{(k+1)}, x_{(n-k)}]$ are confidence intervals for the median with confidence coefficients $\gamma_k \ge \gamma \ge \gamma_{k+1}$. Denote

$$I = \frac{\gamma_k - \gamma}{\gamma_k - \gamma_{k+1}}, \qquad \lambda = \frac{(n-k)I}{k + (n-2k)I}. \tag{1}$$

Then $[(1-\lambda)x_{(k)} + \lambda x_{(k+1)},\ (1-\lambda)x_{(n-k+1)} + \lambda x_{(n-k)}]$ has approximately the confidence coefficient $\gamma$ for a wide range of parent distributions, as was shown numerically by Hettmansperger and Sheather (1986). Their argument leading to (1) was based on the Laplace distribution; thus if the parent distribution is truly Laplace, the approximation is exact. The objective of this paper is to give a mathematically sound basis for the finding of Hettmansperger and Sheather (1986) and, moreover, to generalize the method to give approximately distribution-free confidence intervals for any quantile.

Correspondence to: Jukka Nyblom, Dept. of Mathematical Sciences/Statistics, University of Tampere, P.O. Box 607, SF-33101 Tampere, Finland. © 1992 Elsevier Science Publishers B.V. All rights reserved.
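Both $\pi_r$ and the weight (1) are easy to evaluate. The following Python sketch (standard library only; the names `pi_r` and `hs_lambda` are hypothetical, chosen for illustration) computes $\pi_r$ as a binomial sum and the Hettmansperger and Sheather weight (1) for the median. It assumes $1 \le k < n/2$, so that both candidate intervals exist.

```python
from math import comb


def pi_r(n: int, p: float, r: int) -> float:
    """pi_r = P(x_(r) > xi_p): fewer than r of the n observations are <= xi_p.

    This is the Binomial(n, p) probability of at most r - 1 successes,
    so it does not depend on the parent distribution F.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r))


def hs_lambda(n: int, k: int, gamma: float) -> float:
    """Interpolation weight (1) of Hettmansperger and Sheather for the median."""
    # Exact coefficients via P(x_(r) <= xi <= x_(s)) = pi_s - pi_r with p = 1/2.
    gamma_k = pi_r(n, 0.5, n - k + 1) - pi_r(n, 0.5, k)    # [x_(k), x_(n-k+1)]
    gamma_k1 = pi_r(n, 0.5, n - k) - pi_r(n, 0.5, k + 1)   # [x_(k+1), x_(n-k)]
    if not gamma_k >= gamma >= gamma_k1:
        raise ValueError("need gamma_{k+1} <= gamma <= gamma_k")
    i = (gamma_k - gamma) / (gamma_k - gamma_k1)
    return (n - k) * i / (k + (n - 2 * k) * i)


# n = 10, gamma = 0.95: gamma_2 = 0.9785 and gamma_3 = 0.8906 bracket 0.95.
lam = hs_lambda(10, 2, 0.95)
print(f"lambda = {lam:.4f}")  # lambda = 0.6577
```

The weight is 0 when $\gamma = \gamma_k$ and 1 when $\gamma = \gamma_{k+1}$, so the interpolated interval moves continuously from the wider interval to the narrower one.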

2. Approximate confidence intervals

Let us seek a formula for $\lambda = \lambda(\pi, p)$ such that, with $p$ fixed and $\pi_r \le \pi \le \pi_{r+1}$,

$$P\bigl(\xi_p < (1-\lambda)x_{(r)} + \lambda x_{(r+1)}\bigr) = \pi .$$

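Then $\lambda(\tfrac12\alpha, p)$ leads to the interpolation coefficient in the lower bound and $\lambda(1 - \tfrac12\alpha, p)$ to the coefficient in the upper bound. Consider the probability

$$P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > x\bigr) = P\bigl((1-\lambda)(x_{(r)} - x) + \lambda(x_{(r+1)} - x) > 0\bigr).$$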

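Then Proposition 1 of Hettmansperger and Sheather (1986), applied to the variables $x_{(1)} - x, \dots, x_{(n)} - x$ [an ordered sample from the population with the distribution function $F_x(\cdot) = F(\cdot + x)$], gives

$$P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > x\bigr) = P\bigl(x_{(r+1)} > x\bigr) - n\binom{n-1}{r}\int_0^\infty \Bigl[F\Bigl(x - \frac{\lambda y}{1-\lambda}\Bigr)\Bigr]^r\bigl[1 - F(x+y)\bigr]^{n-r-1} f(x+y)\,dy, \tag{2}$$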
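Formula (2) can be evaluated numerically for any given parent. The sketch below assumes a user-supplied continuous cdf/pdf pair and uses a plain composite Simpson rule on a truncated range $[0, y_{\max}]$; both the quadrature and the cut-off are our choices, not the note's, and the helper names are hypothetical.

```python
from math import comb, erf, exp, pi, sqrt


def simpson(g, a, b, n_intervals=20000):
    """Composite Simpson rule for the integral of g over [a, b] (n_intervals even)."""
    h = (b - a) / n_intervals
    total = g(a) + g(b)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def prob_order_stat_exceeds(n, r, x, cdf):
    """P(x_(r) > x): fewer than r of the n observations are <= x."""
    f_x = cdf(x)
    return sum(comb(n, j) * f_x**j * (1 - f_x) ** (n - j) for j in range(r))


def prob_interp_exceeds(n, r, lam, x, cdf, pdf, y_max=20.0):
    """Equation (2): P((1 - lam) x_(r) + lam x_(r+1) > x) for 0 <= lam < 1.

    The integral over [0, inf) is cut off at y_max, which is assumed to be
    large enough for the integrand to be negligible beyond it.
    """
    def integrand(y):
        return (cdf(x - lam * y / (1 - lam)) ** r
                * (1 - cdf(x + y)) ** (n - r - 1) * pdf(x + y))

    integral = simpson(integrand, 0.0, y_max)
    return prob_order_stat_exceeds(n, r + 1, x, cdf) - n * comb(n - 1, r) * integral


def norm_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def norm_pdf(t):
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


# Standard normal parent, n = 11, r = 1, lambda = 0.15, x = lower quartile.
xi = -0.6744897501960817
print(prob_interp_exceeds(11, 1, 0.15, xi, norm_cdf, norm_pdf))
```

Two easy sanity checks follow from (2): at $\lambda = 0$ it reduces to $P(x_{(r)} > x)$, and for $0 < \lambda < 1$ the value lies between $P(x_{(r)} > x)$ and $P(x_{(r+1)} > x)$.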

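where $f(x) = F'(x)$. When $r$ or $n - r$ is moderately large, the main contribution in the integral comes from the vicinity of the origin. Expanding at $y = 0$ we get

$$r\log F\Bigl(x - \frac{\lambda y}{1-\lambda}\Bigr) = r\log F(x) - \frac{r\lambda}{1-\lambda}\,\frac{f(x)}{F(x)}\,y + o(y), \tag{3}$$

$$(n-r-1)\log\bigl(1 - F(x+y)\bigr) = (n-r-1)\log\bigl(1 - F(x)\bigr) - (n-r-1)\frac{f(x)}{1-F(x)}\,y + o(y). \tag{4}$$

Designate

$$m = m(x) = \frac{r\lambda\bigl(1 - F(x)\bigr)}{(1-\lambda)F(x)}. \tag{5}$$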



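From (3)-(5) it follows that

$$\begin{aligned}
r\log F\Bigl(x - \frac{\lambda y}{1-\lambda}\Bigr) + (n-r-1)\log\bigl(1 - F(x+y)\bigr)
&= r\log F(x) - m\log\bigl(1-F(x)\bigr) + (n+m-r-1)\log\bigl(1-F(x)\bigr) - (n+m-r-1)\frac{f(x)}{1-F(x)}\,y + o(y)\\
&= r\log F(x) - m\log\bigl(1-F(x)\bigr) + (n+m-r-1)\log\bigl(1-F(x+y)\bigr) + o(y).
\end{aligned}$$

When the $o(y)$ term is neglected, the integral in (2) is approximately equal to

$$\int_0^\infty F(x)^r\bigl(1-F(x)\bigr)^{-m}\bigl(1-F(x+y)\bigr)^{n+m-r-1} f(x+y)\,dy = \frac{F(x)^r\bigl(1-F(x)\bigr)^{n-r}}{n+m-r}.$$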
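The effect of neglecting the $o(y)$ term can be checked numerically. The sketch below (hypothetical helper names; Simpson's rule on a truncated range is our choice) compares the integral in (2) with the closed-form approximation $F(x)^r(1-F(x))^{n-r}/(n+m-r)$. For the Laplace distribution at its median $x = 0$, the expansions (3) and (4) hold without remainder for $y > 0$, so the two values agree up to quadrature error.

```python
from math import exp


def simpson(g, a, b, n_intervals=20000):
    """Composite Simpson rule for the integral of g over [a, b] (n_intervals even)."""
    h = (b - a) / n_intervals
    total = g(a) + g(b)
    for i in range(1, n_intervals):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def laplace_cdf(t):
    return 0.5 * exp(t) if t < 0 else 1.0 - 0.5 * exp(-t)


def laplace_pdf(t):
    return 0.5 * exp(-abs(t))


def exact_integral(n, r, lam, x, cdf, pdf, y_max=60.0):
    """The integral in (2), truncated at y_max, by Simpson's rule."""
    def integrand(y):
        return (cdf(x - lam * y / (1 - lam)) ** r
                * (1 - cdf(x + y)) ** (n - r - 1) * pdf(x + y))
    return simpson(integrand, 0.0, y_max)


def approx_integral(n, r, lam, x, cdf):
    """F(x)^r (1 - F(x))^(n-r) / (n + m - r), with m = m(x) from (5)."""
    f_x = cdf(x)
    m = r * lam * (1 - f_x) / ((1 - lam) * f_x)
    return f_x**r * (1 - f_x) ** (n - r) / (n + m - r)


n, r, lam = 11, 3, 0.4
print(exact_integral(n, r, lam, 0.0, laplace_cdf, laplace_pdf))
print(approx_integral(n, r, lam, 0.0, laplace_cdf))
```

Passing a different cdf/pdf pair, or a point other than the Laplace median, generally leaves a nonzero gap between the two values; that gap is what the approximation in this section neglects.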



Table 1
Confidence intervals for population quantiles (nominal values in parentheses)

n    p     Confidence interval  Distribution  Tail prob. below   Tail prob. above   Coverage
                                              the lower limit    the upper limit    probability
11   0.25  (x_(1), x_(6))       continuous    0.0422             0.0343             0.9234
           (x_(2), x_(5))       continuous    0.1971             0.1145             0.6883
           interpolated         normal        0.0489 (0.05)      0.0501 (0.05)      0.9010 (0.90)
           interpolated         Laplace       0.0482 (0.05)      0.0518 (0.05)      0.9000 (0.90)
           interpolated         Cauchy        0.0477 (0.05)      0.0543 (0.05)      0.8980 (0.90)
           interpolated         uniform       0.0497 (0.05)      0.0484 (0.05)      0.9013 (0.90)
14   0.25  (x_(1), x_(7))       continuous    0.0178             0.0383             0.9439
           (x_(2), x_(8))       continuous    0.1010             0.0103             0.8887
           interpolated         normal        0.0243 (0.025)     0.0259 (0.025)     0.9498 (0.95)
           interpolated         Laplace       0.0237 (0.025)     0.0268 (0.025)     0.9495 (0.95)
           interpolated         Cauchy        0.0232 (0.025)     0.0276 (0.025)     0.9492 (0.95)
           interpolated         uniform       0.0252 (0.025)     0.0251 (0.025)     0.9497 (0.95)
30   0.10  (x_(1), x_(6))       continuous    0.0424             0.0732             0.8844
           (x_(2), x_(7))       continuous    0.1837             0.0258             0.7905
           interpolated         normal        0.0488 (0.05)      0.0514 (0.05)      0.8998 (0.90)
           interpolated         Laplace       0.0484 (0.05)      0.0523 (0.05)      0.8993 (0.90)
           interpolated         Cauchy        0.0474 (0.05)      0.0548 (0.05)      0.8978 (0.90)
           interpolated         uniform       0.0502 (0.05)      0.0492 (0.05)      0.9006 (0.90)
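The rows for the order-statistic intervals do not depend on $F$, so they can be recomputed from binomial sums: $P(\xi_p < x_{(r)}) = \pi_r$ and $P(\xi_p > x_{(s)}) = 1 - \pi_s$. A Python sketch under that reading of the table (helper names are hypothetical):

```python
from math import comb


def pi_r(n, p, r):
    """pi_r = P(x_(r) > xi_p) = P(Binomial(n, p) <= r - 1)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r))


def exact_interval_row(n, p, r, s):
    """Tail and coverage probabilities of the interval [x_(r), x_(s)] for xi_p."""
    below = pi_r(n, p, r)        # P(xi_p < x_(r))
    above = 1 - pi_r(n, p, s)    # P(xi_p > x_(s))
    return below, above, 1 - below - above


for n, p, r, s in [(11, 0.25, 1, 6), (11, 0.25, 2, 5), (14, 0.25, 1, 7),
                   (14, 0.25, 2, 8), (30, 0.10, 1, 6), (30, 0.10, 2, 7)]:
    below, above, cover = exact_interval_row(n, p, r, s)
    print(f"n={n:2d} p={p:.2f} (x_({r}), x_({s})): "
          f"{below:.4f} {above:.4f} {cover:.4f}")
```

Rounded to four decimals, the printed values can be compared directly with the "continuous" rows of Table 1.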

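This leads to the approximate formula

$$\begin{aligned}
P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > x\bigr)
&\approx P\bigl(x_{(r+1)} > x\bigr) - n\binom{n-1}{r}\frac{F(x)^r\bigl(1-F(x)\bigr)^{n-r}}{n+m(x)-r}\\
&= \frac{m(x)}{m(x)+n-r}\,P\bigl(x_{(r+1)} > x\bigr) + \frac{n-r}{m(x)+n-r}\,P\bigl(x_{(r)} > x\bigr),
\end{aligned}\tag{6}$$

since $P(x_{(r+1)} > x) - P(x_{(r)} > x) = \binom{n}{r}F(x)^r\bigl(1-F(x)\bigr)^{n-r}$ and $n\binom{n-1}{r} = (n-r)\binom{n}{r}$.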
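At $x = \xi_p$ the right-hand side of (6) depends only on $n$, $r$, $p$ and $\lambda$, so the interpolation weight can be found by a one-dimensional root search. The sketch below uses bisection purely for illustration (our choice): the right-hand side is a weighted average of $\pi_r$ and $\pi_{r+1}$ whose weight on $\pi_{r+1}$, $m/(m+n-r)$, increases with $\lambda$, so the root is unique. The function names are hypothetical, and the interval builder assumes $\pi_1 \le \tfrac12\alpha$ and $1 - \tfrac12\alpha \le \pi_n$ so that both targets are bracketed; following Section 2, the lower limit uses $\pi = \tfrac12\alpha$ and the upper limit $\pi = 1 - \tfrac12\alpha$.

```python
from math import comb


def pi_r(n, p, r):
    """pi_r = P(x_(r) > xi_p) = P(Binomial(n, p) <= r - 1)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r))


def rhs6_at_quantile(n, r, p, lam):
    """Right-hand side of (6) with x = xi_p, so F(x) = p (requires r >= 1)."""
    if lam >= 1.0:
        return pi_r(n, p, r + 1)
    m = r * lam * (1 - p) / ((1 - lam) * p)          # m(xi_p) from (5)
    return (m * pi_r(n, p, r + 1) + (n - r) * pi_r(n, p, r)) / (m + n - r)


def solve_lambda(n, r, p, target, tol=1e-12):
    """Bisection for lam in [0, 1] with rhs6_at_quantile(n, r, p, lam) == target."""
    if not pi_r(n, p, r) <= target <= pi_r(n, p, r + 1):
        raise ValueError("target must lie between pi_r and pi_{r+1}")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs6_at_quantile(n, r, p, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interpolated_interval(sample, p, alpha):
    """Equal-tail interval for xi_p: targets alpha/2 (lower) and 1 - alpha/2 (upper)."""
    x = sorted(sample)
    n = len(x)

    def bound(target):
        for r in range(1, n):
            if pi_r(n, p, r) <= target <= pi_r(n, p, r + 1):
                lam = solve_lambda(n, r, p, target)
                return (1 - lam) * x[r - 1] + lam * x[r]   # between x_(r) and x_(r+1)
        raise ValueError("target not attainable with this sample size")

    return bound(alpha / 2), bound(1 - alpha / 2)


# n = 11, p = 0.25, nominal 90% interval, using the toy sample 1, 2, ..., 11
print(interpolated_interval(range(1, 12), 0.25, 0.10))
```

With $n = 11$ and $p = 0.25$ this interpolates between $x_{(1)}$ and $x_{(2)}$ for the lower limit and between $x_{(5)}$ and $x_{(6)}$ for the upper limit, the same pairs as in the first block of Table 1.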

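From (6) with (5) we immediately see that $P((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > \xi_p)$ is approximately distribution-free: at $x = \xi_p$ we have $F(\xi_p) = p$, so both $m(\xi_p) = r\lambda(1-p)/((1-\lambda)p)$ and $P(x_{(r)} > \xi_p) = \pi_r$ depend only on $n$, $r$, $p$ and $\lambda$. Putting the right-hand side of (6) equal to $\pi$, replacing $x$ by $\xi_p$ and solving for $\lambda$, we find

$$\lambda(\pi, p) = \frac{(n-r)\,p\,(\pi - \pi_r)}{r(1-p)(\pi_{r+1} - \pi) + (n-r)\,p\,(\pi - \pi_r)}.$$

Clearly $\lambda(\pi_r, p) = 0$ and $\lambda(\pi_{r+1}, p) = 1$, as they should be. Formula (1) is recovered when $p = \tfrac12$ (with $r = k$: by symmetry $\pi_k = \tfrac12(1 - \gamma_k)$, so $\pi - \pi_k$ and $\pi_{k+1} - \pi$ are proportional to $I$ and $1 - I$). Table 1 shows some numerical results which affirm the usefulness of interpolation. Note that the maximum discrepancy between the exact and nominal coverage probabilities of the interpolated intervals is 0.0022. The figures in Table 1 were found by numerical integration when necessary.

Reference

Hettmansperger, T.P. and S.J. Sheather (1986), Confidence intervals based on interpolated order statistics, Statist. Probab. Lett. 4, 75-79.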