Statistics & Probability Letters 14 (1992) 129-131
North-Holland
27 May 1992
Note on interpolated order statistics

Jukka Nyblom
Department of Mathematical Sciences/Statistics, University of Tampere, Finland

Received July 1991
Abstract: Confidence intervals for an arbitrary population quantile based on interpolating adjacent order statistics are presented. The obtained interval is shown to have approximately the required coverage probability over continuous distributions. It is a generalization of what Hettmansperger and Sheather (1986) proposed for the median.

Keywords: Confidence interval, distribution-free, nonparametric, quantile.
Correspondence to: Jukka Nyblom, Dept. of Mathematical Sciences/Statistics, University of Tampere, P.O. Box 607, SF-33101 Tampere, Finland.

0167-7152/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved

1. Introduction

Let $x_{(1)} \le \cdots \le x_{(n)}$ be the order statistics of a random sample from a continuous population with the distribution function $F$, and let $\xi_p$ be its $p$th quantile. Write

$$\pi_r = P(x_{(r)} > \xi_p) = \sum_{i=0}^{r-1} \binom{n}{i} p^i (1-p)^{n-i},$$

these probabilities being independent of $F$. Thus for $r < s$, $P(x_{(r)} \le \xi_p \le x_{(s)}) = \pi_s - \pi_r$. In practice one tries to find $r$ and $s$ such that $\pi_r = \tfrac12\alpha = 1 - \pi_s$, leading to an equal tail probability confidence interval with an approximate confidence coefficient $\gamma = 1 - \alpha$. However, if $n$ is not very large, the exact coefficient $\pi_s - \pi_r$ may be far from the desired one. In the case of the median, Hettmansperger and Sheather (1986) solved this problem by interpolating between the two confidence intervals whose confidence coefficients are closest to $\gamma$ from above and from below. To be specific, suppose that $[x_{(k)}, x_{(n-k+1)}]$ and $[x_{(k+1)}, x_{(n-k)}]$ give confidence intervals for the median with confidence coefficients $\gamma_k \ge \gamma \ge \gamma_{k+1}$. Denote

$$I = \frac{\gamma_k - \gamma}{\gamma_k - \gamma_{k+1}}, \qquad \lambda = \frac{(n-k)I}{k + (n-2k)I}. \tag{1}$$

Then $[(1-\lambda)x_{(k)} + \lambda x_{(k+1)},\ (1-\lambda)x_{(n-k+1)} + \lambda x_{(n-k)}]$ has approximately the confidence coefficient $\gamma$ for a wide range of parent distributions, as was numerically shown by Hettmansperger and Sheather (1986). Their argument leading to (1) was based on the Laplace distribution. Thus if the parent distribution is truly Laplace, then the approximation is exact. The objective of this paper is to give a mathematically sound basis for the finding of Hettmansperger and Sheather (1986), and moreover to generalize the method to give approximately distribution-free confidence intervals for any quantile.
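As a minimal sketch of the quantities introduced in Section 1, the following Python computes the binomial probabilities $\pi_r$, the exact coverage of an order-statistic interval, and the interpolation factor (1) for the median. It assumes that $\pi_r$ denotes $P(x_{(r)} > \xi_p)$; the function names `pi_r`, `coverage` and `hs_lambda` are illustrative choices and do not come from the paper.

```python
from math import comb


def pi_r(n, r, p):
    """P(x_(r) > xi_p): fewer than r of the n observations fall below xi_p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(r))


def coverage(n, r, s, p):
    """Exact coverage pi_s - pi_r of the interval [x_(r), x_(s)] for xi_p."""
    return pi_r(n, s, p) - pi_r(n, r, p)


def hs_lambda(n, k, gamma):
    """Interpolation factor (1) for the median.

    Assumes gamma_k >= gamma >= gamma_{k+1}, where gamma_k is the exact
    coverage of [x_(k), x_(n-k+1)] and gamma_{k+1} that of [x_(k+1), x_(n-k)].
    """
    gamma_k = coverage(n, k, n - k + 1, 0.5)
    gamma_k1 = coverage(n, k + 1, n - k, 0.5)
    i_ratio = (gamma_k - gamma) / (gamma_k - gamma_k1)
    return (n - k) * i_ratio / (k + (n - 2 * k) * i_ratio)


if __name__ == "__main__":
    # n = 10: [x_(2), x_(9)] and [x_(3), x_(8)] bracket the 0.95 level.
    print(coverage(10, 2, 9, 0.5), coverage(10, 3, 8, 0.5))
    print(hs_lambda(10, 2, 0.95))
```

For $n = 10$ the two bracketing coefficients are $1 - 22/1024$ and $1 - 112/1024$, so a nominal level of 0.95 lies between them and `hs_lambda(10, 2, 0.95)` returns a factor strictly between 0 and 1.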
2. Approximate confidence intervals

Let us seek a formula for $\lambda = \lambda(\beta, p)$ such that with fixed $p$,

$$P\bigl(\xi_p < (1-\lambda)x_{(r)} + \lambda x_{(r+1)}\bigr) = \beta.$$

Then $\lambda(\tfrac12\alpha, p)$ leads to the interpolation coefficient in the lower bound and $\lambda(1 - \tfrac12\alpha, p)$ to the coefficient in the upper bound. Consider the probability

$$P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > x\bigr) = P\bigl((1-\lambda)(x_{(r)} - x) + \lambda(x_{(r+1)} - x) > 0\bigr).$$
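Then Proposition 1 of Hettmansperger and Sheather (1986) applied to the variables $x_{(1)} - x, \ldots, x_{(n)} - x$ [an ordered sample from the population with the distribution function $F_x(\cdot) = F(\cdot + x)$] gives

$$P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > x\bigr) = P(x_{(r+1)} > x) - n\binom{n-1}{r}\int_0^\infty \Bigl[F\Bigl(x - \frac{\lambda y}{1-\lambda}\Bigr)\Bigr]^r \bigl[1 - F(x+y)\bigr]^{n-r-1} f(x+y)\,dy, \tag{2}$$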
where $f(x) = F'(x)$. When $r$ or $n - r$ is moderately large, the main contribution in the integral comes from the vicinity of the origin. Expanding at $y = 0$ we get
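$$r\log F\Bigl(x - \frac{\lambda y}{1-\lambda}\Bigr) = r\log F(x) - \frac{r\lambda}{1-\lambda}\,\frac{f(x)}{F(x)}\,y + o(y), \tag{3}$$

$$(n-r-1)\log\bigl(1 - F(x+y)\bigr) = (n-r-1)\log\bigl(1 - F(x)\bigr) - (n-r-1)\,\frac{f(x)}{1-F(x)}\,y + o(y). \tag{4}$$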
Designate

$$m = m(x) = \frac{r\lambda\,(1-F(x))}{(1-\lambda)\,F(x)}. \tag{5}$$
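From (3)-(5) it follows that

$$\begin{aligned}
r\log F\Bigl(x - \frac{\lambda y}{1-\lambda}\Bigr) &+ (n-r-1)\log\bigl(1-F(x+y)\bigr) \\
&= r\log F(x) - m\log\bigl(1-F(x)\bigr) + (n+m-r-1)\log\bigl(1-F(x)\bigr) - (n+m-r-1)\,\frac{f(x)}{1-F(x)}\,y + o(y) \\
&= r\log F(x) - m\log\bigl(1-F(x)\bigr) + (n+m-r-1)\log\bigl(1-F(x+y)\bigr) + o(y).
\end{aligned}$$

When neglecting the $o(y)$ term, the integral in (2) is approximately equal to

$$F(x)^r\bigl(1-F(x)\bigr)^{-m}\int_0^\infty \bigl(1-F(x+y)\bigr)^{n+m-r-1} f(x+y)\,dy = \frac{F(x)^r\bigl(1-F(x)\bigr)^{n-r}}{n+m-r}.$$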
Table 1
Confidence intervals for population quantiles (nominal values in parentheses)

| Sample size | Quantile | Confidence interval | Distribution | Tail probability below the lower limit | Tail probability above the upper limit | Coverage probability |
|---|---|---|---|---|---|---|
| 11 | 0.25 | $(x_{(1)}, x_{(6)})$ | continuous | 0.0422 | 0.0343 | 0.9234 |
| | | $(x_{(2)}, x_{(5)})$ | continuous | 0.1971 | 0.1145 | 0.6883 |
| | | interpolated | normal | 0.0489 (0.05) | 0.0501 (0.05) | 0.9010 (0.90) |
| | | interpolated | Laplace | 0.0482 (0.05) | 0.0518 (0.05) | 0.9000 (0.90) |
| | | interpolated | Cauchy | 0.0477 (0.05) | 0.0543 (0.05) | 0.8980 (0.90) |
| | | interpolated | uniform | 0.0497 (0.05) | 0.0484 (0.05) | 0.9013 (0.90) |
| 14 | 0.25 | $(x_{(1)}, x_{(7)})$ | continuous | 0.0178 | 0.0383 | 0.9439 |
| | | $(x_{(2)}, x_{(8)})$ | continuous | 0.1010 | 0.0103 | 0.8887 |
| | | interpolated | normal | 0.0243 (0.025) | 0.0259 (0.025) | 0.9498 (0.95) |
| | | interpolated | Laplace | 0.0237 (0.025) | 0.0268 (0.025) | 0.9495 (0.95) |
| | | interpolated | Cauchy | 0.0232 (0.025) | 0.0276 (0.025) | 0.9492 (0.95) |
| | | interpolated | uniform | 0.0252 (0.025) | 0.0251 (0.025) | 0.9497 (0.95) |
| 30 | 0.10 | $(x_{(1)}, x_{(6)})$ | continuous | 0.0424 | 0.0732 | 0.8844 |
| | | $(x_{(2)}, x_{(7)})$ | continuous | 0.1837 | 0.0258 | 0.8905 |
| | | interpolated | normal | 0.0488 (0.05) | 0.0514 (0.05) | 0.8998 (0.90) |
| | | interpolated | Laplace | 0.0484 (0.05) | 0.0523 (0.05) | 0.8993 (0.90) |
| | | interpolated | Cauchy | 0.0474 (0.05) | 0.0548 (0.05) | 0.8978 (0.90) |
| | | interpolated | uniform | 0.0502 (0.05) | 0.0492 (0.05) | 0.9006 (0.90) |
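This leads to the approximate formula

$$\begin{aligned}
P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > x\bigr) &\approx P(x_{(r+1)} > x) - n\binom{n-1}{r}\frac{F(x)^r\bigl(1-F(x)\bigr)^{n-r}}{m(x)+n-r} \\
&= \frac{n-r}{m(x)+n-r}\,P(x_{(r)} > x) + \frac{m(x)}{m(x)+n-r}\,P(x_{(r+1)} > x),
\end{aligned} \tag{6}$$

since $n\binom{n-1}{r} = (n-r)\binom{n}{r}$ and

$$P(x_{(r+1)} > x) - P(x_{(r)} > x) = \binom{n}{r}F(x)^r\bigl(1-F(x)\bigr)^{n-r}.$$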
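The quality of approximation (6) can be checked numerically. The sketch below compares the right-hand side of (6) with the exact expression (2) for a uniform(0, 1) parent, where $F(t) = t$ and $f = 1$ on the unit interval, so the integral runs over $0 < y < 1 - x$. The uniform parent, the midpoint rule and the function names are assumptions made here for illustration only; the paper does not say how its own numerical integration was carried out.

```python
from math import comb


def prob_order_stat_exceeds(n, r, F):
    """P(x_(r) > x) when F(x) = F: fewer than r observations are at or below x."""
    return sum(comb(n, i) * F**i * (1 - F)**(n - i) for i in range(r))


def approx_tail(n, r, lam, F):
    """Right-hand side of (6), with m(x) from (5); needs 0 <= lam < 1, 0 < F < 1."""
    m = r * lam * (1 - F) / ((1 - lam) * F)
    p_r = prob_order_stat_exceeds(n, r, F)
    p_r1 = prob_order_stat_exceeds(n, r + 1, F)
    return ((n - r) * p_r + m * p_r1) / (m + n - r)


def exact_tail_uniform(n, r, lam, x, steps=20000):
    """Formula (2) for a uniform(0, 1) parent, integral by the midpoint rule."""
    c = lam / (1 - lam)
    h = (1 - x) / steps
    total = 0.0
    for j in range(steps):
        y = (j + 0.5) * h
        shifted = max(0.0, x - c * y)  # F(x - lam*y/(1-lam)), clipped at 0
        total += shifted**r * (1 - x - y)**(n - r - 1)
    integral = total * h
    return prob_order_stat_exceeds(n, r + 1, x) - n * comb(n - 1, r) * integral


if __name__ == "__main__":
    n, r, lam, x = 11, 1, 0.15, 0.25  # illustrative values only
    print(approx_tail(n, r, lam, x), exact_tail_uniform(n, r, lam, x))
```

With $\lambda = 0$ both functions reduce to $P(x_{(r)} > x)$, which gives a quick sanity check on the implementation before comparing the two at interior values of $\lambda$.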
From (6) with (5) we immediately see that $P\bigl((1-\lambda)x_{(r)} + \lambda x_{(r+1)} > \xi_p\bigr)$ is approximately distribution-free. Putting the right hand side of (6) equal to $\beta$, replacing $x$ by $\xi_p$ and solving for $\lambda$ we find

$$\lambda(\beta, p) = \left[1 + \frac{r(1-p)(\pi_{r+1} - \beta)}{(n-r)\,p\,(\beta - \pi_r)}\right]^{-1}.$$

Clearly $\lambda(\pi_r, p) = 0$ and $\lambda(\pi_{r+1}, p) = 1$ as they should. The formula (1) appears, if $p = \tfrac12$. Table 1 shows some numerical results which affirm the usefulness of interpolation. Note that the maximum discrepancy between the exact and nominal coverage probabilities is 0.0022. The figures in Table 1 are found by numerical integration when necessary.

Reference

Hettmansperger, T.P. and S.J. Sheather (1986), Confidence intervals based on interpolated order statistics, Statist. Probab. Lett. 4, 75-79.
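The interpolation coefficient of Section 2 can be sketched in Python as follows. The code uses the algebraically equivalent form $\lambda = (n-r)p(\beta - \pi_r) / [(n-r)p(\beta - \pi_r) + r(1-p)(\pi_{r+1} - \beta)]$, which avoids a division by zero at $\beta = \pi_r$. It assumes $\pi_r = P(x_{(r)} > \xi_p)$; the function names and the search for the bracketing index $r$ are illustrative choices, not part of the paper.

```python
from math import comb


def pi_r(n, r, p):
    """P(x_(r) > xi_p): fewer than r of the n observations fall below xi_p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(r))


def interp_lambda(n, r, beta, p):
    """lambda(beta, p); assumes pi_r <= beta <= pi_{r+1}."""
    lo, hi = pi_r(n, r, p), pi_r(n, r + 1, p)
    num = (n - r) * p * (beta - lo)
    return num / (num + r * (1 - p) * (hi - beta))


def interpolated_bound(sample, beta, p):
    """Point b between adjacent order statistics with P(xi_p < b) roughly beta."""
    x = sorted(sample)
    n = len(x)
    for r in range(1, n):
        if pi_r(n, r, p) <= beta <= pi_r(n, r + 1, p):
            lam = interp_lambda(n, r, beta, p)
            # x[r - 1] is x_(r) and x[r] is x_(r+1) in the paper's notation.
            return (1 - lam) * x[r - 1] + lam * x[r]
    raise ValueError("no adjacent pair of order statistics brackets beta")


def quantile_interval(sample, p, alpha):
    """Equal-tail interval for xi_p with nominal coverage 1 - alpha."""
    return (interpolated_bound(sample, alpha / 2, p),
            interpolated_bound(sample, 1 - alpha / 2, p))


if __name__ == "__main__":
    data = [3.1, 0.4, 2.2, 5.9, 1.7, 4.4, 0.9, 2.8, 6.3, 1.2, 3.6]  # made-up sample
    print(quantile_interval(data, p=0.25, alpha=0.10))
```

For $n = 11$, $p = 0.25$ and $\alpha = 0.10$ the lower bound falls between $x_{(1)}$ and $x_{(2)}$ and the upper bound between $x_{(5)}$ and $x_{(6)}$, matching the bracketing intervals in the first block of Table 1. When $\beta$ lies below $\pi_1$ or above $\pi_n$ no interpolation is possible, and the sketch raises an error rather than extrapolating.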