Statistical Methodology 8 (2011) 462–467
On simultaneous closeness probabilities of order statistics from odd sample sizes to the population median
N. Balakrishnan a,b, J.P. Keating c,∗
a McMaster University, Hamilton, Ontario, Canada L8S 4K1
b King Saud University, Riyadh, Saudi Arabia
c The University of Texas at San Antonio, San Antonio, TX 78249-0704, United States
Article info
Article history: Received 26 October 2010; received in revised form 27 April 2011; accepted 5 May 2011.
Keywords: Order statistics; Pitman closeness; Simultaneous closeness; Percentile; Symmetry property; Medians
Abstract
In this note, we present alternative derivations for the probability that an individual order statistic is closest to the target parameter among all order statistics from a complete random sample. This approach is simpler than the geometric arguments used earlier. We also provide a simple direct proof of the symmetry property of the simultaneous closeness probabilities among order statistics for the estimation of percentiles from a symmetric family. Finally, we offer an alternative, simpler proof of the result that sample medians from larger odd sample sizes are Pitman closer to the population median than sample medians from smaller odd sample sizes. © 2011 Elsevier B.V. All rights reserved.
1. Introduction
Recently, Balakrishnan et al. [2] derived expressions for the probability that an individual order statistic $X_{i:n}$ is closer than any other order statistic to the true value of $\xi_p$, the $100p$th percentile of the parent distribution. For the special case of order statistics, their results simplified the geometric formulation of Fountain et al. [5]. In exploratory data analysis, an individual order statistic is often used as an estimate of a population quantile. While such estimates have traditionally been based on the expected values of order statistics, the simultaneous closeness idea has been incorporated into Q–Q plots by Balakrishnan et al. [3] as an alternative way of obtaining robust plots for fitting data. We now explain precisely what it means for one order statistic to be closer to a population parameter $\theta$ than all other order statistics, defining the simultaneous closeness probability (SCP) as in [4].
∗ Corresponding author (J.P. Keating). Tel.: +1 2104585370; fax: +1 2104586350.
ISSN 1572-3127; doi:10.1016/j.stamet.2011.05.002
N. Balakrishnan, J.P. Keating / Statistical Methodology 8 (2011) 462–467
Definition 1. The simultaneous closeness probability (SCP) of $X_{i:n}$, $i \in \{1, \dots, n\}$, among the order statistics $X_{1:n}, \dots, X_{n:n}$, in the estimation of a population parameter $\theta$ is
$$\pi_{i:n}(\theta) = \Pr\left( |X_{i:n} - \theta| < \min_{j \ne i} |X_{j:n} - \theta| \right). \tag{1}$$
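Definition 1 can be checked directly by simulation. The following is a minimal Monte Carlo sketch using only the Python standard library; the standard normal parent and the function name `scp_monte_carlo` are illustrative assumptions of ours, not part of [2] or [4]. For each simulated sample it records which order statistic is strictly closest to $\theta$.

```python
import random


def scp_monte_carlo(n, theta, reps=50_000, seed=2011):
    """Monte Carlo estimate of pi_{i:n}(theta), i = 1, ..., n, from Definition 1.

    The parent is assumed to be standard normal purely for illustration.
    Entry i-1 of the returned list is the fraction of simulated samples in
    which X_{i:n} is strictly closer to theta than every other order statistic.
    """
    rng = random.Random(seed)
    wins = [0] * n
    for _ in range(reps):
        # Sorted sample: xs[0] = X_{1:n} <= ... <= xs[n-1] = X_{n:n}
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        # Ties have probability zero for a continuous parent, so the index
        # minimising |X_{j:n} - theta| identifies the unique closest one.
        closest = min(range(n), key=lambda j: abs(xs[j] - theta))
        wins[closest] += 1
    return [w / reps for w in wins]


if __name__ == "__main__":
    for i, p in enumerate(scp_monte_carlo(5, 0.0), start=1):
        print(f"pi_{i}:5(0) ~ {p:.4f}")
```

For $n = 5$ and $\theta$ at the centre of a symmetric parent, conditioning on the number of observations below $\theta$ gives $\pi_{i:5} = 1/16, 1/4, 3/8, 1/4, 1/16$, so the estimates should fall close to these values and always sum to 1.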
The results for $\pi_{i:n}(\xi_p)$ developed by Balakrishnan et al. [2] followed the geometric structure of Voronoi regions detailed in [5]. In subsequent sections, we present an alternative approach for deriving the SCP in (1). It relies essentially on the following lemma due to Iliopoulos and Balakrishnan [6], which concerns the conditional independence of blocked ordered data.
Lemma 1 (Iliopoulos and Balakrishnan [6]). Let $X_{1:n} < X_{2:n} < \cdots < X_{n:n}$ be the order statistics from a random sample of size $n$ from a continuous population with cdf $F(x)$ and pdf $f(x)$. Let $\theta \in \mathbb{R}$ be a cut-point and let $D$ be the number of order statistics that are at most $\theta$. Then, conditional on $D = d$,
1. $(X_{1:n}, \dots, X_{d:n})$ and $(X_{d+1:n}, \dots, X_{n:n})$ are independent;
2. $(X_{1:n}, \dots, X_{d:n}) \overset{d}{=} (Y_{1:d}, \dots, Y_{d:d})$, where $\{Y_{i:d}\}$ are the order statistics from a random sample of size $d$ from the distribution $F$ truncated on the right at $\theta$, that is, from the distribution with cdf $G(y) = F(y)/F(\theta)$ and pdf $g(y) = f(y)/F(\theta)$, for $y \le \theta$;
3. $(X_{d+1:n}, \dots, X_{n:n}) \overset{d}{=} (Z_{1:n-d}, \dots, Z_{n-d:n-d})$, where $\{Z_{i:n-d}\}$ are the order statistics from a random sample of size $n-d$ from the distribution $F$ truncated on the left at $\theta$, that is, from the distribution with cdf $H(z) = \{F(z) - F(\theta)\}/\{1 - F(\theta)\}$ and pdf $h(z) = f(z)/\{1 - F(\theta)\}$, for $z \ge \theta$.
Lemma 1 uses only the simple binomial counting variable $D$, and it removes the need for the crossing points of estimators used in [8] to develop Karlin's corollary (see [8], Corollary 4.3.2, page 113). While Lemma 1 can be used to simplify results concerning Pitman closeness (also known in the literature as Pitman nearness), we focus here on the SCP as developed recently by Balakrishnan et al. [2].
2. The alternative derivation of SCP
Let $X_{1:n}, \dots, X_{n:n}$ be the order statistics obtained from a random sample of size $n$ from a random variable $X$. We assume that $X$ is absolutely continuous with pdf $f(x)$ and cdf $F(x)$.
Theorem 1 (Balakrishnan et al. [2]). Let $X_{1:n}, \dots, X_{n:n}$ be the order statistics obtained from a random sample of size $n$ from a random variable $X$ satisfying the conditions given above, with support $\mathbb{R}$. Then, for $i = 2, \dots, n-1$, the simultaneous closeness probability $\pi_{i:n}(\theta)$ of $X_{i:n}$ with respect to $\theta$ is given by
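$$\pi_{i:n}(\theta) = \frac{n!}{(i-1)!\,(n-i)!} \int_{\theta}^{\infty} \{F(2\theta - x)\}^{i-1} \{1 - F(x)\}^{n-i} f(x)\,dx + \frac{n!}{(i-1)!\,(n-i)!} \int_{-\infty}^{\theta} \{1 - F(2\theta - x)\}^{n-i} \{F(x)\}^{i-1} f(x)\,dx. \tag{2}$$
In the special case where $i = 1$, the SCP $\pi_{1:n}(\theta)$ of $X_{1:n}$ with respect to $\theta$ is given by
$$\pi_{1:n}(\theta) = \{1 - F(\theta)\}^{n} + n \int_{\theta}^{\infty} \{1 - F(x)\}^{n-1} f(2\theta - x)\,dx. \tag{3}$$
In the special case where $i = n$, the SCP $\pi_{n:n}(\theta)$ of $X_{n:n}$ with respect to $\theta$ is given by
$$\pi_{n:n}(\theta) = \{F(\theta)\}^{n} + n \int_{-\infty}^{\theta} \{F(x)\}^{n-1} f(2\theta - x)\,dx. \tag{4}$$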
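Theorem 1 lends itself to direct numerical evaluation. The sketch below evaluates Eqs. (2)-(4) with a composite Simpson rule; the standard normal parent (support $\mathbb{R}$, as the theorem requires), the truncation of the infinite limits at $\theta \pm 12$, and the helper names are our own illustrative assumptions. Because exactly one order statistic is closest to $\theta$ with probability one, the $n$ values should sum to 1 for any $\theta$.

```python
from math import comb
from statistics import NormalDist

# Assumed parent for illustration: standard normal (support R, as Theorem 1 requires).
PARENT = NormalDist()
F, f = PARENT.cdf, PARENT.pdf


def simpson(g, a, b, m=2000):
    """Composite Simpson rule for g on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def scp_theorem1(i, n, theta, span=12.0):
    """pi_{i:n}(theta) from Eqs. (2)-(4); infinite limits are cut at theta +/- span."""
    lo, hi = theta - span, theta + span
    if i == 1:  # Eq. (3)
        tail = simpson(lambda x: (1 - F(x)) ** (n - 1) * f(2 * theta - x), theta, hi)
        return (1 - F(theta)) ** n + n * tail
    if i == n:  # Eq. (4)
        tail = simpson(lambda x: F(x) ** (n - 1) * f(2 * theta - x), lo, theta)
        return F(theta) ** n + n * tail
    c = i * comb(n, i)  # equals n! / ((i-1)! (n-i)!)
    above = simpson(lambda x: F(2 * theta - x) ** (i - 1) * (1 - F(x)) ** (n - i) * f(x), theta, hi)
    below = simpson(lambda x: (1 - F(2 * theta - x)) ** (n - i) * F(x) ** (i - 1) * f(x), lo, theta)
    return c * (above + below)


if __name__ == "__main__":
    theta = PARENT.inv_cdf(0.75)  # xi_{0.75} of the assumed parent
    probs = [scp_theorem1(i, 5, theta) for i in range(1, 6)]
    print([round(p, 4) for p in probs], "sum =", round(sum(probs), 8))
```

At $\theta = 0$ (the median of the assumed parent) the same code should reproduce $\pi_{1:5}(0) = 1/16$ and $\pi_{3:5}(0) = 3/8$, which follow from Eqs. (2) and (3) by the substitution $F(-x) = 1 - F(x)$.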
Proof. We consider the SCP for the extreme order statistics first and then proceed to derive the SCP for all intermediate order statistics.
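2.1. The SCP for the sample minimum
To derive the probability that $X_{1:n}$ is closest to $\theta$, note that there are only two possibilities (if $D \ge 2$, then $X_{1:n} < X_{2:n} \le \theta$, so $X_{2:n}$ is closer):
1. $D = 0$, in which case $X_{1:n}$ is closest to $\theta$ with probability 1;
2. $D = 1$, in which case $X_{1:n}$ is closest to $\theta$ if $\theta - X_{1:n} < X_{2:n} - \theta$.
Now, $D$ is a binomial $B(n, F(\theta))$ random variable and, by Lemma 1, conditional on $D = 1$ the variable $X_{1:n}$ has cdf $G$ while $X_{2:n}$ is independently distributed as the minimum of $n-1$ observations from $H$. Hence
$$\begin{aligned} \pi_{1:n}(\theta) &= \binom{n}{0}\{F(\theta)\}^{0}\{1-F(\theta)\}^{n} + \binom{n}{1}\{F(\theta)\}^{1}\{1-F(\theta)\}^{n-1} \Pr(\theta - X_{1:n} < X_{2:n} - \theta \mid D = 1) \\ &= \{1-F(\theta)\}^{n} + nF(\theta)\{1-F(\theta)\}^{n-1} \int_{\theta}^{\infty} (n-1)\left\{\frac{1-F(x)}{1-F(\theta)}\right\}^{n-2} \frac{f(x)}{1-F(\theta)} \cdot \frac{F(\theta) - F(2\theta - x)}{F(\theta)}\,dx \\ &= \{1-F(\theta)\}^{n} + n \int_{\theta}^{\infty} \{1-F(x)\}^{n-1} f(2\theta - x)\,dx, \end{aligned} \tag{5}$$
where the last equality follows by integrating by parts, differentiating $F(\theta) - F(2\theta - x)$ and integrating $(n-1)\{1-F(x)\}^{n-2} f(x)$; the boundary terms vanish at $x = \theta$ and as $x \to \infty$.
The result in Eq. (5) coincides with the expression given in Eq. (3).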
2.2. The SCP for the sample maximum
To derive the probability that $X_{n:n}$ is closest to $\theta$, we note once again that there are only two possibilities:
1. $D = n$, in which case $X_{n:n}$ is closest to $\theta$ with probability 1;
2. $D = n-1$, in which case $X_{n:n}$ is closest to $\theta$ if $\theta - X_{n-1:n} > X_{n:n} - \theta$.
Following arguments similar to those for the sample minimum, we find
$$\begin{aligned} \pi_{n:n}(\theta) &= \binom{n}{n}\{F(\theta)\}^{n}\{1-F(\theta)\}^{0} + \binom{n}{n-1}\{F(\theta)\}^{n-1}\{1-F(\theta)\}^{1} \Pr(\theta - X_{n-1:n} > X_{n:n} - \theta \mid D = n-1) \\ &= \{F(\theta)\}^{n} + n\{F(\theta)\}^{n-1}\{1-F(\theta)\} \int_{-\infty}^{\theta} (n-1)\left\{\frac{F(x)}{F(\theta)}\right\}^{n-2} \frac{f(x)}{F(\theta)} \cdot \frac{F(2\theta - x) - F(\theta)}{1-F(\theta)}\,dx \\ &= \{F(\theta)\}^{n} + n \int_{-\infty}^{\theta} \{F(x)\}^{n-1} f(2\theta - x)\,dx. \end{aligned} \tag{6}$$
The result in Eq. (6) coincides with the expression given in Eq. (4).
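The conditioning argument of Sections 2.1 and 2.2 can be mimicked numerically. The sketch below assumes a standard normal parent and uses helper names of our own. It (a) evaluates the conditional form in the second line of Eq. (5) alongside the closed form (3), which should agree because the last step is an integration by parts, and (b) generates samples the way Lemma 1 describes them: $D$ is drawn first, then the two blocks are drawn independently from the truncated distributions $G$ and $H$ (by simple rejection sampling, our choice). The simulated proportions should match Eqs. (3) and (4).

```python
import random
from statistics import NormalDist

# Assumed parent for illustration: standard normal.
PARENT = NormalDist()
F, f = PARENT.cdf, PARENT.pdf


def simpson(g, a, b, m=2000):
    """Composite Simpson rule for g on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def eq5_middle(n, theta, span=12.0):
    """Second line of Eq. (5): the conditional form, before integrating by parts."""
    p = F(theta)
    inner = simpson(
        lambda x: (n - 1) * ((1 - F(x)) / (1 - p)) ** (n - 2) * f(x) / (1 - p)
        * (p - F(2 * theta - x)) / p,
        theta, theta + span)
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1) * inner


def eq3(n, theta, span=12.0):
    tail = simpson(lambda x: (1 - F(x)) ** (n - 1) * f(2 * theta - x), theta, theta + span)
    return (1 - F(theta)) ** n + n * tail


def eq4(n, theta, span=12.0):
    tail = simpson(lambda x: F(x) ** (n - 1) * f(2 * theta - x), theta - span, theta)
    return F(theta) ** n + n * tail


def truncated_draw(rng, theta, below):
    """Rejection sampler for F truncated to x <= theta (below=True) or x > theta."""
    while True:
        x = rng.gauss(0.0, 1.0)
        if (x <= theta) == below:
            return x


def extremes_via_lemma1(n, theta, reps=60_000, seed=7):
    """Build each sample as in Lemma 1; count when X_{1:n} and X_{n:n} are closest."""
    rng = random.Random(seed)
    p = F(theta)
    hits_min = hits_max = 0
    for _ in range(reps):
        d = sum(rng.random() < p for _ in range(n))                              # D ~ B(n, F(theta))
        lower = sorted(truncated_draw(rng, theta, True) for _ in range(d))       # block from G
        upper = sorted(truncated_draw(rng, theta, False) for _ in range(n - d))  # block from H
        if d == 0 or (d == 1 and theta - lower[0] < upper[0] - theta):           # Section 2.1
            hits_min += 1
        if d == n or (d == n - 1 and theta - lower[-1] > upper[0] - theta):      # Section 2.2
            hits_max += 1
    return hits_min / reps, hits_max / reps


if __name__ == "__main__":
    n, theta = 4, 0.5
    print("Eq. (5) middle:", eq5_middle(n, theta), "Eq. (3):", eq3(n, theta))
    print("Lemma 1 simulation (min, max):", extremes_via_lemma1(n, theta))
    print("Eqs. (3), (4):", eq3(n, theta), eq4(n, theta))
```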
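2.3. The SCP for intermediate order statistics
To derive the probability that $X_{i:n}$ is closest to $\theta$, where $2 \le i \le n-1$, observe that $X_{i:n}$ can be closest only if it is either the largest order statistic at or below $\theta$ or the smallest one above $\theta$, so once again there are only two possibilities:
1. $D = i-1$, in which case $X_{i:n}$ is closest to $\theta$ if $\theta - X_{i-1:n} > X_{i:n} - \theta$;
2. $D = i$, in which case $X_{i:n}$ is closest to $\theta$ if $\theta - X_{i:n} < X_{i+1:n} - \theta$.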
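Proceeding as before, we then obtain
$$\begin{aligned} \pi_{i:n}(\theta) &= \binom{n}{i-1}\{F(\theta)\}^{i-1}\{1-F(\theta)\}^{n-i+1} \Pr(\theta - X_{i-1:n} > X_{i:n} - \theta \mid D = i-1) \\ &\quad + \binom{n}{i}\{F(\theta)\}^{i}\{1-F(\theta)\}^{n-i} \Pr(\theta - X_{i:n} < X_{i+1:n} - \theta \mid D = i) \\ &= \frac{n!}{(i-1)!\,(n-i)!}\left[\int_{\theta}^{\infty} \{F(2\theta - x)\}^{i-1}\{1-F(x)\}^{n-i} f(x)\,dx\right] \\ &\quad + \frac{n!}{(i-1)!\,(n-i)!}\left[\int_{-\infty}^{\theta} \{1-F(2\theta - x)\}^{n-i}\{F(x)\}^{i-1} f(x)\,dx\right]. \end{aligned} \tag{7}$$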
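Eq. (7) says that the event that $X_{i:n}$ is closest to $\theta$ splits into a $D = i-1$ part (first integral, $X_{i:n} > \theta$) and a $D = i$ part (second integral, $X_{i:n} \le \theta$), and that no other value of $D$ contributes. The sketch below, assuming a standard normal parent and using function names of our own, evaluates the two bracketed terms separately and compares each with the proportion of simulated samples in which $X_{i:n}$ is closest and $D$ takes the corresponding value; the "other" proportion should be exactly zero.

```python
import random
from math import comb
from statistics import NormalDist

PARENT = NormalDist()  # assumed parent, for illustration only
F, f = PARENT.cdf, PARENT.pdf


def simpson(g, a, b, m=2000):
    """Composite Simpson rule for g on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def eq7_terms(i, n, theta, span=12.0):
    """The D = i-1 and D = i terms of Eq. (7), for 2 <= i <= n-1."""
    c = i * comb(n, i)  # equals n! / ((i-1)! (n-i)!)
    d_is_i_minus_1 = c * simpson(
        lambda x: F(2 * theta - x) ** (i - 1) * (1 - F(x)) ** (n - i) * f(x), theta, theta + span)
    d_is_i = c * simpson(
        lambda x: (1 - F(2 * theta - x)) ** (n - i) * F(x) ** (i - 1) * f(x), theta - span, theta)
    return d_is_i_minus_1, d_is_i


def eq7_monte_carlo(i, n, theta, reps=60_000, seed=3):
    """Split the event {X_{i:n} closest to theta} according to the value of D."""
    rng = random.Random(seed)
    counts = {"i-1": 0, "i": 0, "other": 0}
    for _ in range(reps):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        if min(range(n), key=lambda j: abs(xs[j] - theta)) != i - 1:
            continue
        d = sum(x <= theta for x in xs)  # D = number of observations <= theta
        key = "i-1" if d == i - 1 else "i" if d == i else "other"
        counts[key] += 1
    return {k: v / reps for k, v in counts.items()}


if __name__ == "__main__":
    print(eq7_terms(3, 6, 0.3))
    print(eq7_monte_carlo(3, 6, 0.3))
```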
These expressions are the same as those derived in Theorem 1 of Balakrishnan et al. [2] from the geometric structure of Voronoi regions detailed in [5]. It is of interest to observe what this derivation implies: once we condition on $D$, the only candidates for the order statistic closest to $\theta$ are the maximum of the order statistics at or below $\theta$, namely $X_{D:n}$, and the minimum of those exceeding $\theta$, namely $X_{D+1:n}$. Each is unique, so the SCP reduces to a comparison of just these two. The fact that the counting variable $D$ is binomial makes the derivations of the corresponding SCPs clearer.
2.4. The symmetry property
We now give a simple proof of Corollary 3 of Balakrishnan et al. [2], which establishes the connection between $\pi_{i:n}(\xi_p)$ and $\pi_{n-i+1:n}(\xi_{1-p})$ for symmetric distributions.
Theorem 2 (Symmetry Property). Without loss of generality, assume that the pdf $f(x)$ is symmetric about the origin. Let $\theta = \xi_p$ be the $p$th quantile of the distribution $F(x)$. Then $\pi_{i:n}(\xi_p) = \pi_{n-i+1:n}(\xi_{1-p})$; that is, the probability that $X_{i:n}$ is the order statistic Pitman closest to $\xi_p$ equals the probability that $X_{n-i+1:n}$ is the order statistic Pitman closest to $\theta^{*} = \xi_{1-p} = -\xi_p = -\theta$.
Proof. Let us consider
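$$\begin{aligned} \pi_{i:n}(\xi_p) &= \frac{n!}{(i-1)!\,(n-i)!}\int_{\xi_p}^{\infty}\{F(2\xi_p - x)\}^{i-1}\{1-F(x)\}^{n-i} f(x)\,dx + \frac{n!}{(i-1)!\,(n-i)!}\int_{-\infty}^{\xi_p}\{1-F(2\xi_p - x)\}^{n-i}\{F(x)\}^{i-1} f(x)\,dx \\ &= \frac{n!}{(i-1)!\,(n-i)!}\int_{-\infty}^{-\xi_p}\{F(2\xi_p + u)\}^{i-1}\{1-F(-u)\}^{n-i} f(-u)\,du + \frac{n!}{(i-1)!\,(n-i)!}\int_{-\xi_p}^{\infty}\{1-F(2\xi_p + u)\}^{n-i}\{F(-u)\}^{i-1} f(-u)\,du, \end{aligned}$$
setting $u = -x$. Now, upon using the symmetry conditions $f(u) = f(-u)$ and $F(-u) = 1 - F(u)$, together with $\xi_{1-p} = -\xi_p$ (so that $F(2\xi_p + u) = 1 - F(2\xi_{1-p} - u)$), we simply obtain
$$\begin{aligned} \pi_{i:n}(\xi_p) &= \frac{n!}{(i-1)!\,(n-i)!}\left[\int_{-\infty}^{\xi_{1-p}}\{1-F(2\xi_{1-p} - u)\}^{i-1}\{F(u)\}^{n-i} f(u)\,du + \int_{\xi_{1-p}}^{\infty}\{F(2\xi_{1-p} - u)\}^{n-i}\{1-F(u)\}^{i-1} f(u)\,du\right] \\ &= \pi_{n-i+1:n}(\xi_{1-p}), \end{aligned}$$
where the last step is Eq. (2) with $i$ replaced by $n-i+1$ and $\theta = \xi_{1-p}$ (the two integrals appear in reverse order). This is the required result.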
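Theorem 2 can be sanity-checked numerically for a particular symmetric parent. The sketch below assumes a standard normal parent, evaluates Eq. (2) by Simpson's rule with the infinite limits cut at $\theta \pm 12$ (our choice), and compares $\pi_{i:n}(\xi_p)$ with $\pi_{n-i+1:n}(\xi_{1-p})$ for $n = 7$ and $p = 0.8$; the helper names are ours. The two columns should agree to rounding error. For a skewed parent the equality is not expected, since the proof uses $f(u) = f(-u)$ and $F(-u) = 1 - F(u)$.

```python
from math import comb
from statistics import NormalDist

PARENT = NormalDist()  # assumed parent, symmetric about the origin
F, f = PARENT.cdf, PARENT.pdf


def simpson(g, a, b, m=2000):
    """Composite Simpson rule for g on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def scp_eq2(i, n, theta, span=12.0):
    """pi_{i:n}(theta) from Eq. (2), 2 <= i <= n-1; infinite limits cut at theta +/- span."""
    c = i * comb(n, i)  # equals n! / ((i-1)! (n-i)!)
    above = simpson(lambda x: F(2 * theta - x) ** (i - 1) * (1 - F(x)) ** (n - i) * f(x),
                    theta, theta + span)
    below = simpson(lambda x: (1 - F(2 * theta - x)) ** (n - i) * F(x) ** (i - 1) * f(x),
                    theta - span, theta)
    return c * (above + below)


if __name__ == "__main__":
    n, p = 7, 0.8
    xi_p, xi_1mp = PARENT.inv_cdf(p), PARENT.inv_cdf(1 - p)
    for i in range(2, n):
        # pi_{i:n}(xi_p) next to pi_{n-i+1:n}(xi_{1-p})
        print(i, round(scp_eq2(i, n, xi_p), 6), round(scp_eq2(n - i + 1, n, xi_1mp), 6))
```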
3. Comparison of sample medians from odd sample sizes
Iliopoulos and Balakrishnan [7] recently derived an exact expression, in the case of symmetric distributions, for the probability that a sample median from an odd sample size (say, $2m-1$) is Pitman closer to the population median $\xi_{1/2}$ than the sample median arising from another random sample with two additional observations (which is then of size $2m+1$).
Theorem 3. Let $Y_{1:2m-1}, \dots, Y_{2m-1:2m-1}$ be the order statistics from $2m-1$ i.i.d. random variables from a symmetric population with pdf $f(y)$ and cdf $F(y)$, where $\xi_{1/2}$ is the population median. Similarly, let $X_{1:2n-1}, \dots, X_{2n-1:2n-1}$ be the order statistics from $2n-1$ i.i.d. random variables from the same population, independent of the $Y$'s, where $m > n$. Then $Y_{m:2m-1}$ is Pitman closer to $\xi_{1/2}$ than $X_{n:2n-1}$ is; that is, $\Pr\{|Y_{m:2m-1} - \xi_{1/2}| < |X_{n:2n-1} - \xi_{1/2}|\} > 1/2$.
Proof. The pdf of the sample median $Y_{m:2m-1}$ is (see [1])
$$f_m(y) = \frac{(2m-1)!}{[(m-1)!]^2}\, f(y)\{F(y)\}^{m-1}\{1-F(y)\}^{m-1}, \qquad -\infty < y < \infty; \tag{8}$$
the pdf $f_n(x)$ of $X_{n:2n-1}$ is obtained by replacing $m$ with $n$.
Without loss of generality, we set $\xi_{1/2} = 0$, and we are then interested in the probability
$$\Pr\{|Y_{m:2m-1}| < |X_{n:2n-1}|\} = \int_{0}^{\infty}\int_{-x}^{x} f_n(x) f_m(y)\,dy\,dx + \int_{-\infty}^{0}\int_{x}^{-x} f_n(x) f_m(y)\,dy\,dx.$$
Each integral can be shown to exceed $1/4$, so that the sum exceeds $1/2$, in the following way. Setting $w = F(y)$, so that $dw = f(y)\,dy$, we obtain
$$\int_{0}^{\infty}\int_{-x}^{x} f_n(x) f_m(y)\,dy\,dx = \int_{0}^{\infty}\left[\int_{F(-x)}^{F(x)} \frac{(2m-1)!}{[(m-1)!]^2}\, w^{m-1}(1-w)^{m-1}\,dw\right] f_n(x)\,dx = \int_{0}^{\infty}\{I_{F(x)}(m,m) - I_{F(-x)}(m,m)\}\, f_n(x)\,dx,$$
where $I_t(m,m)$ is the incomplete beta ratio with shape parameters $m$ and $m$. Now, note that
$$I_{1/2+s}(m,m) - I_{1/2-s}(m,m) > I_{1/2+s}(n,n) - I_{1/2-s}(n,n), \qquad 0 < s < 1/2,$$
whenever $m > n$, since the beta pdf is symmetric about $1/2$ when the shape parameters are equal and becomes more concentrated about $1/2$ as the common shape parameter increases. Thus, setting $t = F(x)$ and using $F(-x) = 1 - F(x)$, we obtain
$$\begin{aligned} \int_{0}^{\infty}\int_{-x}^{x} f_n(x) f_m(y)\,dy\,dx &= \int_{1/2}^{1}\{I_t(m,m) - I_{1-t}(m,m)\}\,\frac{(2n-1)!}{[(n-1)!]^2}\,t^{n-1}(1-t)^{n-1}\,dt \\ &> \int_{1/2}^{1}\{I_t(n,n) - I_{1-t}(n,n)\}\,\frac{(2n-1)!}{[(n-1)!]^2}\,t^{n-1}(1-t)^{n-1}\,dt \\ &= \int_{1/2}^{1}\{2I_t(n,n) - 1\}\,\frac{(2n-1)!}{[(n-1)!]^2}\,t^{n-1}(1-t)^{n-1}\,dt \\ &= \left[I_t^2(n,n) - I_t(n,n)\right]_{t=1/2}^{t=1} = \frac{1}{4}, \end{aligned}$$
where the third line uses $I_{1-t}(n,n) = 1 - I_t(n,n)$ and the last uses the fact that $\frac{(2n-1)!}{[(n-1)!]^2}t^{n-1}(1-t)^{n-1}$ is the derivative of $I_t(n,n)$ with respect to $t$.
Similarly, the second integral exceeds $1/4$, thus yielding the required result.
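Because the proof works entirely on the probability scale ($w = F(y)$, $t = F(x)$), Theorem 3 can be checked without choosing a parent distribution. The sketch below assumes integer shape parameters, so that $I_t(m,m)$ can be written as the binomial tail sum $\sum_{j=m}^{2m-1}\binom{2m-1}{j}t^j(1-t)^{2m-1-j}$; it evaluates the first integral of the proof by Simpson's rule and doubles it, since the second integral equals the first by symmetry. The function names are ours.

```python
from math import comb, factorial


def beta_cdf_equal(t, m):
    """Incomplete beta ratio I_t(m, m) for integer m >= 1 (binomial tail identity)."""
    k = 2 * m - 1
    return sum(comb(k, j) * t ** j * (1 - t) ** (k - j) for j in range(m, k + 1))


def beta_pdf_equal(t, n):
    """Beta(n, n) density: the density of t = F(X_{n:2n-1})."""
    return factorial(2 * n - 1) / factorial(n - 1) ** 2 * t ** (n - 1) * (1 - t) ** (n - 1)


def simpson(g, a, b, m=2000):
    """Composite Simpson rule for g on [a, b] with an even number m of panels."""
    h = (b - a) / m
    total = g(a) + g(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def first_integral(m, n):
    """int_{1/2}^1 {I_t(m,m) - I_{1-t}(m,m)} b_n(t) dt, as in the proof of Theorem 3."""
    return simpson(lambda t: (beta_cdf_equal(t, m) - beta_cdf_equal(1 - t, m))
                   * beta_pdf_equal(t, n), 0.5, 1.0)


def pitman_prob(m, n):
    """Pr{|Y_{m:2m-1}| < |X_{n:2n-1}|}; the two integrals are equal by symmetry."""
    return 2 * first_integral(m, n)


if __name__ == "__main__":
    for m, n in [(2, 1), (3, 2), (5, 3), (4, 4)]:
        print(m, n, round(pitman_prob(m, n), 6))
```

For $m = 2$, $n = 1$ the integrand reduces to the cubic $3s - 4s^3$ with $s = t - 1/2$, giving exactly $5/8$; for $m = n$ the probability is $1/2$, as it must be for two independent, identically distributed estimators.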
4. Concluding remarks
The results presented here provide alternative derivations of results obtained by Balakrishnan et al. [2,3]. These alternative derivations offer another perspective to researchers studying the simultaneous closeness probabilities of several order statistics. The result established for the Pitman closeness of sample medians from odd sample sizes is likely to hold for even sample sizes as well. However, the corresponding inequalities for two even sample sizes are quite complicated and need careful analysis. In dealing with odd sample sizes, we used the fact that the sample median is a single order statistic rather than the average of the two middle order statistics. For even sample sizes, the expressions for the Pitman closeness probabilities will involve four order statistics, two from each of the two independent samples. We leave this problem open. Finally, it would be of interest to see whether the result for the population median can be extended to a population quantile of any order $p$. In that case, the basic properties of the beta density that we have used do not extend naturally to an arbitrary choice of $p$, and it would therefore be of interest to see how the arguments could be modified to establish such a result. We are at present looking into these problems.
Acknowledgments
We thank the referees, the Associate Editor and the Editor for their careful and insightful reading of an earlier version of this manuscript.
References
[1] B.C. Arnold, N. Balakrishnan, H.N. Nagaraja, A First Course in Order Statistics, Classics in Applied Mathematics, vol. 54, Society for Industrial and Applied Mathematics, Philadelphia, 2008.
[2] N. Balakrishnan, K.F. Davies, J.P. Keating, R.L. Mason, Simultaneous closeness among order statistics to population quantiles, Journal of Statistical Planning and Inference 140 (2010) 2408–2415.
[3] N. Balakrishnan, K.F. Davies, J.P. Keating, R.L. Mason, Correlation-type goodness-of-fit test for extreme value distribution based on simultaneous closeness, Communications in Statistics - Simulation and Computation 40 (2011) 1074–1095.
[4] C.R. Blyth, Some probability paradoxes in choice from among random alternatives, Journal of the American Statistical Association 67 (1972) 366–381.
[5] R.L. Fountain, J.P. Keating, H.B. Maynard, The simultaneous comparison of estimators, Mathematical Methods of Statistics 5 (1996) 187–198.
[6] G. Iliopoulos, N. Balakrishnan, Conditional independence of blocked ordered data, Statistics & Probability Letters 79 (2009) 1008–1015.
[7] G. Iliopoulos, N. Balakrishnan, An odd property of sample median from odd sample sizes, Statistical Methodology 7 (2010) 678–686.
[8] J.P. Keating, R.L. Mason, P.K. Sen, Pitman's Measure of Closeness: A Comparison of Statistical Estimators, Society for Industrial and Applied Mathematics, Philadelphia, 1993.