15
Prediction of Order Statistics

N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17
© 1998 Elsevier Science B.V. All rights reserved.

Kenneth S. Kaminsky and Paul I. Nelson

1. Introduction

Let X = (X(1), X(2), ..., X(m))' and Y = (Y(1), Y(2), ..., Y(n))' be the order statistics of two independent random samples from the same family of continuous probability density functions {f(x|θ)}, where θ is an unknown parameter vector. Our main goal is to describe procedures where, having observed some of the components of X, say X₁ = (X(1), X(2), ..., X(r))', it is desired to predict functions of the remaining components of X, namely X₂ = (X(r+1), X(r+2), ..., X(m))', called the one sample problem, or of Y, called the two sample problem. Motivation for this type of prediction arises in life testing where X represents the ordered lifetimes of m components simultaneously put on test. If the test is stopped after the r-th failure, so that X₁ represents the only available data, we have a type II censored sample. In the one sample problem it may be of interest, for example, to predict: (i) X(m), the time at which all the components will have failed, (ii) a sample quantile X(s) of X₂, where s is the greatest integer in mλ, 0 < λ < 1, s > r, (iii) the mean failure time of the unobserved lifetimes in X₂. In the two sample problem, it may also be of interest to predict such functions of Y as: (i) the range, (ii) quartiles, (iii) the smallest lifetime. Prediction of order statistics can also be used to detect outliers or a change in the model generating the data; see, for example, Balasooriya (1989). We will describe interval and point prediction. Much of the past work on prediction intervals concerns approximations to complicated distributions of pivotals on a case by case basis. The availability of high speed computing has somewhat diminished the need for such approximations. Accordingly, we will focus on this computational approach to constructing prediction intervals. Patel (1989) gives an extensive survey of prediction intervals in a variety of settings, including order statistics. While overlapping his review to some extent, we will for the most part complement it. Throughout, we use boldface to denote vectors and matrices and A' to denote the transpose of the matrix or vector A.


2. Prediction preliminaries

Since prediction of random variables has received less attention in the statistical literature than parameter estimation, we begin by giving a brief general description of point and interval prediction. Let U and W be vectors of random variables whose joint distribution (possibly independence) depends on unknown parameters θ. Having observed U = u, it is desired to predict T = T(W), some function of W. Let T̂ = T̂(U) be a function of U used to denote a generic predictor of T. Good choices for T̂ may be defined relative to specification of either the loss L(T̂, T) incurred when T̂ is used to predict T or some inference principle such as maximum likelihood. When the former approach is used, L(T̂, T) is typically some measure of distance between T̂ and T. An optimal choice for T̂ is then made by finding (if possible) that function which minimizes

E{L[T̂(U), T(W)]} ,

(2.1)

where"E"denotes expectation over all joint realizations of U and W. The set A = A(u) computed from the observed value of U is called a 1 - 27 prediction region for T = T(W) if for all 0 in the parameter space:

P_θ(T ∈ A(U)) = 1 − 2γ .

(2.2)

This statement is to be interpreted in the following frequentist perspective. Let (Uᵢ, Wᵢ), i = 1, 2, ..., M, be independent copies of (U, W) and let N(M) denote the number of these for which Tᵢ = Tᵢ(Wᵢ) ∈ A(Uᵢ) holds. Then as M → ∞, N(M)/M → 1 − 2γ. Unlike the typical case of parameter estimation, where the true value of the parameter may never be known, an experimenter is often able to ascertain whether or not T lies in A(u). Thus, if an experimenter makes many forecasts where there is a real cost associated with making an incorrect prediction, it becomes important to control the ratio N(M)/M. To apply this general setting to the prediction of order statistics we associate the vector U with the elements of X₁ = (X(1), X(2), ..., X(r))'. In the one sample problem we associate W with X₂ = (X(r+1), X(r+2), ..., X(m))', while in the two sample problem we associate W with Y. Specifically, we have:

U = X₁ = (X(1), X(2), ..., X(r))' ,

W = X₂   in the one sample problem,
W = Y    in the two sample problem.

(2.3)

In all cases we consider the function being predicted T = T(W) to be linear with generic form:

T = Σᵢ κᵢWᵢ = κ'W ,

(2.4)

where the {κᵢ} are constants. Note that by taking all of the components of κ except one equal to zero, predictions can be made for individual components of X₂ and Y.
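As a concrete illustration of the frequentist reading of (2.2), the following small Python simulation (our sketch, not part of the original development) takes A(U) = [X(1), X(m)] as a prediction region for a single future observation Y from the same continuous distribution; its exact coverage is (m − 1)/(m + 1), and the observed ratio N(M)/M settles near that value.

import numpy as np

rng = np.random.default_rng(1)
m, M = 19, 100_000            # sample size and number of replications
hits = 0
for _ in range(M):
    x = rng.exponential(scale=2.0, size=m)   # U: the observed sample
    y = rng.exponential(scale=2.0)           # W: one future observation
    if x.min() <= y <= x.max():              # is T in A(U)?
        hits += 1
print(hits / M, (m - 1) / (m + 1))           # N(M)/M vs exact coverage 0.9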


3. Assumptions and notation

Most of the parametric work on prediction of order statistics assumes that the order statistics X and Y are based on random samples from a location-scale family of continuous probability density functions (pdf's) of the form:

f(x|θ) = (1/σ) g((x − μ)/σ) ,

(3.1)

where σ > 0 and the pdf g (known) generates a distribution with finite second moment. Some early papers assumed for convenience that one of these parameters was known. Except when μ is the left endpoint of the support of the distribution, as is the case when g is a standard exponential distribution, this assumption is unrealistic and we will assume for the most part that both location and scale parameters are unknown. We will also describe the construction of nonparametric prediction intervals where it is only assumed that the pdf of the underlying observations is continuous. Recall that we partition X' = (X₁', X₂'), where X₁ represents the first r observed order statistics. Denote the vectors of standardized order statistics Z_{X₁}, Z_{X₂} and Z_Y by:

Z_{Xᵢ} = (Xᵢ − μ1ᵢ)/σ ,  i = 1, 2 ,
Z_Y = (Y − μ1_Y)/σ ,

(3.2)

where 1₁, 1₂ and 1_Y are columns of ones of appropriate dimension. The Z's have known distributions generated by the pdf g. We let α_{i,j} be the expected value of the i-th standardized order statistic of a sample of size j from (3.1) and express these as vectors:

α₁ = E(Z_{X₁}) = (α_{1,m}, α_{2,m}, ..., α_{r,m})' ,
α₂ = E(Z_{X₂}) = (α_{r+1,m}, α_{r+2,m}, ..., α_{m,m})' ,
α_Y = E(Z_Y) = (α_{1,n}, α_{2,n}, ..., α_{n,n})' .

(3.3)

Partition the variance-covariance matrix of X as:

Var(X) = σ² Var(Z_X) = σ² ( V_{1,1}  V_{1,2} )
                          ( V_{2,1}  V_{2,2} )  ≡ σ² V_X

(3.4)

and represent the covariance matrix of Y by:

Var(Y) = σ² Var(Z_Y) ≡ σ² V_Y .

(3.5)


For s > r, let σ²ω_s denote the r-dimensional column vector of covariances between X₁ and X(s), so that:

Cov(X₁, X(s)) = σ²ω_s = σ²V_{1,s} ,

(3.6)

where V_{1,s} is the column of V_{1,2} corresponding to X(s). The covariances given in V_X and V_Y and the expected values in the α's specified above do not depend on unknown parameters. They can be obtained: (i) from tables in special cases, (ii) by averaging products of simulated standardized order statistics and appealing to the law of large numbers, (iii) by numerical integration. Note that X and Y are uncorrelated since they are independent. Prediction can also be carried out in a Bayesian setting. Let π(θ) denote a prior distribution on the parameter θ, π(θ|x₁) the posterior distribution of θ given x₁ and f(x₁, t|θ) be the joint pdf of (X₁, T) conditional on θ. The predictive density of t is then given by:

f(t|x₁) = ∫ [f(x₁, t|θ)/f(x₁|θ)] π(dθ|x₁) .

(3.7)

In the two sample case T and X₁ are conditionally independent given θ, and (3.7) simplifies accordingly.
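Method (ii) above is easy to implement. The sketch below is ours (NumPy assumed, and the function name order_stat_moments is our own): it estimates the α's and V_X for any standardized pdf g from which we can sample.

import numpy as np

def order_stat_moments(g_sampler, m, reps=200_000, rng=None):
    """Estimate the means (alpha) and covariances (V) of the order
    statistics of a sample of size m from the standardized pdf g,
    by averaging simulated values (method (ii) of Section 3)."""
    rng = rng or np.random.default_rng(0)
    z = np.sort(g_sampler(rng, (reps, m)), axis=1)  # rows: Z_(1) <= ... <= Z_(m)
    alpha = z.mean(axis=0)          # estimates of alpha_{i,m}
    V = np.cov(z, rowvar=False)     # estimate of V_X (m x m)
    return alpha, V

# Example with standard exponential g, where the exact values are known:
# alpha_{i,m} = sum_{j=1}^{i} 1/(m-j+1).
alpha, V = order_stat_moments(lambda rng, shp: rng.exponential(size=shp), m=5)
print(alpha)   # approximately [0.20, 0.45, 0.78, 1.28, 2.28]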

4. Point prediction

When predicting future outcomes from past outcomes, whether in the context of order statistics or otherwise, prediction intervals and regions probably play a more important role than point prediction. Yet, just as there are times when we desire a point estimator of a parameter, accompanied, if possible, by an estimate of its precision, there are times when a point predictor of a future outcome may be preferred to an interval predictor. And when a point predictor is accompanied by an estimate of its precision (such as mean square error), it could be argued that little is lost in comparison with having intervals or regions. Point predictors of order statistics and their errors are the topics of this section.

4.1. Linear prediction

Using the notation established in Section 3, we assume that X₁ has been observed, and that X₂ and/or Y have yet to be observed (or are missing). Let

W(s) = X(s)   in the one sample problem,
W(s) = Y(s)   in the two sample problem,

and

W = X₂   in the one sample problem,
W = Y    in the two sample problem.

(4.1)


We will concentrate on the prediction of a single component W(s) of W, although we could equally well predict all of W, or some linear combination of its components. Hence, for point prediction we will drop the subscript s and simply let Cov(X₁, X(s)) = σ²ω_s = σ²ω = σ²(ω₁, ω₂, ..., ω_r)'. From Lloyd (1952), the BLUE estimators of μ and σ based on X₁, and their variances and covariances, are

μ̂ = −α₁'Γ₁X₁ ,  σ̂ = 1₁'Γ₁X₁ ,
Var(μ̂) = (α₁'Ω₁₁α₁)σ²/Δ ,
Var(σ̂) = (1₁'Ω₁₁1₁)σ²/Δ ,
Cov(μ̂, σ̂) = −(1₁'Ω₁₁α₁)σ²/Δ ,

(4.2)

where

Γ₁ = Ω₁₁(1₁α₁' − α₁1₁')Ω₁₁/Δ ,
Δ = (1₁'Ω₁₁1₁)(α₁'Ω₁₁α₁) − (1₁'Ω₁₁α₁)² ,

and

Ω_X = ( Ω₁₁  Ω₁₂ )
      ( Ω₂₁  Ω₂₂ )  ≡ V_X^{-1} .
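For concreteness, here is a minimal NumPy sketch of (4.2). It is ours, and it reads Ω₁₁ as V_{1,1}^{-1}, the inverse of the covariance block of the observed order statistics, with α₁ and V_{1,1} supplied, for example, by the simulation sketch of Section 3.

import numpy as np

def lloyd_blue(x1, alpha1, V11):
    """BLUE of (mu, sigma) from the first r order statistics x1,
    following Lloyd's formulas (4.2); assumes Omega11 = inv(V_{1,1})."""
    one = np.ones_like(alpha1)
    Om = np.linalg.inv(V11)
    Delta = (one @ Om @ one) * (alpha1 @ Om @ alpha1) - (one @ Om @ alpha1) ** 2
    Gamma = Om @ (np.outer(one, alpha1) - np.outer(alpha1, one)) @ Om / Delta
    mu_hat = -alpha1 @ Gamma @ x1       # BLUE of the location mu
    sig_hat = one @ Gamma @ x1          # BLUE of the scale sigma
    return mu_hat, sig_hat

# Usage: with m = r = 2 and standard exponential g this reproduces the
# classical answers mu-hat = 1.5*X(1) - 0.5*X(2) and sigma-hat = X(2) - X(1).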

A linear predictor Ŵ(s) = ξ'X₁ is called the best linear unbiased predictor (BLUP) of W(s) if

E(Ŵ(s)) = E(W(s))

(4.3)

and E(Ŵ(s) − W(s))² is a minimum, where ξ is a vector of constants. Note that in the context of the model (3.2), two constraints are imposed on ξ by (4.3), namely that ξ'1₁ = 1 and ξ'α₁ = α_{s,k}, where k = m in the one sample case and k = n in the two sample case. Goldberger (1962) derived the best linear unbiased predictor of the regression parameters in the generalized linear model (i.e., the linear model with correlated errors). Kaminsky and Nelson (1975a) first applied Goldberger's result to the order statistics of model (3.2) for the prediction of the components of X₂ and Y, or for linear combinations of X₂ and Y. Combining the one and two sample problems, the BLUP of W(s) can be written

Ŵ(s) = X̂(s) = (μ̂ + σ̂α_{s,m}) + ω_s'Ω₁₁(X₁ − μ̂1₁ − σ̂α₁)   in the one sample problem,
Ŵ(s) = Ŷ(s) = μ̂ + σ̂α_{s,n}                                 in the two sample problem,

(4.4)

where 2 ≤ r < s ≤ m in the one sample problem, and 2 ≤ r ≤ m, 1 ≤ s ≤ n in the two sample problem. The simpler two sample predictor results from the fact that, by our independence assumption, Cov(X₁, Y(s)) = 0. The mean square error (mse) of Ŵ(s) is


mse(Ŵ(s)) = σ²{v_{ss} − ω_s'Ω₁₁ω_s + c₁₁}   in the one sample case,
mse(Ŵ(s)) = σ²{α₁'Ω₁₁α₁ + α_{s,n}²1₁'Ω₁₁1₁ − 2α_{s,n}1₁'Ω₁₁α₁}/Δ   in the two sample case,

(4.5)

where c₁₁ = Var{(1 − ω'Ω₁₁1₁)μ̂ + (α_{s,m} − ω'Ω₁₁α₁)σ̂}/σ². We point out that in the two sample case, the BLUP of Y(s) is just the BLUE of E(Y(s)), and we call Ŷ(s) the expected value predictor of Y(s).
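Continuing the sketch above, the one sample BLUP (4.4) can be computed as follows (again our illustration, reusing lloyd_blue and the same reading Ω₁₁ = V_{1,1}^{-1}; the index s is counted from 1).

import numpy as np

def blup_one_sample(x1, s, alpha, V):
    """BLUP of X_(s) from the first r order statistics via (4.4);
    alpha and V are the standardized means and covariances for the
    full sample of size m (e.g., from order_stat_moments)."""
    r = len(x1)
    alpha1, V11 = alpha[:r], V[:r, :r]
    omega = V[:r, s - 1]                 # Cov(X1, X_(s)) / sigma^2, cf. (3.6)
    mu, sig = lloyd_blue(x1, alpha1, V11)
    Om = np.linalg.inv(V11)
    one = np.ones(r)
    return mu + sig * alpha[s - 1] + omega @ Om @ (x1 - mu * one - sig * alpha1)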

In the one sample case, similar simplifications occur to the BLUP of X(s) for a fairly wide class of underlying pdf's. To see what these simplifications are, we state some results from Kaminsky and Nelson (1975b).

Regularity conditions. The underlying standardized pdf, g, is such that the limits

lim_{m→∞} α_{r,m} = p₁ ,  lim_{m→∞} α_{s,m} = p₂ ,

(4.6)

and

lim_{m→∞} (m · v_{rs}) = λ₁(1 − λ₂)/(g(p₁)g(p₂))

(4.7)

exist, where 0 < λ₁ < λ₂ < 1, r = ⟨(m + 1)λ₁⟩, s = ⟨(m + 1)λ₂⟩, pᵢ = G^{-1}(λᵢ), i = 1, 2, and ⟨a⟩ denotes the closest positive integer to a. Conditions under which the above limits hold may be found in Blom (1958) and van Zwet (1964). We are now ready to state the theorem which characterizes the conditions under which the BLUP of X(s) simplifies to the BLUE of its expected value:

THEOREM 4.1. Consider the order statistics of a random sample of size m from (3.1) and suppose that regularity conditions (4.6) and (4.7) are satisfied. Then the following are equivalent:
(i) The (standardized) distribution of the random sample is a member of the family 𝒢 = {G₁, G₂, G₃}, where

G₁(x) = x^c ,  0 < x < 1 ,  c > 0 ,
G₂(x) = e^x ,  x < 0 ,
G₃(x) = (−x)^{-c} ,  x ≤ −1 ,  c > 0 .

(ii) For all r, s and m, 2 ≤ r < s ≤ m, the BLUP of X(s) and the BLUE of E(X(s)) are identical (i.e., ω'Ω₁₁(X₁ − μ̂1₁ − σ̂α₁) ≡ 0).
(iii) For all r and m (2 ≤ r ≤ m), the BLUE of E(X(r)) is X(r) itself (i.e., μ̂ + σ̂α_{r,m} = X(r)).

Another characterization based on the form of the BLUP of X(s) is contained in the following theorem:


THEOREM 4.2. Assume regularity conditions (4.6) and (4.7) are satisfied. Then
(a) The BLUP of X(s) does not depend (functionally) on μ if this parameter is considered known, or on μ̂ if it is not, if and only if the underlying standardized distribution is G₄(x) = 1 − e^{-x}, x > 0.
(b) The BLUP of X(s) does not depend (functionally) on σ if this parameter is considered known, or on σ̂ if it is not, if and only if the underlying standardized distribution is either G₅(x) = 1 − x^{-c}, x > 1, c > 0, or G₆(x) = 1 − (−x)^c, −1 < x < 0, c > 0.

For the proofs of Theorems 4.1 and 4.2, the reader is referred to Kaminsky and Nelson (1975b). The proofs are based on results of Watson (1972). It is not difficult to show, with the help of Malik (1966, 1967), that for any of the distributions in the family {Gᵢ}, i = 1, 2, ..., 6, we have

ω'Ω₁₁ = (0, 0, ..., 0, q) ,

(4.8)

where q = ωᵢ/v_{ir}, i = 1, 2, ..., r. This means that for any member of this family, the BLUP of X(s) takes the form

X̂(s) = q·X(r) + (1 − q)·μ̂ + (α_{s,m} − q·α_{r,m})σ̂ .

(4.9)

In fact, in examining (4.9) and Theorems 4.1 and 4.2, we remark further that

if G ∈ 𝒢, then X̂(s) = μ̂ + σ̂α_{s,m} ,
if G = G₄, then X̂(s) = X(r) + (α_{s,m} − α_{r,m})σ̂ ,
if G = G₅ or G₆, then X̂(s) = (α_{s,m}/α_{r,m})·X(r) + (1 − α_{s,m}/α_{r,m})·μ̂ .

(4.10)
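For G₄ (the standard exponential), α_{i,m} = Σ_{j=1}^{i} 1/(m − j + 1), so the predictor in (4.10) needs no matrix algebra at all. A two-function sketch (ours; any equivariant estimate may be supplied for σ̂):

def alpha_exp(i, m):
    # mean of the i-th standard exponential order statistic in a sample of m
    return sum(1.0 / (m - j + 1) for j in range(1, i + 1))

def blup_exp_one_sample(x1, s, m, sigma_hat):
    """(4.10) for G = G4: X_(s)-hat = X_(r) + (alpha_s - alpha_r)*sigma_hat."""
    r = len(x1)
    return x1[-1] + (alpha_exp(s, m) - alpha_exp(r, m)) * sigma_hat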

A linear predictor W̃(s) = ξ'X₁ is called the best linear invariant predictor (BLIP) of W(s) if E(W̃(s) − W(s))² is a minimum in the class of linear predictors of W(s) whose mean square error is proportional to σ². This condition imposes the constraint ξ'1₁ = 1 on the coefficient vector ξ. Recall that if in addition ξ'α₁ = α_{s,k}, where k = m in the one sample case and k = n in the two sample case, ξ'X₁ is an unbiased predictor of W(s). Thus, the mean square error of the BLIP of W(s) never exceeds that of the BLUP. On the other hand, unbiasedness is lost for the BLIP. Kaminsky, Mann and Nelson (1975) derive the BLIP of the components of X₂ in the model (3.2). The BLIP W̃(s) is

W̃(s) = X̂(s) − [c₁₂/(1 + c₂₂)]σ̂   in the one sample problem,
W̃(s) = Ŷ(s) − [d₁₂/(1 + d₂₂)]σ̂   in the two sample problem,

(4.11)

where c₁₂ = Cov{σ̂, (1 − ω'Ω₁₁1₁)μ̂ + (α_{s,m} − ω'Ω₁₁α₁)σ̂}/σ², c₂₂ = d₂₂ = Var(σ̂)/σ², and d₁₂ = Cov{σ̂, μ̂ + α_{s,n}σ̂}/σ². The mean square error of W̃(s) is

mse(W̃(s)) = mse(X̂(s)) − c₁₂²σ²/(1 + c₂₂)   in the one sample case,
mse(W̃(s)) = mse(Ŷ(s)) − d₁₂²σ²/(1 + d₂₂)   in the two sample case.

(4.12)


These quantities can be found with the help of (4.2). Kaminsky and Nelson (1975a) derive the asymptotically best linear unbiased predictor (ABLUP) of X(s) based on k sample quantiles. They find that the ABLUP of X(s) always takes a form similar to (4.9), where the quantities q, μ̂, σ̂, α_{r,m} and α_{s,m} are replaced by their asymptotic counterparts. Kaminsky and Rhodin (1978) find that the prediction information contained in the largest observed order statistic X(r) in large samples is relatively high over a wide range of failure distributions. Nagaraja (1984) obtains the ABLUP and the asymptotically best linear invariant predictor (ABLIP) of X(s) based upon X(1), X(2), ..., X(r), where r and s are both small relative to the sample size, m. This is done as follows. Assume that, appropriately normalized, X(1) converges in law to a nondegenerate distribution function G. That is, there are sequences of constants c_m and d_m such that

P{(X(1) − c_m)/d_m ≤ x} = 1 − [1 − F(c_m + d_m x)]^m → G(x)

(4.14)

as m → ∞. Then the order statistics X(1), X(2), ..., X(r) behave (approximately) as the first r upper record values from G(μ + σx), where μ = c_m and σ = d_m. The problem of predicting X(s) then reduces to predicting the s-th upper record value from the first r record values from G, a problem already studied by others.

Nagaraja (1986) used two criteria as alternatives to mean square error to study prediction error in predicting X(s) from X(1), X(2), ..., X(r), in sampling from the two-parameter exponential distribution. These criteria were probability of nearness (PN) and probability of concentration (PC). A predictor X̂⁽¹⁾(s) has higher probability of nearness (PN) to X(s) than predictor X̂⁽²⁾(s) if

P{|X̂⁽¹⁾(s) − X(s)| < |X̂⁽²⁾(s) − X(s)|} > 0.5 .

(4.15)

We say that X̂⁽¹⁾(s) is preferable to X̂⁽²⁾(s) in the PN sense if (4.15) holds. A predictor X̂⁽¹⁾(s) has higher probability of concentration (PC) around X(s) than predictor X̂⁽²⁾(s) if

P{|X̂⁽¹⁾(s) − X(s)| < c} ≥ P{|X̂⁽²⁾(s) − X(s)| < c}

(4.16)

for all c > 0, with strict inequality for at least one c. When (4.16) holds, we say that X̂⁽¹⁾(s) is preferable to X̂⁽²⁾(s) in the PC sense. Nagaraja's (1986) study of prediction error via these criteria and mean square error was restricted to the two-parameter exponential distribution. It would be useful to expand such a study to other failure distributions.
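The PN criterion (4.15) is straightforward to estimate by simulation. The sketch below (ours) compares, in one parameter exponential samples with μ = 0 known, the unbiased predictor X(r) + (α_{s,m} − α_{r,m})σ̂ with σ̂ = S_r/r against the naive carry-forward X(r):

import numpy as np

rng = np.random.default_rng(7)
m, r, s, M = 20, 10, 15, 50_000
a = lambda i: sum(1.0 / (m - j + 1) for j in range(1, i + 1))  # alpha_{i,m}
wins = 0
for _ in range(M):
    x = np.sort(rng.exponential(size=m))
    sr = x[:r].sum() + (m - r) * x[r - 1]       # total time on test S_r
    p1 = x[r - 1] + (a(s) - a(r)) * sr / r      # unbiased predictor of X_(s)
    p2 = x[r - 1]                               # naive: last observed failure
    wins += abs(p1 - x[s - 1]) < abs(p2 - x[s - 1])
print(wins / M)   # estimated PN of p1 relative to p2, cf. (4.15)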

4.2. Other point predictors

Kaminsky and Rhodin (1985) apply the principle of maximum likelihood to the prediction of order statistics. We will once again assume that we want to predict a single component,

W(s) = X(s)   in the one sample problem,
W(s) = Y(s)   in the two sample problem,


where the sampling structure is the same as in Section 4.1, except that we no longer restrict sampling to location and scale families. Specifically, consider the joint pdf, f(x₁, w(s); θ), of X₁ and W(s), where θ is an unknown vector of parameters. Viewed as a function of w(s) and θ, define

L(w(s), θ; x₁) = f(x₁, w(s); θ)

(4.17)

to be the predictive likelihood function (PLF) of w(s) and θ. Let W*(s) and θ* be statistics depending on X₁ for which

L(W*(s), θ*; x₁) = sup_{(w(s), θ)} f(x₁, w(s); θ) .

(4.18)

We call W*(s) the maximum likelihood predictor (MLP) of W(s) and θ* the predictive maximum likelihood estimator (PMLE) of θ. Kaminsky and Rhodin (1985) obtain some uniqueness results and derive the MLP's for several distributions. Raqab and Nagaraja (1992) prove that in the one sample case with θ known, the MLP is an unbiased predictor if the mode equals the mean in the conditional distribution of X(s) given X(r). This condition does not hold for the one parameter exponential distribution, where the MLP is negatively biased. Raqab and Nagaraja (1992) also show that when X(r) and X(s) are sample quantiles, the mean square prediction error of the MLP based on samples from a one parameter exponential distribution or a uniform distribution, both with known scale parameter, converges to zero as the sample size m → ∞. However, general results for maximum likelihood prediction of order statistics rivaling results for maximum likelihood estimation of parameters do not exist at this time. Takada (1991) develops the general form of a predictor δ(W(s)) based on a sample from a location-scale family which is better than both the BLUP and BLIP according to the PN criterion given in (4.15). The predictor δ(W(s)) is median unbiased in the sense that P(δ(W(s)) ≤ W(s)) = P(δ(W(s)) ≥ W(s)) for all values of the location and scale parameters (μ, σ). Takada (1991) computes δ(W(s)) for samples from the two parameter exponential distribution. Raqab and Nagaraja (1992) also define a predictor η(W(s)) to be the median of the conditional distribution of X(s) given X(r), which typically depends on unknown parameters. Let η̂(W(s)) denote this predictor with parameters replaced by estimators. Raqab and Nagaraja (1992) derive η̂(W(s)) for the one parameter exponential and uniform distributions and show by a small scale simulation that it performs well in terms of mse. Adatia and Chan (1982) use adaptive methods to estimate the unknown parameters and predict future order statistics in sampling from the Weibull distribution. Their procedures assume an unknown scale parameter and a shape parameter which is also unknown, but assumed to lie in a restricted interval. Balasooriya and Chan (1983) and Balasooriya (1987) used the cross-validatory method of Stone (1974) and Geisser (1975) to predict future order statistics in samples from Weibull distributions and gamma distributions, respectively.
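To make (4.17)-(4.18) concrete, the following sketch (ours, SciPy assumed) maximizes the predictive likelihood numerically for an exponential scale family, using the standard joint pdf of (X₁, X(s)); Kaminsky and Rhodin (1985) give closed forms, so this numerical version is only illustrative.

import numpy as np
from scipy.optimize import minimize

def mlp_exponential(x1, s, m):
    """Numerically maximize the PLF (4.17) jointly in (w, theta) for an
    exponential(theta) sample, predicting X_(s) from the first r order
    statistics. A sketch; the combinatorial constant of the joint pdf is
    omitted since it does not affect the maximizer."""
    x1 = np.sort(np.asarray(x1, dtype=float))
    r, xr = len(x1), x1[-1]

    def neg_log_plf(par):
        v, t = par
        w, th = xr + np.exp(v), np.exp(t)     # enforce w > x_(r), theta > 0
        F = lambda x: 1.0 - np.exp(-x / th)
        logf = lambda x: -x / th - np.log(th)
        ll = (logf(x1).sum()                          # observed order statistics
              + (s - r - 1) * np.log(F(w) - F(xr))    # s-r-1 failures in between
              + logf(w)                               # the predicted failure
              + (m - s) * np.log1p(-F(w)))            # m-s still surviving
        return -ll

    res = minimize(neg_log_plf, x0=[0.0, np.log(x1.mean())], method="Nelder-Mead")
    v, t = res.x
    return xr + np.exp(v), np.exp(t)   # (MLP of X_(s), PMLE of theta)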


5. Interval prediction

5.1. Intervals based on pivotals

Suppose in the general setting of Section 2 we want to predict T = T(W) from T̂(U) and that Q = φ(T, T̂) is a pivotal. In the parametric case this means that for data generated by a pdf f as given in (3.1), Q has a distribution free of (μ, σ), and in the nonparametric case the distribution of Q is the same for all continuous densities. In either case, a 1 − 2γ prediction region for T is obtained by finding a set B that contains Q with probability 1 − 2γ and then inverting:

1 − 2γ = P(Q ∈ B) = P(T ∈ A(U)) ,

(5.1)

yielding A(U) as a 1 − 2γ prediction region. The concept of invariance allows this procedure to be used for location-scale families. See Appendix G of Lawless (1982) for a concise discussion of this concept. The estimators (μ̂(X₁), σ̂(X₁)) computed from X₁ are called equivariant if for all constants c and d, d > 0,

μ̂(c1 + dX₁) = c + dμ̂(X₁) ,
σ̂(c1 + dX₁) = dσ̂(X₁) .

(5.2)

BLUE's, maximum likelihood estimators and Mann's (1969) best linear invariant estimators (BLIE's) based on samples from (3.1) are equivariant. We consider the problem of predicting linear combinations of unobserved random variables so that, using κ to denote a generic vector of constants, T = κ'X₂ in the one sample problem and T = κ'Y in the two sample problem. For any equivariant estimators (μ̂, σ̂), the following are pivotals:

(μ̂ − μ)/σ̂ ,  σ̂/σ .

(5.3)

Let X̂(s) = a_s'X₁ be a linear unbiased predictor of X(s), r < s ≤ m, Ŷ(s) = b_s'X₁ a linear unbiased predictor of Y(s), 1 ≤ s ≤ n, and σ̂ an equivariant estimator of σ. Form arrays of predicted values and predicting coefficients denoted by:

X̂₂ = (X̂(r+1), X̂(r+2), ..., X̂(m))' ,
Ŷ = (Ŷ(1), Ŷ(2), ..., Ŷ(n))' ,
A = (a_{r+1}, a_{r+2}, ..., a_m) ,
B = (b₁, b₂, ..., b_n) .

(5.4)

We obtain from (3.2) and (5.3) that the following are pivotals:

Q_{X,κ} = κ'(X₂ − X̂₂)/σ̂ = κ'(Z_{X₂} − A'Z_{X₁})/(σ̂/σ) ,
Q_{Y,κ} = κ'(Y − Ŷ)/σ̂ = κ'(Z_Y − B'Z_{X₁})/(σ̂/σ) .

(5.5)


Let Q^δ_{X,κ} and Q^δ_{Y,κ} represent the 100δ percentage points of the distributions of Q_{X,κ} and Q_{Y,κ}, respectively:

P(Q_{X,κ} ≤ Q^δ_{X,κ}) = δ ,  P(Q_{Y,κ} ≤ Q^δ_{Y,κ}) = δ .

(5.6)

We again note that for a given location-scale family the percentage points Q^δ_{X,κ} and Q^δ_{Y,κ} can easily be found by simulating data from the standardized pdf g(x) as defined in (3.1) and computing Q_{X,κ} and Q_{Y,κ}. Iterate this procedure a large number of times and invert the empirical distribution functions to estimate the desired percentage points. Some authors, e.g., Lawless (1971) and Likeš (1974), have used other versions of Q_{X,κ} and Q_{Y,κ} of the form k₁(X₁)Q_{X,κ} + k₂(X₁) and k₃(X₁)Q_{Y,κ} + k₄(X₁), where the {k_j} are random variables free of unknown parameters. In either case, level 1 − 2γ prediction intervals are then given by:

For κ'X₂:  [κ'X̂₂ + σ̂Q^γ_{X,κ} , κ'X̂₂ + σ̂Q^{1−γ}_{X,κ}] ,

(5.7)

For κ'Y:  [κ'Ŷ + σ̂Q^γ_{Y,κ} , κ'Ŷ + σ̂Q^{1−γ}_{Y,κ}] .

(5.8)

Let [L_X(X₁), U_X(X₁)] and [L_Y(X₁), U_Y(X₁)] denote the intervals computed from X₁ given in (5.7) and (5.8), respectively. One sided lower 1 − γ prediction intervals corresponding to the two-sided intervals above are:

[L_X(X₁), ∞) ,  [L_Y(X₁), ∞) .

(5.9)

5.2. Intervals based on best linear predictors

Optimality regarding the choice of which unbiased linear predictors {X̂(s)} and {Ŷ(s)} to use in (5.7) and (5.8) has not been established in general. However, the BLUP's provide a reasonable basis on which to make this choice. From (4.4) they have the form:

X̂(s)(μ̂, σ̂) = μ̂ + α_{s,m}σ̂ + ω_s'Ω₁₁(X₁ − μ̂1₁ − σ̂α₁) ,
Ŷ(s)(μ̂, σ̂) = μ̂ + α_{s,n}σ̂ ,

(5.10)

where μ̂ and σ̂ are the BLUE's of μ and σ. Using any equivariant estimators μ̂ and σ̂, pivotals based on the form of the predictors for individual order statistics in (5.10) are (using (5.3)) given by:


Q_{X,s}(μ̂, σ̂) = (X(s) − X̂(s)(μ̂, σ̂))/σ̂
             = {Z_{X,s} − μ̂_Z − α_{s,m}σ̂_Z − ω_s'Ω₁₁(Z_{X₁} − μ̂_Z1₁ − σ̂_Zα₁)}/σ̂_Z ,

Q_{Y,s}(μ̂, σ̂) = (Y(s) − Ŷ(s)(μ̂, σ̂))/σ̂ = {Z_{Y,s} − μ̂_Z − α_{s,n}σ̂_Z}/σ̂_Z ,

(5.11)

where Z_{X,s} = (X(s) − μ)/σ, Z_{Y,s} = (Y(s) − μ)/σ and, by equivariance, μ̂_Z = μ̂(Z_{X₁}) = (μ̂ − μ)/σ and σ̂_Z = σ̂(Z_{X₁}) = σ̂/σ. Expression (5.11) verifies our statement that these functions are distributed free of μ and σ. Here, letting Q_{X,κ}(μ̂, σ̂) = Σ_s κ_s Q_{X,s}(μ̂, σ̂) and Q_{Y,κ}(μ̂, σ̂) = Σ_s κ_s Q_{Y,s}(μ̂, σ̂), prediction intervals constructed from (5.6) and (5.11) parallel those in (5.7) and (5.8):

For κ'X₂:  [κ'X̂₂(μ̂, σ̂) + σ̂Q^γ_{X,κ} , κ'X̂₂(μ̂, σ̂) + σ̂Q^{1−γ}_{X,κ}] ,

(5.12)

For κ'Y:  [κ'Ŷ(μ̂, σ̂) + σ̂Q^γ_{Y,κ} , κ'Ŷ(μ̂, σ̂) + σ̂Q^{1−γ}_{Y,κ}] .

(5.13)
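In outline, the simulation recipe for (5.11)-(5.12) looks as follows (our sketch; it assumes the functions order_stat_moments, lloyd_blue and blup_one_sample from the earlier sketches are in scope, and again reads Ω₁₁ as V_{1,1}^{-1}).

import numpy as np

def pred_interval_X_s(x1, s, m, alpha, V, g_sampler, gamma=0.05,
                      reps=20_000, rng=None):
    """Monte Carlo percentage points of Q_{X,s}(mu-hat, sigma-hat) in
    (5.11), inverted into the 1 - 2*gamma interval (5.12) for X_(s)."""
    rng = rng or np.random.default_rng(0)
    r = len(x1)
    q = np.empty(reps)
    for k in range(reps):
        z = np.sort(g_sampler(rng, m))       # standardized sample Z_(1..m)
        z1 = z[:r]
        _, sig = lloyd_blue(z1, alpha[:r], V[:r, :r])
        q[k] = (z[s - 1] - blup_one_sample(z1, s, alpha, V)) / sig
    lo, hi = np.quantile(q, [gamma, 1.0 - gamma])
    _, sig1 = lloyd_blue(x1, alpha[:r], V[:r, :r])
    pred = blup_one_sample(x1, s, alpha, V)
    return pred + sig1 * lo, pred + sig1 * hi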

5.3. The exponential distribution

Some analytic results for the above procedure have been obtained for the exponential model f(x|μ, σ) = (1/σ) exp(−(x − μ)/σ), where x > μ, and f(x|μ, σ) = 0 elsewhere. When μ is known to equal zero, these results are derived from the fact that the scaled spacings {2(m − i + 1)(X(i) − X(i−1))/σ} are distributed as independent chi-square variates. Let S_r = Σᵢ₌₁ʳ X(i) + (m − r)X(r), which is r times the maximum likelihood estimate of σ. For predicting X(s) from X₁ with μ known and taken as zero, Lawless (1971) derived the distribution of the pivotal

R(X₁, X(s)) = (X(s) − X(r))/S_r

(5.14)

as:

P(R(X₁, X(s)) ≥ t) = [1/B(s − r, m − s + 1)] Σ_{i=0}^{s−r−1} C(s − r − 1, i)(−1)ⁱ [1 + (m − s + i + 1)t]^{-r}/(m − s + i + 1) ,

(5.15)

for all t ≥ 0, where B(a, b) = (a − 1)!(b − 1)!/(a + b − 1)! and C(a, b) is the binomial coefficient. Percentage points {R^δ(X₁, X(s))}, 0 < δ < 1, of the distribution given in (5.15) can be approximated by a Newton-Raphson iteration, yielding a 1 − 2γ prediction interval for X(s) of the form:

[X(r) + R^γ(X₁, X(s))S_r , X(r) + R^{1−γ}(X₁, X(s))S_r] .

(5.16)
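Formula (5.15) is easy to evaluate directly, and its percentage points can be located by bisection rather than Newton-Raphson. The following sketch (ours) does exactly that:

from math import comb, lgamma, exp

def surv_R(t, r, s, m):
    """P(R >= t) from (5.15), for the mu = 0 exponential case."""
    # 1/B(s-r, m-s+1) = (m-r)!/((s-r-1)!(m-s)!)
    logc = lgamma(m - r + 1) - lgamma(s - r) - lgamma(m - s + 1)
    tot = 0.0
    for i in range(s - r):
        tot += (comb(s - r - 1, i) * (-1.0) ** i
                * (1.0 + (m - s + i + 1) * t) ** (-r) / (m - s + i + 1))
    return exp(logc) * tot

def quantile_R(delta, r, s, m, lo=0.0, hi=1e6, tol=1e-10):
    """R^delta with P(R <= R^delta) = delta, found by bisection."""
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if 1.0 - surv_R(mid, r, s, m) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 90% interval (5.16): [X(r) + quantile_R(0.05, r, s, m)*S_r,
#                       X(r) + quantile_R(0.95, r, s, m)*S_r]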


Kaminsky and Nelson (1974) show that the percentage points of the distribution given in (5.15) can be approximated by scaled percentage points of an appropriate F distribution. This F-approximation follows from using a Satterthwaite type approximation to the numerator and denominator of R(X₁, X(s)). They show that this approximation can also be used when the prediction is based on a subset of X₁. An approximate 1 − 2γ prediction interval for X(s) based on X₁ is given by:

[X(r) + A₁F^γ(a, b)σ̂ , X(r) + A₁F^{1−γ}(a, b)σ̂] ,

(5.17)

where a = 2A₁²/A₂, Aᵢ = Σ_{j=r+1}^{s} (m − j + 1)^{-i}, i = 1, 2, b = 2r, F^δ(a, b) is the 100δ percentage point of an F-distribution with a and b degrees of freedom, and σ̂ is the BLUE of σ based on X₁. Likeš (1974) extended Lawless's (1971) result to the exponential distribution with unknown location parameter μ. However, the exact results are quite complicated and given implicitly. Instead, we recommend using simulation as outlined above. Lawless (1977) gives explicit formulas for prediction intervals for Y(s) in the two sample problem in the same setting. Abu-Salih et al. (1987) generalize prediction from the one parameter exponential to samples from a mixture of exponential pdf's of the form f(x|σ₁, σ₂, β) = β(1/σ₁) exp(−x/σ₁) + (1 − β)(1/σ₂) exp(−x/σ₂). The mixing proportion β is assumed known. They derive an exact prediction interval for X(s) when σ₁/σ₂ is known. Their formulas are fairly complicated. Their intervals can also be obtained through simulation.
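A sketch of the F-approximation (5.17) follows, with SciPy supplying the F percentage points (ours; for illustration any reasonable equivariant estimate of σ may stand in for the BLUE):

from scipy.stats import f as fdist

def f_approx_interval(x, r, s, m, sigma_hat, gamma=0.05):
    """Approximate 1 - 2*gamma interval (5.17) for X_(s); x holds the
    first r order statistics and sigma_hat an estimate of sigma."""
    A1 = sum(1.0 / (m - j + 1) for j in range(r + 1, s + 1))
    A2 = sum(1.0 / (m - j + 1) ** 2 for j in range(r + 1, s + 1))
    a, b = 2.0 * A1 ** 2 / A2, 2.0 * r
    lo = x[r - 1] + A1 * fdist.ppf(gamma, a, b) * sigma_hat
    hi = x[r - 1] + A1 * fdist.ppf(1.0 - gamma, a, b) * sigma_hat
    return lo, hi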

5.4. Optimality

Takada (1985, 1993) obtains some optimality results in the one sample problem for the one sided version of the prediction interval given in (5.16), based on samples from an exponential distribution with known location parameter. Consider the class C(1 − γ) of lower 1 − γ prediction intervals for X(s) of the type given in (5.9) which are invariant in the sense that for any positive constant c, L_X(cX₁) = cL_X(X₁). Takada (1985) showed that the conditional mean length E(X(s) − L_X(X₁) | X(s) ≥ L_X(X₁)) is minimized over C(1 − γ), for all σ, by the lower bound of Lawless's (1971) interval. In the same setting, Takada (1993) also shows that δ(X₁) = X(r) + R^γ(X₁, X(s))S_r, the lower bound of Lawless's (1971) interval as given in (5.16), minimizes P(X(s) > δ(X₁) + a) for all positive constants a, for all values of the scale parameter σ, among all invariant lower 1 − γ prediction limits δ(X₁). Takada (1993) calls such optimal intervals uniformly most accurate equivariant. This type of optimality minimizes the probability that the value being predicted is more than any specified distance above the lower endpoint of the prediction interval.

5.5. Adaptive and distribution free intervals

Suppose that the exact form of the standardized density given in (3.1) is not known but we are willing to assert that it lies in some specified finite collection G of pdf's. In such a situation, Ogunyemi and Nelson (1996) proposed a two stage procedure. In the first stage X₁, the available data, is used to select a density g from G, and the same data are then used in the second stage to construct prediction intervals from (5.12) and (5.13) via simulation. This procedure can be used for both the one and two sample problems. Fully nonparametric (for any continuous pdf) prediction intervals in the two sample problem for individual components Y(s) were given by Fligner and Wolfe (1976). They showed, using the probability integral transform and basic properties of uniform order statistics, that for 1 ≤ i < j ≤ m,

P(X(i) < Y(s) < X(j)) = Σ_{k=i}^{j−1} C(k + s − 1, k) C(m − k + n − s, m − k) / C(m + n, m) ≡ λ .

(5.18)

Thus, (X(i), X(j)) provides a 100λ% prediction interval for Y(s). For samples from a discrete distribution the right hand side of (5.18) provides a lower bound on the coverage probability. Fligner and Wolfe (1979), for m and n large and m/(m + n) not close to 1, approximate the coverage probability λ in (5.18) in terms of the standard normal CDF Φ by Φ(A_j) − Φ(A_i), where

A_k = [(m(n + 2))^{0.5}(n + 1)/(s(n − s + 1)(n + m + 1))^{0.5}]((k − 0.5)/m − s/(n + 1)) ,  k = i, j .

These nonparametric intervals are easy to use and perform well in predicting middle sample quantiles. However, if m and n are very different, the possible levels of coverage are very limited. For example, if s is close to n and m/n is small, the coverage rate in (5.18) will be close to zero. See Patel (1989) for a further discussion of nonparametric prediction intervals.
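The coverage λ in (5.18) is a short computation. The sketch below (ours) also recovers the elementary special case used in Section 2: for a single future observation (n = s = 1), the interval (X(1), X(m)) has coverage (m − 1)/(m + 1).

from math import comb

def fw_coverage(i, j, s, m, n):
    """lambda = P(X_(i) < Y_(s) < X_(j)) from (5.18), distribution free."""
    return sum(comb(k + s - 1, k) * comb(m - k + n - s, m - k)
               for k in range(i, j)) / comb(m + n, m)

# e.g. fw_coverage(1, 19, 1, 19, 1) returns 0.9 = (m-1)/(m+1) for m = 19.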

5.6. Bayesian prediction

Specification of a prior on the unknown parameter θ allows the derivation of a predictive distribution on the quantity being predicted, as given in (3.7). The frequentist interpretation of prediction regions given in Section 2 does not apply to probabilities obtained from this distribution, and inferences drawn from it can be highly dependent on the prior chosen. The use of what are called noninformative priors provides some level of objectivity. See Geisser (1993) for a full treatment of the application of the Bayesian paradigm to prediction. Dunsmore (1974) applied the predictive distribution to life testing and proposed constructing a 1 − 2γ highest posterior density region for T of the form:

A = {t : f(t|x₁) ≥ b} ,

(5.19)

where the constant b is chosen by the requirement that ∫_A f(t|x₁) dt = 1 − 2γ. Dunsmore derives these regions for samples from the one and two parameter exponential distribution with a variety of priors. From the Bayesian perspective the region given in (5.19) would be interpreted as a 1 − 2γ prediction region for T. In some cases, with noninformative priors, Dunsmore's regions turn out to be identical to frequentist prediction regions.


Calabria and Pulcini (1994) consider the two sample problem where the data are generated by an inverse Weibull distribution. Specifically, 1/X has a Weibull distribution with parameters α and β. They derive the predictive distribution of Y(s) for a noninformative prior of the form π(α, β) = c/(αβ), α > 0, β > 0. They also use a log inverse gamma prior on P(X > t), for some specified t. This is converted into a conditional prior on α given β, and finally into a joint prior by placing a noninformative prior on β. Lingappaiah (1983) placed a normal prior on μ and an independent noninformative prior on σ² when observations are from a normal distribution with mean μ and variance σ². He derived the predictive distribution of Y(s). Lingappaiah (1978) obtained predictive distributions for future order statistics based on several random samples from a gamma family with pdf f(x|α, σ) = x^{α−1} exp(−x/σ)/(Γ(α)σ^α), x > 0, with shape parameter α being a known integer, by placing a gamma prior on 1/σ ≡ θ. There is an initial sample from which all the data are used to construct a posterior for θ. He also supposes that p additional samples resulting in selected order statistics {X(i,kᵢ)} are available, where X(i,kᵢ) is the kᵢ-th order statistic from sample i, 1 ≤ i ≤ p. The posterior for θ is successively updated by using it as a prior with the next sample. Suppose now it is desired to predict Y(s), the s-th order statistic obtained from a future random sample from the same underlying gamma distribution. Lingappaiah (1978) obtains the predictive distribution of Y(s) by using the posterior of θ as a prior in the expression given in (3.7). The formulas are quite complicated unless all the order statistics are minima. Lingappaiah (1978) shows that the variance of the predictive distribution decreases as p increases and more data becomes available. Lingappaiah (1986) follows a similar program for samples from the power family of pdf's of the form:

f(x|θ) = θx^{θ−1} ,  0 < x < 1 ,  θ > 0 ,

(5.20)

with an exponential prior on θ. He also allows censoring of both the smallest and largest order statistics and obtains predictive distributions for ratios of the form Y(n−s₂+1)/Y(s₁+1). Predictive distributions are also discussed below in the context of samples that allow for outliers and a shift in models.

5.7. Multiple future samples

Let X, Y₁, Y₂, ..., Y_p be vectors of order statistics obtained from independent random samples of sizes m, n₁, n₂, ..., n_p from a continuous pdf. Use Y(i,j) to denote the j-th component of Yᵢ, j = 1, 2, ..., nᵢ, i = 1, 2, ..., p. Given indices {qᵢ, qᵢ ≤ nᵢ}, based on X it is desired to construct lower prediction intervals for {Y(i,nᵢ−qᵢ+1)} of the form {Iᵢ = [Lᵢ(X), ∞), i = 1, 2, ..., p} so that the probability that all p intervals are simultaneously correct is 1 − γ. Note that if the i-th interval actually contains Y(i,nᵢ−qᵢ+1), then at least qᵢ of the components of Yᵢ lie in Iᵢ. In this setting, Chou and Owen (1986a) give a complex distribution free expression for the probability of simultaneous coverage of intervals of the form {[X(q), ∞), i = 1, 2, ..., p} in terms of the multivariate hypergeometric distribution. A trial and error process must be used to attain a value close to the desired 1 − γ, if this is possible. If qᵢ > nᵢ for some i, this procedure cannot be used. Chou (1988) assumes that the observations are selected from a one parameter exponential distribution with unknown scale parameter σ. The functions {Y(i,nᵢ−qᵢ+1)/X̄} are pivotals, where X̄ is the mean of X. Thus, the joint coverage probability P(t) of intervals of the form {Iᵢ(t) = [tX̄, ∞)} does not depend on σ. Chou (1988) gives an explicit expression for the joint coverage of the intervals {Iᵢ(t)} as:

P(t) = Σ_{j₁=q₁}^{n₁} Σ_{j₂=q₂}^{n₂} ··· Σ_{j_p=q_p}^{n_p} [Π_{l=1}^{p} C(n_l, j_l)] Σ_{i=0}^{K−J} C(K − J, i)(−1)ⁱ[1 + (J + i)t/m]^{-m} ,

(5.21)

where K = Σᵢ₌₁ᵖ nᵢ and J = Σᵢ₌₁ᵖ jᵢ. P(t) is a decreasing function of t with P(0) = 1. Therefore, there is a unique solution, call it t₀, to the equation P(t) = 1 − γ, allowing iterative construction of simultaneous 1 − γ intervals. Chou (1988) shows that P(t) ≥ (1 + Kt/m)^{-m}. This implies that t₀ ≥ (m/K)((1 − γ)^{−1/m} − 1). Chou and Owen (1986b) obtained similar results for samples from a normal distribution with unknown mean μ and standard deviation σ. Let σ̂ denote the unbiased estimate of the standard deviation computed from X. Here, {(X̄ − Y(i,nᵢ−qᵢ+1))/σ̂} are pivotals, leading to intervals of the form {[X̄ − tσ̂, ∞)}. Chou and Owen (1986b) table values of t for p = 2; 1 − γ = 0.90, 0.95; n₁ = n₂ = n = 20(10)80; q₁ = q₂ = 2 and q₁ = q₂ = 3; and m = 2(1)15, 20, 25, 30, 40, 50, ∞. Values for t under different settings and for different location-scale families can be found by simulation.
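Evaluating P(t) in (5.21) and solving P(t₀) = 1 − γ is mechanical; the sketch below (ours) does so by bisection, which is safe because P is decreasing with P(0) = 1. Note the sums grow exponentially in p, so this is only practical for small numbers of future samples.

from math import comb
from itertools import product

def chou_P(t, m, n, q):
    """Joint coverage P(t) of the intervals [t*xbar, inf) in (5.21);
    n and q are lists holding the p sample sizes and guaranteed counts."""
    K = sum(n)
    tot = 0.0
    for js in product(*[range(qi, ni + 1) for qi, ni in zip(q, n)]):
        J = sum(js)
        coef = 1.0
        for ji, ni in zip(js, n):
            coef *= comb(ni, ji)
        inner = sum(comb(K - J, i) * (-1.0) ** i
                    * (1.0 + (J + i) * t / m) ** (-m)
                    for i in range(K - J + 1))
        tot += coef * inner
    return tot

def chou_t0(m, n, q, level=0.95, tol=1e-10):
    """Solve P(t0) = level by bisection."""
    lo, hi = 0.0, 1.0
    while chou_P(hi, m, n, q) > level:   # expand until P(hi) <= level
        hi *= 2.0
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chou_P(mid, m, n, q) > level else (lo, mid)
    return 0.5 * (lo + hi)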

5.8. Outliers and model shifts

Although robustness is potentially an important issue for prediction of order statistics, relatively little work has been done on this problem. Investigating situations where the underlying data are not the realizations of identically distributed random variables is a good place to start. Observations may not have the same distribution because of a systematic shift in the model or the occurrence of outliers. Lingappaiah (1991a) introduced a particularly tractable version of a shift in model scenario. Let O = {Xᵢ, i = 1, 2, ..., N} be independent random variables with one parameter exponential distributions. Partition the components of O into k successive blocks of random variables, where the i-th block consists of nᵢ elements, each having mean iσ, σ unknown, Σᵢ₌₁ᵏ nᵢ = N. Let Y be independent of O and consist of the order statistics obtained from a random sample of size n from a one parameter exponential distribution with mean kσ. The goal is to construct a prediction interval for Y(s). Lingappaiah (1991a) places an exponential prior with mean 1 on 1/σ and obtains the predictive distribution H(u, s) = P(Y(s) > u | O) in terms of the quantity S = 1 + Σᵢ₌₁ᵏ iSᵢ, where Sᵢ is the sum of the observations in the i-th block, i = 1, 2, ..., k. For example, H(u, 1) = [S/(S + nku)]^{N+1}, so that a


1 − γ lower predictive interval for Y(1) is given by [u₀, ∞), where u₀ is the solution to H(u₀, 1) = 1 − γ; here u₀ has a simple closed form, as in the sketch below. The cases where s > 1 are more complicated but can be handled similarly. Lingappaiah (1985) allowed a more general shift model and used the posterior from previous stages to serve as the prior for the next stage.
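Inverting H(u₀, 1) = 1 − γ is a one-line computation (our sketch; S, n, k and N are as defined above):

def lingappaiah_lower_limit(S, n, k, N, gamma):
    """Solve [S/(S + n*k*u0)]**(N+1) = 1 - gamma for u0, the 1 - gamma
    lower predictive limit [u0, inf) for Y_(1)."""
    return S * ((1.0 - gamma) ** (-1.0 / (N + 1)) - 1.0) / (n * k)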

The presence of outliers can cause serious distortions for both estimation and prediction. Dixit (1994) considered the possibility that the available data contain some outliers. Specifically, let X = (X(1), X(2), ..., X(m))' and Y = (Y(1), Y(2), ..., Y(n))' be collections of order statistics constructed by sorting independent random variables {Xᵢ} and {Yⱼ} constructed as follows. Both {Xᵢ} and {Yⱼ} have distributions in the Weibull family with pdf's of the form:

f(x|θ, δ, β) = βθδx^{β−1} exp[−δθx^β] ,  x > 0 .

(5.22)

The {Xᵢ} are distributed independently with δ = 1. The {Yⱼ} are independent, with k components having pdf with δ ≠ 1. These k random variables represent outliers. The remaining n − k components have δ = 1. It is not known which components are outliers, but the values of δ and k are assumed known. Dixit (1994) places a gamma prior on θ of the form:

π(θ|a, h) = h(θh)^{a−1} exp(−θh)/Γ(a) ,  θ > 0 ,

(5.23)

where a and h are specified hyperparameters. The goal here is to construct a predictive interval for the future order statistic Y(s), which may or may not be an outlier. Since T = Σᵢ₌₁ʳ X(i)^β + (m − r)X(r)^β, the total time on test of the β-th power of the available data, is sufficient for θ, the predictive interval may conveniently be based on T. Dixit (1994) obtains an expression for H(u, s) ≡ P(Y(s) > u | T), the predictive probability that Y(s) exceeds u. Things simplify when predicting Y(1). An upper 1 − γ predictive interval for Y(1) is given by (0, b), where b = [(h + T)/(δk + n − k)][γ^{−1/(a+r)} − 1]. Dixit (1994) also extends these results to situations where several samples are available. In related work, Lingappaiah (1989a) allowed a single outlier with mean (1/σ) + δ in samples from a one parameter exponential distribution with mean σ. Using a gamma prior on 1/σ he obtained the predictive distribution for X(r) in terms of confluent hypergeometric functions. Also see Lingappaiah (1989c) for predictive distributions of maxima and minima in the one and multiple sample problems based on data from an exponential distribution with outliers. Lingappaiah (1989b) considered samples from a generalized logistic distribution with pdf of the form:

f(x|b, δ) = ce^{-x}/(1 + e^{-x})^{c+1} ,

(5.24)

where c = bδ or b + δ. In both cases c ≠ b corresponds to an outlier, with δ a known value. A gamma prior is placed on b, and the predictive distribution for Y(s) based on an independent random sample from (5.24) with c = b is obtained. Lingappaiah (1991b) derived the distribution of X(s) − X(r) in samples from a gamma distribution with shape parameter b, both shape and scale parameters assumed known, in the presence of a single outlier with shape parameter shifted a known amount, b + δ. This distribution can be used to construct a prediction interval for X(s) based on X(r).

6. Concluding remarks

Much has been learned about the prediction of order statistics in the past 25 years. Linear point predictors and prediction intervals based on samples from location-scale families have been extensively studied. Relatively little is known about prediction based on samples from more general families of distributions, nonlinear prediction and optimality. Future research in these and other areas will undoubtedly expand our knowledge of this interesting problem.

References

Abu-Salih, M. S., M. S. Ali Khan and K. Husein (1987). Prediction intervals of order statistics for the mixture of two exponential distributions. Aligarh J. Statist. 7, 11-22.
Adatia, A. and L. K. Chan (1982). Robust procedures for estimating the scale parameter and predicting future order statistics of the Weibull distribution. IEEE Trans. Reliability R-31(5), 491-499.
Balasooriya, U. (1987). A comparison of the prediction of future order statistics for the 2-parameter gamma distribution. IEEE Trans. Reliability R-36(5), 591-594.
Balasooriya, U. (1989). Detection of outliers in the exponential distribution based on prediction. Commun. Statist. - Theory Meth., 711-720.
Balasooriya, U. and K. L. Chan (1983). The prediction of future order statistics in the two-parameter Weibull distributions - a robust study. Sankhyā B 45(3), 320-329.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. Almqvist and Wiksell, Uppsala, Sweden; Wiley, New York.
Calabria, R. and G. Pulcini (1994). Bayes 2-sample prediction for the inverse Weibull distribution. Commun. Statist. - Theory Meth. 23(6), 1811-1824.
Chou, Y.-M. (1988). One-sided simultaneous prediction intervals for the order statistics of ℓ future samples from an exponential distribution. Commun. Statist. - Theory Meth. 17(11), 3995-4003.
Chou, Y.-M. and D. B. Owen (1986a). One-sided distribution free and simultaneous prediction limits for p future samples. J. Qual. Tech. 18, 96-98.
Chou, Y.-M. and D. B. Owen (1986b). One-sided simultaneous lower prediction intervals for ℓ future samples from a normal distribution. Technometrics 28(3), 247-251.
Dixit, U. J. (1994). Bayesian approach to prediction in the presence of outliers for a Weibull distribution. Metrika 41, 127-136.
Dunsmore, I. R. (1974). The Bayesian predictive distribution in life testing models. Technometrics 16(3), 455-460.
Fligner, M. A. and D. A. Wolfe (1976). Some applications of sample analogues to the probability integral transformation and a coverage probability. Amer. Statist. 30, 78-85.
Fligner, M. A. and D. A. Wolfe (1979). Methods for obtaining a distribution-free prediction interval for the median of a future sample. J. Qual. Tech. 11, 192-198.
Geisser, S. (1975). The predictive sample reuse method with application. JASA 70, 320-328.
Geisser, S. (1993). Predictive Inference: An Introduction. Chapman and Hall, New York.
Goldberger, A. S. (1962). Best linear unbiased prediction in the generalized regression model. JASA 57, 369-375.
Kaminsky, K. S. and P. I. Nelson (1974). Prediction intervals for the exponential distribution using subsets of the data. Technometrics 16(1), 57-59.


Kaminsky, K. S. and P. I. Nelson (1975a). Best linear unbiased prediction of order statistics in location and scale families. JASA 70(349), 145-150.
Kaminsky, K. S. and P. I. Nelson (1975b). Characterization of distributions by the form of the predictors of order statistics. In: G. P. Patil et al., eds., Statistical Distributions in Scientific Work 3, 113-115.
Kaminsky, K. S., N. R. Mann and P. I. Nelson (1975). Best and simplified prediction of order statistics in location and scale families. Biometrika 62(2), 525-527.
Kaminsky, K. S. and L. S. Rhodin (1978). The prediction information in the latest failure. JASA 73, 863-866.
Kaminsky, K. S. and L. S. Rhodin (1985). Maximum likelihood prediction. Ann. Inst. Statist. Math. 37, 507-517.
Lawless, J. F. (1971). A prediction problem concerning samples from the exponential distribution with application in life testing. Technometrics 13(4), 725-729.
Lawless, J. F. (1977). Prediction intervals for the two parameter exponential distribution. Technometrics 19(4), 469-472.
Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. Wiley, New York.
Likeš, J. (1974). Prediction of s-th ordered observation for the two-parameter exponential distribution. Technometrics 16(2), 241-244.
Lingappaiah, G. S. (1978). Bayesian approach to the prediction problem in gamma population. Demonstratio Mathematica 11(4), 907-920.
Lingappaiah, G. S. (1983). Prediction in samples from a normal population. J. Statist. Res. 17(1-2), 43-50.
Lingappaiah, G. S. (1985). A study of shifting models in life tests via Bayesian approach using semi-or-used priors (Soups). Ann. Inst. Statist. Math. 37(A), 151-163.
Lingappaiah, G. S. (1986). Bayesian approach to prediction in censored samples from the power function population. J. Bihar Math. Soc. 10, 60-70.
Lingappaiah, G. S. (1989a). Prediction in life tests based on an exponential distribution when outliers are present. Statistica 49(4), 585-593.
Lingappaiah, G. S. (1989b). Prediction in samples from a generalized logistic population of first or second kind when an outlier is present. Rev. Mat. Estat., São Paulo 7, 87-95.
Lingappaiah, G. S. (1989c). Bayes prediction of maxima and minima in exponential life tests in the presence of outliers. J. Indust. Math. Soc. 39(2), 169-182.
Lingappaiah, G. S. (1991a). Prediction in exponential life tests where average lives are successively increasing. Pak. J. Statist. 7(1), 33-39.
Lingappaiah, G. S. (1991b). Prediction in samples from a gamma population in the presence of an outlier. Bull. Malaysian Math. Soc. (Second Series) 14, 1-14.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Malik, H. J. (1966). Exact moments of order statistics from the Pareto distribution. Skandinavisk Aktuarietidskrift 49(3-4), 144-157.
Malik, H. J. (1967). Exact moments of order statistics from a power-function distribution. Skandinavisk Aktuarietidskrift 50(3-4), 64-69.
Mann, N. R. (1969). Optimum estimators for linear functions of location and scale parameters. Ann. Math. Statist. 40, 2149-2155.
Nagaraja, H. N. (1984). Asymptotic linear prediction of extreme order statistics. Ann. Inst. Statist. Math. 36, 289-299.
Nagaraja, H. N. (1986). Comparison of estimators and predictors from two-parameter exponential distribution. Sankhyā Ser. B 48(1), 10-18.
Ogunyemi, O. T. and P. I. Nelson (1996). Adaptive and exact prediction intervals for order statistics. Commun. Statist. B, 1057-1074.
Patel, J. K. (1989). Prediction intervals - a review. Commun. Statist. - Theory Meth. 18(7), 2393-2465.
Raqab, M. Z. and H. N. Nagaraja (1992). On some predictors of future order statistics. Tech. Report No. 488, Ohio State Univ.


Stone, M. (1974). Cross-validatory choice and assessment of statistical prediction (with discussion). J. Roy. Statist. Soc. B 36, 111-147.
Takada, Y. (1985). Prediction limit for observation from exponential distribution. Canad. J. Statist. 13(4), 325-330.
Takada, Y. (1991). Median unbiasedness in an invariant prediction problem. Statist. Probab. Lett. 12, 281-283.
Takada, Y. (1993). Uniformly most accurate equivariant prediction limit. Metrika 40, 51-61.
van Zwet, W. R. (1964). Convex Transformations of Random Variables. Mathematisch Centrum, Amsterdam.
Watson, G. S. (1972). Prediction and the efficiency of least squares. Biometrika 59, 91-98.