Confidence regions for the ratio of percentiles


Statistics & Probability Letters 76 (2006) 384–392 www.elsevier.com/locate/stapro

Li-Fei Huang (a,*), Richard A. Johnson (b)

(a) Applied Statistics and Information Science Department, Ming Chuan University, Taipei, Taiwan
(b) Department of Statistics, University of Wisconsin, Madison, WI 53706, USA

Received 28 November 2004; received in revised form 18 July 2005. Available online 15 September 2005.

Abstract

Motivated by applications in the lumber industry, we derive confidence regions for the ratio of percentiles from two different populations. Generalizing work on inferences concerning the ratio of two means, we develop an exact confidence procedure when the two populations are normal and have the same variance. Other cases for normal populations are treated by large sample methods. General populations are also treated in a large sample context. The different large sample procedures are compared with a small simulation study. An example, using strength of lumber data, is also given. © 2005 Elsevier B.V. All rights reserved.

Keywords: Non-central t distribution; Non-parametric estimation; Strength of lumber; Large sample comparisons

1. Introduction

In the wood industry, it is common practice to compare two different strength properties for lumber of the same dimension, grade and species, or the same strength property for lumber of two different dimensions, grades or species. Engineers often express a comparison in terms of the ratio of two strength properties, for example, the ratio of mean bending strengths. Because United States lumber standards are given in terms of population fifth percentiles, the ratio is often expressed in terms of the fifth percentiles of two strength distributions rather than the means. Aplin et al. (1986) give point estimates of the ratio of dry to green lumber. We develop confidence regions for the ratios of percentiles from two different populations. Both normal population and non-parametric procedures are derived.

Most of the existing literature on ratios deals with the ratio of means and, in particular, ratios of means of normal distributions. Fieller (1954) and Ogawa (1983) treat bivariate normal distributions. McDonald (1981) obtains confidence regions for the ratio of means of two independent normal distributions arising in a straight-line linear model. Hwang (1995) uses a resampling approach to construct confidence regions for this same ratio. Evans et al. (2005) show how to obtain large sample confidence intervals for the ratio of Weibull percentiles.

Although it is common to compare the same percentiles of two different distributions $F_1(\cdot)$ and $F_2(\cdot)$, initially we allow for different percentiles. Let $p_1$ and $p_2$ be specified, so

$$\xi_{1p_1} = \inf\{x : F_1(x) \ge p_1\} \quad\text{and}\quad \xi_{2p_2} = \inf\{y : F_2(y) \ge p_2\}$$

are the first population's $100p_1$th percentile and the second population's $100p_2$th percentile, respectively. The corresponding population ratio of percentiles is $\theta = \xi_{1p_1}/\xi_{2p_2}$.

* Corresponding author. E-mail address: [email protected] (L.-F. Huang).
0167-7152/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2005.08.034

2. Confidence regions for the ratio of percentiles: independent normal distributions

We obtain exact confidence regions for the ratio of percentiles for two different normal distributions when the ratio of variances is known. To set notation, let $X_i$, $i = 1, \ldots, m$, be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_j$, $j = 1, \ldots, n$, be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the samples be independent. Here, $\xi_{ip_i} = \mu_i + \Phi^{-1}(p_i)\sigma_i$ is the lower $100p_i$th percentile, $i = 1, 2$, where $\Phi(\cdot)$ is the standard normal cdf. We also let $z_{p_i}$ be the upper $100p_i$th percentile of the standard normal distribution, so $\xi_{ip_i} = \mu_i - z_{p_i}\sigma_i$ for $i = 1, 2$. Our first result establishes that a certain random quantity has a non-central $t$ distribution.

Theorem 2.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the random samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$. If the ratio $k = \sigma_1/\sigma_2$ is known, the random quantity

$$T(\theta,k)=\left(\sqrt{\frac{1}{m}+\frac{\theta^{2}}{nk^{2}}}\,\right)^{-1}\frac{(\bar X-\theta\bar Y)/k}{\sqrt{\dfrac{\frac{1}{k^{2}}\sum_{i=1}^{m}(X_i-\bar X)^{2}+\sum_{j=1}^{n}(Y_j-\bar Y)^{2}}{m+n-2}}}$$

follows the non-central $t$ distribution with $m+n-2$ degrees of freedom and non-centrality parameter

$$\delta(\theta,k)=\left(\sqrt{\frac{1}{m}+\frac{\theta^{2}}{nk^{2}}}\,\right)^{-1}\left(z_{p_1}-\frac{\theta}{k}\,z_{p_2}\right).$$

Remark. If the two variances are the same, that is, $k = 1$, then

$$T(\theta)=\left(\sqrt{\frac{1}{m}+\frac{\theta^{2}}{n}}\,\right)^{-1}\frac{\bar X-\theta\bar Y}{\sqrt{\dfrac{\sum_{i=1}^{m}(X_i-\bar X)^{2}+\sum_{j=1}^{n}(Y_j-\bar Y)^{2}}{m+n-2}}}$$

follows the non-central $t$ distribution with $m+n-2$ degrees of freedom and non-centrality parameter

$$\delta(\theta)=\left(\sqrt{\frac{1}{m}+\frac{\theta^{2}}{n}}\,\right)^{-1}\left(z_{p_1}-\theta z_{p_2}\right).$$

Proof. Here $k = \sigma_1/\sigma_2$ and $\theta = (\mu_1 - z_{p_1}\sigma_1)/(\mu_2 - z_{p_2}\sigma_2)$. Under the assumptions, the random quantity $(\bar X - \theta\bar Y)/(k\sigma_2)$ is distributed as $N(0,\, 1/m + \theta^2/(nk^2))$ plus the constant $\mu_1/\sigma_1 - \mu_2\theta/\sigma_1 = z_{p_1} - (\theta/k)z_{p_2}$, and

$$\frac{\sum_{i=1}^{m}(X_i-\bar X)^{2}}{\sigma_1^{2}}+\frac{\sum_{j=1}^{n}(Y_j-\bar Y)^{2}}{\sigma_2^{2}}=\frac{\sum_{i=1}^{m}(X_i-\bar X)^{2}}{k^{2}\sigma_2^{2}}+\frac{\sum_{j=1}^{n}(Y_j-\bar Y)^{2}}{\sigma_2^{2}}$$

is independently distributed as $\chi^2_{m+n-2}(0)$. The result follows by the definition of the non-central $t$ as a ratio of these random quantities (see Johnson et al., 1995, pp. 508–518).

An exact confidence region for $\theta$ is generated by considering the test of $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$ for each possible $\theta_0$ and then collecting all $\theta_0$ for which the null hypothesis is not rejected. We use an equal tail test. For the given $k$ and any specified significance level $\alpha$, we determine $L(\theta_0, k)$ and $U(\theta_0, k)$ such that

$$\frac{\alpha}{2} = P[T(\theta_0,k) < L(\theta_0,k)] = P[T(\theta_0,k) > U(\theta_0,k)]$$

Table 1. Some 95% confidence regions for the ratio of percentiles of two independent normal populations with equal variances ($s_{\text{pooled}} = 2$)

| Percentiles | $m = n$ | $\bar x = 5$, $\bar y = 10$ | $\bar x = 10$, $\bar y = 5$ |
|---|---|---|---|
| Means ($p_1 = p_2 = 0.5$) | 10 | (0.359, 0.659) | (1.518, 2.786) |
| | 20 | (0.403, 0.605) | (1.651, 2.485) |
| 5th percentiles ($p_1 = p_2 = 0.05$) | 10 | (−0.087, 0.458) | (−∞, −11.368) ∪ (2.183, ∞) |
| | 20 | (0.055, 0.398) | (2.510, 18.222) |

and then reject the null hypothesis if $T(\theta_0,k) < L(\theta_0,k)$ or $T(\theta_0,k) > U(\theta_0,k)$. A $100(1-\alpha)\%$ confidence region is the collection of all $\theta_0$ such that $L(\theta_0,k) \le T(\theta_0,k) \le U(\theta_0,k)$. □

Table 1 illustrates possible confidence regions for the ratio of percentiles from two normal distributions having equal variances. The region can be a bounded interval, the complement of an interval, or the whole real line. The complement of an interval occurs when the estimated percentile in the denominator is too close to zero, and the whole real line occurs when both estimated percentiles are too close to zero. Table 1 also shows that the regions for $\xi_{2p_2}/\xi_{1p_1}$ can be obtained from those for the ratio $\xi_{1p_1}/\xi_{2p_2}$.

3. Large-sample confidence regions

When large samples are available, we can obtain confidence intervals for the ratio of percentiles even when the ratio of variances is unknown. The first result still requires normal populations.

Theorem 3.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the samples be independent. If $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$ and $\xi_{2p_2} \ne 0$, then

$$\sqrt{m+n}\left(\frac{\bar X - z_{p_1}S_1}{\bar Y - z_{p_2}S_2}-\frac{\xi_{1p_1}}{\xi_{2p_2}}\right)$$

converges in distribution to the normal with mean 0 and variance

$$\left(1+\frac{z_{p_1}^{2}}{2}\right)\frac{\sigma_1^{2}}{\lambda\,\xi_{2p_2}^{2}}+\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{\sigma_2^{2}\,\xi_{1p_1}^{2}}{(1-\lambda)\,\xi_{2p_2}^{4}}.$$

Proof. Since $S_1 = \sqrt{\sum_{i=1}^{m}(X_i-\bar X)^2/(m-1)}$ and $S_2 = \sqrt{\sum_{j=1}^{n}(Y_j-\bar Y)^2/(n-1)}$, from the central limit theorem and a square root transformation on the sample variances, we have the well-known result

$$\sqrt{m+n}\left(\begin{bmatrix}\bar X\\ S_1\\ \bar Y\\ S_2\end{bmatrix}-\begin{bmatrix}\mu_1\\ \sigma_1\\ \mu_2\\ \sigma_2\end{bmatrix}\right)\xrightarrow{D} N_4(0,\Sigma), \qquad (1)$$

where $\Sigma = \operatorname{diag}[\sigma_1^2/\lambda,\ \sigma_1^2/(2\lambda),\ \sigma_2^2/(1-\lambda),\ \sigma_2^2/(2(1-\lambda))]$. To apply the delta method, we set $t = (t_1, t_2, t_3, t_4)'$ and

$$g(t)=\frac{t_1 - z_{p_1}t_2}{t_3 - z_{p_2}t_4}.$$
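For reference, the gradient needed in the delta-method step can be written out explicitly. This is a sketch filled in from the definition of $g$ rather than a display taken from the paper. It is evaluated at $t = (\mu_1, \sigma_1, \mu_2, \sigma_2)'$, where the numerator and denominator of $g$ equal $\xi_{1p_1}$ and $\xi_{2p_2}$, and $\Sigma$ denotes the diagonal covariance matrix of the normal limit (1).

```latex
\[
\frac{\partial g}{\partial t}
= \left(
\frac{1}{\xi_{2p_2}},\;
-\frac{z_{p_1}}{\xi_{2p_2}},\;
-\frac{\xi_{1p_1}}{\xi_{2p_2}^{2}},\;
\frac{z_{p_2}\,\xi_{1p_1}}{\xi_{2p_2}^{2}}
\right)'
\]
\[
\left(\frac{\partial g}{\partial t}\right)'\Sigma\left(\frac{\partial g}{\partial t}\right)
= \frac{\sigma_1^{2}}{\lambda\,\xi_{2p_2}^{2}}
+ \frac{z_{p_1}^{2}\,\sigma_1^{2}}{2\lambda\,\xi_{2p_2}^{2}}
+ \frac{\xi_{1p_1}^{2}\,\sigma_2^{2}}{(1-\lambda)\,\xi_{2p_2}^{4}}
+ \frac{z_{p_2}^{2}\,\xi_{1p_1}^{2}\,\sigma_2^{2}}{2(1-\lambda)\,\xi_{2p_2}^{4}}
\]
```

Collecting the first two terms and the last two terms gives the two factors $(1 + z_{p_1}^2/2)$ and $(1 + z_{p_2}^2/2)$ that appear in the variance of Theorem 3.1.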


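We conclude that $g((\bar X, S_1, \bar Y, S_2)') = (\bar X - z_{p_1}S_1)/(\bar Y - z_{p_2}S_2)$ is asymptotically normal with mean $\theta$ and variance $(m+n)^{-1}\Sigma_g$, where

$$\Sigma_g=\left(\frac{\partial g}{\partial t}\right)'\operatorname{diag}\!\left[\frac{\sigma_1^{2}}{\lambda},\ \frac{\sigma_1^{2}}{2\lambda},\ \frac{\sigma_2^{2}}{1-\lambda},\ \frac{\sigma_2^{2}}{2(1-\lambda)}\right]\left(\frac{\partial g}{\partial t}\right)=\left(1+\frac{z_{p_1}^{2}}{2}\right)\frac{\sigma_1^{2}}{\lambda\,\xi_{2p_2}^{2}}+\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{\sigma_2^{2}\,\xi_{1p_1}^{2}}{(1-\lambda)\,\xi_{2p_2}^{4}}.\qquad\square$$

Remark. If $\sigma_1 = \sigma_2 = \sigma$, the limiting distribution is

$$N\!\left(\frac{\xi_{1p_1}}{\xi_{2p_2}},\ \sigma^{2}\left[\left(1+\frac{z_{p_1}^{2}}{2}\right)\frac{1}{\lambda\,\xi_{2p_2}^{2}}+\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{\xi_{1p_1}^{2}}{(1-\lambda)\,\xi_{2p_2}^{4}}\right]\right).$$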

In any case, $\xi_{1p_1}$ can be estimated by $\bar x - z_{p_1}s_1$, $\xi_{2p_2}$ can be estimated by $\bar y - z_{p_2}s_2$, $\sigma_1^2$ can be estimated by $s_1^2$, and $\sigma_2^2$ can be estimated by $s_2^2$. Therefore, an approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is

$$\frac{\bar x - z_{p_1}s_1}{\bar y - z_{p_2}s_2}\ \pm\ z_{\alpha/2}\sqrt{\frac{2+z_{p_1}^{2}}{2m}\,\frac{s_1^{2}}{(\bar y - z_{p_2}s_2)^{2}}+\frac{2+z_{p_2}^{2}}{2n}\,\frac{s_2^{2}\,(\bar x - z_{p_1}s_1)^{2}}{(\bar y - z_{p_2}s_2)^{4}}}\,,$$

which is always a bounded interval. However, the simulations in Section 5 show poor coverage rates when $\xi_{2p_2}$ is near zero.

When sample sizes are large, a non-parametric approach is possible. Let $[w]$ denote the smallest integer that is greater than or equal to $w$. We estimate $\xi_{1p_1}$ by the order statistic $\hat\xi_{1p_1} = X_{(r_1)}$ where $r_1 = [mp_1]$ and estimate $\xi_{2p_2}$ by the order statistic $\hat\xi_{2p_2} = Y_{(r_2)}$ where $r_2 = [np_2]$.

Theorem 3.2. Let $X_1, \ldots, X_m$ be a random sample from $F_1(\cdot)$; let $Y_1, \ldots, Y_n$ be a random sample from $F_2(\cdot)$; and let the samples be independent. If the population density function $F_i'(\cdot)$ is positive and continuous in a neighborhood of $\xi_{ip_i}$ for $i = 1, 2$, $\lim_{n,m\to\infty} m/(n+m) = \lambda$ $(0 < \lambda < 1)$, and $\xi_{2p_2} \ne 0$, then

$$\sqrt{m+n}\left(\frac{\hat\xi_{1p_1}}{\hat\xi_{2p_2}}-\frac{\xi_{1p_1}}{\xi_{2p_2}}\right)$$

converges in distribution to the normal distribution with mean 0 and variance

$$\frac{p_1(1-p_1)}{\lambda\,\xi_{2p_2}^{2}\,[F_1'(\xi_{1p_1})]^{2}}+\frac{\xi_{1p_1}^{2}\,p_2(1-p_2)}{(1-\lambda)\,\xi_{2p_2}^{4}\,[F_2'(\xi_{2p_2})]^{2}}.$$

Proof. Since $[\hat\xi_{1p_1}, \hat\xi_{2p_2}]'$ is asymptotically normal with mean $[\xi_{1p_1}, \xi_{2p_2}]'$ and covariance matrix

$$\frac{1}{m+n}\operatorname{diag}\!\left[\frac{p_1(1-p_1)}{\lambda\,[F_1'(\xi_{1p_1})]^{2}},\ \frac{p_2(1-p_2)}{(1-\lambda)\,[F_2'(\xi_{2p_2})]^{2}}\right],$$

applying the delta method with $g((t_1, t_2)') = t_1/t_2$ gives the result. The resulting approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is

$$\frac{\hat\xi_{1p_1}}{\hat\xi_{2p_2}}\ \pm\ z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{m\,\hat\xi_{2p_2}^{2}\,[\hat F_1'(\hat\xi_{1p_1})]^{2}}+\frac{\hat\xi_{1p_1}^{2}\,p_2(1-p_2)}{n\,\hat\xi_{2p_2}^{4}\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}}\,,$$

where we estimate the value of the density function $F_i'(\xi_{ip_i})$ by the consistent kernel estimates $\hat F_i'(\hat\xi_{ip_i})$ for $i = 1, 2$, as suggested in Silverman (1986). □
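The two large-sample intervals of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the density estimate uses SciPy's Gaussian kernel with its default bandwidth, which need not match the kernel chosen in Johnson and Huang (2003).

```python
import math

import numpy as np
from scipy import stats


def normal_ratio_ci(m, n, xbar, ybar, s1, s2, p1=0.05, p2=0.05, alpha=0.05):
    """Delta-method interval for theta from summary statistics (normal case)."""
    z1 = stats.norm.ppf(1 - p1)  # upper 100*p1-th percentile of N(0, 1)
    z2 = stats.norm.ppf(1 - p2)
    num = xbar - z1 * s1  # plug-in estimate of xi_{1 p1}
    den = ybar - z2 * s2  # plug-in estimate of xi_{2 p2}
    var = ((2 + z1**2) / (2 * m) * s1**2 / den**2
           + (2 + z2**2) / (2 * n) * s2**2 * num**2 / den**4)
    half = stats.norm.ppf(1 - alpha / 2) * math.sqrt(var)
    return num / den - half, num / den + half


def nonparametric_ratio_ci(x, y, p1=0.05, p2=0.05, alpha=0.05):
    """Order-statistic interval; densities estimated by a Gaussian KDE (assumption)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    m, n = x.size, y.size
    # r = smallest integer >= m*p; rounding first guards against float noise
    r1 = math.ceil(round(m * p1, 10))
    r2 = math.ceil(round(n * p2, 10))
    q1, q2 = x[r1 - 1], y[r2 - 1]  # order statistics X_(r1), Y_(r2)
    f1 = stats.gaussian_kde(x)(q1)[0]  # density estimate at the sample percentile
    f2 = stats.gaussian_kde(y)(q2)[0]
    var = (p1 * (1 - p1) / (m * q2**2 * f1**2)
           + q1**2 * p2 * (1 - p2) / (n * q2**4 * f2**2))
    half = stats.norm.ppf(1 - alpha / 2) * math.sqrt(var)
    return q1 / q2 - half, q1 / q2 + half


# Summary statistics quoted in Section 5 for the lumber example.
lo, hi = normal_ratio_ci(107, 103, 4840.3, 7077.9,
                         math.sqrt(2354470), math.sqrt(2528864))
print(f"normal-theory interval: ({lo:.3f}, {hi:.3f})")
```

With the Section 5 summary statistics, the normal-theory function should land close to the $Z_{m,n}(\theta)$ entry of Table 3. The non-parametric function needs the raw samples, and its result depends on the kernel and bandwidth.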


4. An alternative large-sample approach

For normal populations, under our alternative approach, we consider $\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2)$ and derive large sample confidence regions that may be intervals, complements of intervals, or sometimes even the whole real line.

Theorem 4.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, and let the two random samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$ be unknown but finite. Under the condition that $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$, the random quantity

$$\sqrt{m+n}\,\bigl(\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2)\bigr)$$

converges in distribution to the normal with mean 0 and variance

$$\frac{1}{\lambda}\left(1+\frac{z_{p_1}^{2}}{2}\right)\sigma_1^{2}+\theta^{2}\,\frac{1}{1-\lambda}\left(1+\frac{z_{p_2}^{2}}{2}\right)\sigma_2^{2}.$$

Proof. From the normal limit in (1), we conclude that the linear combination $\sqrt{m+n}\,(\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2))$ converges in distribution to the normal with mean 0 and variance

$$\frac{1}{\lambda}\left(1+\frac{z_{p_1}^{2}}{2}\right)\sigma_1^{2}+\theta^{2}\,\frac{1}{1-\lambda}\left(1+\frac{z_{p_2}^{2}}{2}\right)\sigma_2^{2}.$$

Under the condition that $\theta = \xi_{1p_1}/\xi_{2p_2}$, according to Slutsky's theorem,

$$Z^{M}_{m,n}(\theta)=\frac{\bar X - z_{p_1}S_1-\theta(\bar Y - z_{p_2}S_2)}{\sqrt{\left(1+\dfrac{z_{p_1}^{2}}{2}\right)\dfrac{S_1^{2}}{m}+\theta^{2}\left(1+\dfrac{z_{p_2}^{2}}{2}\right)\dfrac{S_2^{2}}{n}}}\qquad (2)$$

is, asymptotically, standard normal. As a function of $\theta$, $Z^{M}_{m,n}(\theta)$ has a single maximum which is, in absolute value, greater than the asymptotes. In particular, $Z^{M}_{m,n}(\theta)$ satisfies

$$\lim_{\theta\to-\infty}Z^{M}_{m,n}(\theta)=\frac{\bar Y - z_{p_2}S_2}{\sqrt{\left(1+\dfrac{z_{p_2}^{2}}{2}\right)\dfrac{S_2^{2}}{n}}},\qquad \lim_{\theta\to\infty}Z^{M}_{m,n}(\theta)=-\frac{\bar Y - z_{p_2}S_2}{\sqrt{\left(1+\dfrac{z_{p_2}^{2}}{2}\right)\dfrac{S_2^{2}}{n}}}$$

and

$$\frac{d}{d\theta}Z^{M}_{m,n}(\theta)=-\frac{\left(1+\dfrac{z_{p_1}^{2}}{2}\right)\dfrac{S_1^{2}}{m}\,(\bar Y - z_{p_2}S_2)+\theta\left(1+\dfrac{z_{p_2}^{2}}{2}\right)\dfrac{S_2^{2}}{n}\,(\bar X - z_{p_1}S_1)}{\left[\left(1+\dfrac{z_{p_1}^{2}}{2}\right)\dfrac{S_1^{2}}{m}+\theta^{2}\left(1+\dfrac{z_{p_2}^{2}}{2}\right)\dfrac{S_2^{2}}{n}\right]^{3/2}}.$$

Consequently, the derivative vanishes at a unique value

$$\theta^{*}=-\left[\left(1+\frac{z_{p_1}^{2}}{2}\right)\frac{S_1^{2}}{m}\,(\bar Y - z_{p_2}S_2)\right]\Bigg/\left[\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{S_2^{2}}{n}\,(\bar X - z_{p_1}S_1)\right].$$


Further, the derivative converges to 0 as $\theta \to \pm\infty$ and has the same sign as $\bar X - z_{p_1}S_1$ for sufficiently negative $\theta$. Moreover, since the maximum occurs at $\theta^{*}$, with probability one, we have

$$|Z^{M}_{m,n}(\theta^{*})| > |\bar Y - z_{p_2}S_2|\Bigg/\sqrt{\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{S_2^{2}}{n}}.$$

Consequently, the large sample $100(1-\alpha)\%$ confidence region becomes

(a) an interval if
$$z_{\alpha/2} < |\bar Y - z_{p_2}S_2|\Bigg/\sqrt{\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{S_2^{2}}{n}} < |Z^{M}_{m,n}(\theta^{*})|;$$

(b) the complement of an interval if
$$|\bar Y - z_{p_2}S_2|\Bigg/\sqrt{\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{S_2^{2}}{n}} \le z_{\alpha/2} < |Z^{M}_{m,n}(\theta^{*})|;$$

(c) the whole real line if
$$|\bar Y - z_{p_2}S_2|\Bigg/\sqrt{\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{S_2^{2}}{n}} < |Z^{M}_{m,n}(\theta^{*})| \le z_{\alpha/2}.$$

We see that the region fails to be a bounded interval if the percentile in the denominator is not significantly different from zero, or,

$$|\bar Y - z_{p_2}S_2|\Bigg/\sqrt{\left(1+\frac{z_{p_2}^{2}}{2}\right)\frac{S_2^{2}}{n}} \le z_{\alpha/2}.\qquad\square$$

For our alternative non-parametric procedure, we consider $\hat\xi_{1p_1} - \theta\hat\xi_{2p_2}$.

Theorem 4.2. Let $X_1, \ldots, X_m$ be a random sample from $F_1$, which has a positive continuous derivative $F_1'(\cdot)$ in a neighborhood of $\xi_{1p_1}$, let $Y_1, \ldots, Y_n$ be a random sample from $F_2$, which has a positive continuous derivative $F_2'(\cdot)$ in a neighborhood of $\xi_{2p_2}$, and let the two samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$ be unknown but finite. Under the condition that $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$,

$$\sqrt{m+n}\,(\hat\xi_{1p_1}-\theta\hat\xi_{2p_2})$$

converges in distribution to the normal with mean 0 and variance

$$\frac{p_1(1-p_1)}{\lambda\,[F_1'(\xi_{1p_1})]^{2}}+\theta^{2}\,\frac{p_2(1-p_2)}{(1-\lambda)\,[F_2'(\xi_{2p_2})]^{2}}.$$

Proof. From the proof of Theorem 3.2, we conclude that $\sqrt{m+n}\,(\hat\xi_{1p_1}-\theta\hat\xi_{2p_2})$ converges in distribution to the normal with mean 0 and variance

$$\frac{p_1(1-p_1)}{\lambda\,[F_1'(\xi_{1p_1})]^{2}}+\theta^{2}\,\frac{p_2(1-p_2)}{(1-\lambda)\,[F_2'(\xi_{2p_2})]^{2}}.$$


Under the conditions of Theorem 4.2, we have that

$$\frac{\hat\xi_{1p_1}-\theta\hat\xi_{2p_2}}{\sqrt{\dfrac{p_1(1-p_1)}{m\,[F_1'(\xi_{1p_1})]^{2}}+\theta^{2}\,\dfrac{p_2(1-p_2)}{n\,[F_2'(\xi_{2p_2})]^{2}}}}$$

is, asymptotically, a standard normal random variable. Introducing the consistent kernel estimates, we consider

$$Z^{M,NP}_{m,n}(\theta)=\frac{\hat\xi_{1p_1}-\theta\hat\xi_{2p_2}}{\sqrt{\dfrac{p_1(1-p_1)}{m\,[\hat F_1'(\hat\xi_{1p_1})]^{2}}+\theta^{2}\,\dfrac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}}}\,,$$

which has the same structure as $Z^{M}_{m,n}(\theta)$ in (2). Moreover, since the unique maximum occurs at

$$\theta^{*}=-\left[\frac{p_1(1-p_1)}{m\,[\hat F_1'(\hat\xi_{1p_1})]^{2}}\,\hat\xi_{2p_2}\right]\Bigg/\left[\frac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}\,\hat\xi_{1p_1}\right],$$

with probability one, we have

$$|\hat\xi_{2p_2}|\Bigg/\sqrt{\frac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}} < |Z^{M,NP}_{m,n}(\theta^{*})|.$$

Also,

$$\lim_{\theta\to-\infty}Z^{M,NP}_{m,n}(\theta)=\frac{\hat\xi_{2p_2}}{\sqrt{\dfrac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}}},\qquad \lim_{\theta\to\infty}Z^{M,NP}_{m,n}(\theta)=-\frac{\hat\xi_{2p_2}}{\sqrt{\dfrac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}}}.$$

The $100(1-\alpha)\%$ confidence region becomes

(a) an interval if
$$z_{\alpha/2} < |\hat\xi_{2p_2}|\Bigg/\sqrt{\frac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}} < |Z^{M,NP}_{m,n}(\theta^{*})|;$$

(b) the complement of an interval if
$$|\hat\xi_{2p_2}|\Bigg/\sqrt{\frac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}} \le z_{\alpha/2} < |Z^{M,NP}_{m,n}(\theta^{*})|;$$

(c) the whole real line if
$$|\hat\xi_{2p_2}|\Bigg/\sqrt{\frac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}} < |Z^{M,NP}_{m,n}(\theta^{*})| \le z_{\alpha/2}.$$

The confidence region fails to be a bounded interval if the percentile in the denominator is not significantly different from 0, or

$$|\hat\xi_{2p_2}|\Bigg/\sqrt{\frac{p_2(1-p_2)}{n\,[\hat F_2'(\hat\xi_{2p_2})]^{2}}} \le z_{\alpha/2}.\qquad\square$$
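Both modified statistics of this section have the form $(A - \theta B)/\sqrt{a + \theta^2 b}$, so the confidence region is the solution set of a quadratic inequality in $\theta$. The following Python sketch classifies the region into the three cases above. The function names `ratio_region` and `normal_theory_inputs` are hypothetical, and the boundary case with a zero leading coefficient is deliberately left unhandled.

```python
import math

from scipy import stats


def ratio_region(A, B, a, b, alpha=0.05):
    """Solve (A - theta*B)^2 <= z^2 * (a + theta^2 * b) for theta.

    A, B: estimated numerator and denominator percentiles.
    a, b: their estimated variances.
    Returns (kind, lo, hi); for kind == "complement" the region is
    (-inf, lo] union [hi, inf).
    """
    z2 = stats.norm.ppf(1 - alpha / 2) ** 2
    c2 = B * B - z2 * b  # > 0 iff the denominator is significantly nonzero
    c1 = -2.0 * A * B
    c0 = A * A - z2 * a
    disc = c1 * c1 - 4.0 * c2 * c0
    if c2 > 0:  # case (a): bounded interval between the two roots
        r = math.sqrt(max(disc, 0.0))
        return "interval", (-c1 - r) / (2 * c2), (-c1 + r) / (2 * c2)
    if c2 < 0 and disc > 0:  # case (b): everything outside the two roots
        r = math.sqrt(disc)
        lo, hi = sorted(((-c1 - r) / (2 * c2), (-c1 + r) / (2 * c2)))
        return "complement", lo, hi
    if c2 < 0:  # case (c): the inequality holds for every theta
        return "real line", -math.inf, math.inf
    raise ValueError("boundary case c2 == 0 is not handled in this sketch")


def normal_theory_inputs(m, n, xbar, ybar, s1, s2, p1=0.05, p2=0.05):
    """A, B, a, b for the normal-theory statistic in (2)."""
    z1, z2 = stats.norm.ppf(1 - p1), stats.norm.ppf(1 - p2)
    return (xbar - z1 * s1, ybar - z2 * s2,
            (1 + z1**2 / 2) * s1**2 / m, (1 + z2**2 / 2) * s2**2 / n)


# Summary statistics quoted in Section 5 for the lumber example.
kind, lo, hi = ratio_region(*normal_theory_inputs(
    107, 103, 4840.3, 7077.9, math.sqrt(2354470), math.sqrt(2528864)))
print(kind, round(lo, 3), round(hi, 3))
```

For the non-parametric statistic, the same `ratio_region` call applies with $A = \hat\xi_{1p_1}$, $B = \hat\xi_{2p_2}$, $a = p_1(1-p_1)/(m[\hat F_1'(\hat\xi_{1p_1})]^2)$ and $b = p_2(1-p_2)/(n[\hat F_2'(\hat\xi_{2p_2})]^2)$. With the Section 5 summary statistics, the normal-theory call should give a bounded interval close to the $Z^{M}_{m,n}(\theta)$ entry of Table 3.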


5. Example and simulation

We consider data (see Table 2) on the modulus of rupture (MOR) of Douglas fir specimens, provided to us by J. Evans, whose percentile estimates were given in Aplin et al. (1986). The normal plots suggest that the data are normally distributed and it is reasonable to assume that the variances are equal, so $k = 1$. From the summary statistics

$m = 107$, $\bar x = 4840.3$, $s_1^2 = 2{,}354{,}470$, $x_{(6)} = 2350.2$,
$n = 103$, $\bar y = 7077.9$, $s_2^2 = 2{,}528{,}864$, $y_{(6)} = 4597.8$,
$s_{\text{pooled}} = 1565.8$,

we determine 95% confidence intervals for the ratio of 5th percentiles (see Johnson and Huang, 2003, for the particular choice of kernel suggested by Silverman, 1986). All of the methods give similar results (see Table 3) because $\xi_{2,.05}$ is far from 0.

Under the transformations $X_i \to c_1X_i$ and $Y_j \to c_2Y_j$ with $c_1, c_2 > 0$, we have $\theta \to c_1\theta/c_2$ and the values of $Z_{m,n}(\theta)$ and $Z^{M}_{m,n}(\theta)$ remain unchanged. Consequently, the coverage probabilities for the normal theory procedures remain the same for all $c_1$, $c_2$. In the non-parametric setting, our kernel satisfies $\hat F_i'(c_i\hat\xi_{ip_i}) = c_i^{-1}\hat F_i'(\hat\xi_{ip_i})$, $i = 1, 2$, and so $Z^{NP}_{m,n}(\theta)$ and $Z^{M,NP}_{m,n}(\theta)$ each remain the same. Consequently, the coverage probabilities remain the same for all $c_1, c_2 > 0$.

To illustrate that regions other than intervals are possible and to estimate the actual coverage probabilities, we turn to simulation. In Table 4 we present estimated coverage probabilities for sample sizes $m = n = 100$.

Table 2. Data sets

MOR (lb/in²) of 2 × 4 in, Grade 2, Green (30% moisture content)

5418.6 4795.9 7061.8 6617.3 6136.7 7529.2 6455.7 6082.7 5511.1 5231.6 5851.4
4281.8 4230.9 2524.8 4896.3 6351.2 2764.8 4432.9 5128.6 5681.0 4690.8 2327.6
2642.7 4760.3 3340.9 3207.7 5270.6 1420.3 3015.2 4451.7 2444.5 3747.0 3879.0
6307.9 6964.0 6357.9 7643.6 5976.8 4607.6 9213.0 6051.4 4161.0 4818.4 5325.4
5651.6 5894.0 2350.2 3022.1 4187.7 4274.4 3854.5 1931.2 3556.0 4392.0 2105.5
6674.1 7311.8 4414.5 4050.5 5325.4 3917.7 2822.7 4420.0 2748.2 3396.4 5161.1
8153.4 6997.6 5268.4 5677.2 5818.8 4429.6 5465.8 3095.2 4461.2 6767.2 4658.4
6843.5 4533.1 8145.4 5531.8 4787.2 3938.0 3770.9 5289.3 4892.6 2566.1
7011.3 5691.8 4616.1 4872.4 5988.6 5143.3 3168.1 3440.7 5078.2 1228.4
5817.3 6245.9 3508.1 3677.3 5530.9 4044.5 4994.6 4533.1 5278.7 5883.6

MOR (lb/in²) of 2 × 6 in, select structural, Green (30% moisture content)

9579.3 6475.0 8374.9 9364.2 7896.3 8041.5 9024.7 7614.5 8793.8 6836.1
7774.3 6724.0 8018.8 7078.2 9015.2 8935.7 7271.5 7776.8 7962.0 7806.5
7762.6 5169.9 8267.5 4597.8 6598.4 5253.9 8643.2 7323.3 6835.0 7449.2
5477.9 7288.0 6244.2 6893.2 10100.9 5548.2 6922.1 5120.6 6067.5 4639.9
9265.1 10653.4 6720.6 8685.8 7887.9 5045.7 8388.6 7684.5 7418.5 6575.0
7445.3 6533.7 7099.8 6842.3 7803.5 5616.3 5783.9 4606.4 3680.3 5399.4
8051.2 11077.9 6556.9 6800.9 5495.9 3911.9 7853.9 7137.9 5679.9 4917.3
9828.1 9178.9 6574.8 8254.7 7820.9 6768.4 6594.6 8925.9 7984.7 5398.0
7930.1 8375.9 5576.5 6700.2 6122.8 7865.4 6724.0 7182.6 6014.0 4173.1
8304.1 10342.8 7194.2 7859.9 5345.8 9499.1 3981.0 5208.5 6159.1 7418.5

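Table 3. Results of applying the methods to the data sets

| Non-central t | $Z_{m,n}(\theta)$ | $Z^{M}_{m,n}(\theta)$ | $Z^{NP}_{m,n}(\theta)$ | $Z^{M,NP}_{m,n}(\theta)$ |
|---|---|---|---|---|
| (0.410, 0.583) | (0.405, 0.633) | (0.410, 0.640) | (0.395, 0.628) | (0.400, 0.635) |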
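The exact procedure of Theorem 2.1 with $k = 1$ can be applied to the summary statistics above by inverting the equal tail test on a grid of candidate values $\theta_0$. The sketch below is an illustration under assumptions rather than the authors' code: the function name `exact_region_equal_var` and the grid limits are hypothetical choices, and the non-central $t$ probabilities come from `scipy.stats.nct`.

```python
import numpy as np
from scipy import stats


def exact_region_equal_var(m, n, xbar, ybar, s_pooled, grid,
                           p1=0.05, p2=0.05, alpha=0.05):
    """Return the grid values theta0 that the equal tail test does not reject."""
    z1, z2 = stats.norm.ppf(1 - p1), stats.norm.ppf(1 - p2)
    scale = 1.0 / np.sqrt(1.0 / m + grid**2 / n)
    t_obs = scale * (xbar - grid * ybar) / s_pooled  # T(theta0) with k = 1
    delta = scale * (z1 - grid * z2)                 # non-centrality delta(theta0)
    cdf = stats.nct.cdf(t_obs, m + n - 2, delta)
    # Accept theta0 when T(theta0) lies between the alpha/2 and 1 - alpha/2
    # quantiles of its non-central t distribution.
    keep = (cdf >= alpha / 2) & (cdf <= 1 - alpha / 2)
    return grid[keep]


grid = np.linspace(0.2, 1.0, 801)  # hypothetical search range for theta0
accepted = exact_region_equal_var(107, 103, 4840.3, 7077.9, 1565.8, grid)
print(f"accepted theta0 from {accepted.min():.3f} to {accepted.max():.3f}")
```

The printed endpoints can be compared with the non-central $t$ entry of Table 3. Because the region need not be an interval in general, a careful application would also check that the accepted grid points are contiguous and would widen the grid when either end of it is accepted.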


Table 4. Estimated coverage probabilities, $N_T = 5000$ replications. The simulated case has the denominator close to zero: $\mu_1 = 1.93$, $\sigma_1 = 2$, $\xi_{1,0.05} = -1.36$; $\mu_2 = 1.93$, $\sigma_2 = 1$, $\xi_{2,0.05} = 0.285$. The estimated standard deviations are given in parentheses.

| $Z_{m,n}(\theta)$ | $Z^{M}_{m,n}(\theta)$ | $Z^{NP}_{m,n}(\theta)$ | $Z^{M,NP}_{m,n}(\theta)$ |
|---|---|---|---|
| 87.58% (0.4664) | 94.86% (0.3123) | 88.82% (0.4456) | 93.18% (0.3565) |

| $Z^{M}_{m,n}(\theta)$ | Interval | Outside interval | Real line | Overall |
|---|---|---|---|---|
| Number | $N_I = 2329$ | $N_C = 2669$ | $N_R = 2$ | $N_T = 5000$ |
| Coverage number | 2147 | 2594 | 2 | 4743 |
| Percentage of coverage | 92.19% (0.3795%) | 97.19% (0.2337%) | 100% (0) | 94.86% (0.3123%) |

| $Z^{M,NP}_{m,n}(\theta)$ | Interval | Outside interval | Real line | Overall |
|---|---|---|---|---|
| Number | $N_I = 1582$ | $N_C = 3376$ | $N_R = 42$ | $N_T = 5000$ |
| Coverage number | 1360 | 3257 | 42 | 4659 |
| Percentage of coverage | 85.97% (0.4912%) | 96.48% (0.2606%) | 100% (0) | 93.18% (0.3565%) |
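A simulation in the spirit of Table 4 can be sketched for the two normal-theory procedures. The parameter values follow the case described with Table 4. The seed, the reduced number of replications and the function name `simulate_coverage` are assumptions made here for illustration, so the estimates will differ somewhat from the published ones.

```python
import numpy as np
from scipy import stats


def simulate_coverage(mu1, sigma1, mu2, sigma2, m, n,
                      p=0.05, alpha=0.05, reps=2000, seed=0):
    """Estimate coverage of the Section 3 interval and the Section 4 region."""
    rng = np.random.default_rng(seed)
    zp = stats.norm.ppf(1 - p)
    zc = stats.norm.ppf(1 - alpha / 2)
    theta = (mu1 - zp * sigma1) / (mu2 - zp * sigma2)  # true ratio of percentiles
    x = rng.normal(mu1, sigma1, size=(reps, m))
    y = rng.normal(mu2, sigma2, size=(reps, n))
    A = x.mean(axis=1) - zp * x.std(axis=1, ddof=1)  # numerator estimates
    B = y.mean(axis=1) - zp * y.std(axis=1, ddof=1)  # denominator estimates
    a = (1 + zp**2 / 2) * x.var(axis=1, ddof=1) / m
    b = (1 + zp**2 / 2) * y.var(axis=1, ddof=1) / n
    # Section 3: delta-method interval centred at A / B.
    se_ratio = np.sqrt(a / B**2 + b * A**2 / B**4)
    cover_delta = np.abs(A / B - theta) <= zc * se_ratio
    # Section 4: theta is covered when |Z^M(theta)| <= z_{alpha/2}.
    z_mod = (A - theta * B) / np.sqrt(a + theta**2 * b)
    cover_mod = np.abs(z_mod) <= zc
    # The modified region is a bounded interval only when the denominator
    # estimate is significantly different from zero.
    bounded = np.abs(B) / np.sqrt(b) > zc
    return cover_delta.mean(), cover_mod.mean(), bounded.mean()


cov_delta, cov_mod, frac_interval = simulate_coverage(1.93, 2.0, 1.93, 1.0, 100, 100)
print(f"delta method: {cov_delta:.3f}, modified: {cov_mod:.3f}, "
      f"bounded intervals: {frac_interval:.3f}")
```

The sketch checks coverage of the true $\theta$ directly through $|Z^{M}_{m,n}(\theta)| \le z_{\alpha/2}$, which avoids constructing each region explicitly. The non-parametric procedures could be added in the same way once a kernel density estimate is chosen.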

A comparison of the modified procedures reveals that the non-parametric approach results in substantially fewer actual intervals. See Huang (2001) for additional simulations and examples.

Acknowledgements

The authors wish to thank James Evans, United States Forest Products Laboratory, for providing the data for the example.

References

Aplin, E.N., Green, D.W., Evans, J.W., Barrett, J.D., 1986. The influence of moisture content on the flexural properties of Douglas fir dimension lumber. Research Paper 475, US Department of Agriculture, Forest Service, Forest Products Laboratory, Madison, WI.

Evans, J.W., Johnson, R.A., Green, D.W., Verrill, S.P., 2005. Applications of the Weibull distribution in wood engineering. In: Balakrishnan, N. (Ed.), Weibull Distributions, to appear.

Fieller, E.C., 1954. Some problems in interval estimation. J. Roy. Statist. Soc. Ser. B 16, 175–185.

Huang, L.-F., 2001. Confidence regions for the ratio of percentiles. Ph.D. Dissertation, Department of Statistics, University of Wisconsin, Madison.

Hwang, J.T.G., 1995. Fieller's problems and resampling techniques. Statist. Sinica 5, 161–171.

Johnson, R.A., Huang, L.-F., 2003. Some exact and approximate confidence regions for the ratio of percentiles from two different distributions. In: Lindqvist, B., Doksum, K. (Eds.), Mathematical and Statistical Methods in Reliability. World Scientific, Singapore, pp. 455–468.

Johnson, N., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, vol. 2, second ed. Wiley-Interscience, New York.

McDonald, G.C., 1981. Confidence intervals for vehicle emission deterioration factors. Technometrics 23, 239–242.

Ogawa, J., 1983. On the 'Confidence Bounds' of the ratio of the means of a bivariate normal distribution. Ann. Inst. Statist. Math. 35, 41–48.

Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.