Confidence regions for the ratio of percentiles



Statistics & Probability Letters 76 (2006) 384–392 www.elsevier.com/locate/stapro

Li-Fei Huang (a,*), Richard A. Johnson (b)

(a) Applied Statistics and Information Science Department, Ming Chuan University, Taipei, Taiwan
(b) Department of Statistics, University of Wisconsin, Madison WI 53706, USA

Received 28 November 2004; received in revised form 18 July 2005
Available online 15 September 2005

Abstract

Motivated by applications in the lumber industry, we derive confidence regions for the ratio of percentiles from two different populations. Generalizing work on inferences concerning the ratio of two means, we develop an exact confidence procedure when the two populations are normal and have the same variance. Other cases for normal populations are treated by large sample methods. General populations are also treated in a large sample context. The different large sample procedures are compared with a small simulation study. An example, using strength of lumber data, is also given.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Non-central t distribution; Non-parametric estimation; Strength of lumber; Large sample comparisons

(*) Corresponding author. E-mail address: [email protected] (L.-F. Huang).
0167-7152/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2005.08.034

1. Introduction

In the wood industry, it is common practice to compare two different strength properties for lumber of the same dimension, grade and species, or the same strength property for lumber of two different dimensions, grades or species. Engineers often express a comparison in terms of the ratio of two strength properties, for example, the ratio of mean bending strengths. Because United States lumber standards are given in terms of population fifth percentiles, the ratio is often expressed in terms of the fifth percentiles of two strength distributions rather than the means. Aplin et al. (1986) give point estimates of the ratio of dry to green lumber. We develop confidence regions for the ratios of percentiles from two different populations. Both normal population and non-parametric procedures are derived.

Most of the existing literature on ratios deals with the ratio of means and, in particular, ratios of means of normal distributions. Fieller (1954) and Ogawa (1983) treat bivariate normal distributions. McDonald (1981) obtains confidence regions for the ratio of means of two independent normal distributions arising in a straight-line linear model. Hwang (1995) uses a resampling approach to construct confidence regions for this same ratio. Evans et al. (2005) show how to obtain large sample confidence intervals for the ratio of Weibull percentiles.

Although it is common to compare the same percentiles of two different distributions $F_1(\cdot)$ and $F_2(\cdot)$, initially we allow for different percentiles. Let $p_1$ and $p_2$ be specified, so

$$\xi_{1p_1} = \inf\{x : F_1(x) \ge p_1\} \quad\text{and}\quad \xi_{2p_2} = \inf\{y : F_2(y) \ge p_2\}$$


are the first population's $100p_1$th percentile and the second population's $100p_2$th percentile, respectively. The corresponding population ratio of percentiles is $\theta = \xi_{1p_1}/\xi_{2p_2}$.

2. Confidence regions for the ratio of percentiles: independent normal distributions

We obtain exact confidence regions for the ratio of percentiles for two different normal distributions when the ratio of variances is known. To set notation, let $X_i$, $i = 1, \ldots, m$, be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_j$, $j = 1, \ldots, n$, be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the samples be independent. Here, $\xi_{ip_i} = \mu_i + \Phi^{-1}(p_i)\sigma_i$ is the lower $100p_i$th percentile, $i = 1, 2$, where $\Phi(\cdot)$ is the standard normal cdf. We also let $z_{p_i}$ be the upper $100p_i$th percentile of the standard normal distribution, so $\xi_{ip_i} = \mu_i - z_{p_i}\sigma_i$ for $i = 1, 2$. Our first result establishes that a certain random quantity has a non-central t distribution.

Theorem 2.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the random samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$. If the ratio $k = \sigma_1/\sigma_2$ is known, the random quantity

$$T(\theta; k) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{nk^2}}\right)^{-1} \frac{(\bar X - \theta\bar Y)/k}{\sqrt{\dfrac{\sum_{i=1}^{m}(X_i - \bar X)^2/k^2 + \sum_{j=1}^{n}(Y_j - \bar Y)^2}{m+n-2}}}$$

follows the non-central t distribution with $m+n-2$ degrees of freedom and non-centrality parameter

$$\delta(\theta; k) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{nk^2}}\right)^{-1}\left(z_{p_1} - \frac{\theta}{k}z_{p_2}\right).$$

Remark. If the two variances are the same, that is, $k = 1$, then

$$T(\theta) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{n}}\right)^{-1} \frac{\bar X - \theta\bar Y}{\sqrt{\dfrac{\sum_{i=1}^{m}(X_i - \bar X)^2 + \sum_{j=1}^{n}(Y_j - \bar Y)^2}{m+n-2}}}$$

follows the non-central t distribution with $m+n-2$ degrees of freedom and non-centrality parameter

$$\delta(\theta) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{n}}\right)^{-1}(z_{p_1} - \theta z_{p_2}).$$

Proof. Here $k = \sigma_1/\sigma_2$ and $\theta = (\mu_1 - z_{p_1}\sigma_1)/(\mu_2 - z_{p_2}\sigma_2)$. Under the assumptions, the random quantity $(\bar X - \theta\bar Y)/(k\sigma_2)$ is distributed as $N(0, 1/m + \theta^2/(nk^2))$ plus the constant $\mu_1/\sigma_1 - \mu_2\theta/\sigma_1 = z_{p_1} - (\theta/k)z_{p_2}$, and

$$\frac{\sum_{i=1}^{m}(X_i - \bar X)^2}{\sigma_1^2} + \frac{\sum_{j=1}^{n}(Y_j - \bar Y)^2}{\sigma_2^2} = \frac{\sum_{i=1}^{m}(X_i - \bar X)^2}{k^2\sigma_2^2} + \frac{\sum_{j=1}^{n}(Y_j - \bar Y)^2}{\sigma_2^2}$$

is independently distributed as $\chi^2_{m+n-2}(0)$, that is, as a central chi-square. The result follows by the definition of the non-central t as a ratio of these random quantities (see Johnson et al., 1995, pp. 508–518).

An exact confidence region for $\theta$ is generated by considering the test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ for each possible $\theta_0$ and then collecting all $\theta_0$ for which the null hypothesis is not rejected. We use an equal tail test. For the given $k$ and any specified significance level $\alpha$, we determine $L(\theta_0; k)$ and $U(\theta_0; k)$ such that

$$\frac{\alpha}{2} = P[T(\theta_0; k) < L(\theta_0; k)] = P[T(\theta_0; k) > U(\theta_0; k)]$$

and then reject the null hypothesis if $T(\theta_0; k) < L(\theta_0; k)$ or $T(\theta_0; k) > U(\theta_0; k)$. A $100(1-\alpha)\%$ confidence region is the collection of all $\theta_0$ such that $L(\theta_0; k) \le T(\theta_0; k) \le U(\theta_0; k)$. □

Table 1 illustrates possible confidence regions for the ratio of percentiles from two normal distributions having equal variances. The region can be a bounded interval, the complement of an interval, or the whole real line. The latter two cases occur, respectively, when the estimated percentile in the denominator is too close to zero and when both estimated percentiles are too close to zero. Table 1 also shows that the regions for $\xi_{2p_2}/\xi_{1p_1}$ can be obtained from those for the ratio $\xi_{1p_1}/\xi_{2p_2}$.

Table 1
Some 95% confidence regions for the ratio of percentiles of two independent normal populations with equal variances, s_pooled = 2

                                     m = n   x̄ = 5, ȳ = 10      x̄ = 10, ȳ = 5
Means (p1 = p2 = 0.5)                10      (0.359, 0.659)     (1.518, 2.786)
                                     20      (0.403, 0.605)     (1.651, 2.485)
5th percentiles (p1 = p2 = 0.05)     10      (−0.087, 0.458)    (−∞, −11.368) ∪ (2.183, ∞)
                                     20      (0.055, 0.398)     (2.510, 18.222)

3. Large-sample confidence regions

When large samples are available, we can obtain confidence intervals for the ratio of percentiles even when the ratio of variances is unknown. The first result still requires normal populations.

Theorem 3.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the samples be independent. If $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$ and $\xi_{2p_2} \ne 0$, then

$$\sqrt{m+n}\left(\frac{\bar X - z_{p_1}S_1}{\bar Y - z_{p_2}S_2} - \frac{\xi_{1p_1}}{\xi_{2p_2}}\right)$$

converges in distribution to the normal with mean 0 and variance

$$\left(1 + \frac{z_{p_1}^2}{2}\right)\frac{\sigma_1^2}{\lambda\xi_{2p_2}^2} + \left(1 + \frac{z_{p_2}^2}{2}\right)\frac{\xi_{1p_1}^2\sigma_2^2}{(1-\lambda)\xi_{2p_2}^4}.$$

Proof. Since $S_1 = \sqrt{\sum_{i=1}^{m}(X_i - \bar X)^2/(m-1)}$ and $S_2 = \sqrt{\sum_{j=1}^{n}(Y_j - \bar Y)^2/(n-1)}$, from the central limit theorem and a square root transformation on the sample variances, we have the well-known result

$$\sqrt{m+n}\left(\begin{bmatrix}\bar X\\ S_1\\ \bar Y\\ S_2\end{bmatrix} - \begin{bmatrix}\mu_1\\ \sigma_1\\ \mu_2\\ \sigma_2\end{bmatrix}\right) \xrightarrow{D} N_4(0, \Sigma), \qquad (1)$$

where $\Sigma = \mathrm{diag}[\sigma_1^2/\lambda,\ \sigma_1^2/(2\lambda),\ \sigma_2^2/(1-\lambda),\ \sigma_2^2/(2(1-\lambda))]$. To apply the delta method, we set $t = (t_1, t_2, t_3, t_4)'$ and

$$g(t) = \frac{t_1 - z_{p_1}t_2}{t_3 - z_{p_2}t_4}.$$


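We conclude that $g((\bar X, S_1, \bar Y, S_2)') = (\bar X - z_{p_1}S_1)/(\bar Y - z_{p_2}S_2)$ is asymptotically normal with mean $\theta$ and variance $(m+n)^{-1}\Sigma_g$, where

$$\Sigma_g = \left(\frac{\partial g}{\partial t}\right)' \mathrm{diag}\left[\frac{\sigma_1^2}{\lambda},\ \frac{\sigma_1^2}{2\lambda},\ \frac{\sigma_2^2}{1-\lambda},\ \frac{\sigma_2^2}{2(1-\lambda)}\right]\left(\frac{\partial g}{\partial t}\right) = \left(1 + \frac{z_{p_1}^2}{2}\right)\frac{\sigma_1^2}{\lambda\xi_{2p_2}^2} + \left(1 + \frac{z_{p_2}^2}{2}\right)\frac{\xi_{1p_1}^2\sigma_2^2}{(1-\lambda)\xi_{2p_2}^4}. \qquad \square$$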
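As a sketch of the delta-method step in the proof of Theorem 3.1, the gradient of $g$ and the resulting quadratic form can be written out term by term. The abbreviations $\xi_1 = \xi_{1p_1}$ and $\xi_2 = \xi_{2p_2}$ are introduced here only for brevity, and the gradient is evaluated at $t = (\mu_1, \sigma_1, \mu_2, \sigma_2)'$, where the numerator of $g$ equals $\xi_1$ and the denominator equals $\xi_2$.

```latex
\begin{align*}
\frac{\partial g}{\partial t}
  &= \left(\frac{1}{\xi_2},\ -\frac{z_{p_1}}{\xi_2},\
           -\frac{\xi_1}{\xi_2^2},\ \frac{z_{p_2}\xi_1}{\xi_2^2}\right)', \\
\Sigma_g
  &= \frac{1}{\xi_2^2}\cdot\frac{\sigma_1^2}{\lambda}
   + \frac{z_{p_1}^2}{\xi_2^2}\cdot\frac{\sigma_1^2}{2\lambda}
   + \frac{\xi_1^2}{\xi_2^4}\cdot\frac{\sigma_2^2}{1-\lambda}
   + \frac{z_{p_2}^2\xi_1^2}{\xi_2^4}\cdot\frac{\sigma_2^2}{2(1-\lambda)} \\
  &= \left(1+\frac{z_{p_1}^2}{2}\right)\frac{\sigma_1^2}{\lambda\xi_2^2}
   + \left(1+\frac{z_{p_2}^2}{2}\right)\frac{\xi_1^2\sigma_2^2}{(1-\lambda)\xi_2^4}.
\end{align*}
```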

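Remark. If $\sigma_1 = \sigma_2 = \sigma$, the limiting distribution is

$$N\left(\frac{\xi_{1p_1}}{\xi_{2p_2}},\ \sigma^2\left[\left(1 + \frac{z_{p_1}^2}{2}\right)\frac{1}{\lambda\xi_{2p_2}^2} + \left(1 + \frac{z_{p_2}^2}{2}\right)\frac{\xi_{1p_1}^2}{(1-\lambda)\xi_{2p_2}^4}\right]\right).$$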

In any case, $\xi_{1p_1}$ can be estimated by $\bar x - z_{p_1}s_1$, $\xi_{2p_2}$ by $\bar y - z_{p_2}s_2$, $\sigma_1^2$ by $s_1^2$, and $\sigma_2^2$ by $s_2^2$. Therefore, an approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is

$$\frac{\bar x - z_{p_1}s_1}{\bar y - z_{p_2}s_2} \pm z_{\alpha/2}\sqrt{\left(\frac{2 + z_{p_1}^2}{2m}\right)\frac{s_1^2}{(\bar y - z_{p_2}s_2)^2} + \left(\frac{2 + z_{p_2}^2}{2n}\right)\frac{(\bar x - z_{p_1}s_1)^2 s_2^2}{(\bar y - z_{p_2}s_2)^4}},$$

which is always a bounded interval. However, the simulations in Section 5 show poor coverage rates when $\xi_{2p_2}$ is near zero.

When sample sizes are large, a non-parametric approach is possible. Let $\lceil w \rceil$ denote the smallest integer that is greater than or equal to $w$. We estimate $\xi_{1p_1}$ by the order statistic $\hat\xi_{1p_1} = X_{(r_1)}$, where $r_1 = \lceil mp_1 \rceil$, and estimate $\xi_{2p_2}$ by the order statistic $\hat\xi_{2p_2} = Y_{(r_2)}$, where $r_2 = \lceil np_2 \rceil$.

Theorem 3.2. Let $X_1, \ldots, X_m$ be a random sample from $F_1(\cdot)$; let $Y_1, \ldots, Y_n$ be a random sample from $F_2(\cdot)$; and let the samples be independent. If the population density function $F_i'(\cdot)$ is positive and continuous in a neighborhood of $\xi_{ip_i}$, for $i = 1, 2$, $\lim_{n,m\to\infty} m/(n+m) = \lambda$ $(0 < \lambda < 1)$, and $\xi_{2p_2} \ne 0$, then

$$\sqrt{m+n}\left(\frac{\hat\xi_{1p_1}}{\hat\xi_{2p_2}} - \frac{\xi_{1p_1}}{\xi_{2p_2}}\right)$$

converges in distribution to the normal distribution with mean 0 and variance

$$\frac{p_1(1-p_1)}{\lambda\xi_{2p_2}^2[F_1'(\xi_{1p_1})]^2} + \frac{\xi_{1p_1}^2\,p_2(1-p_2)}{(1-\lambda)\xi_{2p_2}^4[F_2'(\xi_{2p_2})]^2}.$$

Proof. Since $[\hat\xi_{1p_1}, \hat\xi_{2p_2}]'$ is asymptotically normal with mean $[\xi_{1p_1}, \xi_{2p_2}]'$ and covariance matrix

$$\frac{1}{m+n}\,\mathrm{diag}\left[\frac{p_1(1-p_1)}{\lambda[F_1'(\xi_{1p_1})]^2},\ \frac{p_2(1-p_2)}{(1-\lambda)[F_2'(\xi_{2p_2})]^2}\right],$$

applying the delta method with $g((t_1, t_2)') = t_1/t_2$ gives the result. The resulting approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is

$$\frac{\hat\xi_{1p_1}}{\hat\xi_{2p_2}} \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{m\,\hat\xi_{2p_2}^2[\hat F_1'(\hat\xi_{1p_1})]^2} + \frac{\hat\xi_{1p_1}^2\,p_2(1-p_2)}{n\,\hat\xi_{2p_2}^4[\hat F_2'(\hat\xi_{2p_2})]^2}},$$

where we estimate the value of the density function $F_i'(\xi_{ip_i})$ by the consistent kernel estimates $\hat F_i'(\hat\xi_{ip_i})$ for $i = 1, 2$, as suggested in Silverman (1986). □
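The non-parametric interval of Theorem 3.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the density is estimated with a Gaussian kernel and the normal-reference bandwidth $1.06\,s\,n^{-1/5}$, which is one common rule and not necessarily the kernel choice used for the results reported in this paper.

```python
import math
from statistics import NormalDist, stdev


def order_stat_percentile(sample, p):
    """Return the order statistic X_(r) with r = ceil(len(sample) * p)."""
    r = max(1, math.ceil(len(sample) * p))
    return sorted(sample)[r - 1]


def gaussian_kde_at(sample, x):
    """Gaussian kernel density estimate at x.

    Assumption: normal-reference bandwidth h = 1.06 * s * n**(-1/5).
    """
    n = len(sample)
    h = 1.06 * stdev(sample) * n ** (-0.2)
    kernel = NormalDist()
    return sum(kernel.pdf((x - v) / h) for v in sample) / (n * h)


def np_ratio_interval(xs, ys, p1, p2, alpha=0.05):
    """Point estimate and approximate 100(1-alpha)% interval for the ratio
    of the 100*p1-th percentile of xs to the 100*p2-th percentile of ys."""
    m, n = len(xs), len(ys)
    xi1 = order_stat_percentile(xs, p1)
    xi2 = order_stat_percentile(ys, p2)
    f1 = gaussian_kde_at(xs, xi1)  # estimated density at the x percentile
    f2 = gaussian_kde_at(ys, xi2)  # estimated density at the y percentile
    # Estimated variance of the ratio of order statistics (delta method).
    var = (p1 * (1 - p1) / (m * xi2 ** 2 * f1 ** 2)
           + xi1 ** 2 * p2 * (1 - p2) / (n * xi2 ** 4 * f2 ** 2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    est = xi1 / xi2
    half_width = z * math.sqrt(var)
    return est, est - half_width, est + half_width
```

Calling `np_ratio_interval(xs, ys, 0.05, 0.05)` on two lists of observations returns the estimated ratio of fifth percentiles with its lower and upper limits. The interval is always bounded, so it shares the weakness noted above when the denominator percentile is near zero.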


4. An alternative large-sample approach

For normal populations, under our alternative approach, we consider $\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2)$ and derive large sample confidence regions that may be intervals, complements of intervals, or sometimes even the whole real line.

Theorem 4.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, and let the two random samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$ be unknown but finite. Under the condition that $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$, the random quantity

$$\sqrt{m+n}\,\bigl(\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2)\bigr)$$

converges in distribution to the normal with mean 0 and variance

$$\frac{1}{\lambda}\left(1 + \frac{z_{p_1}^2}{2}\right)\sigma_1^2 + \theta^2\,\frac{1}{1-\lambda}\left(1 + \frac{z_{p_2}^2}{2}\right)\sigma_2^2.$$

Proof. From the normal limit in (1), we conclude that the linear combination $\sqrt{m+n}\,(\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2))$ converges in distribution to the normal with mean 0 and the variance stated in the theorem. Under the condition that $\theta = \xi_{1p_1}/\xi_{2p_2}$, according to Slutsky's theorem,

$$Z^M_{m,n}(\theta) = \frac{\bar X - z_{p_1}S_1 - \theta(\bar Y - z_{p_2}S_2)}{\sqrt{\left(1 + \dfrac{z_{p_1}^2}{2}\right)\dfrac{S_1^2}{m} + \theta^2\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}}} \qquad (2)$$

is, asymptotically, standard normal. As a function of $\theta$, $Z^M_{m,n}(\theta)$ has a single maximum in absolute value, which is greater than the absolute value of the asymptotes. In particular, $Z^M_{m,n}(\theta)$ satisfies

$$\lim_{\theta\to\infty} Z^M_{m,n}(\theta) = -\frac{\bar Y - z_{p_2}S_2}{\sqrt{\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}}}, \qquad \lim_{\theta\to-\infty} Z^M_{m,n}(\theta) = \frac{\bar Y - z_{p_2}S_2}{\sqrt{\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}}}$$

and

$$\frac{d}{d\theta}Z^M_{m,n}(\theta) = -\frac{\left(1 + \dfrac{z_{p_1}^2}{2}\right)\dfrac{S_1^2}{m}(\bar Y - z_{p_2}S_2) + \theta\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}(\bar X - z_{p_1}S_1)}{\left[\left(1 + \dfrac{z_{p_1}^2}{2}\right)\dfrac{S_1^2}{m} + \theta^2\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}\right]^{3/2}}.$$

Consequently, the derivative vanishes at a unique value

$$\theta^* = -\left[\left(1 + \frac{z_{p_1}^2}{2}\right)\frac{S_1^2}{m}(\bar Y - z_{p_2}S_2)\right]\Big/\left[\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}(\bar X - z_{p_1}S_1)\right].$$


Further, the derivative converges to 0 as $\theta \to \pm\infty$ and has the same sign as $\bar X - z_{p_1}S_1$ for sufficiently negative $\theta$. Moreover, since the maximum occurs at $\theta^*$, with probability one, we have

$$|Z^M_{m,n}(\theta^*)| > |\bar Y - z_{p_2}S_2| \Big/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}}.$$

Consequently, the large sample $100(1-\alpha)\%$ confidence region becomes

(a) an interval if

$$z_{\alpha/2} < |\bar Y - z_{p_2}S_2| \Big/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} < |Z^M_{m,n}(\theta^*)|;$$

(b) the complement of an interval if

$$|\bar Y - z_{p_2}S_2| \Big/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} \le z_{\alpha/2} < |Z^M_{m,n}(\theta^*)|;$$

(c) the whole real line if

$$|\bar Y - z_{p_2}S_2| \Big/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} < |Z^M_{m,n}(\theta^*)| \le z_{\alpha/2}.$$

We see that the region fails to be a bounded interval if the percentile in the denominator is not significantly different from zero, that is, if

$$|\bar Y - z_{p_2}S_2| \Big/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} \le z_{\alpha/2}. \qquad \square$$

For our alternative non-parametric procedure, we consider $\hat\xi_{1p_1} - \theta\hat\xi_{2p_2}$.

Theorem 4.2. Let $X_1, \ldots, X_m$ be a random sample from $F_1$, which has a positive continuous derivative $F_1'(\cdot)$ in a neighborhood of $\xi_{1p_1}$, let $Y_1, \ldots, Y_n$ be a random sample from $F_2$, which has a positive continuous derivative $F_2'(\cdot)$ in a neighborhood of $\xi_{2p_2}$, and let the two samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$ be unknown but finite. Under the condition that $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$,

$$\sqrt{m+n}\,(\hat\xi_{1p_1} - \theta\hat\xi_{2p_2})$$

converges in distribution to the normal with mean 0 and variance

$$\frac{p_1(1-p_1)}{\lambda[F_1'(\xi_{1p_1})]^2} + \theta^2\frac{p_2(1-p_2)}{(1-\lambda)[F_2'(\xi_{2p_2})]^2}.$$

Proof. From the proof of Theorem 3.2, we conclude that $\sqrt{m+n}\,(\hat\xi_{1p_1} - \theta\hat\xi_{2p_2})$ converges in distribution to the normal with mean 0 and variance

$$\frac{p_1(1-p_1)}{\lambda[F_1'(\xi_{1p_1})]^2} + \theta^2\frac{p_2(1-p_2)}{(1-\lambda)[F_2'(\xi_{2p_2})]^2}.$$


Under the conditions of Theorem 4.2, we have that

$$\frac{\hat\xi_{1p_1} - \theta\hat\xi_{2p_2}}{\sqrt{\dfrac{p_1(1-p_1)}{m[F_1'(\xi_{1p_1})]^2} + \theta^2\dfrac{p_2(1-p_2)}{n[F_2'(\xi_{2p_2})]^2}}}$$

is, asymptotically, a standard normal random variable. Introducing the consistent kernel estimates, we consider

$$Z^{M,NP}_{m,n}(\theta) = \frac{\hat\xi_{1p_1} - \theta\hat\xi_{2p_2}}{\sqrt{\dfrac{p_1(1-p_1)}{m[\hat F_1'(\hat\xi_{1p_1})]^2} + \theta^2\dfrac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}}},$$

which has the same structure as $Z^M_{m,n}(\theta)$ in (2). Moreover, since the unique maximum occurs at

$$\theta^* = -\left[\frac{p_1(1-p_1)}{m[\hat F_1'(\hat\xi_{1p_1})]^2}\,\hat\xi_{2p_2}\right]\Big/\left[\frac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}\,\hat\xi_{1p_1}\right],$$

with probability one, we have

$$|\hat\xi_{2p_2}| \Big/ \sqrt{\frac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}} < |Z^{M,NP}_{m,n}(\theta^*)|.$$

Also,

$$\lim_{\theta\to\infty} Z^{M,NP}_{m,n}(\theta) = -\frac{\hat\xi_{2p_2}}{\sqrt{\dfrac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}}}, \qquad \lim_{\theta\to-\infty} Z^{M,NP}_{m,n}(\theta) = \frac{\hat\xi_{2p_2}}{\sqrt{\dfrac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}}}.$$

The $100(1-\alpha)\%$ confidence region becomes

(a) an interval if

$$z_{\alpha/2} < |\hat\xi_{2p_2}| \Big/ \sqrt{\frac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}} < |Z^{M,NP}_{m,n}(\theta^*)|;$$

(b) the complement of an interval if

$$|\hat\xi_{2p_2}| \Big/ \sqrt{\frac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}} \le z_{\alpha/2} < |Z^{M,NP}_{m,n}(\theta^*)|;$$

(c) the whole real line if

$$|\hat\xi_{2p_2}| \Big/ \sqrt{\frac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}} < |Z^{M,NP}_{m,n}(\theta^*)| \le z_{\alpha/2}.$$

The confidence region fails to be a bounded interval if the percentile in the denominator is not significantly different from 0, that is, if

$$|\hat\xi_{2p_2}| \Big/ \sqrt{\frac{p_2(1-p_2)}{n[\hat F_2'(\hat\xi_{2p_2})]^2}} \le z_{\alpha/2}. \qquad \square$$
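Both $Z^M_{m,n}(\theta)$ and $Z^{M,NP}_{m,n}(\theta)$ have the form (numerator estimate minus $\theta$ times denominator estimate) divided by the square root of (numerator variance plus $\theta^2$ times denominator variance). The set where the statistic is at most $z_{\alpha/2}$ in absolute value is therefore the solution set of a quadratic inequality in $\theta$, and cases (a) to (c) correspond to the sign of its leading coefficient and discriminant. The Python sketch below illustrates this; the function name and argument names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
import math
from statistics import NormalDist


def modified_region(num, den, var_num, var_den, alpha=0.05):
    """Classify {theta : |num - theta*den| <= z * sqrt(var_num + theta^2 * var_den)}.

    num, den         : estimated numerator and denominator percentiles
    var_num, var_den : their estimated variances
    Returns (kind, endpoints); endpoints is None for the whole real line.
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    # The region is a*theta^2 + b*theta + c <= 0.
    a = den ** 2 - z2 * var_den
    b = -2.0 * num * den
    c = num ** 2 - z2 * var_num
    disc = b * b - 4.0 * a * c
    if a == 0:
        raise ValueError("boundary case a == 0 (a half-line) is not handled")
    if a > 0:
        # Denominator significantly different from zero: case (a).
        r = math.sqrt(disc)
        return "interval", ((-b - r) / (2 * a), (-b + r) / (2 * a))
    if disc > 0:
        # Case (b): region is (-inf, lo] together with [hi, inf).
        r = math.sqrt(disc)
        lo, hi = sorted(((-b - r) / (2 * a), (-b + r) / (2 * a)))
        return "complement", (lo, hi)
    # Case (c): the inequality holds for every theta.
    return "real line", None


# Hypothetical inputs showing the three cases.
print(modified_region(1.0, 2.0, 0.01, 0.01)[0])   # denominator far from zero
print(modified_region(1.0, 0.1, 0.01, 0.01)[0])   # denominator near zero
print(modified_region(0.1, 0.1, 0.01, 0.01)[0])   # both near zero
```

For the normal-theory procedure one would pass `num` $= \bar x - z_{p_1}s_1$, `den` $= \bar y - z_{p_2}s_2$, `var_num` $= (1 + z_{p_1}^2/2)s_1^2/m$ and `var_den` $= (1 + z_{p_2}^2/2)s_2^2/n$; for the non-parametric procedure, the order statistics and the kernel-based variance terms take their place.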


5. Example and simulation

We consider data (see Table 2) on the modulus of rupture (MOR) of Douglas fir specimens, provided to us by J. Evans, whose percentile estimates were given in Aplin et al. (1986). The normal plots suggest that the data are normally distributed, and it is reasonable to assume that the variances are equal, so $k = 1$. From the summary statistics

m = 107,  s1² = 2,354,470,  x̄ = 4840.3,  x(6) = 2350.2,
n = 103,  s2² = 2,528,864,  ȳ = 7077.9,  y(6) = 4597.8,
s_pooled = 1565.8,

we determine 95% confidence intervals for the ratio of 5th percentiles (see Johnson and Huang, 2003, for the particular choice of kernel suggested by Silverman, 1986). All of the methods give similar results (see Table 3) because $\xi_{2,.05}$ is far from 0.

Under the transformations $X_i \to c_1X_i$ and $Y_j \to c_2Y_j$ with $c_1, c_2 > 0$, we have $\theta \to c_1\theta/c_2$ and the values of $Z_{m,n}(\theta)$ and $Z^M_{m,n}(\theta)$ remain unchanged. Consequently, the coverage probabilities for the normal theory procedures remain the same for all $c_1$, $c_2$. In the non-parametric setting, our kernel satisfies $\hat F_i'(c_i\hat\xi_{ip_i}) = c_i^{-1}\hat F_i'(\hat\xi_{ip_i})$, $i = 1, 2$, and so $Z^{NP}_{m,n}(\theta)$ and $Z^{M,NP}_{m,n}(\theta)$ each remain the same. Consequently, the coverage probabilities remain the same for all $c_1, c_2 > 0$.

To illustrate that regions other than intervals are possible and to estimate the actual coverage probabilities, we turn to simulation. In Table 4 we present estimated coverage probabilities for sample sizes $m = n = 100$ and a case where the denominator is close to zero.

Table 2
Data sets

MOR (lb/in²) of 2 × 4 in, Grade 2, Green (30% moisture content)
5418.6 4795.9 7061.8 6617.3 6136.7 7529.2 6455.7 6082.7 5511.1 5231.6 5851.4 4281.8 4230.9 2524.8 4896.3 6351.2 2764.8 4432.9 5128.6 5681.0 4690.8 2327.6 2642.7 4760.3 3340.9 3207.7 5270.6 1420.3 3015.2 4451.7 2444.5 3747.0 3879.0
6307.9 6964.0 6357.9 7643.6 5976.8 4607.6 9213.0 6051.4 4161.0 4818.4 5325.4 5651.6 5894.0 2350.2 3022.1 4187.7 4274.4 3854.5 1931.2 3556.0 4392.0 2105.5
6674.1 7311.8 4414.5 4050.5 5325.4 3917.7 2822.7 4420.0 2748.2 3396.4 5161.1
8153.4 6997.6 5268.4 5677.2 5818.8 4429.6 5465.8 3095.2 4461.2 6767.2 4658.4
6843.5 4533.1 8145.4 5531.8 4787.2 3938.0 3770.9 5289.3 4892.6 2566.1
7011.3 5691.8 4616.1 4872.4 5988.6 5143.3 3168.1 3440.7 5078.2 1228.4
5817.3 6245.9 3508.1 3677.3 5530.9 4044.5 4994.6 4533.1 5278.7 5883.6

MOR (lb/in²) of 2 × 6 in, select structural, Green (30% moisture content)
9579.3 6475.0 8374.9 9364.2 7896.3 8041.5 9024.7 7614.5 8793.8 6836.1 7774.3 6724.0 8018.8 7078.2 9015.2 8935.7 7271.5 7776.8 7962.0 7806.5 7762.6 5169.9 8267.5 4597.8 6598.4 5253.9 8643.2 7323.3 6835.0 7449.2 5477.9 7288.0 6244.2 6893.2 10100.9 5548.2 6922.1 5120.6 6067.5 4639.9
9265.1 10653.4 6720.6 8685.8 7887.9 5045.7 8388.6 7684.5 7418.5 6575.0 7445.3 6533.7 7099.8 6842.3 7803.5 5616.3 5783.9 4606.4 3680.3 5399.4
8051.2 11077.9 6556.9 6800.9 5495.9 3911.9 7853.9 7137.9 5679.9 4917.3
9828.1 9178.9 6574.8 8254.7 7820.9 6768.4 6594.6 8925.9 7984.7 5398.0
7930.1 8375.9 5576.5 6700.2 6122.8 7865.4 6724.0 7182.6 6014.0 4173.1
8304.1 10342.8 7194.2 7859.9 5345.8 9499.1 3981.0 5208.5 6159.1 7418.5

Table 3
Results of applying the procedures to the data sets

Non-central t     Z_{m,n}(θ)        Z^M_{m,n}(θ)      Z^NP_{m,n}(θ)     Z^{M,NP}_{m,n}(θ)
(0.410, 0.583)    (0.405, 0.633)    (0.410, 0.640)    (0.395, 0.628)    (0.400, 0.635)
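As a check on the example, the normal-theory interval of Section 3 can be recomputed from the summary statistics alone. The Python sketch below does this under one assumption: that the column labelled $Z_{m,n}(\theta)$ in Table 3 is the delta-method interval with separate variance estimates $s_1^2$ and $s_2^2$. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def delta_ratio_interval(xbar, ybar, s1sq, s2sq, m, n, p1, p2, alpha=0.05):
    """Approximate 100(1-alpha)% interval for the ratio of normal percentiles
    based on the large-sample (delta-method) variance."""
    zp1 = NormalDist().inv_cdf(1 - p1)  # upper 100*p1-th normal percentile
    zp2 = NormalDist().inv_cdf(1 - p2)
    xi1 = xbar - zp1 * math.sqrt(s1sq)  # estimated numerator percentile
    xi2 = ybar - zp2 * math.sqrt(s2sq)  # estimated denominator percentile
    var = ((2 + zp1 ** 2) / (2 * m) * s1sq / xi2 ** 2
           + (2 + zp2 ** 2) / (2 * n) * xi1 ** 2 * s2sq / xi2 ** 4)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var)
    est = xi1 / xi2
    return est, est - half_width, est + half_width


# Summary statistics of the lumber example (Section 5).
est, lo, hi = delta_ratio_interval(
    xbar=4840.3, ybar=7077.9, s1sq=2354470.0, s2sq=2528864.0,
    m=107, n=103, p1=0.05, p2=0.05)
print(f"estimate {est:.3f}, interval ({lo:.3f}, {hi:.3f})")
```

With these inputs the limits should agree with the $Z_{m,n}(\theta)$ entry of Table 3 to three decimals, which supports the reading of that column assumed here.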


Table 4
Estimated coverage probabilities, N_T = 5000 replications. The estimated standard deviations are given in parentheses. The simulated case has the denominator percentile close to zero: μ1 = 1.93, σ1 = 2, ξ_{1,0.05} = −1.36; μ2 = 1.93, σ2 = 1, ξ_{2,0.05} = 0.285.

Z_{m,n}(θ)         Z^M_{m,n}(θ)       Z^NP_{m,n}(θ)      Z^{M,NP}_{m,n}(θ)
87.58% (0.4664)    94.86% (0.3123)    88.82% (0.4456)    93.18% (0.3565)

Z^M_{m,n}(θ)              Interval            Outside interval    Real line    Overall
Number                    N_I = 2329          N_C = 2669          N_R = 2      N_T = 5000
Coverage number           2147                2594                2            4743
Percentage of coverage    92.19% (0.3795%)    97.19% (0.2337%)    100% (0)     94.86% (0.3123%)

Z^{M,NP}_{m,n}(θ)         Interval            Outside interval    Real line    Overall
Number                    N_I = 1582          N_C = 3376          N_R = 42     N_T = 5000
Coverage number           1360                3257                42           4659
Percentage of coverage    85.97% (0.4912%)    96.48% (0.2606%)    100% (0)     93.18% (0.3565%)
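The normal-theory part of this comparison can be imitated with a small Monte Carlo sketch in Python. It is not the authors' simulation code: the function name, the seed and the default number of replications are assumptions made for illustration, and coverage of the modified procedure is counted as the event that $|Z^M_{m,n}(\theta)| \le z_{\alpha/2}$ at the true $\theta$.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def simulate_coverage(reps=1000, m=100, n=100, p=0.05, alpha=0.05, seed=1):
    """Estimate coverage of the delta-method interval and of the modified
    region for the near-zero-denominator case of Table 4."""
    rng = random.Random(seed)
    zp = NormalDist().inv_cdf(1 - p)
    za = NormalDist().inv_cdf(1 - alpha / 2)
    mu1, sigma1, mu2, sigma2 = 1.93, 2.0, 1.93, 1.0
    theta = (mu1 - zp * sigma1) / (mu2 - zp * sigma2)  # true ratio
    w = 1 + zp ** 2 / 2
    hits_delta = 0
    hits_modified = 0
    for _ in range(reps):
        xs = [rng.gauss(mu1, sigma1) for _ in range(m)]
        ys = [rng.gauss(mu2, sigma2) for _ in range(n)]
        s1, s2 = stdev(xs), stdev(ys)
        xi1 = fmean(xs) - zp * s1
        xi2 = fmean(ys) - zp * s2
        v1 = w * s1 ** 2 / m  # estimated variance of xi1
        v2 = w * s2 ** 2 / n  # estimated variance of xi2
        # Delta-method interval: is theta within estimate +/- za * se?
        se = math.sqrt(v1 / xi2 ** 2 + xi1 ** 2 * v2 / xi2 ** 4)
        hits_delta += int(abs(xi1 / xi2 - theta) <= za * se)
        # Modified region: theta is covered when |Z^M(theta)| <= za.
        z_mod = (xi1 - theta * xi2) / math.sqrt(v1 + theta ** 2 * v2)
        hits_modified += int(abs(z_mod) <= za)
    return hits_delta / reps, hits_modified / reps
```

Calling `simulate_coverage()` returns the two estimated coverage rates. With far fewer replications than the 5000 used for Table 4 the estimates are noisier, but the modified procedure should come out much closer to the nominal 95% than the delta-method interval.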

A comparison of the modified procedures reveals that the non-parametric approach results in substantially fewer actual intervals. See Huang (2001) for additional simulations and examples.

Acknowledgements

The authors wish to thank James Evans, United States Forest Products Laboratory, for providing the data for the example.

References

Aplin, E.N., Green, D.W., Evans, J.W., Barrett, J.D., 1986. The influence of moisture content on the flexural properties of Douglas fir dimension lumber. Research Paper 475, US Department of Agriculture, Forest Service, Forest Products Laboratory, Madison, WI.
Evans, J.W., Johnson, R.A., Green, D.W., Verrill, S.P., 2005. Applications of the Weibull distribution in wood engineering. In: Balakrishnan, N. (Ed.), Weibull Distributions, to appear.
Fieller, E.C., 1954. Some problems in interval estimation. J. Roy. Statist. Soc. Ser. B 16, 175–185.
Huang, L.-F., 2001. Confidence regions for the ratio of percentiles. Ph.D. Dissertation, Department of Statistics, University of Wisconsin, Madison.
Hwang, J.T.G., 1995. Fieller's problems and resampling techniques. Statist. Sinica 5, 161–171.
Johnson, R.A., Huang, L.-F., 2003. Some exact and approximate confidence regions for the ratio of percentiles from two different distributions. In: Lindqvist, B., Doksum, K. (Eds.), Mathematical and Statistical Methods in Reliability. World Scientific, Singapore, pp. 455–468.
Johnson, N., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, vol. 2, second ed. Wiley-Interscience, New York.
McDonald, G.C., 1981. Confidence intervals for vehicle emission deterioration factors. Technometrics 23, 239–242.
Ogawa, J., 1983. On the 'Confidence Bounds' of the ratio of the means of a bivariate normal distribution. Ann. Inst. Statist. Math. 35, 41–48.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
