ARTICLE IN PRESS
Statistics & Probability Letters 76 (2006) 384–392 www.elsevier.com/locate/stapro
Confidence regions for the ratio of percentiles

Li-Fei Huang (a,*), Richard A. Johnson (b)

(a) Applied Statistics and Information Science Department, Ming Chuan University, Taipei, Taiwan
(b) Department of Statistics, University of Wisconsin, Madison WI 53706, USA

Received 28 November 2004; received in revised form 18 July 2005
Available online 15 September 2005
Abstract

Motivated by applications in the lumber industry, we derive confidence regions for the ratio of percentiles from two different populations. Generalizing work on inferences concerning the ratio of two means, we develop an exact confidence procedure when the two populations are normal and have the same variance. Other cases for normal populations are treated by large sample methods. General populations are also treated in a large sample context. The different large sample procedures are compared with a small simulation study. An example, using strength of lumber data, is also given.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Non-central t distribution; Non-parametric estimation; Strength of lumber; Large sample comparisons
1. Introduction

In the wood industry, it is common practice to compare two different strength properties for lumber of the same dimension, grade and species, or the same strength property for lumber of two different dimensions, grades or species. Engineers often express a comparison in terms of the ratio of two strength properties, for example, the ratio of mean bending strengths. Because United States lumber standards are given in terms of population fifth percentiles, the ratio is often expressed in terms of the fifth percentiles of two strength distributions rather than the means. Aplin et al. (1986) give point estimates of the ratio of dry to green lumber. We develop confidence regions for the ratios of percentiles from two different populations. Both normal population and non-parametric procedures are derived.

Most of the existing literature on ratios deals with the ratio of means and, in particular, ratios of means of normal distributions. Fieller (1954) and Ogawa (1983) treat bivariate normal distributions. McDonald (1981) obtains confidence regions for the ratio of means of two independent normal distributions arising in a straight-line linear model. Hwang (1995) uses a resampling approach to construct confidence regions for this same ratio. Evans et al. (2005) show how to obtain large sample confidence intervals for the ratio of Weibull percentiles.

Although it is common to compare the same percentiles of two different distributions $F_1(\cdot)$ and $F_2(\cdot)$, initially we allow for different percentiles. Let $p_1$ and $p_2$ be specified, so
\[
\xi_{1p_1} = \inf\{x : F_1(x) \ge p_1\} \quad\text{and}\quad \xi_{2p_2} = \inf\{y : F_2(y) \ge p_2\}
\]
are the first population's $100p_1$th percentile and the second population's $100p_2$th percentile, respectively. The corresponding population ratio of percentiles is $\theta = \xi_{1p_1}/\xi_{2p_2}$.

* Corresponding author. E-mail address: [email protected] (L.-F. Huang).
0167-7152/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2005.08.034

2. Confidence regions for the ratio of percentiles: independent normal distributions

We obtain exact confidence regions for the ratio of percentiles for two different normal distributions when the ratio of variances is known. To set notation, let $X_i$, $i = 1, \ldots, m$, be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_j$, $j = 1, \ldots, n$, be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the samples be independent. Here, $\xi_{ip_i} = \mu_i + \Phi^{-1}(p_i)\sigma_i$ is the lower $100p_i$th percentile, $i = 1, 2$, where $\Phi(\cdot)$ is the standard normal cdf. We also let $z_{p_i}$ be the upper $100p_i$th percentile of the standard normal distribution, so $\xi_{ip_i} = \mu_i - z_{p_i}\sigma_i$ for $i = 1, 2$.

Our first result establishes that a certain random quantity has a non-central t distribution.

Theorem 2.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the random samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$. If the ratio $k = \sigma_1/\sigma_2$ is known, the random quantity
\[
T(\theta; k) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{nk^2}}\right)^{-1}
\frac{(\bar{X} - \theta\bar{Y})/k}{\sqrt{\dfrac{\sum_{i=1}^{m}(X_i - \bar{X})^2/k^2 + \sum_{j=1}^{n}(Y_j - \bar{Y})^2}{m+n-2}}}
\]
follows the non-central t distribution with $m+n-2$ degrees of freedom and non-centrality parameter
\[
\delta(\theta; k) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{nk^2}}\right)^{-1}\left(z_{p_1} - \frac{\theta}{k}\, z_{p_2}\right).
\]

Remark. If the two variances are the same, that is, $k = 1$, then
\[
T(\theta) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{n}}\right)^{-1}
\frac{\bar{X} - \theta\bar{Y}}{\sqrt{\dfrac{\sum_{i=1}^{m}(X_i - \bar{X})^2 + \sum_{j=1}^{n}(Y_j - \bar{Y})^2}{m+n-2}}}
\]
follows the non-central t distribution with $m+n-2$ degrees of freedom and non-centrality parameter
\[
\delta(\theta) = \left(\sqrt{\frac{1}{m} + \frac{\theta^2}{n}}\right)^{-1}\left(z_{p_1} - \theta z_{p_2}\right).
\]

Proof. Here $k = \sigma_1/\sigma_2$ and $\theta = (\mu_1 - z_{p_1}\sigma_1)/(\mu_2 - z_{p_2}\sigma_2)$. Under the assumptions, the random quantity $(\bar{X} - \theta\bar{Y})/(k\sigma_2)$ is distributed as $N(0,\, 1/m + \theta^2/(nk^2))$ plus the constant $\mu_1/\sigma_1 - \mu_2\theta/\sigma_1 = z_{p_1} - (\theta/k) z_{p_2}$, and
\[
\frac{\sum_{i=1}^{m}(X_i - \bar{X})^2}{\sigma_1^2} + \frac{\sum_{j=1}^{n}(Y_j - \bar{Y})^2}{\sigma_2^2}
= \frac{\sum_{i=1}^{m}(X_i - \bar{X})^2}{k^2\sigma_2^2} + \frac{\sum_{j=1}^{n}(Y_j - \bar{Y})^2}{\sigma_2^2}
\]
is independently distributed as $\chi^2_{m+n-2}(0)$. The result follows by the definition of the non-central t as a ratio of these random quantities (see Johnson et al., 1995, pp. 508–518).

An exact confidence region for $\theta$ is generated by considering the test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ for each possible $\theta_0$ and then collecting all $\theta_0$ for which the null hypothesis is not rejected. We use an equal tail test. For the given $k$ and any specified significance level $\alpha$, we determine $L(\theta_0; k)$ and $U(\theta_0; k)$ such that
\[
\frac{\alpha}{2} = P[T(\theta_0; k) < L(\theta_0; k)] = P[T(\theta_0; k) > U(\theta_0; k)]
\]
and then reject the null hypothesis if $T(\theta_0; k) < L(\theta_0; k)$ or $T(\theta_0; k) > U(\theta_0; k)$. A $100(1-\alpha)\%$ confidence region is the collection of all $\theta_0$ such that $L(\theta_0; k) \le T(\theta_0; k) \le U(\theta_0; k)$. □

Table 1 illustrates possible confidence regions for the ratio of percentiles from two normal distributions having equal variances. The region can be a bounded interval, the complement of an interval, or the whole real line. The complement of an interval occurs when the estimated percentile in the denominator is too close to zero, and the whole real line occurs when both estimated percentiles are too close to zero. Table 1 also shows that the regions for $\xi_{2p_2}/\xi_{1p_1}$ can be obtained from those for the ratio $\xi_{1p_1}/\xi_{2p_2}$.

Table 1
Some 95% confidence regions for the ratio of percentiles of two independent normal populations with equal variances ($s_{\mathrm{pooled}} = 2$)

                                     m = n   xbar = 5, ybar = 10   xbar = 10, ybar = 5
Means (p1 = p2 = 0.5)                10      (0.359, 0.659)        (1.518, 2.786)
                                     20      (0.403, 0.605)        (1.651, 2.485)
5th percentiles (p1 = p2 = 0.05)     10      (−0.087, 0.458)       (−∞, −11.368) ∪ (2.183, ∞)
                                     20      (0.055, 0.398)        (2.510, 18.222)

3. Large-sample confidence regions

When large samples are available, we can obtain confidence intervals for the ratio of percentiles even when the ratio of variances is unknown. The first result still requires normal populations.

Theorem 3.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$; let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$; and let the samples be independent. If $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$ and $\xi_{2p_2} \ne 0$, then
\[
\sqrt{m+n}\left(\frac{\bar{X} - z_{p_1}S_1}{\bar{Y} - z_{p_2}S_2} - \frac{\xi_{1p_1}}{\xi_{2p_2}}\right)
\]
converges in distribution to the normal with mean 0 and variance
\[
\left(1 + \frac{z_{p_1}^2}{2}\right)\frac{\sigma_1^2}{\lambda\,\xi_{2p_2}^2}
+ \left(1 + \frac{z_{p_2}^2}{2}\right)\frac{\xi_{1p_1}^2\,\sigma_2^2}{(1-\lambda)\,\xi_{2p_2}^4}.
\]

Proof. Since $S_1 = \sqrt{\sum_{i=1}^{m}(X_i - \bar{X})^2/(m-1)}$ and $S_2 = \sqrt{\sum_{j=1}^{n}(Y_j - \bar{Y})^2/(n-1)}$, from the central limit theorem and a square root transformation on the sample variances, we have the well-known result
\[
\sqrt{m+n}\left(\begin{bmatrix}\bar{X}\\ S_1\\ \bar{Y}\\ S_2\end{bmatrix} - \begin{bmatrix}\mu_1\\ \sigma_1\\ \mu_2\\ \sigma_2\end{bmatrix}\right)
\xrightarrow{D} N_4(0, \Sigma), \qquad (1)
\]
where $\Sigma = \mathrm{diag}[\sigma_1^2/\lambda,\ \sigma_1^2/(2\lambda),\ \sigma_2^2/(1-\lambda),\ \sigma_2^2/(2(1-\lambda))]$. To apply the delta method, we set $t = (t_1, t_2, t_3, t_4)'$ and
\[
g(t) = \frac{t_1 - z_{p_1}t_2}{t_3 - z_{p_2}t_4}.
\]
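We conclude that $g((\bar{X}, S_1, \bar{Y}, S_2)') = (\bar{X} - z_{p_1}S_1)/(\bar{Y} - z_{p_2}S_2)$ is asymptotically normal with mean $\theta$ and covariance matrix $(m+n)^{-1}\Sigma_g$, where
\[
\Sigma_g = \left(\frac{\partial g}{\partial t}\right)' \mathrm{diag}\!\left[\frac{\sigma_1^2}{\lambda},\ \frac{\sigma_1^2}{2\lambda},\ \frac{\sigma_2^2}{1-\lambda},\ \frac{\sigma_2^2}{2(1-\lambda)}\right] \left(\frac{\partial g}{\partial t}\right)
= \left(1 + \frac{z_{p_1}^2}{2}\right)\frac{\sigma_1^2}{\lambda\,\xi_{2p_2}^2}
+ \left(1 + \frac{z_{p_2}^2}{2}\right)\frac{\xi_{1p_1}^2\,\sigma_2^2}{(1-\lambda)\,\xi_{2p_2}^4}. \qquad \square
\]

Remark. If $\sigma_1 = \sigma_2 = \sigma$, the limiting distribution is
\[
N\!\left(\frac{\xi_{1p_1}}{\xi_{2p_2}},\ \sigma^2\left[\left(1 + \frac{z_{p_1}^2}{2}\right)\frac{1}{\lambda\,\xi_{2p_2}^2}
+ \left(1 + \frac{z_{p_2}^2}{2}\right)\frac{\xi_{1p_1}^2}{(1-\lambda)\,\xi_{2p_2}^4}\right]\right).
\]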
In any case, $\xi_{1p_1}$ can be estimated by $\bar{x} - z_{p_1}s_1$, $\xi_{2p_2}$ can be estimated by $\bar{y} - z_{p_2}s_2$, $\sigma_1^2$ can be estimated by $s_1^2$, and $\sigma_2^2$ can be estimated by $s_2^2$. Therefore, an approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is
\[
\frac{\bar{x} - z_{p_1}s_1}{\bar{y} - z_{p_2}s_2} \pm z_{\alpha/2}
\sqrt{\frac{2 + z_{p_1}^2}{2m}\,\frac{s_1^2}{(\bar{y} - z_{p_2}s_2)^2}
+ \frac{2 + z_{p_2}^2}{2n}\,\frac{(\bar{x} - z_{p_1}s_1)^2 s_2^2}{(\bar{y} - z_{p_2}s_2)^4}},
\]
which is always a bounded interval. However, the simulations in Section 5 show poor coverage rates when $\xi_{2p_2}$ is near zero.

When sample sizes are large, a non-parametric approach is possible. Let $[w]$ denote the smallest integer that is greater than or equal to $w$. We estimate $\xi_{1p_1}$ by the order statistic $\hat{\xi}_{1p_1} = X_{(r_1)}$ where $r_1 = [mp_1]$, and estimate $\xi_{2p_2}$ by the order statistic $\hat{\xi}_{2p_2} = Y_{(r_2)}$ where $r_2 = [np_2]$.

Theorem 3.2. Let $X_1, \ldots, X_m$ be a random sample from $F_1(\cdot)$; let $Y_1, \ldots, Y_n$ be a random sample from $F_2(\cdot)$; and let the samples be independent. If the population density function $F_i'(\cdot)$ is positive and continuous in a neighborhood of $\xi_{ip_i}$, for $i = 1, 2$, $\lim_{n,m\to\infty} m/(n+m) = \lambda$ $(0 < \lambda < 1)$, and $\xi_{2p_2} \ne 0$, then
\[
\sqrt{m+n}\left(\frac{\hat{\xi}_{1p_1}}{\hat{\xi}_{2p_2}} - \frac{\xi_{1p_1}}{\xi_{2p_2}}\right)
\]
converges in distribution to the normal distribution with mean 0 and variance
\[
\frac{p_1(1-p_1)}{\lambda\,\xi_{2p_2}^2\,[F_1'(\xi_{1p_1})]^2}
+ \frac{\xi_{1p_1}^2\,p_2(1-p_2)}{(1-\lambda)\,\xi_{2p_2}^4\,[F_2'(\xi_{2p_2})]^2}.
\]

Proof. Since $[\hat{\xi}_{1p_1}, \hat{\xi}_{2p_2}]'$ is asymptotically normal with mean $[\xi_{1p_1}, \xi_{2p_2}]'$ and covariance matrix
\[
\frac{1}{m+n}\,\mathrm{diag}\!\left[\frac{p_1(1-p_1)}{\lambda\,[F_1'(\xi_{1p_1})]^2},\ \frac{p_2(1-p_2)}{(1-\lambda)\,[F_2'(\xi_{2p_2})]^2}\right],
\]
applying the delta method with $g((t_1, t_2)') = t_1/t_2$ gives the result. The resulting approximate $100(1-\alpha)\%$ confidence interval for $\theta$ is
\[
\frac{\hat{\xi}_{1p_1}}{\hat{\xi}_{2p_2}} \pm z_{\alpha/2}
\sqrt{\frac{p_1(1-p_1)}{m\,\hat{\xi}_{2p_2}^2\,[\hat{F}_1'(\hat{\xi}_{1p_1})]^2}
+ \frac{\hat{\xi}_{1p_1}^2\,p_2(1-p_2)}{n\,\hat{\xi}_{2p_2}^4\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}},
\]
where we estimate the value of the density function $F_i'(\xi_{ip_i})$ by the consistent kernel estimates $\hat{F}_i'(\hat{\xi}_{ip_i})$ for $i = 1, 2$, as suggested in Silverman (1986). □
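The two large-sample intervals of this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, and the density estimate uses a Gaussian kernel with the normal-reference bandwidth $1.06\,s\,n^{-1/5}$, which is an assumption made here for concreteness (the kernel actually used in the paper is described in Johnson and Huang, 2003).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_ratio_ci(xbar, s1, m, ybar, s2, n, p1=0.05, p2=0.05, alpha=0.05):
    """Normal-theory large-sample interval for theta = xi_1p1 / xi_2p2.

    z_p is the UPPER 100p-th percentile of N(0, 1), so xi = mean - z_p * sd.
    """
    z1 = STD_NORMAL.inv_cdf(1.0 - p1)
    z2 = STD_NORMAL.inv_cdf(1.0 - p2)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    xi1 = xbar - z1 * s1  # estimated percentile, population 1
    xi2 = ybar - z2 * s2  # estimated percentile, population 2
    var = ((2.0 + z1 ** 2) / (2.0 * m)) * s1 ** 2 / xi2 ** 2 \
        + ((2.0 + z2 ** 2) / (2.0 * n)) * xi1 ** 2 * s2 ** 2 / xi2 ** 4
    half = z_crit * math.sqrt(var)
    return xi1 / xi2 - half, xi1 / xi2 + half


def gaussian_kde_at(data, point):
    """Gaussian-kernel density estimate at one point.

    ASSUMPTION: Gaussian kernel with bandwidth 1.06 * s * n^(-1/5);
    this is an illustrative choice, not the kernel used in the paper.
    """
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((point - v) / h) ** 2) for v in data)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def nonparametric_ratio_ci(x, y, p1=0.05, p2=0.05, alpha=0.05):
    """Order-statistic interval for theta with kernel density estimates."""
    m, n = len(x), len(y)
    # r = smallest integer >= (sample size * p); the tiny offset guards
    # against floating-point round-up when the product is an integer.
    r1 = max(1, math.ceil(m * p1 - 1e-9))
    r2 = max(1, math.ceil(n * p2 - 1e-9))
    xi1 = sorted(x)[r1 - 1]
    xi2 = sorted(y)[r2 - 1]
    f1 = gaussian_kde_at(x, xi1)
    f2 = gaussian_kde_at(y, xi2)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    var = p1 * (1.0 - p1) / (m * xi2 ** 2 * f1 ** 2) \
        + xi1 ** 2 * p2 * (1.0 - p2) / (n * xi2 ** 4 * f2 ** 2)
    half = z_crit * math.sqrt(var)
    return xi1 / xi2 - half, xi1 / xi2 + half
```

Called with the summary statistics of Section 5, `normal_ratio_ci(4840.3, math.sqrt(2354470), 107, 7077.9, math.sqrt(2528864), 103)` gives approximately (0.405, 0.633), in agreement with the $Z_{m,n}(\theta)$ entry of Table 3. The non-parametric interval depends on the kernel and bandwidth, so it should not be expected to match the paper's $Z^{NP}_{m,n}(\theta)$ entry exactly.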
4. An alternative large-sample approach

For normal populations, under our alternative approach, we consider $\bar{X} - z_{p_1}S_1 - \theta(\bar{Y} - z_{p_2}S_2)$ and derive large sample confidence regions that may be intervals, complements of intervals, or sometimes even the whole real line.

Theorem 4.1. Let $X_1, \ldots, X_m$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, let $Y_1, \ldots, Y_n$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, and let the two random samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$ be unknown but finite. Under the condition that $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$, the random quantity
\[
\sqrt{m+n}\,\bigl(\bar{X} - z_{p_1}S_1 - \theta(\bar{Y} - z_{p_2}S_2)\bigr)
\]
converges in distribution to the normal with mean 0 and variance
\[
\frac{1}{\lambda}\left(1 + \frac{z_{p_1}^2}{2}\right)\sigma_1^2 + \theta^2\,\frac{1}{1-\lambda}\left(1 + \frac{z_{p_2}^2}{2}\right)\sigma_2^2.
\]

Proof. From the normal limit in (1), we conclude that the linear combination
\[
\sqrt{m+n}\,\bigl(\bar{X} - z_{p_1}S_1 - \theta(\bar{Y} - z_{p_2}S_2)\bigr)
\]
converges in distribution to the normal with mean 0 and variance
\[
\frac{1}{\lambda}\left(1 + \frac{z_{p_1}^2}{2}\right)\sigma_1^2 + \theta^2\,\frac{1}{1-\lambda}\left(1 + \frac{z_{p_2}^2}{2}\right)\sigma_2^2.
\]
Under the condition that $\theta = \xi_{1p_1}/\xi_{2p_2}$, according to Slutsky's theorem,
\[
Z^{M}_{m,n}(\theta) = \frac{\bar{X} - z_{p_1}S_1 - \theta(\bar{Y} - z_{p_2}S_2)}
{\sqrt{\left(1 + \dfrac{z_{p_1}^2}{2}\right)\dfrac{S_1^2}{m} + \theta^2\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}}} \qquad (2)
\]
is, asymptotically, standard normal. As a function of $\theta$, $Z^{M}_{m,n}(\theta)$ has a single maximum which is, in absolute value, greater than the asymptotes. In particular, $Z^{M}_{m,n}(\theta)$ satisfies
\[
\lim_{\theta\to\infty} Z^{M}_{m,n}(\theta) = -\frac{\bar{Y} - z_{p_2}S_2}{\sqrt{\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}}},
\qquad
\lim_{\theta\to-\infty} Z^{M}_{m,n}(\theta) = \frac{\bar{Y} - z_{p_2}S_2}{\sqrt{\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}}}
\]
and
\[
\frac{d}{d\theta} Z^{M}_{m,n}(\theta) =
-\frac{\left(1 + \dfrac{z_{p_1}^2}{2}\right)\dfrac{S_1^2}{m}\,(\bar{Y} - z_{p_2}S_2) + \theta\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}\,(\bar{X} - z_{p_1}S_1)}
{\left[\left(1 + \dfrac{z_{p_1}^2}{2}\right)\dfrac{S_1^2}{m} + \theta^2\left(1 + \dfrac{z_{p_2}^2}{2}\right)\dfrac{S_2^2}{n}\right]^{3/2}}.
\]
Consequently, the derivative vanishes at a unique value
\[
\theta^{*} = -\left[\left(1 + \frac{z_{p_1}^2}{2}\right)\frac{S_1^2}{m}\,(\bar{Y} - z_{p_2}S_2)\right] \Bigg/ \left[\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}\,(\bar{X} - z_{p_1}S_1)\right].
\]
Further, the derivative converges to 0 as $\theta \to \pm\infty$ and has the same sign as $\bar{X} - z_{p_1}S_1$ for sufficiently negative $\theta$. Moreover, since the maximum occurs at $\theta^{*}$, with probability one, we have
\[
|Z^{M}_{m,n}(\theta^{*})| > |\bar{Y} - z_{p_2}S_2| \Bigg/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}}.
\]
Consequently, the large sample $100(1-\alpha)\%$ confidence region becomes

(a) an interval if
\[
z_{\alpha/2} < |\bar{Y} - z_{p_2}S_2| \Bigg/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} < |Z^{M}_{m,n}(\theta^{*})|;
\]

(b) the complement of an interval if
\[
|\bar{Y} - z_{p_2}S_2| \Bigg/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} \le z_{\alpha/2} < |Z^{M}_{m,n}(\theta^{*})|;
\]

(c) the whole real line if
\[
|\bar{Y} - z_{p_2}S_2| \Bigg/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} < |Z^{M}_{m,n}(\theta^{*})| \le z_{\alpha/2}.
\]

We see that the region fails to be a bounded interval if the percentile in the denominator is not significantly different from zero, that is, if
\[
|\bar{Y} - z_{p_2}S_2| \Bigg/ \sqrt{\left(1 + \frac{z_{p_2}^2}{2}\right)\frac{S_2^2}{n}} \le z_{\alpha/2}. \qquad \square
\]

For our alternative non-parametric procedure, we consider $\hat{\xi}_{1p_1} - \theta\hat{\xi}_{2p_2}$.
Theorem 4.2. Let $X_1, \ldots, X_m$ be a random sample from $F_1$, which has a positive continuous derivative $F_1'(\cdot)$ in a neighborhood of $\xi_{1p_1}$, let $Y_1, \ldots, Y_n$ be a random sample from $F_2$, which has a positive continuous derivative $F_2'(\cdot)$ in a neighborhood of $\xi_{2p_2}$, and let the two samples be independent. Let $\theta = \xi_{1p_1}/\xi_{2p_2}$ be unknown but finite. Under the condition that $\lim_{m,n\to\infty} m/(m+n) = \lambda$ $(0 < \lambda < 1)$,
\[
\sqrt{m+n}\,(\hat{\xi}_{1p_1} - \theta\hat{\xi}_{2p_2})
\]
converges in distribution to the normal with mean 0 and variance
\[
\frac{p_1(1-p_1)}{\lambda\,[F_1'(\xi_{1p_1})]^2} + \theta^2\,\frac{p_2(1-p_2)}{(1-\lambda)\,[F_2'(\xi_{2p_2})]^2}.
\]

Proof. From the proof of Theorem 3.2, we conclude that $\sqrt{m+n}\,(\hat{\xi}_{1p_1} - \theta\hat{\xi}_{2p_2})$ converges in distribution to the normal with mean 0 and variance
\[
\frac{p_1(1-p_1)}{\lambda\,[F_1'(\xi_{1p_1})]^2} + \theta^2\,\frac{p_2(1-p_2)}{(1-\lambda)\,[F_2'(\xi_{2p_2})]^2}.
\]
Under the conditions of Theorem 4.2, we have that
\[
\frac{\hat{\xi}_{1p_1} - \theta\hat{\xi}_{2p_2}}
{\sqrt{\dfrac{p_1(1-p_1)}{m\,[F_1'(\xi_{1p_1})]^2} + \theta^2\,\dfrac{p_2(1-p_2)}{n\,[F_2'(\xi_{2p_2})]^2}}}
\]
is, asymptotically, a standard normal random variable. Introducing the consistent kernel estimates, we consider
\[
Z^{M,NP}_{m,n}(\theta) = \frac{\hat{\xi}_{1p_1} - \theta\hat{\xi}_{2p_2}}
{\sqrt{\dfrac{p_1(1-p_1)}{m\,[\hat{F}_1'(\hat{\xi}_{1p_1})]^2} + \theta^2\,\dfrac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}}},
\]
which has the same structure as $Z^{M}_{m,n}(\theta)$ in (2). Moreover, since the unique maximum occurs at
\[
\theta^{*} = -\left[\frac{p_1(1-p_1)}{m\,[\hat{F}_1'(\hat{\xi}_{1p_1})]^2}\,\hat{\xi}_{2p_2}\right] \Bigg/ \left[\frac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}\,\hat{\xi}_{1p_1}\right],
\]
with probability one, we have
\[
|\hat{\xi}_{2p_2}| \Bigg/ \sqrt{\frac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}} < |Z^{M,NP}_{m,n}(\theta^{*})|.
\]
Also,
\[
\lim_{\theta\to\infty} Z^{M,NP}_{m,n}(\theta) = -\frac{\hat{\xi}_{2p_2}}{\sqrt{\dfrac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}}},
\qquad
\lim_{\theta\to-\infty} Z^{M,NP}_{m,n}(\theta) = \frac{\hat{\xi}_{2p_2}}{\sqrt{\dfrac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}}}.
\]
The $100(1-\alpha)\%$ confidence region becomes

(a) an interval if
\[
z_{\alpha/2} < |\hat{\xi}_{2p_2}| \Bigg/ \sqrt{\frac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}} < |Z^{M,NP}_{m,n}(\theta^{*})|;
\]

(b) the complement of an interval if
\[
|\hat{\xi}_{2p_2}| \Bigg/ \sqrt{\frac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}} \le z_{\alpha/2} < |Z^{M,NP}_{m,n}(\theta^{*})|;
\]

(c) the whole real line if
\[
|\hat{\xi}_{2p_2}| \Bigg/ \sqrt{\frac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}} < |Z^{M,NP}_{m,n}(\theta^{*})| \le z_{\alpha/2}.
\]

The confidence region fails to be a bounded interval if the percentile in the denominator is not significantly different from 0, that is, if
\[
|\hat{\xi}_{2p_2}| \Bigg/ \sqrt{\frac{p_2(1-p_2)}{n\,[\hat{F}_2'(\hat{\xi}_{2p_2})]^2}} \le z_{\alpha/2}. \qquad \square
\]
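Both statistics of this section have the form $Z(\theta) = (A - \theta B)/\sqrt{a + \theta^2 b}$, so the region $\{\theta : |Z(\theta)| \le z_{\alpha/2}\}$ is the solution set of the quadratic inequality $(B^2 - z_{\alpha/2}^2 b)\theta^2 - 2AB\theta + (A^2 - z_{\alpha/2}^2 a) \le 0$. The following Python sketch, with hypothetical function names and under that reading of cases (a) to (c), classifies the region and returns its end points. It is an illustration rather than the authors' implementation, and the boundary case $B^2 = z_{\alpha/2}^2 b$ (a half-line) is deliberately left unhandled.

```python
import math
from statistics import NormalDist


def modified_region(A, B, a, b, alpha=0.05):
    """Region {theta : |A - theta*B| / sqrt(a + theta^2 * b) <= z_{alpha/2}}.

    Returns ("interval", lo, hi) for [lo, hi],
            ("complement", lo, hi) for (-inf, lo] U [hi, inf),
            ("real line", None, None) for the whole real line.
    """
    z_sq = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    qa = B * B - z_sq * b      # coefficient of theta^2
    qb = -2.0 * A * B          # coefficient of theta
    qc = A * A - z_sq * a      # constant term
    if qa == 0.0:
        raise ValueError("boundary case (half-line) is not handled in this sketch")
    disc = qb * qb - 4.0 * qa * qc
    if qa > 0.0:
        # Case (a): denominator significantly different from zero.
        # disc > 0 is guaranteed here because B^2 > z^2 * b.
        root = math.sqrt(disc)
        return ("interval", (-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa))
    if disc <= 0.0:
        # Case (c): the inequality holds for every theta.
        return ("real line", None, None)
    # Case (b): the parabola opens downward, so the inequality holds
    # outside the two roots.
    root = math.sqrt(disc)
    lo, hi = sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))
    return ("complement", lo, hi)


def modified_normal_region(xbar, s1, m, ybar, s2, n, p1=0.05, p2=0.05, alpha=0.05):
    """Region based on Z^M_{m,n}(theta) of Eq. (2), from summary statistics."""
    z1 = NormalDist().inv_cdf(1.0 - p1)  # upper 100*p1-th percentile
    z2 = NormalDist().inv_cdf(1.0 - p2)  # upper 100*p2-th percentile
    return modified_region(
        xbar - z1 * s1,
        ybar - z2 * s2,
        (1.0 + z1 ** 2 / 2.0) * s1 ** 2 / m,
        (1.0 + z2 ** 2 / 2.0) * s2 ** 2 / n,
        alpha,
    )
```

With the summary statistics of Section 5, `modified_normal_region(4840.3, math.sqrt(2354470), 107, 7077.9, math.sqrt(2528864), 103)` returns an interval of approximately (0.410, 0.640), in agreement with the $Z^{M}_{m,n}(\theta)$ entry of Table 3. The non-parametric statistic $Z^{M,NP}_{m,n}(\theta)$ can be passed to `modified_region` directly with $A = \hat{\xi}_{1p_1}$, $B = \hat{\xi}_{2p_2}$ and the two estimated variance terms as `a` and `b`.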
5. Example and simulation

We consider data (see Table 2) on the modulus of rupture (MOR) of Douglas fir specimens, provided to us by J. Evans, whose percentile estimates were given in Aplin et al. (1986). The normal plots suggest that the data are normally distributed, and it is reasonable to assume that the variances are equal, so $k = 1$. From the summary statistics

$m = 107$, $n = 103$,
$s_1^2 = 2{,}354{,}470$, $s_2^2 = 2{,}528{,}864$,
$\bar{x} = 4840.3$, $\bar{y} = 7077.9$,
$s_{\mathrm{pooled}} = 1565.8$,
$x_{(6)} = 2350.2$, $y_{(6)} = 4597.8$,

we determine 95% confidence intervals for the ratio of 5th percentiles (see Johnson and Huang, 2003, for the particular choice of kernel suggested by Silverman, 1986). All of the methods give similar results (see Table 3) because $\xi_{2,0.05}$ is far from 0.

Table 2
Data sets

MOR (lb/in^2) of 2 × 4 in, Grade 2, Green (30% moisture content)
5418.6 4795.9 7061.8 6617.3 6136.7 7529.2 6455.7 6082.7 5511.1 5231.6 5851.4
4281.8 4230.9 2524.8 4896.3 6351.2 2764.8 4432.9 5128.6 5681.0 4690.8 2327.6
2642.7 4760.3 3340.9 3207.7 5270.6 1420.3 3015.2 4451.7 2444.5 3747.0 3879.0
6307.9 6964.0 6357.9 7643.6 5976.8 4607.6 9213.0 6051.4 4161.0 4818.4 5325.4
5651.6 5894.0 2350.2 3022.1 4187.7 4274.4 3854.5 1931.2 3556.0 4392.0 2105.5
6674.1 7311.8 4414.5 4050.5 5325.4 3917.7 2822.7 4420.0 2748.2 3396.4 5161.1
8153.4 6997.6 5268.4 5677.2 5818.8 4429.6 5465.8 3095.2 4461.2 6767.2 4658.4
6843.5 4533.1 8145.4 5531.8 4787.2 3938.0 3770.9 5289.3 4892.6 2566.1
7011.3 5691.8 4616.1 4872.4 5988.6 5143.3 3168.1 3440.7 5078.2 1228.4
5817.3 6245.9 3508.1 3677.3 5530.9 4044.5 4994.6 4533.1 5278.7 5883.6

MOR (lb/in^2) of 2 × 6 in, select structural, Green (30% moisture content)
9579.3 6475.0 8374.9 9364.2 7896.3 8041.5 9024.7 7614.5 8793.8 6836.1
7774.3 6724.0 8018.8 7078.2 9015.2 8935.7 7271.5 7776.8 7962.0 7806.5
7762.6 5169.9 8267.5 4597.8 6598.4 5253.9 8643.2 7323.3 6835.0 7449.2
5477.9 7288.0 6244.2 6893.2 10100.9 5548.2 6922.1 5120.6 6067.5 4639.9
9265.1 10653.4 6720.6 8685.8 7887.9 5045.7 8388.6 7684.5 7418.5 6575.0
7445.3 6533.7 7099.8 6842.3 7803.5 5616.3 5783.9 4606.4 3680.3 5399.4
8051.2 11077.9 6556.9 6800.9 5495.9 3911.9 7853.9 7137.9 5679.9 4917.3
9828.1 9178.9 6574.8 8254.7 7820.9 6768.4 6594.6 8925.9 7984.7 5398.0
7930.1 8375.9 5576.5 6700.2 6122.8 7865.4 6724.0 7182.6 6014.0 4173.1
8304.1 10342.8 7194.2 7859.9 5345.8 9499.1 3981.0 5208.5 6159.1 7418.5

Table 3
Results applying the data sets

Procedure                   95% confidence interval
Non-central t               (0.410, 0.583)
$Z_{m,n}(\theta)$           (0.405, 0.633)
$Z^{M}_{m,n}(\theta)$       (0.410, 0.640)
$Z^{NP}_{m,n}(\theta)$      (0.395, 0.628)
$Z^{M,NP}_{m,n}(\theta)$    (0.400, 0.635)

Under the transformations $X_i \to c_1 X_i$ and $Y_j \to c_2 Y_j$ with $c_1, c_2 > 0$, we have $\theta \to c_1\theta/c_2$ and the values of $Z_{m,n}(\theta)$ and $Z^{M}_{m,n}(\theta)$ remain unchanged. Consequently, the coverage probabilities for the normal theory procedures remain the same for all $c_1, c_2 > 0$. In the non-parametric setting, our kernel satisfies $\hat{F}_i'(c_i\hat{\xi}_{ip_i}) = c_i^{-1}\hat{F}_i'(\hat{\xi}_{ip_i})$, $i = 1, 2$, and so $Z^{NP}_{m,n}(\theta)$ and $Z^{M,NP}_{m,n}(\theta)$ each remain the same. Consequently, the coverage probabilities remain the same for all $c_1, c_2 > 0$.

To illustrate that regions other than intervals are possible and to estimate the actual coverage probabilities, we turn to simulation. In Table 4 we present estimated coverage probabilities for sample sizes $m = 100 = n$ and a case where the denominator is close to zero:

$\mu_1 = 1.93$, $\sigma_1 = 2$, $\xi_{1,0.05} = -1.36$,
$\mu_2 = 1.93$, $\sigma_2 = 1$, $\xi_{2,0.05} = 0.285$.

Table 4
Estimated coverage probabilities. $N_T = 5000$ replications. The estimated standard deviations are given in parentheses.

Procedure                   Estimated coverage
$Z_{m,n}(\theta)$           87.58% (0.4664)
$Z^{M}_{m,n}(\theta)$       94.86% (0.3123)
$Z^{NP}_{m,n}(\theta)$      88.82% (0.4456)
$Z^{M,NP}_{m,n}(\theta)$    93.18% (0.3565)

$Z^{M}_{m,n}(\theta)$       Interval            Outside interval    Real line     Overall
Number                      N_I = 2329          N_C = 2669          N_R = 2       N_T = 5000
Coverage number             2147                2594                2             4743
Percentage of coverage      92.19% (0.3795%)    97.19% (0.2337%)    100% (0)      94.86% (0.3123%)

$Z^{M,NP}_{m,n}(\theta)$    Interval            Outside interval    Real line     Overall
Number                      N_I = 1582          N_C = 3376          N_R = 42      N_T = 5000
Coverage number             1360                3257                42            4659
Percentage of coverage      85.97% (0.4912%)    96.48% (0.2606%)    100% (0)      93.18% (0.3565%)
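A simulation of the kind summarized in Table 4 can be sketched as follows for the modified normal-theory statistic $Z^{M}_{m,n}(\theta)$. This is a minimal sketch under stated assumptions, not the authors' simulation code: the function name, the random number generator, the seed and the default number of replications are all choices made here. A replication covers $\theta$ when $|Z^{M}_{m,n}(\theta)| \le z_{\alpha/2}$, and the region type is classified by cases (a) to (c) of Section 4, using $|Z^{M}_{m,n}(\theta^{*})| = \sqrt{A^2/a + B^2/b}$, which follows from substituting $\theta^{*}$ into (2) with $A$, $B$ the two estimated percentiles and $a$, $b$ the two variance terms.

```python
import math
import random
from statistics import NormalDist


def mean_sd(values):
    """Sample mean and standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd


def simulate_modified_normal(reps=1000, m=100, n=100,
                             mu1=1.93, sigma1=2.0, mu2=1.93, sigma2=1.0,
                             p=0.05, alpha=0.05, seed=1):
    """Monte Carlo coverage of the Z^M region (seed and reps are assumptions)."""
    rng = random.Random(seed)
    zp = NormalDist().inv_cdf(1.0 - p)           # upper 100p-th percentile
    zc = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    theta = (mu1 - zp * sigma1) / (mu2 - zp * sigma2)  # true ratio of percentiles
    counts = {"interval": 0, "complement": 0, "real line": 0}
    covered = 0
    for _ in range(reps):
        xbar, s1 = mean_sd([rng.gauss(mu1, sigma1) for _ in range(m)])
        ybar, s2 = mean_sd([rng.gauss(mu2, sigma2) for _ in range(n)])
        A = xbar - zp * s1                        # estimated numerator percentile
        B = ybar - zp * s2                        # estimated denominator percentile
        a = (1.0 + zp ** 2 / 2.0) * s1 ** 2 / m
        b = (1.0 + zp ** 2 / 2.0) * s2 ** 2 / n
        z_theta = (A - theta * B) / math.sqrt(a + theta ** 2 * b)
        if abs(z_theta) <= zc:
            covered += 1
        if abs(B) / math.sqrt(b) > zc:
            kind = "interval"                     # case (a)
        elif math.sqrt(A * A / a + B * B / b) > zc:
            kind = "complement"                   # case (b)
        else:
            kind = "real line"                    # case (c)
        counts[kind] += 1
    return covered / reps, counts
```

Calling `simulate_modified_normal(reps=5000)` mirrors the design of Table 4 (the defaults are the parameter values listed above). The estimated coverage and the split between intervals and complements of intervals will differ from the tabulated values by Monte Carlo error, since the random numbers are not those of the original study.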
A comparison of the modified procedures reveals that the non-parametric approach results in substantially fewer actual intervals. See Huang (2001) for additional simulations and examples.

Acknowledgements

The authors wish to thank James Evans, United States Forest Products Laboratory, for providing the data for the example.

References

Aplin, E.N., Green, D.W., Evans, J.W., Barrett, J.D., 1986. The influence of moisture content on the flexural properties of douglas fir dimension lumber. Research Paper 475, US Department of Agriculture, Forest Service, Forest Products Laboratory, Madison, WI.
Evans, J.W., Johnson, R.A., Green, D.W., Verrill, S.P., 2005. Applications of the Weibull distribution in wood engineering. In: Balakrishnan, N. (Ed.), Weibull Distributions, to appear.
Fieller, E.C., 1954. Some problems in interval estimation. J. Roy. Statist. Soc. Ser. B 16, 175–185.
Huang, L.-F., 2001. Confidence regions for the ratio of percentiles. Ph.D. Dissertation, Department of Statistics, University of Wisconsin, Madison.
Hwang, J.T.G., 1995. Fieller's problems and resampling techniques. Statist. Sinica 5, 161–171.
Johnson, R.A., Huang, L.-F., 2003. Some exact and approximate confidence regions for the ratio of percentiles from two different distributions. In: Lindqvist, B., Doksum, K. (Eds.), Mathematical and Statistical Methods in Reliability. World Scientific, Singapore, pp. 455–468.
Johnson, N., Kotz, S., Balakrishnan, N., 1995. Continuous Univariate Distributions, vol. 2, second ed. Wiley-Interscience, New York.
McDonald, G.C., 1981. Confidence intervals for vehicle emission deterioration factors. Technometrics 23, 239–242.
Ogawa, J., 1983. On the 'Confidence Bounds' of the ratio of the means of a bivariate normal distribution. Ann. Inst. Statist. Math. 35, 41–48.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.