Normalizing unbiased estimating functions

Journal of Statistical Planning and Inference 136 (2006) 689–704
www.elsevier.com/locate/jspi

Jinfang Wang a,∗,1, Nobuhiro Taneichi b

a Graduate School of Science and Technology, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan
b Obihiro University of Agriculture and Veterinary Medicine, Inada-cho, Obihiro, Hokkaido 080-8555, Japan

Received 9 July 2003; accepted 30 August 2004; available online 13 October 2004

∗ Corresponding author. Tel.: +81 43 290 3663; fax: +81 43 290 2828. E-mail address: [email protected] (J. Wang).
1 Partially supported by Grant-in-Aid 15500179 from the Japanese Ministry of Education, Science, Sports and Culture.

doi:10.1016/j.jspi.2004.08.016

Abstract

We consider a method for setting second-order accurate confidence intervals for a scalar parameter by applying normalizing transformations to unbiased estimating functions. Normalizing a nonlinear estimating function is usually easier than normalizing the estimator defined as the solution to the corresponding estimating equation, since this estimator usually has to be obtained by some iterative algorithm. Numerical examples include a canonical Poisson regression and the estimation of the correlation coefficient. Numerical comparisons are made with the asymptotically equivalent method called the estimating function bootstrap, proposed recently by Hu and Kalbfleisch (Canad. J. Statist. 28 (2000) 449).
© 2004 Elsevier B.V. All rights reserved.

MSC: 62E20; 62F25; 62F40; 62H20; 62J02

Keywords: Bootstrap; Confidence interval; Edgeworth expansion; Second-order accuracy; Nonlinear regression; Normalizing transformation; z-transformation

1. Introduction

Let $Y_1, \ldots, Y_n$ be independent discrete or continuous random variables having possibly different distributions. Assume that these distributions depend on a common scalar


parameter $\theta$, inference about which is to be based on the following unbiased additive estimating function:

$$g(\theta) = \sum_{i=1}^{n} g_i(Y_i, \theta), \qquad (1.1)$$

where $E_\theta\{g_i(Y_i, \theta)\} = 0$ for any $\theta \in \Theta$ and $i = 1, \ldots, n$. The estimating function (1.1) may arise in maximum likelihood estimation, quasi-likelihood inference, weighted least-squares estimation, etc. The root $\hat\theta$ of the estimating equation $g(\hat\theta) = 0$, assumed unique in this paper (see, e.g., Small and Wang (2003) for discussions of estimating equations with multiple roots), defines an estimator of $\theta$. Under mild conditions (e.g., Crowder, 1986), $\hat\theta$ is $\sqrt{n}$-consistent for the true value of $\theta$. Only in exceptional cases will there exist an analytical formula for $\hat\theta$. Usually, $g(\hat\theta) = 0$ has to be solved by some iterative method, such as the iteratively reweighted least squares method (Green, 1984) for generalized linear regressions.

In this paper, we consider a simple analytical method for obtaining an approximate confidence distribution for $\theta$ based on $g(\theta)$ of (1.1). The associated confidence intervals $\hat I = (-\infty, \theta_w)$ are shown to be second-order accurate, meaning that $\Pr_\theta\{\theta \le \theta_w\} = 1 - \alpha + o(n^{-1/2})$, where $1 - \alpha$ is the nominal coverage level and $n$ the sample size. The confidence limit $\theta_w$ is defined as the solution to

$$s_\lambda(\theta_w) = z_\alpha \qquad (\Phi(z_\alpha) = \alpha),$$

where $\Phi(\cdot)$ is the CDF of the unit normal variate $N(0, 1)$, and $s_\lambda(\theta)$ the normalized estimating function defined by (2.8).

There is a large literature on setting accurate confidence intervals. Most methods, however, treat as the primal object an estimator, which is assumed to be available in closed form; nonparametric plug-in estimators are such examples. Given such an estimator, and further conditions on moments, second-order accurate intervals can be obtained by using Cornish–Fisher expansions (Barndorff-Nielsen and Cox, 1989), or asymptotically equivalent methods such as inverse Edgeworth expansions (e.g., Hall, 1983) or theories of normalizing transformations (e.g., Fisher, 1921; Konishi, 1981, 1991). Alternatively, computationally more intensive methods, such as the BCa bootstrap (Efron, 1987), also lead to second-order accurate intervals. When an estimator is available in closed form, the bootstrap is much to be recommended because a good approximation to the bootstrap distribution of the estimator can be computed with relative ease. When an estimator is defined in an iterative manner, as is the case with (1.1), the above methods are either difficult to apply or computationally too costly.

A different strategy is to treat as the basic element the given estimating function $g(\theta)$ rather than the estimator defined by $g(\hat\theta) = 0$. The idea is to associate a confidence distribution with the parameter of interest using the properties of an unbiased estimating function, an idea which may be traced back to the work of Wilks (1938), who proved, among other things, the asymptotic optimality of the score function in terms of the expected length of confidence intervals.


The confidence intervals considered in this paper have the same accuracy as the computationally more intensive method based on the estimating function bootstrap (EF bootstrap) proposed recently by Hu and Kalbfleisch (2000), a brief review of which will be given in Section 3.1.

The rest of the note is organized as follows. In Section 2 we introduce a class of normalizing transformations for unbiased estimating functions. In Section 3 we report numerical results in two cases: a canonical Poisson regression problem and the estimation of the correlation coefficient. Our results are compared with those based on the EF bootstrap and Fisher's z-transformation for Pearson's product moment correlation coefficient. In Section 3 we also give a short analysis of the law school data of Efron (1987). Lastly, Section 4 contains a short discussion of remaining problems which require further study.

2. Normalizing unbiased estimating functions

2.1. Normal theory interval

Let $\theta \in \Theta$ be a scalar parameter and $g(\theta)$ an unbiased additive estimating function defined by (1.1). Assume that the second and third moments of $g(\theta)$ exist. Let $\mu_2(\theta) = n^{-1} E_\theta\{g^2(\theta)\}$ and $\mu_3(\theta) = n^{-1} E_\theta\{g^3(\theta)\}$, both quantities being assumed to depend only on $\theta$. Using the unbiasedness and independence of the $g_i(Y_i, \theta)$, we can write

$$\mu_2(\theta) = n^{-1} \sum_{i=1}^{n} E_\theta\{g_i^2(Y_i, \theta)\}, \qquad \mu_3(\theta) = n^{-1} \sum_{i=1}^{n} E_\theta\{g_i^3(Y_i, \theta)\}. \qquad (2.2)$$

Note that $n\mu_2(\theta)$ is simply the Fisher information when $g(\theta)$ is a score function. If the $g_i$ have the same distribution, then $\mu_2(\theta) = E_\theta\{g_1^2(Y_1, \theta)\}$ and $\mu_3(\theta) = E_\theta\{g_1^3(Y_1, \theta)\}$. In the following discussion we shall assume that both $\mu_2(\theta)$ and $\mu_3(\theta)$ are of $O(1)$. In Section 4 we discuss a method for the case when both $\mu_2(\theta)$ and $\mu_3(\theta)$ may depend on nuisance parameters.

The central limit theorem, under mild conditions similar to (a)–(c) of Wilks (1938, p. 169), applies to $g$:

$$T(\theta) \stackrel{d}{\to} N(0, 1), \qquad \text{where } T(\theta) = g(\theta)\big/\sqrt{n\mu_2(\theta)}.$$

This fact motivates a method for constructing a confidence interval for $\theta$. Suppose that $T(\theta)$ is monotonically decreasing in $\theta$, and $\theta_u$ solves $T(\theta_u) = z_\alpha$; then $I_N = (-\infty, \theta_u)$ defines a $(1-\alpha)$-level confidence interval for $\theta$. To see how accurate $I_N$ is, we first note that the speed of convergence of $T(\theta)$ to $N(0, 1)$ is typically $O(n^{-1/2})$. To appreciate this, we can formally invert the characteristic function of $T(\theta)$,

$$\psi_{T(\theta)}(t) = e^{-\frac{1}{2}t^2} + \frac{1}{6\sqrt{n}}\,\mu_3(\theta)\{\mu_2(\theta)\}^{-3/2}(\mathrm{i}t)^3 e^{-\frac{1}{2}t^2} + o(n^{-1/2}),$$

to obtain the following formal Edgeworth expansion for the CDF of $T(\theta)$:

$$\Pr_\theta\{T(\theta) \le t\} = \Phi(t) + \frac{1}{6\sqrt{n}}\,\mu_3(\theta)\{\mu_2(\theta)\}^{-3/2}(1 - t^2)\,\phi(t) + o(n^{-1/2}), \qquad (2.3)$$


where $\phi(t) = \mathrm{d}\Phi(t)/\mathrm{d}t$ is the density function of $N(0, 1)$. The coefficient $n^{-1/2}\mu_3(\theta)\{\mu_2(\theta)\}^{-3/2}$ in the second term of (2.3) is the skewness of $T(\theta)$. By (2.3) we see that using $\Phi(t)$ to approximate $\Pr_\theta\{T(\theta) \le t\}$ will be in error by $O(n^{-1/2})$ unless $\mu_3(\theta) = 0$. In particular, (2.3) implies that

$$\Pr_\theta\{\theta \le \theta_u\} = 1 - \Pr_\theta\{T(\theta) \le z_\alpha\} = 1 - \alpha + O(n^{-1/2}).$$

That is, the normal theory interval $I_N = (-\infty, \theta_u)$ is only first-order accurate unless $\mu_3(\theta) = 0$.

2.2. Normalizing transformations

The above discussion of normal theory intervals suggests that, to improve the accuracy of $I_N$, we may first seek a function of $g(\theta)$ whose CDF can be approximated by $\Phi(t)$ with error $o(n^{-1/2})$. Consider a real-valued monotonic function $f(x)$ defined in a neighborhood of the origin. Let $f(x)$ be three times continuously differentiable at $x = 0$. Denote by $f'(0)$, $f''(0)$ the first two derivatives of $f(x)$ at $x = 0$, and assume that $f'(0) \ne 0$. Expanding $f(g(\theta)/n)$ at the origin and evaluating the first two moments yields

$$E_\theta\{f(n^{-1}g(\theta))\} = f(0) + (2n)^{-1} f''(0)\,\mu_2(\theta) + o(n^{-1}),$$
$$\operatorname{var}_\theta\{f(n^{-1}g(\theta))\} = n^{-1}\{f'(0)\}^2 \mu_2(\theta) + o(n^{-1}).$$

Let $b_n(\theta) = f(0) + (2n)^{-1} f''(0)\,\mu_2(\theta)$, and define the following bias-corrected and standardized quantity:

$$h(\theta) = \frac{\sqrt{n}}{f'(0)\sqrt{\mu_2(\theta)}}\,\{f(n^{-1}g(\theta)) - b_n(\theta)\}. \qquad (2.4)$$

It is easily seen that $h(\theta)$ also has $N(0, 1)$ as its asymptotic limit. To improve upon the normal approximation, we evaluate, up to order $n^{-1/2}$, the first three cumulants of $h(\theta)$ as follows:

$$E_\theta\{h(\theta)\} = o(n^{-1}), \qquad \operatorname{var}_\theta\{h(\theta)\} = 1 + o(n^{-1/2}),$$
$$\operatorname{cum}_3\{h(\theta)\} = n^{-1/2}\{\mu_2(\theta)\}^{-3/2}\left[\mu_3(\theta) + 3\,\frac{f''(0)}{f'(0)}\,\{\mu_2(\theta)\}^2\right] + o(n^{-1/2}).$$

Using these expressions, we invert the characteristic function of $h(\theta)$, obtaining the following Edgeworth expansion for the CDF of $h(\theta)$:

$$\Pr_\theta\{h(\theta) \le t\} = \Phi(t) + \frac{\{\mu_2(\theta)\}^{-3/2}}{6\sqrt{n}}\left[\mu_3(\theta) + 3\,\frac{f''(0)}{f'(0)}\,\{\mu_2(\theta)\}^2\right](1 - t^2)\,\phi(t) + o(n^{-1/2}). \qquad (2.5)$$

As with (2.3), it is seen that the normal approximation to the distribution of $h(\theta)$ is only accurate to $O(n^{-1/2})$ in general. However, since $f(\cdot)$ in (2.5) is ours to choose, we may


therefore reduce the error by choosing $f(\cdot)$ to satisfy the following condition:

$$\mu_3(\theta)\,f'(0) + 3\{\mu_2(\theta)\}^2 f''(0) = 0. \qquad (2.6)$$

We shall call the key equation (2.6) the normality identity, because for any $f$ satisfying (2.6) the normal approximation to the CDF of $h(\theta)$ in (2.5) will be in error by only $o(n^{-1/2})$. Now let $\gamma = -\{\mu_2(\theta)\}^{-2}\mu_3(\theta)/3$, and consider the following monotonic functions of $x$:

$$f_\lambda(x) = (\lambda + \gamma)^{-1}(\lambda x + 1)^{1 + \gamma/\lambda}, \qquad (2.7)$$

where $\lambda \ne 0$ is a constant. It is easily verified that $f_\lambda$ satisfies the normality identity (2.6). It is also clear that any location and scale transformation of $f_\lambda$, $a f_\lambda(x) + b$, also satisfies the normality identity, and that the standardized quantity (2.4) is invariant under such transformations. Consequently, if we use (2.7), or any linear transformation of it, to define (2.4), then we obtain the following transformed estimating functions:

$$s_\lambda(\theta) = \frac{\sqrt{n}}{(\lambda + \gamma)\sqrt{\mu_2(\theta)}}\left\{\left(1 + \frac{\lambda\, g(\theta)}{n}\right)^{1 + \gamma/\lambda} - 1\right\} - \frac{\gamma\sqrt{\mu_2(\theta)}}{2\sqrt{n}}, \qquad (2.8)$$

which satisfy the property that

$$\Pr_\theta\{s_\lambda(\theta) \le t\} = \Phi(t) + o(n^{-1/2}). \qquad (2.9)$$

The functions $f_\lambda(x)$ given by (2.7) will be called normalizing transformations because of (2.9). Functions (2.7) are closely related to the normalizing transformations proposed by Taneichi et al. (2002), who considered a class of power transformations for a given estimator satisfying certain moment conditions.

It is possible to extend the definition of $s_\lambda(\theta)$ to include $\lambda = 0$. Letting $\lambda \to 0$ in (2.7), we obtain

$$f_0(x) \equiv \lim_{\lambda \to 0} f_\lambda(x) = \frac{1}{\gamma}\, e^{\gamma x}, \qquad (2.10)$$

a function also satisfying the normality identity (2.6). Consequently, the quantity (2.4) based on $f_0(x)$,

$$s_0(\theta) = \frac{\sqrt{n}}{\gamma\sqrt{\mu_2(\theta)}}\left\{\exp(\gamma\, n^{-1} g(\theta)) - 1\right\} - \frac{\gamma\sqrt{\mu_2(\theta)}}{2\sqrt{n}}, \qquad (2.11)$$

will also satisfy (2.9), with $\lambda = 0$. The exponential function (2.10) is therefore also a normalizing transformation. Function (2.10) appears to have been first derived by Konishi (1991) in considering the problem of setting confidence intervals based on a plug-in estimator for a functional parameter; see Konishi (1991, Theorem 2). The class of normalizing transformations (2.7) therefore unifies the power- and exponential-form transformations. Summarizing, we have the following result.


Theorem 2.1. Let $g(\theta)$ be an unbiased additive estimating function given by (1.1). Let $\lambda$ be a real number, functionally independent of $\theta$. Then we have

$$\Pr_\theta\left[\frac{\sqrt{n}}{(\lambda + \gamma)\sqrt{\mu_2(\theta)}}\left\{\left(1 + \frac{\lambda\, g(\theta)}{n}\right)^{1 + \gamma/\lambda} - 1\right\} - \frac{\gamma\sqrt{\mu_2(\theta)}}{2\sqrt{n}} \le t\right] = \Phi(t) + o(n^{-1/2}), \qquad (2.12)$$

where $\gamma = -\{\mu_2(\theta)\}^{-2}\mu_3(\theta)/3$, and $\mu_2(\theta)$ and $\mu_3(\theta)$ are given by (2.2). If, in addition, $s_\lambda(\theta)$ is monotonically decreasing in $\theta$, and $\theta_{w;\lambda}$ solves $s_\lambda(\theta_{w;\lambda}) = z_\alpha$, then

$$\Pr_\theta\{\theta \le \theta_{w;\lambda}\} = 1 - \alpha + o(n^{-1/2}). \qquad (2.13)$$

That is, the intervals $I_\lambda = (-\infty, \theta_{w;\lambda})$ are second-order accurate for any fixed $\lambda$. Property (2.13) is a direct consequence of (2.12) by virtue of the monotonicity of $s_\lambda(\theta)$ in $\theta$.

To see the effect of $\lambda$ on $I_\lambda = (-\infty, \theta_{w;\lambda})$, we note that the fourth cumulant of $s_\lambda(\theta)$ can be written as

$$\kappa_{4;\lambda} = E_\theta\{s_\lambda(\theta) - E_\theta(s_\lambda(\theta))\}^4 - 3\kappa_{2;\lambda}^2 = O(n^{-1}),$$

where $\kappa_{2;\lambda} = O(1)$ is the variance of $s_\lambda(\theta)$. It follows that the kurtosis of $s_\lambda(\theta)$ equals $(3\kappa_{2;\lambda}^2 + \kappa_{4;\lambda})/\kappa_{2;\lambda}^2 = 3 + O(n^{-1})$, the leading term being the kurtosis of $N(0, 1)$. This fact suggests that the coverage properties of $I_\lambda$ will be insensitive to the value of $\lambda$; see the next section for numerical comparisons.

Nevertheless, $s_0(\theta)$ has an important advantage over $s_\lambda(\theta)$ for $\lambda \ne 0$: $s_0(\theta)$ is invariant under reparameterization. To see this, suppose that $\theta = \theta(\psi)$ is a smooth monotone increasing function, and let $a(\psi) = \theta'(\psi)$. If we use tildes to denote the corresponding quantities on the $\psi$-scale, then $\tilde g(\psi) = a(\psi)\, g(\theta(\psi))$ is the unbiased additive estimating function for $\psi$, with $\tilde\mu_2(\psi) = a^2(\psi)\,\mu_2(\theta(\psi))$, $\tilde\mu_3(\psi) = a^3(\psi)\,\mu_3(\theta(\psi))$ and $\tilde\gamma = \gamma/a(\psi)$. Using these quantities we immediately have $\tilde s_0(\psi) = s_0(\theta)$.
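Before turning to examples, the following sketch shows how (2.8), (2.11) and the confidence limit of Theorem 2.1 might be computed in practice. This is a minimal editorial illustration in Python, not the authors' code; the callables `g`, `mu2`, `mu3` supplying the estimating function and its cumulant functions, and the root-search bracket `(lo, hi)`, are our own assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def s_lambda(theta, lam, g, mu2, mu3, n):
    """Normalized estimating function s_lambda(theta) of (2.8); lam = 0 gives (2.11)."""
    m2, m3 = mu2(theta), mu3(theta)
    gamma = -m3 / (3.0 * m2 ** 2)                     # gamma = -mu3 / (3 mu2^2)
    if lam == 0.0:                                    # exponential-form limit (2.11)
        core = (np.exp(gamma * g(theta) / n) - 1.0) / gamma
    else:                                             # power form (2.8); needs 1 + lam*g/n > 0
        core = ((1.0 + lam * g(theta) / n) ** (1.0 + gamma / lam) - 1.0) / (lam + gamma)
    return np.sqrt(n) * core / np.sqrt(m2) - gamma * np.sqrt(m2) / (2.0 * np.sqrt(n))

def confidence_limit(alpha, lam, g, mu2, mu3, n, lo, hi):
    """Solve s_lambda(theta_w) = z_alpha for the limit of I = (-inf, theta_w).
    By Theorem 2.1 this interval has coverage 1 - alpha + o(n^{-1/2}), provided
    s_lambda is monotonically decreasing on the bracket [lo, hi]."""
    return brentq(lambda t: s_lambda(t, lam, g, mu2, mu3, n) - norm.ppf(alpha), lo, hi)
```

A concrete use of these two functions appears in the Poisson example of Section 3.2 below.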

3. Some examples

We now report some numerical results and compare them with those based on the EF bootstrap and other standard methods. The EF bootstrap is invariant under reparameterization and is accurate to second order as well. We first give a brief review of this method.


3.1. The estimating function bootstrap

Let the estimating function $g(\theta)$ be defined as in (1.1). The studentized quantity

$$T_g(\theta) = g(\theta)\big/\sqrt{V(\theta)}, \qquad \text{where } V(\theta) = \sum_{i=1}^{n}\{g_i(y_i, \theta) - g(\theta)/n\}^2,$$

has $N(0, 1)$ as its asymptotic limit as $n \to \infty$. Observing data $y_1, \ldots, y_n$, let $z_i = g_i(y_i, \hat\theta)$, where $\hat\theta$ solves $g(\hat\theta) = 0$. Instead of using the normal approximation, the EF bootstrap approximates the distribution of $T_g(\theta)$ by the bootstrap distribution of

$$T_g^*(\hat\theta) = \sum_{i=1}^{n} z_i^* \bigg/ \sqrt{\sum_{i=1}^{n}(z_i^* - \bar z^*)^2}, \qquad (3.14)$$

where $\bar z^* = n^{-1}\sum_i z_i^*$ and $z_1^*, \ldots, z_n^*$ are bootstrap samples independently drawn with replacement from $z_1, \ldots, z_n$. Variants of this method can be obtained by choosing a different type of studentization; for example, we may wish to consider $T(\theta) = \{n\mu_2(\theta)\}^{-1/2} g(\theta)$ if $\mu_2(\theta)$ is available.

If $\hat z_\alpha$ is the $100\alpha$th percentile of the distribution of $T_g^*(\hat\theta)$ and $T_g(\theta)$ is monotonically decreasing in $\theta$, then $I^* = (-\infty, \theta_u^*)$ defines a $(1-\alpha)$-level EF bootstrap interval, where $\theta_u^*$ solves $T_g(\theta_u^*) = \hat z_\alpha$. The quantity $\hat z_\alpha$, a value that usually has to be found by Monte Carlo approximation, is intended to correct the normal quantile $z_\alpha$. Hu and Kalbfleisch (2000) applied this method to a number of examples and showed that the EF bootstrap is not only less computationally intensive but also usually more accurate than traditional bootstrap methods which set confidence intervals based on bootstrap distributions of an estimator.
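The resampling step in (3.14) is easy to vectorize; the following sketch (ours, with hypothetical variable names) approximates the percentile $\hat z_\alpha$ from the observed $z_i = g_i(y_i, \hat\theta)$.

```python
import numpy as np

def ef_bootstrap_percentile(z, alpha, B=2000, seed=None):
    """Approximate the 100*alpha percentile of T_g*(theta_hat) in (3.14),
    given z = (z_1, ..., z_n) with z_i = g_i(y_i, theta_hat)."""
    rng = np.random.default_rng(seed)
    n = z.size
    zs = z[rng.integers(0, n, size=(B, n))]       # B resamples drawn with replacement
    num = zs.sum(axis=1)                          # sum of z_i*
    den = np.sqrt(((zs - zs.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))
    return np.quantile(num / den, alpha)          # hat{z}_alpha

```

The EF bootstrap limit $\theta_u^*$ is then found by solving $T_g(\theta_u^*) = \hat z_\alpha$, for instance with the same root-finding routine used in the sketch of Section 2.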

3.2. Nonlinear regression

We consider a simple log-linear model by assuming that $E(Y_i) = \mu_i = \exp(\theta x_i)$, $i = 1, \ldots, n$, where the $x_i$ are known scalar covariates. This example was also considered by Hu and Kalbfleisch (2000), who further assumed that the $Y$'s have symmetrical distributions. Here we shall consider an asymmetric case where the $Y$'s are independent Poisson variables. Observing $y_1, \ldots, y_n$, we obtain the score function for $\theta$,

$$g(\theta) = \sum_{i=1}^{n} x_i(y_i - \mu_i) \qquad [\mu_i = \exp(\theta x_i)], \qquad (3.15)$$

which is an unbiased additive estimating function. The quantities (2.2) can now be written as

$$\mu_2(\theta) = n^{-1}\sum_{i=1}^{n} x_i^2 \mu_i, \qquad \mu_3(\theta) = n^{-1}\sum_{i=1}^{n} x_i^3 \mu_i. \qquad (3.16)$$

To see the effect of the normalizing transformations, we let $n = 20$, the true value of $\theta$ equal to $-0.7$, and $x_i = \log(2i)$. Such applications may arise when $y_i$ represents the number of events occurring in a time interval $T_i$ and $x_i$ the (logarithm of the) duration of $T_i$ (see, e.g., Lindsey, 1996, Chapter 9). In such cases, the interest usually lies in the pattern of how the number of events decays as time elapses.
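For concreteness, here is how this Poisson setup plugs into the `s_lambda`/`confidence_limit` sketch of Section 2. The simulated data, the seed, and the root-search bracket are our illustrative assumptions.

```python
import numpy as np

n, theta0 = 20, -0.7
x = np.log(2.0 * np.arange(1, n + 1))                  # x_i = log(2i)
y = np.random.default_rng(1).poisson(np.exp(theta0 * x))

g   = lambda t: np.sum(x * (y - np.exp(t * x)))        # score (3.15)
mu2 = lambda t: np.mean(x ** 2 * np.exp(t * x))        # (3.16)
mu3 = lambda t: np.mean(x ** 3 * np.exp(t * x))        # (3.16)

# 95%-level left-sided interval (-inf, theta_w): alpha = 0.05, coverage 1 - alpha.
theta_w = confidence_limit(0.05, 0.0, g, mu2, mu3, n, lo=-3.0, hi=1.0)
```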

Fig. 1. Normal Q–Q plots of the normalized Poisson scores for $n = 20$. Left: $s_1(\theta)$, $s_0(\theta)$ and the studentized score; right: $s_\lambda(\theta)$ for $\lambda = 0.5(19.9)100$. The values in the legends show the values of $\lambda$, and the 45° lines indicate exact normality.

Based on 5000 pseudo samples, Fig. 1 (left) compares the normal Q–Q plots of the studentized Poisson score $T(\theta) = \{n\mu_2(\theta)\}^{-1/2} g(\theta)$ with those of the normalized scores $s_\lambda(\theta)$ obtained by applying (2.8) to (3.15) and (3.16) for $\lambda = 1, 0$. The second-order accuracy of both $s_1(\theta)$ and $s_0(\theta)$, their distributions being very close to each other, can be clearly seen from this plot. From Fig. 1 (right), which shows the normal Q–Q plots of $s_\lambda(\theta)$ for various values of $\lambda$, we see that the distributions of $s_\lambda(\theta)$ are almost unaffected by the choice of $\lambda$.

Table 1 shows the coverage errors, multiplied by 100, of 95%-level left-sided intervals based on three methods: the studentized score $T(\theta)$ (score), the EF bootstrap (EF-boot), and the normalized score $s_0$ (normalized), with the value of $\lambda$ set to zero. The errors were estimated in each case, except the bootstrap, using 10,000 repeatedly drawn pseudo random samples, the standard errors, multiplied by 1000, being shown in parentheses. The EF bootstrap intervals were based on $T(\theta)$, which, for this example, gave much better results than the semiparametric version (3.14). Coverage errors for the bootstrap intervals were computed using 2000 randomly drawn samples, each being bootstrapped 2000 times.

The third column of Table 1 displays coverage errors below the nominal error rate of 5%, showing that the first-order normal theory intervals based on $T(\theta)$ tend to be conservative. The error rates approach the nominal level only slowly as $\theta$ and $n$ get large. In contrast, the second-order accurate intervals based on the normalized score $s_0(\theta)$, the fifth column of Table 1, have coverage errors much closer to the nominal error level for almost all $\theta$ and $n$. The EF bootstrap, the fourth column, is seen to do a better job than the first-order normal theory. For small sample sizes, however, the intervals using the normalized score appear to have better coverage properties than those based on the EF bootstrap.


Table 1
Coverage errors (×10²) of 95%-level left-sided intervals based on the studentized score (score), the EF bootstrap (EF-boot), and the normalized score $s_0$ (normalized) for a Poisson regression model

θ      n    Score         EF-boot       Normalized
−0.6   10   0.00 ( — )    1.11 (2.54)   5.11 (2.20)
       20   2.45 (1.55)   5.56 (5.27)   4.00 (1.96)
       30   2.60 (1.59)   5.87 (5.33)   4.56 (2.09)
       40   3.15 (1.75)   5.17 (5.01)   4.37 (2.04)
       50   3.37 (1.80)   4.63 (4.74)   4.77 (2.13)
−0.4   10   2.36 (1.52)   4.15 (4.58)   4.13 (1.99)
       20   3.35 (1.80)   5.41 (5.09)   4.68 (2.11)
       30   3.51 (1.84)   5.39 (5.07)   4.41 (2.05)
       40   3.75 (1.90)   4.91 (4.84)   4.84 (2.15)
       50   2.97 (1.70)   4.05 (4.41)   4.31 (2.03)
−0.2   10   3.30 (1.79)   5.41 (5.11)   4.64 (2.10)
       20   4.06 (1.97)   5.91 (5.27)   5.09 (2.20)
       30   3.23 (1.77)   4.80 (4.78)   3.70 (1.89)
       40   3.53 (1.85)   5.55 (5.12)   3.85 (1.92)
       50   3.88 (1.93)   4.30 (4.54)   4.20 (2.01)
−0.0   10   3.89 (1.93)   7.06 (5.74)   4.95 (2.17)
       20   3.29 (1.78)   5.80 (5.23)   3.70 (1.89)
       30   3.87 (1.93)   5.50 (5.10)   4.08 (1.98)
       40   4.41 (2.05)   4.45 (4.61)   4.60 (2.09)
       50   4.10 (1.98)   3.45 (4.08)   4.21 (2.01)

The errors, except the bootstrap, were estimated using 10,000 repeatedly drawn samples, numbers in parentheses showing the standard errors (×10³). The bootstrap errors were computed using 2000 randomly drawn samples, each being bootstrapped 2000 times.

3.3. The law school data

Efron (1987, p. 178) analyzed a data set consisting of the indexes (LSAT, GPA) from 15 American law schools. The interest was to construct a confidence interval for the correlation coefficient $\rho$ between LSAT and GPA, the sample product moment correlation coefficient $r$ being 0.776. Efron's 90% BCa interval, which is second-order accurate, was shown to be close to the corresponding interval based on Fisher's z-transformation. Using this data set, Table 2 compares the central 90% and 95% intervals for $\rho$ based on the normalizing transformation with $\lambda = 0$ (normalized), the studentized score (score), BCa, Fisher's z-transformation (Fisher I), the z-transformation with bias correction (Fisher II), and the EF bootstrap (EF-boot). For the first two methods we assumed that (LSAT, GPA), after standardization, follows the standard bivariate normal distribution, the normal score function being also used in the EF bootstrap. In the next subsection we give some details on the bivariate normal model, where Fisher's z-transformation with bias correction is also described. From Table 2 we see that the intervals based on the normalized score are very close to those using the bias-corrected z-transformation. The lower limits from the EF bootstrap are outside the permissible range for $\rho$, owing to the heavy right tail of the bootstrap distribution of (3.14).


Table 2
Confidence intervals for the correlation coefficient $\rho$ for the law school data of Efron (1987, p. 178)

            90%                                95%
Method      Lower    Upper   (Length/R:L)      Lower    Upper   (Length/R:L)
Normalized   0.541   0.876   (0.335/0.35)       0.446   0.889   (0.443/0.29)
Score        0.484   0.869   (0.385/0.26)       0.343   0.878   (0.535/0.20)
BCa          0.419   0.926   (0.507/0.42)       0.328   0.942   (0.614/0.37)
Fisher I     0.513   0.906   (0.393/0.49)       0.443   0.921   (0.478/0.43)
Fisher II    0.533   0.892   (0.359/0.47)       0.473   0.907   (0.434/0.43)
EF-boot     −1.398   0.929   —                 −1.903   0.955   —

The asymmetry index R:L is the ratio of the right side of an interval, measured from the MLE (0.789) or the sample correlation coefficient (0.776), whichever is applicable, to the left side.
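As a quick arithmetic check on the asymmetry index (our worked example, using the normalized 90% interval and the MLE 0.789):

$$\mathrm{Length} = 0.876 - 0.541 = 0.335, \qquad \mathrm{R{:}L} = \frac{0.876 - 0.789}{0.789 - 0.541} = \frac{0.087}{0.248} \approx 0.35,$$

in agreement with the first row of Table 2.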

3.4. The correlation coefficient

To investigate the estimation of the correlation coefficient more systematically, we now let $(X_i, Y_i)$, $i = 1, \ldots, n$, be independent bivariate normal variables with $E(X_i) = E(Y_i) = 0$ and $\operatorname{var}(X_i) = \operatorname{var}(Y_i) = 1$, the correlation coefficient $\rho$ being the parameter to be estimated. Let

$$q_i(\rho) = (1 - \rho^2)^{-2}\left\{(1 + \rho^2)x_i y_i - \rho(x_i^2 + y_i^2) + (\rho - \rho^3)\right\}.$$

The score function $q(\rho) = \sum_{i=1}^{n} q_i(\rho)$ is an unbiased additive estimating function, which is approximately normally distributed with mean zero. A little algebra gives the second and third cumulants of $q(\rho)$ as

$$n\mu_2(\rho) = n(1 - \rho^2)^{-2}(\rho^2 + 1), \qquad n\mu_3(\rho) = n\rho(\rho^2 - 1)^{-3}(2\rho^2 + 6). \qquad (3.17)$$

From (3.17) we see that $q(\rho)$ will be highly skewed for $|\rho| \approx 1$. Applying (2.8) to (3.17), we obtain $s_\lambda(\rho)$ and the second-order normal approximation (2.12) for $s_\lambda(\rho)$. We note in passing that one might use Cardano's formula (see, e.g., Small and Wang, 2003, p. 162) to find an analytic expression for the consistent root $\hat\rho$ of the likelihood equation $q(\rho) = 0$; it is, however, a tedious task to evaluate the corresponding cumulants of $\hat\rho$.

To investigate the finite-sample properties of the second-order accurate intervals $I_\lambda = (-\infty, \rho_{w;\lambda})$ of (2.13), we compared the results using $I_\lambda$ with those based on Fisher's z-transformation, $z(r) = 0.5\log\{(1 + r)/(1 - r)\}$, where $r$ is the sample product moment correlation coefficient. It is known that $z(r)$, after a bias correction, also has a second-order normal approximation (Konishi, 1981); that is,

$$\Pr[Z \le x] = \Phi(x) + O(n^{-1}), \qquad \text{where } Z = \sqrt{n}\,\{z(r) - z(\rho) - \rho/(2n)\}. \qquad (3.18)$$

Fig. 2 compares the normal Q–Q plots of $s_0(\rho)$ and $Z$ for $n = 11, 21$; the distributions of $s_\lambda(\rho)$ were found to be insensitive to the choice of $\lambda$. In each case pseudo random samples were drawn 5000 times from the standardized bivariate normal population with $\rho = -0.9$. From Fig. 2 we see that while the distributions of the normalized score $s_0(\rho)$ are approximated by $N(0, 1)$ nicely for $n$ as small as 11, the left tails of $Z$ are seen to be lighter than that of $N(0, 1)$, a phenomenon continuing for sample sizes as large as 50 (not shown here).
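In code, the score and the cumulant functions (3.17) for this model take the following form (a sketch with our own names; `s_lambda` is the function sketched in Section 2, and the standardized data arrays `x`, `y` are assumed given):

```python
import numpy as np

def correlation_score(x, y):
    """Score q(rho) and the cumulant functions mu2, mu3 of (3.17) for
    standardized bivariate normal data x, y."""
    def g(r):
        return np.sum(((1 + r**2) * x * y - r * (x**2 + y**2) + (r - r**3))
                      / (1 - r**2) ** 2)
    mu2 = lambda r: (r**2 + 1) / (1 - r**2) ** 2
    mu3 = lambda r: r * (2 * r**2 + 6) / (r**2 - 1) ** 3
    return g, mu2, mu3

# Example: inspect s_0(rho) on a grid before root-finding.
# g, mu2, mu3 = correlation_score(x, y)
# vals = [s_lambda(r, 0.0, g, mu2, mu3, x.size) for r in np.linspace(-0.95, 0.95, 39)]
```

As noted in Section 4, $s_\lambda(\rho)$ may fail to be monotonic near the boundary $|\rho| = 1$, so a grid inspection of the kind sketched above is a sensible precaution before solving $s_\lambda(\rho_w) = z_\alpha$.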

Fig. 2. Normal Q–Q plots of the normalized score $s_0(\rho)$ and the bias-corrected z-transformation. For $n = 11, 21$, pseudo random samples were drawn 5000 times from the standardized bivariate normal population with $\rho = -0.9$. The 45° line indicates perfect normality.

Table 3
Coverage errors (×10²) of 95%-level intervals based on Fisher's z-transformation (Fisher I), the z-transformation with bias correction (Fisher II), the studentized score (score), the normalized score $s_0$ (normalized), and the EF bootstrap (EF-boot) for the normal correlation coefficient (n = 11)

ρ      Fisher I        Fisher II       Score           Normalized      EF-boot
−0.9   2.4215 (1.57)   5.4900 (2.33)   4.9578 (2.22)   3.7992 (1.95)   2.5726 (1.59)
       7.3889 (2.89)   7.3889 (2.89)   1.9111 (1.51)   4.7109 (2.34)   11.679 (3.70)
−0.7   2.8709 (1.73)   5.5490 (2.37)   6.0418 (2.47)   3.9100 (2.01)   1.9163 (1.39)
       7.0849 (2.80)   8.8741 (3.11)   2.1947 (1.60)   4.7114 (2.31)   11.654 (4.61)
−0.5   3.1089 (1.80)   5.7783 (2.42)   6.7003 (2.59)   4.6741 (2.19)   1.0277 (1.03)
       6.6622 (2.63)   8.7790 (2.99)   2.6849 (1.71)   4.5009 (2.19)   5.0241 (2.77)
−0.3   3.9133 (1.99)   6.5045 (2.54)   6.3247 (2.50)   5.0344 (2.25)   1.0046 (1.02)
       5.9578 (2.46)   8.1017 (2.83)   3.6630 (1.95)   5.1067 (2.29)   2.8220 (1.88)
−0.1   4.4989 (2.13)   6.6005 (2.55)   5.4810 (2.34)   5.1748 (2.28)   1.0624 (1.06)
       4.8593 (2.22)   7.1824 (2.67)   4.6569 (2.18)   5.2536 (2.30)   1.4216 (1.26)
 0.1   5.0599 (2.26)   7.1709 (2.66)   4.5508 (2.15)   5.1342 (2.27)   1.4886 (1.29)
       4.1693 (2.06)   6.5291 (2.54)   5.1958 (2.28)   4.8571 (2.21)   1.1093 (1.08)
 0.3   5.4588 (2.35)   7.6295 (2.75)   3.4279 (1.89)   4.7389 (2.20)   2.3849 (1.72)
       3.8031 (1.97)   6.5197 (2.55)   6.0829 (2.47)   4.9217 (2.23)   0.9911 (1.01)
 0.5   6.5674 (2.63)   8.7226 (3.00)   2.5954 (1.69)   4.9425 (2.30)   5.4742 (2.87)
       3.3108 (1.85)   5.7186 (2.41)   6.4388 (2.55)   4.4824 (2.15)   0.9864 (1.01)
 0.7   7.8157 (2.94)   9.5392 (3.21)   2.0826 (1.56)   4.6559 (2.31)   11.450 (4.57)
       2.5971 (1.65)   5.4411 (2.35)   6.0743 (2.47)   4.0996 (2.05)   1.7657 (1.33)
 0.9   7.5034 (2.91)   7.5034 (2.91)   1.9064 (1.51)   4.7415 (2.35)   11.670 (3.72)
       2.9640 (1.74)   6.0013 (2.43)   5.1005 (2.25)   3.9799 (2.00)   2.4344 (1.55)

Upper (lower) rows: right- (left-) sided intervals; numbers in parentheses: standard errors (×10³) estimated using 10,000 simulated samples.

Tables 3 and 4 give simulated coverage errors, multiplied by 100, of 95%-level one-sided intervals based on five methods: Fisher's z-transformation (Fisher I), the z-transformation with bias correction (3.18) (Fisher II), the studentized score (score), the normalized score $s_0$ (normalized), and the EF bootstrap (EF-boot).


Table 4
Coverage errors (×10²) for the normal correlation coefficient, continued from Table 3, for n = 20

ρ      Fisher I        Fisher II       Score           Normalized      EF-boot
−0.9   3.4310 (1.82)   3.4310 (1.82)   4.3038 (2.03)   3.3206 (1.79)   3.4500 (1.83)
       6.7107 (2.58)   6.7107 (2.58)   3.1104 (1.79)   5.2194 (2.30)   9.0277 (2.92)
−0.7   3.5541 (1.86)   5.5937 (2.31)   6.1995 (2.42)   4.3720 (2.05)   2.8390 (1.67)
       6.0450 (2.44)   6.0450 (2.44)   2.6072 (1.63)   4.4050 (2.10)   10.347 (3.64)
−0.5   3.9996 (1.97)   5.7166 (2.33)   6.1105 (2.41)   4.4339 (2.07)   2.2312 (1.48)
       6.1494 (2.45)   7.0739 (2.61)   3.0124 (1.74)   4.4874 (2.11)   7.9177 (3.15)
−0.3   4.6824 (2.12)   6.1994 (2.43)   5.9365 (2.38)   5.0162 (2.20)   1.3522 (1.16)
       5.9168 (2.39)   6.9132 (2.57)   4.0678 (2.00)   5.0334 (2.22)   4.0342 (2.10)
−0.1   4.7740 (2.15)   5.9700 (2.39)   5.2605 (2.25)   5.1186 (2.22)   1.3835 (1.18)
       5.1316 (2.23)   6.4069 (2.47)   4.4991 (2.09)   4.9684 (2.19)   2.4743 (1.58)
 0.1   5.5697 (2.31)   6.8425 (2.55)   4.7551 (2.15)   5.2846 (2.26)   1.9921 (1.43)
       4.7904 (2.15)   6.1677 (2.42)   5.3575 (2.27)   5.1853 (2.23)   1.2836 (1.14)
 0.3   5.4837 (2.31)   6.6133 (2.52)   3.7379 (1.92)   4.6930 (2.14)   4.3548 (2.18)
       4.3258 (2.05)   5.8251 (2.36)   5.8049 (2.35)   4.8627 (2.16)   1.2427 (1.11)
 0.5   6.4651 (2.51)   7.2263 (2.64)   3.4411 (1.86)   5.0261 (2.23)   7.2120 (3.02)
       4.0126 (1.97)   5.7105 (2.33)   6.1350 (2.41)   4.6897 (2.13)   1.9879 (1.40)
 0.7   6.3066 (2.49)   6.3066 (2.49)   2.6532 (1.65)   4.6010 (2.15)   10.708 (3.69)
       3.5725 (1.86)   5.1267 (2.22)   5.7524 (2.34)   4.0468 (1.98)   3.2615 (1.78)
 0.9   6.1148 (2.47)   6.1148 (2.47)   2.6100 (1.65)   4.8791 (2.22)   8.5537 (2.85)
       3.4023 (1.82)   3.4023 (1.82)   4.6467 (2.11)   3.6331 (1.87)   3.2203 (1.77)

The intervals using the z-transformation are based on the fact that $z(r)$ is approximately distributed as $N(z(\rho), \sigma_z^2)$, with $\sigma_z^2 = (n-1)^{-1} + 2(n-1)^{-2}$ (Stuart and Ord, 1994, p. 568). For the normalized score we let $\lambda = 0$, other values of $\lambda$ leading to very similar results. The EF bootstrap intervals were computed using 1000 bootstrap replications for each sample. In each case the upper row gives the error estimate for the right-sided interval and the lower row that for the left-sided interval. The error estimates were computed using 10,000 repeatedly drawn samples, the standard errors (multiplied by 1000) being shown in parentheses.

For $n = 11$ (Table 3), the coverage errors based on the z-transformation, the second column, show large discrepancies from the nominal error level of 5%, especially when $|\rho|$ is large. For negative $\rho$, while the right-sided intervals tend to be too conservative, the left-sided ones show systematically larger error rates; the situation is exactly reversed for $\rho > 0$. The bias correction, the third column, does not seem to help much here. However, looking at the fifth column, based on the normalized score $s_0(\rho)$, we see that the intervals (2.13) are not only more symmetrical in terms of coverage error but also close to the nominal error rate. These intervals correct their counterparts based on the studentized score, the fourth column, in the expected manner. From Table 4 we see that the above observations continue to hold for $n = 20$, but to a lesser extent owing to the increased sample size. The EF bootstrap intervals display properties very similar to those based on the z-transformation, and seem to "over-correct" the normal-theory intervals, especially when $|\rho|$ is large.
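For reference, the z-transformation intervals can be computed as follows. This is our reading of Fisher I and II, not the authors' code; in particular, plugging $r$ in for $\rho$ in the bias term of (3.18) is our assumption.

```python
import numpy as np
from scipy.stats import norm

def fisher_interval(r, n, alpha=0.05, bias_correct=False):
    """Central (1 - alpha) interval for rho from z(r) ~ N(z(rho), sigma_z^2),
    with sigma_z^2 = 1/(n-1) + 2/(n-1)^2 (Stuart and Ord, 1994).
    bias_correct=True subtracts r/(2n), plugging r for rho in (3.18)."""
    z = np.arctanh(r)                                  # z(r) = 0.5 log((1+r)/(1-r))
    if bias_correct:
        z -= r / (2.0 * n)
    sz = np.sqrt(1.0 / (n - 1) + 2.0 / (n - 1) ** 2)
    q = norm.ppf(1.0 - alpha / 2.0)
    return np.tanh(z - q * sz), np.tanh(z + q * sz)    # back-transform to rho-scale
```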


4. Discussion

In this note we gave an analytical formula, $s_\lambda(\theta)$ of (2.8), for associating a confidence distribution with a scalar parameter of interest $\theta$ based on an unbiased estimating function. The resulting confidence intervals are second-order accurate. Our method is asymptotically equivalent to the estimating function bootstrap but avoids the sometimes heavy computation, at the price of requiring the specification of the second and third cumulants of the estimating function. The evaluation of these quantities, unlike in traditional estimator-oriented asymptotic theories, is usually an easy task using the corresponding cumulants of the underlying random variables. The theory of Section 2 can be extended to estimating functions which are only asymptotically unbiased. Such estimating functions arise, for instance, as profiled estimating functions obtained by eliminating "nuisance" parameters.

There are several limitations to the present approach. First, it is not clear how to extend the theories discussed here to multi-parameter cases; there, the estimating function bootstrap may be considered instead. Second, the normalized estimating functions may not be globally monotonic in the parameter space, as is the case in our correlation coefficient example. This difficulty is present for all methods which use nonlinear estimating functions as the primal object for inference. However, we note that for the purpose of setting a confidence interval with a fixed level $1 - \alpha$, the failure of $s_\lambda(\theta)$ to be monotonic near the boundary of the parameter space, a case experienced in many examples, may cause no practical difficulty.

Another obvious limitation of the present approach is the assumption that both $\mu_2(\theta)$ and $\mu_3(\theta)$ of (2.2) depend only on the parameter of interest $\theta$. In general both quantities may depend on a vector nuisance parameter, in which case the EF bootstrap is recommended for setting accurate confidence intervals. The theory of the EF bootstrap is motivated by an Edgeworth expansion of $s_t = s(y, \theta)/v^{1/2}(y, \theta)$, where $s(y, \theta) = n^{-1/2}\sum g_i(y_i, \theta)$, $\bar g(y, \theta) = n^{-1}\sum g_i(y_i, \theta)$ and $v(y, \theta) = n^{-1}\sum [g_i(y_i, \theta) - \bar g(y, \theta)]^2$. If we let

$$v^2 = \limsup_{n \to \infty}\; n^{-1} \sum E_\theta[g_i(Y_i, \theta)]^2, \qquad \nu_3 = \limsup_{n \to \infty}\; n^{-1} \sum E_\theta[g_i(Y_i, \theta)]^3,$$

then Hu and Kalbfleisch (2000) show, under certain regularity conditions, that

$$\Pr(s_t \le x) = \Phi(x) + \frac{1}{6\sqrt{n}}\,\nu_3 v^{-3}(2x^2 + 1)\,\phi(x) + O(n^{-1}). \qquad (4.19)$$

Formula (4.19) motivates another, non-resampling-based method. Hall (1992) shows that if a quantity $S$ admits an Edgeworth expansion of the form

$$\Pr(\sqrt{n}\, S \le x) = \Phi(x) + n^{-1/2}\tau(ax^2 + b)\,\phi(x) + O(n^{-1}),$$

where $a, b$ are known constants and $\tau$ an unknown parameter, then the normal approximation to the following transformed statistic,

$$T(S) = S + a\hat\tau S^2 + a^2\hat\tau^2 S^3/3 + b\hat\tau/n, \qquad (4.20)$$

is second-order accurate; that is,

$$\Pr(\sqrt{n}\, T(S) \le x) = \Phi(x) + O(n^{-1}).$$

The monotonic transformation (4.20) is a modification of an earlier transformation considered by Johnson (1979). Applying Hall's formula (4.20) to (4.19), we have the following result.

Theorem 4.1. Let $\hat v^2 = n^{-1}\sum_{i=1}^{n} g_i^2(Y_i, \hat\theta)$ and $\hat\nu_3 = n^{-1}\sum_{i=1}^{n} g_i^3(Y_i, \hat\theta)$, which are assumed $\sqrt{n}$-consistent for estimating $v^2$ and $\nu_3$, respectively. Let

$$W = s_t + \frac{1}{6\sqrt{n}}\,\hat\nu_3 \hat v^{-3}\,(2 s_t^2 + 1) + \frac{1}{27n}\,\hat\nu_3^2 \hat v^{-6}\, s_t^3; \qquad (4.21)$$

then we have

$$\Pr\{W \le x\} = \Phi(x) + O(n^{-1}). \qquad (4.22)$$
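A direct transcription of (4.21) reads as follows (our sketch; `gi_theta` holds the values $g_i(y_i, \theta)$ at the $\theta$ being tested and `gi_hat` the values at the root $\hat\theta$):

```python
import numpy as np

def cubic_W(gi_theta, gi_hat):
    """Cubic transformation W of (4.21) applied to the studentized score s_t."""
    n = gi_theta.size
    s = gi_theta.sum() / np.sqrt(n)                    # s(y, theta) = n^{-1/2} sum g_i
    v = np.mean((gi_theta - gi_theta.mean()) ** 2)     # v(y, theta)
    st = s / np.sqrt(v)
    tau = np.mean(gi_hat ** 3) / np.mean(gi_hat ** 2) ** 1.5   # hat{nu}_3 * hat{v}^{-3}
    return st + tau * (2 * st**2 + 1) / (6 * np.sqrt(n)) + tau**2 * st**3 / (27 * n)
```

By (4.22), confidence limits follow from solving $W(\theta) = z_\alpha$, exactly as in the earlier sketches.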

To conclude this section we summarize the results of a small simulation study for estimating the common mean $\mu$ in stratified samples. This example was also considered by Hu and Kalbfleisch (2000); there are independent samples $y_{ij}$, $j = 1, \ldots, n_i$, from the $i$th stratum, $i = 1, \ldots, k$. There is a large literature on inference concerning $\mu$ under the semiparametric framework assuming that $E(Y_{ij}) = \mu$ and $V(Y_{ij}) = \sigma_i^2$, the $\sigma_i^2$ being unknown. Many authors (e.g., Neyman and Scott, 1948) proposed the following estimating function:

$$g = \sum_{i=1}^{k} g_i, \qquad \text{where } g_i = n_i(n_i - 2)(\bar y_i - \mu)\bigg/\sum_{j=1}^{n_i}(y_{ij} - \mu)^2$$

and $\bar y_i = \sum_{j=1}^{n_i} y_{ij}/n_i$. This function $g$ was also used by Hu and Kalbfleisch (2000). We considered the case $k = 30$, $\mu = 0$, $n_1 = \cdots = n_{10} = 6$, $n_{11} = \cdots = n_{20} = 7$, $n_{21} = \cdots = n_{30} = 8$, and used the same values of $\sigma_i^2$ as in Hu and Kalbfleisch (2000): $(\sigma_1, \ldots, \sigma_{10}) = (0.5, 0.6, \ldots, 1.4)$, $(\sigma_{11}, \ldots, \sigma_{20}) = (1.0, 1.2, \ldots, 2.8)$, $(\sigma_{21}, \ldots, \sigma_{30}) = (0.5, 1.0, \ldots, 5.0)$.

Table 5 gives the results for three methods: (1) the simple normal approximation for the distribution of $s_t$ (naive), (2) the EF bootstrap (EF-boot), and (3) the cubic transformation (4.21) (cubic). In Table 5 both lower and upper confidence limits are given; the first row in each case gives the mean of the confidence limits over 10,000 simulations (numbers in parentheses show the standard errors) and the second row shows the estimated coverage errors. For each simulated sample the bootstrap resamples were drawn 2000 times. Our results are compatible with the corresponding results of Hu and Kalbfleisch (2000). The results for the three methods shown here are relatively close to each other, partially because we have chosen $k$ to be relatively large (for numerical stability). Hu and Kalbfleisch (2000) show that for $k$ as small as 12 the EF bootstrap has a clear advantage over the naive method and other traditional bootstrap methods.
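For the stratified common-mean example, the estimating function above and the cubic transformation combine as follows (a sketch under the stated design; the seed and the names are ours):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
sizes  = [6] * 10 + [7] * 10 + [8] * 10                    # n_i for the k = 30 strata
sigmas = np.r_[np.arange(0.5, 1.41, 0.1),                  # sigma_1, ..., sigma_10
               np.arange(1.0, 2.81, 0.2),                  # sigma_11, ..., sigma_20
               np.arange(0.5, 5.01, 0.5)]                  # sigma_21, ..., sigma_30
samples = [rng.normal(0.0, s, m) for s, m in zip(sigmas, sizes)]

def gi(mu):
    """One g_i per stratum: g_i = n_i (n_i - 2)(ybar_i - mu) / sum_j (y_ij - mu)^2."""
    return np.array([len(y) * (len(y) - 2) * (y.mean() - mu) / np.sum((y - mu) ** 2)
                     for y in samples])

# W(mu) from cubic_W, with gi evaluated at mu and at the root mu_hat of sum g_i = 0:
mu_hat = brentq(lambda m: gi(m).sum(), -1.0, 1.0)
W = cubic_W(gi(0.0), gi(mu_hat))
```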


Table 5
Average lower/upper confidence limits (with standard errors) and coverage errors for the common mean in stratified normal samples with k = 30 strata

Level                 Naive           EF-boot         Cubic
80%   L-limit        −0.12 (0.10)    −0.12 (0.11)    −0.11 (0.14)
      L-error         0.122           0.123           0.125
      U-limit         0.11 (0.10)     0.12 (0.19)     0.11 (0.10)
      U-error         0.129           0.128           0.139
90%   L-limit        −0.15 (0.15)    −0.16 (0.15)    −0.17 (0.38)
      L-error         0.078           0.067           0.076
      U-limit         0.15 (0.12)     0.15 (0.13)     0.15 (0.11)
      U-error         0.068           0.070           0.077
95%   L-limit        −0.19 (0.12)    −0.20 (0.12)    −0.20 (0.25)
      L-error         0.029           0.027           0.032
      U-limit         0.17 (0.36)     0.19 (0.16)     0.19 (0.14)
      U-error         0.033           0.027           0.037

Three methods are compared: the simple normal approximation for $s_t$ (naive), the EF bootstrap (EF-boot), and the cubic normal transformation (4.21) (cubic). Entries are based on 10,000 replications; the EF bootstrap uses 2000 bootstrap samples for each simulated sample.

Acknowledgements

We thank the Editor and the anonymous referees for useful comments, especially for the connection with the work of Hall (1992), which led to substantial improvement of the paper.

References

Barndorff-Nielsen, O.E., Cox, D.R., 1989. Asymptotic Techniques for Use in Statistics. Chapman & Hall, London.
Crowder, M., 1986. On consistency and inconsistency of estimating equations. Econ. Theory 2, 305–330.
Efron, B., 1987. Better bootstrap confidence intervals (with discussion). J. Amer. Statist. Assoc. 82, 171–200.
Fisher, R.A., 1921. On the probable error of a coefficient of correlation deduced from a small sample. Metron 1, 1–32.
Green, P.J., 1984. Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives. J. Roy. Statist. Soc. B 46, 149–192.
Hall, P., 1983. Inverting an Edgeworth expansion. Ann. Statist. 11, 569–576.
Hall, P., 1992. On the removal of skewness by transformation. J. Roy. Statist. Soc. B 54, 221–228.
Hu, F., Kalbfleisch, J.D., 2000. The estimating function bootstrap (with discussion). Canad. J. Statist. 28, 449–672.
Johnson, N.J., 1979. Modified t tests and confidence intervals for asymmetrical populations. J. Amer. Statist. Assoc. 73, 536–547.
Konishi, S., 1981. Normalizing transformations of some statistics in multivariate analysis. Biometrika 68, 647–651.
Konishi, S., 1991. Normalizing transformations and bootstrap confidence intervals. Ann. Statist. 19, 2209–2225.
Lindsey, J.K., 1996. Parametric Statistical Inference. Clarendon Press, Oxford.


Neyman, J., Scott, E.L., 1948. Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
Small, C.G., Wang, J., 2003. Numerical Methods for Nonlinear Estimating Equations. Clarendon Press, Oxford.
Stuart, A., Ord, K., 1994. Kendall's Advanced Theory of Statistics, vol. I: Distribution Theory, sixth ed. Edward Arnold, London.
Taneichi, N., Sekiya, Y., Suzukawa, A., Imai, H., 2002. A formula for normalizing transformation of some statistics. Commun. Statist. Theory Methods 31 (2), 163–179.
Wilks, S.S., 1938. Shortest average confidence intervals from large samples. Ann. Math. Statist. 9, 166–175.