ELSEVIER
Journal of Statistical Planning and Inference 65 (1997) 213-231
Edgeworth expansions for nonparametric distribution estimation with applications

Pilar H. García-Soidán a,*,1, Wenceslao González-Manteiga b,2, José M. Prada-Sánchez b,2

a Departamento de Estadística e Investigación Operativa, Universidad de Vigo, E.U. de Estudios Empresariales, Torrecedeira 105, C.P. 36208, Vigo, Spain
b Departamento de Estadística e Investigación Operativa, Universidad de Santiago, Campus Universitario Sur, C.P. 15771, Santiago de Compostela, Spain
Received 27 May 1996; received in revised form 18 February 1997
Abstract

In this paper, we will investigate the nonparametric estimation of the distribution function $F$ of an absolutely continuous random variable. Two methods are analyzed: the first one is based on the empirical distribution function, expressed in terms of i.i.d. lattice random variables and, secondly, the kernel method, which involves nonlattice random vectors dependent on the sample size $n$; this latter procedure produces a smooth distribution estimator that will be explicitly corrected to reduce the effect of bias or variance. For both methods, the non-Studentized and Studentized statistics are considered, as well as their bootstrap counterparts, and asymptotic expansions are constructed to approximate their distribution functions via the Edgeworth expansion techniques. On this basis, we will obtain confidence intervals for $F(x)$ and state the coverage error order achieved in each case. © 1997 Elsevier Science B.V.

AMS classification: primary 62G05; secondary 62E20

Keywords: Bandwidth parameter; Bootstrap method; Coverage error; Edgeworth expansion; Empirical distribution function; Kernel method; Smoothed distribution function
1. Introduction

Let $X_1, X_2, \ldots, X_n$ be i.i.d. and absolutely continuous random variables with common distribution function $F$ and density $f$. The traditional nonparametric
* Corresponding author.
1 Supported in part by Grant XUGA20701B96 from Xunta de Galicia.
2 Supported in part by Grant PB95-0826 from DGICYT.
0378-3758/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0378-3758(97)00059-1
estimator of a distribution function has been the empirical distribution function (EDF), defined as
$$F_n(x) = \frac{1}{n}\sum_{j=1}^{n} I_{\{X_j \le x\}},$$
where $I_A$ denotes the indicator function of a set $A$. Several properties of the EDF are well known; in particular, the exponential rate of convergence of $F_n$ has been studied by Sanov (1957) and Hoadley (1967). However, $F_n$ is a discrete distribution and, hence, it seems more appropriate to estimate a continuous distribution $F$ by a smooth estimator, obtained by integrating a density estimator. Thus, using the kernel method, the corresponding distribution estimator, which will be referred to as the smoothed distribution function (SDF), is given by
$$\hat F_{n,h}(x) = \frac{1}{n}\sum_{j=1}^{n} K\!\left(\frac{x - X_j}{h(n)}\right),$$
where $K(z) = \int_{-\infty}^{z} k(y)\,dy$, $k$ is a kernel density and $h(n)$ denotes the bandwidth parameter. The estimator $\hat F_{n,h}$, introduced by Nadaraya (1964), has asymptotically the same mean and variance as $F_n$ under certain regularity conditions, and its uniform strong convergence was established by Winter (1973) and Yamato (1973). In addition, the SDF may be considered as a smooth version of the EDF obtained by locally fitting a polynomial and using a weighted $L_2$-norm as the minimization criterion; see Lejeune and Sarda (1992) and Sarda (1993) for a description of this general technique to obtain smooth distribution estimators.

Different alternatives have been proposed to compare both procedures for estimation of the distribution function; in this sense, Azzalini (1981), Swanepoel (1987) and Shirahata and Chu (1992) have used, respectively, the mean squared error (MSE), the mean integrated squared error (MISE) and the integrated squared error (ISE) criteria. Moreover, arguments involving an appropriate selection of the smoothing parameter have been used to investigate whether $\hat F_{n,h}$ performs better than $F_n$; for instance, a representation of the relative deficiency at a single point of the EDF with respect to the SDF was established by Falk (1983). On the other hand, Hall and Wood (1995) provided heuristic support for the claim that, by using a kernel-type estimator, it is possible to smooth out the discontinuities that appear when a confidence band based on the EDF is considered.

The aim of this paper is to analyze both methods for estimation of an absolutely continuous distribution function $F$ at a single point, via the Edgeworth expansion techniques; in each case, we will consider the non-Studentized and Studentized statistics as well as their bootstrap counterparts and obtain expansions that approximate their distribution functions.
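As a minimal sketch of the two estimators just defined, the following Python fragment evaluates the EDF and the SDF at a point. The kernel $k(y) = (315/256)(1-y^2)^4$ on $[-1,1]$ is borrowed from Section 5 as an assumption, its integral $K$ is written in closed form, and the simulated sample, the seed and the bandwidth $h = n^{-1/3}$ are purely illustrative choices rather than part of the original text.

```python
import random


def edf(sample, x):
    """Empirical distribution function F_n(x): share of observations <= x."""
    return sum(1 for xj in sample if xj <= x) / len(sample)


def kernel_cdf(z):
    """K(z) for the assumed kernel k(y) = (315/256)(1 - y^2)^4 on [-1, 1]."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    # Antiderivative of (1 - y^2)^4 = 1 - 4y^2 + 6y^4 - 4y^6 + y^8, odd in z.
    poly = z - 4 * z**3 / 3 + 6 * z**5 / 5 - 4 * z**7 / 7 + z**9 / 9
    return 0.5 + (315.0 / 256.0) * poly


def sdf(sample, x, h):
    """Smoothed distribution function: average of K((x - X_j) / h)."""
    return sum(kernel_cdf((x - xj) / h) for xj in sample) / len(sample)


random.seed(0)                                   # illustrative seed
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
h = len(sample) ** (-1.0 / 3.0)                  # illustrative bandwidth of size n^(-1/3)
print(edf(sample, 0.0), sdf(sample, 0.0, h))
```

The EDF jumps at every observation, whereas the SDF is continuous in $x$; both are estimates of the same quantity $F(x)$.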
In the first situation, since the EDF is expressed in terms of i.i.d. and lattice random variables, we shall see in Section 2 that the first term in the expansions depends on a
periodic and noncontinuous function, so that the error committed in approximating the distribution of each statistic by the standard normal distribution is of order $n^{-1/2}$.

Secondly, we will tackle in Section 3 the estimation of $F$ by using the kernel method, either directly or by explicit corrections on the statistics to reduce the effect of bias or variance. In each case, since the distribution of the random vectors involved is non-lattice and dependent on the sample size $n$, we will apply the Edgeworth expansion techniques for triangular arrays, developed in García-Soidán (1995) and, consequently, the expansions obtained will be expressed in terms of smooth functions. Thus, when the SDF is considered, we shall conclude that the error with respect to the standard normal distribution is of decreasing order as the number of corrections increases; in particular, three corrections on the estimators based on the SDF will be required to equal the behaviour of the EDF with respect to the normal approximation.

The expansions obtained, as indicated above, will depend on appropriate and unknown derivatives of the distribution function or on the distribution function itself; therefore, we may only use the normal approximation to construct confidence intervals for $F(x)$, whose coverage error will be guaranteed to be of no larger order than that of the distribution estimator with respect to the standard normal distribution.

Next, we will consider in Section 4 the smoothed bootstrap version of all the estimators proposed and expand their distribution functions, conditional on the sample. There are some arguments supporting the use of the smoothed bootstrap method: first, the underlying distribution function $F$ is assumed to be continuous and, secondly, the validity of Edgeworth expansions will follow directly if the estimator has a nonzero absolutely continuous component, when the kernel procedure is applied.
In addition, the terms in these bootstrap versions of the Edgeworth expansions do not depend on unknown parameters, and the distribution of each estimator may be approximated by that of its bootstrap counterpart, as suggested in Hall (1991) for density estimation, and used to construct a confidence interval for $F(x)$, obtaining a substantial improvement on the normal approximation in kernel-type estimation. Moreover, when using the bootstrap approximation, we shall see that the coverage probability achieved for non-corrected statistics based on the SDF does not improve on that of the estimators constructed from the EDF. However, one bias correction on the former statistics will be enough to establish the better behaviour of the kernel method.

Throughout this work, we will have to estimate several bandwidth parameters, used either in the construction of kernel-type estimators or for determining the amount of smoothing when resampling. There are several alternatives: in the classical theory of the kernel method, it has been suggested that the bandwidth parameter be selected so that it minimizes asymptotically, in some sense, the error corresponding to the smoothed distribution function; see, for instance, Azzalini (1981) or Swanepoel (1987), who used the MSE or the MISE of kernel-type estimators, respectively, to determine the asymptotically optimal bandwidth parameter when estimating the distribution function. A cross-validation procedure is studied in Sarda (1993), where the asymptotic optimality of the selected parameter is stated with respect to quadratic measures of
error. As an alternative to minimizing a selection criterion, Altman and Léger (1995) introduced a plug-in estimator of the asymptotically optimal bandwidth.

In this paper, we have considered the MSE criterion for choosing the bandwidth parameters and, proceeding in this way, the coverage error order of the different confidence intervals proposed has been stated. We will note particularly, in Section 4, that with the smoothed bootstrap method one bandwidth parameter is involved in the estimation of the distribution function as well as in that of some of its derivatives; hence, we suggest that the bandwidth parameter be selected in order to minimize asymptotically the MSE of the highest-order derivative of the smoothed distribution function in which it appears. It is clear that the order of magnitude of the bandwidth parameter increases with the order of the derivative of the distribution function that must be estimated and, therefore, this procedure may introduce a degree of oversmoothing in the estimator; however, it will ensure that the contribution of the variance to the MSE, in each case, will not be of larger order than that of the bias, and it renders terms equal in order of magnitude to the MSE of the highest-order derivative of the SDF involved.

In practice, the MSE criterion described above for selecting a bandwidth parameter requires specification of unknown parts, dependent on the theoretical distribution $F$. We have considered a plug-in method to estimate these quantities, as described in Section 5, where we include a simulation study carried out to compare the performances of the Studentized estimators proposed in order to construct confidence intervals for $F(x)$.
2. Estimators based on the EDF

First, we will consider the EDF:
$$F_n(x) = \frac{1}{n}\sum_{j=1}^{n} I_{\{X_j \le x\}},$$
where the random variables $Y_j = I_{\{X_j \le x\}}$, $1 \le j \le n$, are i.i.d. and lattice, taking only the values 0 and 1 with probabilities $1 - F(x)$ and $F(x)$, respectively. Then $E(Y_1^s) = F(x)$ for all $s \in \mathbb{N}$, $s \ge 1$ and, in particular, $\mathrm{var}(Y_1) = V = F(x) - F(x)^2$. Denote by $\mathrm{supp}(F)^\circ$ the interior of the support of $F$, $\mathrm{supp}(F)$. We will assume:

(F1) $x \in \mathrm{supp}(F)^\circ$.

The above hypothesis guarantees that $0 < F(x) < 1$; hence, we may take
$$U_{0,1,n}(x) = n^{1/2}\,\frac{F_n(x) - F(x)}{(F(x) - F(x)^2)^{1/2}}, \qquad U_{0,2,n}(x) = n^{1/2}\,\frac{F_n(x) - F(x)}{(F_n(x) - F_n(x)^2)^{1/2}}$$
to represent a non-Studentized or a Studentized version of the EDF, respectively. Note that, for all $r > 0$, one has, uniformly in $y \in \mathbb{R}$:
$$P(U_{0,2,n}(x) \le y) - P(U_{0,1,n}(x) \le a_n(y)) = O(n^{-r}), \tag{1}$$
where
$$a_n(y) = n^{1/2}\left(y^2(1 - 2F(x)) + y\,(y^2 + 4nV)^{1/2}\right)\left(2(n + y^2)V^{1/2}\right)^{-1}.$$
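Relation (1) rests on the fact that, whenever $0 < F_n(x) < 1$, the events $\{U_{0,2,n}(x) \le y\}$ and $\{U_{0,1,n}(x) \le a_n(y)\}$ coincide, so that only the exponentially unlikely degenerate cases $F_n(x) \in \{0, 1\}$ contribute to the difference. The following sketch checks this equivalence over all non-degenerate counts $k = nF_n(x)$; the values of $n$, $F(x)$ and $y$ are illustrative assumptions, and the function names are hypothetical.

```python
import math


def a_n(y, n, F):
    """The transformation a_n(y) appearing in relation (1)."""
    V = F - F * F
    num = y * y * (1 - 2 * F) + y * math.sqrt(y * y + 4 * n * V)
    return math.sqrt(n) * num / (2 * (n + y * y) * math.sqrt(V))


def u_non_studentized(k, n, F):
    """U_{0,1,n}(x) when k of the n observations are <= x."""
    Fn = k / n
    return math.sqrt(n) * (Fn - F) / math.sqrt(F - F * F)


def u_studentized(k, n, F):
    """U_{0,2,n}(x) when k of the n observations are <= x (needs 0 < k < n)."""
    Fn = k / n
    return math.sqrt(n) * (Fn - F) / math.sqrt(Fn - Fn * Fn)


def events_agree(n, F, y):
    """True if both events coincide for every count k with 0 < F_n < 1."""
    bound = a_n(y, n, F)
    return all(
        (u_studentized(k, n, F) <= y) == (u_non_studentized(k, n, F) <= bound)
        for k in range(1, n)
    )


# Illustrative values: n = 50, F(x) = 0.3 and a few thresholds y.
print(all(events_agree(50, 0.3, y) for y in (-2.1, -0.7, 0.4, 1.3, 2.5)))
```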
Therefore, we may develop the distribution of both statistics by using Theorem 23.1 of Bhattacharya and Ranga Rao (1976) and relation (1); recall that additional terms of all orders, depending on periodic and non-smooth functions, must be added to compensate for the error in approximating a discrete distribution, that of $U_{0,i,n}(x)$ ($i = 1,2$), by a smooth distribution, in this case the standard normal distribution. In particular, we may obtain the expansion up to a remainder of order $n^{-1}$, as given in the following theorem.

Theorem 2.1. Assuming condition F1, one has, uniformly in $y \in \mathbb{R}$:
$$\begin{aligned} P(U_{0,i,n}(x) \le y) = {} & \Phi(y) + \tfrac{1}{6}\, n^{-1/2}(1 - 2F(x))V^{-1/2}p_{0,i}(y)\phi(y) \\ & - n^{-1/2}V^{-1/2}S_1\!\left(nF(x) + n^{1/2}q_{0,i}(y)V^{1/2}\right)\phi(y) + O(n^{-1}) \end{aligned} \tag{2}$$
for $i = 1,2$, where $\Phi$ and $\phi$ denote the standard normal distribution and density, respectively, $S_1(z) = z - [z] - \tfrac{1}{2}$, $p_{0,1}(y) = 1 - y^2$, $q_{0,1}(y) = y$, $p_{0,2}(y) = 1 + 2y^2$ and $q_{0,2}(y) = a_n(y)$.

Then, in view of Theorem 2.1, the error committed in approximating the distribution of each statistic, based on the EDF, by the standard normal distribution is of exact order $n^{-1/2}$.
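To illustrate the role of the lattice term, the sketch below compares, for the non-Studentized statistic $U_{0,1,n}(x)$, the exact distribution (a binomial probability) with the plain normal approximation and with the one-term expansion of Theorem 2.1 written for $i = 1$, that is, with $p_{0,1}(y) = 1 - y^2$ and $q_{0,1}(y) = y$. It is only a numerical check under assumed values $n = 50$ and $F(x) = 0.3$; the function names are hypothetical.

```python
import math


def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)


def Phi(y):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), computed by direct summation."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def compare(n, F, y):
    """Return (exact, normal, expansion) values of P(U_{0,1,n}(x) <= y)."""
    V = F * (1 - F)
    z = n * F + math.sqrt(n * V) * y          # argument of S_1 in (2) for i = 1
    exact = binom_cdf(math.floor(z), n, F)    # U <= y  iff  n F_n(x) <= z
    s1 = z - math.floor(z) - 0.5              # S_1(z) = z - [z] - 1/2
    smooth = (1 - 2 * F) * (1 - y * y) / (6 * math.sqrt(V))
    lattice = -s1 / math.sqrt(V)
    expansion = Phi(y) + (smooth + lattice) * phi(y) / math.sqrt(n)
    return exact, Phi(y), expansion


grid = [i / 20 for i in range(-60, 61)]
results = [compare(50, 0.3, y) for y in grid]
err_normal = max(abs(e - nrm) for e, nrm, _ in results)
err_expansion = max(abs(e - ex) for e, _, ex in results)
print(err_normal, err_expansion)
```

On this grid the maximal error of the expansion is much smaller than that of the normal approximation, which is consistent with the remainder of order $n^{-1}$ in (2) against the exact order $n^{-1/2}$ of the normal approximation.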
3. Estimators based on the kernel method
Now, the kernel method may be used to construct a nonparametric distribution estimator:
$$\hat F_{n,h}(x) = \frac{1}{n}\sum_{j=1}^{n} K\!\left(\frac{x - X_j}{h(n)}\right),$$
where $K(y) = \int_{-\infty}^{y} k(z)\,dz$, $k$ is a kernel-type density and $h(n)$ represents the bandwidth parameter. We will assume that $k$ is a continuous, symmetric and compactly supported density function, such that $k \in \mathcal{C}^{1}(\mathrm{supp}(k)^\circ)$ and $k_{1,1} \neq 0$, where $\mathrm{supp}(k)^\circ$ denotes the interior of $\mathrm{supp}(k)$. The following hypotheses will be imposed on the bandwidth parameter:

(h1) $n^{1/2}h(n)^2 = o(1)$.
(h2) $\ln n/(n h(n)) = o(1)$.

With regard to the distribution function, when using the kernel method, a series of conditions will be required of $F$ to state rigorously the Edgeworth expansions approximating the distribution of the estimators constructed, depending on the number of corrections involved. In each case, we will indicate which of the following hypotheses are needed:

(F1) $x \in \mathrm{supp}(F)^\circ$.
(F2) $F \in \mathcal{C}^{3}(\mathrm{supp}(F)^\circ)$.
(F3) $F \in \mathcal{C}^{5}(\mathrm{supp}(F)^\circ)$.
(F4) $F \in \mathcal{C}^{7}(\mathrm{supp}(F)^\circ)$.
Bear in mind that the condition $F'(x) > 0$ follows straightforwardly from F1, F2 and the fact that $F$ is an absolutely continuous distribution function. The random variables $A_{j,n} = K((x - X_j)/h(n))$, $1 \le j \le n$, are i.i.d., nonlattice and uniformly bounded; in addition, the distribution of $A_{1,n}$ depends on the sample size $n$. This means that the moments of $A_{1,n}$ are affected by this dependence on the sample size and satisfy $E(A_{1,n}^s) = F(x) + O(h(n)) = O(1)$, for all $s \in \mathbb{N}$, $s \ge 1$. In particular, the sequence $\{V_n\}$, where $V_n$ denotes the variance of $A_{1,n}$, converges to $V = F(x) - F(x)^2 > 0$, under F1. Thus, we may consider a non-Studentized or a Studentized version of the SDF, given, respectively, as follows:
$$U_{1,1,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x)}{(F(x) - F(x)^2)^{1/2}}, \qquad U_{1,2,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x)}{(\hat F_{n,h}(x) - \hat F_{n,h}(x)^2)^{1/2}}.$$
Now, Eq. (1) remains valid on replacing $U_{0,i,n}$ by $U_{1,i,n}$, for $i = 1,2$. Accordingly, the validity of Edgeworth expansions for the distribution of the above statistics reduces to proving the existence of an expansion that approximates the distribution of $U_{1,1,n}$, and this requires that the random variables involved satisfy a Cramér-type condition, dependent on the sample size $n$, as suggested in Hall (1991) for density estimation.

Lemma 3.1. Assuming conditions F1 and F2, for each $\delta > 0$ there exists a positive constant $D(x,\delta)$, not depending on $n$, satisfying
$$\sup_{|t| > \delta}\left|E\left(\exp\{itA_{1,n}\}\right)\right| < 1 - g(n)D(x,\delta) \tag{3}$$
for all $n$ sufficiently large, where $t \in \mathbb{R}$ and $\{g(n)\}$ is a sequence of positive values such that $g(n) \to 0$ and $n g(n)/\ln n \to +\infty$ as $n \to \infty$.

See Section A.1 in the appendix for a proof of this result. Then, we may obtain an Edgeworth expansion for the distribution of $U_{1,1,n}$ and, therefore, for that of $U_{1,2,n}$. In particular, asymptotic expansions up to a remainder of order $n^{1/2}h(n)^3 + h(n)^2 + n^{-1/2}h(n)$ are given as follows.
Theorem 3.2. Assuming conditions F1 and F2, one has, uniformly in $y \in \mathbb{R}$:
$$\begin{aligned} P(U_{1,i,n}(x) \le y) = {} & \Phi(y) - \tfrac{1}{2}\, n^{1/2}h(n)^2V^{-1/2}k_{2,0}F''(x)\phi(y) + h(n)V^{-1}k_{1,1}F'(x)\,y\,\phi(y) \\ & + \tfrac{1}{6}\, n^{-1/2}(1 - 2F(x))V^{-1/2}p_{0,i}(y)\phi(y) + O\!\left(n^{1/2}h(n)^3 + h(n)^2 + n^{-1/2}h(n)\right), \end{aligned} \tag{4}$$
where $k_{i,j} = \int_{-\infty}^{+\infty} u^i k(u)K(u)^j\,du$ and $p_{0,i}$ are as defined in Theorem 2.1, for $i = 1,2$.
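The constants $k_{i,j}$ depend only on the kernel, so they can be tabulated once. As an illustration, the sketch below evaluates them by Simpson's rule for the kernel $k(y) = (315/256)(1-y^2)^4$ used in Section 5; the choice of this kernel at this point, the quadrature rule and the function names are assumptions made for the example.

```python
def k(u):
    """Assumed kernel density k(u) = (315/256)(1 - u^2)^4 on [-1, 1]."""
    return (315.0 / 256.0) * (1 - u * u) ** 4 if abs(u) <= 1 else 0.0


def K(z):
    """Distribution function of the assumed kernel, in closed form."""
    if z <= -1.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    poly = z - 4 * z**3 / 3 + 6 * z**5 / 5 - 4 * z**7 / 7 + z**9 / 9
    return 0.5 + (315.0 / 256.0) * poly


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def k_const(i, j):
    """k_{i,j}: integral of u^i k(u) K(u)^j over the support [-1, 1]."""
    return simpson(lambda u: u**i * k(u) * K(u) ** j, -1.0, 1.0)


print(k_const(2, 0), k_const(1, 1))
```

For this kernel $k_{2,0} = 1/11$, since $k$ is the density of $2B - 1$ with $B$ a Beta(5, 5) variable, and $k_{1,1} > 0$ because $K$ is increasing.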
The bandwidth $h(n)$ that asymptotically minimizes the MSE of $\hat F_{n,h}(x)$ is of size $n^{-1/3}$ under F1 and F2, and this selection of $h(n)$ satisfies conditions h1 and h2. Consequently, Theorem 3.2 shows that the error committed when approximating the distribution of these estimators, based on the SDF, by the standard normal distribution is of order $n^{-1/6}$. Now, bear in mind that this latter error order is due to the bias of the above statistics, $U_{1,i,n}(x)$, for $i = 1,2$, which produces the $O(n^{1/2}h(n)^2)$-term in (4), since
$$E(A_{1,n}) = F(x) + \tfrac{1}{2}h(n)^2k_{2,0}F''(x) + O(h(n)^3).$$
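The bias expansion above can be checked numerically. The sketch below assumes, purely for illustration, that $F$ is the standard normal distribution function, so that $F''(x) = -x\phi(x)$, and that $k(y) = (315/256)(1-y^2)^4$; it uses the identity $E(A_{1,n}) = \int F(x - h z)\,k(z)\,dz$, which follows from integration by parts. The point $x$, the bandwidth $h$ and the function names are hypothetical choices.

```python
import math


def phi(y):
    """Standard normal density (assumed density of the data)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi)


def Phi(y):
    """Standard normal distribution function (assumed F)."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))


def k(u):
    """Assumed kernel density on [-1, 1]."""
    return (315.0 / 256.0) * (1 - u * u) ** 4 if abs(u) <= 1 else 0.0


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    step = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def mean_of_A(x, h):
    """E(A_{1,n}) computed as the integral of F(x - h z) k(z) over [-1, 1]."""
    return simpson(lambda z: Phi(x - h * z) * k(z), -1.0, 1.0)


def second_order_approx(x, h):
    """F(x) + (1/2) h^2 k_{2,0} F''(x), with F''(x) = -x phi(x) here."""
    k20 = simpson(lambda u: u * u * k(u), -1.0, 1.0)
    return Phi(x) + 0.5 * h * h * k20 * (-x * phi(x))


x, h = 0.5, 0.1                      # illustrative point and bandwidth
print(mean_of_A(x, h) - Phi(x), second_order_approx(x, h) - Phi(x))
```

The two printed values are the exact bias and its leading term; for a symmetric kernel and a smooth $F$ they agree closely, which is what motivates the explicit bias correction introduced next.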
In view of this, next we will correct the non-Studentized and Studentized statistics based on the SDF with the aim of reducing the effect of bias, where $F''(x)$ may be estimated by differentiating a kernel distribution estimator to obtain:
$$\hat F''_{n,h_1}(x) = \frac{1}{n h_1(n)^2}\sum_{j=1}^{n} K''\!\left(\frac{x - X_j}{h_1(n)}\right).$$
The new bandwidth parameter $h_1(n)$ satisfies:

(h11) $h_1(n) = o(1)$.
(h12) $h(n)/h_1(n) = o(1)$.

Now, take:
$$U_{2,1,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x) - \tfrac{1}{2}h(n)^2k_{2,0}\hat F''_{n,h_1}(x)}{(F(x) - F(x)^2)^{1/2}},$$
$$U_{2,2,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x) - \tfrac{1}{2}h(n)^2k_{2,0}\hat F''_{n,h_1}(x)}{(\hat F_{n,h}(x) - \hat F_{n,h}(x)^2)^{1/2}}.$$
Write $B_{j,n} = K''((x - X_j)/h_1(n))$; then both statistics may be expressed in terms of the i.i.d. and uniformly bounded random vectors $X_{j,n} = (A_{j,n}, B_{j,n})$, $1 \le j \le n$, which satisfy a Cramér-type condition as follows.

Lemma 3.3. Assuming conditions F1 and F2, for each $\delta > 0$ there exists a positive constant $D_1(x,\delta)$, not depending on $n$, satisfying
$$\sup_{\|t\| > \delta}\left|E\left(\exp\{i\langle t, X_{1,n}\rangle\}\right)\right| < 1 - g(n)D_1(x,\delta) \tag{5}$$
for all $n$ large, where $t \in \mathbb{R}^2$ and $\{g(n)\}$ satisfies the conditions required in Lemma 3.1.
The proof of this lemma is outlined in Section A.2 of the appendix. Note that
$$E\left(A_{1,n}^{s_1}B_{1,n}^{s_2}\right) = O(h_1(n)) \quad \text{for all } s_1, s_2 \in \mathbb{N},\ s_2 \ge 1. \tag{6}$$
Thus, if $W_n$ denotes the covariance matrix of $X_{1,n}$, the sequence $\{W_n\}$ converges to a singular matrix; this means that the eigenvalues obtained on inverting $W_n$ are not bounded, whereas such boundedness is needed to validate the Edgeworth expansions for triangular arrays.
The fact that $\mathrm{var}(B_{1,n}) = h_1(n)\,c\,F'(x) + O(h_1(n)^2)$, for $c = \int_{-\infty}^{+\infty}(K''(z))^2\,dz > 0$, suggests rescaling the second component of the random vector $X_{j,n}$. Hence, take $\tilde B_{j,n} = h_1(n)^{-1/2}B_{j,n}$ and write $\tilde W_n$ for the covariance matrix of $(A_{j,n}, \tilde B_{j,n})$; then, one has
$$\tilde W_n \to \begin{pmatrix} V & 0 \\ 0 & cF'(x) \end{pmatrix},$$
where the limit matrix is positive definite.
However, on changing scale, the transformed random vectors are not bounded, since the sequence $\{h_1(n)\}$ tends to 0, and this will affect the order of the remainder term in the expansions approximating the distribution of $U_{2,i,n}(x)$, for $i = 1,2$. In fact, an Edgeworth expansion in this context is a power series in $n^{-1/2}$, where the coefficient of $n^{-j/2}$ is $q_{j,n}(y)\phi(y)$, for some polynomial $q_{j,n}$ expressed as a combination of the cumulants up to the $(j+2)$th of the random vectors involved; accordingly, the $n^{-j/2}$-term in the expansion might now be at worst of order $(n h_1(n))^{-j/2}$, by relation (6). Recall that conditions h2 and h12 imply that the sequence $\{n h_1(n)\}$ tends to $+\infty$.
Thus, next we will construct specific Edgeworth expansions which are guaranteed to be valid, since Theorem 20.1 of Bhattacharya and Ranga Rao (1976) may be adapted for triangular arrays, when the limit matrix is singular, as proved in García-Soidán (1995).

Theorem 3.4. Assuming conditions F1 and F3, one has, uniformly in $y \in \mathbb{R}$:
$$\begin{aligned} P(U_{2,i,n}(x) \le y) = {} & \Phi(y) + h(n)V^{-1}k_{1,1}F'(x)\,y\,\phi(y) + \tfrac{1}{4}\, n^{1/2}h(n)^2h_1(n)^2V^{-1/2}k_{2,0}^2F^{IV}(x)\phi(y) \\ & + \tfrac{1}{6}\, n^{-1/2}(1 - 2F(x))V^{-1/2}p_{0,i}(y)\phi(y) \\ & + O\!\left(n^{1/2}h(n)^3 + n^{1/2}h(n)^2h_1(n)^3 + h(n)^2 + n^{-1/2}h(n)\right). \end{aligned} \tag{7}$$
Bear in mind that the size of the bandwidth $h_1(n)$ that optimizes the performance of the estimator $\hat F''_{n,h_1}(x)$ is $n^{-1/7}$; hence, from relation (7) we may deduce that the error in approximating the distribution of $U_{2,i,n}(x)$ by the standard normal distribution is of order $n^{-1/3}$; this means that the error order has been reduced, compared with the one derived from (4), prior to the bias correction. The $h(n)$-term appearing in (7) comes from the variance of the statistics considered, since $\mathrm{var}(A_{1,n}) = F(x) - F(x)^2 - 2h(n)k_{1,1}F'(x) + O(h(n)^2)$. In view of the arguments above, next we will proceed to compensate the term of order $h(n)$ appearing in the expansions by adjusting the variance, say
$$U_{3,1,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x) - \tfrac{1}{2}h(n)^2k_{2,0}\hat F''_{n,h_1}(x)}{(F(x) - F(x)^2 - 2h(n)k_{1,1}F'(x))^{1/2}},$$
$$U_{3,2,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x) - \tfrac{1}{2}h(n)^2k_{2,0}\hat F''_{n,h_1}(x)}{(\hat F_{n,h}(x) - \hat F_{n,h}(x)^2 - 2h(n)k_{1,1}\hat F'_{n,h_2}(x))^{1/2}},$$
where
$$\hat F'_{n,h_2}(x) = \frac{1}{n h_2(n)}\sum_{j=1}^{n} K'\!\left(\frac{x - X_j}{h_2(n)}\right)$$
and the bandwidth parameter $h_2(n)$ satisfies:

(h21) $h_2(n)/h_1(n) = o(1)$.
(h22) $h(n)/h_2(n) = o(1)$.

Observe that $F'(x)$ has not been estimated in $U_{3,1,n}(x)$, since the variance is assumed to be known for the non-Studentized statistic, and so is this approximation to it. Then, an Edgeworth expansion to approximate the distribution of $U_{3,1,n}(x)$ may be derived straightforwardly from that of $U_{2,1,n}(x)$, due to the relationship between both statistics:
$$P(U_{3,1,n}(x) \le y) = P(U_{2,1,n}(x) \le b_n(y))$$
for $b_n(y) = y\,(1 - 2h(n)V^{-1}k_{1,1}F'(x))^{1/2}$. Now, take $C_{j,n} = K'((x - X_j)/h_2(n))$; then $U_{3,2,n}(x)$ may be expressed as a nonlinear function applied to a sum of i.i.d. random vectors, $Y_{j,n} = (A_{j,n}, B_{j,n}, C_{j,n})$, $1 \le j \le n$.

$$\begin{aligned} P(U_{3,i,n}(x) \le y) = {} & \Phi(y) + \tfrac{1}{4}\, n^{1/2}h(n)^2h_1(n)^2V^{-1/2}k_{2,0}^2F^{IV}(x)\phi(y) \\ & + \tfrac{1}{6}\, n^{-1/2}(1 - 2F(x))V^{-1/2}p_{0,i}(y)\phi(y) \\ & + O\!\left(n^{1/2}h(n)^3 + n^{1/2}h(n)^2h_1(n)^3 + h(n)h_2(n)^2 + n^{-1/2}h(n)\right). \end{aligned} \tag{8}$$

Following this procedure, we may correct the latter statistics again for bias. In fact, under assumption F3, one has
$$E\!\left(A_{1,n} - \frac{h(n)^2}{2h_1(n)^2}\,k_{2,0}B_{1,n}\right) = F(x) - \tfrac{1}{4}h(n)^2h_1(n)^2k_{2,0}^2F^{IV}(x) + o\!\left(h(n)^2h_1(n)^2\right).$$
Consequently, write $U_{4,i,n}(x)$, $i = 1,2$, for the resulting estimators, say
$$U_{4,1,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x) - \tfrac{1}{2}h(n)^2k_{2,0}\hat F''_{n,h_1}(x) + \tfrac{1}{4}h(n)^2h_1(n)^2k_{2,0}^2\hat F^{IV}_{n,h_3}(x)}{(F(x) - F(x)^2 - 2h(n)k_{1,1}F'(x))^{1/2}},$$
$$U_{4,2,n}(x) = n^{1/2}\,\frac{\hat F_{n,h}(x) - F(x) - \tfrac{1}{2}h(n)^2k_{2,0}\hat F''_{n,h_1}(x) + \tfrac{1}{4}h(n)^2h_1(n)^2k_{2,0}^2\hat F^{IV}_{n,h_3}(x)}{(\hat F_{n,h}(x) - \hat F_{n,h}(x)^2 - 2h(n)k_{1,1}\hat F'_{n,h_2}(x))^{1/2}},$$
where
$$\hat F^{IV}_{n,h_3}(x) = \frac{1}{n h_3(n)^4}\sum_{j=1}^{n} K^{IV}\!\left(\frac{x - X_j}{h_3(n)}\right)$$
and the bandwidth parameter $h_3(n)$ satisfies:

(h31) $h_3(n) = o(1)$.
(h32) $h_1(n)/h_3(n) = o(1)$.

By similar arguments, we may approximate their respective distributions, bearing in mind that the size of the bandwidth $h_3(n)$ that minimizes the MSE of $\hat F^{IV}_{n,h_3}(x)$ is $n^{-1/11}$.
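Theorem 3.6. Assuming conditions F1 and F4, one has, uniformly in $y \in \mathbb{R}$:
$$\begin{aligned} P(U_{4,i,n}(x) \le y) = {} & \Phi(y) + \tfrac{1}{6}\, n^{-1/2}(1 - 2F(x))V^{-1/2}p_{0,i}(y)\phi(y) \\ & - \tfrac{1}{8}\, n^{1/2}h(n)^2h_1(n)^2h_3(n)^2V^{-1/2}k_{2,0}^3F^{VI}(x)\phi(y) \\ & + O\!\left(n^{1/2}h(n)^4 + n^{1/2}h(n)^2h_1(n)^4 + n^{1/2}h(n)^2h_1(n)^2h_3(n)^3 + h(n)h_2(n)^2 + n^{-1/2}h(n)\right). \end{aligned} \tag{9}$$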
Relation (9) shows that the error order with respect to the standard normal distribution equals that of each estimator based on the EDF, which is $n^{-1/2}$, and improves on the results achieved with the SDF prior to the second correction for bias; to appreciate this, Table 1 presents the error orders committed when approximating the distribution function of each estimator by the standard normal distribution.

Table 1
Standard normal approximation orders

| Statistic | $U_{0,i,n}(x)$ | $U_{1,i,n}(x)$ | $U_{2,i,n}(x)$ | $U_{3,i,n}(x)$ | $U_{4,i,n}(x)$ |
|---|---|---|---|---|---|
| Order | $n^{-1/2}$ | $n^{-1/6}$ | $n^{-1/3}$ | $n^{-19/42}$ | $n^{-1/2}$ |

Remark 3.7. Table 1 makes it clear that the error order committed when approximating the distribution of each estimator based on the SDF by the standard normal distribution decreases as the number of corrections increases. In general, such a procedure of successive corrections on the estimators may be iterated, in the sense that we can adjust the statistics constructed in each stage with the aim of compensating terms in the expansion due to the bias or variance, as has been proposed. Some more complex work would be needed to remove the effect of skewness on the latter estimators, $U_{4,i,n}(x)$, which would be the next step in this procedure of reducing the error order with respect to the standard normal distribution achieved in Theorem 3.6, as described very generally in Hall (1992); in fact, it is easy to check that the terms of order of magnitude $n^{-1/2}$ appearing in (9) are due to third-order moments.

Remark 3.8. Recall that the use of the kernel method involves the estimation of a number of bandwidth parameters, which is not an issue when the EDF is considered. Nevertheless, an important advantage of the kernel procedure is that the corresponding Edgeworth expansions in this context involve smooth functions and, therefore, may be approximated in terms of their respective bootstrap counterparts, stating exactly the error order committed in each case. This fact will give rise to better coverage error orders than those obtained from the statistics based on the EDF, as we will show in the following section.

Remark 3.9. Our main interest is to construct confidence intervals for $F(x)$, as indicated in Section 2 and, from this point of view, a first idea would be to invert the expansions obtained in Sections 2 and 3; however, they depend on appropriate and unknown derivatives of the distribution function or on the distribution function itself. This means that so far we should use just the normal approximation, so that, denoting by $z_\alpha$ the $\alpha$-quantile of the standard normal distribution, we may construct the one-sided confidence interval at level $\alpha$ for $F(x)$ from the inequality $U_{j,i,n}(x) \le z_\alpha$. Then, the coverage error order achieved would be $n^{-c_j}$, for each $j$, as given in Table 1, since $P(U_{j,i,n}(x) \le z_\alpha) = \alpha + O(n^{-c_j})$.
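The construction in Remark 3.9 can be sketched for the simplest case, the Studentized EDF statistic $U_{0,2,n}(x)$: solving $U_{0,2,n}(x) \le z_\alpha$ for $F(x)$ gives the one-sided interval $[F_n(x) - z_\alpha (F_n(x)(1 - F_n(x))/n)^{1/2},\, 1]$. The code below is only an illustration under that reading; the function names and the toy sample are hypothetical, and it requires $0 < F_n(x) < 1$ for the interval to be non-degenerate.

```python
import math
from statistics import NormalDist


def edf(sample, x):
    """Empirical distribution function F_n(x)."""
    return sum(1 for xj in sample if xj <= x) / len(sample)


def lower_bound_edf(sample, x, alpha):
    """Lower endpoint of {F(x): U_{0,2,n}(x) <= z_alpha}, normal approximation."""
    n = len(sample)
    Fn = edf(sample, x)
    z_alpha = NormalDist().inv_cdf(alpha)      # alpha-quantile of N(0, 1)
    return Fn - z_alpha * math.sqrt(Fn * (1 - Fn) / n)


# Toy data: the integers 1..100, so that F_n(30.5) = 0.3.
toy_sample = list(range(1, 101))
print(lower_bound_edf(toy_sample, 30.5, 0.95))
```

The same inversion applies to the kernel-based statistics $U_{j,2,n}(x)$, with the corrected numerator and the Studentizing denominator taking the place of the EDF quantities.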
4. Bootstrap estimators

The standard bootstrap method, introduced by Efron (1979), is a strategy for estimating standard errors, setting confidence intervals for parameters and approximating distribution functions. The key idea is that the relationship between the distribution $F$ and the sample is similar to the relationship between the empirical distribution $F_n$ and a secondary sample drawn from it. Nevertheless, since the EDF is a discrete distribution, samples constructed from $F_n$ in the bootstrap simulation will have some rather peculiar properties; for instance, all the values taken by the members of the bootstrap samples will be drawn from the original sample values, and nearly every sample will contain repeated values.

The smoothed bootstrap is a modification of the standard bootstrap procedure designed to avoid samples with the above properties. The essential idea of the smoothed bootstrap is to perform the repeated sampling not from $F_n$, but from a smoothed version of $F_n$,
usually a kernel-type estimator:
$$\hat F_{n,g}(x) = \frac{1}{n}\sum_{j=1}^{n} L\!\left(\frac{x - X_j}{g(n)}\right),$$
where $L(z) = \int_{-\infty}^{z} l(y)\,dy$, $l$ is a kernel density and $g(n)$ represents the bandwidth parameter. Then, a bootstrap sample $\mathcal{X}_n^*$ of size $n$, given by $\mathcal{X}_n^* = (X_1^*, \ldots, X_n^*)$, will be obtained by drawing at random from the distribution function $\hat F_{n,g}$.

The question of whether the smoothed bootstrap is superior to the usual bootstrap, and for which smoothing parameter, has been addressed by a few authors (Silverman and Young, 1987; Hall and Martin, 1988; Hall et al., 1989), although in most cases no definitive answers exist, especially in small samples. In a variety of contexts, smoothing influences only second-order properties of the estimator, while requiring greater computation and the choice of a suitable amount of smoothing. There are problems, however, where smoothing may affect the rate of convergence of the estimator; see De Angelis and Young (1992) for examples in this context, where a procedure based on the smoothed bootstrap is also suggested and illustrated.

We have considered the smoothed bootstrap procedure; however, there are other possibilities to be investigated, such as the method of shrinking in the smoothed bootstrap, suggested in Silverman and Young (1987) with the aim of preserving the variance structure, or the rescaled version of the smoothed bootstrap proposed in Wang (1995), which produces estimators that have the asymptotic minimum mean (integrated) squared error.

In Sections 2 and 3, we have seen that each random variable of interest depends on the sample $\mathcal{X}_n$, as well as on the underlying distribution $F$, and may be represented as $U_{j,i,n}(x) = T_{j,i,n}(\mathcal{X}_n, F)$, for each $i, j$ and for some function $T_{j,i,n}$. Thus, the smoothed bootstrap method consists of approximating the sampling distribution of $T_{j,i,n}(\mathcal{X}_n, F)$, under $F$, by the bootstrap distribution of $U^*_{j,i,n}(x) = T_{j,i,n}(\mathcal{X}_n^*, \hat F_{n,g})$, under $\hat F_{n,g}$.
The bandwidth parameter $g(n)$, as suggested in Section 1, will be selected so that it minimizes asymptotically the MSE of the highest-order derivative of $\hat F_{n,g}(x)$ involved in the corresponding Edgeworth expansion; this means $\mathrm{MSE}(\hat F^{(m)}_{n,g}(x)) = O((n\,g(n)^{2m-1})^{-1} + g(n)^4)$ and, therefore, $g(n) = O(n^{-1/(2m+3)})$, if the expansion depends on the derivatives of $\hat F_{n,g}$ up to the $m$th, provided that $F \in \mathcal{C}^{m+2}(\mathrm{supp}(F)^\circ)$.

Since the procedure suggested for estimating the bandwidth parameter implies that $h(n)$ is a function of the random sample $\mathcal{X}_n$, an appropriate bandwidth for the bootstrap version of $\hat F^{(j)}_{n,h}$ could be estimated from the resample values rather than the sample values. However, this approach causes other problems, as remarked in Hall (1992), because the common procedure for calculating $h(n)$ involves heavy computation as well as a degree of subjectivity, for instance, when a pilot bandwidth is selected. Accordingly, and bearing in mind that $h(n)$ is assumed at first to be nonrandom, we will consider the same bandwidth parameter for both the bootstrap and non-bootstrap estimators.
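Drawing from $\hat F_{n,g}$ amounts to picking an observation at random and adding $g(n)$ times an independent draw from the kernel density $l$. The sketch below assumes, for illustration only, that $l(y) = (315/256)(1-y^2)^4$ on $[-1,1]$, which is the density of $2B - 1$ with $B$ a Beta(5, 5) variable, and takes $g(n) = n^{-1/(2m+3)}$ with $m = 2$; the sample, the seeds and the function names are hypothetical.

```python
import random


def draw_kernel_noise(rng):
    """One draw from l(y) proportional to (1 - y^2)^4 on [-1, 1], via Beta(5, 5)."""
    return 2.0 * rng.betavariate(5, 5) - 1.0


def smoothed_bootstrap_sample(sample, g, rng):
    """Resample of the same size as `sample`, drawn from the kernel estimator F_{n,g}."""
    return [rng.choice(sample) + g * draw_kernel_noise(rng) for _ in sample]


rng = random.Random(1)                               # illustrative seed
data = [rng.gauss(0.0, 1.0) for _ in range(100)]     # illustrative sample
m = 2                                                # highest derivative order (assumed)
g = len(data) ** (-1.0 / (2 * m + 3))                # bandwidth of size n^(-1/(2m+3))
resample = smoothed_bootstrap_sample(data, g, rng)
print(len(set(resample)), min(resample), max(resample))
```

Unlike the standard bootstrap, the resample contains no repeated values, and every resampled point lies within $g(n)$ of some original observation.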
Recall that one of the arguments supporting the use of the smoothed bootstrap method was that the underlying distribution function $F$ (supposed to be absolutely continuous) would be estimated by an absolutely continuous distribution, say $\hat F_{n,g}$, and this is enough to guarantee that the bootstrap version of all the estimators considered, either constructed from the EDF or based on the kernel method, will admit Edgeworth expansions, conditional on the sample $\mathcal{X}_n$; moreover, the corresponding expansions are obtained from their counterparts just by substituting the conditional expectations with respect to $\hat F_{n,g}$ for the expectations with respect to $F$ or, in other words, by replacing $F^{(m)}$ by $\hat F^{(m)}_{n,g}$ for each $m$ in the expansion, so that the order of the remainder term holds now with probability one. Take into account that $\hat F^{(m)}_{n,g}(x) - F^{(m)}(x) = O_p(g(n)^2) = O_p(n^{-2/(2m+3)})$; then
$$P(U^*_{j,i,n}(x) \le y \mid \mathcal{X}_n) - P(U_{j,i,n}(x) \le y) = O_p(n^{-e_j}) \qquad (j = 0,1,2,3,4), \tag{10}$$
where $e_0 = \tfrac{1}{2}$, due to the fact that the $n^{-1/2}$-term in the expansion of $U_{0,i,n}$ depends on a periodic and non-continuous function $S_1$, and $e_j = c_j + 2/[2m(j) + 3]$, for $j \ge 1$, where $c_j$ is given in Table 1 and $m(j)$ corresponds to the highest-order derivative of $F$ involved in the expansion of $U_{j,i,n}(x)$. According to Theorems 3.2, 3.4, 3.5 and 3.6, $m(j)$ takes the values 2, 4, 4 and 6, for $j$ equal to 1, 2, 3 and 4, respectively. Thus, the error orders incurred, with probability one, when the distribution function of each estimator is approximated by that of its bootstrap counterpart are outlined in Table 2.

Table 2
Bootstrap approximation orders

Statistic   $U_{0,i,n}(x)$   $U_{1,i,n}(x)$   $U_{2,i,n}(x)$   $U_{3,i,n}(x)$    $U_{4,i,n}(x)$
Order       $n^{-1/2}$       $n^{-19/42}$     $n^{-17/33}$     $n^{-293/462}$    $n^{-19/30}$

Remark 4.1. Write $u^*_{j,i,\alpha,n}(x)$ for the $\alpha$-quantile of $U^*_{j,i,n}(x)$, for $j = 0, 1, 2, 3, 4$ and $i = 1, 2$. Then, in view of Table 2, the one-sided confidence interval for $F(x)$ at level $\alpha$ constructed from the inequality $U_{j,i,n}(x) \le u^*_{j,i,\alpha,n}(x)$ has a coverage error of order $O_p(n^{-e_j})$.

Remark 4.2. Table 2 shows that confidence intervals constructed from $U_{1,i,n}(x)$, which represents the SDF without corrections, do not improve on the coverage error orders achieved from $U_{0,i,n}(x)$, based on the EDF. However, a single explicit correction to reduce the effect of bias is enough to establish the better behaviour of the kernel method, that is, to construct confidence intervals from $U_{j,i,n}(x)$, for $j \ge 2$; see Remarks 3.9 and 4.1.

Remark 4.3. Recall that we may only guarantee that confidence intervals obtained from non-bootstrap statistics have coverage errors with the deterministic orders given in Table 1; on the other hand, when the bootstrap statistics are considered, the coverage error orders achieved, with probability one, are shown in Table 2. Comparing both tables, we see that no difference in order of magnitude appears between confidence intervals constructed from the statistics based on the EDF and those constructed from their bootstrap counterparts. This conclusion does not remain valid for the kernel procedure, however, since the use of the bootstrap method produces a substantial improvement with respect to the orders achieved for non-bootstrap estimators, whatever the number of corrections may be.

Remark 4.4. In view of the arguments above, of the two methods for estimating the distribution function, the EDF and the kernel method, we suggest the second one, despite the fact that it requires assuming the existence of a number of derivatives of the distribution function, in addition to the estimation of several bandwidth parameters, according to the number of corrections involved. In particular, the best coverage error order is achieved by correcting the estimators twice, when using the kernel procedure, since:

$$\frac{19}{42} < \frac{1}{2} < \frac{17}{33} < \frac{19}{30} < \frac{293}{462}.$$
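The comparison of orders in Remark 4.4 is simple exact arithmetic, and can be checked as in the sketch below. It assumes that the exponents $e_j$ of Table 2 read as 1/2, 19/42, 17/33, 293/462 and 19/30 for $j = 0, \dots, 4$; the names `exponents`, `ranking` and `best_j` are illustrative and not part of the paper.

```python
from fractions import Fraction

# Exponents e_j of the bootstrap approximation orders n^(-e_j).
# Assumption: these are the values of Table 2 as we read them.
exponents = {
    0: Fraction(1, 2),      # EDF-based statistic
    1: Fraction(19, 42),    # SDF without corrections
    2: Fraction(17, 33),
    3: Fraction(293, 462),
    4: Fraction(19, 30),
}

# A larger exponent means a faster-vanishing coverage error.
ranking = sorted(exponents, key=exponents.get)   # slowest to fastest
best_j = max(exponents, key=exponents.get)

for j in ranking:
    print(f"j = {j}: e_j = {exponents[j]} ~ {float(exponents[j]):.4f}")
print("fastest rate attained at j =", best_j)
```

Exact fractions are used rather than floats because the two largest exponents differ only in the third decimal place.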
5. Simulation study

A Monte Carlo simulation study has been conducted to compare the performance of the proposed Studentized estimators when constructing 95% confidence intervals for $F(x)$. We have used 1000 samples of sizes $n = 20$, 100 and 500, generated from five distributions (exponential, beta, log-normal, standard normal and Weibull), and two points $x$ have been considered for each of these distributions. For each sample, a 95% confidence interval for $F(x)$ has been obtained, and we have then calculated the proportion of intervals containing the true value $F(x)$, which provides an estimate of the coverage probability for each method.

For the EDF, the normal quantiles have been used (see Remark 3.9), since the coverage error order achieved in this case is deterministic and of the same magnitude as that obtained from the bootstrap statistics. The estimators based on the SDF have been computed from the kernel density $k(y) = \frac{315}{256}(1 - y^2)^4$, for $-1 \le y \le 1$, which satisfies the conditions required in Section 3 and is smooth enough to be used in the estimation of the bandwidth parameters, in the manner described below; here the bootstrap quantiles have been used to construct confidence intervals for $F(x)$, as they provide better coverage probabilities than the normal approximation does (see Remark 4.1). The results are shown in Table 3, where each Studentized estimator $U_{j,2,n}(x)$ used in the construction of confidence intervals is abbreviated by $j$, for $j = 0, 1, 2, 3, 4$.

The bandwidth parameters $h(n)$, $h_1(n)$, $h_2(n)$ and $h_3(n)$ have been chosen to minimize the MSE of the kernel estimators based on $\hat F_{n,h}$, $\hat F_{n,h_1}$, $\hat F_{n,h_2}$ and $\hat F_{n,h_3}$ at $x$, respectively, using plug-in estimates of the unknown parts. The essential idea is to note that $F^{(j)}(x)$ may be estimated by differentiating a kernel distribution estimator, $\hat F^{(j)}_{n,h_0}(x)$, and the MSE criterion provides an asymptotically optimal bandwidth parameter as follows:

$$h_{0,\mathrm{AMSE}}(n) = \left[\frac{(2j - 1)\, r_j\, F'(x)\, n^{-1}}{\bigl(s\, F^{(j+2)}(x)\bigr)^2}\right]^{1/(2j+3)} \tag{11}$$

for $s = \int_{-\infty}^{+\infty} y^2 k(y)\,dy$ and a positive constant $r_j$, dependent only on $k$ and $j$, given by $r_j = \int_{-\infty}^{+\infty} \bigl(K^{(j)}(y)\bigr)^2\,dy$. Then, we have used a plug-in method to estimate the unknown parts in (11), so that each derivative of $F$ appearing in $h_{0,\mathrm{AMSE}}(n)$, $F^{(m)}(x)$, has been estimated by $\hat F^{(m)}_{n,g_0}(x)$, where the pilot bandwidth $g_0(n)$ has been selected by replacing $j$ by $m$ in (11), together with the assumption of normality, which provides estimates of the unknown parts, say:

$$g_0(n) = \left[\frac{(2m - 1)\, r_m\, \Phi'_{\hat\mu,\hat\sigma}(x)\, n^{-1}}{\bigl(s\, \Phi^{(m+2)}_{\hat\mu,\hat\sigma}(x)\bigr)^2}\right]^{1/(2m+3)}$$
Table 3
Estimated coverage probabilities

Distribution    x     n     j = 0   j = 1   j = 2   j = 3   j = 4
Exp(1)          0.5   20    0.925   0.981   0.963   0.862   0.828
                      100   0.927   0.962   0.962   0.925   0.932
                      500   0.947   0.957   0.956   0.950   0.951
                2.0   20    0.725   0.756   0.770   0.741   0.716
                      100   0.765   0.801   0.798   0.885   0.883
                      500   0.943   0.964   0.962   0.955   0.954
Beta(3,2)       0.3   20    0.859   0.897   0.879   0.835   0.823
                      100   0.968   0.966   0.959   0.945   0.945
                      500   0.941   0.958   0.953   0.953   0.950
                0.7   20    0.890   0.917   0.911   0.882   0.875
                      100   0.942   0.970   0.961   0.938   0.931
                      500   0.946   0.958   0.956   0.953   0.947
Log-N(0,1)      0.5   20    0.893   0.996   0.995   0.860   0.852
                      100   0.931   0.975   0.970   0.891   0.872
                      500   0.960   0.969   0.961   0.955   0.959
                2.0   20    0.864   0.837   0.846   0.765   0.781
                      100   0.912   0.906   0.921   0.867   0.879
                      500   0.956   0.938   0.952   0.944   0.943
Normal(0,1)     0.5   20    0.885   0.988   0.962   0.897   0.843
                      100   0.966   0.975   0.971   0.932   0.933
                      500   0.955   0.974   0.960   0.948   0.944
                0.5   20    0.864   0.974   0.955   0.868   0.837
                      100   0.958   0.971   0.970   0.942   0.932
                      500   0.960   0.969   0.954   0.947   0.946
Weibull(3,2)    0.5   20    0.886   0.934   0.923   0.884   0.867
                      100   0.945   0.961   0.953   0.936   0.928
                      500   0.956   0.964   0.958   0.947   0.944
                1.0   20    0.651   0.759   0.731   0.678   0.663
                      100   0.904   0.927   0.914   0.887   0.887
                      500   0.958   0.953   0.947   0.946   0.940
where $\hat\mu$ and $\hat\sigma^2$ represent the sample mean and variance, respectively, and $\Phi_{\hat\mu,\hat\sigma}$ denotes the normal distribution with mean $\hat\mu$ and variance $\hat\sigma^2$. The kernel density $l$ used to construct the bootstrap estimators has been $l = \phi$ (the standard normal density). For each sample, the bandwidth parameter $g(n)$ has been selected to minimize asymptotically the MSE of the highest-order derivative of $\hat F_{n,g}(x)$ involved in the corresponding Edgeworth expansion, $\hat F^{(m(j))}_{n,g}(x)$, by a method similar to that described above for $h_0(n)$, just considering $l$ instead of $k$. Then, 1000 bootstrap replications were used to estimate the bootstrap quantiles.

Remark 5.1. For small sample sizes ($n = 20$), Table 3 shows that confidence intervals constructed from the estimators based on the SDF, when the variance has not been adjusted ($j = 1, 2$), perform better than the others, in the sense that the estimated coverage probabilities are closer to the confidence level 0.95. Not surprisingly, these results, as well as those of the EDF ($j = 0$), are clearly superior to the values achieved from the statistics $U_{j,2,n}(x)$, for $j = 3, 4$, owing to the number of bandwidth parameters to be estimated, together with the fact that the correction on the variance may provide a negative estimate of it for small $n$. As the sample size increases, confidence intervals constructed from the EDF ($j = 0$) give estimated coverage probabilities closer to 0.95 than the SDF without corrections ($j = 1$); on the other hand, kernel-type estimators in which the variance has been adjusted show a marked improvement in coverage with respect to the other estimators. The coverage probabilities estimated for large sample sizes, $n = 500$, illustrate the theoretical behaviour, so that the SDF in which the bias and variance have been corrected, $j = 3$, provides in general better results than those produced by the EDF or the other kernel-type estimators.
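The Monte Carlo procedure of this section can be sketched for the simplest case, the EDF-based Studentized statistic ($j = 0$) with normal quantiles. The code below is a minimal illustration and not the authors' program: it assumes a two-sided interval of the form $F_n(x) \pm z\,[F_n(x)(1 - F_n(x))/n]^{1/2}$, takes the Exp(1) distribution as the example, and the function names `edf_interval` and `estimate_coverage` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def edf_interval(sample, x, level=0.95):
    """Normal-quantile interval for F(x) from the Studentized EDF.

    Assumption: two-sided interval F_n(x) +/- z * sqrt(F_n(1 - F_n) / n).
    """
    n = len(sample)
    p_hat = sum(1 for v in sample if v <= x) / n      # EDF at x
    z = NormalDist().inv_cdf(0.5 + level / 2)         # e.g. 1.96 for 95%
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def estimate_coverage(n, x, n_samples=1000, seed=0):
    """Proportion of Exp(1) samples whose interval contains F(x)."""
    rng = random.Random(seed)
    true_value = 1 - math.exp(-x)                     # F(x) for Exp(1)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        lo, hi = edf_interval(sample, x)
        hits += lo <= true_value <= hi
    return hits / n_samples


if __name__ == "__main__":
    for n in (20, 100, 500):
        print(n, estimate_coverage(n, x=0.5))
```

The kernel-based statistics ($j \ge 1$) would additionally require the bandwidth selection and the smoothed bootstrap quantiles described above, which this sketch leaves out.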
Acknowledgements

We would like to thank Professor Peter Hall for his motivation and helpful suggestions during his stay at the University of Santiago de Compostela. The authors are also grateful to the referee and the Editor, whose comments helped to improve this paper.
Appendix

A.1. Proof of Lemma 3.1

Notice that, for all $a \in \mathbb{R}$ and $\varepsilon > 0$:

$$\bigl|E[\exp\{itA_{1,n}\}]\bigr| = h(n)\left|\int_{-\infty}^{\infty} \exp\{itK(z)\}\,F'(x - h(n)z)\,dz\right| \le [1] + [2]$$

$$= h(n)\left(\int_{-\infty}^{a-\varepsilon} + \int_{a+\varepsilon}^{\infty}\right) F'(x - h(n)z)\,dz + h(n)\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{itK(z)\}\,F'(x - h(n)z)\,dz\right|.$$

By F1 and F2, $F'$ is bounded, continuous at $x$ and $F'(x) > 0$, so that given $\varepsilon_0 > 0$:

$$|F'(x - u) - F'(x)| < \varepsilon_0 F'(x) \tag{12}$$

for all $u \in (-\delta_0, \delta_0)$, for some positive constant $\delta_0$. In consequence, for large $n$:

$$[1] = 1 - h(n)\int_{a-\varepsilon}^{a+\varepsilon} F'(x - h(n)z)\,dz \le 1 - 2h(n)\varepsilon(1 - \varepsilon_0)F'(x).$$

On the other hand, consider a random variable $Z$ that has the uniform distribution on $(a - \varepsilon, a + \varepsilon)$; from the conditions imposed on $k$, we may deduce that there exist $a \in \mathbb{R}$ and $\varepsilon > 0$ such that $k(y) > 0$, for all $y \in [a - \varepsilon, a + \varepsilon]$. Therefore, $K(Z)$ has a nonzero absolutely continuous component and, hence, $K(Z)$ verifies the usual Cramér's condition:

$$\sup_{|t| > \delta}\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{itK(z)\}\,dz\right| < 2\varepsilon D_2(\delta), \quad \text{for some positive constant } D_2(\delta) < 1. \tag{13}$$

In consequence, $\sup_{|t| > \delta}[2] < 2h(n)\varepsilon(\varepsilon_0 + D_2(\delta))F'(x)$. Now, inequality (3) follows taking $D(x, \delta) = 2\varepsilon(1 - 2\varepsilon_0 - D_2(\delta))F'(x)$ and $g(n) = h(n)$.
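Condition (13) can be illustrated numerically for the kernel of Section 5. The sketch below is only an illustration under stated assumptions: it takes $a = 0$, $\varepsilon = 0.5$ and $\delta = 1$ (choices made here, not in the paper), uses the closed-form integral $K$ of $k(y) = \frac{315}{256}(1 - y^2)^4$, and replaces the supremum over $|t| > \delta$ by a maximum over a finite grid of $t$; the names `K`, `cramer_ratio` and `sup_ratio` are hypothetical.

```python
import cmath


def K(z):
    """Integral of k(y) = (315/256)(1 - y^2)^4 from -1 to z, clipped to [-1, 1]."""
    z = max(-1.0, min(1.0, z))
    poly = z - 4 * z**3 / 3 + 6 * z**5 / 5 - 4 * z**7 / 7 + z**9 / 9
    return 0.5 + 315.0 / 256.0 * poly


def cramer_ratio(t, a=0.0, eps=0.5, steps=2000):
    """|integral over (a-eps, a+eps) of exp{i t K(z)} dz| / (2 eps), midpoint rule."""
    width = 2 * eps / steps
    total = 0j
    for i in range(steps):
        z = a - eps + (i + 0.5) * width
        total += cmath.exp(1j * t * K(z))
    return abs(total * width) / (2 * eps)


# Finite-grid stand-in for the supremum over |t| > delta. Only t > 0 is
# scanned, since the modulus is unchanged when t is replaced by -t.
delta = 1.0
grid = [delta + 0.25 * i for i in range(197)]   # t from 1 to 50
sup_ratio = max(cramer_ratio(t) for t in grid)

if __name__ == "__main__":
    print("grid maximum of the ratio:", sup_ratio)
```

A grid maximum below 1 is consistent with the existence of a constant $D_2(\delta) < 1$ in (13), although a finite grid cannot by itself prove the bound for all $|t| > \delta$.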
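A.2. Proof of Lemma 3.3

Inequality (5) can be reduced to proving that, for some $D_3(x, \delta) > 0$ and $D_4(x, \delta) > 0$:

$$[3] = \sup_{|t_2| > \delta}\bigl|E(\exp\{it_1A_{1,n} + it_2B_{1,n}\})\bigr| < 1 - h_1(n)D_3(x, \delta),$$

$$[4] = \sup_{|t_1| > \delta,\ |t_2| \le \delta}\bigl|E(\exp\{it_1A_{1,n} + it_2B_{1,n}\})\bigr| < 1 - h(n)D_4(x, \delta).$$

First, we will take into account that, for all $a \in \mathbb{R}$ and $\varepsilon > 0$:

$$[3] \le 1 - h_1(n)\int_{a-\varepsilon}^{a+\varepsilon} F'(x - h_1(n)z)\,dz + \sup_{|t_2| > \delta} h_1(n)\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\left\{it_1K\!\left(\frac{h_1(n)z}{h(n)}\right) + it_2K''(z)\right\} F'(x - h_1(n)z)\,dz\right|.$$

In particular, there exist $a \in \mathbb{R}$ and $\varepsilon > 0$ such that $K'''$ exists on $[a - \varepsilon, a + \varepsilon]$, takes nonzero values there and $0 \notin (a - \varepsilon, a + \varepsilon)$. Then, consider a random variable $Z_1$ uniformly distributed on $(a - \varepsilon, a + \varepsilon)$, to obtain that $K''(Z_1)$ verifies the usual Cramér's condition:

$$\sup_{|t| > \delta}\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{itK''(z)\}\,dz\right| < 2\varepsilon D_5(\delta), \quad \text{for some positive constant } D_5(\delta) < 1. \tag{14}$$

In addition, $k$ is compactly supported and $0 \notin (a - \varepsilon, a + \varepsilon)$; then, by h12 and for large $n$, one has $K(h_1(n)z/h(n)) = d \in \{0, 1\}$, for all $z \in (a - \varepsilon, a + \varepsilon)$. Therefore,

$$\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\left\{it_1K\!\left(\frac{h_1(n)z}{h(n)}\right) + it_2K''(z)\right\} F'(x - h_1(n)z)\,dz\right| = \left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{it_2K''(z)\}\,F'(x - h_1(n)z)\,dz\right|. \tag{15}$$

Then, the inequality holds for [3] by (12), (14) and (15). On the other hand,

$$[4] \le 1 - h(n)\int_{a-\varepsilon}^{a+\varepsilon} F'(x - h(n)z)\,dz + \sup_{|t_2| \le \delta,\ |t_1| > \delta} h(n)\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\left\{it_1K(z) + it_2K''\!\left(\frac{h(n)z}{h_1(n)}\right)\right\} F'(x - h(n)z)\,dz\right|$$

$$\le 1 - h(n)\int_{a-\varepsilon}^{a+\varepsilon} F'(x - h(n)z)\,dz + \sup_{|t_2| \le \delta,\ |t_1| > \delta} h(n)\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{it_1K(z)\}\left[\exp\left\{it_2K''\!\left(\frac{h(n)z}{h_1(n)}\right)\right\} - 1\right] F'(x - h(n)z)\,dz\right|$$

$$+ \sup_{|t_2| \le \delta,\ |t_1| > \delta} h(n)\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{it_1K(z)\}\,F'(x - h(n)z)\,dz\right|. \tag{16}$$

Given $\varepsilon_1 > 0$, bear in mind that $|\exp\{iy\} - 1| < \varepsilon_1$, for all $y$ such that $|y| < \delta_1$, for some positive constant $\delta_1$. Then, by h12 and for large $n$, one has

$$\left|\exp\left\{it_2K''\!\left(\frac{h(n)z}{h_1(n)}\right)\right\} - 1\right| < \varepsilon_1, \tag{17}$$

for all $z \in (a - \varepsilon, a + \varepsilon)$ and $|t_2| \le \delta$. In consequence, by (12), (16) and (17), we obtain

$$[4] \le 1 - 2h(n)\varepsilon\,[1 - \varepsilon_0 - \varepsilon_1(1 + \varepsilon_0)]\,F'(x) + \sup_{|t_1| > \delta} h(n)\left|\int_{a-\varepsilon}^{a+\varepsilon} \exp\{it_1K(z)\}\,F'(x - h(n)z)\,dz\right|.$$

Now, we may use (13) as in Lemma 3.1 to conclude this proof.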