Statistics and Probability Letters 121 (2017) 90–98
A note on estimating cumulative distribution functions by the use of convolution power kernels

Benedikt Funke, Christian Palmes
TU Dortmund, Germany
Article history: Received 12 April 2016; Received in revised form 4 October 2016; Accepted 5 October 2016; Available online 15 October 2016.

Abstract: Our paper investigates the nonparametric estimation of cumulative distribution functions of nonnegative-valued random variables using convolution power kernels. Our proposed consistent estimator avoids boundary effects near the origin. We present its asymptotic properties and give a short simulation study.
Keywords: Distribution function estimation; Mean squared error; Boundary bias; Convolution power kernels
1. Introduction

Let $X_1, \ldots, X_n$ be univariate i.i.d. random variables with unknown cumulative distribution function $F$ and density $f$ supported on $(0, \infty)$. When estimating the continuous function $F$, it seems natural to use smooth estimators rather than the empirical distribution function
$$\hat{F}_n(x) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \le x}, \qquad (1.1)$$
which is discontinuous by definition. A natural choice for the continuous estimation of $F$ at a given point $x \in (0, \infty)$ is the kernel estimator
$$\bar{F}_n(x) := \frac{1}{n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), \qquad (1.2)$$
where
$$K(u) := \int_{-\infty}^{u} k(z)\,dz$$
and $k$ is a symmetric probability density function with finite second moment. As usual, $h \equiv h_n$ denotes the bandwidth, which fulfills $h \to 0$ as $n \to \infty$; see for example Nadaraya (1964) or Jones (1990) for classical references in this context. Asymptotic properties as well as bandwidth selection methods have been examined by many authors; we refer to Swanepoel (1988), Altman and Léger (1995), Bowman et al. (1998), Liu and Yang (2008), Giné and Nickl (2009), Tenreiro (2013) and the references therein.
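For concreteness, the following minimal Python sketch (our illustration, not part of the original paper) contrasts the empirical distribution function (1.1) with the kernel estimator (1.2); the Gaussian kernel and the bandwidth $h \approx n^{-1/3}$ mirror the choices used later in the simulation study of Section 3.

```python
import numpy as np
from scipy.stats import norm

def F_emp(x, sample):
    """Empirical distribution function (1.1) evaluated at the points x."""
    x = np.asarray(x, dtype=float)
    return np.mean(sample[None, :] <= x[:, None], axis=1)

def F_kernel(x, sample, h):
    """Kernel estimator (1.2) with K the standard Gaussian cdf and bandwidth h."""
    x = np.asarray(x, dtype=float)
    return np.mean(norm.cdf((x[:, None] - sample[None, :]) / h), axis=1)

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=100)   # density supported on (0, inf)
x = np.array([0.05, 0.5, 1.0, 2.0])
print(F_emp(x, sample))
print(F_kernel(x, sample, h=100 ** (-1 / 3)))   # h ~ n^{-1/3} as in Section 3
```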
In our context, it is well known that $\bar{F}_n$ is an asymptotically unbiased estimator of the unknown distribution function $F$ in the interior region $I := \{x \in (0,\infty) \mid xh^{-1} \to \infty\}$ iff $h \to 0$. Moreover, when $F$ is twice continuously differentiable, it holds that
$$\mathrm{Bias}(\bar{F}_n(x)) = E[\bar{F}_n(x)] - F(x) = O(h^2) \quad \text{as } n \to \infty,\ x \in I.$$
However, it is also well known that $\bar{F}_n$ suffers from boundary effects. For $x \in B$, where $B := \{x \in (0,\infty) \mid \exists\, \kappa > 0 : xh^{-1} \to \kappa\}$ denotes the boundary region, it holds that (cf. Tenreiro (2013))
$$E[\bar{F}_n(x)] - F(x) = O(h) \quad \text{as } n \to \infty. \qquad (1.3)$$
Hence, in order to avoid this slower rate of convergence, boundary correction methods are required. In this note, we follow an idea originally proposed by Comte and Genon-Catalot (2012), who investigate the nonparametric estimation of densities using so-called convolution power kernels. The basic idea of their estimator is as follows: Consider a probability density $K$ on $(0, \infty)$ with
$$\int_0^\infty zK(z)\,dz = 1$$
and let $U_1, \ldots, U_m$ be i.i.d. random variables with distribution $K(u)\,du$ and finite second moment. The arithmetic mean $\bar{U}_m := \frac{1}{m}\sum_{j=1}^m U_j$ has distribution $K_m(u)\,du$, where $K_m(u) := m\,K * \cdots * K(mu) =: m\,K^{*m}(mu)$ and $*$ denotes the usual convolution product. By the law of large numbers, the distribution of $\bar{U}_m$ converges weakly to $\delta_1$, the Dirac distribution at the point $E[U_i] = 1$. In view of these considerations, Comte and Genon-Catalot (2012) propose, for a given $x \in (0, \infty)$, the following estimator of $f(x)$:
$$\hat{f}_m(x) := \frac{1}{nx} \sum_{k=1}^{n} K_m\!\left(\frac{X_k}{x}\right).$$
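As an illustration, the following hedged sketch (our code, not the authors') implements $\hat{f}_m$ with the Gamma convolution kernel derived later in Section 3(ii): for $U_i \sim G(a, a)$, $K_m$ is the $G(am, am)$ density, available in closed form; sample and parameter values are our own choices.

```python
import numpy as np
from scipy.stats import gamma

def f_hat(x, sample, m, a=1.0):
    """Convolution power kernel density estimator f_hat_m(x), x > 0."""
    # K_m is the Gamma(shape am, rate am) density when U_i ~ G(a, a)
    km = gamma.pdf(sample / x, a=a * m, scale=1.0 / (a * m))
    return km.mean() / x

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=500)
print(f_hat(1.0, sample, m=40))   # Exp(1) density at x = 1 is exp(-1) ~ 0.368
```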
Using an elementary change of variables, one easily obtains $E[\hat{f}_m(x)] = E[f(x\bar{U}_m)]$, which should be close to $f(x)$ for large $m$ under suitable smoothness conditions on $f$. Comte and Genon-Catalot (2012) investigate the asymptotic rates of convergence of the mean squared error of this estimator and propose an adaptive selection method for the regularity parameter $m$. This results in a pointwise consistent density estimator which is free of the boundary effect (1.3).

The contribution of the present note is to adapt this method to the field of nonparametric distribution function estimation. Compared to density estimation, boundary correction methods in this context have not yet received much attention in the literature. Koláček (2008) adapts the well-known reflection method to distribution function estimation, and Tenreiro (2013) uses boundary kernels in this context. Both methods suffer from the fact that they require the choice of additional parameters: the reflection method is based on a suitable transformation, which has to be specified, and the boundary kernel method depends on the definition of an appropriate boundary region in which a predetermined boundary kernel is used. By using the proposed convolution power kernel method, we do not have to incorporate additional parameters besides the power $m$. Additionally, our estimator produces only non-negative values and is itself a distribution function, a property not shared by the mentioned alternative boundary correction methods. Finally, our method merely requires that the unknown distribution function $F$ belongs to a certain Hölder class, which is a considerably more general assumption compared to Koláček (2008) and Tenreiro (2013).

The present note is organized as follows: Section 2 is devoted to the construction of our proposed distribution function estimator as well as an investigation of its pointwise asymptotic properties. In Section 3 we present a Monte Carlo simulation study, whereas Section 4 concludes and gives an outlook on further research.

2. Convolution power kernel based distribution function estimation

In this section, we focus on the construction of the convolution power kernel based distribution function estimator, inspired by Comte and Genon-Catalot (2012). First note that the estimator presented in the introduction,
$$\hat{f}_m(x) = \frac{1}{nx} \sum_{k=1}^{n} K_m\!\left(\frac{X_k}{x}\right),$$
is not a density. To get a so-called ''bona fide'' estimator, we have to add an additional scaling term:
$$\hat{f}_{2,m}(x) := \frac{1}{nx} \sum_{k=1}^{n} \frac{X_k}{x}\, K_m\!\left(\frac{X_k}{x}\right).$$
This modification makes the resulting estimator a probability density. A change of variables yields $E[\hat{f}_{2,m}(x)] = E[\bar{U}_m f(x\bar{U}_m)]$, so the incorporated factor does not influence the asymptotic behavior of the original density estimator; cf. Comte and Genon-Catalot (2012) for a further discussion. In view of these facts,
$$\hat{F}_m(x) := \int_{(0,x)} \hat{f}_{2,m}(t)\,dt = \int_0^x \hat{f}_{2,m}(t)\,dt, \qquad x > 0, \qquad (2.4)$$
is a distribution function, too. We will use this estimator for our further investigations and call it the convolution power kernel based distribution function estimator.

In order to study the asymptotic behavior of our estimator, we start with the derivation of the rates of convergence of the bias and the variance terms. First observe that, by the substitution $z = X_k/t$,
$$\hat{F}_m(x) = \int_0^x \hat{f}_{2,m}(t)\,dt = \frac{1}{n} \sum_{k=1}^{n} \int_0^x \frac{X_k}{t^2}\, K_m\!\left(\frac{X_k}{t}\right) dt = \frac{1}{n} \sum_{k=1}^{n} \int_{X_k/x}^{\infty} K_m(z)\,dz = 1 - \frac{1}{n} \sum_{k=1}^{n} \int_0^{X_k/x} K_m(z)\,dz =: 1 - \frac{1}{n} \sum_{k=1}^{n} W_m\!\left(\frac{X_k}{x}\right), \qquad (2.5)$$
where $W_m$ is the distribution function of $K_m$.
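Representation (2.5) is also computationally convenient: $\hat{F}_m$ reduces to $n$ evaluations of the distribution function $W_m$, with no numerical integration of (2.4) required. Below is a minimal sketch (our code, assuming the Gamma convolution kernel of Section 3(ii), for which $W_m$ is the $G(am, am)$ distribution function).

```python
import numpy as np
from scipy.stats import gamma

def F_hat(x, sample, m, a=1.0):
    """Estimator (2.4) via (2.5): F_hat_m(x) = 1 - mean_k W_m(X_k / x)."""
    wm = gamma.cdf(sample / x, a=a * m, scale=1.0 / (a * m))   # W_m(X_k / x)
    return 1.0 - wm.mean()

rng = np.random.default_rng(2)
sample = rng.exponential(scale=1.0, size=100)
for x in (0.05, 0.5, 1.0):
    print(x, F_hat(x, sample, m=40), 1.0 - np.exp(-x))   # compare with true cdf
```

This is essentially how the simulation study of Section 3 evaluates (2.4), where the authors report using representation (2.5) with MATLAB built-in functions.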
To study the variance and the bias, we need the definition of a certain Hölder class of functions with positive support.

Definition 2.1. Let
$$\Sigma(\beta, C) := \left\{ f : (0, \infty) \to \mathbb{R} \,:\, f^{(l)} \text{ exists and is bounded for } l = \lfloor \beta \rfloor \text{ and } |f^{(l)}(x) - f^{(l)}(y)| \le C|x - y|^{\beta - l} \right\}$$
be the set of functions which are $\lfloor \beta \rfloor$-times differentiable with bounded derivatives and whose $\lfloor \beta \rfloor$th derivative is Hölder continuous with exponent $\beta - \lfloor \beta \rfloor \in (0, 1]$.

Moreover, we need the following definition; cf. Comte and Genon-Catalot (2012), Definition 3.1.

Definition 2.2. We say that $K$ is a kernel of order $l$ if:

(I) $K$ is a density on $\mathbb{R}_+$ satisfying
$$\int_0^\infty tK(t)\,dt = 1, \qquad \int_0^\infty (t-1)^2 K(t)\,dt =: \mu_2(K) < \infty, \qquad \int_0^\infty K^2(t)\,dt = \|K\|_2^2 < \infty.$$

(II) The convolution kernel $K_m$ satisfies
$$I_m := \int_0^\infty K_m(u)\,\frac{du}{u} = 1 + O(1/m)$$
for large $m$.

(III) Let
$$\nu_\gamma := \int_0^\infty |u - 1|^\gamma K(u)\,du.$$
There exists a $\gamma \ge 4$ such that $\nu_\gamma < \infty$.

(IV) The distribution $K(u)\,du$ admits moments up to order $l$.

We are now ready to state our main theorem, namely the rates of the asymptotic mean squared error.
Theorem 2.3. Suppose that $K$ is a kernel of order $l = \lfloor \beta \rfloor$ and that the unknown distribution function fulfills $F \in \Sigma(\beta, C)$ with $\beta \ge 2$. Then, as $m$ tends to infinity, it holds that
$$\mathrm{Var}(\hat{F}_m(x)) = n^{-1}(1 - F(x))F(x) - O\big((nm^{1/2})^{-1}\big) + o\big((nm^{1/2})^{-1}\big)$$
and additionally
$$\mathrm{Bias}(\hat{F}_m(x)) = E[\hat{F}_m(x)] - F(x) \le C(\beta, x, K)\, m^{-1},$$
where $C(\beta, x, K)$ denotes a generic, $m$-independent constant.

In order to prove these rates, we make use of Lemma A.2 of Comte and Genon-Catalot (2012), which is stated below for the sake of completeness.

Lemma 2.4. Let the density $K$ fulfill assumptions (I) and (III). Then it holds for all $\alpha \le \gamma$ that
$$\int_0^\infty |z - 1|^\alpha K_m(z)\,dz \le \frac{c(\alpha)}{m^{\alpha/2}}.$$
Proof of Theorem 2.3. We already stated that $\hat{F}_m(x) = 1 - \frac{1}{n}\sum_{i=1}^n W_m(X_i/x)$; note that $W_m$ is a distribution function for all $m$. Using the i.i.d. assumption on the sample $X_i$, $i = 1, \ldots, n$, this yields for the variance
$$\mathrm{Var}(\hat{F}_m(x)) = \mathrm{Var}\!\left(1 - \frac{1}{n}\sum_{i=1}^{n} W_m\!\left(\frac{X_i}{x}\right)\right) = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} W_m\!\left(\frac{X_i}{x}\right)\right) = \frac{1}{n}\left(E\!\left[W_m^2\!\left(\frac{X_1}{x}\right)\right] - E\!\left[W_m\!\left(\frac{X_1}{x}\right)\right]^2\right).$$

We start with the derivation of the first term, where we use a simple substitution and partial integration as follows:
$$E\!\left[W_m^2\!\left(\frac{X_1}{x}\right)\right] = \int_0^\infty W_m^2(y/x) f(y)\,dy = x\int_0^\infty W_m^2(z) f(xz)\,dz = \big[W_m^2(z) F(xz)\big]_{z=0}^{\infty} - 2\int_0^\infty W_m(z) K_m(z) F(xz)\,dz = 1 - 2\int_0^\infty W_m(z) K_m(z) F(x + x(z-1))\,dz.$$

In order to be able to use Lemma 2.4, we set $r := \min(\gamma, \lfloor \beta \rfloor)$. A Taylor expansion now yields
$$\begin{aligned}
1 - 2\int_0^\infty W_m(z) K_m(z) F(x + x(z-1))\,dz
&= 1 - 2\int_0^\infty W_m(z) K_m(z) \left( \sum_{l=0}^{r-1} \frac{(x(z-1))^l}{l!} F^{(l)}(x) + \frac{(x(z-1))^r}{r!} F^{(r)}(\xi_{x,z}) \right) dz \\
&= 1 - 2F(x)\int_0^\infty W_m(z) K_m(z)\,dz - 2\sum_{l=1}^{r-1} \frac{x^l}{l!} F^{(l)}(x) \int_0^\infty W_m(z) K_m(z)(z-1)^l\,dz \\
&\qquad - 2\,\frac{x^r}{r!} \int_0^\infty F^{(r)}(\xi_{x,z}) W_m(z) K_m(z)(z-1)^r\,dz \\
&= 1 - F(x)\big[W_m^2(z)\big]_{z=0}^{\infty} - 2\sum_{l=1}^{r-1} \frac{x^l}{l!} F^{(l)}(x) \int_0^\infty W_m(z) K_m(z)(z-1)^l\,dz \\
&\qquad - 2\,\frac{x^r}{r!} \int_0^\infty F^{(r)}(\xi_{x,z}) W_m(z) K_m(z)(z-1)^r\,dz \\
&=: 1 - F(x) - R_m(x).
\end{aligned}$$
Now we investigate the residual term $R_m(x)$. Under the smoothness assumption on $F$, Lemma 2.4 yields
$$|R_m(x)| \le 2\sum_{l=1}^{r-1} \frac{x^l}{l!} |F^{(l)}(x)| \int_0^\infty W_m(z) K_m(z) |z-1|^l\,dz + 2\,\frac{x^r}{r!} \int_0^\infty |F^{(r)}(\xi_{x,z})| W_m(z) K_m(z) |z-1|^r\,dz \lesssim 2C_1 \sum_{l=1}^{r-1} \frac{x^l\, c(l)}{l!\, m^{l/2}} + 2C_2\, \frac{x^r\, c(r)}{r!\, m^{r/2}} \lesssim m^{-1/2},$$
where
$$C_1 := \max_{l=1,\ldots,r-1}\ \sup_{x \in \mathbb{R}_+} |F^{(l)}(x)|, \qquad C_2 := \sup_{x \in \mathbb{R}_+} |F^{(r)}(x)|.$$
We now proceed with the derivation of the expectation of our estimator:
$$\begin{aligned}
E[\hat{F}_m(x)] &= 1 - E\!\left[W_m\!\left(\frac{X_1}{x}\right)\right] = 1 - \int_0^\infty W_m(y/x) f(y)\,dy = 1 - x\int_0^\infty W_m(z) f(xz)\,dz \\
&= 1 - \left( \big[W_m(z) F(xz)\big]_{z=0}^{\infty} - \int_0^\infty K_m(z) F(xz)\,dz \right) = \int_0^\infty K_m(z) F(x + x(z-1))\,dz \\
&= \int_0^\infty K_m(z) \left( \sum_{l=0}^{r-1} \frac{(x(z-1))^l}{l!} F^{(l)}(x) + \frac{(x(z-1))^r}{r!} F^{(r)}(\xi_{x,z}) \right) dz \\
&= F(x)\int_0^\infty K_m(z)\,dz + xF'(x)\int_0^\infty K_m(z)(z-1)\,dz + \frac{x^2 F''(x)}{2}\int_0^\infty K_m(z)(z-1)^2\,dz \\
&\qquad + \sum_{l=3}^{r-1} \frac{x^l}{l!} F^{(l)}(x) \int_0^\infty K_m(z)(z-1)^l\,dz + \frac{x^r}{r!} \int_0^\infty K_m(z)(z-1)^r F^{(r)}(\xi_{x,z})\,dz \\
&= F(x) + \frac{x^2 F''(x)}{2}\int_0^\infty K_m(z)(z-1)^2\,dz + \sum_{l=3}^{r-1} \frac{x^l}{l!} F^{(l)}(x) \int_0^\infty K_m(z)(z-1)^l\,dz + \frac{x^r}{r!} \int_0^\infty K_m(z)(z-1)^r F^{(r)}(\xi_{x,z})\,dz,
\end{aligned}$$
where we used that $\int_0^\infty K_m(z)\,dz = 1$ and that $\int_0^\infty K_m(z)(z-1)\,dz = E[\bar{U}_m] - 1 = 0$.
Making again use of Lemma 2.4, we see that the remainder terms are negligible compared to the first one:
$$\left| \sum_{l=3}^{r-1} \frac{x^l}{l!} F^{(l)}(x) \int_0^\infty K_m(z)(z-1)^l\,dz + \frac{x^r}{r!} \int_0^\infty K_m(z)(z-1)^r F^{(r)}(\xi_{x,z})\,dz \right| \lesssim m^{-3/2}.$$
Hence, the bias has the following form:
$$\mathrm{Bias}(\hat{F}_m(x)) = E[\hat{F}_m(x)] - F(x) = \frac{x^2 F''(x)}{2}\int_0^\infty K_m(z)(z-1)^2\,dz \le \frac{x^2 F''(x)\, c(2)}{2m} \lesssim m^{-1}.$$
In order to match the form of the variance, we conclude that
$$E\!\left[W_m\!\left(\frac{X_1}{x}\right)\right] = 1 - F(x) + O(m^{-1}).$$
This fact, together with the derivation of the first term of the variance, leads to
$$\begin{aligned}
\mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} W_m\!\left(\frac{X_i}{x}\right)\right)
&= n^{-1}\left( E\!\left[W_m^2\!\left(\frac{X_1}{x}\right)\right] - E\!\left[W_m\!\left(\frac{X_1}{x}\right)\right]^2 \right) \\
&= n^{-1}\left( 1 - F(x) - (1 - F(x))^2 - O(m^{-1/2}) - O(m^{-1}) + o(m^{-1}) \right) \\
&= n^{-1} F(x)(1 - F(x)) - O\big((nm^{1/2})^{-1}\big) + o\big((nm^{1/2})^{-1}\big). \qquad \square
\end{aligned}$$

Remark 2.5. We see that the rate of the bias term cannot be improved even when $F$ gets smoother. As in Comte and Genon-Catalot (2012), convex combinations of different smoothing kernels $K_m$ could provide a remedy in the context of distribution function estimation, too.
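As a quick numerical companion to Theorem 2.3 (our addition, not part of the paper), the following sketch checks the $O(m^{-1})$ bias rate for $F = \mathrm{Exp}(1)$ and the Gamma convolution kernel of Section 3(ii) with $a = 1$; the product $m \cdot \mathrm{Bias}(\hat{F}_m(1))$ should stabilize as $m$ grows.

```python
import numpy as np
from scipy.stats import gamma
from scipy.integrate import quad

def bias(x, m):
    """Bias of F_hat_m(x) for F = Exp(1): E[F_hat_m(x)] = 1 - E[W_m(X/x)],
    with W_m the G(m, m) cdf, computed here by numerical integration."""
    ew, _ = quad(lambda y: gamma.cdf(y / x, a=m, scale=1.0 / m) * np.exp(-y),
                 0.0, np.inf)
    return (1.0 - ew) - (1.0 - np.exp(-x))

for m in (10, 20, 40, 80):
    print(m, bias(1.0, m), m * bias(1.0, m))   # m * bias should stabilize
```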
Remark 2.6. We see that the rate of the bias term holds uniformly for all $x \in \mathbb{R}_+$. Hence, our estimator overcomes the boundary effect stated in the introduction.

Moreover, the following corollary can easily be deduced from the theorem above and presents the condition under which $\hat{F}_m(x)$ is $\sqrt{n}$-consistent for $F(x)$.

Corollary 2.7. Under the assumptions of Theorem 2.3, the pointwise mean squared error of the proposed estimator has the following form:
$$\mathrm{MSE}(\hat{F}_m(x)) = \mathrm{Bias}(\hat{F}_m(x))^2 + \mathrm{Var}(\hat{F}_m(x)) = O\big(m^{-2} + n^{-1} - (nm^{1/2})^{-1}\big)$$
as $m, n \to \infty$. Moreover, $\hat{F}_m(x)$ reaches the parametric rate if $m = O(n^\alpha)$ for $\alpha \ge 1/2$.

In order to compare these rates with the performance of the usual empirical distribution function $\hat{F}_n(x)$, we briefly recall that $\mathrm{MSE}(\hat{F}_n(x)) = O(n^{-1})$ as $n \to \infty$. Hence, from an asymptotic point of view, both rates coincide to first order. However, optimizing the MSE of the convolution power kernel estimator leads to an improvement, as the following corollary shows; note that for $m \asymp n^{2/3}$ both $m^{-2}$ and $(nm^{1/2})^{-1}$ are of order $n^{-4/3}$.

Corollary 2.8. Let $m_{\mathrm{opt}}$ fulfill $m_{\mathrm{opt}} = O(n^{2/3})$. Then the MSE exhibits the order
$$\mathrm{MSE}(\hat{F}_{m_{\mathrm{opt}}}(x)) = O(n^{-1} - n^{-4/3})$$
as $n \to \infty$.

As already mentioned in the introduction, distribution function estimation has been a topic for many researchers. These rates have also been derived by Read (1972) for a distribution function estimator based on linear interpolation, by Azzalini (1981) for a usual kernel based estimator, and by Leblanc (2012) in the context of Bernstein polynomial based nonparametric estimators.

3. Simulation study

In this section we present a Monte Carlo simulation study that investigates the finite sample behavior of our newly introduced convolution power estimator. For this purpose, exponential distributions with rates $\lambda = 0.5, 1, 1.5$ and Weibull distributions with scale parameter $s = 1$ and shape parameters $k = 0.5, 1, 1.5, 5$ are estimated using the empirical distribution function (1.1), the regular kernel estimator (1.2) with standard Gaussian kernel and different bandwidth parameters, and the convolution estimator (2.4) with Gamma ($a = 1$), inverse Gaussian ($a = 1$) and uniform convolution kernels, which are defined as follows (a code sketch of all three kernels follows this list):

(i) Let $U_i \sim U[0, 2]$. Then
$$K_m(x) = \frac{m}{(m-1)!\, 2^m} \sum_{i=0}^{\lfloor mx/2 \rfloor} (-1)^i \binom{m}{i} (mx - 2i)^{m-1}\, \mathbf{1}_{[0,2]}(x).$$
Observe that $K$ is, by construction, not continuous; as $m$ increases, the regularity of the corresponding $K_m$ increases as well.

(ii) Let $U_i \sim G(a, a)$, $a > 0$, where $G(a, b)$ denotes the Gamma distribution with parameters $(a, b)$. Due to the well-known convolution properties of the Gamma distribution, one can derive
$$K_m(x) = \frac{(am)^{am}}{\Gamma(am)}\, x^{am-1} e^{-amx}\, \mathbf{1}_{(0,\infty)}(x).$$
Fig. 1. Boundary behavior.
(iii) Let $U_i \sim IG(a, a)$, $a > 0$, where $IG(a, b)$ denotes the inverse Gaussian distribution with parameters $(a, b)$. Using some basic calculations, one can show that in this case
$$K_m(x) = \frac{a\sqrt{m}}{\sqrt{2\pi x^3}}\, \exp\!\left( ma^2 \left( 1 - \frac{1}{2}\left( \frac{1}{x} + x \right) \right) \right) \mathbf{1}_{(0,\infty)}(x).$$
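The following sketch (our code, not the authors' MATLAB implementation) evaluates the three kernels (i)-(iii); the scipy parametrizations are our own mapping of the formulas above.

```python
import numpy as np
from math import comb, factorial
from scipy.stats import gamma, invgauss

def km_uniform(x, m):
    """(i) U_i ~ U[0, 2]: piecewise-polynomial (Irwin-Hall type) density on [0, 2]."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x >= 0) & (x <= 2)
    for j in np.flatnonzero(inside):
        s = sum((-1) ** i * comb(m, i) * (m * x[j] - 2 * i) ** (m - 1)
                for i in range(int(m * x[j] // 2) + 1))
        out[j] = m / (factorial(m - 1) * 2 ** m) * s
    return out

def km_gamma(x, m, a=1.0):
    """(ii) U_i ~ G(a, a): K_m is the G(am, am) density."""
    return gamma.pdf(x, a=a * m, scale=1.0 / (a * m))

def km_invgauss(x, m, a=1.0):
    """(iii) U_i ~ IG(a, a): K_m is inverse Gaussian with mean 1 and shape m*a^2.
    scipy's invgauss(mu) has mean mu and shape 1; scaling by m*a^2 with
    mu = 1/(m*a^2) yields mean 1 and shape m*a^2."""
    return invgauss.pdf(x, mu=1.0 / (m * a ** 2), scale=m * a ** 2)

x = np.linspace(0.01, 2.0, 5)
print(km_uniform(x, 10), km_gamma(x, 40), km_invgauss(x, 40))
```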
In all considered cases, average MSE values are calculated. Special focus is dedicated to the zero boundary due to the expected boundary correction behavior of our convolution kernel estimator. Based on a realization of $n = 100$ i.i.d. Weibull random variables with scale parameter $s = 1$ and shape parameter $k = 0.5$ resp. $k = 1$, the top two plots in Fig. 1 depict the result of the regular kernel estimator with bandwidth $h = 0.2 \approx 100^{-1/3}$ and the Gamma convolution kernel estimator with $m = 40 \approx 2n^{2/3}$ on the interval $(0, 0.1]$. The bottom two plots present approximations of the MSE based on 10 000 Monte Carlo samples. Observe the larger boundary errors of the regular kernel estimator in both parameter settings in comparison to the smaller boundary errors of our convolution kernel method.

Let
$$\mathrm{RMSE}_{\mathrm{average}} := \frac{\int_0^{10^{-2}} \mathrm{MSE}(x)\,dx}{\int_0^{10^{-2}} \mathrm{MSE}_{\mathrm{emp}}(x)\,dx} \approx \frac{\sum_{k=1}^{10^2} \mathrm{MSE}(k \cdot 10^{-4})}{\sum_{k=1}^{10^2} \mathrm{MSE}_{\mathrm{emp}}(k \cdot 10^{-4})}$$
denote the relative (R) average MSE based on $n = 100$ i.i.d. samples around the boundary, i.e. on $(0, 0.01]$. To be more precise, $\mathrm{MSE}(x)$ denotes the MSE of the regular kernel estimator (1.2) resp. the convolution kernel estimator (2.4), and $\mathrm{MSE}_{\mathrm{emp}}(x)$ denotes the MSE of the empirical distribution function (1.1) at some point $x > 0$. Table 1 presents such relative average boundary errors for all mentioned exponential and Weibull distributions using the kernel estimator (1.2) with bandwidths $h = 0.01, 0.05, 0.1, 0.2\ (\approx n^{-1/3}), 0.3, 0.5$ and the convolution kernel estimator (2.4) with Gamma ($m = 40$), inverse Gaussian ($m = 40$) and uniform ($m = 10$) convolution kernels. All values are approximated via Monte Carlo simulations based on 10 000 samples. The integral in (2.4) is evaluated using the representation (2.5) together with suitable MATLAB built-in functions. The boundary errors of the regular kernel estimators increase as $h$ increases; this occurs, as expected, since a larger bandwidth has more difficulties handling the vanishing probability mass at the origin. Observe that the boundary errors of the convolution kernel estimators are in all cases, except for the $h = 0.01$ case, significantly smaller than the boundary errors of the regular kernel estimators.
Table 1
Relative boundary average MSE values. The MSE_emp row reports absolute values; all other entries are relative to MSE_emp.

                      Exp(0.5)   Exp(1)     Exp(1.5)   Wbl(1,0.5)  Wbl(1,1)   Wbl(1,1.5)  Wbl(1,5)
MSE_emp               2.5e-5     4.8e-5     7.4e-5     6.0e-4      4.9e-5     4.2e-6      9.5e-22
Kernel
  h = 0.01            0.6837     0.7162     0.7827     0.6641      0.7153     0.8070      1.0000
  h = 0.05            4.3932     7.2590     9.6546     2.1629      7.0632     9.3777      6.8e11
  h = 0.1             15.441     27.518     36.428     5.8189      26.877     52.412      2.7e13
  h = 0.2             56.317     99.419     126.35     14.282      97.177     328.16      1.8e15
  h = 0.3             118.53     201.19     244.83     22.702      196.62     938.02      5.5e16
  h = 0.5             292.09     454.67     511.45     38.216      444.21     3169.0      2.3e18
Conv-Kernel
  Gamma               0.9090     0.9086     0.9096     0.9512      0.9094     0.8809      1.0000
  Inverse-Gaussian    0.9095     0.9091     0.9100     0.9516      0.9098     0.8811      1.0000
  Uniform             0.8947     0.8941     0.8955     0.9422      0.8950     0.8644      1.0000

Table 2
Relative average MSE values. The MSE_emp row reports absolute values; all other entries are relative to MSE_emp.

                      Exp(0.5)   Exp(1)     Exp(1.5)   Wbl(1,0.5)  Wbl(1,1)   Wbl(1,1.5)  Wbl(1,5)
MSE_emp               9.7e-4     4.9e-4     3.4e-4     1.1e-3      4.9e-4     3.3e-4      1.2e-4
Kernel
  h = 0.01            0.9943     0.9897     0.9841     0.9963      0.9888     0.9828      0.9546
  h = 0.05            0.9719     0.9471     0.9228     0.9819      0.9458     0.9195      0.8128
  h = 0.1             0.9448     0.8982     0.8588     0.9731      0.8967     0.8535      0.9046
  h = 0.2             0.8976     0.8401     0.8386     0.9765      0.8382     0.8197      3.3862
  h = 0.3             0.8649     0.8648     1.0352     1.0071      0.8632     1.0111      9.9861
  h = 0.5             0.8618     1.2394     2.2036     1.1494      1.2412     2.3475      32.6212
Conv-Kernel
  Gamma               0.8749     0.8787     0.8788     0.9203      0.8767     0.8654      2.1915
  Inverse-Gaussian    0.8762     0.8799     0.8800     0.9212      0.8779     0.8658      2.1540
  Uniform             0.8743     0.8802     0.8799     0.9107      0.8782     0.8964      3.2652
However, due to insufficient smoothing, the $h = 0.01$ estimator has the drawback that it produces larger errors when considered away from the origin; cf. Table 2. It is remarkable that, excluding the Wbl(1, 5) distribution, all boundary errors of our convolution kernel estimator are even smaller than the boundary error of the empirical distribution function, which does not suffer from symmetric kernel smoothing. The 1.0000 values in the last column of Table 1 are due to $\mathrm{Wbl}(1,5)\big((0, 0.01]\big) \approx 10^{-10} \ll 1$. As a result, every Weibull realization of our Monte Carlo simulation is greater than 0.01, and the empirical distribution function, all convolution kernel estimators and the regular kernel estimator with $h = 0.01$ yield zero on $(0, 0.01]$. This explains the first value of the last column as well, since
$$9.5\mathrm{e}{-22} \approx \int_0^{10^{-2}} |F(x) - 0|^2\,dx \approx 0,$$
where $F$ denotes the cdf of the Wbl(1, 5) distribution.

Finally, let
$$\mathrm{RMSE}_{\mathrm{average}} := \frac{\frac{1}{10 - 10^{-1}} \int_{10^{-1}}^{10} \mathrm{MSE}(x)\,dx}{\frac{1}{10 - 10^{-1}} \int_{10^{-1}}^{10} \mathrm{MSE}_{\mathrm{emp}}(x)\,dx} \approx \frac{\sum_{k=1}^{10^2} \mathrm{MSE}(k \cdot 10^{-1})}{\sum_{k=1}^{10^2} \mathrm{MSE}_{\mathrm{emp}}(k \cdot 10^{-1})}$$
denote the relative (R) average MSE on the interval $[0.1, 10]$. Observe that all seven distributions in this simulation study have over 95% of their mass on the interval $[0, 10]$. Table 2 states the $\mathrm{RMSE}_{\mathrm{average}}$ values of the exponential and Weibull distributions in our parameter settings; the values are calculated using 10 000 Monte Carlo samples. Bounded away from the origin, the convolution kernel estimator behaves on average similarly to the regular kernel estimator with optimal bandwidth $h = 0.2$. Further, in all cases the choice of the convolution kernel does not significantly influence the RMSE. We have performed further simulations using the three convolution kernels with different $m \in \{5, 7, 9\} \cup \{10k : k = 1, \ldots, 10\}$; values of $m$ close to 40 in the inverse Gaussian and Gamma cases and values of $m$ close to 10 in the uniform case yield optimal results in our settings.

Summarizing, our numerical results clearly demonstrate that the convolution kernel estimator removes the boundary errors of the regular kernel estimator. Moreover, bounded away from the origin, the MSE of the convolution kernel estimator behaves on average similarly to the MSE of the regular kernel estimator with optimal bandwidth. In addition, both at the boundary and bounded away from the origin, the average MSE of the convolution power estimator is lower than the average MSE of the empirical distribution function.
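For reference, here is a compressed sketch (our reconstruction under stated assumptions, not the authors' MATLAB code) of the Monte Carlo approximation of the relative boundary average MSE on $(0, 0.01]$ for an Exp(1) target with the Gamma convolution kernel; we use 1 000 instead of 10 000 replications to keep the run short.

```python
import numpy as np
from scipy.stats import gamma

def F_hat(x, sample, m):
    """Estimator (2.4) evaluated via representation (2.5), Gamma kernel, a = 1."""
    return 1.0 - gamma.cdf(sample / x, a=m, scale=1.0 / m).mean()

rng = np.random.default_rng(3)
n, m, reps = 100, 40, 1000            # the paper uses 10 000 Monte Carlo samples
grid = np.arange(1, 101) * 1e-4       # k * 10^{-4}, k = 1, ..., 100, i.e. (0, 0.01]
F_true = 1.0 - np.exp(-grid)          # Exp(1) cdf

se_conv = np.zeros_like(grid)
se_emp = np.zeros_like(grid)
for _ in range(reps):
    s = rng.exponential(size=n)
    se_conv += (np.array([F_hat(x, s, m) for x in grid]) - F_true) ** 2
    se_emp += (np.mean(s[None, :] <= grid[:, None], axis=1) - F_true) ** 2

# ratio of summed squared errors approximates the relative boundary average MSE
print("relative boundary average MSE (Gamma kernel):", se_conv.sum() / se_emp.sum())
```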
4. Conclusion

In this paper, a new nonparametric method for estimating unknown continuous distribution functions is presented. Motivated by the convolution kernel method for density estimation recently proposed by Comte and Genon-Catalot (2012), we extend their idea to the case of estimating distribution functions. We showed that, when best implemented, our estimator reaches the parametric rate; thus, the empirical and the convolution kernel estimators are equivalent in terms of the MSE to first order. Moreover, when considering the second-order convergence rate, it turns out that our estimator asymptotically outperforms the empirical one if $m$ is chosen carefully according to Corollary 2.8. Furthermore, compared to a symmetric kernel based distribution function estimator (cf. Jones, 1990), our estimator is free of boundary effects.

It would be desirable to investigate more sophisticated methods for selecting the smoothing parameter $m$, for example cross-validation-based methods. Moreover, the proposed estimator could be used in the context of nonparametric estimation of the receiver operating characteristic (ROC) curve, which analyzes the accuracy of a binary decision test. Lloyd and Yong (1999) as well as Jokiel-Rokita and Pulit (2013) showed that smooth distribution function estimators perform better than the empirical counterpart; hence, it would be interesting to investigate the performance of the convolution power kernel based estimator as a viable alternative.

References

Altman, N., Léger, C., 1995. Bandwidth selection for kernel distribution function estimation. J. Statist. Plann. Inference 46, 195-214.
Azzalini, A., 1981. A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika 68, 326-328.
Bowman, A., Hall, P., Prvan, T., 1998. Bandwidth selection for the smoothing of distribution functions. Biometrika 85, 799-808.
Comte, F., Genon-Catalot, V., 2012. Convolution power kernels for density estimation. J. Statist. Plann. Inference 142 (7), 1698-1715.
Giné, E., Nickl, R., 2009. An exponential inequality for the distribution function of the kernel density estimator with applications to adaptive estimation. Probab. Theory Related Fields 143, 549-596.
Jokiel-Rokita, A., Pulit, M., 2013. Nonparametric estimation of the ROC curve based on smoothed empirical distribution functions. Stat. Comput. 23 (6), 703-712.
Jones, M.C., 1990. The performance of kernel density functions in kernel distribution function estimation. Statist. Probab. Lett. 9, 129-132.
Koláček, J., 2008. An improved estimator for removing boundary bias in kernel cumulative distribution function estimation. In: Proceedings in Computational Statistics, COMPSTAT'08. Springer, New York, pp. 549-556.
Leblanc, A., 2012. On estimating distribution functions using Bernstein polynomials. Ann. Inst. Statist. Math. 64, 919-943.
Liu, R., Yang, L., 2008. Kernel estimation of multivariate cumulative distribution function. J. Nonparametr. Stat. 20, 661-677.
Lloyd, C.J., Yong, Z., 1999. Kernel estimators of the ROC curve are better than empirical. Statist. Probab. Lett. 44 (3), 221-228.
Nadaraya, E.A., 1964. Some new estimates for distribution functions. Theory Probab. Appl. 9, 497-500.
Read, R.R., 1972. Asymptotic inadmissibility of the sample distribution function. Ann. Math. Statist. 43, 89-95.
Swanepoel, J.W.H., 1988. Mean integrated squared error properties and optimal kernels when estimating a distribution function. Comm. Statist. Theory Methods 17, 3785-3799.
Tenreiro, C., 2013. Boundary kernels for distribution function estimation. REVSTAT Statist. J. 17 (2), 169-190.