Reducing the mean squared error in kernel density estimation

Jinmi Kim, Choongrak Kim ∗

Department of Statistics, Pusan National University, Pusan, 609-735, Republic of Korea

Article history: Received 20 August 2012; accepted 25 December 2012; available online xxxx.

AMS 2000 subject classifications: primary 62G07; secondary 62G20.

Abstract

In this article, we propose a version of a kernel density estimator which reduces the mean squared error of the existing kernel density estimator by combining bias reduction and variance reduction techniques. Its theoretical properties are investigated, and a Monte Carlo simulation study supporting theoretical results on the proposed estimator is given.

Keywords: Bias reduction; Higher-order kernel; Skewing method; Variance reduction

1. Introduction

Kernel density estimation is the most widely used nonparametric method in the univariate case. Good references in this area are Silverman (1986) and Wand and Jones (1995), among others. One typical feature of the kernel density estimator is that it underestimates at peaks and overestimates at troughs, and accordingly entails large bias in those regions. In an effort to overcome this problem, there have been many proposals for reducing the bias. Among them, the simplest approach, with a long history, is using a higher-order kernel (Bartlett, 1963; Muller, 1984; Parzen, 1962; Schucany & Sommers, 1977). Other proposals include the local kernel density estimator (Samiuddin & El-sayyad, 1990), the variable kernel density estimator called the adaptive kernel estimator (Abramson, 1982; Breiman, Meisel, & Purcell, 1977; George & David, 1992), the transformation kernel density estimator (Ruppert & Cline, 1994), and the multiplicative method of bias correction (Jones, Linton, & Nielsen, 1995). For an excellent review on these subjects, see Jones and Signorini (1997). As a semiparametric method, Cheng, Choi, Fan, and Hall (2000) suggested a locally parametric density estimator using both parametric and nonparametric techniques.

On the other hand, Kim, Kim, and Park (2003) suggested a skewing method to reduce the order of bias from the second power of the bandwidth to the fourth power at the expense of a slight increase in variance by a constant factor. In fact, their method is motivated by a convex combination method reducing the bias in local linear regression (Fan & Gijbels, 1996) by Choi and Hall (1998), and the skewing method can be regarded as a version of the generalized jackknifing approach of Schucany and Sommers (1977) and Jones and Foster (1993). Recently, Cheng, Peng, and Wu (2007) suggested a technique for reducing the variance in local linear regression using the idea of the skewing method. They showed that the variance reducing technique reduces the constant factor of the leading term of the asymptotic variance while the order of the leading term of the bias remains the same. Therefore, the variance reducing technique of Cheng et al. (2007) can be regarded as an analogue of the variance reduction version of the skewing



∗ Corresponding author. Tel.: +82 10 5043 2284. E-mail address: [email protected] (C. Kim).





method in local linear regression, while the bias reducing technique of Choi and Hall (1998) is a bias reduction version of the skewing method. A natural and intuitive application of the variance reduction technique in local linear regression is obtaining an analogue version in kernel density estimation.

In this paper, we propose a version of a kernel density estimator reducing the mean squared error by combining the ideas of both bias reduction and variance reduction techniques. We show, theoretically and numerically, that the proposed estimator has smaller bias and smaller variance than the kernel density estimator. In Section 2, the kernel density estimator with asymptotic properties is introduced, and the results on the bias reduction estimator of Kim et al. (2003) and its asymptotic properties are given. Also, we derive several versions of a variance reduction estimator using the idea of Cheng et al. (2007). In Section 3, we propose a mean squared error reduction estimator by combining the ideas of both bias reduction and variance reduction techniques. Extensive numerical studies on the proposed estimators are given in Section 4. Concluding remarks are given in Section 5.

2. Bias and variance reduction

Let X1, . . . , Xn be a random sample from a distribution with an unknown density f which we wish to estimate. The classical kernel density estimator of f with kernel K is defined by

\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h} \right),    (1)

where h is a positive smoothing parameter, called the bandwidth or the window width. Usually, the kernel K is chosen to be a unimodal probability density function that is symmetric about zero, i.e.,

\int K(u)\,du = 1, \qquad K(u) = K(-u).

Provided that f is twice continuously differentiable, the asymptotic bias and variance of fˆ(x) are, as n → ∞, h → 0 and nh → ∞, given by

E[\hat f(x)] - f(x) = \frac{1}{2} h^2 f''(x)\mu_2 + o(h^2)    (2)

and

Var[\hat f(x)] = \frac{1}{nh} f(x) \int K^2(t)\,dt + o\!\left( \frac{1}{nh} \right),    (3)

respectively, where µ2 = ∫ t^2 K(t) dt.
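As a concrete illustration of (1) — a sketch added here, not code from the paper — the estimator can be evaluated on a grid with a few lines of Python. The Gaussian kernel, the simulated sample, the bandwidth value, and the function names are assumptions made only for this example.

import numpy as np

def gauss_kernel(u):
    # Standard Gaussian kernel: symmetric, integrates to one, mu_2 = 1.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, data, h, kernel=gauss_kernel):
    # Classical kernel density estimator (1): average of rescaled kernels centred at the data.
    u = (x_grid[:, None] - data[None, :]) / h
    return kernel(u).mean(axis=1) / h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=300)            # illustrative N(0, 1) sample
    grid = np.linspace(-3.0, 3.0, 121)
    fhat = kde(grid, data, h=0.35)         # bandwidth chosen only for illustration
    print(fhat[60])                        # estimate near x = 0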

2.1. Bias reduction estimator

The classical kernel density estimator fˆ(x) underestimates at peaks and overestimates at troughs, and accordingly entails large bias in those regions. To reduce the bias of fˆ(x), many methods have been suggested so far. Here, we introduce a skewing method, suggested by Kim et al. (2003), which is defined as

\hat f_B(x) = \frac{\lambda_1 \hat f_1(x) + \hat f(x) + \lambda_2 \hat f_2(x)}{\lambda_1 + 1 + \lambda_2},

where λ1, λ2 > 0 are weights, l1 < 0, l2 > 0 are constants to be determined, fˆj(x) = fˆ(x + lj h) − lj h fˆ′(x + lj h) for j = 1, 2, and fˆ′(x) is the first derivative of fˆ(x); i.e., fˆB(x) is a convex combination of fˆ1(x), fˆ(x), and fˆ2(x). They showed that, by choosing λ1 = λ2 ≡ λ and −l1 = l2 ≡ l(λ) with

l(\lambda) = \left\{ (1 + 2\lambda)\mu_2 / (2\lambda) \right\}^{1/2} \equiv l,

where µk = ∫ t^k K(t) dt, the bias of fˆB(x) is O(h^4), while that of fˆ(x) is O(h^2). With these choices of constants, the estimator fˆB(x) can be written as

\hat f_B(x) = \frac{1}{nh} \sum_{i=1}^{n} K_B\!\left( \frac{x - X_i}{h} \right),    (4)

where

K_B(x) = (2\lambda + 1)^{-1}\left[ K(x) + \lambda\left\{ K(x + l) + K(x - l) - l\left( K'(x + l) - K'(x - l) \right) \right\} \right].    (5)

Fig. 1. Three equally spaced points αx,j, j = 0, 1, 2. (a) r ∈ (0, 1) and (b) r ∈ (−1, 0).

Note that KB(x) in (5) is actually a fourth-order kernel. As an asymptotic result for fˆB(x), Kim et al. (2003) showed that

E[\hat f_B(x)] - f(x) = \frac{1}{24} h^4 f^{(4)}(x) \left\{ \mu_4 - \frac{3(1 + 6\lambda)}{2\lambda}\,\mu_2^2 \right\} + o(h^4)

and

Var[\hat f_B(x)] = \frac{1}{nh} f(x) V(\lambda) + o\!\left( \frac{1}{nh} \right),

respectively, where V(λ) = ∫ K_B^2(x) dx.
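The bias reducing kernel KB in (5) is easy to code once a kernel is fixed. The sketch below is ours, not the authors'; it assumes the standard Gaussian kernel, for which µ2 = 1 so that l(λ) = {(1 + 2λ)/(2λ)}^{1/2}, and it checks numerically that KB integrates to one and has a vanishing second moment, as a fourth-order kernel should.

import numpy as np

def K(u):
    # Standard Gaussian kernel (mu_2 = 1).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K_prime(u):
    # Derivative of the Gaussian kernel.
    return -u * K(u)

def K_B(u, lam):
    # Bias reducing (fourth-order) kernel of Eq. (5), with l = l(lambda) for mu_2 = 1.
    l = np.sqrt((1.0 + 2.0 * lam) / (2.0 * lam))
    return (K(u) + lam * (K(u + l) + K(u - l)
                          - l * (K_prime(u + l) - K_prime(u - l)))) / (2.0 * lam + 1.0)

if __name__ == "__main__":
    u = np.linspace(-8.0, 8.0, 4001)
    du = u[1] - u[0]
    w = K_B(u, lam=0.1)                    # lambda chosen only for illustration
    print(np.sum(w) * du)                  # close to 1
    print(np.sum(u**2 * w) * du)           # close to 0: the second moment vanishes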

2.2. Variance reduction estimator

Recently, Cheng et al. (2007) suggested a technique for reducing the variance in local linear regression using the idea of the skewing method. They showed that the variance reduction technique reduces the constant factor of the leading term of the asymptotic variance while the order of the leading term of the bias remains the same. Therefore, the variance reduction technique of Cheng et al. (2007) can be regarded as an analogue of the variance reduction version of the skewing method in local linear regression, while the bias reduction technique of Choi and Hall (1998) is a bias reduction version of the skewing method. A natural and intuitive application of the variance reduction technique in local linear regression is obtaining an analogue version in kernel density estimation, and this is quite straightforward, as mentioned by Cheng et al. (2007). In this section, we apply the variance reduction technique in local linear regression to the kernel density estimation problem, and we use the same notation and methods as in Cheng et al. (2007).

For any x, define three equally spaced points αx,j = x − (r + 1 − j)δh, j = 0, 1, 2, such that

\alpha_{x,0} = \alpha_{x,1} - \delta h, \qquad \alpha_{x,1} = x - r\delta h, \qquad \alpha_{x,2} = \alpha_{x,1} + \delta h,

where δ > 0 is a binwidth and h is a bandwidth. Note that x = αx,1 + r δ h ∈ [αx,0 , αx,2 ] for −1 ≤ r ≤ 1 and δ h = αx,1 − αx,0 = αx,2 − αx,1 , as depicted in Fig. 1. Construct a linear combination of fˆ (αx,0 ), fˆ (αx,1 ) and fˆ (αx,2 ), and then the variance reduction estimator of f at x is defined by



\hat f_V(x) = \sum_{j=0,1,2} A_j(r)\,\hat f(\alpha_{x,j}),

where

A_0(r) = \frac{1}{2} r(r - 1), \qquad A_1(r) = 1 - r^2, \qquad A_2(r) = \frac{1}{2} r(r + 1)

and

\hat f(\alpha_{x,j}) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{\alpha_{x,j} - X_i}{h} \right) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h} - (r + 1 - j)\delta \right).

In fact, the estimator fˆV(x) can be written as

\hat f_V(x) = \frac{1}{nh} \sum_{i=1}^{n} K_V\!\left( \frac{x - X_i}{h} \right),    (6)




where

K_V(x) = \sum_{j=0,1,2} A_j(r)\, K(x - (r + 1 - j)\delta).    (7)

Then it can be shown that fˆV(x) has the same asymptotic bias as fˆ(x) and a smaller asymptotic variance than fˆ(x). First, we compute the mean and variance of fˆV(x) in Theorem 2.1 (the proof is quite straightforward).

Theorem 2.1. Assume that f has two bounded and continuous derivatives in a neighbourhood of x, that the kernel K is a nonnegative, bounded, and symmetric density function, and that h → 0 and nh → ∞ as n → ∞. Then, the asymptotic bias and variance of fˆV(x) are

E[\hat f_V(x)] - f(x) = \frac{1}{2} h^2 f''(x)\mu_2 + o(h^2)

and

Var[\hat f_V(x)] = \frac{1}{nh} f(x) V(\delta) + o\!\left( \frac{1}{nh} \right),

respectively, where

V(\delta) = \int K_V^2(t)\,dt = \int K^2(t)\,dt - r^2(1 - r^2)\, C(\delta).    (8)

Here, C(δ) = 1.5 C(0, δ) − 2 C(0.5, δ) + 0.5 C(1, δ), with

C(a, \delta) = \int K(t - a\delta) K(t + a\delta)\,dt.
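The identity (8) can be verified numerically. The following sketch (ours; it assumes a Gaussian kernel and arbitrary illustrative values of r and δ) builds KV from (7), computes ∫KV²(t)dt by a Riemann sum, and compares it with ∫K²(t)dt − r²(1 − r²)C(δ).

import numpy as np

def K(u):
    # Standard Gaussian kernel.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def A(j, r):
    # Weights A_0(r), A_1(r), A_2(r) of the variance reduction estimator.
    return (0.5 * r * (r - 1.0), 1.0 - r**2, 0.5 * r * (r + 1.0))[j]

def K_V(u, r, delta):
    # Variance reducing kernel of Eq. (7).
    return sum(A(j, r) * K(u - (r + 1.0 - j) * delta) for j in (0, 1, 2))

def C(a, delta, grid, du):
    # C(a, delta) = int K(t - a*delta) K(t + a*delta) dt, by Riemann sum.
    return np.sum(K(grid - a * delta) * K(grid + a * delta)) * du

if __name__ == "__main__":
    r, delta = 1.0 / np.sqrt(2.0), 1.5                   # illustrative values
    grid = np.linspace(-12.0, 12.0, 8001)
    du = grid[1] - grid[0]
    lhs = np.sum(K_V(grid, r, delta) ** 2) * du          # int K_V^2 = V(delta)
    C_delta = (1.5 * C(0.0, delta, grid, du)
               - 2.0 * C(0.5, delta, grid, du)
               + 0.5 * C(1.0, delta, grid, du))
    rhs = np.sum(K(grid) ** 2) * du - r**2 * (1.0 - r**2) * C_delta
    print(lhs, rhs)                                      # the two sides of Eq. (8) agree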

Note that fˆV(x) and fˆ(x) have the same leading term in asymptotic bias. Since, for any symmetric kernel function K, C(δ) is nonnegative for any δ ≥ 0, the asymptotic variance of fˆV(x) is smaller than that of fˆ(x) by the amount

\frac{1}{nh} f(x)\, r^2(1 - r^2)\, C(\delta).

Note that 0 < r^2(1 − r^2) ≤ 1/4 for any r ∈ (−1, 1) \ {0}, and it attains its maximum 1/4 at r = ±1/√2.

2.3. Improved estimators in variance reduction

2.3.1. Positively and negatively skewed estimators



Since r = ±1/√2 give minimum variance in fˆV(x), we define new estimators as follows:

\hat f_\pm(x) = \sum_{j=0,1,2} A_j\!\left( \pm\frac{1}{\sqrt{2}} \right) \hat f(\alpha_{x,j}).

To express fˆ±(x) as a kernel type estimator, note that

A_0\!\left( \tfrac{1}{\sqrt{2}} \right) = \tfrac{1}{4}\left( 1 - \sqrt{2} \right) = A_2\!\left( -\tfrac{1}{\sqrt{2}} \right) \equiv A_0,

A_1\!\left( \tfrac{1}{\sqrt{2}} \right) = \tfrac{1}{2} = A_1\!\left( -\tfrac{1}{\sqrt{2}} \right) \equiv A_1,

A_2\!\left( \tfrac{1}{\sqrt{2}} \right) = \tfrac{1}{4}\left( 1 + \sqrt{2} \right) = A_0\!\left( -\tfrac{1}{\sqrt{2}} \right) \equiv A_2.

Then,

K_+(x) = \sum_{j=0,1,2} A_j\!\left( \tfrac{1}{\sqrt{2}} \right) K\!\left( x - \left( \tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right)
       = A_0\, K\!\left( x - \left( \tfrac{1}{\sqrt{2}} + 1 \right)\delta \right) + A_1\, K\!\left( x - \tfrac{1}{\sqrt{2}}\,\delta \right) + A_2\, K\!\left( x - \left( \tfrac{1}{\sqrt{2}} - 1 \right)\delta \right)


and

K_-(x) = \sum_{j=0,1,2} A_j\!\left( -\tfrac{1}{\sqrt{2}} \right) K\!\left( x - \left( -\tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right)
       = A_0\, K\!\left( x + \left( \tfrac{1}{\sqrt{2}} + 1 \right)\delta \right) + A_1\, K\!\left( x + \tfrac{1}{\sqrt{2}}\,\delta \right) + A_2\, K\!\left( x + \left( \tfrac{1}{\sqrt{2}} - 1 \right)\delta \right).

Therefore,

\hat f_\pm(x) = \frac{1}{nh} \sum_{i=1}^{n} K_\pm\!\left( \frac{x - X_i}{h} \right),    (9)

where

K_\pm(x) = \sum_{j=0,1,2} A_j\!\left( \pm\tfrac{1}{\sqrt{2}} \right) K\!\left( x - \left( \pm\tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right).

The asymptotic property of fˆ±(x) is straightforward from Theorem 2.1, and it is stated in the following corollary.

Corollary 2.1. Under the conditions in Theorem 2.1, the asymptotic bias and variance of fˆ±(x) are

E[\hat f_\pm(x)] - f(x) = \frac{1}{2} h^2 f''(x)\mu_2 + o(h^2)

and

Var[\hat f_\pm(x)] = \frac{1}{nh} f(x) \left\{ \int K^2(t)\,dt - \frac{C(\delta)}{4} \right\} + o\!\left( \frac{1}{nh} \right),

respectively.

2.3.2. Average estimator

As both fˆ+(x) and fˆ−(x) are not symmetric in the sense of using data information around x, it is reasonable to take the average,

\hat f_A(x) = \frac{1}{2}\left\{ \hat f_+(x) + \hat f_-(x) \right\},

which can be written as

\hat f_A(x) = \frac{1}{nh} \sum_{i=1}^{n} K_A\!\left( \frac{x - X_i}{h} \right),    (10)

where

K_A(x) = \frac{1}{2}\left\{ K_+(x) + K_-(x) \right\} = \frac{1}{2} \sum_{j=0,1,2} A_j \left\{ K\!\left( x - \left( \tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right) + K\!\left( x + \left( \tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right) \right\},    (11)

with A_0 = \tfrac{1}{4}(1 - \sqrt{2}), A_1 = \tfrac{1}{2}, and A_2 = \tfrac{1}{4}(1 + \sqrt{2}).
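For illustration only (this code is not from the paper; the Gaussian kernel and the value of δ are assumptions of ours), the kernels K+, K− and their average KA can be written down directly from the displays above. Note that K−(x) = K+(−x) for a symmetric K, and that KA integrates to one because A0 + A1 + A2 = 1.

import numpy as np

SQ2 = np.sqrt(2.0)
A0, A1, A2 = (1.0 - SQ2) / 4.0, 0.5, (1.0 + SQ2) / 4.0   # the constants A_0, A_1, A_2

def K(u):
    # Standard Gaussian kernel.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K_plus(u, delta):
    # K_+(x): kernel of the positively skewed estimator.
    c = 1.0 / SQ2
    return (A0 * K(u - (c + 1.0) * delta)
            + A1 * K(u - c * delta)
            + A2 * K(u - (c - 1.0) * delta))

def K_minus(u, delta):
    # K_-(x): mirror image of K_+ (valid because K is symmetric).
    return K_plus(-u, delta)

def K_A(u, delta):
    # Average kernel K_A = (K_+ + K_-)/2; symmetric and integrates to one.
    return 0.5 * (K_plus(u, delta) + K_minus(u, delta))

if __name__ == "__main__":
    u = np.linspace(-10.0, 10.0, 4001)
    du = u[1] - u[0]
    print(np.sum(K_A(u, delta=1.5)) * du)   # close to 1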

Corollary 2.2. Under the conditions in Theorem 2.1, the asymptotic bias and variance of fˆA(x) are

E[\hat f_A(x)] - f(x) = \frac{1}{2} h^2 f''(x)\mu_2 + o(h^2)

and

Var[\hat f_A(x)] = \frac{1}{nh} f(x) \left\{ \int K^2(t)\,dt - \frac{C(\delta)}{4} - \frac{D(\delta)}{2} \right\} + o\!\left( \frac{1}{nh} \right),




respectively, where

D(\delta) = \int K^2(t)\,dt - \frac{C(\delta)}{4} - \int K_+(t) K_-(t)\,dt,

with

\int K_+(t) K_-(t)\,dt = \frac{1}{16}\Big[ (3 - 2\sqrt{2})\, C(\sqrt{2} + 2, \delta/2) + (3 + 2\sqrt{2})\, C(2 - \sqrt{2}, \delta/2) + 2\, C(\sqrt{2}, \delta/2) + 4(1 - \sqrt{2})\, C(\sqrt{2} + 1, \delta/2) + 4(1 + \sqrt{2})\, C(\sqrt{2} - 1, \delta/2) \Big].
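The closed-form expression for ∫K+(t)K−(t)dt can likewise be checked numerically; the sketch below (ours, with a Gaussian kernel and an illustrative δ) compares a direct Riemann-sum evaluation of the integral with the combination of C(a, δ/2) terms given above.

import numpy as np

SQ2 = np.sqrt(2.0)
A0, A1, A2 = (1.0 - SQ2) / 4.0, 0.5, (1.0 + SQ2) / 4.0

def K(u):
    # Standard Gaussian kernel.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K_plus(u, delta):
    c = 1.0 / SQ2
    return (A0 * K(u - (c + 1.0) * delta)
            + A1 * K(u - c * delta)
            + A2 * K(u - (c - 1.0) * delta))

def C(a, delta, grid, du):
    # C(a, delta) = int K(t - a*delta) K(t + a*delta) dt.
    return np.sum(K(grid - a * delta) * K(grid + a * delta)) * du

if __name__ == "__main__":
    delta = 1.5                                          # illustrative binwidth
    grid = np.linspace(-15.0, 15.0, 12001)
    du = grid[1] - grid[0]
    lhs = np.sum(K_plus(grid, delta) * K_plus(-grid, delta)) * du   # int K_+ K_-, using K_-(t) = K_+(-t)
    rhs = ((3 - 2 * SQ2) * C(SQ2 + 2, delta / 2, grid, du)
           + (3 + 2 * SQ2) * C(2 - SQ2, delta / 2, grid, du)
           + 2 * C(SQ2, delta / 2, grid, du)
           + 4 * (1 - SQ2) * C(SQ2 + 1, delta / 2, grid, du)
           + 4 * (1 + SQ2) * C(SQ2 - 1, delta / 2, grid, du)) / 16.0
    print(lhs, rhs)                                      # the two values agree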

Note that the quantity D(δ) is nonnegative for any δ ≥ 0, and that Corollaries 2.1 and 2.2 are analogues of Corollary 1 and Theorem 2 of Cheng et al. (2007).

3. Mean squared error reduction

So far, we have considered reducing the bias and variance in kernel density estimation, and these are done separately. A more desirable approach in kernel density estimation would be reducing both the bias and the variance simultaneously, i.e., we develop kernel density estimators reducing the mean squared error (MSE), which is the sum of the squared bias and the variance. In fact, we may consider two versions of reducing the MSE: one is a variance reduction estimator equipped with a bias reducing kernel instead of using the ordinary kernel, and the other is a bias reduction estimator equipped with a variance reducing kernel instead of using the ordinary kernel. But it is not difficult to recognize that the two versions give the same estimator, since both the bias reduction and the variance reduction are based on linear combinations of kernel estimators.

3.1. Mean squared error reduction estimator

We consider a variance reduction estimator equipped with the bias reducing kernel KB. There are several versions for the variance reduction estimator in Section 2, and of them we use the average estimator fˆA because it gives the best result in reducing the variance. Therefore, we denote the variance reduction estimator equipped with the bias reducing kernel KB by fˆM, and accordingly this estimator is defined by

\hat f_M(x) = \frac{1}{nh} \sum_{i=1}^{n} K_M\!\left( \frac{x - X_i}{h} \right),

where

K_M(x) = \frac{1}{2} \sum_{j=0,1,2} A_j \left\{ K_B\!\left( x - \left( \tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right) + K_B\!\left( x + \left( \tfrac{1}{\sqrt{2}} + 1 - j \right)\delta \right) \right\},

with A_0 = \tfrac{1}{4}(1 - \sqrt{2}), A_1 = \tfrac{1}{2}, A_2 = \tfrac{1}{4}(1 + \sqrt{2}), and K_B(x) in Eq. (5). Hence, the proposed estimator fˆM depends on three parameters: h (bandwidth), λ (skewing parameter), and δ (binwidth). Note that the variance reduction estimator equipped with the bias reduction kernel KB is equivalent to the bias reduction estimator equipped with the variance reducing kernel KA. Therefore, KM also can be written as follows:





K_M(x) = (2\lambda + 1)^{-1}\left[ K_A(x) + \lambda\left\{ K_A(x + l) + K_A(x - l) - l\left( K_A'(x + l) - K_A'(x - l) \right) \right\} \right],

with KA(x) in Eq. (11).

Theorem 3.1. Assume that the density f has four bounded and continuous derivatives in a neighbourhood of x, that the kernel K is nonnegative, bounded, and symmetric with \int K = 1 and finite fourth moment, and that h → 0 and nh → ∞ as n → ∞. Then, the asymptotic bias and variance of fˆM(x) are

E[\hat f_M(x)] - f(x) = \frac{1}{24} h^4 f^{(4)}(x) \left\{ \mu_4 - \frac{3(1 + 6\lambda)}{2\lambda}\,\mu_2^2 - \frac{3}{4}\,\delta^4 \right\} + o(h^4)

and

Var[\hat f_M(x)] = \frac{1}{nh} f(x) V(\lambda, \delta) + o\!\left( \frac{1}{nh} \right),

respectively, where

V(\lambda, \delta) = \int K_M^2(t)\,dt = \int K_B^2(t)\,dt - \frac{C(\lambda, \delta)}{4} - \frac{D(\lambda, \delta)}{2}    (12)

and

C(\lambda, \delta) = \frac{3}{2}\, C_B(0, \delta) - 2\, C_B(0.5, \delta) + \frac{1}{2}\, C_B(1, \delta)

and

D(\lambda, \delta) = \int K_B^2(t)\,dt - \frac{C(\lambda, \delta)}{4} - \frac{1}{16}\Big[ (3 - 2\sqrt{2})\, C_B(\sqrt{2} + 2, \delta/2) + (3 + 2\sqrt{2})\, C_B(2 - \sqrt{2}, \delta/2) + 2\, C_B(\sqrt{2}, \delta/2) + 4(1 - \sqrt{2})\, C_B(\sqrt{2} + 1, \delta/2) + 4(1 + \sqrt{2})\, C_B(\sqrt{2} - 1, \delta/2) \Big],

with

C_B(a, \delta) = \int K_B(t - a\delta) K_B(t + a\delta)\,dt.

Fig. 2. Plots of KM for some δ and λ based on the Gaussian kernel.

Fig. 3. Plots of KM for some δ and λ based on the Epanechnikov kernel.
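As a hedged sketch of how the pieces fit together (again assuming a Gaussian kernel; the values of λ and δ below are arbitrary illustrations, not the choices recommended by the authors), KM can be assembled from KB exactly as in Section 3.1, V(λ, δ) of Eq. (12) evaluated as ∫KM²(t)dt, and fˆM computed as an ordinary kernel estimator with kernel KM.

import numpy as np

SQ2 = np.sqrt(2.0)
A = ((1.0 - SQ2) / 4.0, 0.5, (1.0 + SQ2) / 4.0)   # A_0, A_1, A_2

def K(u):
    # Standard Gaussian kernel (mu_2 = 1).
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K_prime(u):
    return -u * K(u)

def K_B(u, lam):
    # Bias reducing kernel of Eq. (5), with l(lambda) for the Gaussian kernel.
    l = np.sqrt((1.0 + 2.0 * lam) / (2.0 * lam))
    return (K(u) + lam * (K(u + l) + K(u - l)
                          - l * (K_prime(u + l) - K_prime(u - l)))) / (2.0 * lam + 1.0)

def K_M(u, lam, delta):
    # MSE reducing kernel: the averaged variance-reduction construction applied to K_B.
    c = 1.0 / SQ2
    out = np.zeros_like(u)
    for j, a in enumerate(A):
        shift = (c + 1.0 - j) * delta
        out += a * (K_B(u - shift, lam) + K_B(u + shift, lam))
    return 0.5 * out

def fhat_M(x_grid, data, h, lam, delta):
    # The proposed estimator: a kernel estimator built from K_M.
    u = (x_grid[:, None] - data[None, :]) / h
    return K_M(u, lam, delta).mean(axis=1) / h

if __name__ == "__main__":
    lam, delta = 0.1, 1.5                        # illustrative parameter values
    u = np.linspace(-20.0, 20.0, 8001)
    du = u[1] - u[0]
    print(np.sum(K_M(u, lam, delta)) * du)       # integrates to about 1
    print(np.sum(K_M(u, lam, delta) ** 2) * du)  # V(lambda, delta) of Eq. (12)
    rng = np.random.default_rng(1)
    data = rng.normal(size=300)                  # illustrative N(0, 1) sample
    grid = np.linspace(-3.0, 3.0, 61)
    print(fhat_M(grid, data, h=0.3, lam=lam, delta=delta)[30])   # estimate near x = 0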

Theorem 3.1 states that the proposed estimator fˆM has the same order of asymptotic bias as fˆB and has the same form of asymptotic variance as fˆA . To see how the mean squared error reducing kernel KM (x) is different from the ordinary second-order kernel K (x), we plot them for the Gaussian kernel and the Epanechnikov kernel. Figs. 2 and 3 show KM (x) for several values of δ and λ. We see that V (δ, λ) in Eq. (12) is minimized at δ = 10 and λ = 0.01 in the Gaussian kernel and at δ = 4.6 and λ = 0.01 in the Epanechnikov kernel (see Fig. 4).

Fig. 4. Perspective plots of the V(δ, λ). (a) The Gaussian kernel and (b) the Epanechnikov kernel.

Remark 3.1. Cheng et al. (2007) discussed the choice of bandwidth h and binwidth δ in detail in the local linear regression case. For h, they suggested, for example, minimizing the AMISE (asymptotic mean integrated square error), and for δ, they suggested choosing it based on the sample size n. For the choice of λ, Choi and Hall (1998) suggested several methods. Among them, minimizing the leading term of the variance of the bias reducing estimator is one practical choice. Their discussion is directly applicable to the density estimation problem, too.

4. Numerical studies

To see the numerical performance of the proposed estimators, we consider three densities, used in Marron and Wand (1992).

(a) The standard Gaussian: N(0, 1).
(b) A skewed unimodal: (1/5)N(0, 1) + (1/5)N(1/2, (2/3)^2) + (3/5)N(13/12, (5/9)^2).
(c) A bimodal: (1/2)N(−1, (2/3)^2) + (1/2)N(1, (2/3)^2).

In this numerical study, 100 replications are done for sample sizes n = 30, 100, and 300. We consider two types of kernel, Gaussian and Epanechnikov, and we compare four estimators:

(i) the classical kernel estimator fˆ,
(ii) the bias reduction estimator fˆB,
(iii) the variance reduction estimator fˆA, and
(iv) the MSE reduction estimator fˆM.
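For readers who wish to reproduce the setting, the three test densities can be simulated as mixtures of normals. The helper below is an illustrative sketch of ours, not the authors' code; the function name and the labels are arbitrary.

import numpy as np

def sample_density(name, n, rng):
    # Draw n observations from one of the three test densities of Section 4.
    if name == "gaussian":          # (a) N(0, 1)
        return rng.normal(0.0, 1.0, n)
    if name == "skewed":            # (b) skewed unimodal mixture
        comp = rng.choice(3, size=n, p=[1/5, 1/5, 3/5])
        means = np.array([0.0, 1/2, 13/12])
        sds = np.array([1.0, 2/3, 5/9])
        return rng.normal(means[comp], sds[comp])
    if name == "bimodal":           # (c) equal mixture of N(-1, (2/3)^2) and N(1, (2/3)^2)
        comp = rng.choice(2, size=n)
        means = np.array([-1.0, 1.0])
        return rng.normal(means[comp], 2/3)
    raise ValueError(name)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = sample_density("bimodal", 300, rng)
    print(x.mean(), x.std())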

4.1. Local behaviour

First, to see the local behaviour, especially for peak or trough areas, of each estimator, we evaluate the MSE (variance + squared bias), 5th and 95th percentiles, range, and inter-quartile range based on 100 replications for four estimators: the classical kernel estimator fˆ, the bias reduction estimator fˆB, the variance reduction estimator fˆA, and the MSE reduction estimator fˆM. For the Gaussian density N(0, 1), we evaluate those statistics at x = 0. Results are given in Table 1 and Fig. 5.

For the choice of parameters h, λ, and δ, we used the following rules. First, we used the same bandwidth h, which minimizes the MSE of fˆ, for the four estimators. For the skewing parameter λ in fˆB, we used λ minimizing the MSE of fˆB and used the same λ to estimate fˆM. For the binwidth δ in fˆA, we used δ minimizing the MSE of fˆA and used the same δ to estimate fˆM. Therefore, the MSE values of fˆB, fˆA, and fˆM could be smaller than those appearing in Table 1 if we use other values minimizing the MSE. We see that fˆM is best. Also, fˆB performs quite well, but fˆA is not good in reducing the MSE even though it reduces the variance a little compared to fˆ. For the skewed unimodal density, we evaluate statistics at x = 1. For the bimodal density, we evaluate statistics at x = −1, 0, and 1 (see Table 1 and Fig. 5). We see similar results. Conclusively, as far as the local behaviour at peaks or troughs is concerned, fˆM is best, fˆB is next, and fˆA and fˆ are worst.

4.2. Global behaviour

Next, to see the global behaviour of each estimator throughout the entire range, we evaluated the MISE for the four estimators. For the choice of parameters h, λ, and δ, we used the same rules as in the local behaviour case. Therefore, the MISE values of fˆB, fˆA, and fˆM could be smaller than those that appear in Table 2 if we use other values minimizing the MISE.
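The local comparison of Section 4.1 amounts to a Monte Carlo estimate of the variance and squared bias of an estimator at a fixed point. The sketch below (ours; it uses only the classical estimator fˆ, the Gaussian kernel, and illustrative values of n, h and the number of replications) shows the structure of such an experiment for the bimodal density at x = 0.

import numpy as np

def K(u):
    # Standard Gaussian kernel.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_at(x, data, h):
    # Classical kernel density estimate at a single point x.
    return K((x - data) / h).mean() / h

def f_bimodal(x):
    # True bimodal density 0.5*N(-1, (2/3)^2) + 0.5*N(1, (2/3)^2).
    s = 2 / 3
    z1, z2 = (x + 1.0) / s, (x - 1.0) / s
    return 0.5 * (np.exp(-0.5 * z1**2) + np.exp(-0.5 * z2**2)) / (s * np.sqrt(2.0 * np.pi))

def mc_mse_at(x, n, h, n_rep, rng):
    # Monte Carlo MSE, variance and squared bias of the estimate at x.
    est = np.empty(n_rep)
    for rep in range(n_rep):
        comp = rng.choice(2, size=n)                          # bimodal test density
        data = rng.normal(np.where(comp == 0, -1.0, 1.0), 2 / 3)
        est[rep] = kde_at(x, data, h)
    var = est.var()
    sbias = (est.mean() - f_bimodal(x)) ** 2
    return var + sbias, var, sbias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mse, var, sbias = mc_mse_at(0.0, n=300, h=0.11, n_rep=100, rng=rng)
    print(mse * 1e4, var * 1e4, sbias * 1e4)                  # on the 10^4 scale of Table 1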


Table 1
MSE of the four estimators at peak or trough areas for the three densities based on the Gaussian kernel, with n = 300 and 100 replications. Values represent the mean squared error (MSE × 10^4), variance (VAR × 10^4), and squared bias (SBIS × 10^4), respectively.

Density    x    Estimator   h      λ      δ      MSE      VAR      SBIS
Gaussian    0   fˆ          0.29                 11.59     6.92     4.68
                fˆB         0.29   0.07           6.98     5.43     1.55
                fˆA         0.29          2.16    9.51     3.68     5.84
                fˆM         0.29   0.07   2.16    6.23     3.83     2.40
Skewed      1   fˆ          0.12                 68.51    46.78    21.72
                fˆB         0.12   0.08          44.87    38.48     6.40
                fˆA         0.12          1.84   42.27    30.82    11.45
                fˆM         0.12   0.08   1.84   28.62    21.94     6.68
Bimodal    −1   fˆ          0.17                 60.20    30.38    29.82
                fˆB         0.17   0.10          21.20    19.44     1.76
                fˆA         0.17          1.84   25.55    15.89     9.67
                fˆM         0.17   0.10   1.84   19.89    16.26     3.62
Bimodal     0   fˆ          0.11                  7.69     6.27     1.42
                fˆB         0.11   0.05           6.51     6.33     0.18
                fˆA         0.11          2.68    7.09     5.39     1.70
                fˆM         0.11   0.05   2.68    6.03     5.74     0.30
Bimodal     1   fˆ          0.15                 28.82    18.26    10.57
                fˆB         0.15   0.08          20.40    17.08     3.32
                fˆA         0.15          1.90   24.98    13.05    11.93
                fˆM         0.15   0.08   1.90   18.81    14.01     4.80

Fig. 5. Boxplots of the four estimators at peak or trough areas for the three densities based on the Gaussian kernel, with n = 300 and 100 replications. (a) Gaussian; (b) skewed unimodal; (c) bimodal (x = −1); (d) bimodal (x = 0); (e) bimodal (x = 1).




Table 2
MISE of the four estimators based on the Gaussian kernel, with n = 100 and 100 replications. Values represent the mean integrated squared error (MISE × 10^3), the integrated variance (IV × 10^3), and the integrated squared bias (ISB × 10^3), respectively.

Density    Estimator   h      λ      δ      MISE     IV       ISB
Gaussian   fˆ(x)       0.45                  5.19     3.49     1.70
           fˆB(x)      0.45   0.09           4.08     3.24     0.84
           fˆA(x)      0.45          1.55    4.91     2.78     2.13
           fˆM(x)      0.45   0.09   1.55    4.02     2.86     1.16
Skewed     fˆ(x)       0.18                 15.29    11.00     4.29
           fˆB(x)      0.18   0.10          13.89    11.06     2.83
           fˆA(x)      0.18          1.40   14.86     9.58     5.28
           fˆM(x)      0.18   0.10   1.40   13.91    10.27     3.64
Bimodal    fˆ(x)       0.22                 13.64     9.69     3.94
           fˆB(x)      0.22   0.11          11.70     9.59     2.11
           fˆA(x)      0.22          1.52   13.12     8.34     4.79
           fˆM(x)      0.22   0.11   1.52   11.65     8.85     2.80

Table 3
MISE of the four estimators based on the Gaussian kernel, with n = 300 and 100 replications. Values represent the mean integrated squared error (MISE × 10^3), the integrated variance (IV × 10^3), and the integrated squared bias (ISB × 10^3), respectively.

Density    Estimator   h      λ      δ      MISE     IV       ISB
Gaussian   fˆ(x)       0.35                  2.50     1.78     0.71
           fˆB(x)      0.35   0.07           1.82     1.58     0.24
           fˆA(x)      0.35          1.81    2.25     1.31     0.94
           fˆM(x)      0.35   0.07   1.81    1.73     1.34     0.40
Skewed     fˆ(x)       0.14                  7.32     4.95     2.37
           fˆB(x)      0.14   0.08           6.11     4.80     1.30
           fˆA(x)      0.14          1.54    7.04     4.19     2.85
           fˆM(x)      0.14   0.08   1.54    6.08     4.39     1.69
Bimodal    fˆ(x)       0.18                  5.66     4.17     1.50
           fˆB(x)      0.18   0.09           4.58     4.01     0.57
           fˆA(x)      0.18          1.63    5.39     3.51     1.88
           fˆM(x)      0.18   0.09   1.63    4.48     3.64     0.84

In fact, we obtained the MISE of the four estimators using parameters minimizing the MISE; the amount of reduction was not appreciable. This phenomenon coincides with the fact that fˆB, fˆA, and fˆM do not depend heavily on the parameters λ and δ. Hence, the proposed estimators are quite robust to the parameters λ and δ. As shown in Tables 2–3, we recognize the following.

(i) The amount of reduction in MISE for fˆA is much less than that for fˆB. In fact, fˆA reduces the variance, but increases the bias compared to fˆB.
(ii) Overall, fˆM shows the best performance, as expected.
(iii) Most reductions in MISE for the proposed estimators are due to the bias reduction compared to fˆ, i.e., the amount of variance reduction is relatively small.

5. Concluding remarks

In this paper, we proposed a version of a kernel density estimator reducing the mean squared error compared to the standard kernel density estimator. In fact, it reduces both the bias and the variance compared to the standard kernel density estimator. Also, numerical studies support this fact. We noticed that, based on the numerical results, a large portion of the reduction in the mean squared error is due to the bias, not the variance. One apparent fact about the proposed estimator is the possibility of negativity, which is inevitable in using higher-order kernels. There are well-known methods for correcting negativity; however, the issue of negativity is worth pursuing in further research.

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2010-0026432).


References

Abramson, I. S. (1982). On bandwidth variation in kernel estimates — a square root law. The Annals of Statistics, 10, 1217–1223.
Bartlett, M. S. (1963). Statistical estimation of density function. The Indian Journal of Statistics, Series A, 25, 245–254.
Breiman, L., Meisel, W., & Purcell, E. (1977). Variable kernel estimates of multivariate densities. Technometrics, 19, 135–144.
Cheng, M. Y., Choi, E., Fan, J., & Hall, P. (2000). Skewing methods for two-parameter locally parametric density estimation. Bernoulli, 6, 169–182.
Cheng, M. Y., Peng, L., & Wu, S. H. (2007). Reducing variance in univariate smoothing. The Annals of Statistics, 35, 522–542.
Choi, E., & Hall, P. (1998). On bias reduction in local linear smoothing. Biometrika, 85, 333–345.
Fan, J., & Gijbels, I. (1996). Local Polynomial Modeling and its Applications. London: Chapman and Hall.
George, R. T., & David, W. S. (1992). Variable kernel density estimation. The Annals of Statistics, 20, 1236–1265.
Jones, M. C., & Foster, P. J. (1993). Generalized jackknifing and higher order kernels. Journal of Nonparametric Statistics, 3, 81–94.
Jones, M. C., Linton, O., & Nielsen, J. P. (1995). A simple bias reduction method for density estimation. Biometrika, 82, 327–338.
Jones, M. C., & Signorini, D. F. (1997). A comparison of higher-order bias kernel density estimators. Journal of the American Statistical Association, 92, 1063–1073.
Kim, C., Kim, W., & Park, B. U. (2003). Skewing and generalized jackknifing in kernel density estimation. Communications in Statistics. Theory and Methods, 32, 2153–2162.
Marron, J. S., & Wand, M. P. (1992). Exact mean integrated squared error. The Annals of Statistics, 20, 712–736.
Muller, H. G. (1984). Smooth optimum kernel estimators of densities, regression curves and modes. The Annals of Statistics, 12, 766–774.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33, 1065–1076.
Ruppert, D., & Cline, D. B. H. (1994). Bias reduction in kernel density estimation by smoothed empirical transformations. The Annals of Statistics, 22, 185–210.
Samiuddin, M., & El-sayyad, G. M. (1990). On nonparametric kernel density estimates. Biometrika, 77, 865–874.
Schucany, W. R., & Sommers, J. P. (1977). Improvement of kernel type density estimators. Journal of the American Statistical Association, 72, 420–423.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Wand, M. P., & Jones, M. C. (1995). Kernel Smoothing. London: Chapman and Hall.
