Bayesian bandwidth selection in discrete multivariate associated kernel estimators for probability mass functions

Nawal Belaid a,*, Smail Adjabi a, Nabil Zougab a,b, Célestin C. Kokonendji c

a LAMOS, Laboratory of Modelling and Optimization of Systems, University of Bejaia, Algeria
b University of Tizi-ouzou, Algeria
c University of Franche-Comté, LMB UMR 6623 CNRS-UFC, Besançon Cedex, France

Article history: Received 9 July 2015; Accepted 8 April 2016; Available online xxxx

AMS 2000 subject classifications: primary 62G07; secondary 62G99

Keywords: Binomial kernel; Cross-validation; Discrete triangular kernel; MCMC; Product kernel

Abstract

This paper proposes a nonparametric estimator for the probability mass function of multivariate data. The estimator is based on a discrete multivariate associated kernel without correlation structure. For the choice of the diagonal bandwidth matrix, we present the Bayesian global method against the likelihood cross-validation one, and we use the Bayesian Markov chain Monte Carlo (MCMC) method to derive the global optimal bandwidth. We compare the proposed method with the cross-validation method. The performance of both methods is evaluated under the integrated squared error criterion through simulation studies based on univariate and multivariate models. We also present applications of the proposed methods to bivariate and trivariate real data. The results show that the Bayesian global method performs better than the cross-validation one, even for the Poisson kernel, which is the worst-performing discrete associated kernel among the binomial, discrete triangular and Dirac discrete uniform kernels.

© 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. Discrete multivariate associated kernel estimators
3. Bayesian global bandwidth selector
   3.1. Univariate case study
   3.2. Bivariate case study
   3.3. Multivariate case study
   3.4. Some comments
4. Applications to real data and final remarks
Acknowledgements
References



Correspondence to: LAMOS, route de Targa-Ouzemmour, 06000 Bejaia, Algeria.
E-mail addresses: [email protected] (N. Belaid), [email protected] (S. Adjabi), [email protected] (N. Zougab), [email protected] (C.C. Kokonendji).
http://dx.doi.org/10.1016/j.jkss.2016.04.001
1226-3192/© 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.


1. Introduction

Nonparametric estimation of probability density and probability mass functions (pdf and pmf) by the kernel method is one of the most investigated topics in statistical inference; see, e.g., Aitchison and Aitken (1976), Kokonendji and Senga Kiessé (2011), Kokonendji, Senga Kiessé, and Zocchi (2007), Silverman (1986), Simonoff (1996), Wand and Jones (1995) and Wang and Ryzin (1981); see also Somé and Kokonendji (2015) for nonparametric multivariate regression function (rf) estimation using the associated continuous and discrete kernel method. Discrete kernel estimation has been far less investigated than continuous kernel estimation. For estimating the pmf of a discrete variable, the empirical (Dirac) estimator is commonly used because of its good asymptotic properties. However, this Dirac-type kernel estimator is not appropriate for small sample sizes (see Kokonendji & Senga Kiessé, 2011). Aitchison and Aitken (1976) provided an extension of the Dirac kernel method, but their discrete kernel is only appropriate for categorical data and finite count distributions; see also Racine and Li (2004) for rf estimation with both categorical and continuous data. Therefore, the so-called discrete associated kernel method has recently been developed by several authors; see, e.g., Kokonendji and Senga Kiessé (2011), Kokonendji, Senga Kiessé, and Balakrishnan (2009), Kokonendji et al. (2007), Wansouwé, Kokonendji, and Kolyang (2015), Zougab, Adjabi, and Kokonendji (2012) and Zougab, Adjabi, and Kokonendji (2013a). Note that the semiparametric estimation of the pmf and of the count rf (crf) has also been investigated; see for example Abdous, Kokonendji, and Senga Kiessé (2012) and Senga Kiessé, Zougab, and Kokonendji (2015). These works showed that a discrete associated kernel is more appropriate than a continuous kernel for estimating both the pmf and the crf of a discrete variable. To our knowledge, multivariate pmf estimation by the discrete kernel method has not been investigated in the literature. This paper introduces a nonparametric discrete kernel estimator for the pmf of multivariate discrete data and proposes a Bayesian approach for bandwidth matrix selection.

As in the univariate case, the selection of the bandwidth matrix is a serious problem in nonparametric pdf and pmf estimation with the multivariate associated kernel method. This method depends on the bandwidth and on the associated kernel function. The kernel should be chosen to match the support of the unknown function to be estimated. The choice of the bandwidth is crucial: in the univariate case the bandwidth is a strictly positive scalar parameter h which controls the degree of smoothing, whereas in the multivariate case the bandwidth is a symmetric and positive definite matrix H which controls both the degree of smoothing and the orientation of the kernel. The bandwidth matrix has d(d + 1)/2 independent elements to be chosen, d ≥ 1. Hence, the choice of the bandwidth in this context is more difficult than in the univariate case. A simplification is obtained by imposing a diagonal matrix H = Diag_d(h_j)_{j=1,2,...,d}; but for certain target densities, some authors have shown the importance of full bandwidth matrices, see Chacón and Duong (2011). Several methods have been proposed and studied to select bandwidth matrices in various settings.
For example, the classical plug-in methods adopt the mean integrated squared error (MISE) as criterion and use symmetric kernels such as the Gaussian kernel; see Chacón and Duong (2010), Duong and Hazelton (2003), and Wand and Jones (1994). Cross-validation methods include the unbiased, biased, smoothed and least squares cross-validation approaches, for which we refer to Bouezmarni and Roumbouts (2010), Chacón and Duong (2011), Duong and Hazelton (2005) and Sain, Baggerly, and Scott (1994). Note that kernel estimation of density and probability mass functions is fundamental for other tasks such as kernel regression estimation and kernel discriminant analysis.

The Bayesian approach is a good alternative to the classical methods. It has received attention in the literature, particularly in the univariate context with symmetric kernels. See Brewer (1998, 2000), who proposed global and adaptive Bayesian approaches; Gangopadhyay and Cheung (2002), Kulasekera and Padgett (2006) and Kuruwita, Kulasekera, and Padgett (2010) proposed Bayesian local bandwidth selection. For asymmetric kernels, one can consult Zougab et al. (2012, 2013a) and Zougab, Adjabi, and Kokonendji (2013b), who used discrete associated kernels. In the multivariate context, Zhang, King, and Hyndman (2006) presented an MCMC method for global bandwidth matrix selection using the symmetric Gaussian kernel; see also Zhang, King, and Shang (2013) for bandwidth selection in a nonparametric regression model with mixed types of regressors. Still for symmetric kernels, we refer to De Lima and Atuncar (2010) for a Bayesian local approach extending to the multivariate case the method described in Gangopadhyay and Cheung (2002), and to Hu, Poskitt, and Zhang (2012) for a Bayesian adaptive approach with MCMC. Recently, Zougab, Adjabi, and Kokonendji (2014) presented a Bayesian adaptive approach for the choice of the bandwidth matrix using the Gaussian kernel.

As mentioned earlier, there is no study that uses discrete multivariate associated kernels to estimate the pmf of multivariate data. Therefore, the main contribution of this paper is to propose a nonparametric estimator for the pmf of multivariate data using discrete associated kernels and to develop a Bayesian approach to bandwidth matrix selection. Our study is motivated by several points. First, as in the discrete univariate case, multivariate discrete kernels are more suitable than multivariate continuous kernels for multivariate count data. Second, the discrete multivariate associated kernel method is useful in various domains such as actuarial science, demography, economics, environmental sciences and sport. Third, the classical methods for bandwidth matrix selection, such as the cross-validation technique, do not always provide a good estimator; the Bayesian approach, considered a good alternative, is therefore presented.

The rest of this paper is organized as follows. In Section 2 we give a complete definition of discrete multivariate associated kernels, which includes the product ones, and we illustrate some examples. In Section 3, we present the Bayesian global method for estimating the bandwidth matrix by using MCMC algorithms through the likelihood cross-validation criterion,


and we examine the performances of the two techniques on data generated from known univariate, bivariate and multivariate distributions with small and moderate sample sizes via the integrated squared error (ISE) criterion. Section 4 illustrates two applications on real data and ends with a summary and final remarks.

2. Discrete multivariate associated kernel estimators

In this section, we present a nonparametric associated kernel estimator for the probability mass function (pmf) of multivariate data and a definition of the discrete multivariate associated kernel. Let X_1, ..., X_n be independent and identically distributed d-variate random vectors with an unknown pmf f on T_d, a subset of Z^d (d ≥ 1). A multivariate associated kernel estimator \hat{f}_n of f is defined by

\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^{n} K_{x,H}(X_i),   x ∈ T_d ⊆ Z^d,   (1)

where H is a d × d bandwidth matrix (i.e. symmetric and positive definite) such that H ≡ H_n → 0_d (the d × d null matrix) as n → ∞, and K_{x,H}(·) is the discrete multivariate associated kernel, parameterized by x and H and defined as follows.

Definition 1. Let T_d be the support of the pmf to be estimated, x ∈ T_d a target vector and H a bandwidth matrix. A parameterized pmf K_{x,H} of support S_{x,H} ⊆ Z^d is called a "multivariate discrete associated kernel" if the following conditions are satisfied:

x ∈ S_{x,H},   (2)
E(Z_{x,H}) = x + a(x, H),   (3)
cov(Z_{x,H}) = B(x, H),   (4)

where Z_{x,H} denotes the random vector with pmf K_{x,H}, and both a(x, H) = (a_1(x, H), ..., a_d(x, H))^T and B(x, H) = (b_{ij}(x, H))_{i,j=1,...,d} tend, respectively, to the null vector and the null matrix as H goes to 0_d.

Example 1. The discrete multivariate associated kernel "Dirac discrete uniform" (DirDU), see Aitchison and Aitken (1976) and Li and Racine (2007), is defined as

DirDU_{x,H,c}(y) = \prod_{j=1}^{d} (1 - h_j)^{1_{y_j = x_j}} \left( \frac{h_j}{c_j - 1} \right)^{1 - 1_{y_j = x_j}},

where S_{x,c} = ×_{j=1}^{d} {0, 1, ..., c_j - 1} = T_d, with ×_{j=1}^{d} the Cartesian product of the sets {0, 1, ..., c_j - 1}, c_j ∈ {2, 3, ...} for j = 1, 2, ..., d, and H = Diag_d(h_j)_j. This kernel, appropriate for categorical data, satisfies conditions (2)-(4), since

E(Z_{x,H}) = \sum_{y_1=0}^{c_1-1} \cdots \sum_{y_d=0}^{c_d-1} (y_1, ..., y_d)^T \prod_{j=1}^{d} (1 - h_j)^{1_{y_j = x_j}} \left( \frac{h_j}{c_j - 1} \right)^{1 - 1_{y_j = x_j}}
           = \left( x_1 + h_1 \left\{ \frac{c_1}{2} - x_1 - \frac{x_1}{c_1 - 1} \right\}, ..., x_d + h_d \left\{ \frac{c_d}{2} - x_d - \frac{x_d}{c_d - 1} \right\} \right)^T
           = x + a(x, H),

with a(x, H) → 0 as H → 0_d, and, the components Z^{[j]}_{x_j,h_j,c_j} of Z_{x,H} being independent,

cov(Z_{x,H}) = Diag_d \left( var(Z^{[j]}_{x_j,h_j,c_j}) \right)_j
             = H Diag_d \left( x_j^2 \frac{c_j^2 (1 - h_j) - c_j}{(c_j - 1)^2} - x_j \frac{c_j^2 (1 - h_j) - c_j}{c_j - 1} + \frac{c_j}{2} \left\{ \frac{2 c_j - 1}{3} - \frac{h_j c_j}{2} \right\} \right)_j
             = B(x, H),

with B(x, H) → 0_d when H → 0_d. Finally, the discrete multivariate associated kernel presented in Example 1, following the general Definition 1, and the kernels presented below are multiple associated kernels with a diagonal bandwidth matrix.
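To make the product structure of Example 1 concrete, here is a minimal R sketch of the DirDU kernel; the function name dirdu_kernel, its vectorised interface and the small numerical check are our own illustrative choices, not notation from the paper.

# Product Dirac discrete uniform (Aitchison-Aitken) kernel of Example 1
# y, x  : integer vectors of length d (evaluation point and target)
# h     : bandwidths in (0, 1), one per coordinate
# cats  : numbers of categories c_j (support {0, ..., c_j - 1})
dirdu_kernel <- function(y, x, h, cats) {
  same <- as.numeric(y == x)                         # indicators 1_{y_j = x_j}
  prod((1 - h)^same * (h / (cats - 1))^(1 - same))   # product over j = 1, ..., d
}

# Numerical check of conditions (2)-(4) on T_2 = {0,...,4} x {0,...,4}
grid <- as.matrix(expand.grid(0:4, 0:4))
w <- apply(grid, 1, dirdu_kernel, x = c(2, 3), h = c(0.1, 0.2), cats = c(5, 5))
sum(w)              # equals 1: K_{x,H} is a pmf on its support
colSums(grid * w)   # mean vector close to the target x = (2, 3) for small h

The last line reproduces numerically that E(Z_{x,H}) differs from the target x by a term that vanishes with the bandwidths.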

Definition 2 (Multiple Associated Kernel). Let ×_{j=1}^{d} T_1^{[j]} = T_d be the support of the pmf f to be estimated, with T_1^{[j]} (⊆ Z) the support of the jth univariate margin of f. Let x = (x_1, x_2, ..., x_d)^T ∈ ×_{j=1}^{d} T_1^{[j]} and H = Diag_d(h_j)_j with h_j > 0. Let K^{[j]}_{x_j,h_j} be the jth discrete univariate associated kernel, with corresponding random variable Z^{[j]}_{x_j,h_j} on S_{x_j,h_j} ⊆ Z, for all j = 1, ..., d. Then the multiple associated kernel, which is also a discrete multivariate associated kernel, is

K_{x,H}(·) = \prod_{j=1}^{d} K^{[j]}_{x_j,h_j}(·).   (5)

Each K^{[j]}_{x_j,h_j} in (5) can belong to the same family, and different families of univariate associated kernels can also be combined. We now point out some other examples of discrete multiple associated kernels (5), each built from a single family of discrete univariate associated kernels.

Example 2. Suppose that h > 0 is a given bandwidth parameter and a ∈ N is an arbitrary and fixed integer called the "arm". The discrete triangular kernel is defined on S_{x,a} = {x, x ± 1, ..., x ± a} with x ∈ T_1 = N as

T_{a,x,h}(y) = \frac{(a + 1)^h - |y - x|^h}{P(a, h)},   y ∈ S_{x,a},

where P(a, h) = (2a + 1)(a + 1)^h - 2 \sum_{k=0}^{a} k^h is the normalizing constant, E(Z_{x,h}) = x, and var(Z_{x,h}) ≃ [\{a(2a^2 + 3a + 1)/3\} \log(a + 1) - 2 \sum_{k=1}^{a} k^2 \log(k)] h + o(h^2). This kernel satisfies (2)-(4); see Kokonendji et al. (2007).

Example 3. An extension of the discrete kernel of Aitchison and Aitken (1976) (Example 1) to the whole set of integers Z has been proposed by Wang and Ryzin (1981). It is defined on the support S_x = Z, for any x ∈ T_1 = Z and h ∈ (0, 1), as

K_{x,h}(y) = (1 - h) 1_{y = x} + \frac{1}{2} (1 - h) h^{|y - x|} 1_{|y - x| ≥ 1},   y ∈ S_x,

with E(Z_{x,h}) = x and var(Z_{x,h}) = h(1 + h)/(1 - h)^2. This kernel satisfies (2)-(4).

Example 4. The binomial kernel is defined on the support S_x = {0, 1, ..., x + 1} with x ∈ T_1 = N and h ∈ (0, 1] as

B_{x,h}(y) = \frac{(x + 1)!}{y! (x + 1 - y)!} \left( \frac{x + h}{x + 1} \right)^{y} \left( \frac{1 - h}{x + 1} \right)^{x + 1 - y},   y ∈ S_x.

Note that B_{x,h} is the pmf of the binomial distribution with number of trials x + 1 and success probability (x + h)/(x + 1) in each trial. It is appropriate for count data with small or moderate sample sizes, but it does not satisfy (4): E(Z_{x,h}) = x + h and var(Z_{x,h}) = (x + h)(1 - h)/(x + 1), so that lim_{h→0} var(Z_{x,h}) = x/(x + 1), which does not vanish. We call it a second-order kernel; see Kokonendji and Senga Kiessé (2011) for more details.

Example 5. The Poisson kernel is defined on the support S_x = N, with x ∈ T_1 = N and h > 0, as

P_{x,h}(y) = \frac{(x + h)^y e^{-(x + h)}}{y!},   y ∈ S_x.

P_{x,h} is the pmf of the Poisson distribution P(x + h), which is equidispersed, i.e. E(Z_{x,h}) = var(Z_{x,h}) = x + h, so that lim_{h→0} var(Z_{x,h}) = x; for this reason the Poisson kernel is not very attractive. It does not satisfy (4); see also Kokonendji and Senga Kiessé (2011).

Fig. 1 shows some shapes of the above-mentioned discrete univariate associated kernels.

Fig. 1. Shapes of discrete univariate associated kernels: DiracDU, Dirac, discrete triangular (a = 4), binomial and Poisson, with the same target x = 5 and bandwidth h = 0.1.

According to Definition 2, the product of discrete univariate associated kernels (5) leads to a useful version of (1) that we call the "multiple associated kernel estimator":

\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} K^{[j]}_{x_j,h_j}(X_{ij}),   x_j ∈ T_1^{[j]} ⊆ Z,   (6)

where T_1^{[j]} is the support of the jth univariate margin of f for j = 1, ..., d, x = (x_1, ..., x_d)^T ∈ ×_{j=1}^{d} T_1^{[j]}, X_i = (X_{i1}, ..., X_{id})^T for i = 1, ..., n, and h_1, ..., h_d are the univariate bandwidth parameters. The function K^{[j]}_{x_j,h_j} is the jth discrete univariate associated kernel on the support S_{x_j,h_j} ⊆ Z. In principle, the estimator (6) is more appropriate for distributions without correlation between their components; see Bouezmarni and Roumbouts (2010) and also Kokonendji and Somé (2015) for continuous cases.

From Kokonendji and Somé (2015) it is known that, for both versions of \hat{f}_n in (1) and (6), we have \hat{f}_n(x) ∈ [0, 1] for all x ∈ T_d and \sum_{x ∈ T_d} \hat{f}_n(x) = C_n, where C_n is not necessarily one but a positive and finite constant. In general, C_n ≠ 1 if we use the discrete associated kernels of Examples 2, 4 and 5; see Wansouwé et al. (2015) for numerical results in univariate cases and also Senga Kiessé et al. (2015) for more theoretical precisions in univariate cases. Thus, a normalization of \hat{f}_n is necessary to obtain a pmf. If we consider the associated kernels of Examples 1 and 3, then \hat{f}_n is always a pmf, that is C_n = 1. Indeed, from Example 1 with T_d = ×_{j=1}^{d} {0, 1, ..., c_j - 1} we successively have

\sum_{x ∈ T_d} \hat{f}_n(x) = \sum_{x ∈ T_d} \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} (1 - h_j)^{1_{X_{ij} = x_j}} \left( \frac{h_j}{c_j - 1} \right)^{1 - 1_{X_{ij} = x_j}}
                           = \sum_{x ∈ T_d} \prod_{j=1}^{d} (1 - h_j)^{1_{X_{1j} = x_j}} \left( \frac{h_j}{c_j - 1} \right)^{1 - 1_{X_{1j} = x_j}}
                           = \prod_{j=1}^{d} [(1 - h_j) + h_j]
                           = 1.

As for Example 3, and only in the univariate case (d = 1), we have

\sum_{x ∈ Z} \hat{f}_n(x) = \sum_{x ∈ Z} \frac{1}{n} \sum_{i=1}^{n} \left[ (1 - h) 1_{X_i = x} + \frac{1}{2} (1 - h) h^{|X_i - x|} 1_{|X_i - x| ≥ 1} \right]
                         = \sum_{x ∈ Z} \left[ (1 - h) 1_{X_1 = x} + \frac{1}{2} (1 - h) h^{|X_1 - x|} 1_{|X_1 - x| ≥ 1} \right]
                         = (1 - h) + \frac{1}{2} (1 - h) \sum_{y ∈ Z \setminus \{0\}} h^{|y|}
                         = (1 - h) + \frac{1}{2} (1 - h) \frac{2h}{1 - h}
                         = (1 - h) + h = 1.

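To illustrate the multiple associated kernel estimator (6) and the normalizing constant C_n, here is a minimal R sketch using the product binomial kernel of Example 4; the function names (binom_kernel, fn_hat), the simulated data and the truncation of the evaluation grid are our own assumptions, not part of the paper.

# Univariate binomial kernel of Example 4: B_{x,h}(y) for y in {0, ..., x + 1}
binom_kernel <- function(y, x, h) dbinom(y, size = x + 1, prob = (x + h) / (x + 1))

# Multiple (product) associated kernel estimator (6) with diagonal bandwidth h
# X : n x d matrix of counts;  x : evaluation point of length d
fn_hat <- function(x, X, h) {
  d <- length(x)
  mean(apply(X, 1, function(Xi)
    prod(sapply(1:d, function(j) binom_kernel(Xi[j], x[j], h[j])))))
}

# With the binomial kernel the estimate need not sum to one (C_n != 1),
# so we renormalize over a truncated grid to obtain a pmf.
set.seed(1)
X <- cbind(rpois(50, 4), rpois(50, 6))        # illustrative bivariate counts
grid <- as.matrix(expand.grid(0:20, 0:20))    # truncation of N^2 (assumption)
h <- c(0.15, 0.15)
f_vals <- apply(grid, 1, fn_hat, X = X, h = h)
Cn <- sum(f_vals)                             # normalizing constant C_n
f_pmf <- f_vals / Cn                          # normalized pmf estimate

With the DirDU or Wang and Ryzin kernels of Examples 1 and 3, the same computation returns C_n equal to one, in agreement with the derivations above.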

3. Bayesian global bandwidth selector

For any given discrete multiple associated kernel estimator (6), we consider the bandwidth selection problem, which is generally crucial in nonparametric estimation. From now on we study the Bayesian global approach to estimate the diagonal bandwidth matrix H = Diag_d(h_j)_j in (6). Following authors such as Brewer (1998), Zhang et al. (2006) and Zougab et al. (2013a), we treat the bandwidth matrix H as a random variable with a given prior distribution π(H). The Bayesian inference on H conditional on the data is made via the posterior density π(H | data), given by

π(H | X_1, X_2, ..., X_n) = π(H) \prod_{i=1}^{n} \hat{f}_{n,-i}(X_i) \left[ \int_{M} π(H) \prod_{i=1}^{n} \hat{f}_{n,-i}(X_i) \, dH \right]^{-1},

where \hat{f}_{n,-i}(X_i) = \frac{1}{n - 1} \sum_{\ell = 1, \ell ≠ i}^{n} K_{X_i,H}(X_\ell) is the leave-one-out estimator at X_i, whose product over i is the likelihood cross-validation function, and M is the set of all symmetric and positive definite bandwidth matrices. In our study, we assume that each component h_j of H = Diag_d(h_j)_j has a prior distribution denoted π(h_j); e.g., beta and gamma distributions. According to Bayes' theorem, the posterior distribution of H is, up to a normalizing constant,

π(H | X_1, X_2, ..., X_n) ∝ \prod_{j=1}^{d} π(h_j) × \frac{1}{(n - 1)^n} \prod_{i=1}^{n} \sum_{\ell = 1, \ell ≠ i}^{n} \prod_{j=1}^{d} K^{[j]}_{X_{ij},h_j}(X_{\ell j}).
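This unnormalized posterior can be evaluated pointwise, which is all that the MCMC algorithm below requires. A minimal R sketch, assuming a product binomial kernel and independent gamma priors on the h_j (whether the second hyper-parameter acts as a rate or a scale is not specified in the paper; we take it as a rate, and the function name log_post_H is ours):

# Unnormalized log-posterior of H = Diag(h_1, ..., h_d) given the data,
# combining the likelihood cross-validation term with independent priors.
# X : n x d matrix of counts;  h : vector of bandwidths
log_post_H <- function(h, X, alpha = 0.5, beta = sqrt(nrow(X))) {
  n <- nrow(X)
  if (any(h <= 0) || any(h > 1)) return(-Inf)     # binomial kernel needs h in (0, 1]
  loglik <- 0
  for (i in 1:n) {
    ker <- sapply(setdiff(1:n, i), function(l)    # K_{X_i,H}(X_l), product over j
      prod(dbinom(X[l, ], size = X[i, ] + 1, prob = (X[i, ] + h) / (X[i, ] + 1))))
    loglik <- loglik + log(sum(ker) / (n - 1))    # log of the leave-one-out estimate
  }
  loglik + sum(dgamma(h, shape = alpha, rate = beta, log = TRUE))
}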

The Bayesian posterior distribution π(H | data) cannot be computed in closed form, because its normalizing constant has no explicit expression for the discrete associated kernels presented here (Examples 1-5). The MCMC approach avoids this problem. One important advantage of the MCMC technique is that it is applicable to data of any dimension; moreover, the sampling algorithm involves no additional difficulty as the dimension of the data increases. The method generates a Markov chain {H^{(t)}}, t = 1, ..., T, using a transition kernel and an arbitrary initial value H^{(0)}; after a sufficiently large number of iterations T, the Markov chain converges to the density of interest π(H | X_1, X_2, ..., X_n).

Algorithm 1 (Metropolis-Hastings algorithm).
  Initialize H^{(0)}.
  For t = 0 to T - 1:
    Generate a candidate \tilde{H} ∼ q(· | H^{(t)}).
    Compute the acceptance probability
      ρ = min{ 1, [ q(H^{(t)} | \tilde{H}) π(\tilde{H} | X_1, X_2, ..., X_n) ] / [ q(\tilde{H} | H^{(t)}) π(H^{(t)} | X_1, X_2, ..., X_n) ] }.
    Set H^{(t+1)} = \tilde{H} if u < ρ, where u ∼ U[0, 1]; otherwise set H^{(t+1)} = H^{(t)}.
  Compute \hat{H} = \frac{1}{T - T_0} \sum_{t = T_0 + 1}^{T} H^{(t)}.

We use the Metropolis-Hastings algorithm (Algorithm 1), where T_0 is the burn-in period, U[0, 1] denotes the uniform distribution on [0, 1] and q(· | ·) is a candidate (or proposal) density. To reduce the bias of the estimator of H, the first T_0 iterations are not used in the computation of \hat{H}.

We now conduct a simulation study to compare the performance of the Bayesian global approach against the classical cross-validation method for bandwidth selection with the estimator (6). Here, we consider the multivariate least squares cross-validation (LSCV), which is based on the minimization of the ISE; see Bouezmarni and Roumbouts (2010) and Kokonendji and Somé (2015) for more details. The criterion consists in choosing H that minimizes

LSCV(H) = \sum_{x ∈ T_d} \left( \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d} K^{[j]}_{x_j,h_j}(X_{ij}) \right)^2 - \frac{2}{n(n - 1)} \sum_{i=1}^{n} \sum_{\ell ≠ i} \prod_{j=1}^{d} K^{[j]}_{X_{ij},h_j}(X_{\ell j}),

so that H_opt = argmin_{H ∈ M} LSCV(H).
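A minimal R sketch of Algorithm 1 as a random-walk Metropolis-Hastings sampler over the diagonal elements of H (the random-walk version is the one used in Section 3.4, so the proposal ratio cancels); it accepts any function returning the unnormalized log-posterior, for instance log_post_H from the sketch above. The proposal scale sd_prop and the chain lengths are our own illustrative choices.

# Random-walk Metropolis-Hastings for the global bandwidth vector (Algorithm 1)
# log_post : function(h) returning the unnormalized log-posterior of h
mh_bandwidth <- function(log_post, d, T_iter = 4500, T0 = 1500, sd_prop = 0.05,
                         h_init = rep(0.5, d)) {
  H <- matrix(NA_real_, nrow = T_iter, ncol = d)
  h_cur <- h_init
  lp_cur <- log_post(h_cur)
  acc <- 0
  for (t in 1:T_iter) {
    h_prop <- h_cur + rnorm(d, sd = sd_prop)   # symmetric proposal: q-ratio equals 1
    lp_prop <- log_post(h_prop)                # -Inf outside (0, 1]: automatic rejection
    if (log(runif(1)) < lp_prop - lp_cur) {
      h_cur <- h_prop; lp_cur <- lp_prop; acc <- acc + 1
    }
    H[t, ] <- h_cur
  }
  list(h_hat = colMeans(H[(T0 + 1):T_iter, , drop = FALSE]),  # ergodic mean of H
       chain = H, accept_rate = acc / T_iter)
}

# Usage with a data matrix X and log_post_H from the previous sketch:
# fit <- mh_bandwidth(function(h) log_post_H(h, X), d = ncol(X))
# fit$h_hat        # Bayesian global bandwidths;  fit$accept_rate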

Note that a sensitivity analysis of the priors (beta(α, β) and gamma(α, β)) can be conducted as in Zougab et al. (2012). However, in this paper, and following Senga Kiessé et al. (2015) and Zougab et al. (2014), we choose α = 0.5 and β = √n. The choice of β, which depends on the sample size n, is important for the consistency of the nonparametric kernel estimator of the pmf and allows the Bayesian bandwidth estimators to converge to zero (h_j → 0 as n → ∞ for j = 1, ..., d). Moreover, for different choices of the hyper-parameters α and β there is no substantial change in the obtained H or in the ISE performance. Several pmf's with discrete support are considered below for the simulations, and all the computations were done using the R software (R Development Core Team, 2015). The performances are examined via

ISE = \sum_{x ∈ N^d} [\hat{f}_n(x) - f(x)]^2.
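For completeness, a sketch of the two criteria used in the comparison: the LSCV score minimized over a small grid of diagonal bandwidths, and the ISE against a known pmf. The product binomial kernel, the truncation of the support and the bandwidth grid are our own choices for illustration.

# LSCV(H) of Section 3 and the ISE criterion, for the product binomial kernel
binom_kernel <- function(y, x, h) dbinom(y, size = x + 1, prob = (x + h) / (x + 1))
kernel_prod  <- function(y, x, h) prod(binom_kernel(y, x, h))  # prod_j K_{x_j,h_j}(y_j)

lscv <- function(h, X, support) {
  n <- nrow(X)
  fhat <- apply(support, 1, function(x)
    mean(apply(X, 1, function(Xi) kernel_prod(Xi, x, h))))
  cross <- sum(sapply(1:n, function(i)
    sum(sapply(setdiff(1:n, i), function(l) kernel_prod(X[l, ], X[i, ], h)))))
  sum(fhat^2) - 2 * cross / (n * (n - 1))
}

ise <- function(fhat_vals, f_vals) sum((fhat_vals - f_vals)^2)

# Example with bivariate Poisson(8) data (model F1 below)
set.seed(2)
X <- cbind(rpois(50, 8), rpois(50, 8))
support <- as.matrix(expand.grid(0:25, 0:25))   # truncation of N^2 (assumption)
grid_h  <- as.matrix(expand.grid(seq(0.1, 0.9, 0.2), seq(0.1, 0.9, 0.2)))
h_lscv  <- grid_h[which.min(apply(grid_h, 1, lscv, X = X, support = support)), ]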


Table 1. Values of ISE (ISE_LSCV, ISE_Bayes) for d = 1. The first component is the average optimal ISE of the cross-validation approach and the second one is that of the Bayesian approach.

f    n     DirDU               Poisson             Triangdisc a=4      Triangdisc a=2      τ_Acptrate
D1   20    (2.6e−1, 3.1e−2)    (4.1e−1, 6.2e−2)    (3.6e−1, 2.4e−2)    (1.6e−1, 0.4e−2)    0.45
D1   50    (1.3e−1, 1.4e−2)    (3.2e−1, 2.8e−2)    (1.1e−1, 1.4e−2)    (4.7e−2, 9.8e−3)    0.56
D1   200   (3.2e−2, 7.8e−3)    (7.8e−2, 9.5e−3)    (1.3e−2, 3.5e−3)    (4.2e−3, 2.2e−3)    0.57
D1   500   (5.8e−3, 1.5e−3)    (9.6e−3, 6.3e−3)    (7.4e−3, 1.2e−3)    (2.9e−3, 0.5e−3)    0.49
D2   20    (4.6e−2, 2.7e−2)    (5.7e−1, 3.3e−1)    (9.8e−2, 7.5e−2)    (7.4e−2, 4.4e−2)    0.61
D2   50    (5.5e−3, 3.4e−3)    (0.8e−1, 7.8e−2)    (6.2e−2, 1.9e−2)    (4.9e−2, 1.4e−2)    0.54
D2   200   (2.8e−3, 0.4e−3)    (8.8e−2, 8.0e−3)    (3.3e−2, 6.5e−3)    (3.1e−2, 6.2e−3)    0.56
D2   500   (1.8e−3, 4.5e−4)    (2.6e−2, 1.3e−3)    (9.7e−3, 6.2e−4)    (4.9e−3, 0.5e−4)    0.52
D3   20    (8.3e−2, 4.5e−2)    (9.1e−1, 8.3e−2)    (6.0e−1, 7.9e−2)    (4.1e−1, 7.1e−2)    0.58
D3   50    (7.5e−2, 1.4e−2)    (3.6e−1, 2.0e−2)    (5.4e−2, 0.7e−2)    (3.1e−2, 0.6e−2)    0.45
D3   200   (6.2e−3, 3.2e−3)    (7.5e−2, 4.0e−3)    (3.7e−2, 2.3e−3)    (2.5e−2, 2.1e−3)    0.61
D3   500   (0.3e−3, 1.7e−3)    (4.1e−2, 2.2e−3)    (5.8e−3, 0.4e−3)    (4.3e−3, 0.3e−3)    0.55

Note: The best results are all obtained with the Bayesian approach.

3.1. Univariate case study

We consider three target pmf's, labelled D1, D2 and D3, defined as follows. D1 is a mixture of two Poisson distributions with λ_1 = 0.2 and λ_2 = 6:

f(x) = \frac{1}{4} \frac{e^{-0.2} 0.2^x}{x!} + \frac{3}{4} \frac{e^{-6} 6^x}{x!},   x ∈ N;

D2 is the Poisson distribution with λ = 8:

f(x) = \frac{e^{-8} 8^x}{x!},   x ∈ N;

D3 is the binomial distribution with n = 10 and p = 0.9:

f(x) = \frac{10!}{x! (10 - x)!} \, 0.1^x × 0.9^{10 - x},   x ∈ {0, 1, ..., 10}.
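The univariate targets can be simulated directly in R; a small sketch for D1 and D2 (D3 is obtained analogously with rbinom, and the 100-replication loop behind Table 1 is omitted):

# Simulating from the univariate targets of Section 3.1
r_D1 <- function(n) ifelse(runif(n) < 1/4, rpois(n, 0.2), rpois(n, 6))  # Poisson mixture
r_D2 <- function(n) rpois(n, 8)                                         # Poisson(8)

set.seed(3)
x <- r_D1(200)
table(x)   # empirical frequencies that the associated kernel estimator smooths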

Table 1 presents the average optimal ISE for the two approaches (Bayes global and cross-validation) with respect to the three univariate pmf's D1, D2 and D3, for different sample sizes n ∈ {20, 50, 200, 500} and for several discrete univariate associated kernels. Each result is obtained from 100 replications. The last column reports the acceptance rate of the MCMC with the DirDU kernel.

3.2. Bivariate case study

Four target pmf's, labelled F1, F2, F3 and F4, are considered for the bivariate case. F1 is the independent bivariate Poisson distribution with mean µ = 8:

f(x_1, x_2) = \frac{e^{-8} 8^{x_1}}{x_1!} × \frac{e^{-8} 8^{x_2}}{x_2!},   (x_1, x_2) ∈ N^2;

F2 is the independent bivariate binomial distribution with parameters n = 10 and p = 0.2:

f(x_1, x_2) = \frac{10!}{x_1! (10 - x_1)!} \, 0.8^{x_1} 0.2^{10 - x_1} × \frac{10!}{x_2! (10 - x_2)!} \, 0.8^{x_2} 0.2^{10 - x_2},   (x_1, x_2) ∈ {0, 1, ..., 10}^2;

F3 is the independent bivariate geometric distribution with parameter p = 0.7:

f(x_1, x_2) = 0.7 (0.3)^{x_1} × 0.7 (0.3)^{x_2},   (x_1, x_2) ∈ N^2;

F4 is the product of two mixtures of two Poisson distributions with the same parameters λ_1 = 4 and λ_2 = 12:

f(x_1, x_2) = \left( \frac{3}{4} \frac{e^{-4} 4^{x_1}}{x_1!} + \frac{1}{4} \frac{e^{-12} 12^{x_1}}{x_1!} \right) \left( \frac{3}{4} \frac{e^{-4} 4^{x_2}}{x_2!} + \frac{1}{4} \frac{e^{-12} 12^{x_2}}{x_2!} \right),   (x_1, x_2) ∈ N^2.

Table 2 shows the average optimal ISE based on 100 replications. The last column reports the acceptance rate of the MCMC with the binomial kernel.

3.3. Multivariate case study

To investigate higher dimensions d > 2 by simulation, we consider four test pmf's, labelled F5 and F6 for d = 3, and F7 and F8 for d = 4. Their expressions are given below.


Table 2. Values of ISE (ISE_LSCV, ISE_Bayes) for d = 2. The first component is the average optimal ISE of the cross-validation approach and the second one is that of the Bayesian approach.

f    n     DirDU               Poisson             Binomial            Triangdisc a=4      Triangdisc a=2      τ_Acptrate
F1   20    (5.7e−2, 1.1e−3)    (7.5e−2, 5.8e−2)    (3.6e−2, 0.5e−2)    (3.6e−2, 7.8e−3)    (2.9e−2, 5.9e−3)    0.40
F1   50    (2.2e−2, 6.7e−4)    (2.6e−2, 1.9e−2)    (1.5e−2, 5.6e−3)    (1.2e−2, 4.6e−3)    (0.2e−2, 4.3e−3)    0.39
F1   200   (1.1e−2, 3.9e−4)    (1.7e−2, 3.5e−3)    (9.9e−3, 1.5e−3)    (8.5e−3, 1.5e−3)    (4.4e−3, 1.1e−3)    0.30
F1   500   (7.9e−3, 1.2e−4)    (9.2e−3, 1.8e−3)    (7.8e−3, 9.1e−4)    (2.9e−3, 6.5e−4)    (2.3e−3, 3.9e−4)    0.33
F2   20    (6.5e−2, 1.1e−3)    (7.6e−2, 7.1e−2)    (4.3e−2, 1.9e−2)    (4.9e−2, 0.8e−2)    (3.1e−2, 0.2e−2)    0.39
F2   50    (2.6e−2, 7.7e−4)    (6.4e−2, 3.2e−2)    (1.1e−2, 0.3e−2)    (1.0e−2, 2.7e−3)    (0.3e−2, 2.4e−3)    0.35
F2   200   (1.9e−3, 5.3e−4)    (5.2e−2, 9.7e−3)    (5.5e−3, 2.8e−3)    (0.5e−2, 0.6e−3)    (0.2e−2, 0.3e−3)    0.37
F2   500   (1.3e−3, 2.3e−4)    (8.4e−3, 6.8e−3)    (1.9e−3, 1.8e−4)    (6.8e−3, 3.9e−4)    (5.3e−3, 2.2e−4)    0.50
F3   20    (3.9e−2, 8.8e−3)    (4.7e−2, 3.9e−2)    (1.1e−2, 0.6e−2)    (1.2e−2, 2.4e−3)    (0.9e−2, 2.2e−3)    0.32
F3   50    (2.3e−2, 3.9e−3)    (3.1e−2, 6.7e−3)    (0.7e−2, 5.8e−3)    (0.9e−2, 1.4e−3)    (0.3e−2, 0.1e−3)    0.27
F3   200   (4.5e−3, 3.4e−4)    (1.8e−2, 4.3e−3)    (0.4e−2, 3.8e−3)    (0.6e−2, 0.9e−3)    (0.2e−2, 0.7e−3)    0.29
F3   500   (2.9e−3, 1.5e−4)    (7.8e−3, 2.1e−3)    (1.3e−3, 0.7e−3)    (3.4e−3, 0.2e−3)    (2.7e−3, 0.1e−3)    0.24
F4   20    (4.9e−3, 4.5e−3)    (8.8e−2, 2.9e−2)    (3.2e−3, 1.1e−3)    (3.7e−3, 1.4e−3)    (2.6e−3, 1.2e−3)    0.21
F4   50    (4.1e−3, 3.9e−3)    (6.6e−2, 7.4e−3)    (1.9e−3, 3.9e−4)    (2.5e−3, 1.1e−3)    (1.2e−3, 0.9e−3)    0.27
F4   200   (3.8e−3, 2.7e−3)    (4.2e−2, 5.2e−3)    (9.8e−3, 1.2e−4)    (0.8e−3, 6.3e−4)    (0.5e−3, 6.1e−4)    0.25
F4   500   (8.9e−4, 7.2e−4)    (7.6e−3, 9.3e−4)    (1.7e−4, 2.2e−5)    (1.1e−3, 2.8e−4)    (0.6e−3, 2.2e−4)    0.24

Note: The best results are all obtained with the Bayesian approach.

F5 is the independent trivariate Poisson distribution with mean µ = 4:

f(x_1, x_2, x_3) = \prod_{\ell=1}^{3} \frac{e^{-4} 4^{x_\ell}}{x_\ell!},   (x_1, x_2, x_3) ∈ N^3;

F6 is the independent trivariate binomial distribution with parameters n = 10 and p = 0.4:

f(x_1, x_2, x_3) = \prod_{\ell=1}^{3} \frac{10! \, 0.4^{x_\ell} 0.6^{10 - x_\ell}}{x_\ell! (10 - x_\ell)!},   (x_1, x_2, x_3) ∈ {0, 1, ..., 10}^3;

F7 is the independent 4-variate geometric distribution with parameter p = 0.2:

f(x_1, x_2, x_3, x_4) = \prod_{k=1}^{4} 0.2 (0.8)^{x_k},   (x_1, x_2, x_3, x_4) ∈ N^4;

F8 is the product of four mixtures of two Poisson distributions with the same parameters λ_1 = 0.2 and λ_2 = 10:

f(x_1, x_2, x_3, x_4) = \prod_{k=1}^{4} \left( \frac{3}{4} \frac{e^{-0.2} 0.2^{x_k}}{x_k!} + \frac{1}{4} \frac{e^{-10} 10^{x_k}}{x_k!} \right),   (x_1, x_2, x_3, x_4) ∈ N^4.

Table 3 presents the study in the multivariate case (d ∈ {3, 4}). It reports the ISE for both methods (cross-validation and Bayes global). For each model, the number of replications is again 100, for sample sizes n ∈ {20, 50, 200, 500}. The last column reports the acceptance rate of the MCMC with the binomial kernel.

3.4. Some comments

We used the random-walk Metropolis-Hastings algorithm (Algorithm 1) to sample the bandwidths from their posterior. The burn-in period T_0 consists of 1500 iterations and the following 3000 iterations were recorded. From these results we observe that the values of the average ISE decrease as the sample size increases, for both methods, all models and all discrete (univariate and multivariate) associated kernels. In terms of average ISE, the Bayesian global approach performs better than the cross-validation technique for all models and all discrete associated kernels. The acceptance rate takes good values: between 0.2 and 0.4 for d ≥ 2 and around 0.5 for d = 1. To assess the convergence of the MCMC method, we compute two indicators, the simulation inefficiency factor (SIF) and the batch-mean standard error (BMSE), using the product binomial kernel. These criteria consist in subdividing the recorded Markov chain into m batches of length n:

SIF = \frac{\hat{σ}^2}{\tilde{σ}^2};   BMSE = \sqrt{\frac{\hat{σ}^2}{T}};

\hat{σ}^2 = \frac{n}{m - 1} \sum_{k=1}^{m} (\bar{h}_k - \bar{h})^2,   \bar{h}_k = \frac{1}{n} \sum_{i=(k-1)n+1}^{kn} h^{(i)},   k = 1, ..., m;

\bar{h} = \frac{1}{T - T_0} \sum_{i=T_0+1}^{T} h^{(i)},   \tilde{σ}^2 = \frac{1}{T - T_0 - 1} \sum_{i=T_0+1}^{T} (h^{(i)} - \bar{h})^2.
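A small R sketch of these two diagnostics using the batch-means construction above; whether the denominator of BMSE is T or the number of recorded iterations is not fully explicit in the text, so the sketch uses the recorded length (an assumption), and the batch count m is ours.

# SIF and batch-mean standard error for one recorded bandwidth chain h (post burn-in)
sif_bmse <- function(h, m = 30) {
  N  <- length(h)                        # number of recorded iterations
  nb <- N %/% m                          # batch length n
  h  <- h[1:(m * nb)]                    # drop the remainder so batches are equal
  h_bar <- mean(h)
  batch_means <- colMeans(matrix(h, nrow = nb, ncol = m))
  sigma2_hat  <- nb * sum((batch_means - h_bar)^2) / (m - 1)   # between-batch variance
  sigma2_til  <- sum((h - h_bar)^2) / (length(h) - 1)          # within-chain variance
  c(SIF = sigma2_hat / sigma2_til, BMSE = sqrt(sigma2_hat / length(h)))
}

# Usage with the chain from the Metropolis-Hastings sketch:
# sif_bmse(fit$chain[1501:4500, 1])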


Table 3. Values of ISE (ISE_LSCV, ISE_Bayes) for d ∈ {3, 4}. The first component is the average optimal ISE of the cross-validation approach and the second one is that of the Bayesian approach.

f    n     DirDU               Poisson             Binomial            Triangdisc a=4      Triangdisc a=2      τ_Acptrate
F5   20    (5.6e−3, 7.5e−4)    (4.6e−2, 8.4e−3)    (1.6e−3, 5.5e−5)    (7.6e−3, 2.2e−3)    (7.2e−3, 1.7e−3)    0.31
F5   50    (4.2e−3, 7.1e−4)    (3.1e−2, 7.2e−3)    (1.1e−4, 2.3e−5)    (6.9e−3, 1.1e−3)    (4.3e−3, 0.8e−3)    0.28
F5   200   (2.7e−3, 6.8e−4)    (6.9e−3, 3.2e−3)    (3.3e−5, 8.7e−6)    (6.2e−3, 8.7e−4)    (4.1e−3, 5.5e−4)    0.21
F5   500   (0.6e−3, 1.4e−4)    (2.7e−3, 5.4e−4)    (7.4e−6, 5.1e−7)    (1.9e−3, 4.3e−4)    (1.6e−3, 2.7e−4)    0.24
F6   20    (9.7e−3, 5.2e−3)    (6.7e−2, 9.5e−3)    (3.4e−3, 1.2e−4)    (7.9e−3, 8.5e−4)    (5.6e−3, 7.2e−4)    0.22
F6   50    (8.2e−3, 2.9e−3)    (6.3e−2, 8.1e−3)    (2.7e−4, 7.1e−5)    (7.2e−3, 8.1e−4)    (5.3e−3, 6.1e−4)    0.26
F6   200   (6.5e−3, 0.7e−3)    (5.8e−2, 6.2e−3)    (8.8e−5, 1.9e−5)    (6.3e−3, 5.9e−4)    (4.1e−3, 5.7e−4)    0.25
F6   500   (3.1e−3, 4.9e−4)    (2.5e−2, 1.8e−3)    (4.9e−6, 9.9e−7)    (1.2e−3, 1.8e−4)    (0.7e−3, 1.1e−4)    0.23
F7   20    (8.8e−3, 9.6e−4)    (3.3e−2, 2.3e−3)    (5.9e−6, 3.3e−6)    (9.9e−3, 4.7e−4)    (6.3e−3, 3.5e−4)    0.31
F7   50    (8.1e−3, 7.4e−4)    (3.1e−2, 1.4e−3)    (5.6e−6, 1.8e−6)    (9.1e−3, 4.2e−4)    (6.1e−3, 3.0e−4)    0.25
F7   200   (6.3e−3, 5.1e−4)    (1.6e−2, 0.2e−3)    (1.7e−6, 7.8e−7)    (7.6e−3, 2.5e−4)    (4.4e−3, 2.7e−4)    0.24
F7   500   (9.6e−4, 0.6e−4)    (5.8e−3, 6.7e−4)    (9.6e−7, 3.9e−7)    (2.1e−3, 0.1e−4)    (0.5e−3, 8.2e−5)    0.22
F8   20    (3.7e−2, 1.2e−2)    (9.6e−2, 8.8e−2)    (9.1e−2, 7.1e−3)    (8.1e−2, 7.7e−2)    (6.9e−2, 5.3e−2)    0.26
F8   50    (9.2e−3, 0.7e−3)    (8.5e−2, 6.7e−2)    (7.5e−2, 2.1e−3)    (5.9e−2, 4.6e−2)    (4.6e−2, 4.1e−2)    0.24
F8   200   (7.3e−3, 0.2e−3)    (5.2e−2, 2.1e−2)    (4.5e−2, 1.6e−3)    (3.9e−2, 2.8e−3)    (3.2e−2, 0.4e−3)    0.21
F8   500   (4.2e−3, 8.2e−4)    (1.9e−2, 7.2e−3)    (2.2e−3, 3.1e−4)    (7.1e−3, 1.9e−3)    (3.4e−3, 1.2e−3)    0.25

Note: The best results still correspond to the Bayesian approach.

Table 4. Indicators SIF(h_1, ..., h_d) and BMSE(h_1, ..., h_d) (×1000), d ∈ {2, 3, 4}.

f    n     SIF                          BMSE
F1   20    (6.7, 8.9)                   (8.9, 7.5)
F1   50    (6.4, 10.9)                  (8.2, 7.4)
F1   200   (6.5, 5.8)                   (7.8, 5.1)
F1   500   (7.3, 6.6)                   (6.2, 6.5)
F2   20    (5.1, 5.6)                   (8.5, 4.7)
F2   50    (6.4, 5.7)                   (8.6, 5.0)
F2   200   (5.8, 7.8)                   (6.5, 3.5)
F2   500   (9.4, 9.2)                   (5.7, 6.7)
F3   20    (8.4, 8.5)                   (8.2, 5.8)
F3   50    (11.6, 10.7)                 (6.0, 6.9)
F3   200   (11.9, 10.1)                 (3.0, 4.5)
F3   500   (9.6, 11.9)                  (2.1, 5.6)
F4   20    (7.3, 9.8)                   (5.3, 6.5)
F4   50    (12.7, 9.9)                  (4.1, 7.0)
F4   200   (15.8, 10.8)                 (1.2, 3.4)
F4   500   (11.7, 12.7)                 (1.0, 2.3)
F5   20    (11.2, 14.2, 20.8)           (9.1, 8.9, 12.2)
F5   50    (22.9, 16.5, 22.1)           (12.1, 10.8, 15.3)
F5   200   (16.7, 13.7, 13.8)           (15.6, 9.4, 13.2)
F5   500   (19.8, 20.9, 18.8)           (1.2, 4.5, 12.8)
F6   20    (16.7, 15.0, 15.9)           (9.1, 8.0, 7.9)
F6   50    (24.2, 20.5, 23.8)           (14.7, 12.4, 12.0)
F6   200   (18.8, 23.3, 19.8)           (10.5, 9.7, 9.8)
F6   500   (33.8, 28.8, 30.8)           (14.5, 13.5, 29.0)
F7   20    (15.4, 8.4, 9.7, 97.1)       (20.6, 14.3, 17.7, 18.1)
F7   50    (22.7, 13.2, 12.4, 28.1)     (31.1, 12.8, 20.1, 20.2)
F7   200   (30.9, 24.7, 22.9, 29.5)     (43.2, 9.5, 67.5, 75.3)
F7   500   (33.5, 28.7, 25.9, 47.8)     (6.7, 8.7, 23.1, 7.8)
F8   20    (15.1, 14.5, 62.4, 14.3)     (28.2, 22.3, 15.0, 15.4)
F8   50    (23.2, 28.6, 14.7, 50.3)     (8.5, 9.3, 21.8, 22.8)
F8   200   (45.8, 25.5, 22.9, 33.9)     (5.6, 9.8, 6.7, 45.3)
F8   500   (22.7, 57.7, 59.8, 45.2)     (98.7, 76.5, 12.8, 34.2)

Table 5. Comparison of execution times in seconds for one replication (t_Bayes, t_LSCV). The first component is the execution time of the Bayesian approach and the second one is that of cross-validation.

f     n = 20             n = 50             n = 100            n = 200
F1    (8 s, 10 s)        (18 s, 15 s)       (58 s, 20 s)       (216 s, 90 s)
F2    (4 s, 5 s)         (19 s, 11 s)       (58 s, 42 s)       (228 s, 57 s)
F3    (6 s, 4 s)         (19 s, 10 s)       (49 s, 16 s)       (186 s, 47 s)
F4    (8 s, 15 s)        (17 s, 19 s)       (58 s, 28 s)       (234 s, 49 s)
F5    (17 s, 48 s)       (33 s, 50 s)       (114 s, 57 s)      (438 s, 84 s)
F6    (18 s, 40 s)       (33 s, 44 s)       (72 s, 51 s)       (420 s, 59 s)
F7    (39 s, 57 s)       (52 s, 60 s)       (114 s, 66 s)      (228 s, 174 s)
F8    (78 s, 114.6 s)    (150 s, 138 s)     (189 s, 174 s)     (744 s, 430.8 s)

Table 4 presents SIF and BMSE for all models. The values of both indicators show that all the simulated chains have mixed very well. Table 5 reports the computation times used to compare the performance of the methods; the computations were done on a standard personal computer at 2.0 GHz with 2 GB of RAM, using the R software, for one replication. In terms of computation time, the Bayesian approach has the advantage over cross-validation for all models except model F3 when the sample size is small (n = 20); for n = 50, the Bayesian approach is faster than LSCV for data generated from F4, F5, F6 and F7. However, when the sample size is large (n > 50), the LSCV method is the fastest for all the considered models (F1, ..., F8).


Table 6. Number of points collected and number of matches played in the 2014 World Cup.

Country            All    Pa.B   Arg    Col    Bel    Brz    Fran   Co.R
Number of points   19     17     16     12     12     11     10     9
Number of matches  7      7      7      5      5      7      5      5

Country            Chi    Mex    Suis   Urug   Grec   Alg    Equa   USA
Number of points   7      7      6      6      5      4      4      4
Number of matches  4      4      4      4      4      4      3      4

Country            Nig    Por    Croa   Bos    Cod    Ita    Esp    Rus
Number of points   4      4      3      3      3      3      3      2
Number of matches  4      3      3      3      3      3      3      3

Country            Gha    Ang    Cor    Iran   Japo   Aus    Hon    Came
Number of points   1      1      1      1      1      0      0      0
Number of matches  3      3      3      3      3      3      3      3

Table 7. Bacterial counts by 3 samplers in 50 sterile locations.

X1  X2  X3    X1  X2  X3    X1  X2  X3    X1  X2  X3    X1  X2  X3
 1   2  11     3   6   6     3   8   2     7  10   5    22   9   6
 8   6   0     3   9  14     1   1  30     2   2   8     5   2   4
 2  13   5     4   2  25     4   5  15     3  15   3     2   0   6
 2   8   1     9   7   3     7   6   3     1   8   2     2   1   1
 5   6   5     5   4   8     8  10   4     4   6   0     4   6   4
14   1   7     4   4   7     3   2  10     8   7   3     4   9   2
 3   9   2     7   3   2     6   8   5     6   6   6     8   4   6
 7   6   8     1  14   6     2   3  10     4  14   7     3  10   6
 3   4  12     2  13   0     1   7   3     3   3  14     4   7  10
 1   9   7    14   9   5     2   9  12     6   8   3     2   4   6

Table 8. Comparison of ISE_0 and execution times for the first application (data of Table 6).

Kernels                DirDU               Poisson             Binomial            Triangdisc a=4      Triangdisc a=2
ISE_0 (LSCV, Bayes)    (6.7e−4, 8.2e−6)    (5.2e−3, 9.8e−5)    (1.4e−5, 7.8e−6)    (1.4e−5, 7.8e−6)    (1.4e−5, 7.8e−6)
Execution times        (7.6 s, 7.4 s)      (8.7 s, 8.4 s)      (10.5 s, 8.4 s)     (9.7 s, 9.1 s)      (9.9 s, 9.4 s)

Table 9. Comparison of ISE_0 and execution times for the second application (data of Table 7).

Kernels                DirDU               Poisson             Binomial            Triangdisc a=4      Triangdisc a=2
ISE_0 (LSCV, Bayes)    (9.6e−7, 2.7e−7)    (0.2e−5, 4.3e−6)    (2.5e−7, 1.2e−7)    (7.8e−7, 3.1e−7)    (5.1e−7, 2.2e−7)
Execution times        (31.5 s, 60.6 s)    (36.9 s, 64.1 s)    (33.8 s, 67.6 s)    (33.1 s, 62.4 s)    (32.3 s, 64.5 s)

4. Applications to real data and final remarks

We now apply the methodology to two real data sets. The first one is a bivariate data set of size n = 32 (the number of countries that participated in the 2014 World Cup); see Table 6. The first variable is the number of points collected by each team and the second is the number of matches it played. The second data set is a trivariate data set of size n = 50, from a study of the relative effectiveness of three different air samplers (1, 2, 3) for detecting pathogenic bacteria in sterile rooms: a microbiologist obtained triplets of bacterial colony counts x_1, x_2, x_3 from samplers 1, 2, 3 in each of 50 different sterile locations. See Aitchison and Ho (1989) for the original data set, which is also given in Table 7. To evaluate the performance of the two methods, we use the empirical ISE defined by

ISE_0 = \sum_{x ∈ T_d} [\hat{f}_n(x) - f_0(x)]^2,

where f_0(x) is the empirical (naive) estimator obtained with the Dirac kernel. Tables 8 and 9 show, respectively, the results of the first and second applications in terms of ISE_0 and execution times (in seconds). These applications to real data sets confirm the strength of the Bayesian global approach to bandwidth selection for pmf estimation using discrete multivariate associated kernels. The worst discrete associated kernel is again the Poisson kernel.
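For reference, the empirical benchmark f_0 is just the multivariate relative-frequency estimator; a small R sketch of ISE_0 on a finite evaluation grid (function and variable names are ours):

# Naive (Dirac) estimator f_0 and the empirical criterion ISE_0
ise0 <- function(fhat_vals, X, support) {
  key <- function(M) apply(M, 1, paste, collapse = ",")
  counts <- table(key(X))
  f0 <- as.numeric(counts[key(support)]) / nrow(X)   # relative frequencies
  f0[is.na(f0)] <- 0                                 # grid points never observed
  sum((fhat_vals - f0)^2)
}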


Summary and final remarks

This paper proposed a nonparametric estimator for the probability mass function of multivariate data. The estimator is based on a discrete multivariate associated kernel without correlation structure; an extension to a full bandwidth matrix for the case with correlation structure is obviously possible. For the choice of the diagonal bandwidth matrix, we presented the Bayesian global method against the likelihood cross-validation one, and we used the MCMC method to derive the global optimal bandwidth. We compared the presented method with the cross-validation (LSCV) method. To evaluate their performances, we studied the ISE criterion through simulations of univariate and multivariate models and through real bivariate and trivariate data sets. The results show that the Bayesian global method performs better than the cross-validation one, even for the Poisson kernel, which is the worst discrete associated kernel among those considered. Future work will be devoted to Bayesian local bandwidth selection and also to the Bayesian adaptive approach for discrete multivariate associated kernel estimators.

Acknowledgements

Part of this work was done while the first author was at the Laboratoire de Mathématiques de Besançon as a visiting scientist, with the support of the University of Bejaia through the LAMOS Laboratory Research Grant B00620120001. We sincerely thank an Associate Editor and the anonymous reviewers for their valuable comments.

References

Abdous, B., Kokonendji, C. C., & Senga Kiessé, T. (2012). On semiparametric regression for count explanatory variables. Journal of Statistical Planning and Inference, 142, 1537–1548.
Aitchison, J., & Aitken, C. G. G. (1976). Multivariate binary discrimination by the kernel method. Biometrika, 63, 413–420.
Aitchison, J., & Ho, C. H. (1989). The multivariate Poisson-log normal distribution. Biometrika, 76, 643–653.
Bouezmarni, T., & Roumbouts, J. V. K. (2010). Nonparametric density estimation for multivariate bounded data. Journal of Statistical Planning and Inference, 140, 139–152.
Brewer, M. J. (1998). A modelling approach for bandwidth selection in kernel density estimation. In Proceedings of COMPSTAT (pp. 203–208). Heidelberg: Physica Verlag.
Brewer, M. J. (2000). A Bayesian model for local smoothing in kernel density estimation. Statistics and Computing, 10, 299–309.
Chacón, J. E., & Duong, T. (2010). Multivariate plug-in bandwidth selection with unconstrained pilot matrices. Test, 19, 375–398.
Chacón, J. E., & Duong, T. (2011). Unconstrained pilot selectors for smoothed cross-validation. Australian and New Zealand Journal of Statistics, 53, 331–351.
De Lima, M. S., & Atuncar, G. S. (2010). A Bayesian method to estimate the optimal bandwidth for multivariate kernel estimator. Journal of Nonparametric Statistics, 23, 137–148.
Duong, T., & Hazelton, M. L. (2003). Plug-in bandwidth matrices for bivariate kernel density estimation. Journal of Nonparametric Statistics, 15, 17–30.
Duong, T., & Hazelton, M. L. (2005). Cross-validation bandwidth matrices for multivariate kernel density estimation. Scandinavian Journal of Statistics, 32, 485–506.
Gangopadhyay, A. K., & Cheung, K. N. (2002). Bayesian approach to the choice of smoothing parameter in kernel density estimation. Journal of Nonparametric Statistics, 14, 655–664.
Hu, S., Poskitt, D. S., & Zhang, X. (2012). Bayesian adaptive bandwidth kernel density estimation of irregular multivariate distributions. Computational Statistics and Data Analysis, 56, 732–740.
Kokonendji, C. C., & Senga Kiessé, T. (2011). Discrete associated kernels method and extensions. Statistical Methodology, 8, 497–516.
Kokonendji, C. C., Senga Kiessé, T., & Balakrishnan, N. (2009). Semiparametric estimation for count data through weighted distributions. Journal of Statistical Planning and Inference, 139, 3625–3638.
Kokonendji, C. C., Senga Kiessé, T., & Zocchi, S. S. (2007). Discrete triangular distributions and non-parametric estimation for probability mass function. Journal of Nonparametric Statistics, 19, 241–257.
Kokonendji, C. C., & Somé, S. M. (2015). On multivariate associated kernels for smoothing general density function. arXiv:1502.01173.
Kulasekera, K. B., & Padgett, W. J. (2006). Bayes bandwidth selection in kernel density estimation with censored data. Journal of Nonparametric Statistics, 18, 129–143.
Kuruwita, C. N., Kulasekera, K. B., & Padgett, W. J. (2010). Density estimation using asymmetric kernels and Bayes bandwidths with censored data. Journal of Statistical Planning and Inference, 140, 1765–1774.
Li, Q., & Racine, J. S. (2007). Nonparametric econometrics: Theory and practice. Princeton, Oxford: Princeton University Press.
Racine, J. S., & Li, Q. (2004). Nonparametric estimation of regression functions with both categorical and continuous data. Journal of Econometrics, 119, 99–130.
R Development Core Team (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0, http://www.R-project.org.
Sain, S. R., Baggerly, K. A., & Scott, D. W. (1994). Cross-validation of multivariate densities. Journal of the American Statistical Association, 89, 807–817.
Senga Kiessé, T., Zougab, N., & Kokonendji, C. C. (2015). Bayesian estimation of bandwidth in semiparametric kernel estimation of unknown probability mass and regression functions of count data. Computational Statistics, http://dx.doi.org/10.1007/s00180-015-0627-1.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.
Simonoff, J. S. (1996). Smoothing methods in statistics. New York: Springer-Verlag.
Somé, S. M., & Kokonendji, C. C. (2015). Effects of associated kernels in nonparametric multiple regressions. arXiv:1502.01488.
Wand, M. P., & Jones, M. C. (1994). Multivariate plug-in bandwidth selection. Computational Statistics, 9, 97–116.
Wand, M., & Jones, M. (1995). Kernel smoothing. London: Chapman and Hall.
Wang, M. C., & Ryzin, J. V. (1981). A class of smooth estimators for discrete distributions. Biometrika, 68, 301–309.
Wansouwé, W. E., Kokonendji, C. C., & Kolyang, D. T. (2015). Disake: Discrete associated kernel estimators. URL: http://cran.r-project.org/web/packages/Disake/.
Zhang, X., King, M. L., & Hyndman, R. J. (2006). A Bayesian approach to bandwidth selection for multivariate kernel density estimation. Computational Statistics and Data Analysis, 50, 3009–3031.
Zhang, X., King, M. L., & Shang, H. L. (2013). Bayesian bandwidth selection for a nonparametric regression model with mixed types of regressors. https://ideas.repec.org/p/msh/ebswps/2013-13.html.
Zougab, N., Adjabi, S., & Kokonendji, C. C. (2012). Binomial kernel and Bayes local bandwidth in discrete functions estimation. Journal of Nonparametric Statistics, 24, 783–795.
Zougab, N., Adjabi, S., & Kokonendji, C. C. (2013a). A Bayesian approach to bandwidth selection in univariate associate kernel estimation. Journal of Statistical Theory and Practice, 7, 8–23.
Zougab, N., Adjabi, S., & Kokonendji, C. C. (2013b). Adaptive smoothing in associated kernel discrete functions estimation using Bayesian approach. Journal of Statistical Computation and Simulation, 83, 2219–2231.
Zougab, N., Adjabi, S., & Kokonendji, C. C. (2014). Bayesian estimation of adaptive bandwidth matrices in multivariate kernel density estimation. Computational Statistics and Data Analysis, 75, 28–38.