Deconvolving multidimensional density from partially contaminated observations


Journal of Statistical Planning and Inference 104 (2002) 147–160

www.elsevier.com/locate/jspi

Ming Yuan^{a,∗}, Jiaqin Chen^{b}

^{a} Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
^{b} Department of Modern Physics, University of Science and Technology of China, Hefei, Anhui 230026, People's Republic of China

Received 13 March 2000; received in revised form 21 February 2001; accepted 2 June 2001

Abstract

We consider the problem of estimating a continuous bounded $d$-variate joint probability density function of $(X_1, X_2, \ldots, X_d)$ when observations from a stationary process $X_1, \ldots, X_n$ with this density are partially contaminated by measurement error. In particular, the observations $Y_1, \ldots, Y_n$ are such that $P(Y_j = X_j) = p$ and $P(Y_j = X_j + \varepsilon_j) = 1 - p$, where the errors $\{\varepsilon_j\}$ are independent (of each other and of $\{X_j\}$) and identically distributed from a known distribution. We are interested in estimating the joint density of $(X_1, X_2, \ldots, X_d)$. When $p = 0$, it is well known that deconvolution via kernel density estimators suffers from notoriously slow rates of convergence (Ann. Statist. 18(2) (1990) 806; J. Multivariate Anal. 44 (1993a) 57). In contrast, for univariate partially contaminated observations ($0 < p < 1$), an almost sure rate of $O((\ln n / nh_n)^{1/2})$ has been achieved for convergence in the $L_\infty$-norm (J. Multivariate Anal. 55 (1995) 246). In the multivariate case, however, the situation is much more complex because of the dependence among the data. One purpose of this paper is to fill this void. Furthermore, under mild conditions, we also obtain the asymptotic mean squared error and asymptotic normality of the estimator with partially contaminated observations. © 2002 Published by Elsevier Science B.V.

MSC: 62G07; 62F05; 62H10

Keywords: Strong mixing; Kernel density estimation; Partially contaminated; Deconvolution; Mean squared error; Strong consistency; Asymptotic normality

1. Introduction

Assume that $X_1, \ldots, X_n$ and $\varepsilon_1, \ldots, \varepsilon_n$ are two sets of mutually independent random variables coming from distributions with densities $g$ and $f$, respectively. If $Y_j = X_j + \varepsilon_j$, $j = 1, \ldots, n$, are observed, the problem of estimating $g$ has received much attention in the literature recently. Non-parametric estimation of $g$ has been addressed by Carroll and Hall (1988), Fan (1991a, b), Liu and Taylor (1989), Stefanski (1990), Stefanski and Carroll (1990) and Zhang (1990), among others.

∗ Corresponding author.
E-mail addresses: [email protected] (M. Yuan), [email protected] (Jiaqin Chen).
0378-3758/02/$ - see front matter © 2002 Published by Elsevier Science B.V.
PII: S0378-3758(01)00238-5

Most of the above papers derived rates of convergence for a particular estimator, a given loss function and a specified error distribution under various sets of conditions on the density to be estimated and the family of kernels used in constructing the estimator. For normal, gamma and double-gamma error distributions, optimal pointwise convergence rates are achieved by the deconvolving kernel density estimators introduced by Stefanski and Carroll (1990) and Zhang (1990). It was found that even optimal estimation of $g$ suffers from notoriously slow rates. For normally distributed errors and, in general, for any error distribution whose Fourier transform decays exponentially, those rates are of logarithmic order for pointwise convergence (see Carroll and Hall, 1988).

In contrast, for merely partially contaminated observations ($0 < p < 1$), better rates can be achieved. For the univariate case, Hesse (1995) studied the observations $Y_j = X_j + T_j \varepsilon_j$, where the $\{T_j\}$ are independent (of each other and of $\{X_j\}$, $\{\varepsilon_j\}$) Bernoulli random variables with parameter $1 - p \in (0, 1)$. He showed that the convergence rates are of the same order as the existing optimal rates for the ordinary kernel density estimator with uncontaminated ($p = 1$) observations. This result is understandable from a heuristic point of view, because there is an expected fraction of $np$ non-contaminated sample elements. We will follow Hesse's idea in the current note.

All the papers mentioned above considered problems under the assumption of independent and identically distributed (iid) observations. But this is not always the case in real applications. Sometimes we need to consider deconvolving the multivariate density of $(X_1, \ldots, X_d)$ for a stationary process. Masry (1993a, b) is one of the pioneers in this area.
He not only generalized the almost sure convergence results for the deconvolution problem to the multivariate case (see Masry, 1993a), but also established central limit theorems (CLT) for the estimator under mild conditions. As in the univariate case, we are forced to face intolerably slow rates of convergence in the multivariate case (see Masry, 1993a). What about partially contaminated observations? Can we obtain results similar to the univariate case? Can a convergence rate of the same order as that of the ordinary kernel density estimator be achieved? The problem is much more complex than the univariate case due to the dependence among the observations and the higher dimensionality of the data. One of the main results of this paper is to give a positive answer to this question. Furthermore, under the same conditions as for the ordinary case, we show the similarity between the kernel density estimator for the partially contaminated model and the ordinary kernel density estimator for uncontaminated observations by establishing the mean squared error (MSE) and asymptotic normality of the estimator.

Thus the model we consider here is
$$Y_j = X_j + T_j \varepsilon_j, \qquad j = 1, \ldots, n + d - 1,$$
where a realization of $\{Y_j\}$ is observed, the noise $\{\varepsilon_j\}$ consists of iid random variables and $\{X_j\}$ is a stationary process such that for each $d \geq 1$ the joint probability density function (pdf) of the random variables $X_1, \ldots, X_d$ exists.
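To make the model concrete, here is a minimal simulation sketch (not from the paper): it takes a stationary Gaussian AR(1) sequence for $\{X_j\}$ (a standard example of a strong mixing stationary process) and Gaussian noise for $\{\varepsilon_j\}$; the function name and all parameter choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_partially_contaminated(n, p, rho=0.5, noise_sd=1.0):
    """Simulate Y_j = X_j + T_j * eps_j for j = 1, ..., n.

    {X_j}: stationary Gaussian AR(1) with coefficient rho (strong mixing),
    {eps_j}: iid N(0, noise_sd^2), {T_j}: iid Bernoulli(1 - p),
    so that P(Y_j = X_j) = p, as in the partially contaminated model.
    """
    x = np.empty(n)
    x[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho ** 2))  # stationary start
    for j in range(1, n):
        x[j] = rho * x[j - 1] + rng.normal()
    t = rng.random(n) < (1.0 - p)              # contamination indicators T_j
    eps = rng.normal(scale=noise_sd, size=n)
    return x, x + t * eps

x, y = simulate_partially_contaminated(10_000, p=0.7)
print(np.mean(x == y))  # empirical fraction of uncontaminated points, close to p
```

In the estimation problem only `y` is observed; `x` is kept here solely to check the contamination fraction.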


Masry (1993a) introduced a deconvolving kernel density estimator from observations contaminated by noise. This estimator, which will be utilized in this note, is
$$\hat g_n(x; d) = (nh_n^d)^{-1} \sum_{j=1}^{n} K_n\left(\frac{\mathbf{Y}_j - x}{h_n}\right),$$
where $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, $\mathbf{Y}_j = (Y_j, \ldots, Y_{j+d-1})$ and
$$K_n\left(\frac{\mathbf{Y}_j - x}{h_n}\right) = \left(\frac{1}{2\pi}\right)^d \int_{[-1,1]^d} \exp\left(\mathrm{i}\, t \cdot \frac{\mathbf{Y}_j - x}{h_n}\right) \prod_{i=1}^{d} \frac{k^*(t_i)}{f_p^*(t_i/h_n)}\, \mathrm{d}t.$$
$K_n$ is a deconvolution kernel based on the characteristic function $f_p^*$ of the distribution of $T_j \varepsilon_j$, the characteristic function $k^*$ of another kernel $K$, and the bandwidth sequence $\{h_n\}$, which is taken to converge to zero.

We need some notation and assumptions throughout this paper. We suppose that for each couple $(t, t')$ with $|t - t'| \geq d$, the random vector $(\mathbf{Y}_t, \mathbf{Y}_{t'})$ has a density, and we set
$$f_{t,t'} = f_{(\mathbf{X}_t, \mathbf{X}_{t'})} - f \otimes f.$$
On the other hand, $\{X_t\}$ is supposed to be strong mixing (see below) such that
$$\alpha_X(k) \leq \gamma k^{-\beta}, \qquad k \geq 1,$$
for some positive constants $\gamma$ and $\beta$. Furthermore, we suppose that

Assumption A. Suppose that $f_{t,t'}$ exists for each couple $(t, t')$, $|t - t'| \geq d$.

Assumption B. Suppose that $\alpha_X(k) \leq \gamma k^{-\beta}$, $k \geq 1$, and either of the following statements holds:
(1) $\sup_{|t-t'| \geq d} \|f_{t,t'}\|_r < +\infty$ for some $r \in (2, +\infty]$ and $\beta > 2(r-1)/(r-2)$;
(2) $|f_{t,t'}(z) - f_{t,t'}(z')| \leq l \|z - z'\|$, $z, z' \in \mathbb{R}^{2d}$, for some constant $l$ and $\beta > (2d+1)/(d+1)$.

Let us denote by $C_{2,d}(b)$ the space of twice continuously differentiable real-valued functions $f$ defined on $\mathbb{R}^d$ such that $\|f\|_\infty \leq b$ and $\|f''\|_\infty \leq b$, where $f''$ denotes any partial derivative of order 2 of $f$.

Assumption C. Suppose that the density $g(x; d)$ belongs to $C_{2,d}(b)$.

Remark 1.1. Assumptions A–C were also assumed for establishing similar results for ordinary kernel density estimation (see Bosq, 1996).
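For intuition, the one-dimensional factor of $K_n$ can be evaluated numerically. The sketch below is an illustration, not code from the paper: it assumes $k^*(t) = (1 - t^2)^3$ on $[-1, 1]$, a common compactly supported choice in the deconvolution literature, and standard Gaussian noise, so that $f_p^*(t) = p + (1 - p)e^{-t^2/2}$ and $\inf_t |f_p^*(t)| = p$; the non-negativity of the corresponding $K$ required by Assumption D is not enforced here.

```python
import numpy as np

def trapezoid(y, x, axis=-1):
    """Composite trapezoidal rule (avoids NumPy-version issues with np.trapz)."""
    d = np.diff(x)
    ys = np.moveaxis(y, axis, -1)
    return np.sum((ys[..., 1:] + ys[..., :-1]) * d / 2.0, axis=-1)

def deconv_kernel(u, h, p, noise_cf=lambda t: np.exp(-0.5 * t ** 2)):
    """Evaluate K_n(u) = (1/2pi) * int_{-1}^{1} exp(itu) k*(t) / f_p*(t/h) dt.

    k*(t) = (1 - t^2)^3 on [-1, 1]; f_p*(t) = p + (1 - p) * noise_cf(t) is the
    characteristic function of T * eps. Both factors are even in t, so the
    integral reduces to a cosine integral over [0, 1].
    """
    t = np.linspace(0.0, 1.0, 2001)
    weight = (1.0 - t ** 2) ** 3 / (p + (1.0 - p) * noise_cf(t / h))
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return trapezoid(np.cos(np.outer(u, t)) * weight, t) / np.pi

u = np.linspace(-40.0, 40.0, 1601)
kn = deconv_kernel(u, h=0.5, p=0.5)
print(trapezoid(kn, u))  # ≈ k*(0)/f_p*(0) = 1 on a wide enough grid
```

Because $|f_p^*| \geq p > 0$, the integrand is bounded, which is exactly the point of Remark 2.2 below: partial contamination keeps the problem well-posed.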


Assumption D. Suppose that
(i) $K$ is a non-negative symmetric kernel with
$$\int_{\mathbb{R}} K(z)\,\mathrm{d}z = 1, \qquad \int_{\mathbb{R}} zK(z)\,\mathrm{d}z = 0, \qquad \int_{\mathbb{R}} z^2 K(z)\,\mathrm{d}z < \infty;$$
(ii) $k^*(t) = 0$ for $|t| > 1$, $k^*(t)$ and $f_p^*(t)$ are twice continuously differentiable, and
$$\inf_t |f_p^*(t)| \geq p.$$

Remark 1.2. Assumption D is concerned with the contamination distribution and the corresponding kernel-type estimator. It is the same as that of the univariate case, which was first proposed by Hesse (1995).

Finally, we define the type of dependence among the data.

Definition 1.1. Let $\{X_t\}$ be a strictly stationary process; its strong mixing coefficient of order $k$ is defined as
$$\alpha_X(k) = \sup_{B \in \sigma(X_s,\, s \leq t),\; C \in \sigma(X_s,\, s \geq t+k)} |P(B \cap C) - P(B)P(C)|, \qquad k \geq 1.$$

Moreover, $\{X_t\}$ is said to be strong mixing if $\alpha_X(k) \to 0$ as $k \to \infty$.

In the next section, we give the asymptotic MSE for our estimator. The almost sure convergence rate is obtained in Section 3. Finally, a CLT is established for $\hat g_n$ in Section 4. Some preliminary results to be used are collected in the appendix.

2. MSE

We begin the study of $\hat g_n$ by evaluating its MSE. We will show that, under mild conditions, the error turns out to be the same as that of the ordinary kernel density estimator. We are now in a position to state our result concerning the MSE.

Theorem 2.1. If Assumptions A–D hold, then the choice $h_n = c_n n^{-1/(d+4)}$, where $c_n \to c > 0$, leads to
$$n^{4/(d+4)} E[\hat g_n(x; d) - g(x; d)]^2 \to C > 0,$$
where
$$C = \frac{p^{-2d} c^4}{4} \left(\int_{\mathbb{R}} v^2 K(v)\,\mathrm{d}v\right)^2 \left(\sum_{1 \leq i,j \leq d} \frac{\partial^2 g}{\partial x_i\, \partial x_j}(x; d)\right)^2 + \frac{f(x; d)}{(2\pi c p^2)^d} \left(\int_{-1}^{1} |k^*(t)|^2\,\mathrm{d}t\right)^d$$
and $f(x; d)$ denotes the density of $\mathbf{Y}_1$.
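As an illustration of the bandwidth recipe $h_n = c\, n^{-1/(d+4)}$ in Theorem 2.1, the following univariate ($d = 1$) sketch assembles the full estimator $\hat g_n$ on simulated partially contaminated AR(1) data. All concrete choices ($k^*(t) = (1-t^2)^3$, Gaussian noise, $c = 1$, the sample size) are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_deconv_density(y, x_eval, p, c=1.0, d=1):
    """Deconvolving kernel density estimate g_hat_n(x) for d = 1.

    g_hat_n(x) = (n h)^{-1} sum_j K_n((Y_j - x)/h), h = c * n^{-1/(d+4)},
    K_n(u) = (1/2pi) int_{-1}^{1} exp(itu) k*(t)/f_p*(t/h) dt,
    with k*(t) = (1 - t^2)^3 and f_p*(t) = p + (1-p) exp(-t^2/2)
    (standard Gaussian noise assumed).
    """
    n = len(y)
    h = c * n ** (-1.0 / (d + 4))
    t = np.linspace(0.0, 1.0, 1001)
    weight = (1.0 - t ** 2) ** 3 / (p + (1.0 - p) * np.exp(-0.5 * (t / h) ** 2))
    # trapezoidal quadrature weights on the t-grid
    w_t = np.full(t.size, t[1] - t[0]); w_t[[0, -1]] /= 2.0
    u = (np.asarray(y)[None, :] - np.asarray(x_eval)[:, None]) / h  # (n_eval, n)
    kn = np.cos(u[..., None] * t) @ (weight * w_t) / np.pi          # K_n(u)
    return kn.mean(axis=1) / h

# simulated stationary AR(1) signal; 30% of the points contaminated (p = 0.7)
n, p, rho = 2000, 0.7, 0.5
x = np.empty(n); x[0] = rng.normal(scale=1 / np.sqrt(1 - rho ** 2))
for j in range(1, n):
    x[j] = rho * x[j - 1] + rng.normal()
y = x + (rng.random(n) < 1 - p) * rng.normal(size=n)

g0 = fit_deconv_density(y, [0.0], p)[0]
true_g0 = 1.0 / np.sqrt(2 * np.pi / (1 - rho ** 2))  # N(0, 1/(1-rho^2)) at 0
print(g0, true_g0)
```

The estimate at $x = 0$ should land near the true stationary density value, up to the $O(h^2)$ bias and $O((nh)^{-1/2})$ noise quantified by Theorem 2.1.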


We decompose the proof of this theorem into the following lemmas.

Lemma 2.1. For each $n \geq d$, we have
$$\alpha_Y(n) \leq \alpha_X(n - d + 1).$$

Proof. By definition,
$$\alpha_Y(n) = \sup_{A \in \mathcal{F}_1^k(Y),\, B \in \mathcal{F}_{k+n}^\infty(Y)} |P(AB) - P(A)P(B)| \leq \sup_{A \in \mathcal{F}_1^{k+d-1}(X),\, B \in \mathcal{F}_{k+n}^\infty(X)} |P(AB) - P(A)P(B)| \leq \alpha_X(n - d + 1).$$

Lemma 2.2. Under Assumptions A–D, for any $x$ such that $f(x; d) > 0$, we have
$$nh_n^d \operatorname{Var}[\hat g_n(x; d)] \to \sigma^2 > 0,$$
where
$$\sigma^2 = \left(\frac{1}{2\pi p^2} \int_{-1}^{1} |k^*(t)|^2\,\mathrm{d}t\right)^d f(x; d).$$

Proof. It is easy to check that
$$nh_n^d \operatorname{Var}[\hat g_n(x; d)] = h_n^{-d} \operatorname{Var}(Z_1) + \frac{2}{nh_n^d} \sum_{i=2}^{n} (n - i + 1) \operatorname{Cov}(Z_1, Z_i) = I_1 + 2I_2 + 2I_3,$$
where
$$I_1 = h_n^{-d} \operatorname{Var}(Z_1), \qquad I_2 = h_n^{-d} \sum_{i=2}^{d} \left(1 - \frac{i-1}{n}\right) \operatorname{Cov}(Z_1, Z_i), \qquad I_3 = h_n^{-d} \sum_{i=d+1}^{n} \left(1 - \frac{i-1}{n}\right) \operatorname{Cov}(Z_1, Z_i)$$
and
$$Z_i = K_n\left(\frac{\mathbf{Y}_i - x}{h_n}\right) - EK_n\left(\frac{\mathbf{Y}_i - x}{h_n}\right).$$
We will calculate $I_1$, $I_2$, $I_3$, respectively. First,
$$I_1 = h_n^{-d} \operatorname{Var}(Z_1) = h_n^{-d}\left[EK_n^2\left(\frac{\mathbf{Y}_1 - x}{h_n}\right) - \left(EK_n\left(\frac{\mathbf{Y}_1 - x}{h_n}\right)\right)^2\right] := J_1 - J_2.$$
It is easy to check that $J_2 \to 0$.


The other term, $J_1$, can be calculated by Parseval's identity:
$$J_1 = h_n^{-d}\, EK_n^2\left(\frac{\mathbf{Y}_1 - x}{h_n}\right) = h_n^{-d} \int K_n^2\left(\frac{\mathbf{y} - x}{h_n}\right) f(\mathbf{y}; d)\,\mathrm{d}\mathbf{y} \to f(x; d) \int K_n^2(u)\,\mathrm{d}u = \frac{f(x; d)}{(2\pi)^d} \int_{[-1,1]^d} \prod_{i=1}^{d} \left|\frac{k^*(t_i)}{f_p^*(t_i/h_n)}\right|^2 \mathrm{d}t \to \frac{f(x; d)}{(2\pi)^d}\, p^{-2d} \left(\int_{-1}^{1} |k^*(t)|^2\,\mathrm{d}t\right)^d.$$
Thus,
$$I_1 \to f(x; d) \left(\frac{1}{2\pi p^2} \int_{-1}^{1} |k^*(t)|^2\,\mathrm{d}t\right)^d.$$
For any $1 < i \leq d$,
$$|\operatorname{Cov}(Z_1, Z_i)| \leq h_n^{-d}\left[\left|E\, K_n\left(\frac{\mathbf{Y}_1 - x}{h_n}\right) K_n\left(\frac{\mathbf{Y}_i - x}{h_n}\right)\right| + \left(EK_n\left(\frac{\mathbf{Y}_1 - x}{h_n}\right)\right)^2\right] = J_3(i) + J_2.$$
Because
$$J_3(i) = h_n^{-d}\left|\int\!\!\int K_n\left(\frac{\mathbf{y}_1 - x}{h_n}\right) K_n\left(\frac{\mathbf{y}_i - y}{h_n}\right) f_{1,i}(\mathbf{y}_1, \mathbf{y}_i)\,\mathrm{d}\mathbf{y}_1\,\mathrm{d}\mathbf{y}_i\right| = h_n^{i-1}\left|\int K_n(u) K_n(v)\, \tilde f(\tilde x + \tilde y - h_n \tilde u - h_n \tilde v;\, d + i - 1)\,\mathrm{d}u\,\mathrm{d}v\right| = O(h_n^{i-1}),$$
where $\tilde x = (x_1, \ldots, x_d, 0, \ldots, 0)$, $\tilde u = (u_1, \ldots, u_d, 0, \ldots, 0)$, $\tilde y = (0, \ldots, 0, y_1, \ldots, y_d)$ and $\tilde v = (0, \ldots, 0, v_1, \ldots, v_d)$, we have
$$|I_2| \leq d \sup_{1 < i < d+1} J_3(i) + dJ_2 \to 0.$$
Now only $I_3$ remains to be studied. Hölder's inequality yields
$$|\operatorname{Cov}(Z_1, Z_i)| = \left|\int\!\!\int K_n\left(\frac{\mathbf{y}_1 - x}{h_n}\right) K_n\left(\frac{\mathbf{y}_i - y}{h_n}\right) f_{1,i}(\mathbf{y}_1, \mathbf{y}_i)\,\mathrm{d}\mathbf{y}_1\,\mathrm{d}\mathbf{y}_i\right| \leq h_n^{2d} \|K_n\|_q^2\, h_n^{-2d/r} \|f_{1,i}\|_r \leq h_n^{2d} \|K_n\|_\infty^2\, h_n^{-2d/r} \|f_{1,i}\|_r \leq C h_n^{2d(1-1/r)}.$$
The remaining proof can be carried out using an argument similar to that of Theorem 2.1 of Bosq (1996).

Remark 2.1. From the proof, we can also conclude that for any $k_n$,
$$\operatorname{Var}\left(\sum_{i=1}^{k_n} Z_i\right) = O(k_n h_n^d).$$

Lemma 2.3. Under Assumption D, we have
$$h_n^{-2} |E \hat g_n(x; d) - g(x; d)| \to C_1 > 0,$$
where
$$C_1 = \frac{p^{-d}}{2} \int_{\mathbb{R}} v^2 K(v)\,\mathrm{d}v \sum_{1 \leq i,j \leq d} \frac{\partial^2 g}{\partial x_i\, \partial x_j}(x; d).$$

Proof. The proof is similar to that of Hesse (1995, Theorem 1(a)), using Parseval's identity.

Proof of Theorem 2.1. It is clear that
$$E[\hat g_n(x; d) - g(x; d)]^2 = [E \hat g_n(x; d) - g(x; d)]^2 + \operatorname{Var}(\hat g_n(x; d)).$$
Choosing $h_n = c_n n^{-1/(d+4)}$, Lemmas 2.2 and 2.3 complete the proof.

Remark 2.2. From a heuristic point of view, it is not surprising that for the MSE the same rates can be attained as in the non-contaminated ($p = 1$) case, because there is an expected fraction of $np$ non-contaminated sample elements. At a more technical level, this can also be understood because the Fourier transform of the contaminated variable $T_j \varepsilon_j$ is bounded away from 0. This is fulfilled for many reasonable densities according to Assumption D, and hence its reciprocal remains bounded. This implies that the problem is not really ill-posed, so that convergence rates as in the non-contaminated case can be expected.

Remark 2.3. The rate in Theorem 2.1 does not depend on the mixing index $\beta$; it is in fact as good as in the independent case.

3. Uniformly almost sure convergence

Generally, it is rather difficult to verify that a random sequence is strong mixing. But because geometric ergodicity implies a certain type of strong mixing, namely geometric strong mixing (GSM, see below; see Diebolt and Guegan, 1992), the study of GSM processes has received much attention in the literature. Bosq (1996) derived the strong consistency rate for the ordinary kernel density estimator. Corresponding to his theorem, one of the main results of this section is devoted to the consistency properties of the deconvolving kernel estimate.

Definition 3.1. If there exist $c > 0$ and $r \in [0, 1)$ such that
$$\alpha_X(k) \leq c r^k,$$
we say that $\{X_t\}$ is GSM.

To avoid trivial statements, we make the following assumption throughout this section.

Assumption E. Suppose the positive sequence $h_n$ converges to zero in such a way that $nh_n^d / \ln n \to \infty$ as $n \to \infty$.

Theorem 3.1. Let $\{X_t\}$ be a strictly stationary GSM process and assume that Assumptions A–E hold. Then, for each $\lambda_1 > 0$,
$$\sup_{|x| \leq n^{\lambda_1}} |\hat g_n(x; d) - E[\hat g_n(x; d)]| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{nh_n^d}\right)^{1/2}\right) \qquad \text{as } n \to \infty.$$
In addition, if $P(|\mathbf{Y}_1| > y) \leq y^{-\nu}$ for all sufficiently large positive $y$ and some $\nu > 0$, where $|\mathbf{Y}_1| = \max_{1 \leq i \leq d} Y_i$, we have
$$\sup_{x \in \mathbb{R}^d} |\hat g_n(x; d) - E[\hat g_n(x; d)]| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{nh_n^d}\right)^{1/2}\right) \qquad \text{as } n \to \infty.$$

Together with Lemma 2.3, we have

Corollary 3.1. Let $\{X_t\}$ be a strictly stationary GSM process. Assume that Assumptions A–E hold and $h_n = (\ln n/n)^{1/(4+d)}$; then
$$\sup_{|x| \leq n^{\lambda_1}} |\hat g_n(x; d) - g(x; d)| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{n}\right)^{2/(4+d)}\right) \qquad \text{as } n \to \infty.$$

M. Yuan, Jiaqin Chen / Journal of Statistical Planning and Inference 104 (2002) 147–160

155

In addition, if $P(|\mathbf{Y}_1| > y) \leq y^{-\nu}$ for all sufficiently large positive $y$ and some $\nu > 0$, where $|\mathbf{Y}_1| = \max_{1 \leq i \leq d} Y_i$, we have
$$\sup_{x \in \mathbb{R}^d} |\hat g_n(x; d) - g(x; d)| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{n}\right)^{2/(4+d)}\right) \qquad \text{as } n \to \infty.$$

To describe a more general result, we need some notation. Let
$$r(n) = \left(\frac{nh_n^d}{\ln n}\right)^{1/2}, \qquad N_n = \left(\frac{n}{h_n^{3d} \ln n}\right)^{1/2};$$
for some $\delta > 0$,
$$\psi_n(n) = \frac{n}{r(n)} \left(\frac{n}{h_n^d \ln n}\right)^{\delta/[2(2\delta+1)]} \{\alpha_X[r(n) - d + 1]\}^{2\delta/(2\delta+1)}$$
and, for some $\lambda_1 > 0$,
$$\varphi(n) = N_n^{\lambda_1} \psi_n(n).$$
Now, we have

Theorem 3.2. Assume that Assumptions A–E hold. Suppose that $P(|\mathbf{Y}_1| > y) \leq y^{-\nu}$ for all sufficiently large positive $y$ and some $\nu > 0$. If the strong mixing coefficient satisfies
$$\sum_{j=1}^{\infty} \varphi(j) < \infty$$
for some $\delta > 0$ and $\lambda_1 > \max\{1, 2d/\nu\}$, then
$$\sup_{x \in \mathbb{R}^d} |\hat g_n(x; d) - E[\hat g_n(x; d)]| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{nh_n^d}\right)^{1/2}\right) \qquad \text{as } n \to \infty.$$

Similar to Corollary 3.1, we have

Corollary 3.2. Assume that Assumptions A–E hold and $h_n = (\ln n/n)^{1/(4+d)}$. Suppose that for all sufficiently large positive $y$ and some $\nu > 0$, $P(|\mathbf{Y}_1| > y) \leq y^{-\nu}$. If the strong mixing coefficient satisfies
$$\sum_{j=1}^{\infty} \varphi(j) < \infty$$
for some $\delta > 0$ and $\lambda_1 > \max\{1, 2d/\nu\}$, then
$$\sup_{x \in \mathbb{R}^d} |\hat g_n(x; d) - g(x; d)| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{n}\right)^{2/(4+d)}\right) \qquad \text{as } n \to \infty.$$


Remark 3.1. We generalize the result of Hesse (1995) in two directions: the dimension of the estimated density increases from $d = 1$ to $d \geq 1$, and the independence assumption is replaced by the more general condition of strong mixing.

To carry out our proofs, we need the following lemmas. We first formulate a Bernstein-type inequality for strong mixing sequences.

Lemma 3.1. Let $\{X_t\}$ be a zero-mean real-valued process such that $\sup_{1 \leq t \leq n} \|X_t\|_\infty \leq b$. Then for each integer $r \in [1, n/2]$ and each $t, \delta > 0$, we have
$$P(|S_n| > t) \leq 4 \exp\left(-\frac{t^2}{64q\sigma^2(r) + 8ct}\right) + 18\left(\frac{2nb}{t}\right)^{\delta/(2\delta+1)} \frac{n}{r}\, [\alpha(r)]^{2\delta/(2\delta+1)},$$
where
$$\sigma^2(r) = \sup_j \operatorname{Var}\left(\sum_{i=r(j-1)+1}^{rj} X_i\right),$$
$q = n/(2r)$ and $c = \max\{1, rb\}$.

Proof. Let $V_j = \sum_{i=r(j-1)+1}^{rj} X_i$. By Bradley's lemma, we can find an independent sequence $\{V_{2j}^*\}$ such that $V_{2j}^*$ has the same distribution as $V_{2j}$ and
$$P(|V_{2j}^* - V_{2j}| > \tau) \leq 18 (\|V_{2j}\|_\delta/\tau)^{\delta/(2\delta+1)} [\alpha(r)]^{2\delta/(2\delta+1)}.$$
Similarly, we can find an independent sequence $\{V_{2j-1}^*\}$ such that $V_{2j-1}^*$ has the same distribution as $V_{2j-1}$ and
$$P(|V_{2j-1}^* - V_{2j-1}| > \tau) \leq 18 (\|V_{2j-1}\|_\delta/\tau)^{\delta/(2\delta+1)} [\alpha(r)]^{2\delta/(2\delta+1)}.$$
Thus
$$P(|S_n| > t) \leq P\left(\left|\sum_j V_{2j}^*\right| > \frac{t}{4}\right) + P\left(\left|\sum_j V_{2j-1}^*\right| > \frac{t}{4}\right) + \frac{n}{2r} \sup_j P\left(|V_{2j}^* - V_{2j}| > \frac{tr}{2n}\right) + \frac{n}{2r} \sup_j P\left(|V_{2j-1}^* - V_{2j-1}| > \frac{tr}{2n}\right) = I_1 + I_2 + I_3 + I_4.$$
By Bernstein's inequality, we have
$$I_1, I_2 \leq 2 \exp\left(-\frac{t^2}{64q\sigma^2(r) + 8ct}\right).$$
Noting that
$$\|V_{2j}\|_\delta \leq \|V_{2j}\|_\infty \leq rb,$$


we have
$$P\left(|V_{2j}^* - V_{2j}| > \frac{tr}{2n}\right) \leq 18\left(\frac{2nb}{t}\right)^{\delta/(2\delta+1)} [\alpha(r)]^{2\delta/(2\delta+1)}.$$
Thus,
$$I_3 \leq 9\left(\frac{2nb}{t}\right)^{\delta/(2\delta+1)} \frac{n}{r}\, [\alpha(r)]^{2\delta/(2\delta+1)}.$$
Similarly,
$$I_4 \leq 9\left(\frac{2nb}{t}\right)^{\delta/(2\delta+1)} \frac{n}{r}\, [\alpha(r)]^{2\delta/(2\delta+1)},$$
which completes the proof.

Lemma 3.2. Assume that Assumptions A–E hold. If the strong mixing coefficient satisfies $\sum_n \psi_n(n) < \infty$ for some $\delta > 0$, then for each fixed $x \in \mathbb{R}^d$ we have
$$\hat g_n(x; d) - E[\hat g_n(x; d)] =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{nh_n^d}\right)^{1/2}\right) \qquad \text{as } n \to \infty.$$

Proof. To make use of Lemma 3.1 with $\{Z_i\}$ (where $\{Z_i\}$ is defined in the proof of Lemma 2.2), we first evaluate $\sigma^2(r(n))$. By Remark 2.1, we have
$$\sigma^2(r(n)) \leq C r(n) h_n^d.$$
Because there exists some $b > 0$ such that $|Z_i| \leq b < \infty$, Lemma 3.1 gives
$$P(|\hat g_n(x; d) - E[\hat g_n(x; d)]| > B/r(n)) = P\left(\left|\sum_{i=1}^{n} Z_i\right| > nBh_n^d/r(n)\right) \leq 4\exp(-CB^2 \ln n) + C\, \frac{n}{r(n)} \left(\frac{n}{h_n^d \ln n}\right)^{\delta/[2(2\delta+1)]} \{\alpha_Y[r(n)]\}^{2\delta/(2\delta+1)}.$$
By Lemma 2.1, we then have
$$P(|\hat g_n(x; d) - E[\hat g_n(x; d)]| > B/r(n)) \leq C(n^{-CB^2} + \psi_n(n)).$$
Choosing $B$ sufficiently large that $CB^2 > 1$, the proof is completed by the Borel–Cantelli lemma.

Lemma 3.3. Assume that Assumptions A–E hold. If the strong mixing coefficient satisfies $\sum_n \varphi(n) < \infty$

for some $\delta > 0$ and $\lambda_1 > 0$, then
$$\sup_{|x| \leq n^{\lambda_1}} |\hat g_n(x; d) - E[\hat g_n(x; d)]| =_{\mathrm{a.s.}} O\left(\left(\frac{\ln n}{nh_n^d}\right)^{1/2}\right) \qquad \text{as } n \to \infty.$$

Proof. Let $I_i$ be the cube with lower-left corner at $\delta_n i$ and upper-right corner at $\delta_n(i+1)$, where $\delta_n = ((h_n^{3d} \ln n)/n)^{1/2d}$. Repeating the argument of Hesse (1995, pp. 255–256) in a multivariate version, with Lemma 3.1 in place of Bernstein's inequality, completes the proof.

Proof of Theorem 3.2. By Lemma 3.3, it suffices to show that for any $\lambda_1 > \max\{1/d, 2/\nu\}$,
$$\limsup_{n \to \infty}\, r(n) \sup_{|x| \geq 2n^{\lambda_1}} |\hat g_n(x; d) - E[\hat g_n(x; d)]| = 0 \qquad \text{a.s.}$$
This can be completed by an argument similar to that of Hesse (1995, pp. 254–255).

4. Asymptotic normality

The purpose of this section is to establish the asymptotic normality of the kernel-type multivariate density estimator $\hat g_n(x; d)$ based on the noisy observations $\{Y_j\}$. Asymptotic distributions are obviously useful for constructing confidence intervals.

Theorem 4.1. Assume that Assumptions A–D hold and the bandwidth $\{h_n\}$ is a sequence of positive numbers such that
$$h_n \to 0 \qquad \text{and} \qquad \frac{h_n^d}{n^{-1/3} \ln n} \to \infty.$$
Then for every $x$ such that $f(x; d) > 0$, we have
$$(nh_n^d)^{1/2} [\hat g_n(x; d) - E \hat g_n(x; d)] \to_d N(0, \sigma^2),$$
where
$$\sigma^2 = \left(\frac{1}{2\pi p^2} \int_{-1}^{1} |k^*(t)|^2\,\mathrm{d}t\right)^d f(x; d).$$

Proof. In the light of Theorem 2.1, the proof can be completed in the same manner as that of Masry (1993b, pp. 62–63) by choosing
$$r_n = \left(\frac{nh_n^d}{\ln n}\right)^{1/2}, \qquad s_n = \left(\frac{n \ln n}{h_n^d}\right)^{1/4}.$$


Acknowledgements

We thank the referees for their insightful comments, which helped us greatly improve the presentation.

Appendix A

Lemma A.1 (Billingsley, 1995). Let $X$ and $Y$ be two real-valued random variables such that $X, Y \in L^\infty(P)$; then
$$|\operatorname{Cov}(X, Y)| \leq 4\|X\|_\infty \|Y\|_\infty\, \alpha(\sigma(X), \sigma(Y)).$$

Lemma A.2 (Bosq, 1996). If $(X, Y)$ has an absolutely continuous distribution with respect to Lebesgue measure on $\mathbb{R}^{2d}$, with $g = f_{(X,Y)} - f_X \otimes f_Y$, then
$$\alpha(\sigma(X), \sigma(Y)) \leq \tfrac{1}{2}\|g\|_1.$$
If in addition $g$ satisfies the Lipschitz condition
$$|g(x', y') - g(x, y)| \leq l(\|x' - x\|^2 + \|y' - y\|^2)^{1/2}, \qquad x, y, x', y' \in \mathbb{R}^d,$$
for some constant $l$, then there exists a constant $c(d, l)$ such that
$$\|g\|_\infty \leq c(d, l)\, \alpha^{1/(2d+1)}.$$

Lemma A.3 (Bradley, 1983). Let $(X, Y)$ be an $\mathbb{R}^d \times \mathbb{R}$-valued random vector such that $Y \in L^p(P)$ for some $p \in [1, +\infty]$. Then for any $\tau \in (0, \|Y\|_p]$, there exists a random variable $Y^*$ such that
(i) $P_{Y^*} = P_Y$ and $Y^*$ is independent of $X$;
(ii) $P(|Y^* - Y| > \tau) \leq 18 (\|Y\|_p/\tau)^{p/(2p+1)} [\alpha(\sigma(X), \sigma(Y))]^{2p/(2p+1)}$.

Lemma A.4 (Volkonskii and Rozanov, 1959). Let $V_1, \ldots, V_L$ be random variables measurable with respect to the $\sigma$-algebras $\mathcal{F}_{i_1}^{j_1}, \ldots, \mathcal{F}_{i_L}^{j_L}$, respectively, with $1 \leq i_1 < j_1 < i_2 < \cdots < j_L \leq n$, $i_{l+1} - j_l \geq w \geq 1$ and $|V_j| \leq 1$ for $j = 1, \ldots, L$; then
$$\left|E\left[\prod_{j=1}^{L} V_j\right] - \prod_{j=1}^{L} E[V_j]\right| \leq 16(L-1)\alpha(w).$$

References

Billingsley, P., 1995. Probability and Measure, 3rd Edition. Wiley, New York.
Bosq, D., 1996. Nonparametric Statistics for Stochastic Processes. Springer, Berlin.
Bradley, R.C., 1983. Approximation theorems for strongly mixing random variables. Michigan Math. J. 30, 69–81.
Carroll, R.J., Hall, P., 1988. Optimal rates of convergence for deconvolving a density. J. Amer. Statist. Assoc. 83, 1184–1186.


Diebolt, J., Guegan, D., 1992. Probabilistic properties of the general nonlinear autoregressive process of order one. Technical Report No. 125, University Paris VI.
Fan, J., 1991a. On the optimal rates of convergence for non-parametric deconvolution problems. Ann. Statist. 19, 1257–1272.
Fan, J., 1991b. Global behavior of deconvolution kernel estimates. Statist. Sinica 1, 541–551.
Hesse, C., 1995. Deconvolving a density from partially contaminated observations. J. Multivariate Anal. 55, 246–260.
Liu, M.C., Taylor, R.L., 1989. A consistent nonparametric density estimator for the deconvolution problem. Canad. J. Statist. 17 (4), 427–438.
Masry, E., 1993a. Strong consistency and rates for deconvolution for stationary random processes. Stochastic Process. Appl. 47, 53–74.
Masry, E., 1993b. Asymptotic normality for deconvolution estimators of multivariate densities of stationary processes. J. Multivariate Anal. 44, 57–68.
Stefanski, L., 1990. Rates of convergence of some estimators in a class of deconvolution problems. Statist. Probab. Lett. 9, 229–235.
Stefanski, L., Carroll, R.J., 1990. Deconvoluting kernel density estimators. Statistics 21, 169–184.
Volkonskii, V.A., Rozanov, Yu.A., 1959. Some limit theorems for random functions. Theory Probab. Appl. 4, 178–197.
Zhang, C.H., 1990. Fourier methods for estimating mixing densities and distributions. Ann. Statist. 18 (2), 806–831.