b-, .; ?,~
STATISTICS & PROBABILITY LETTERS
; ,. 2 :
ELSEVIER
Statistics & Probability Letters 33 (1997) 1-13
Frequency polygons for weakly dependent processes Michel Carbon a, Bernard Garel b, Lanh Tat Tran c'* a Laboratoire de Probabilites et Statistique, Batiment M2, 59655 Villeneuve d'Ascq Cedex, France b E N S E E I H T Laboratoire d'Electronique, 2 Rue Camichel, 31071 Toulouse Ckdex, France c Department of Mathematics, Indiana University, Bloomington, IN 47405, USA
Received November 1994; revised February 1996
Abstract The purpose of this paper is to investigate the frequency polygon as a density estimator for stationary strong mixing processes. Optimal bin widths which asymptotically minimize integrated mean square errors (IMSE) are derived. Under weak conditions, frequency polygons achieve the same rate of convergence to zero of the IMSE as kernel estimators. They can also attain the optimal uniform rate of convergence (n - l logn) 1/3 under general conditions. Frequency polygons thus appear to be very good density estimators with respect to both criteria of IMSE and uniform convergence. A M S subject classifications: Primary 62G05; secondary 60J25, 62J02, 62M05, 62M09, 62H12 Keywords: Density estimation; Mixing process; Bin width; Frequency polygons
1. Introduction The literature dealing with density estimation for independent random variables is extensive. The reader is referred to Izenman (1991) for a review. The purpose of this paper is to investigate the frequency polygon as a density estimator for stationary weakly dependent stochastic processes satisfying the strong mixing condition defined later on. The frequency polygon is constructed by connecting with straight lines the mid-bin values of a histogram. The computational effort in constructing the frequency polygon is about equivalent to the histogram. Under weak dependence conditions, frequency polygons are shown to achieve the rate of convergence to zero of order n -4/5 with respect to the criterion of IMSE. This is the same rate of convergence to zero of the IMSE of kernel estimators. Histograms can only achieve the slower rate of convergence of the IMSE of order n -2/3. It is also established that frequency polygons attain the uniform rate of convergence (n -I logn) 1/3 under appropriate smoothness conditions. This is the optimal rate of convergence for nonparametric estimators of a density function in the i.i.d, case (see Stone, 1983). Frequency polygons thus appear to be very good * Corresponding author. Supported in part by NSF grant DMS-9403718. 0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved PH S01 67-71 52(96)00 104-6
2
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1 - 1 3
density estimators with respect to both criteria of IMSE and uniform convergence. For background material on frequency polygons, see Scott (1985). Consider a real-valued strictly stationary time series Xt, t = 0,÷1 .... defined on a probability space ( f 2 , ~ , ~ ) . Assume X1 has a uniformly continuous density f on the real line and Xt satisfies the strong mixing condition defined below. Definition 1.1. Denote the a-fields generated by Xt, t<~O and Xt, t>>-n, respectively, by o~°_o~ and ~+oo. Then Xt is strong mixing (or s-mixing) if c~(n) = sup{lP(A n B) - P(A)P(B)[ :A ~ o~°o~, B ~ ~ + ~ } I 0
as n ---* c~.
(1.1)
Recently, density estimation for weakly dependent processes, in particular strong mixing processes, has generated a considerable amount of interest. See for example, Robinson (1983, 1987), Masry (1986), Ioannides and Roussas (1987), Masry and Gyrrfi (1987), Roussas (1988), Gy6rfi et al. (1989), Hart and Vieu (1990) and Tran (1994). The reason is partly due to the fact that many stochastic processes, among them various useful time series models, satisfy the strong mixing property, and the strong mixing property is relatively easy to check. Gorodetskii (1977), Withers (1981), and Athreya and Pantula (1986) have obtained conditions for linear processes and autoregressive processes to be strong mixing. For information on strong mixing processes, see Rosenblatt (1956), Ibragimov (1962), Bradley (1986), Ioannides and Roussas (1987), and Roussas (1988). Our paper is organized as follows: Section 2 provides some preliminaries and background material. The optimal choice of the bin width which asymptotically minimizes the integrated mean square error is derived in Section 3. Theorem 3.1 generalizes results of Scott (1985). In Section 4, the asymptotic variance of the frequency polygon f is obtained. Weak conditions for the uniform convergence o f f on R are obtained in Section 5. Finally, sharp rates of uniform convergence are established in Section 6. The letter C will be used to denote positive constants whose values are unimportant and may vary. All limits are taken as n ~ oo unless indicated otherwise.
2. Preliminaries Consider a partition ... < x_ 2 < x - i < xo < Xl < x2 < " ' " of the real line into equal intervals Ik --- [ ( k - 1)b, kb) of length b = b,, where bn is the bin width. Without loss of generality, we assume that there is a mesh node at zero. Consider the two adjacent histogram bins I0 = [-b, 0) and I1 = [0, b). Denote the number of observations falling in these intervals respectively by v0 and vl. The values of the histogram in these previous bins are given by f 0 = von-lb -1 and f l = vln-lb -1. The frequency polygon f ( x ) is given by f(x) =
-
f0 +
+
fl
for - ~ < x
We assume that b tends to zero as n --, ec. Define Y~,k(x)=
1 if X,- E Ik(x), 0 otherwise.
Then, v0=~Yi,0(x) i=1
and
Vl=~Y/,l(x). i=1
< ~.
(2.1)
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1-13
3
Let U -~ u(Xi) and V = v(Xj), where u and v be are real-valued measurable functions. L e m m a 2.1. Suppose that
lul
<. c1 and
Ivl
~ c 2 where C1 and C2 are constants. Then
E IUV - EUEVI <~4CIC2~(1j - il). L e m m a 2.2. Suppose that IlUllr < e~ and ]]VI]s < oo where IIUllr = (EIUIr) l/r. I f r -1 + s -1 + h - l = 1,
then
IEUV
-
EUEVq <<.1011UII~II VIIs{~(IJ - il)} 1lb.
One or both o f r and s can be taken to be cx~ for bounded random variables. For the proof of the Davydov inequality in Lemma, see Davydov (1970), Deo (1973) or Hall and Heyde (1980). Denote qi.k(x)= Yi,k ( X ) - EYi, k(x). Corollary 2.1. For each integer k, there exists some ~ EIk such that (i) Icov(r/i,k(x), qi+j,k(x)l <~lO{(x(j)}l/2(f(~k))l/2bl/2 , (ii) ]COV(qi,k_l(X),qi+j,k(X)l <~lO{~(j)}l/Z(f(~k))l/Zbl/2. Proof. (i) Taking r = 2, s = cx~ and h = 2, Lemma 2.2 leads to the following result: if EIU] 2 < +cx~ and
P(IVI > 1 ) = 0, then
IEUV - EUEVI <~1011fll2{~(Ij - il)} 1/2. Taking U = Yi,k(x) and V = Yi+j,k(x), then []UIt 2 = (EY/,k) 1/2 ----(P[X1 E I k ] ) l/z = ( f ( ~ k ) b ) 1/2, where ~k EIk. The proof of (i) thus follows. (ii) The proof can be handled in the same way. Note that ~k is independent o f i and j.
El
Denote the conditional density o f Xj given X~ by fjli for simplicity.
Assumption I. For all i < j and some constant MI,
sup
f jli(ylx) <~M1.
(x,y)ER×R
Example. Let Xt be a stationary autoregressive process o f order 1, for example, Xt = OXt-1 + el where t01 < 1. Asssume the et's are i.i.d, random variables and each et has a standard Cauchy density. Then
x ; = 0 ./-~x~ + z, where Z is a Cauchy r.v. independent of X/ (see Example 2.1 in Tran, 1989) with characteristic function exp(-lul(1 - OJ-i)~(1 - 0)). The conditional density o f Xj given X,. is equal to
fJli(XjlXi)) = f z(xj
-
oJ-ixi).
4
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1-13
A Cauchy density symmetric about zero takes on its maximum value at zero. Thus, we can take 1 M, -
~(1 - 1 0 1 )
Lemma 2.3. I f Assumption 1 is satisfied, then
ff
ZkXIk
{fio(x,Y) - f ( x ) f ( y ) l
(2.2)
dxdy<<.Mf(~k)b 2 with ~k C Ik.
Proof. Since f is uniformly continuous and integrable, supf(x)
xER
- II fl[ < c¢.
By Assumption 1,
flk×l, Ifio(x' y) - f ( x ) f ( y ) l
<.ff,,**, f ( x ) I f
dxdy
jli(ylx) - f ( y ) l dxdy
<~Mb/1 k f ( x ) dx, where M can be taken to be max{M1, II fll}. The lemma follows by the mean-value theorem.
[]
3. Integrated mean squared error and optimal bin width
For convenience, we define the roughness of the kth derivative of f by 0(3
Rk(f) =
[f(k)(x)]2 dx O0
and
f Pk = Jl~ f(x)dx. As usual, the IMSE is defined as the sum of two terms: the integrated pointwise squared bias, and the integrated pointwise variance. Define 1 Z cov(rh, o, qj, o), qln - n2b2 igj
1 q2n - n2bZ Z cov(qi, bqj,1), igj 1 q3n -- n2b2 E c°v(r/i,°' t/j, 1). i4j
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1-13 Lemma 3.1. The variance of the frequency polygon f ( x ) defined in (2.1) is given by
varf(x)=(1-b)2[n-~po(1-po)+qlnl
+(1+
b) 2 [~_~pl(1 -
(3.1)
p,)+q2nl +2(~
x2
~
q3n •
Proof. From the expression of the frequency polygon (2.1), varf(x) = ( ~ - - ~ )x2 v a r f o + ( ~ + b ) 2 v a r f l + 2 ( 1 - X b - - ~ )
cov( fo, f l ).
Clearly, varfo = n-T~b2var
i=1
Y/,o
with var
(/__~1) Y~',o
= Zn var ( Y/,o) + Z c o v ( Y i , o , Yj,o) i=0 i4j cov(qi,o, ?]j,o)-
= n Po ( 1 - Po ) + Z
i4j Similarly, varfl =
1
n2b------svar\ i=l
Yi,l
/
= npl(1 -- Pl) + ~.¢ cov(~]i,1,~]j,l). i4j
We get also
1 cov(fo, f l ) = n2b------2cov
i=1
Yi,o,
j=l
Yj,1
1 - n2b2
ZcovIz,o, r,,l/+q3. i=1
But,
cov(Yi,o, Yi,1) = E(Y/,oY/,1) - g(Y/,o)E(Yi, l) Summing up, we get 3.1
= -pOPl.
[]
Define wk = max ((f(~k))l/2, M f ( ( k ) ) .
(3.2)
Lemma 3.2. Assume that ~(n) = O(n -?) for some p > 2. Let 0 < e < (p - 3 ) ( p - 2) -1 and 5" = min {e; 2-1[(1 - e)(p - 2) - 1]}. Then qln + qzn + q3n <~C n - l b - l + d w o •
6
M. Carbonet al. / Statistics& ProbabilityLetters 33 (1997) 1-13
Proof. By Lemma 2.3 and Corollary 2.1
Icov(r/1,0, t/l+j,0)] ~ min{lO(cffj))l/2(f(~k))l/2bl/2, Mf((o)b2}. Thus, we have the following majorizations: Z cov(qi,0,
~
iCj
Icov(~l,0, ~l+j,0)l j=l cx2
<~lOnbl/2max ((f(¢O)l/2)'Mf(~O))
Z
min ((ot(j))1/2, b3/2) .
j=l
Let K = [b-l+~]. Then Z cov(t/i,0, r/j,o) ~
Thus, ql~ and similarly qz, are bounded by Cn-lb-l+:wo. Using Corollary 2.1 (if) and a slight variation of (2.2) it can be shown that q3n is bounded by the same quantity. [] Assumption 2. f is twice continuously differentiable; f " is absolutely continuous with respect to the Lebesgue measure on R; fl/2 E L l, f , f-1/2 E L l, f(k) E L 2 for k = 0, 1, 2, 3. The following lemma (see Scott, 1985, p. 350) will be needed in the sequel. Lemma 3.3. Suppose that q9 is absolutely continuous on (-c~, c¢) with almost everywhere derivative qt and that q~, qY ELl. Let ck be an arbitrary point in bin I~. Then the followin9 sum exists and may be approximated by an inteoral
qg(ck)b =
~o(x) ~ + O(bll~o'lll).
Lemma 3.4. Suppose Assumptions 1 and 2 are satisfied, and suppose that cffn) = O(n -p) for some p > 3. Let e* be as defined in Lemma 3.2 Then
f+_ varf(x) dx - 2 1
*
,
<. -Ro(f) + O(n-lb ~ [ll f II1 + II(fl/2)'lll + b l - : [I fll2ll n
f'll2]
+ n -Ib-l+: [ll fl/2lll + II fill]).
(3.3) Proof. Define mk = f ( ~ k ) . By integration, we obtain from 3.1 the following result:
+b/2varf(x) d x = b (mo b/2 3 ~ =
m~ + q l n ) + b (ml n 3 ~
(mo+ml)-
By Lemma 3.2,
(b/3)(qln + qzn + q3n)<~cn-lb: wo.
m2 + q 2 n ) + b ( room1 n 3 - - - -n + q3n)
(m~+m~+moml)+
5(ql.+q2.+q3.).
M. Carbon et al. / Statistics & Probability Letters 33 (1997) 1 - 1 3
7
Now, sum over all bins. Observe that
Zmkb= k
Z
Pk =
k
f(x) dx= 1. oo
With the help of Lemma with q~ replaced by f2, we get
Z m~b = Z f2(~k)b = Ro(f) + k
O(bll(f2)'lll)
= Ro(f) + O(bll fll2ll
f'tl2).
k
Using the definition of wk in (3.2) and Lemma 3.3,
Z wkb<~ It fmlll + [[Mfl[l + O (b (11/'ill +
II(fl/2)'lll))
k
and 3.3 follows.
[]
Lemma 3.5. If Assumptions 1 and 2 hold, then f_~
49 4R2(f)+ 0 (bSII f"ll21l f'"l12) • bias(x) 2 dx = 2-~-~b
(3.4)
Proof. Let Jk = [((k - 1/2b),((k + 1/2)b) be the kth frequency polygon bin. I f x is a fixed point in J0, then
bias(x)=~bZ(~-b)f"(¢o)+~b2(l+b)f"(~l)-~xf
12
n
(~x).
where G0, ~l, ~x are points in J0. Lemma 3.5 follows by summing up the integrals of the square of the biases over each bin as done in Scott (1985). [] Theorem 3.1. If Assumptions 1 and 2 hold and ~(n) = O(n-P) with p > 3, the value of the bin width that
asymptotically minimizes the IMSE of the frequency polygon is b = 2[15/(49R2(f))]l/Sn -1/5. Proof. The IMSE is found by combining the results of Lemmas 3.4 and 3.5. The leading terms of the IMSE are found to be
3nb +
b4Rz(f)"
It is thus sufficient to minimize this function with respect to b.
[]
It follows from Theorem 3.1 that frequency polygons can achieve the rate of convergence to 0 of order with respect to the criterion of integrated mean square error. This is the same rate of convergence to zero of the IMSE of non-negative kernel estimators. n -4/5
Remark 3.1. It is possible to obtain Theorem 3.1 without the assumption of the third derivative of f in Assumption 2. Let I~ = [(k - 1)b, kb] and Jk = [(k - 0.5)b,(k + 0.5)b], k = 0,4-1,4-2 . . . . . Let 2 be the probability measure associated with f . For any x E Jk,
E[f(X) - f ( X ) ] = (k + 0.5 - x/b)2(Ik)/b + ( - k + 0.5 + x/b)2(Ik+l )/b - f(x) = Tlk(X) + T2k(x) + T3k(x),
8
M. Carbon et al. / Statistics & Probability Letters 33 (1997) 1 - 1 3
where we denote
Tlk(x) = (k + 0.5 -x/b){A(Ik)/b - f ( ( k - 0.5)b)}, T2k(x) = ( - k + 0.5 + x/b){2(Ik+l )/b - f ( ( k + .5)b)}, T3~(x) = (k + 0.5 - x/b)f((k - 0.5)b) + ( - k + 0.5 + x/b)f((k + 0.5)b) - f(x). Note that
r /o 1[f"(tu + (1 -
[Tlk(x)]2dx<~ Cb4 .
t)(k - 0.5)b)]2dudt.
,l Ik
Define
t2(f",u,e) = sup{I f " ( u + h )l : lhl <~e}. If we assume that either lira [ ~ t22(f",u,e) du < c~ ~-~oj _ ~ or
sup O
//
O2(ft,, u, 5) du < c~.
c~
Then it follows that ~ - ~ < k < o ~ fjk[Tlk(x)] 2 dx < c¢ by noting that
Z
fl, fo l [ f ' ' ( t u + ( 1 - t ) ( k - O ' 5 ) b ) ] 2 d u d t < c x ~ "
--~
k
This argument takes care of the term Tlk(X) and T2k(x). Similar arguments can be used to handle T3k(x). 4. Asymptotic variance o f f Assumption 2". There exists a constant C > 0 such that
If(x) - f(x')l ~
~t(n) = O(n-P) for some p > 2, then nbvarf(x)-
+7
f(x) ~ O.
Proof. Without loss of generality, take x E [-b/2,b/2). By 3.1
nbvarf(x):(1-b)2[p°(lbP°)+nbq,n] +(l+b)2[pl(l-pl)+nbq2n]b +2 (41-- x~)[L___...~p0p,+nbq3n] .
(4.1)
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1-13
By Lemma 3.2, X
nb (~
-
2
2
~)qln+(~
-
+ b ) q 2 n + 2 ( 1 -- x~2) q3n <~Cb~*wo.
(4.2)
Since P0 = fo b f(u) du, Assumption 2 implies that max{0, f(x)b
-
Cb2} ~ Po <<.f(x)b +
C b 2.
Thus, max{0, f(x) - (C + fZ(x))b + C2b3} ~
<~nbvarf(x)<~ ( ~ + ~---~-~)f ( x ) + B
with 4x2
2
]
(1
4X2~
and B = [C (2 + ~-~) - --~-f4x2 z(x)J] b - ( 1 - --~--,/4x2"~Cf(x)b 2 + C2b3 + Cbe*cOo. The lemma follows easily, because A and B tend to 0 as n ---+co.
[]
5. Uniform consistency under a strong mixing condition Assumption Lemma
3. Suppose EIXI I(2+~)/T < +o<~ for some positive e and T.
5.1. Let An = (-co, -n T) U (n T, +oo). Suppose Assumption 3 holds. Then
P [ l i m supf(x)=01 =0. Ln---m° xEA,,
The proof of Lemma 5.1 is a consequence of Lemma 3.2 in Tran (1994). Express the frequency polygon as 1
n
/(x) = ;z E i=1
where for -b/2 <.x < b/2,
(4.3)
10
M. Carbon et al. / Statistics & Probability Letters 33 (1997) 1 - 1 3
Lemma 5.2. Suppose Assumption 1 is satisfied and ~(n) = O(n -p) for some p > 2. Then (i) nbn SUPx6R ~--]~in__lvar(Ai(x))<~ C, (ii) nbn SUPx~R ~ i 4 j Ic°v( Ai(x ), Aj(x ) )l <<-C. Proof. Lemma 5.2 follows by a careful analysis of the proof of Lemma 3.2.
[]
Lemma 5.3. Suppose ~(n) = O(n -p) for some p > 2 and suppose Assumptions 1 and 2* hold. (i) Assume b ~ 0 in such a manner that nb/log n ~ ~ as n ~ c~. Suppose for all C > 0 tp(n, 1)g(n) --~ O,
(5.1)
for some 0 < z < 1/2 and some positive function g(n) with ~-]n~=l(9(n)) -1 < c~, where tp(n, 1) = b -2-T log n{off[Cnb/log n])}2T.
(5.2)
P [lim sup f ( x ) -
(5.3)
Then Ln--+~ xED
f ( x ) = 0 ] = 1, J
where D is a compact subset of R. (ii) Suppose in addition that Assumption 3 is satisfied and all conditions in (i) hoM except with (5.1) strenothened to
(5.4)
¢p(n,2)g(n) ~ O, where ~o(n,2) = nT~p(n, 1). Then P ["li~m~sup f ( x ) -
f ( x ) =01 =
(5.5)
Lemma 5.3 is essentially the same as Lemma 3.3 in Tran (1994). Its proof can be obtained by using Lemma 5.2 and an approximation of strong mixing random variables by independent ones to obtain sharp exponential bounds. See Tran (1994) for more details. Theorem 5.1. (i) Suppose ~(n) = O(n -p) for some p > 2 and Assumptions 1 and 2* hold. Suppose b ~ 0 in such a manner that
q(n, 1)
= b -2-~-2~p
nl+2Zp(log n) 2+2to (log log n) TM --~ 0
(5.6)
for some 0 < z < 1/2 and some e > O. Then we have
[nmSu
:01
:,
where D is a compact subset of R. (ii) Suppose in addition that Assumption 3 holds and q(n, 1)n r ~ 0. Then we obtain Prfirnsuplf(x)-f(x)l=O Ln
xER
] = 1. J
Proof. Note that { off[Cnb/log n])}2~ ~
(5.7)
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1 - 1 3
11
A simple computation shows that (5.6) implies (5.1) for 9(n) = nlogn(loglogn) TM. Also, (5.6) implies that nb/logn ---+ cx~. Observe that ~-']~n~l(9(n)) -1 < cxz for this choice of 9(n). The theorem then follows from Lemma 5.3. [] Example 5.1. Take b to be the optimal bin width obtained in Section 3. Then (5.6) is satisfied if for some 0 < z<<.p-I, - 1 / 5 [ - 2 - z - 2zp] - 2zp + 1 < 0.
(5.8)
If we choose z = p-1 with p > 2, then (5.8) is satisfied. Theorem 5.2. (i) Suppose ~(n) = O(e-sn) for some s > 0 and Assumptions 1 and 2* hold. I f nb/(logn) 2 --+ cx~, then , ' ,,
-
:o I - - , .
where D is a compact subset o f R. (ii) Suppose Assumption 3 is also satisfied Then we have P [lim sup I f ( x ) - f ( x ) I = 01 = 1. Ln---*~ xCR J
(5.9)
Proof. (i) Choose a positive function q~(n) Y c~ such that nb/(tp(n)(logn) 2) ~ c~. Then {ct([Cnb/log n] )}2¢ ~< exp(-Cnb/log n) ~< exp(-C~p(n) log n) = n -q°(n). Thus, (5.1) is clearly satisfied for 9(n) = n -2 and for arbitrary values of z. The proof follows from Part (i) and Lemma 5.3 (ii). []
6. Rate of convergence under a strong mixing condition Assumption 3". Suppose Assumption 3 holds and in addition [xil/2rf(x)--~ 0 as x--* 0. Lemma 6.1. I f Assumption 3* holds, then sup f ( x ) = o((nb/log n)-1/2). xEA~
Proof. If x E A., then (nb/log n)l/Z f ( x ) <.nl/2 f ( x ) <~ix Il/2r f ( x ) , which tends to zero by Assumption 3*. Denote
[]
7t(n) = max{b; (nb(log n) -1 )-1/2}. Lemma 6.2. Suppose ct(n) = O(n -p) for some p > 2 and suppose Assumptions 1 and 2* hold. (i) Assume b ~ 0 in such a manner that nb/logn ~ cx~ as n ~ oe. Suppose for all C > 0 ~p(n, 3)9(n) ~ O,
(6.1)
12
M. Carbon et al. I Statistics & Probability Letters 33 (1997) 1-13
for some 0 < "c < 1/2 and some positive function g(n) with ~n~=l (g(n)) -1 < ~ , where (p(n, 3) ~ b -(3+~)/2 n (1+O/2 log n 0-z)/2 {~([Cnb/log n])} 2r.
Then sup I f ( x ) - f ( x ) [ = O ( ~ ( n ) )
a.s.
(6.2)
xED
where D is a compact subset of R. (ii) Suppose in addition that Assumption 3* is satisfied and all conditions in (i) hold except with (6.1) strengthened to q~(n, 4)g(n) - - O,
(6.3)
where ~o(n,4) - nTq~(n,3). Then sup I f ( x ) - f ( x ) [ = O(~U(n))
a.s.
(6.4)
xER
Using Lemma, 5.2 the proof of L e m m a 6.2 can be obtained by a slight variation of the argument of Lemma 6.2 in Tran (1994). Theorem 6.1. (i) Suppose ~(n) = O(n -p) for some p > 2. Suppose Assumptions 1 and 2* hold. Suppose b ~ 0 in such a manner that q(n,2) equals (6.5)
b -(~+3+2pr)/2 n (~+3-2p~)/2 ( l o g n ) (3-~+2pT)/2 ( l o g l o g n ) TM
for some 0 < ~ < 1/2 and some e > O. Then sup Lf(x) - f ( x ) ] = O ( ~ ( n ) )
a.s.
xED
where D is a compact subset of R. (ii) Suppose in addition that nTq(n,2) --* 0 and Assumption 3* holds. Then sup If(x) - f ( x ) l = O ( ~ ( n ) )
a.s.
xcR
Proof. The theorem follows from L e m m a 6.2 by choosing 9(n) as in Theorem 5.1.
[]
E x a m p l e 6.1. (i) Take b = Cn -1/5 where b is the optimal bin width derived in Section 3. Then (6.5) is satisfied if for some 0 < ~ < 1/2, (3z + 9 - 6zp)/5 < 0.
(6.6)
Since the left-hand side of (6.6) is a continuous function of z, (6.6) holds for some 0 < ~ < 1/2 if it holds for • = 1/2. After simplification, we obtain p > 21/6. (ii) Take b = (n -1 logn) 1/3. Then ~U(n) = (n -1 l o g n f f 3, which is the optimal rate of convergence in the i.i.d, case. This rate can be achieved if p > 7. R e m a r k 6.1. Under the assumption that the second derivative o f f is bounded, the supremum norm can achieve the optimal rate (n -1 logn) 2/5. This can be done by a slight variation of the argument outlined in Tran (1994).
M. Carbon et al. / Statistics & Probability Letters 33 (1997) 1-13
13
Acknowledgements The authors would like to thank the Associate Editor and the referee for many valuable comments and suggestions. References Athreya, K.B. and S.G. Pantula (1986), Mixing properties of Harris chains and autoregressive processes, J. Appl. Probab. 23, 880-892. Bradley, R.C. (1986), Basic properties of strong mixing conditions, in: Dependence in Probability and Statistics, Vol. 11 (Birkh~iuser, Basal, pp 165-192. Davydov, Yu A. (1970), The invariance principle for stationary processes, Theory. Probab. Appl. 14, 487-498. Deo, C.M. (1973), A note on empirical processes of strong mixing sequences, Ann. Probab. 1, 870-875. Gorodetskii, V.V. (1977), On the strong mixing property for linear sequences, Theory Probab. Appl. 22, 411-413. Gyrrfi, L., W. H~irdle, P. Sarda and P. Vieu (1989), Nonparametric Curve Estimation From Time Series, Lecture Notes in Statistics, Vol. 60 (Springer, New York). Hall, P. and C.C. Heyde (1980), Martingale Limit Theory and its Applications (Academic Press, New York). Hart, J.D. and P.Vieu (1990), Data driven bandwidth choice for density estimation based on dependent data, Ann. Statist. lg, 873-890. lbragimov, I.A. (1962), Some limit theorems for stationary sequences, Theory Probab. Appl. 4, 347-382. Ioannides, D. and G.G. Roussas (1987), Note on uniform convergence for mixing random variables, Statist. Probab. Letr 5, 279-285. Izenman, A.J. (1991), Recent developments in nonparametric density estimation, J. Amer. Statistic. Assoc. 86, 205-224. Masry, E. (1986), Recursive probability density estimation for weakly dependent stationary processes, IEEE Trans. Inform. Theory IT-IS, 254-267. Masry, E. and L. Gyrrfi (1987), Strong consistency and rates for recursive probability density estimators of stationary processes, J. Multivariate Anal. 22, 79-93. Robinson, P.M. (1983), Nonparametric estimators for time series, J. Time Series Anal 4, 185-207. Robinson, P.M. (1987), Time series residuals with application to probability density estimation, Z Time Series Anal. 8, 329-344. Rosenblatt, M. (1956), A central limit theorem and a strong mixing condition, Proc. Nat. Acad. Sci. USA 42, 43-47. Roussas, G.G. (1988), Nonparametric estimation in mixing sequences of random variables, J. Statist. Plann. Inference 18, 135-149. Scott, D.W. (1985), Frequency polygons, theory and application, J. Amer. Star Assoc. g0, 348-354. Stone, C.J. (1983), Optimal uniform rate of convergence for nonparametric estimators of a density function and its derivatives, in: M.H. Rezvi, J.S. Rustagi, and D. Siegmund, eds., Recent advances in Statistics: Papers in Honor of H. Chernoff, pp. 393-406. Tran, L.T. (1989), Recursive density estimation under dependence, IEEE Trans. Inform. Theory IT-35, 1103-1108. Tran, L.T. (1994), Density estimation for time series by histograms, J. Statist. Plann. Inference 40, 61-79. Withers, C.S. (1981), Central limit theorems for dependent variables I, Z. Wahrsch. Verw. Gebiete 57, 509-534.