Journal of Statistical Planning and Inference 52 (1996) 143-159
ELSEVIER
Adaptive smoothing for a penalized NPMLE of a non-increasing density
Jiayang Sun¹, Michael Woodroofe*
Statistics Department, The University of Michigan, Ann Arbor, MI 48109, USA
Received 12 July 1993; revised 17 June 1994
Abstract
A penalized version of the well-known non-parametric maximum likelihood estimator of a non-increasing density $f$ has been developed recently. The penalized version depends on a smoothing parameter, as well as the data. Here some adaptive choices of the smoothing parameter are considered. The asymptotically optimal smoothing parameter depends on $f$ through $\bar\beta = -\tfrac12 f(0)f'(0)$. In the adaptive procedures, estimates of $\bar\beta$ are used to determine the smoothing parameter. Two such procedures are shown to be theoretically correct and practically viable.
Keywords: Brownian motion; Non-parametric maximum likelihood; Simulation; Strong approximation
1. Introduction
The non-parametric maximum likelihood estimator (hereafter, NPMLE) of a non-increasing density, $f$ say, on $(0, \infty)$ was discovered by Grenander (1956) and is admirably described by Prakasa-Rao (1983) and Robertson et al. (1988). Recently, Woodroofe and Sun (1993) have developed a penalized version of the NPMLE, $\hat f_n(\alpha;\cdot)$ say, which depends on a smoothing parameter $\alpha > 0$ as well as the data. That is, $\hat f_n(\alpha;\cdot)$ maximizes a penalized log likelihood function of the form
$$l_\alpha(g) = \sum_{i=1}^n \log[g(x_i)] - n\alpha g(0+)$$
among all non-increasing densities $g$ on $[0, \infty)$, where $g(0+) = \lim_{x \downarrow 0} g(x)$. See (2) below for a computable form. An important advantage of the penalized version is that
* Corresponding author.
¹ Research supported by the National Science Foundation.
0378-3758/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0378-3758(95)00114-X
for suitable $\alpha$, $\hat f_n(\alpha;0+)$ is a consistent estimator of $f(0+) = \lim_{x \downarrow 0} f(x)$, whereas the (unpenalized) NPMLE is not consistent. The purpose of this work is to suggest some adaptive choices of the smoothing parameter $\alpha$ and discuss some of their properties. The problem discussed here differs from related work on adaptive bandwidths, splines, etc. in that the nature of the smoothing is different and the smoothing parameter only affects the asymptotics of the estimator at one point, the left end point.

The estimators are described in Section 2, where the main result is stated and its proof is outlined. The results of some simulation experiments are presented in Section 3. The proof of the main result is completed in Sections 4 and 5.
2. Main results

To describe the estimators, let $f$ denote a non-increasing density on $[0, \infty)$; let $X_1, \dots, X_n$ be an i.i.d. sample from $f$; and let $0 = x_0 < x_1 < \cdots < x_n < \infty$ denote the ordered values of 0 and $X_1, \dots, X_n$. Woodroofe and Sun (1993) (hereafter, [WS]) showed that the penalized NPMLE has the same form as the unpenalized one, but with transformed variables $w_1, \dots, w_n$ in place of $x_1, \dots, x_n$. Given a (small) smoothing parameter $\alpha$, $0 < \alpha < x_n$, let $w_0 = 0$ and
$$w_k = w_k(\alpha) = \alpha + \gamma_\alpha x_k, \qquad k = 1, \dots, n,$$
where $\gamma_\alpha$ is the unique, positive solution to the equation
$$\gamma = \min_{1 \le s \le n}\left\{1 - \frac{\alpha s/n}{\alpha + \gamma x_s}\right\}. \qquad (1)$$
See Lemma 1 of [WS] for existence and an alternative expression. If $x_n > \alpha$, then the penalized NPMLE $\hat f_n(\alpha;\cdot)$ is a step function, with discontinuities only at $x_1, \dots, x_n$, for which
$$\hat f_n(\alpha; x_k) = \min_{0 \le i < k}\ \max_{k \le j \le n}\ \frac{(j-i)/n}{w_j - w_i} \qquad (2)$$
for $k = 1, \dots, n$. That is, $\hat f_n(\alpha;\cdot)$ maximizes $l_\alpha(g)$ among non-increasing densities $g$. The notation is chosen so that $\tilde f_n := \hat f_n(0;\cdot)$ is the unpenalized NPMLE and differs slightly from that of [WS]. The derivation of (2) follows the general outline of the derivation of $\tilde f_n$ but differs in some details.

The expression $\hat f_n(\alpha;\cdot)$ is defined by (2) only if $x_n > \alpha$. Since the values of $\alpha$ of interest are of order $n^{-2/3}$, and $x_n$ is the largest order statistic, the complementary event $\{x_n \le \alpha\}$ is very unlikely, but a global definition is useful. Let $\hat f_n(\alpha;\cdot)$ be the uniform density on $[0, x_n]$, if $x_n \le \alpha$.

From Proposition 1 of [WS], the behavior of $\hat f_n(\alpha;x)$ is asymptotically insensitive to $\alpha$ for fixed $x > 0$. So, attention is directed to $\hat f_n(\alpha;0+)$ below. A useful alternative
expression for $\hat f_n(\alpha;0+)$ is $\hat f_n(\alpha;0+) = (1 - \gamma_\alpha)/\alpha$ for $0 < \alpha < x_n$, which follows easily from the relation $\int_0^\infty \hat f_n(\alpha;x)\,dx = 1$. See Remark 2 of [WS].

Let $F$ denote the distribution function corresponding to $f$ and suppose throughout the paper that
$$F(x) = f_0 x - \beta x^2 + o(x^2) \quad \text{as } x \downarrow 0, \qquad (3)$$
where $0 < f_0 = f(0+) < \infty$ and $0 \le \beta < \infty$. Condition (3) is strengthened to
$$F(x) = f_0 x + O(x^3) \quad \text{as } x \downarrow 0 \qquad (3')$$
for some results. If there is a constant $0 < c < \infty$ (independent of $n$) for which
$$\alpha = \alpha_n = c n^{-2/3}, \qquad \forall n \ge 1,$$
then
$$n^{1/3}\{\hat f_n(\alpha;0+) - f_0\} \Rightarrow S(\bar\beta, c) \quad \text{as } n \to \infty, \qquad (4)$$
where
$$S(\bar\beta, c) = \sup_{t > 0} \frac{W(t) - (c + \bar\beta t^2)}{t} \qquad (5)$$
and $\bar\beta = f_0 \beta$. Here $W(t)$, $0 \le t < \infty$, denotes a standard Brownian motion, and $\Rightarrow$ denotes convergence in distribution. See Proposition 2. It is easily seen that $S(\bar\beta, c)$ is continuous in $0 \le \bar\beta < \infty$ and $0 < c < \infty$ w.p. 1. Moreover, $S(0, c) > 0$ w.p. 1 for all $c > 0$, since $\sup_t W(t) = \infty$, and $\lim_{c \to \infty} S(0, c) = 0$ w.p. 1, since $W(t)/t \to 0$ w.p. 1 as $t \to \infty$. So, $E|S(0,c)|$ is minimized as $c \to \infty$, if $\beta = 0$. If $\beta > 0$, then simple rescaling leads to
$$S(\bar\beta, c) =_D \bar\beta^{1/3} S(1, c\bar\beta^{1/3}), \qquad (6)$$
where $=_D$ denotes equality in distribution. The distribution of $S(1, c)$ is complicated, but the value of $c$ at which $E|S(1,c)|$ is minimized, say $c^*$, may be approximated by simulation as $c^* \approx 0.649$. The distribution of $S(1, c^*)$ too may be approximated by simulation. See Figs. 1 and 2 and Table 1.

Given $c^*$, the value of $c$ at which the limiting distribution in (4) has the smallest expected absolute value is $c = c^* \bar\beta^{-1/3}$. This suggests estimating $c$ and the
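asymptotically optimal $\alpha$ by
$$\hat c_n = c^* \hat\beta_n^{-1/3} \quad \text{and} \quad \hat\alpha_n = c^* \hat\beta_n^{-1/3} n^{-2/3}, \qquad (7)$$
where $\hat\beta_n$, $n \ge 1$, are almost surely positive estimators of $\bar\beta$.

The unknown $\bar\beta$ may be estimated in several ways. Two simple ones depend on the observation that $\bar\beta = -\tfrac12 f(0) f'(0)$, if $f$ is differentiable at 0. Let $\alpha_0 = c_0 n^{-2/3}$ be a preliminary estimate for $\alpha$; let $1 \le m = m_n = m_n(X_1, \dots, X_n) \le n$ be integer valued random variables; and let
$$\hat\beta_n = \max\left\{\frac{\hat f_n(\alpha_0;0)\,[\hat f_n(\alpha_0;0) - \hat f_n(\alpha_0;x_m)]}{2 x_m},\ n^{-q}\right\} \qquad (8)$$
and
$$\hat\beta'_n = \max\left\{\frac{\hat f_n(\alpha_0;0) \sum_{k=1}^m x_k [\hat f_n(\alpha_0;0) - \hat f_n(\alpha_0;x_k)]}{2 \sum_{k=1}^m x_k^2},\ n^{-q}\right\}, \qquad (9)$$
where $0 < q < \tfrac12$. Observe that $\hat\beta_n$ and $\hat\beta'_n$ are positive estimators.

Proposition 1. Suppose that $f$ is continuously differentiable on $[0, \epsilon]$, where $\epsilon > 0$. If $\alpha = c n^{-2/3}$, where $0 < c < \infty$, then
$$\sup_{0 \le x \le \epsilon} |\hat f_n(\alpha;x) - f(x)| = O_p(n^{-1/3} \log n) \quad \text{as } n \to \infty.$$

Corollary. Define $\hat\beta_n$ and $\hat\beta'_n$ by (8) and (9). If $1 \le m_n = m_n(X_1, \dots, X_n) \le n$ are integer valued random variables for which $m = o(n)$ and $n/m = O_p[n^{1/3}/\log n]$, then $\hat\beta_n \to^P \bar\beta$ and $\hat\beta'_n \to^P \bar\beta$ as $n \to \infty$.

The proofs of Proposition 1 and its corollary are presented in Section 5. To state the main result, let $\hat\beta_n$, $n \ge 1$, denote almost surely positive estimators of $\bar\beta$, define $\hat\alpha_n$ by (7), and let
$$\hat f_n(x) = \hat f_n(\hat\alpha_n; x),\ 0 \le x < \infty, \qquad \tau_n = \frac{n^{1/6}}{\log n} \quad \text{and} \quad \epsilon_n = \frac{1}{\tau_n}, \qquad n \ge 2. \qquad (10)$$
Observe that $1/\hat\beta_n + 1/\hat\beta'_n = O_p(\tau_n^3)$ as $n \to \infty$ in (8) and (9), since $q < \tfrac12$.

Theorem 1. Suppose that (3) holds; let $\hat\beta_n$, $n \ge 1$, be a consistent sequence of almost surely positive estimators of $\bar\beta$ for which $\hat\beta_n^{-1} = O_p(\tau_n^3)$; and define $\hat\alpha_n$ by (7).
(i) If $\beta > 0$, then
$$n^{1/3}\{\hat f_n(0+) - f_0\} \Rightarrow \bar\beta^{1/3} S(1, c^*) \quad \text{as } n \to \infty. \qquad (11)$$
(ii) If $\beta = 0$ and (3') holds, then (11) holds.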
(iii) In either case, $n^{1/3}\{\hat f_n(x) - f(x)\}$ and $n^{1/3}\{\tilde f_n(x) - f(x)\}$ have the same limiting distribution, if any, for all $0 < x < \infty$ for which $f(x) < f(0)$ (where $\tilde f_n = \hat f_n(0;\cdot)$ is the unpenalized NPMLE).

The limiting distribution of $n^{1/3}\{\tilde f_n(x) - f(x)\}$ was found by Prakasa-Rao (1969) under general conditions and is discussed by Groeneboom (1985). The proof of Theorem 1 is easily derived following two preliminary results. The first of these is intuitive.
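The factor $\bar\beta^{1/3}$ in (11) traces back to the rescaling identity (6). As a sketch of that step, write $b > 0$ for the coefficient of $t^2$ in (5) (the letter $b$ and the substitution $t = b^{-2/3}s$ are introduced here only for illustration) and use the Brownian scaling property that $\{W(as)\}$ and $\{a^{1/2}W(s)\}$ have the same distribution as processes:

```latex
\begin{align*}
\frac{W(t) - (c + b t^2)}{t}\bigg|_{t = b^{-2/3}s}
  &=_D \frac{b^{-1/3}W(s) - c - b^{-1/3}s^2}{b^{-2/3}s}
   = b^{1/3}\,\frac{W(s) - (c\,b^{1/3} + s^2)}{s},\\
\sup_{t>0}\frac{W(t) - (c + b t^2)}{t}
  &=_D b^{1/3}\sup_{s>0}\frac{W(s) - (c\,b^{1/3} + s^2)}{s}
   = b^{1/3}\,S(1, c\,b^{1/3}).
\end{align*}
```

Setting $c = c^* b^{-1/3}$ turns the right-hand side into $b^{1/3} S(1, c^*)$, which is the form of the limit in (11).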
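Lemma 1. $\hat f_n(\alpha;0+)$ is continuous and non-increasing in $\alpha$ w.p. 1 on $0 \le \alpha < x_n$.

Proof. That $\hat f_n(\alpha;0+)$ is continuous in $\alpha$ is clear from the relation $\hat f_n(\alpha;0+) = (1 - \gamma_\alpha)/\alpha$, since $\gamma_\alpha$ is continuous. See (1). For the monotonicity recall that $l_\alpha$ denotes the penalized log likelihood function. If $0 \le \alpha < \alpha' < x_n$ and $\hat f_n(\alpha;0+) < \hat f_n(\alpha';0+)$, then
$$l_{\alpha'}[\hat f_n(\alpha;\cdot)] = l_\alpha[\hat f_n(\alpha;\cdot)] - n(\alpha' - \alpha)\hat f_n(\alpha;0+) > l_\alpha[\hat f_n(\alpha';\cdot)] - n(\alpha' - \alpha)\hat f_n(\alpha';0+) = l_{\alpha'}[\hat f_n(\alpha';\cdot)],$$
a contradiction, since $\hat f_n(\alpha';\cdot)$ maximizes $l_{\alpha'}$. □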
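The relation $\hat f_n(\alpha;0+) = (1-\gamma_\alpha)/\alpha$ used in the proof suggests a direct way to compute the estimator at the origin. The following is a minimal sketch, not the authors' code: it assumes equation (1) in the form $\gamma = \min_s\{1 - (\alpha s/n)/(\alpha + \gamma x_s)\}$ and finds the positive root by bisection; the function names `solve_gamma` and `f_hat_at_zero` are hypothetical.

```python
import numpy as np


def solve_gamma(x, alpha, tol=1e-12):
    """Positive root of gamma = min_s {1 - (alpha*s/n) / (alpha + gamma*x_s)} by bisection."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 0.0 < alpha < x[-1]:
        raise ValueError("need 0 < alpha < largest observation")
    s = np.arange(1, n + 1) / n

    def h(g):
        # h(g) = 0 exactly when g solves equation (1); note that h(0) = 0 as well,
        # so the bracket below must stay away from the trivial root at zero.
        return np.max(alpha * s / (alpha + g * x)) - (1.0 - g)

    hi, lo = 1.0, 0.5          # h(1) > 0 always
    while h(lo) >= 0.0:        # shrink until h(lo) < 0, keeping h(hi) >= 0
        hi, lo = lo, lo / 2.0
        if lo < 1e-15:
            raise RuntimeError("could not bracket the positive root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f_hat_at_zero(x, alpha):
    """Penalized NPMLE at the origin via the relation (1 - gamma) / alpha."""
    return (1.0 - solve_gamma(x, alpha)) / alpha
```

For a fixed sample, evaluating `f_hat_at_zero` at two values of `alpha` gives a quick numerical check of the monotonicity asserted in Lemma 1.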
Let
$$S_n(c) = n^{1/3}\{\hat f_n(c n^{-2/3}; 0+) - f_0\}$$
for $0 \le c < \infty$, so that $n^{1/3}\{\hat f_n(0+) - f_0\} = S_n(\hat c_n)$ for $n \ge 1$. Next, let $C[a,b]$ denote the space of continuous functions on $[a,b]$, endowed with the uniform topology, for $0 < a < b < \infty$. Then (the restrictions to $[a,b]$ of) $S_n$ are in $C[a,b]$ for all such $a$ and $b$ and all $n \ge 1$.

Proposition 2. (i) If $0 < a < b < \infty$, then $S_n \Rightarrow S(\bar\beta, \cdot)$ in $C[a,b]$ as $n \to \infty$. (ii) If $\beta = 0$, then $S_n(\tau_n) \to 0$ in probability as $n \to \infty$ (where $\tau_n$ is as in (10)).

The proof uses strong approximation as in Groeneboom (1985). See Section 5 for the details.

Proof of Theorem 1. (i) Suppose first that $\beta > 0$; and let $0 < a < c^*\bar\beta^{-1/3} < b$. Then $\hat c_n \to c^*\bar\beta^{-1/3}$ in probability. So,
$$n^{1/3}\{\hat f_n(0+) - f_0\} = S_n(\hat c_n) \Rightarrow S(\bar\beta, c^*\bar\beta^{-1/3}) =_D \bar\beta^{1/3} S(1, c^*)$$
by Proposition 2(i) and (6), since the evaluation functional which maps $(c, S)$ into $S(c)$ is continuous on $[a,b] \times C[a,b]$.

(ii) For the case $\beta = 0$, it suffices to show that $S_n(\hat c_n) \to 0$ as $n \to \infty$. Let $0 < a < \infty$. Since $S_n(c)$ is non-increasing in $0 < c < \infty$ and $\hat c_n \to^P \infty$,
$$S_n(\hat c_n) \le S_n(a) + [S_n(\hat c_n) - S_n(a)]\,1\{\hat c_n \le a\} \Rightarrow S(0, a),$$
as $n \to \infty$ (since $S_n(a) \Rightarrow S(0,a)$ by Proposition 2 and $P\{\hat c_n \le a\} \to 0$ as $n \to \infty$); and $S(0,a) \to^P 0$ as $a \to \infty$, as noted above. Similarly, since $\hat c_n = O_p(\tau_n)$ (by assumption),
$$S_n(\hat c_n) \ge S_n(\tau_n) + [S_n(\hat c_n) - S_n(\tau_n)]\,1\{\hat c_n > \tau_n\} \to^P 0$$
as $n \to \infty$, by Proposition 2. Assertion (ii) follows.

The proof of (iii) uses the two relations,
$$\hat f_n(\alpha;x) = \min\{\hat f_n(\alpha;0+),\ \tilde f_n(x)/\gamma_\alpha\}, \qquad \forall x > 0, \text{ on } \{x_n > \alpha\},$$
and $1 - \gamma_\alpha = \alpha \hat f_n(\alpha;0+) \le \alpha \tilde f_n(0+)$. The first of these follows directly from (2), and the second from Remark 2 of [WS], as noted above. From (i), $\hat f_n(0+) \to^P f(0+)$ as $n \to \infty$. If $x > 0$ and $f(x) < f(0)$, then there is a continuity point $y$ of $f$ for which $0 < y < x$ and $f(y) < f(0+)$, by the assumed left continuity, in which case $\tilde f_n(x) \le \tilde f_n(y) \to^P f(y)$ as $n \to \infty$. So if $x > 0$ and $f(x) < f(0+)$ then $\hat f_n(x) = \tilde f_n(x)/\gamma_{\hat\alpha_n}$ and
$$\hat f_n(x) - \tilde f_n(x) = \left(\frac{1}{\gamma_{\hat\alpha_n}} - 1\right)\tilde f_n(x) \le \hat\alpha_n \hat f_n(0+)^2$$
with probability approaching one as $n \to \infty$. The right side is of order $\hat\alpha_n$, and $\hat\alpha_n = n^{-2/3}\hat c_n = O_p(n^{-2/3}\tau_n) = o_p(n^{-1/3})$ as $n \to \infty$. (iii) follows. □
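The adaptive recipe of this section, estimator (9) followed by the plug-in rule (7), can be sketched in a few lines. This is an illustrative sketch only: it assumes the preliminary values $\hat f_n(\alpha_0;0)$ and $\hat f_n(\alpha_0;x_k)$, $k \le m$, have already been computed by some other routine, and the function names, the default `q=0.25`, and the argument layout are hypothetical choices rather than anything prescribed in the paper.

```python
import numpy as np

C_STAR = 0.649  # simulated minimizer of E|S(1, c)| quoted in the text


def beta_hat_prime(f0_hat, f_at_x, x, n, q=0.25):
    """Estimator (9): weighted slope estimate, truncated below at n**(-q).

    f0_hat : preliminary estimate of f at 0
    f_at_x : preliminary estimates of f at the first m order statistics
    x      : the first m order statistics
    n      : full sample size (used only for the truncation level)
    """
    x = np.asarray(x, dtype=float)
    f_at_x = np.asarray(f_at_x, dtype=float)
    raw = f0_hat * np.sum(x * (f0_hat - f_at_x)) / (2.0 * np.sum(x ** 2))
    return float(max(raw, n ** (-q)))


def adaptive_alpha(beta_hat, n, c_star=C_STAR):
    """Plug-in smoothing parameter from (7): c* * beta_hat**(-1/3) * n**(-2/3)."""
    return c_star * beta_hat ** (-1.0 / 3.0) * n ** (-2.0 / 3.0)
```

As a sanity check, feeding exact standard exponential values $f(x) = e^{-x}$ at small $x$ into `beta_hat_prime` returns a value close to $1/2$, the target for that density.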
3. Simulations

The distribution of $S(1, c)$ is complicated but may be studied numerically. For $c > 0$, let
$$\sigma(c) = \inf\left\{t > 0:\ \frac{W(t) - (c + t^2)}{t} = S(1, c)\right\}.$$
Then, using the relation $P\{W(t) \le a + bt,\ \forall t \ge 0\} = 1 - e^{-2ab}$ with $a = 1 \vee c$ and $b > 0$, it is easily seen that $P\{\sigma(c) > b + c + w + 1\} \le e^{-2b} + \Phi(-w)$ for all $b > 0$ and $w > 1$, where $\Phi$ denotes the standard normal distribution function. When $b = 3.8$ and $w = 3.3$, the right-hand side is at most 0.001. So, the supremum over all $t > 0$ in (5) may safely be replaced by the supremum over $0 \le t \le 8.1 + c$. Next, letting $m$ denote a positive integer and $Z_1, Z_2, \dots$ denote i.i.d. standard normal random variables, the Wiener process may be approximated by a continuous, piecewise linear process with knots at $k/m$, $k = 0, 1, 2, \dots$, and values $W_m(k/m) = (1/\sqrt{m}) \sum_{j \le k} Z_j$ there. So, the distribution of $S(1, c)$ may be approximated by that of
$$S_m(1, c) = \sup_{t \le 8.1 + c} \frac{W_m(t) - (c + t^2)}{t},$$
which may be approximated by simulation. Monte Carlo estimates of $E|S(1,c)|$ for selected $c$ are presented in Fig. 1 with $m = 10000$. From (the middle of the first row of)
Fig. 1. Simulated $|E(S)|$, $E(|S|)$ and $E(S^2)$ as $c$ varies, where $S = S_m(1, c, p)$.
this figure, it appears that $E|S(1,c)|$ is minimized when $c = c^* \approx 0.649$. Monte Carlo estimates of the distribution function of $S(1, c^*)$ are presented in Table 1, and a kernel estimate of the density of $S(1, c^*)$ in Fig. 2. In view of (6), the table may be useful in setting confidence intervals for $f(0)$.

Monte Carlo estimates of the mean, standard deviation, and selected percentiles of the estimators $\hat\beta_n^{1/3}$ and $(\hat\beta'_n)^{1/3}$ for samples of size $n = 50$, 100, and 200 from a standard exponential distribution ($\bar\beta^{1/3} = 0.794$) are presented in Table 2 for several choices of $m$ (in (8) and (9)).

The final table shows how the procedure works. For samples of size $n = 50$, 100, and 200, from standard exponential and half normal distributions and selected $\alpha_0$, Monte Carlo estimates of $n^{1/3} E|\hat f_n(0) - f(0)|$ are presented in Table 3 and compared to $\bar\beta^{1/3} E|S(1, c^*)|$.

Remark 1. (i) To show that $S(1, c)$ is continuous in $c$, one first shows that $\sigma(c)$ is almost surely positive simultaneously for all $0 < c < \infty$, since $[W(t) - (c + t^2)]/t$ approaches $-\infty$ as $t \to 0$ or $\infty$ and $\sigma(c)$ is non-decreasing in $c > 0$, both w.p. 1. It follows that $|S(1, c') - S(1, c)| \le (c' - c)/\sigma(c)$ for $0 < c < c' < \infty$.
(ii) While the exact distribution of $S(1, c)$ is complicated, there are simple bounds for the tails. In fact,
$$P\{S(1,c) \le -\lambda\} \le P\{W(1) \le c + 1 - \lambda\} = \Phi(c + 1 - \lambda)$$
and
$$P\{S(1,c) > \lambda\} \le P\{\exists t \ge 0:\ W(t) > c + \lambda t\} = e^{-2c\lambda}$$
for all $\lambda > 0$. In particular, $S(1, c)$ has moments of all orders.
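The simulation scheme of this section (truncate the supremum at $8.1 + c$, replace $W$ by a random walk on the knots $k/m$) can be sketched as follows. This is a minimal illustration rather than the code behind Fig. 1: the supremum is evaluated only at the knots $k/m$, $k \ge 1$, which approximates the supremum of the piecewise linear process, and the function names, the `horizon` override, and the default sizes are assumptions made for the example.

```python
import numpy as np


def simulate_S(c, m=1000, horizon=None, rng=None):
    """One draw of S_m(1, c): max over knots t = k/m of [W_m(t) - (c + t^2)] / t."""
    rng = np.random.default_rng() if rng is None else rng
    T = 8.1 + c if horizon is None else horizon   # truncation point from the text
    k = int(np.floor(T * m))                      # number of knots in (0, T]
    t = np.arange(1, k + 1) / m
    w = np.cumsum(rng.standard_normal(k)) / np.sqrt(m)   # W_m(k/m)
    return float(np.max((w - (c + t ** 2)) / t))


def mean_abs_S(c, reps=200, m=1000, seed=0):
    """Monte Carlo estimate of E|S(1, c)| from `reps` independent draws."""
    rng = np.random.default_rng(seed)
    return float(np.mean([abs(simulate_S(c, m=m, rng=rng)) for _ in range(reps)]))
```

Scanning `mean_abs_S` over a grid of `c` values gives a rough version of the $E(|S|)$ panel of Fig. 1; much larger `m` and `reps` than these defaults would be needed to locate the minimizer with any precision.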
Table 1
Estimates of $G(x) = P\{S(1, c^*) \le x\}$

   x     G(x)       x     G(x)       x     G(x)       x     G(x)       x     G(x)
 -4.9   0.0000    -2.7   0.0185    -0.5   0.5095     1.7   0.9370     3.9   0.9960
 -4.8   0.0000    -2.6   0.0235    -0.4   0.5430     1.8   0.9415     4.0   0.9960
 -4.7   0.0000    -2.5   0.0290    -0.3   0.5735     1.9   0.9465     4.1   0.9965
 -4.6   0.0000    -2.4   0.0385    -0.2   0.6110     2.0   0.9520     4.2   0.9965
 -4.5   0.0000    -2.3   0.0480    -0.1   0.6440     2.1   0.9565     4.3   0.9970
 -4.4   0.0000    -2.2   0.0600     0.0   0.6680     2.2   0.9610     4.4   0.9985
 -4.3   0.0000    -2.1   0.0680     0.1   0.6920     2.3   0.9655     4.5   0.9985
 -4.2   0.0000    -2.0   0.0815     0.2   0.7140     2.4   0.9680     4.6   0.9990
 -4.1   0.0000    -1.9   0.0975     0.3   0.7385     2.5   0.9725     4.7   0.9995
 -4.0   0.0000    -1.8   0.1185     0.4   0.7610     2.6   0.9745     4.8   0.9995
 -3.9   0.0000    -1.7   0.1430     0.5   0.7835     2.7   0.9780     4.9   0.9995
 -3.8   0.0000    -1.6   0.1650     0.6   0.8025     2.8   0.9795     5.0   0.9995
 -3.7   0.0000    -1.5   0.1900     0.7   0.8165     2.9   0.9815     5.1   0.9995
 -3.6   0.0000    -1.4   0.2185     0.8   0.8345     3.0   0.9835     5.2   0.9995
 -3.5   0.0005    -1.3   0.2490     0.9   0.8485     3.1   0.9860     5.3   0.9995
 -3.4   0.0015    -1.2   0.2795     1.0   0.8650     3.2   0.9885     5.4   0.9995
 -3.3   0.0025    -1.1   0.3085     1.1   0.8770     3.3   0.9890     5.5   0.9995
 -3.2   0.0055    -1.0   0.3410     1.2   0.8905     3.4   0.9910     5.6   0.9995
 -3.1   0.0065    -0.9   0.3845     1.3   0.9035     3.5   0.9940     5.7   1.0000
 -3.0   0.0085    -0.8   0.4180     1.4   0.9160     3.6   0.9950     5.8   1.0000
 -2.9   0.0125    -0.7   0.4420     1.5   0.9260     3.7   0.9955     5.9   1.0000
 -2.8   0.0160    -0.6   0.4805     1.6   0.9325     3.8   0.9960     6.0   1.0000

Notes: Estimates are based on 2000 simulated data points. Sample mean = -0.4010, SD of sample mean = 0.0289.
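One way the table might be used for interval estimation, as suggested in the text via (6), is to invert the approximation $n^{1/3}\{\hat f_n(0+) - f_0\} \approx \bar\beta^{1/3} S(1, c^*)$. The sketch below is an illustration under that approximation, not a procedure given in the paper: the default quantiles $-2.3$ and $2.0$ are read from Table 1 (where $G(-2.3) = 0.0480$ and $G(2.0) = 0.9520$, giving roughly 90% coverage), and the function name and arguments are hypothetical.

```python
def f0_confidence_interval(f_hat, beta_hat, n, q_lo=-2.3, q_hi=2.0):
    """Approximate interval for f(0+) from the limit law beta^(1/3) * S(1, c*).

    If P{q_lo <= S(1, c*) <= q_hi} is about 0.90, then inverting
    q_lo <= n^(1/3) (f_hat - f0) / beta^(1/3) <= q_hi gives the interval below.
    """
    scale = beta_hat ** (1.0 / 3.0) * n ** (-1.0 / 3.0)
    return f_hat - scale * q_hi, f_hat - scale * q_lo
```

Because the simulated distribution of $S(1, c^*)$ has negative mean, the resulting interval is not symmetric about the point estimate.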
Remark 2. We have focused attention on $E|S(1,c)|$ here. However, a similar treatment is possible using other criteria, for example $E[S(1,c)^2]$ or $|E[S(1,c)]|$. Figs. 1 and 2 show the similarities and (minor) differences among these approaches. From the proof of Theorem 1, it is clear that similar results hold for the other criteria. Only the value of $c^*$ is changed slightly.
4. Some lemmas

It follows directly from (2) that $\hat f_n(\alpha;0+) = \sup_{x > 0} F_n(x)/(\alpha + \gamma_\alpha x)$ for $0 < \alpha < x_n$, where $F_n$ denotes the empirical distribution function. Let $\bar f_n(\alpha) = \sup_{x > 0} F_n(x)/(\alpha + x)$ for $\alpha > 0$. Further, with $\alpha = c n^{-2/3}$, let
$$Z_n(c, x) = n^{1/3}\left\{\frac{F_n(x)}{\alpha + x} - f_0\right\}$$
and
$$Z_n(c) = n^{1/3}\{\bar f_n(\alpha) - f_0\} = \sup_{x > 0} Z_n(c, x)$$
for $0 < c, x < \infty$ and $n \ge 1$. Observe that $Z_n(c)$ is non-increasing in $c$.
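Since $F_n$ is a step function, the supremum defining $\bar f_n(\alpha)$ is attained at an order statistic, so both $\bar f_n(\alpha)$ and $Z_n(c)$ reduce to a maximum over $k = 1, \dots, n$. A minimal sketch follows; the function names `f_bar` and `Z_n` are hypothetical, and the true value `f0` must be supplied, so this is useful only for simulation checks.

```python
import numpy as np


def f_bar(x, alpha):
    """sup_{x>0} F_n(x) / (alpha + x), attained at one of the order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    return float(np.max((np.arange(1, n + 1) / n) / (alpha + x)))


def Z_n(x, c, f0):
    """Z_n(c) = n^(1/3) * (f_bar(c * n^(-2/3)) - f0) for a known f0."""
    n = len(x)
    return n ** (1.0 / 3.0) * (f_bar(x, c * n ** (-2.0 / 3.0)) - f0)
```

Evaluating `Z_n` at increasing `c` on a fixed sample illustrates the monotonicity noted above.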
Fig. 2. Density estimates of $S_m(1, c, p)$. The window widths $h$ are the default width in S and the 'optimal width', i.e. $0.9\, n^{-1/5} \min\{\mathrm{sd}, \mathrm{iqr}/1.34\}$.
Lemma 2. If $C_n = o_p(n^{1/3})$, then $\sup_{0 \le c \le C_n} |S_n(c) - Z_n(c)| \to^P 0$ as $n \to \infty$.

Proof. As noted above, $1 - \gamma_\alpha = \alpha \hat f_n(\alpha;0+) \le \alpha \tilde f_n(0+)$ for $0 < \alpha < x_n$. So, if $\alpha = c n^{-2/3}$ where $0 \le c \le C_n$, then
$$0 \le \hat f_n(\alpha;0+) - \bar f_n(\alpha) \le \sup_{x > 0} \frac{(1 - \gamma_\alpha)\, x\, F_n(x)}{(\alpha + \gamma_\alpha x)(\alpha + x)} \le \alpha \tilde f_n(0+)^2 = O_p(n^{-2/3} C_n),$$
with probability approaching one as $n \to \infty$ (on $x_n > n^{-2/3} C_n$); and the right-hand side is of smaller order than $n^{-1/3}$ as $n \to \infty$. □

Strong Approximation. If the probability space is rich enough, then there are Brownian motions $W_n$, $n \ge 1$, and Brownian bridges $\mathbb{B}_n$, $n \ge 1$, for which $\mathbb{B}_n(t) = W_n(t) - t W_n(1)$, for all $0 \le t \le 1$ and $n \ge 1$, and
$$F_n(x) - F(x) = \frac{1}{\sqrt{n}}\, \mathbb{B}_n[F(x)] + R_n(x), \qquad 0 \le x < \infty,$$
where
$$\sup_x |R_n(x)| = O\!\left(\frac{\log n}{n}\right) \quad \text{w.p. 1 as } n \to \infty$$
Table 2
Summary statistics for estimates $\hat\beta_n^{1/3}$ and $(\hat\beta'_n)^{1/3}$ (target value is 0.794)

n     α0               Type   Min      25%tile   Mean    SD       Median   75%     Max
50    0.0526 (right)   m1     0.0876   0.528     0.826   0.0988   0.718    1.010   3.701
                       m2     0.2253   0.584     0.795   0.0689   0.731    0.933   2.769
                       m3     0.2362   0.576     0.737   0.0514   0.698    0.844   2.039
                       r      0.1918   0.509     0.665   0.0496   0.632    0.773   2.020
      0.03 (small)     m1     0.0876   0.629     1.137   0.1748   0.940    1.364   6.394
                       m2     0.2590   0.654     0.966   0.1022   0.876    1.137   4.384
                       m3     0.2949   0.637     0.865   0.0733   0.804    1.012   2.739
                       r      0.2345   0.576     0.799   0.0744   0.740    0.942   3.029
      0.12 (big)       m1     0.0709   0.418     0.571   0.0494   0.543    0.700   1.556
                       m2     0.2176   0.476     0.598   0.0380   0.578    0.690   1.342
                       m3     0.2314   0.488     0.584   0.0303   0.569    0.657   1.217
                       r      0.2009   0.422     0.512   0.0272   0.496    0.585   0.925
100   0.0325 (right)   m1     0.0553   0.535     0.836   0.1425   0.751    0.019   4.446
                       m2     0.2443   0.610     0.803   0.0904   0.744    0.937   2.312
                       m3     0.3055   0.609     0.762   0.0725   0.719    0.868   2.136
                       r      0.1878   0.546     0.685   0.0662   0.652    0.786   2.076
      0.01 (small)     m1     0.0990   0.802     1.683   0.4220   1.260    2.094   11.457
                       m2     0.2972   0.795     1.272   0.2230   1.093    1.551   6.738
                       m3     0.3813   0.756     1.102   0.1638   0.971    1.303   4.946
                       r      0.2306   0.703     1.023   0.1439   0.919    1.239   3.976
      0.1 (big)        m1     0.0641   0.390     0.515   0.0564   0.511    0.630   1.273
                       m2     0.1451   0.474     0.561   0.0424   0.553    0.642   1.115
                       m3     0.2514   0.483     0.562   0.0336   0.556    0.631   1.015
                       r      0.1817   0.430     0.498   0.0302   0.495    0.558   0.848
200   0.0205 (right)   m1     0.1137   0.533     0.833   0.1971   0.736    1.016   3.441
                       m2     0.2478   0.608     0.810   0.1300   0.762    0.935   2.250
                       m3     0.2276   0.630     0.774   0.0926   0.743    0.881   1.767
                       r      0.2401   0.551     0.705   0.0996   0.667    0.819   1.663
      0.01 (small)     m1     0.0985   0.677     1.273   0.4090   1.005    1.553   6.938
                       m2     0.2181   0.724     1.084   0.2495   0.938    1.270   4.616
                       m3     0.3694   0.714     0.961   0.1690   0.877    1.097   3.410
                       r      0.2149   0.641     0.915   0.1742   0.823    1.088   2.532
      0.08 (big)       m1     0.0250   0.362     0.471   0.0703   0.466    0.575   1.008
                       m2     0.2240   0.471     0.540   0.0519   0.535    0.607   0.997
                       m3     0.2623   0.497     0.557   0.0430   0.552    0.611   1.016
                       r      0.2035   0.404     0.465   0.0417   0.463    0.518   0.853

Notes: Simulation size is 1000. The types m1, m2, m3 and r indicate the estimates using $\hat\beta_n$ based on $x_{m_1}$, $x_{m_2}$, $x_{m_3}$ and $\hat\beta'_n$ based on $x_r$. Here the $m_i$'s are the $i$th jump points of $\hat f_n(\alpha_0, x)$ and $r = k_n + m_2 1\{k_n \dots\}$, where $k_n = o(n)$. In this simulation, $(k_{50}, k_{100}, k_{200}) = (10, 20, 20)$. $\hat\beta_n^{1/3}$ based on m2 and $(\hat\beta'_n)^{1/3}$ are better estimates.
by the theorem of Komlós et al. (1975). See also Csörgő and Révész (1981). It follows that
$$Z_n(c, x) = \frac{n^{1/3}}{\alpha + x}\left\{\frac{1}{\sqrt{n}}\, \mathbb{B}_n[F(x)] - f_0\alpha - [f_0 x - F(x)] + R_n(x)\right\}$$
for all $0 \le c, x < \infty$ and $n \ge 1$.
Table 3
Comparisons of $n^{1/3} E|\hat f_n(0) - f(0)|$ and $\bar\beta^{1/3} E|S(1, c^*)|$

             A = n^{1/3} |f_n(0) - f(0)|                              β^{1/3} |S(1,c*)|
α0           n = 50            n = 100           n = 200
             Mean    SD        Mean    SD        Mean    SD          Mean     SD
(Exp)
Right        0.883   0.0026    0.891   0.0027    0.905   0.0027      0.8676   0.0141
             0.876   0.0025    0.884   0.0025    0.901   0.0025
Small        0.907   0.0029    0.958   0.0034    0.941   0.0032
             0.889   0.0027    0.922   0.0030    0.924   0.0029
Big          0.873   0.0024    0.884   0.0024    0.899   0.0024
             0.879   0.0023    0.892   0.0023    0.921   0.0023
(Normal)
Right        0.713   0.0024    0.711   0.0023    0.709   0.0023      0        0
             0.700   0.0022    0.702   0.0022    0.703   0.0022
Small        0.780   0.0030    0.783   0.0030    0.814   0.0033
             0.755   0.0028    0.757   0.0028    0.786   0.0030
Big          0.691   0.0022    0.694   0.0022    0.688   0.0021
             0.685   0.0021    0.687   0.0021    0.686   0.0021

The right, big and small $\alpha_0$, for $n = 50, 100, 200$, used in simulating A are
Exp: (0.0516, 0.0003, 0.012), (0.0325, 0.001, 0.01), (0.0205, 0.001, 0.08)
Normal: (0.0956, 0.003, 0.16), (0.0631, 0.002, 0.01), (0.0416, 0.001, 0.08)
Notes: Mean and SD of $|S(1, c^*)|$ are from the data used for Table 1. Main entries are simulated A values and their SD, where the first row is based on $\hat\beta_n$ with m2 and the second on $\hat\beta'_n$. Simulation size is 10000. It looks as if oversmoothing had a smaller effect than undersmoothing. When data are from the half normal distribution, A decreases slowly.
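Rescaling. Let
$$W_n^*(t) = \frac{n^{1/6}}{f_0}\, W_n\!\left(\frac{f_0^2 t}{n^{1/3}}\right), \qquad Z_n^*(c, t) = \frac{W_n^*(t) - c - \bar\beta t^2}{t} \qquad \text{and} \qquad Z_n^*(c) = \sup_{0 < t < \infty} Z_n^*(c, t)$$
for $0 \le t, c < \infty$ and $n \ge 1$. Then each $W_n^*$, $n \ge 1$, is another standard Brownian motion. So, the finite-dimensional distributions of $Z_n^*(c, t)$, $0 < c, t < \infty$, do not depend on $n$. Clearly,
$$\mathbb{B}_n\!\left[F\!\left(\frac{f_0 t}{n^{1/3}}\right)\right] = \frac{f_0}{n^{1/6}}\, W_n^*(t) + \frac{f_0}{n^{1/6}}\,[R'_n(t) - R''_n(t)],$$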
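where
$$R'_n(t) = \frac{n^{1/6}}{f_0}\left\{W_n\!\left[F\!\left(\frac{f_0 t}{n^{1/3}}\right)\right] - W_n\!\left(\frac{f_0^2 t}{n^{1/3}}\right)\right\} \qquad \text{and} \qquad R''_n(t) = \frac{n^{1/6}}{f_0}\, F\!\left(\frac{f_0 t}{n^{1/3}}\right) W_n(1)$$
for $0 \le t < \infty$ and $n \ge 1$. So, writing $x = f_0 t/n^{1/3}$,
$$Z_n(c, x) = \frac{W_n^*(t) - c - \beta_n(t) + \rho_n(t)}{t + c/(f_0 n^{1/3})},$$
where
$$\beta_n(t) = \frac{n^{2/3}}{f_0}\left[\frac{f_0^2 t}{n^{1/3}} - F\!\left(\frac{f_0 t}{n^{1/3}}\right)\right]$$
and
$$\rho_n(t) = \frac{n^{2/3}}{f_0}\, R_n(x) + R'_n(t) - R''_n(t)$$
for $0 \le t < \infty$, $n \ge 1$. Recall that $\tau_n = n^{1/6}/\log n$ and $\epsilon_n = 1/\tau_n$ for $n \ge 2$.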
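Lemma 3. If $0 < \tau < \infty$ and $0 < a < b < \infty$, then
$$\lim_{n \to \infty}\ \sup_{0 < t \le \tau}\ \frac{1}{t}\,|\beta_n(t) - \bar\beta t^2| = 0, \qquad (13)$$
$$\sup_{\epsilon_n \le t < \infty}\ \frac{1}{t}\,|\rho_n(t)| \to^P 0 \qquad (14)$$
and
$$\sup_{a \le c \le b}\ \sup_{\epsilon_n \le t \le \tau}\ |Z_n^*(c, t) - Z_n(c, f_0 t/n^{1/3})| \to^P 0 \qquad (15)$$
as $n \to \infty$; and if (3') holds, then (13) and (15) hold with $\tau$ replaced by $\kappa\tau_n$ for any $\kappa > 0$.

Proof. Relation (13) is clear; and if $F(x) = f_0 x + O(x^3)$ as $x \downarrow 0$, then $\bar\beta = 0$ and $\sup_{0 < t \le \kappa\tau_n} |\beta_n(t)/t| \le \kappa^2 n^{2/3}\, O(\tau_n^2/n) = o(1)$ as $n \to \infty$ for any $\kappa$. For (14), it is clear that $\sup_{\epsilon_n \le t < \infty} (1/t)|n^{2/3} R_n(x)| \to 0$ w.p. 1 and $\sup_{0 < t < \infty} (1/t)|R''_n(t)| \le n^{-1/6} f_0 |W_n(1)| \to^P 0$ as $n \to \infty$. So, it suffices to show that $\sup_{\epsilon_n \le t < \infty} (1/t)|R'_n(t)| \to 0$ in probability as $n \to \infty$. Let $\psi(h) = \sqrt{h \log(1 + h^{-1})}$, $0 < h \le 1$, and
$$K_n = \sup_{0 \le s < t \le 1} \frac{|W_n(t) - W_n(s)|}{\psi(t - s)}, \qquad n \ge 1. \qquad (16)$$
Then each $K_n$ is finite w.p. 1 by Lévy's modulus of continuity for Brownian motion, and the distribution of $K_n$ does not depend on $n$ (so that $K_n$ is stochastically bounded).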
There is a constant $C$ for which $f_0 x - F(x) \le C x^2$ for all $0 \le x < \infty$. So, for $t > 0$,
$$\frac{1}{t}\,|R'_n(t)| \le \frac{n^{1/6}}{t f_0}\, K_n\, \psi\!\left[\frac{f_0^2 t}{n^{1/3}} - F\!\left(\frac{f_0 t}{n^{1/3}}\right)\right] \le \frac{n^{1/6}}{t f_0}\, K_n\, \psi(C f_0^2 n^{-2/3} t^2) \le n^{-1/6} K_n \sqrt{C \log\!\left(1 + \frac{n^{2/3}}{C f_0^2 t^2}\right)},$$
since $\psi$ is non-decreasing. The right-hand side is maximized at $t = \epsilon_n$ and the maximum value approaches zero as $n \to \infty$. Relation (15) now follows easily, since, with $x = f_0 t/n^{1/3}$,
$$|Z_n^*(c, t) - Z_n(c, x)| = \left|\frac{W_n^*(t) - c - \bar\beta t^2}{t} - \frac{W_n^*(t) - c - \beta_n(t) + \rho_n(t)}{t + c/(f_0 n^{1/3})}\right| \le \frac{1}{t}\,|\bar\beta t^2 - \beta_n(t)| + \frac{1}{t}\,|\rho_n(t)| + \frac{c}{t^2 f_0 n^{1/3}}\,|W_n^*(t) - c - \bar\beta t^2|$$
for $0 < t < \infty$. □
The Argmax. For $0 < c < \infty$ and $n \ge 1$, let
$$\sigma_n(c) = \inf\{x > 0:\ Z_n(c, x) = Z_n(c)\}.$$
In the proof of Proposition 1, bounds are required for the distributions of these random variables. For $a, b, c > 0$ and $n \ge 1$, let
$$B_n = \left\{\mathbb{B}_n(t) \le \frac{f_0 a}{2 n^{1/6}} + \frac{b n^{1/6}}{f_0}\, t,\ \forall\, 0 \le t \le 1\right\}. \qquad (17)$$
Then $P(B'_n) \le e^{-ab} + o(1)$ as $n \to \infty$, as is easily seen by considering the Brownian motion $U(t) = (1 + t)\mathbb{B}(s)$, where $s = t/(1 + t)$ and $0 < t < \infty$.

Lemma 4. (i) $\sigma_n(c)$ is non-decreasing in $0 \le c < \infty$. (ii) For any $c > 0$, $1/\sigma_n(c) = O_p(n^{1/3})$ as $n \to \infty$. (iii) For any $c > 0$, $\sigma_n(c) = O_p(n^{-1/3})$ as $n \to \infty$.

Proof. If $0 < y < \infty$, then
z>y
[_0~+2 fo >x.
= I s u p inf (~ + x ) F , ( z ) - (~ + z ) F . ( x ) > 0 / ) 1 2 > y X
fo (18)
for all 0 ~< c < oo. Assertion (1) follows, since the last line is non-decreasing in 0~>. sup~ ~<, ~<~Z*(c, t) =:. sup~ ~<, ~<~[~/(t) - (c + ~tz)]/t as n -o oo for any e < z < oo, and sup [W(t) - (c + B t z ) ] / t ~ S ( c , [ 3 ) as e ' ~ 0 and ~ --. oc. So, lim s u p P { Z . ( c )
<~ z} <<.P { S ( c , B ) <~ z},
V z s ~.
(19) 1
F o r (ii), let A.,6 be the event that F.(x) - F(x) <~ 5foe for all 0 <~ x <~ 6 n - 1/3 for 0 < 6 < 1 and n >/1. Then, using strong approximation, it is easily seen that
~
limP(A.,~)=2~
~
- 1
for all 0 < 6 < 1. Now, A.,6 implies that - cfo/2(6 + cn 1/3) for all 0 <%x <.%6n-1/3. So,
Z . ( c , x ) <<. - nl/Zfoe/2(a + x) <~
cfo
~,
P { a . ( c ) <~ 6n -1/3 } <.%P(A'.,~) + P Z.(c) <~ - 2(6 + cn 1/3)j
which approaches zero as first n --* oc and then 6 ~,0, by (19). F o r (iii) it is necessary to consider the casesfl > 0 and f l = 0 separately. If fl > 0, then l i m . ~ supx ~>6 Z . ( c , x ) = - o o w.p. 1. for any 6 > 0 by the Glivenko Cantelli 1 2 Theorem. Let 6 > 0 be so chosen that f0x - F(x) >1 5 f l x for all 0 < x ~< 6. Next, given b > 0, let B. be the event (17) with a = c/2. Then P(B'.) <~ 2e-1/2bc + o(1) as n - - , m , and B. implies F.(x) = F(x) + n l/2Bn[F(x)] + R . ( x ) <~f o x - 5flx x 2 + foe + b n - 1/3x for all 0 ~< x ~< 6 for all sufficiently large n. So, for 4 b / ( f l n 1/3) <-%x <<.6 and sufficiently large n, B. implies that
Z.(c, x) <~ n 1/3 b n -
1/3X - -
l flX2 ~ __ ½b.
c n - 2/3 + X
So
P cr.(c) >
<~ P { Z . ( c ) <%- ½b} + P(B'.) + P{¢7.(c) > 6}
for all large n, and the right-hand side approaches zero as first n --, oo and then b ~ oo. If f l = 0 , then / ~ = 0 , so that S(c,/~)>O w.p. 1 and, therefore, lim,~olimsup..~P{Z.(c) ~< r/} = 0 , by (19). Given z > 1, define B. by (17) with a = ~.2/3 and b = z - 1/3. Then P(B'.) -%~ zn -1/3, B. implies
Z . ( c , x ) ~< ~ 1 / 3
1(~.2/3
-- c ) f o n - e / 3
~-k-x
+ z - l / 3 n 1/3 X
~<(1 + f o ) z -1/3
So,
P{a.(c) > zn -I/3 } <~P{Z.(c) <~(I +fo)z -'/3 } + P(B'.). which a p p r o a c h e s zero as first n ~ oo and then z ~ oo.
[]
5. Proofs of the propositions Proof of Proposition 2. Since z~ = o(n'/3), it suffices to prove Proposition 2 with S replaced by Z. See L e m m a 2. (i) Let a*(c)=nl/3a~(c)/fo and a ~ ( c ) = i n f { t > O : Z~(c,t)=Z~(c)}, and recall that the finite-dimensional distributions of Z~(c, t) do not depend on n. Then a*(c) and a~,(c) are non-decreasing in c and a~(c) + 1/a~(c) is stochastically b o u n d e d by L e m m a 4 and R e m a r k l(i). So, i f 0 < a < b < ~ , e > 0 , and 0 < 6 < A < ~ , then
P {~<~c<~bSUpIZ.(c)
- Z*~(c)l >~e}
~
IZ*(c,t)- Z"n(c,t)l>~e}
+ P{a*(a)/x a~(a) < 5 or a*(b)v a*~(b)> A}, which a p p r o a c h e s zero as first n ~ oo and then fi --* 0 and A ~ oo, by L e m m a 3. This establishes (i). T h a t m a x {0, Z,(zn)} ~ P 0 as n ~ o o follows from L e m m a 2 and (12) with C, replaced by z,; and if 0 < b < oo, then
Z~(z.) >~ Z*(z.,bz~) =
W~(br~) - z~ - fl~(bz~) + p ~ ( z ~ ) bz. + c/fon 1/3
1
b
in probability as n ~ by L e m m a 3. Assertion (ii) follows by letting b ~ o0.
[]
Proof of Proposition 1. T o begin, let ~ = cn- 2/3 and 0 ~< y ~< n - 1/3. Then
}n(cn-2/3;y)
- f(y)
<.}n(cn-2/3;O -F) --f(O
+) + f(O +) -- f ( n - l / s ) ,
which is independent of 0 ~< y ~< n - ,/a and of order n - ' / 3 by Proposition 2 and the assumptions on f. F o r the interval [ n - 1/3, e'], let Mn = supo ~, <, ~ 11B,(t) -- ~3,(s)[/O(t -- s), where 0 is as in (16). Then M, < oo w.p. 1 for all n >~ 1, and the distribution of Mn does not
depend on n. If n - 1/3 ~ y ~/~, let x = y f , ( y ) ~< sup
n
1/3.
Then
r.(z) - F.(x) 77,--X
z>y
F(z) - F(x) + ~r~ { a . [ F ( z ) ] - ~.{F(x)]} + O ~< sup
2rex
Z>y
M.
<~f(x) + ~
x
O E F ( y ) - F(x)]
y - x
÷ O(/1 - 2/3
log n).
So,
f.(y) -- f (y) <.f (x) -- f (y) + ~ . n - 1/6 x ~(fon- 1/3) + O(n-2/3 log n) <~ Cn-1/3 + O(n-1/310gn). uniformly in y ~ n -1/3. It follows that supo~y~<~ fn(cn-U3;y) - f ( y ) <~ Op(n- 1/3 log n). That supo ~ y ~, f ( y ) - f , ( c n - 1/3; y) ~< Op(n- 1/3 log n) may be established similarly to complete the proof.
Proof of Corollary 1. To begin observe that $2\beta = -f'(0)$, if $f$ is continuously differentiable near zero. So, $e(x) := |f(0) - f(x) - 2\beta x|/x \to 0$ as $x \downarrow 0$. Let $M_n = \max_{k \le m} |\hat f_n(\alpha; x_k) - f(x_k)|$, where $m$ is as in the Corollary and $\alpha = c n^{-2/3}$. Then $M_n = o_p(n^{-1/3} \log n)$ and $\max_{k \le m} |n x_k/k - 1/f_0| \to^P 0$ as $n \to \infty$ (using Proposition 1 and the probability integral transformation). Now
$$|\hat f_n(\alpha;0+) - \hat f_n(\alpha;x_k) - 2\beta x_k| \le |[\hat f_n(\alpha;0+) - \hat f_n(\alpha;x_k)] - [f(0) - f(x_k)]| + |[f(0) - f(x_k)] - 2\beta x_k| \le 2M_n + e(x_k) x_k$$
for all $k \le m$. So,
$$|\hat f_n(\alpha;0+)[\hat f_n(\alpha;0+) - \hat f_n(\alpha;x_k)] - 2\bar\beta x_k| \le |[\hat f_n(\alpha;0+) - f_0][\hat f_n(\alpha;0+) - \hat f_n(\alpha;x_k)]| + f_0\,|\hat f_n(\alpha;0+) - \hat f_n(\alpha;x_k) - 2\beta x_k| \le M_n\{2M_n + [2\beta + e(x_k)] x_k\} + f_0[2M_n + e(x_k) x_k]$$
for all $k \le m$. The consistency of $\hat\beta_n$ follows immediately by dividing by $x_m$ and using Proposition 1. For $\hat\beta'_n$, let $s_r = \sum_{k=1}^m x_k^r$ for $r = 1, 2$. Then
$$|\hat\beta'_n - \bar\beta| \le \frac{1}{2 s_2}\left|\sum_{k=1}^m x_k\{\hat f_n(\alpha;0+)[\hat f_n(\alpha;0+) - \hat f_n(\alpha;x_k)] - 2\bar\beta x_k\}\right| \le 2\beta M_n + (f_0 + M_n)\max_{k \le m} |e(x_k)| + (M_n^2 + M_n f_0)\,\frac{s_1}{s_2},$$
which approaches zero in probability as $n \to \infty$, since
$$s_r = \sum_{k=1}^m x_k^r \sim f_0^{-r} n^{-r} \sum_{k=1}^m k^r \sim f_0^{-r} n^{-r} m^{r+1}/(r + 1)$$
in probability as $n \to \infty$ for $r = 1, 2$ (where $\sim$ means that the ratio of the two sides approaches one) and, therefore, $s_1/s_2 = O_p(n/m) = O_p(n^{1/3}/\log n)$ as $n \to \infty$. □
References

Csörgő, M. and P. Révész (1981). Strong Approximation in Probability and Statistics. Academic Press, New York.
Grenander, U. (1956). On the theory of mortality measurement. Part II. Skand. Akt. 39, 125-153.
Groeneboom, P. (1985). Estimating a monotone density. Proc. Berkeley Conf. in Honor of Neyman and Kiefer, vol. 2, 529-555. Wadsworth.
Komlós, J., P. Major and G. Tusnády (1975). An approximation of partial sums of independent random variables and sample D.F., I. Zeit. Wahr. 32, 111-132.
Prakasa-Rao, B.L.S. (1969). Estimation of a unimodal density. Sankhya (A) 31, 23-36.
Prakasa-Rao, B.L.S. (1983). Nonparametric function estimation. Academic Press, New York.
Robertson, T., F. Wright and R. Dykstra (1988). Order Restricted Inference. Wiley, New York.
Woodroofe, M. and J. Sun (1993). A penalized maximum likelihood estimator of f(0+) when f is non-increasing. Statistica Sinica 3, 501-515.