Journal of Statistical Planning and Inference 57 (1997) 29-38
ELSEVIER
Reducing the variance by smoothing

Luisa Turrin Fernholz

Department of Statistics, Temple University, Philadelphia, PA 19122, USA

Received 1 February 1995; revised 1 May 1995
Abstract
In this paper we show that versions of statistical functionals which are obtained by smoothing the corresponding empirical d.f. with an appropriate kernel can reduce the variance and the mean square error of the statistic. This is shown by studying the influence function of the functional. The smaller variance is achieved when the influence function is either discontinuous or piecewise linear with convexity towards the x-axis. Examples for M- and L-estimators are given.
AMS classifications: Primary 62G20; secondary 60F15

Keywords: Statistical functionals, Kernel estimator, Influence function, Smoothing, Asymptotic variance
1. Introduction
Let $X_1, \ldots, X_n$ be i.i.d. random variables with common d.f. $F$. The corresponding empirical d.f. $F_n$ is defined to be

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \Delta(x - X_i),$$

where

$$\Delta(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0. \end{cases}$$

We consider the d.f. estimator $\tilde F_n$ which is obtained by taking the convolution of $F_n$ with some density $k_n$, $\tilde F_n = F_n * k_n$. In this case

$$\tilde F_n(x) = (F_n * k_n)(x) = \frac{1}{n} \sum_{i=1}^{n} K_n(x - X_i),$$
0378-3758/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0378-3758(96)00033-X
where $K_n(x) = \int_{-\infty}^{x} k_n(t)\,dt$. The kernel sequence $k_n$ is generated by a symmetric density $k$ and a sequence of positive real numbers $\{a_n\}$ with $a_n = o(1)$, and is defined by

$$k_n(t) = \frac{1}{a_n}\, k\!\left(\frac{t}{a_n}\right), \qquad n \ge 1. \tag{1}$$
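As a rough illustration of definition (1), the sketch below evaluates $F_n$ and the smoothed estimator $(F_n * k_a)(x) = n^{-1}\sum_i K((x - X_i)/a)$ for a fixed bandwidth $a$. The Epanechnikov density $k(s) = \tfrac{3}{4}(1 - s^2)$ on $[-1, 1]$ is an assumption made only for this example (the paper requires just a symmetric density with support in $[-1, 1]$), and the function names are hypothetical.

```python
from bisect import bisect_right


def kernel_cdf(u):
    """CDF K of the Epanechnikov density k(s) = 0.75 * (1 - s**2) on [-1, 1].

    The kernel choice is an assumption for illustration only.
    """
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def empirical_cdf(sample, x):
    """F_n(x): fraction of observations X_i with X_i <= x."""
    xs = sorted(sample)
    return bisect_right(xs, x) / len(xs)


def smoothed_cdf(sample, x, a):
    """(F_n * k_a)(x) = (1/n) * sum_i K((x - X_i) / a), where k_a(t) = k(t/a)/a."""
    return sum(kernel_cdf((x - xi) / a) for xi in sample) / len(sample)


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0]
    print(empirical_cdf(data, 1.0), smoothed_cdf(data, 1.0, 0.5))
```

The step function $F_n$ jumps at each observation, while the smoothed version spreads each jump over an interval of half-width $a$.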
We shall use regular kernel sequences (see Fernholz, 1993), that is, sequences that satisfy
$$\int_{\{|t| > b_n\}} k_n(t)\,dt = o(n^{-1/2})$$

for some sequence $\{b_n\}$ of positive real numbers such that $b_n = o(n^{-1/2})$. Throughout this paper we shall assume that $k$ has support in $[-1, 1]$. In general, we shall denote the convolution of any function $h$ with $k$ by $\tilde h(x) = (h * k)(x) = \int h(x - t) k(t)\,dt$.

It was proved in Fernholz (1993) that for Fréchet or Hadamard differentiable functionals, the new statistic $T(\tilde F_n)$ has the same asymptotic (normal) distribution as $T(F_n)$ when the influence function (see Hampel, 1974) satisfies certain regularity conditions. This fact suggests that any advantage of $T(\tilde F_n)$ over $T(F_n)$ would be more apparent for small samples. To study this, we must try to separate the sample size from the kernel sequence. Let $\tilde T_n$ be the functional defined by

$$\tilde T_n(G) = T(G * k_n), \tag{2}$$

where $G$ is any d.f., $\{k_n\}$ is a kernel sequence with support in $[-a_n, a_n]$, and $*$ denotes the convolution

$$(G * k_n)(x) = \int G(x - t) k_n(t)\,dt.$$

The new functional $\tilde T_n$ depends on $n$, on $T$, and on the kernel $k_n$. Let us now fix $k = k_\nu$, for some $\nu \ge 1$, and write $\tilde T = \tilde T_\nu$. The statistic $\tilde T(F_n)$ will depend on a fixed kernel $k$ which now is independent of the sample size $n$. The purpose of temporarily fixing the kernel while allowing the sample size to vary is to allow us to study the individual functionals in the sequence $\{\tilde T_\nu\}$. This will then be used to analyze the effects of smoothing which is dependent on sample size.

The new statistic $\tilde T(F_n)$ estimates $\tilde T(F) = T(\tilde F)$ rather than $T(F)$, so a bias, $\tilde T(F) - T(F)$, is introduced. For symmetric $F$ and $T$ corresponding to a location estimate of the center of symmetry, we have $\tilde T(F) = T(F)$, so the bias will be zero. In general, if $F$ has a bounded second derivative, the Taylor expansion of $F$ gives

$$\begin{aligned}
\tilde F(x) - F(x) &= \int_{-a}^{a} \bigl(F(x - t) - F(x)\bigr) k_a(t)\,dt \\
&= \tfrac{1}{2} F''(x) \int_{-a}^{a} t^2 k_a(t)\,dt + o(a^2) \\
&= \tfrac{1}{2} F''(x)\, a^2 \int_{-1}^{1} s^2 k(s)\,ds + o(a^2) \\
&\le M\, O(a^2) + o(a^2) = O(a^2),
\end{aligned}$$
where $F''(x) \le M$ for all $x$. Therefore, for the supremum norm $\|\cdot\|$, we have $\|\tilde F - F\| = O(a^2)$. Now, for a Fréchet differentiable functional $T$, the bias is given by

$$\begin{aligned}
\tilde T(F) - T(F) &= \int IF_{T,F}(x)\,d(\tilde F - F)(x) + \mathrm{Rem}(\tilde F - F) \\
&= \int \bigl(F(x) - \tilde F(x)\bigr)\,d(IF_{T,F})(x) + \mathrm{Rem}(\tilde F - F) \\
&\le O(\|\tilde F - F\|) + o(\|\tilde F - F\|) = O(a^2)
\end{aligned}$$

when $IF$ satisfies certain regularity conditions. This follows immediately from the definition of the Fréchet derivative. It follows that the contribution of the bias to the mean square error is $O(a^4)$.

We now consider $IF_{\tilde T,F}$, the influence function of $\tilde T$ at $F$. It was proved in Fernholz (1993) that $IF_{\tilde T,F} = IF_{T,\tilde F} * k = \widetilde{IF}_{T,\tilde F}$. Let $\sigma^2$ and $\tilde\sigma^2$ denote the asymptotic variances of $T$ and $\tilde T$, respectively. We introduce the corresponding generalized variances by defining

$$\sigma^2(G) = \int \bigl(IF_{T,F}(x)\bigr)^2\,dG(x) \quad\text{and}\quad \tilde\sigma^2(G) = \int \bigl(IF_{\tilde T,F}(x)\bigr)^2\,dG(x), \tag{3}$$

where $G$ is any d.f. near $F$. This is in the spirit of robustness, which allows $G$ to vary in a neighborhood of the model. Clearly, when $G = F$ then $\sigma^2(G) = \sigma^2$ and $\tilde\sigma^2(G) = \tilde\sigma^2$. These generalized variances will be compared for statistical functionals with influence functions which are either constant with a jump at some point $x_0$ or are continuous and piecewise linear with certain convexity. Examples of such functionals include sample quantiles, some L-estimators, and M-estimators.

In Section 2 we present the main results in Propositions 1 and 2 and we give direct applications to L- and M-estimators. The proofs of three preliminary lemmas are given in the appendix.
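The $O(a^2)$ bias expansion obtained from the Taylor argument in this section can be checked numerically. The sketch below is only an illustration under two assumptions that are not in the paper: $F$ is taken to be the standard normal d.f., and $k$ is the Epanechnikov density, for which $\int s^2 k(s)\,ds = 1/5$. It compares $\tilde F(x) - F(x)$, computed by a midpoint rule, with the leading term $\tfrac12 F''(x)\, a^2 \int s^2 k(s)\,ds$.

```python
import math


def k(s):
    """Epanechnikov density on [-1, 1] (an assumed kernel, for illustration)."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def normal_cdf(x):
    """Standard normal d.f., used here as an assumed model F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothing_bias(x, a, steps=2000):
    """(F * k_a)(x) - F(x) = integral over [-1, 1] of (F(x - a*s) - F(x)) k(s) ds."""
    h = 2.0 / steps
    fx = normal_cdf(x)
    total = 0.0
    for j in range(steps):
        s = -1.0 + (j + 0.5) * h  # midpoint of the j-th cell
        total += (normal_cdf(x - a * s) - fx) * k(s)
    return total * h


def leading_term(x, a):
    """0.5 * F''(x) * a^2 * (integral of s^2 k(s) ds), with F''(x) = -x * pdf(x)."""
    second_derivative = -x * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    second_moment = 0.2  # exact value for the Epanechnikov density
    return 0.5 * second_derivative * a * a * second_moment


if __name__ == "__main__":
    for a in (0.2, 0.1, 0.05):
        print(a, smoothing_bias(1.0, a), leading_term(1.0, a))
```

Halving the bandwidth should divide the bias by roughly four, in line with the $O(a^2)$ rate.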
2. Main results

Throughout this section, $T$ will denote a statistical functional with influence function $IF_{T,F}$ and $\tilde T$ will be the associated functional defined by (2) when $a = a_\nu$ is the fixed bandwidth for the regular kernel $k_a$ with support in $[-a, a]$.
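Lemma 1. Let

$$\phi(x) = \begin{cases} -1 & \text{if } x < 0, \\ \phantom{-}1 & \text{if } x \ge 0, \end{cases}$$

and let $k_a$ be a regular kernel with support $[-a, a]$. For $\tilde\phi = \phi * k_a$ there exists a constant $C > 0$ such that

$$\int_{-a}^{a} \phi^2(x)\,dx - \int_{-a}^{a} \bigl(\tilde\phi(x)\bigr)^2\,dx \ge C a.$$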
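A small numerical check of Lemma 1 may help fix ideas. The sketch assumes the Epanechnikov kernel (not specified in the paper) and uses the closed form $\tilde\phi(x) = \int_{-x/a}^{x/a} k(s)\,ds = 2K(x/a) - 1$ derived in the appendix. For this particular kernel the gap works out to $(36/35)\,a$, so it is linear in $a$ as the lemma states. Function names are hypothetical.

```python
def kernel_cdf(u):
    """CDF of the Epanechnikov density on [-1, 1] (assumed kernel)."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def smoothed_sign(x, a):
    """(phi * k_a)(x) for the sign-type function phi of Lemma 1."""
    return 2.0 * kernel_cdf(x / a) - 1.0


def lemma1_gap(a, steps=2000):
    """Midpoint-rule value of the integral over [-a, a] of phi^2 - (phi * k_a)^2."""
    h = 2.0 * a / steps
    total = 0.0
    for j in range(steps):
        x = -a + (j + 0.5) * h
        total += 1.0 - smoothed_sign(x, a) ** 2  # phi^2 = 1 everywhere
    return total * h


if __name__ == "__main__":
    for a in (0.5, 0.1, 0.02):
        print(a, lemma1_gap(a) / a)
```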
An immediate consequence of Lemma 1 is the following.
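Proposition 1. Let $T$ be a functional with influence function $IF_{T,F}(x) = c\,(\Delta(x - x_0) - \tfrac12)$, where $x_0$ is fixed, and suppose that $c = c(F)$ is a constant which depends on the d.f. $F$. Let $k_a$ be a regular kernel with support $[-a, a]$, and suppose that $\tilde c = c(F * k_a)$ satisfies $c - \tilde c = o(a)$. Let $\tilde F$, $\sigma^2$, and $\tilde\sigma^2$ be defined as in Section 1. Then there exists a constant $M > 0$ such that for small enough $a$

$$\sigma^2(G) - \tilde\sigma^2(G) > M a$$

for any d.f. $G$ with density satisfying $G'(x) = g(x) > m > 0$ on $(x_0 - a, x_0 + a)$.

Proof. We have $IF_{T,F}(x) = c\,(\Delta(x - x_0) - \tfrac12)$, so for the smoothed functional $\tilde T$,

$$IF_{\tilde T,F} = \widetilde{IF}_{T,\tilde F} = \tilde c\, c^{-1}\, \widetilde{IF}_{T,F}.$$

Now,

$$\begin{aligned}
\sigma^2(G) - \tilde\sigma^2(G) &= \int \Bigl( \bigl(IF_{T,F}(x)\bigr)^2 - \tilde c^{\,2} c^{-2} \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2 \Bigr)\,dG(x) \\
&= (1 - \tilde c^{\,2} c^{-2}) \int \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2\,dG(x) + \int \Bigl( \bigl(IF_{T,F}(x)\bigr)^2 - \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2 \Bigr)\,dG(x).
\end{aligned} \tag{4}$$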
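We have

$$(1 - \tilde c^{\,2} c^{-2}) \int \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2\,dG(x) = (c - \tilde c)(c + \tilde c)\, c^{-2} \int \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2\,dG(x) = o(a)$$

by hypothesis, and

$$\int \Bigl( \bigl(IF_{T,F}(x)\bigr)^2 - \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2 \Bigr)\,dG(x) \ge m \int_{x_0 - a}^{x_0 + a} \Bigl( \bigl(IF_{T,F}(x)\bigr)^2 - \bigl(\widetilde{IF}_{T,F}(x)\bigr)^2 \Bigr)\,dx \ge m C a$$

by Lemma 1. Hence, the proposition is proved. □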
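For the sample-quantile case covered by Proposition 1, a smoothed quantile can be computed as a root of $\tilde F_n(x) = p$. The following is a minimal sketch rather than the paper's procedure: it assumes the Epanechnikov kernel, a fixed bandwidth `a`, and plain bisection, and all function names are hypothetical.

```python
def kernel_cdf(u):
    """CDF of the Epanechnikov density on [-1, 1] (assumed kernel)."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def smoothed_cdf(sample, x, a):
    """(F_n * k_a)(x) = (1/n) * sum_i K((x - X_i) / a)."""
    return sum(kernel_cdf((x - xi) / a) for xi in sample) / len(sample)


def smoothed_quantile(sample, p, a, tol=1e-10):
    """Smallest x with smoothed_cdf(x) >= p, found by bisection (0 < p < 1)."""
    lo = min(sample) - a  # the smoothed d.f. equals 0 here
    hi = max(sample) + a  # the smoothed d.f. equals 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(sample, mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(smoothed_quantile(data, 0.5, a=1.5))
```

Because $\tilde F_n$ is continuous and nondecreasing, bisection is enough here; for a sample that is symmetric about a point, the smoothed median coincides with that point.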
Note that the inequality (4) holds for all $G$ with density bounded away from zero in $(x_0 - a, x_0 + a)$. In general, $x_0$ may depend on $F$; however, any change in $x_0$ when $F$ is replaced by $\tilde F$ will be of the same order as the bias. Since the bias has been shown to be $O(a^2)$, this change will not affect the results.

Proposition 1 can be applied directly to sample quantiles, and to L-estimates with a discontinuous influence function.

The next two lemmas, whose proofs can be found in the appendix, will be used to show that $\tilde\sigma^2$, the variance corresponding to $\tilde T$, may be smaller or larger than $\sigma^2$ according to whether the convexity of the influence function is 'towards' or 'away from' the x-axis. Lemma 1 corresponds to a function with convexity 'away' from the x-axis, whereas in Lemma 2 the convexity is 'towards' the x-axis.

Lemma 2. Let $\phi(x) = |x|$ and let $k_a$ be a regular kernel with support in $[-a, a]$. Let $\tilde\phi = \phi * k_a$. Then there exists a constant $C > 0$ such that

$$\int_{-a}^{a} \tilde\phi(x)\,dx - \int_{-a}^{a} \phi(x)\,dx \ge C a^2.$$

Lemma 3. Let $\phi(x) = |x| - 1$, and let $k_a$ be as in Lemma 2. Then there exists a constant $C > 0$ such that for small enough $a$,

$$\int_{-a}^{a} \phi^2(x)\,dx - \int_{-a}^{a} \bigl(\tilde\phi(x)\bigr)^2\,dx \ge C a^2.$$

Since the convolution does not affect linear functions, the results of Lemma 3 can be applied to influence functions which are locally of the form $IF(x) = \alpha + \beta |x| + \gamma x$, where $\alpha$, $\beta$, and $\gamma$ are constants. Hence we have
Proposition 2. Let $F$ be any d.f. and let $T$ be a functional with influence function $IF = IF_{T,F}$ of the form

$$IF(x) = \begin{cases} -b & \text{if } x \le -b, \\ \phantom{-}x & \text{if } -b < x < b, \\ \phantom{-}b & \text{for } b \le x, \end{cases}$$

for some number $b > 0$. Let $k_a$ be a regular kernel with support in $[-a, a]$ and let $\sigma^2$ and $\tilde\sigma^2$ be defined as in (3). If $G$ is a d.f. with density $g$ which satisfies $g(x) \ge m > 0$ for some constant $m$ and all $x$ near $b$, then there exists $M > 0$ such that

$$\sigma^2(G) - \tilde\sigma^2(G) > M a^2$$

uniformly in $G$, for small enough $a$.

Proof. Choose $a$ such that $g(x) > m$ for $x \in (b - a, b + a)$, and such that $\widetilde{IF}^2(x) \le IF^2(x)$ for all $x$, where $\widetilde{IF} = IF * k_a$ and $IF^2(x) = (IF(x))^2$. Then

$$\begin{aligned}
\sigma^2(G) - \tilde\sigma^2(G) &= \int \bigl(IF^2(x) - \widetilde{IF}^2(x)\bigr)\,dG(x) \\
&= \int \bigl(IF^2(x) - \widetilde{IF}^2(x)\bigr) g(x)\,dx \\
&\ge m \int_{b-a}^{b+a} \bigl(IF^2(x) - \widetilde{IF}^2(x)\bigr)\,dx \\
&\ge m C a^2
\end{aligned}$$

by Lemma 3. □
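The $a^2$ rate in Proposition 2 can be illustrated numerically. The sketch below makes several assumptions that are not in the paper: $G$ is the standard normal d.f., the kernel is Epanechnikov, and $0 < a < b$ so that the smoothing windows around $b$ and $-b$ do not interact. It integrates $(IF^2 - \widetilde{IF}^2)\,g$ over $[b - a, b + a]$ and doubles the result by symmetry, since $IF * k_a = IF$ away from the two kinks.

```python
import math


def k(s):
    """Epanechnikov density on [-1, 1] (assumed kernel)."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def influence(x, b):
    """Influence function of Proposition 2: x clipped to [-b, b]."""
    return max(-b, min(b, x))


def smoothed_influence(x, b, a, steps=400):
    """(IF * k_a)(x) = integral over [-1, 1] of IF(x - a*s) k(s) ds (midpoint rule)."""
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        s = -1.0 + (j + 0.5) * h
        total += influence(x - a * s, b) * k(s)
    return total * h


def variance_gap(b, a, steps=400):
    """Approximate sigma^2(G) - sigma~^2(G) for G standard normal (an assumption)."""
    if not 0.0 < a < b:
        raise ValueError("this sketch assumes 0 < a < b")
    h = 2.0 * a / steps
    total = 0.0
    for i in range(steps):
        x = b - a + (i + 0.5) * h
        g = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += (influence(x, b) ** 2 - smoothed_influence(x, b, a) ** 2) * g
    return 2.0 * total * h  # the kink at -b contributes the same amount


if __name__ == "__main__":
    for a in (0.2, 0.1, 0.05):
        print(a, variance_gap(1.0, a) / a ** 2)
```

The printed ratio should stay roughly constant as $a$ shrinks, which is what an $a^2$ gain looks like.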
The results of Proposition 2 are easily extended to more general cases of piecewise linear influence functions. But the reduction of the variance by smoothing depends on the convexity 'towards' the x-axis.

Proposition 2 can be applied directly to M-estimates of location. Recall that for a given function $\psi$, an M-estimator is a solution $T(F_n) = \theta_n$ of the equation

$$\int \psi(x - \theta_n)\,dF_n(x) = 0.$$

It was shown in Fernholz (1993) that for a fixed kernel $k$, the smoothed version is a root $\tilde T(F_n) = \tilde\theta_n$ of the equation

$$\int \psi(x - \tilde\theta_n)\,d\tilde F_n(x) = 0.$$

But

$$\begin{aligned}
\int \psi(x - \tilde\theta_n)\,d\tilde F_n(x) &= \iint \psi(x - \tilde\theta_n)\, k(x - t)\,dF_n(t)\,dx \\
&= \int \tilde\psi(t - \tilde\theta_n)\,dF_n(t),
\end{aligned}$$
where $\tilde\psi = \psi * k$. In this case, smoothing $F_n$ is equivalent to smoothing the function $\psi$. Consider the following examples:

(a) Huber estimators with $\psi$-function of the form

$$\psi(x) = \begin{cases} mx & \text{if } x < b, \\ mb & \text{if } x \ge b \end{cases}$$

(see Hampel et al., 1986, p. 105). Here the M-estimates with $\tilde\psi = \psi * k$ will have a smaller variance.

(b) Redescending M-estimates with Hampel's piecewise linear function:

$$\psi(x) = -\psi(-x) = \begin{cases} x & \text{for } 0 \le x \le a, \\ a & \text{for } a \le x \le b, \\ \dfrac{c - x}{c - b}\, a & \text{for } b \le x \le c, \\ 0 & \text{for } x \ge c \end{cases}$$

(see Huber, 1981, Section 4.8). In this case we should not smooth $\psi$ on the entire line. The variance will be reduced if we smooth $\psi$ only inside the interval $(-c, c)$ where $\psi$ is convex 'towards' the x-axis.
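The equivalence between smoothing $F_n$ and smoothing $\psi$ suggests a simple way to compute a smoothed M-estimate: replace $\psi$ by $\tilde\psi = \psi * k_a$ and solve $\sum_i \tilde\psi(X_i - \theta) = 0$. The sketch below does this for a Huber-type $\psi$ with slope $m = 1$ clipped at $\pm b$. The Epanechnikov kernel, the midpoint-rule convolution, and the bisection solver are all assumptions made for illustration, and the function names are hypothetical.

```python
def k(s):
    """Epanechnikov density on [-1, 1] (assumed kernel)."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def huber_psi(x, b):
    """Huber-type psi with slope 1, clipped at -b and b."""
    return max(-b, min(b, x))


def smoothed_psi(x, b, a, steps=400):
    """(psi * k_a)(x) = integral over [-1, 1] of psi(x - a*s) k(s) ds (midpoint rule)."""
    h = 2.0 / steps
    total = 0.0
    for j in range(steps):
        s = -1.0 + (j + 0.5) * h
        total += huber_psi(x - a * s, b) * k(s)
    return total * h


def m_estimate(sample, psi, tol=1e-10):
    """Root theta of sum_i psi(X_i - theta) = 0 by bisection.

    Assumes psi is odd and nondecreasing, so the sum is nonincreasing in theta.
    """
    lo, hi = min(sample), max(sample)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in sample) > 0.0:
            lo = mid  # estimating equation still positive: theta is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(m_estimate(data, lambda x: huber_psi(x, 1.0)))
    print(m_estimate(data, lambda x: smoothed_psi(x, 1.0, 0.5)))
```

Smoothing rounds the corners of $\psi$ at $\pm b$ and leaves it unchanged wherever it is linear over the whole kernel window, which is the 'convexity towards the x-axis' situation of Proposition 2.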
3. Closing Remarks

Note that when the sample size is $n$ and we let $\nu = n$ so that $a = a_\nu = a_n = o(n^{-1/2})$, then Proposition 1 implies that

$$\sigma^2 - \tilde\sigma^2 > M a_n,$$

with $a_n = o(n^{-1/2}) > 0$. For the cases of Proposition 2, we have

$$\sigma^2 - \tilde\sigma^2 \ge M (a_n)^2$$

with $a_n = o(n^{-1/2}) > 0$. This indicates that for the special case of the M-estimates mentioned above, the gain due to smoothing in terms of reducing variance is smaller when compared to sample quantiles and to some L-estimators.

Kernel smoothing will reduce the variance in the cases treated above, and since the bias involved in smoothing is either zero or $O(a_n^2)$, the contribution of the bias to the mean square error will be $O(a_n^4)$. Thus, the mean square error will decrease by smoothing. The advantage of $T(\tilde F_n)$ over $T(F_n)$ would be more apparent for small samples.

The results presented in Proposition 1 plus the fact that smoothing preserves asymptotic normality and reduces local shift sensitivity (see Fernholz, 1993) suggest that kernel smoothing is most beneficial when the influence function is discontinuous, as in the case of sample quantiles and some L-estimators.
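Appendix

Proof of Lemma 1. We have a symmetric kernel $k_a$ with support in $[-a, a]$, such that $k_a(x) = a^{-1} k(x/a)$. Then for $x \in (-a, a)$,

$$\begin{aligned}
\tilde\phi(x) &= \int_{-a}^{a} \phi(x - t)\, k_a(t)\,dt \\
&= \int_{-a}^{x} k_a(t)\,dt - \int_{x}^{a} k_a(t)\,dt \\
&= \int_{-a}^{x} k_a(t)\,dt - \int_{-a}^{-x} k_a(t)\,dt \\
&= \int_{-x}^{x} k_a(t)\,dt \\
&= \int_{-x/a}^{x/a} k(s)\,ds.
\end{aligned}$$

This shows that $\tilde\phi$ is nondecreasing. It also shows that we can find a constant $c$, $0 < c < 1$, such that $\tilde\phi(ca) \le \tfrac12$. Since $|\tilde\phi| \le |\phi|$, we have $\phi^2 - \tilde\phi^2 \ge 0$. Therefore, if we choose $c$ as above,

$$\int_{-a}^{a} \phi^2(x)\,dx - \int_{-a}^{a} \tilde\phi^2(x)\,dx = \int_{-a}^{a} \bigl(|\phi(x)|^2 - |\tilde\phi(x)|^2\bigr)\,dx \ge \int_{0}^{ca} \bigl(1 - \tfrac14\bigr)\,dx \ge \tfrac34\, c a.$$

□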
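Proof of Lemma 2. Since $k_a(x) = a^{-1} k(x/a)$ and $k_a$ has support in $[-a, a]$, we have

$$\begin{aligned}
\tilde\phi(x) &= \int_{-a}^{a} |x - t|\, k_a(t)\,dt \\
&= \int_{-a}^{a} |x - t|\, k(t/a)\,d(t/a) \\
&= \int_{-1}^{1} |x - as|\, k(s)\,ds \\
&= a \int_{-1}^{1} |x/a - s|\, k(s)\,ds \\
&= a \int_{0}^{1} \bigl(|s + x/a| + |s - x/a|\bigr) k(s)\,ds \\
&\ge a \int_{0}^{1} 2 s\, k(s)\,ds \\
&> c a
\end{aligned}$$

for some positive constant $c$. Without loss of generality, we can assume $c \le 1$. Note that $\tilde\phi(x) \ge \phi(x)$ for all $x$, and therefore

$$\int_{-a}^{a} \bigl(\tilde\phi(x) - \phi(x)\bigr)\,dx \ge \int_{-ca}^{ca} \bigl(ca - |x|\bigr)\,dx \ge 2c^2a^2 - c^2a^2,$$

and the lemma is proved with $C = c^2$. □

Proof of Lemma 3. Let $\phi(x) = |x| - 1$, and let $k_a$ be as in Lemma 2. Note that the same argument used in Lemma 2 shows that $\tilde\phi(x) > ca - 1$. Also $\tilde\phi(x) \ge \phi(x)$ for all $x$. If we choose $a \le \tfrac12$, then we have $\phi(x) \le -\tfrac12$ and $\tilde\phi(x) \le 0$ for $x \in [-a, a]$. Hence it follows that $\phi(x) + \tilde\phi(x) \le -\tfrac12$, and so

$$\begin{aligned}
\int_{-a}^{a} \phi^2(x)\,dx - \int_{-a}^{a} \tilde\phi^2(x)\,dx &= \int_{-a}^{a} \bigl(\tilde\phi(x) - \phi(x)\bigr)\bigl(-\phi(x) - \tilde\phi(x)\bigr)\,dx \\
&\ge \tfrac12 \int_{-a}^{a} \bigl(\tilde\phi(x) - \phi(x)\bigr)\,dx \\
&\ge \tfrac12 \int_{-ca}^{ca} \bigl(ca - |x|\bigr)\,dx \\
&\ge C a^2
\end{aligned}$$

for some constant $C > 0$ by Lemma 2. □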
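The $a^2$ scaling in Lemmas 2 and 3 can also be checked numerically. The sketch below assumes the Epanechnikov kernel, for which the Lemma 2 gap equals $0.2\,a^2$ exactly, and approximates both the convolution and the outer integrals by a midpoint rule. It is an illustration under those assumptions, not part of the proofs, and the function names are hypothetical.

```python
def k(s):
    """Epanechnikov density on [-1, 1] (assumed kernel)."""
    return 0.75 * (1.0 - s * s) if abs(s) <= 1.0 else 0.0


def midpoints(lo, hi, steps):
    """Midpoints of `steps` equal cells on [lo, hi], plus the cell width."""
    h = (hi - lo) / steps
    return [lo + (j + 0.5) * h for j in range(steps)], h


def smoothed_abs(x, a, steps=400):
    """(|.| * k_a)(x) = integral over [-1, 1] of |x - a*s| k(s) ds."""
    nodes, h = midpoints(-1.0, 1.0, steps)
    return h * sum(abs(x - a * s) * k(s) for s in nodes)


def lemma2_gap(a, steps=400):
    """Integral over [-a, a] of (phi~ - phi) for phi(x) = |x|."""
    nodes, h = midpoints(-a, a, steps)
    return h * sum(smoothed_abs(x, a) - abs(x) for x in nodes)


def lemma3_gap(a, steps=400):
    """Integral over [-a, a] of phi^2 - phi~^2 for phi(x) = |x| - 1.

    Uses (|.| - 1) * k_a = (|.| * k_a) - 1, since k_a integrates to one.
    """
    nodes, h = midpoints(-a, a, steps)
    return h * sum((abs(x) - 1.0) ** 2 - (smoothed_abs(x, a) - 1.0) ** 2
                   for x in nodes)


if __name__ == "__main__":
    for a in (0.2, 0.1, 0.05):
        print(a, lemma2_gap(a) / a ** 2, lemma3_gap(a) / a ** 2)
```

Both ratios should remain bounded away from zero as $a$ decreases, consistent with a gain of order $a^2$.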
Acknowledgements

The author would like to thank the referees for constructive comments which led to an improved presentation of this paper.
References

Fernholz, L.T. (1983). Von Mises Calculus for Statistical Functionals, Lecture Notes in Statistics, Vol. 19. Springer, New York.

Fernholz, L.T. (1991). Almost sure convergence of smoothed empirical distribution functions. Scand. J. Statist. 18, 255-262.

Fernholz, L.T. (1993). Smoothed versions of statistical functionals. In: S. Morgenthaler, E. Ronchetti and A.S. Stahel, Eds., New Directions in Statistical Data Analysis and Robustness. Birkhäuser Verlag.

Hampel, F.R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69, 1887-1896.

Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.

Huber, P.J. (1981). Robust Statistics. Wiley, New York.