Statistics & Probability Letters 9 (1990) 367-374
North-Holland
April 1990

ROBUSTNESS AGAINST UNEXPECTED DEPENDENCE IN THE LOCATION MODEL

Ruben H. ZAMAR
Department of Statistics, University of British Columbia, Vancouver, B.C., Canada V6T 1W5

Received April 1989
Revised July 1989

Abstract: Robustness of M-estimates of location against unexpected dependence in the data is studied via a min-max asymptotic variance approach. A measure of dependence is defined and used to construct a neighborhood of the classical location model which includes dependent observations. The solution of the min-max problem is a Huber-type M-estimate with psi-function ψ_c. The tuning constant c tends to zero, i.e. ψ_c(x) → sign(x) (the sample median score function), when the maximum degree of dependence allowed in the neighborhood increases. Thus the median, which is the most bias-robust estimate of location, is also approximately the most variance-robust in the present context.

Keywords: Robustness, M-estimates, dependence.
1. Introduction

In this paper we consider the asymptotic behavior of M-estimates of location when the joint distribution F_n(x_1, ..., x_n) of the data deviates from the classical location model

    F_n(x_1, ..., x_n) = F_0(x_1 − θ) ··· F_0(x_n − θ),   F_0 known.    (1)

From the robustness point of view one is concerned with the performance of an estimate when the underlying distribution of the data deviates from the given model. Possible deviations from model (1) are: (i) the common marginal distribution has heavier tails than F_0, and (ii) the data are serially correlated. The bulk of the robustness literature deals with case (i); see, for example, Hampel et al. (1986) and references therein. For discussions of case (ii) see Portnoy (1977, 1979), Bickel and Herzberg (1979), Martin and Li (1985), Beran and Kunsch (1985) and Hampel et al. (1986, Chapter 8). In this paper we consider both of these violations simultaneously, and solve a min-max asymptotic variance problem similar to that in Huber (1964). In the present context, the sample median emerges as a good approximation to the min-max variance estimate.
2. M-estimates of location

Huber (1964) defined M-estimates of location as solutions T_n of the estimating equation

    Σ_{i=1}^n ψ(X_i − T_n) = 0,    (2)

* Supported by Natural Sciences and Engineering Research Council of Canada Grant A9276.

0167-7152/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
where ψ is an arbitrary non-decreasing odd function. Observe that in the classical location model maximum likelihood estimates satisfy this equation with score function

    ψ_ML(x) = −f_0'(x)/f_0(x).    (3)

Of course this definition only applies when the scale parameter σ is known. When σ is unknown the residuals X_i − T_n in (2) must be divided by a robust estimate of scale. For simplicity, we assume throughout this paper that σ = 1 is known. Li and Zamar (1989) show that the min-max result of Section 5 can also be derived for the general case, under suitable regularity conditions. Huber (1964) showed that, under mild regularity assumptions on ψ and F, M-estimates are asymptotically normal with asymptotic variance

    AV(ψ, F) = E_F{ψ²(X − θ)} / [E_F{ψ'(X − θ)}]².    (4)
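Since ψ is non-decreasing, the left hand side of (2) is non-increasing in T_n, so the estimating equation can be solved by bisection. The following minimal sketch (illustrative; the function names, the data, and the tuning constant 1.345 are not from the paper) does this for the Huber score with known scale σ = 1.

```python
import statistics

def huber_psi(x, c):
    """Huber score function: psi_c(x) = max(-c, min(x, c))."""
    return max(-c, min(x, c))

def m_estimate(data, c, tol=1e-10):
    """Solve sum_i psi_c(x_i - T) = 0 for T by bisection.

    The sum is non-increasing in T, non-negative at T = min(data)
    and non-positive at T = max(data), so a root lies in between.
    """
    lo, hi = min(data), max(data)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(huber_psi(x - mid, c) for x in data) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

data = [0.1, -0.2, 0.4, 8.0, -0.3]   # one gross outlier at 8.0
est = m_estimate(data, c=1.345)
```

As c → ∞ the estimate approaches the sample mean, and as c → 0 it approaches the sample median, in line with the limiting behavior described in the abstract; for intermediate c the estimate lies between the two.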
3. The model's neighborhood

Let {X_i} be a strictly stationary process. A measure of dependence between X_0 and X_i, i = 1, 2, ..., is given by

    d_i = sup_{A,B} |P(X_0 ∈ A, X_i ∈ B) − P(X_0 ∈ A)P(X_i ∈ B)|,    (5)

where A and B are Borel sets of R¹. A measure of dependence for the process {X_i} is given by

    D({X_i}) = 8 Σ_{i=1}^∞ d_i / (1 + 8 Σ_{i=1}^∞ d_i).    (6)

Observe that 0 ≤ D({X_i}) ≤ 1, D({X_i}) = 0 if and only if the X_i are independent, and D({X_i}) = 1 if and only if Σ_{i=1}^∞ d_i diverges. Fix D_0 < 1, 0 < ε < 0.5 and F_0 symmetric. Let F_{ε,D_0} be the class of processes {X_i} which satisfy:

Assumption 1. The marginal distribution F of X_i belongs to the ε-contamination family

    F(x) = (1 − ε)F_0(x − θ) + εH(x − θ),    (7)

where H is an arbitrary symmetric distribution.

Assumption 2. D({X_i}) ≤ D_0.

Assumption 3. For any bounded score function ψ, the corresponding M-estimate is asymptotically normal with asymptotic variance

    AV(ψ) = [E{ψ²(X_0)} + 2 Σ_{i=1}^∞ Cov{ψ(X_0), ψ(X_i)}] / [E{ψ'(X_0)}]².    (8)

The 'classical' ε-contamination family studied by Huber (1964) is obtained when D_0 = 0. The case in which the contamination is due to data dependence only is obtained when ε = 0 and D_0 > 0. For 0 ≤ D_0 ≤ 1 and 0 < ε < 0.5, F_{ε,D_0} is a collection of models for data which deviate from a 'target' model by containing a certain fraction of outliers and/or a certain degree of serial correlation.
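The supremum in (5) over all Borel sets is not directly computable, but restricting A and B to half-lines (−∞, a] and (−∞, b] gives a Monte Carlo lower bound on d_i. The sketch below (illustrative only; the sample sizes, grid, and AR(1) competitor are not from the paper) contrasts an independent Gaussian sequence with a Gaussian AR(1) sequence.

```python
import math
import random

def dep_lower_bound(xs, lag, grid):
    """Lower bound on d_lag in (5): A, B restricted to half-lines
    (-inf, a], (-inf, b] with a, b ranging over a threshold grid."""
    pairs = [(xs[t], xs[t + lag]) for t in range(len(xs) - lag)]
    n = len(pairs)
    best = 0.0
    for a in grid:
        pa = sum(1 for x, _ in pairs if x <= a) / n
        for b in grid:
            pb = sum(1 for _, y in pairs if y <= b) / n
            pab = sum(1 for x, y in pairs if x <= a and y <= b) / n
            best = max(best, abs(pab - pa * pb))
    return best

rng = random.Random(0)
n = 20000
indep = [rng.gauss(0.0, 1.0) for _ in range(n)]
rho = 0.8
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(rho * ar1[-1] + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0))

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
d_indep = dep_lower_bound(indep, 1, grid)
d_ar1 = dep_lower_bound(ar1, 1, grid)
```

For the AR(1) sequence with ρ = 0.8 the exact value of the (5)-type quantity at a = b = 0 is arcsin(ρ)/(2π) ≈ 0.148 (a standard bivariate normal orthant probability), so d_ar1 should come out near 0.15, while the independent sequence gives a value close to zero.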
Remark 1. Assumption 3 holds if {X_i} satisfies the conditions of the functional central limit theorem in Billingsley (1968, Section 21). This includes φ-mixing processes with Σ_{i=1}^∞ φ_i^{1/2} < ∞ and linear processes of the form

    X_i = Σ_{j=−∞}^∞ a_j S_{i+j},    (9)

where the S_i are independent identically distributed with mean zero and finite variance.

In the sequel we assume without loss of generality that θ = 0.
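To see how the covariance sum in (8) inflates the asymptotic variance, consider the sample median (ψ(x) = sign(x)) under a stationary Gaussian AR(1) process with lag-one correlation ρ. For a bivariate normal pair with correlation ρ^i, Cov{sign(X_0), sign(X_i)} = (2/π) arcsin(ρ^i), and (since sign is not differentiable) the denominator of (8) is interpreted in the usual way as [2f(0)]². The following is an illustrative numerical sketch, not part of the paper:

```python
import math

def median_av_ar1(rho, terms=200):
    """Asymptotic variance (8) of the sample median under a
    standard Gaussian AR(1) process with lag-one correlation rho.

    Numerator:   1 + 2 * sum_{i>=1} (2/pi) * arcsin(rho**i)
    Denominator: (2 * phi(0))**2 = 2/pi for the standard normal
    """
    num = 1.0 + 2.0 * sum((2.0 / math.pi) * math.asin(rho ** i)
                          for i in range(1, terms + 1))
    den = (2.0 * (1.0 / math.sqrt(2.0 * math.pi))) ** 2
    return num / den

av_indep = median_av_ar1(0.0)   # classical i.i.d. value pi/2
av_dep = median_av_ar1(0.5)    # strictly larger under positive correlation
```

Even moderate serial correlation makes the covariance series contribute substantially, which is the phenomenon the neighborhood F_{ε,D_0} is designed to capture.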
4. The maximum asymptotic variance

In this section we derive the maximum asymptotic variance, m(ψ), over F_{ε,D_0} of an M-estimate with a monotone score function ψ. Notice that m(ψ) = ∞ for any unbounded score function ψ, for in this case the process {X_i} can be chosen so that Var(ψ(X_0)) is arbitrarily large. Hence, for the min-max variance theory, attention can be restricted to bounded ψ. The maximum of ψ is denoted by

    M(ψ) = lim_{t→∞} ψ(t).    (10)

Let D̃_0 = D_0/(1 − D_0). Then 8 Σ_{i=1}^∞ d_i ≤ D̃_0 for all {X_i} ∈ F_{ε,D_0}. As in Billingsley (1979) it can be shown that

    E{ψ(X_0)ψ(X_i)} ≤ 4M²(ψ)d_i.    (11)

This and the assumptions on ψ and F_0 imply that the numerator T² of (8) satisfies

    T² = (1 − ε)E_{F_0}{ψ²(Z)} + εE_H{ψ²(Z)} + 2 Σ_{i=1}^∞ Cov{ψ(X_0), ψ(X_i)}
       ≤ (1 − ε)E_{F_0}{ψ²(Z)} + M²(ψ)(ε + D̃_0).    (12)

Furthermore, since ψ' is non-negative, the denominator of (8) satisfies

    E{ψ'(X_0)} ≥ (1 − ε)E_{F_0}{ψ'(Z)}.    (13)

From (12) and (13),

    m(ψ) ≤ [(1 − ε)E_{F_0}{ψ²(Z)} + M²(ψ)(ε + D̃_0)] / [(1 − ε)E_{F_0}{ψ'(Z)}]².    (14)

Lemma 1. Suppose that {X_i} satisfies Assumptions 1-3 and that ψ satisfies (10). Then m(ψ) is equal to the right hand side of (14).
Proof. Let

    X_i = (1 − B_i)Z_i + B_i(1 − b_i)U_i + B_i b_i V_i,    (15)

where {B_i}, {b_i}, {Z_i}, {U_i} and {V_i} are mutually independent, the B_i and b_i are Bernoulli random variables with P(B_i = 1) = ε and P(b_i = 1) = δ, the Z_i are independent random variables with common distribution F_0, and the U_i are independent random variables with P(U_i = μ) = P(U_i = −μ) = ½. Finally, {V_i} is a Markov chain with states 0, μ and −μ, transition matrix

    ( 1−α   α/2   α/2 )
    ( 1−α    α     0  )
    ( 1−α    0     α  )

and stationary initial distribution ((1 − α), ½α, ½α). Table 1 gives the joint distribution of (V_0, V_i). If

    2ε²δ²α²/(1 − α) ≤ D̃_0,    (16)

then {X_i} ∈ F_{ε,D_0}. To see this observe that {X_i} satisfies Assumption 2 because d_i ≤ ε²δ²α^{i+1}/4 implies

    8 Σ_{i=1}^∞ d_i ≤ 2ε²δ²α²/(1 − α) ≤ D̃_0.

The process {X_i} satisfies Assumption 3 because it is a strictly stationary φ-mixing process (see Billingsley, 1979, Section 21). Assumption 1 is trivially satisfied. Using Table 1 it is easy to verify that

    E{ψ²(X_0)} = (1 − ε)E{ψ²(Z_0)} + {ε(1 − δ) + εδα}ψ²(μ),
    E{ψ'(X_0)} = (1 − ε)E{ψ'(Z_0)} + {ε(1 − δ) + εδα}ψ'(μ) + εδ(1 − α)ψ'(0)

and

    Cov{ψ(X_0), ψ(X_i)} = ε²δ² Cov{ψ(V_0), ψ(V_i)} = ε²δ²α^{i+1}ψ²(μ).

Table 1
Joint distribution of (V_0, V_i)

u_0     u_i     P(V_0 = u_0, V_i = u_i)
0       0       (1 − α)²
0       μ       α(1 − α)/2
0       −μ      α(1 − α)/2
μ       0       α(1 − α)/2
μ       μ       α²(1 + α^{i−1})/4
μ       −μ      α²(1 − α^{i−1})/4
−μ      0       α(1 − α)/2
−μ      μ       α²(1 − α^{i−1})/4
−μ      −μ      α²(1 + α^{i−1})/4
Let AV(ψ, α, μ) denote the asymptotic variance under {X_i} of an M-estimate with score function ψ. Then

    AV(ψ, α, μ) = [(1 − ε)E{ψ²(Z_0)} + {ε(1 − δ) + εδα}ψ²(μ) + 2ε²δ²α²ψ²(μ)/(1 − α)]
                  / [(1 − ε)E{ψ'(Z_0)} + {ε(1 − δ) + εδα}ψ'(μ) + εδ(1 − α)ψ'(0)]²,    (17)

and the lemma follows because the right hand side of (17) tends to the right hand side of (14) when α → 1 and μ → ∞.  □
5. Min-max variance

Let C be the class of all monotone score functions ψ which satisfy Assumption 3 and

    E_{F_0}{ψ'(Z)} = 1.    (18)

Since AV(ψ) = AV(kψ) for all k > 0, this is just a convenient normalization. Let C_M be the subset of C such that ψ(∞) = M. The following theorem is our main result.
Theorem 1. Suppose that F_0 has an even and positive density f such that A(x) = −{f'(x)/f(x)} satisfies: (i) A(x) is non-decreasing; (ii) lim_{x→∞} A(x) = ∞; (iii) 0 < E{A²(Z_0)} < ∞. Let 0 < ε < 0.5 and 0 ≤ D_0 < 1 be fixed. Then:

(a) There exists c = c(ε, D_0) such that ψ_c given by

    ψ_c(x) = max{−c, min(A(x), c)}    (19)

is min-max. That is, m(ψ_c) ≤ m(ψ) for all ψ ∈ C.

(b) Let

    h(t) = A(t)/{∫_0^t A²(x)f(x) dx}.    (20)

If h'(t) > 0 for all t > 0 then

    lim_{D_0→1} c(ε, D_0) = 0   for all 0 < ε < 0.5.    (21)

Proof. Let M_0 = {2f(0)}^{−1}. Since

    1 = E_{F_0}{ψ'(Z)} = ∫ ψ'(x)f(x) dx ≤ 2M(ψ)f(0),

any function ψ satisfying (18) must also satisfy M(ψ) ≥ M_0. It follows that

    C = ∪_{M ≥ M_0} C_M.

For each c > 0, let

    φ_c(x) = ψ_c(x)/E{ψ_c'(Z)}.
Then φ_c(∞) = c/E{ψ_c'(Z)} ≡ φ(c), and by (ii) lim_{c→∞} φ(c) = ∞ and lim_{c→0} φ(c) = M_0. So for each M ≥ M_0 there exists c = c(M) ≥ 0 with the property

    φ(c(M)) = M.

Now we shall show that for each ψ ∈ C_M,

    E_{F_0}{ψ²(Z)} ≥ E_{F_0}{φ²_{c(M)}(Z)}.

If d = d(M) = E_{F_0}{ψ'_{c(M)}(Z)} then, as in Hampel et al. (1986, Chapter 2),

    I(ψ) = ∫_{−∞}^∞ [ψ(x) − A(x)/d]² f(x) dx
         = ∫_{−∞}^∞ ψ²(x)f(x) dx − 2/d + (1/d²) ∫_{−∞}^∞ A²(x)f(x) dx
         = ∫_{−∞}^∞ ψ²(x)f(x) dx + K,

where K does not depend on ψ (the cross term equals 2/d because ∫ψ(x)A(x)f(x) dx = ∫ψ'(x)f(x) dx = 1 by (18)). Hence, minimizing I(ψ) over C_M is equivalent to minimizing E_{F_0}{ψ²(Z)} over C_M, and the latter is clearly achieved when ψ = φ_{c(M)}. Since for all ψ in C_M we have

    m(ψ) = [(1 − ε)E_{F_0}{ψ²(Z)} + M²(ε + D̃_0)] / (1 − ε)²,

it follows that m(ψ) ≥ m(φ_{c(M)}) for all ψ ∈ C_M.

For all values of M for which C_M is not empty there exists φ_c with φ(c) = M. So we conclude that

    inf_{ψ ∈ C} m(ψ) ≥ inf_c m(φ_c).
Since g(c) ≡ m(φ_c) → ∞ as c → ∞ and g is a continuous function of c, it has a global minimum at some c_0 = c_0(ε, D_0), proving (a). To prove (b) observe that

    g(c) = [(1 − ε)E_{F_0}{ψ_c²(Z)} + c²(ε + D̃_0)] / [(1 − ε)E_{F_0}{ψ_c'(Z)}]²,

and our assumption on A implies that g'(c) > 0 for all c ≥ 0 when D̃_0 is sufficiently large, so that the minimum of g is attained at c_0 = 0.  □
Table 2 shows optimal values of c for several values of ε and D_0 when F_0 is the standard normal distribution. Since in this case A(x) = x, condition (b) of Theorem 1 can be verified, and c = 0 for sufficiently large values of D_0. Observe that the presence of even a small amount of dependence has a severe effect on the value of the optimal c. For instance, the values of c when D_0 = 0.25 are roughly 50% smaller than those when D_0 = 0. Also notice that the effect of dependence is somewhat stronger for small values of ε.
Table 2
Optimal values of c for various choices of ε and D_0

D_0      ε = 0.01   ε = 0.05   ε = 0.10   ε = 0.15
0.00     1.94       1.40       1.14       0.98
0.01     1.76       1.34       1.11       0.96
0.05     1.34       1.16       0.99       0.88
0.25     0.76       0.70       0.65       0.60
0.50     0.44       0.41       0.39       0.36
0.75     0.21       0.19       0.18       0.17
0.99     0.00       0.00       0.00       0.00
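For the standard normal F_0 the quantities in (14) have closed forms for the Huber score ψ_c: E{ψ_c²(Z)} = 2Φ(c) − 1 − 2cφ(c) + 2c²(1 − Φ(c)) and E{ψ_c'(Z)} = 2Φ(c) − 1. Entries of Table 2 can then be checked by a one-dimensional grid search over c. The sketch below is illustrative (it is not the computation used for the table, and the grid bounds are arbitrary); with D_0 = 0 it recovers the classical Huber tuning constant.

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def max_av(c, eps, d0):
    """Right hand side of (14) for the Huber score psi_c, F_0 = N(0,1)."""
    dt0 = d0 / (1.0 - d0)                       # D~_0 = D_0/(1 - D_0)
    e_psi2 = 2*Phi(c) - 1 - 2*c*phi(c) + 2*c*c*(1 - Phi(c))
    e_dpsi = 2*Phi(c) - 1
    return ((1 - eps) * e_psi2 + c * c * (eps + dt0)) / ((1 - eps) * e_dpsi) ** 2

def optimal_c(eps, d0):
    """Grid search for the minimizer of (14) over the Huber family."""
    cs = [k / 1000.0 for k in range(10, 3000)]
    return min(cs, key=lambda c: max_av(c, eps, d0))

c_huber = optimal_c(0.05, 0.00)   # classical Huber constant, about 1.40
c_dep = optimal_c(0.05, 0.25)     # much smaller once dependence is allowed
```

The optimal c decreases monotonically in D_0, reproducing the pattern of Table 2 and the drift of ψ_c toward the median score.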
6. Concluding remarks

When D_0 = 0, Theorem 1(a) reduces to Huber's (1964) min-max variance result for ε-contamination neighborhoods. Huber showed that an M-estimate of location with psi-function (19) minimizes the maximum asymptotic variance m(ψ) among monotone functions ψ. In a related optimality problem, Hampel (1968) finds ψ to minimize AV(ψ, F_0) subject to a bound on the gross-error sensitivity

    γ*(ψ, F_0) = sup_x |ψ(x)| / E_{F_0}{ψ'(Z)}.

The solution to Hampel's problem is also an M-estimate with psi-function (19). The direct method used to prove our theorem provides insight into the close relation between Huber's and Hampel's optimality problems. In fact, from the given proof it becomes clear that Huber's optimality problem consists of minimizing the functional

    J(ψ) = (1 − ε)E_{F_0}{ψ²(Z)} + ε sup_x ψ²(x)    (22)

subject to (18). On the other hand, Hampel's problem consists of minimizing the first term of (22) subject to (18) and a bound on sup_x ψ(x). In Huber's case the bound on sup_x ψ(x) is not given explicitly but added, as a penalty, to the objective functional (cf. Hampel et al., 1986, Section 2.7).

When ψ is a redescending score function with

    max_x ψ(x) = ψ(A) = M   and   lim_{x→∞} ψ(x) = 0,

an argument similar to the proof of the Lemma shows that the right hand side of (14) is a lower bound for AV(ψ). Since AV(ψ, α, A) tends to the right hand side of (14) when α tends to one, we have that in general m(ψ) ≥ right hand side of (14), and our theorem also holds when redescending functions ψ are allowed in C. Extension of the present result to M and GM-estimates of regression is straightforward providing gross errors and serial correlation can affect only the response variable. The general case when all the variables can be affected by gross errors and serial correlation is outside the range of the present theory and deserves further study.
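The penalized form (22) can also be explored numerically: any monotone score normalized to satisfy (18) should have J(ψ) no smaller than the best member of the Huber family. A sketch (illustrative; the tanh score is an arbitrary competitor chosen here, not one discussed in the paper), with F_0 standard normal and ε = 0.05:

```python
import math

def gauss_expect(fn, lo=-8.0, hi=8.0, n=2000):
    """E{fn(Z)} for Z ~ N(0,1), by the trapezoid rule."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        w = 0.5 if k in (0, n) else 1.0
        total += w * fn(x) * math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    return total * h

def J_huber(c, eps):
    """J from (22) for psi_c, rescaled so that E{psi'(Z)} = 1."""
    k = gauss_expect(lambda x: 1.0 if abs(x) < c else 0.0)   # = 2*Phi(c) - 1
    e2 = gauss_expect(lambda x: max(-c, min(x, c)) ** 2)
    return ((1 - eps) * e2 + eps * c * c) / (k * k)

def J_tanh(eps):
    """J from (22) for the competitor psi(x) = tanh(x), same normalization."""
    k = gauss_expect(lambda x: 1.0 / math.cosh(x) ** 2)
    e2 = gauss_expect(lambda x: math.tanh(x) ** 2)
    return ((1 - eps) * e2 + eps * 1.0) / (k * k)

eps = 0.05
best_huber = min(J_huber(k / 50.0, eps) for k in range(10, 150))
```

Since the Huber score (19) minimizes (22) over all monotone normalized scores, the bounded, monotone tanh competitor必 comes out strictly worse, which a run of the sketch confirms.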
References

Beran, J. and H. Kunsch (1985), Location estimators for processes with long-range dependence, Research Rept. No. 40, Sem. fur Statist., ETH (Zurich).
Bickel, P.J. and A.M. Herzberg (1979), Robustness of design against autocorrelation in time I: Asymptotic theory, optimality for location and linear regression, Ann. Statist. 7, 77-95.
Billingsley, P. (1968), Convergence of Probability Measures (Wiley, New York).
Billingsley, P. (1979), Probability and Measure (Wiley, New York).
Hampel, F.R. (1968), Contributions to the theory of robust estimation, Ph.D. Thesis, Univ. of California (Berkeley, CA).
Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel (1986), Robust Statistics: The Approach Based on Influence Functions (Wiley, New York).
Huber, P.J. (1964), Estimation of a location parameter, Ann. Math. Statist. 35, 73-101.
Lee, C.H. and R.D. Martin (1984), Ordinary and proper location M-estimates for ARMA models, Tech. Rept. No. 29, Dept. of Statist., Univ. of Washington (Seattle, WA).
Li, B. and R.H. Zamar (1989), Min-max asymptotic variance M-estimates of location when scale is unknown, Tech. Rept., Dept. of Statist., Univ. of British Columbia (Vancouver, B.C.).
Portnoy, S.L. (1977), Robust estimation in dependent situations, Ann. Statist. 5, 22-43.
Portnoy, S.L. (1979), Further remarks on robust estimation in dependent situations, Ann. Statist. 7, 224-231.