
ROBUSTNESS AGAINST UNEXPECTED DEPENDENCE IN THE LOCATION MODEL

Ruben H. ZAMAR

Department of Statistics, University of British Columbia, Vancouver, B.C., Canada V6T 1W5

Statistics & Probability Letters 9 (1990) 367-374, North-Holland, April 1990

Received April 1989; Revised July 1989

Supported by Natural Sciences and Engineering Research Council of Canada Grant A9276.

Abstract: Robustness of M-estimates of location against unexpected dependence in the data is studied via a min-max asymptotic variance approach. A measure of dependence is defined and used to construct a neighborhood of the classical location model which includes dependent observations. The solution of the min-max problem is a Huber-type M-estimate with psi-function $\psi_c$. The tuning constant $c$ tends to zero, i.e. $\psi_c(x) \to \operatorname{sign}(x)$ (the sample median score function), when the maximum degree of dependence allowed in the neighborhood increases. Thus the median, which is the most bias-robust estimate of location, is also approximately the most variance-robust in the present context.

Keywords: robustness, M-estimates, dependence.

1. Introduction

In this paper we consider the asymptotic behavior of M-estimates of location when the joint distribution of the data $F_n(x_1,\dots,x_n)$ deviates from the classical location model
\[
F_n(x_1,\dots,x_n) = F_0(x_1-\theta)\cdots F_0(x_n-\theta), \qquad F_0 \ \text{known}. \tag{1}
\]

From the robustness point of view one is concerned with the performance of an estimate when the underlying distribution of the data deviates from the given model. Possible deviations from model (1) are: (i) the common marginal distribution has heavier tails than $F_0$, and (ii) the data are serially correlated. The bulk of the robustness literature deals with case (i); see, for example, Hampel et al. (1986) and references therein. For discussions of case (ii) see Portnoy (1977, 1979), Bickel and Herzberg (1979), Martin and Li (1985), Beran and Kunsch (1985) and Hampel et al. (1986, Chapter 8). In this paper we consider both of these violations simultaneously, and solve a min-max asymptotic variance problem similar to that in Huber (1964). In the present context, the sample median emerges as a good approximation to the min-max variance estimate.

2. M-estimates of location

Huber (1964) defined M-estimates of location as solutions of the estimating equation

\[
\sum_{i=1}^{n} \psi(X_i - T_n) = 0, \tag{2}
\]


where $\psi$ is an arbitrary non-decreasing odd function. Observe that in the classical location model maximum likelihood estimates satisfy this equation with score function
\[
\psi_{\mathrm{ML}}(x) = -f_0'(x)/f_0(x). \tag{3}
\]
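
The short sketch below (not part of the original paper) shows how the estimating equation (2) can be solved numerically for a Huber-type score under the paper's assumption $\sigma = 1$; the tuning constant $c = 1.4$ and the Python/NumPy/SciPy implementation are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.optimize import brentq

def psi_huber(u, c=1.4):
    """A non-decreasing odd score function: psi_c(u) = min(c, max(u, -c))."""
    return np.clip(u, -c, c)

def m_estimate_location(x, c=1.4):
    """Solve sum_i psi_c(x_i - t) = 0 for t, assuming known scale sigma = 1."""
    score = lambda t: np.sum(psi_huber(x - t, c))
    # The score is non-increasing in t and changes sign on [min(x), max(x)].
    return brentq(score, np.min(x), np.max(x))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(200)
    x[:10] += 8.0                      # a few gross errors
    print("Huber-type M-estimate:", m_estimate_location(x))
    print("sample mean          :", x.mean())
    print("sample median        :", np.median(x))
```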

Of course this definition only applies when the scale parameter $\sigma$ is known. When $\sigma$ is unknown the residuals $X_i - T_n$ in (2) must be divided by a robust estimate of scale. For simplicity, we assume throughout this paper that $\sigma = 1$ is known. Li and Zamar (1989) show that the min-max result in Section 5 can also be derived for the general case, under suitable regularity conditions. Huber (1964) showed that, under mild regularity assumptions on $\psi$ and $F$, M-estimates are asymptotically normal with asymptotic variance
\[
\mathrm{AV}(\psi, F) = E_F\{\psi^2(X-\theta)\} \big/ \bigl[E_F\{\psi'(X-\theta)\}\bigr]^2. \tag{4}
\]
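
As an illustration of (4) (again not from the paper), the sketch below evaluates the asymptotic variance of the Huber-type score at an $\varepsilon$-contaminated normal; the contaminating distribution $H = N(0, 3^2)$ and the grid of $c$ values are assumptions made purely for the example.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def av_huber(c, eps=0.05, h_scale=3.0):
    """Asymptotic variance (4) of the Huber-type M-estimate at theta = 0 under
    F = (1 - eps) * N(0, 1) + eps * N(0, h_scale**2)."""
    f = lambda x: (1 - eps) * norm.pdf(x) + eps * norm.pdf(x, scale=h_scale)
    num, _ = integrate.quad(lambda x: np.clip(x, -c, c) ** 2 * f(x), -np.inf, np.inf)
    # E_F{psi_c'(X)} = F(c) - F(-c), available in closed form here.
    den = (1 - eps) * (2 * norm.cdf(c) - 1) + eps * (2 * norm.cdf(c / h_scale) - 1)
    return num / den ** 2

if __name__ == "__main__":
    for c in (0.5, 1.0, 1.4, 2.0):
        print(f"c = {c:.1f}  AV = {av_huber(c):.4f}")
```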

3. The model's neighborhood

Let $\{X_i\}$ be a strictly stationary process. A measure of dependence between $X_0$ and $X_i$, $i = 1, 2, \dots$, is given by
\[
d_i = \sup_{A,B} \bigl| P(X_0 \in A,\, X_i \in B) - P(X_0 \in A)\,P(X_i \in B) \bigr|, \tag{5}
\]
where $A$ and $B$ are Borel sets of $\mathbb{R}^1$. A measure of dependence for the process $\{X_i\}$ is given by
\[
D(\{X_i\}) = \frac{\sum_{i=1}^{\infty} d_i}{1 + \sum_{i=1}^{\infty} d_i}. \tag{6}
\]
Observe that $0 \le D(\{X_i\}) \le 1$, $D(\{X_i\}) = 0$ if and only if the $X_i$ are independent, and $D(\{X_i\}) = 1$ if and only if $\sum_{i=1}^{\infty} d_i$ diverges.

Fix $D_0 < 1$, $0 \le \varepsilon < 0.5$ and $F_0$ symmetric. Let $\mathcal{F}_{\varepsilon,D_0}$ be the class of processes $\{X_i\}$ which satisfy:

Assumption 1. The marginal distribution $F$ of $X_i$ belongs to the $\varepsilon$-contamination family
\[
F(x) = (1-\varepsilon)F_0(x-\theta) + \varepsilon H(x-\theta), \tag{7}
\]
where $H$ is an arbitrary symmetric distribution.

Assumption 2. $D(\{X_i\}) \le D_0$.

Assumption 3. For any bounded score function $\psi$, the corresponding M-estimate is asymptotically normal with asymptotic variance
\[
\mathrm{AV}(\psi) = \frac{E\{\psi^2(X_0)\} + 2\sum_{i=1}^{\infty} \operatorname{Cov}\{\psi(X_0), \psi(X_i)\}}{\bigl[E\{\psi'(X_0)\}\bigr]^2}. \tag{8}
\]

The 'classical' $\varepsilon$-contamination family studied by Huber (1964) is obtained when $D_0 = 0$. The case in which the contamination is due to data dependence only is obtained when $\varepsilon = 0$ and $D_0 > 0$. For $0 \le D_0 \le 1$ and $0 \le \varepsilon < 0.5$, $\mathcal{F}_{\varepsilon,D_0}$ is a collection of models for data which deviate from a 'target' model by containing a certain fraction of outliers and/or a certain degree of serial correlation.
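
The Monte Carlo sketch below is only meant to illustrate why the neighborhood must control dependence as well as outliers: positive serial correlation inflates the variance of an M-estimate beyond the independent-data value (4). The AR(1)-type sequence with standard normal marginal used here is an illustrative dependent process of my own choosing, not the construction used later in the paper's proofs, and the constants are arbitrary.

```python
import numpy as np
from scipy.optimize import brentq

def huber_loc(x, c=1.4):
    """M-estimate of location with psi_c(u) = clip(u, -c, c), sigma = 1."""
    return brentq(lambda t: np.sum(np.clip(x - t, -c, c)), x.min(), x.max())

def ar1_std_normal(n, rho, rng):
    """Stationary Gaussian AR(1) sequence with N(0, 1) marginal distribution."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, reps = 500, 1000
    for rho in (0.0, 0.5):
        est = [huber_loc(ar1_std_normal(n, rho, rng)) for _ in range(reps)]
        print(f"rho = {rho}:  n * Var(T_n) approx {n * np.var(est):.3f}")
```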


Remark 1. Assumption 3 holds if $\{X_i\}$ satisfies the conditions of the functional central limit theorem in Billingsley (1968, Section 21). This includes $\varphi$-mixing processes with $\sum_{i=1}^{\infty} \varphi_i^{1/2} < \infty$ and linear processes of the form
\[
X_i = \sum_{j=-\infty}^{\infty} a_j \delta_{i+j}, \tag{9}
\]
where the $\delta_j$ are independent identically distributed with mean zero and finite variance.

In the sequel we assume without loss of generality that $\theta = 0$.
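
A minimal sketch of a linear process of the form (9), truncated to finitely many non-zero coefficients (an assumption of the example); Gaussian innovations are used as one choice of mean-zero, finite-variance $\delta_j$.

```python
import numpy as np

def linear_process(n, a, rng):
    """X_i = sum_j a_j * delta_{i+j} as in (9), with a = (a_{-q}, ..., a_q)
    and iid N(0, 1) innovations delta."""
    a = np.asarray(a, dtype=float)
    q = (len(a) - 1) // 2
    delta = rng.standard_normal(n + 2 * q)
    return np.array([a @ delta[i:i + len(a)] for i in range(n)])

rng = np.random.default_rng(2)
x = linear_process(5000, a=[0.2, 0.5, 1.0, 0.5, 0.2], rng=rng)
print("mean:", x.mean(), " lag-1 autocovariance:", np.cov(x[:-1], x[1:])[0, 1])
```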

4. The maximum asymptotic variance

In this section we derive the maximum asymptotic variance, $\overline{\mathrm{AV}}(\psi)$, over $\mathcal{F}_{\varepsilon,D_0}$ of an M-estimate with a monotone score function $\psi$. Notice that $\overline{\mathrm{AV}}(\psi) = \infty$ for any unbounded score function $\psi$: in this case the process $\{X_i\}$ can be chosen so that $\operatorname{Var}(\psi(X_0))$ is arbitrarily large. Hence, for the min-max variance theory, attention can be restricted to bounded $\psi$. The maximum of $\psi$ is denoted by
\[
M(\psi) = \lim_{t \to \infty} \psi(t). \tag{10}
\]
Let $\Delta_0 = D_0/(1 - D_0)$. Then $\Delta_0 \ge \sum_{i=1}^{\infty} d_i$ for all $\{X_i\} \in \mathcal{F}_{\varepsilon,D_0}$. As in Billingsley (1979) it can be shown that
\[
\sum_{i=1}^{\infty} \operatorname{Cov}\{\psi(X_0), \psi(X_i)\} \le 4 M^2(\psi)\Delta_0. \tag{11}
\]
This and the assumptions on $\psi$ and $\mathcal{F}_{\varepsilon,D_0}$ imply that the numerator $T^2$ of (8) satisfies
\[
T^2 = (1-\varepsilon)E_{F_0}\{\psi^2(Z)\} + \varepsilon E_H\{\psi^2(Z)\} + 2\sum_{i=1}^{\infty} \operatorname{Cov}\{\psi(X_0), \psi(X_i)\}
\le (1-\varepsilon)E_{F_0}\{\psi^2(Z)\} + M^2(\psi)(\varepsilon + 8\Delta_0). \tag{12}
\]
Furthermore, since $\psi'$ is non-negative, the denominator of (8) satisfies
\[
\bigl[E\{\psi'(X_0)\}\bigr]^2 \ge \bigl[(1-\varepsilon)E_{F_0}\{\psi'(Z)\}\bigr]^2. \tag{13}
\]
From (12) and (13),
\[
\mathrm{AV}(\psi) \le \frac{(1-\varepsilon)E_{F_0}\{\psi^2(Z)\} + M^2(\psi)(\varepsilon + 8\Delta_0)}{\bigl[(1-\varepsilon)E_{F_0}\{\psi'(Z)\}\bigr]^2}. \tag{14}
\]

Lemma 1. Suppose that $\{X_i\}$ satisfies Assumptions 1-3 and that $\psi$ satisfies (10). Then $\overline{\mathrm{AV}}(\psi)$ is equal to the right hand side of (14).



Proof. Let
\[
X_t = (1-B_t)Z_t + B_t(1-b_t)U_t + B_t b_t V_t, \tag{15}
\]
where $\{B_t\}$, $\{b_t\}$, $\{Z_t\}$, $\{U_t\}$ and $\{V_t\}$ are mutually independent, the $B_t$ and $b_t$ are Bernoulli random variables with $P(B_t = 1) = \varepsilon$ and $P(b_t = 1) = \delta$, the $Z_t$ are independent random variables with common distribution $F_0$, and the $U_t$ are independent random variables with $P(U_t = \mu) = P(U_t = -\mu) = \tfrac12$. Finally, $\{V_t\}$ is a Markov chain with states $0$, $-\mu$ and $\mu$, transition matrix chosen so that the joint distribution of $(V_0, V_i)$ is as in Table 1, and stationary initial distribution $(1-\alpha, \tfrac12\alpha, \tfrac12\alpha)$. Table 1 gives the joint distribution of $(V_0, V_i)$. If
\[
\frac{\varepsilon^2\delta^2\alpha}{4(1-\alpha)} = \Delta_0, \tag{16}
\]
then $\{X_t\} \in \mathcal{F}_{\varepsilon,D_0}$. To see this observe that $\{X_t\}$ satisfies Assumption 2 because, for this process, $d_i = \varepsilon^2\delta^2\alpha^{i+1}/4$, so that (16) implies $\sum_{i=1}^{\infty} d_i = \alpha\Delta_0 \le \Delta_0$, i.e. $D(\{X_t\}) \le D_0$. The process $\{X_t\}$ satisfies Assumption 3 because it is a strictly stationary $\varphi$-mixing process (see Billingsley, 1979, Section 21). Assumption 1 is trivially satisfied.

Using Table 1 it is easy to verify that
\[
E\{\psi^2(X_0)\} = (1-\varepsilon)E\{\psi^2(Z_0)\} + \{\varepsilon(1-\delta) + \varepsilon\delta\alpha\}\psi^2(\mu),
\]
\[
E\{\psi'(X_0)\} = (1-\varepsilon)E\{\psi'(Z_0)\} + \{\varepsilon(1-\delta) + \varepsilon\delta\alpha\}\psi'(\mu) + \varepsilon\delta(1-\alpha)\psi'(0)
\]
and
\[
\operatorname{Cov}\{\psi(X_0), \psi(X_i)\} = \varepsilon^2\delta^2 \operatorname{Cov}\{\psi(V_0), \psi(V_i)\} = \varepsilon^2\delta^2\alpha^{i+1}\psi^2(\mu).
\]

Table 1
Joint distribution of $(V_0, V_i)$

  $v_0$     $v_i$     $P(V_0 = v_0,\ V_i = v_i)$
  0         0         $(1-\alpha)^2$
  0         $\mu$     $\alpha(1-\alpha)/2$
  0         $-\mu$    $\alpha(1-\alpha)/2$
  $\mu$     0         $\alpha(1-\alpha)/2$
  $\mu$     $\mu$     $\alpha^2(1+\alpha^{i-1})/4$
  $\mu$     $-\mu$    $\alpha^2(1-\alpha^{i-1})/4$
  $-\mu$    0         $\alpha(1-\alpha)/2$
  $-\mu$    $\mu$     $\alpha^2(1-\alpha^{i-1})/4$
  $-\mu$    $-\mu$    $\alpha^2(1+\alpha^{i-1})/4$
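
As an aside, the construction (15) can be simulated to check the moment and covariance calculations above numerically. In the sketch below the transition matrix for $\{V_t\}$ is one concrete choice compatible with the stationary law $(1-\alpha, \tfrac12\alpha, \tfrac12\alpha)$ and with geometrically decaying dependence; it should be read as an assumption of the sketch rather than necessarily the matrix used in the paper, and all numerical parameters are illustrative.

```python
import numpy as np

def simulate_X(n, eps, delta, alpha, mu, rng):
    """Sketch of the process (15): with prob. 1-eps an F_0 = N(0,1) draw, with
    prob. eps*(1-delta) an independent +/-mu draw, with prob. eps*delta a value
    of a 3-state Markov chain V on (0, mu, -mu)."""
    states = np.array([0.0, mu, -mu])
    # One transition matrix whose stationary law is (1-alpha, alpha/2, alpha/2);
    # assumed here for illustration.
    P = np.array([[1 - alpha, alpha / 2, alpha / 2],
                  [1 - alpha, alpha,     0.0      ],
                  [1 - alpha, 0.0,       alpha    ]])
    v = np.empty(n, dtype=int)
    v[0] = rng.choice(3, p=[1 - alpha, alpha / 2, alpha / 2])
    for t in range(1, n):
        v[t] = rng.choice(3, p=P[v[t - 1]])
    B = rng.random(n) < eps
    b = rng.random(n) < delta
    Z = rng.standard_normal(n)
    U = rng.choice([mu, -mu], size=n)
    return (1 - B) * Z + B * (1 - b) * U + B * b * states[v]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = simulate_X(100_000, eps=0.1, delta=0.5, alpha=0.9, mu=5.0, rng=rng)
    psi = np.clip(x, -1.4, 1.4)
    for lag in (1, 2, 5):
        print(f"lag {lag}: empirical Cov(psi(X_0), psi(X_lag)) = "
              f"{np.cov(psi[:-lag], psi[lag:])[0, 1]:.5f}")
```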


Let $\mathrm{AV}(\psi, \alpha, \mu)$ denote the asymptotic variance under $\{X_t\}$ of an M-estimate with score function $\psi$. Then
\[
\mathrm{AV}(\psi, \alpha, \mu) = \frac{(1-\varepsilon)E\{\psi^2(Z_0)\} + \{\varepsilon(1-\delta) + \varepsilon\delta\alpha + 8\Delta_0\alpha\}\psi^2(\mu)}{\bigl[(1-\varepsilon)E\{\psi'(Z_0)\} + \{\varepsilon(1-\delta) + \varepsilon\delta\alpha\}\psi'(\mu) + \varepsilon\delta(1-\alpha)\psi'(0)\bigr]^2} \tag{17}
\]
and the lemma follows because the right hand side of (17) tends to the right hand side of (14) when $\alpha \to 1$ and $\mu \to \infty$.  □

5. Min-max variance

Let $C$ be the class of all monotone score functions $\psi$ which satisfy Assumption 3 and
\[
E_{F_0}\{\psi'(Z)\} = 1. \tag{18}
\]
Since $\mathrm{AV}(\psi) = \mathrm{AV}(k\psi)$ for all $k > 0$, this is just a convenient normalization. Let $C_M$ be the subset of $C$ such that $\psi(\infty) = M$. The following theorem is our main result.

Theorem 1. Suppose that $F_0$ has an even and positive density $f$ which satisfies:
(i) $A(x) = -f'(x)/f(x)$ is non-decreasing;
(ii) $\lim_{x\to\infty} A(x) = \infty$;
(iii) $0 < E\{A^2(Z_0)\} < \infty$.
Let $0 \le \varepsilon < 0.5$ and $0 \le D_0 < 1$ be fixed. Then:
(a) There exists $c = c(\varepsilon, D_0)$ such that $\psi_c$ given by
\[
\psi_c(x) = \min\{c,\ \max\{A(x), -c\}\} \tag{19}
\]
is min-max. That is, $\overline{\mathrm{AV}}(\psi_c) \le \overline{\mathrm{AV}}(\psi)$ for all $\psi \in C$.
(b) Let
\[
h(t) = A(t)\Big/\Bigl\{\int_0^t A'(x)f(x)\,dx\Bigr\}. \tag{20}
\]
If $h'(t) > 0$ for all $t > 0$ then
\[
\lim_{D_0 \to 1} c(\varepsilon, D_0) = 0 \quad \text{for all } 0 < \varepsilon < 0.5. \tag{21}
\]

Proof. Let $M_0 = \{2f(0)\}^{-1}$. Since
\[
1 = E_{F_0}\{\psi'(Z)\} = -2\int_0^{\infty} \psi(x)f'(x)\,dx \le 2M(\psi)f(0),
\]
any function $\psi$ satisfying (18) must also satisfy $M(\psi) \ge M_0$. It follows that
\[
C = \bigcup_{M \ge M_0} C_M.
\]
For each $c > 0$, let
\[
\tilde\psi_c(x) = \psi_c(x)\big/E_{F_0}\{\psi_c'(Z)\}.
\]
Then $\tilde\psi_c(\infty) = c\big/E_{F_0}\{\psi_c'(Z)\}$ and, by (ii), $\lim_{c\to\infty} \tilde\psi_c(\infty) = \infty$ and $\lim_{c\to 0} \tilde\psi_c(\infty) = M_0$. So for each $M \ge M_0$ there exists $c = c(M) \ge 0$ with the property
\[
\tilde\psi_{c(M)}(\infty) = M.
\]

Now we shall show that for each $\psi \in C_M$,
\[
E_{F_0}\{\psi^2(Z)\} \ge E_{F_0}\{\tilde\psi^2_{c(M)}(Z)\}.
\]
If $d = d(M) = \bigl[E_{F_0}\{\psi'_{c(M)}(Z)\}\bigr]^{-1}$ then, as in Hampel et al. (1986, Chapter 2),
\[
I(\psi) = \int_{-\infty}^{\infty} \bigl[\psi(x) - dA(x)\bigr]^2 f(x)\,dx
= \int_{-\infty}^{\infty} \psi^2(x)f(x)\,dx + d^2\int_{-\infty}^{\infty} A^2(x)f(x)\,dx - 2d
= \int_{-\infty}^{\infty} \psi^2(x)f(x)\,dx + K,
\]
where $K$ does not depend on $\psi$. Hence, minimizing $I(\psi)$ over $C_M$ is equivalent to minimizing $E_{F_0}\{\psi^2(Z)\}$ over $C_M$, and the latter is clearly achieved when $\psi = \tilde\psi_{c(M)}$. Since for all $\psi$ in $C_M$ we have
\[
\overline{\mathrm{AV}}(\psi) = \frac{(1-\varepsilon)E_{F_0}\{\psi^2(Z)\} + M^2(\varepsilon + 8\Delta_0)}{(1-\varepsilon)^2},
\]
it follows that $\overline{\mathrm{AV}}(\psi) \ge \overline{\mathrm{AV}}(\tilde\psi_{c(M)})$ for all $\psi \in C_M$.

For each value of $M$ for which $C_M$ is not empty there exists $\tilde\psi_c$ with $\tilde\psi_c(\infty) = M$. So we conclude that
\[
\inf_{\psi \in C} \overline{\mathrm{AV}}(\psi) \ge \inf_{c} \overline{\mathrm{AV}}(\tilde\psi_c).
\]

Since $g(c) = \overline{\mathrm{AV}}(\tilde\psi_c) \to \infty$ as $c \to \infty$ and $g$ is a continuous function of $c$, it has a global minimum at some $c_0 = c_0(\varepsilon, D_0)$, proving (a). To prove (b) observe that if
\[
k(c) = \int_0^c A^2(x)f(x)\,dx \Big/ \Bigl[\int_0^c A'(x)f(x)\,dx\Bigr]^2,
\]
then
\[
g(c) = \frac{k(c)}{2(1-\varepsilon)} + \frac{h^2(c)\bigl\{8\Delta_0 + \varepsilon + 2(1-\varepsilon)[1-F_0(c)]\bigr\}}{4(1-\varepsilon)^2},
\]
and our assumption on $h$ implies $g'(c) > 0$ for all $c \ge 0$ when $\Delta_0$ is sufficiently large.  □

Table 2 shows optimal values of $c$ for several values of $\varepsilon$ and $D_0$ when $F_0$ is the standard normal distribution. Since in this case $h'(c) = \int_0^c x^2 f(x)\,dx \big/ \bigl[\int_0^c f(x)\,dx\bigr]^2 > 0$ for all $c > 0$, Theorem 1(b) is in force, and $c = 0$ for sufficiently large values of $D_0$. Observe that the presence of even a small amount of dependence has a severe effect on the value of the optimal $c$. For instance, the values of $c$ when $D_0 = 0.25$ are roughly 50% smaller than those when $D_0 = 0$. Also notice that the effect of dependence is somewhat stronger for small values of $\varepsilon$.


Table 2
Optimal values of c for various choices of ε and D_0

  D_0     ε = 0.01   ε = 0.05   ε = 0.10   ε = 0.15
  0.00    1.94       1.40       1.14       0.98
  0.01    1.76       1.34       1.11       0.96
  0.05    1.34       1.16       0.99       0.88
  0.25    0.76       0.70       0.65       0.60
  0.50    0.44       0.41       0.39       0.36
  0.75    0.21       0.19       0.18       0.17
  0.99    0.00       0.00       0.00       0.00
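
The following sketch reproduces the kind of computation behind Table 2 for the standard normal $F_0$: it minimizes the maximum asymptotic variance bound (14) of the Huber score $\psi_c$ over $c$. It relies on the form of (14) as reconstructed above (in particular the factor $\varepsilon + 8\Delta_0$) and on closed-form normal moments, so it is a sketch under those assumptions rather than the author's original program; small discrepancies with the printed values are possible.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def g(c, eps, D0):
    """Right-hand side of (14) for psi_c(x) = clip(x, -c, c) and F_0 = N(0, 1)."""
    delta0 = D0 / (1.0 - D0)
    e_psi2 = (2 * norm.cdf(c) - 1) - 2 * c * norm.pdf(c) + 2 * c ** 2 * norm.sf(c)
    e_dpsi = 2 * norm.cdf(c) - 1                      # E_F0{psi_c'(Z)}
    return ((1 - eps) * e_psi2 + c ** 2 * (eps + 8 * delta0)) / ((1 - eps) * e_dpsi) ** 2

def optimal_c(eps, D0):
    res = minimize_scalar(g, bounds=(1e-6, 5.0), args=(eps, D0), method="bounded")
    return res.x

if __name__ == "__main__":
    for D0 in (0.00, 0.05, 0.25, 0.75):
        row = [round(optimal_c(eps, D0), 2) for eps in (0.01, 0.05, 0.10, 0.15)]
        print(f"D_0 = {D0:.2f}:", row)
```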

6. Concluding remarks

When $D_0 = 0$, Theorem 1(a) reduces to Huber's (1964) min-max variance result for $\varepsilon$-contamination neighborhoods. Huber showed that an M-estimate of location with psi-function (19) minimizes the maximum asymptotic variance $\overline{\mathrm{AV}}(\psi)$ among monotone functions $\psi$. In a related optimality problem, Hampel (1968) finds $\psi$ to minimize $\mathrm{AV}(\psi, F_0)$ subject to a bound on the gross-error sensitivity
\[
\gamma^*(\psi, F_0) = \sup_x |\psi(x)| \big/ E_{F_0}\{\psi'(Z)\}.
\]

The solution to Hampel’s problem is also an M-estimate with psi-function (19). The direct method used to prove our theorem provides insight into the close relation between Huber’s and Hampel’s optimality problems. In fact, from the given proof it becomes clear that Huber’s optimality problem consists of minimizing the functional

\[
J(\psi) = (1-\varepsilon)E_{F_0}\{\psi^2(Z)\} + \varepsilon \sup_x \psi^2(x) \tag{22}
\]

subject to (18). On the other hand, Hampel's problem consists of minimizing the first term of (22) subject to (18) and a bound on $\sup_x \psi(x)$. In Huber's case the bound on $\sup_x \psi(x)$ is not given explicitly but added, as a penalty, to the objective functional (cf. Hampel et al., 1986, Section 2.7).

When $\psi$ is a redescending score function with
\[
\max_x \psi(x) = \psi(a) = M \quad\text{and}\quad \psi'(a) = 0,
\]
an argument similar to the proof of the Lemma shows that the right hand side of (14) is a lower bound for $\overline{\mathrm{AV}}(\psi)$. Since $\mathrm{AV}(\psi, \alpha, a)$ tends to the right hand side of (14) when $\alpha$ tends to one, we have that in general $\overline{\mathrm{AV}}(\psi) \ge$ the right hand side of (14), and our theorem also holds when redescending functions $\psi$ are allowed in $C$.

Extension of the present result to M- and GM-estimates of regression is straightforward provided gross errors and serial correlation can affect only the response variable. The general case, when all the variables can be affected by gross errors and serial correlation, is outside the range of the present theory and deserves further study.

References

Beran, J. and H. Kunsch (1985), Location estimators for processes with long-range dependence, Research Rept. No. 40, Sem. für Statist., ETH (Zürich).
Bickel, P.J. and A.M. Herzberg (1979), Robustness of design against autocorrelation in time I: Asymptotic theory, optimality for location and linear regression, Ann. Statist. 7, 77-95.
Billingsley, P. (1968), Convergence of Probability Measures (Wiley, New York).
Billingsley, P. (1979), Probability and Measure (Wiley, New York).
Hampel, F.R. (1968), Contributions to the theory of robust estimation, Ph.D. Thesis, Univ. of California (Berkeley, CA).
Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw and W.A. Stahel (1986), Robust Statistics: The Approach Based on Influence Functions (Wiley, New York).
Huber, P.J. (1964), Robust estimation of a location parameter, Ann. Math. Statist. 35, 73-101.
Lee, C.H. and R.D. Martin (1984), Ordinary and proper location M-estimates for ARMA models, Tech. Rept. No. 29, Dept. of Statist., Univ. of Washington (Seattle, WA).
Li, B. and R.H. Zamar (1989), Min-max asymptotic variance M-estimates of location when scale is unknown, Tech. Rept., Dept. of Statist., Univ. of British Columbia (Vancouver, B.C.).
Portnoy, S.L. (1977), Robust estimation in dependent situations, Ann. Statist. 5, 22-43.
Portnoy, S.L. (1979), Further remarks on robust estimation in dependent situations, Ann. Statist. 7, 224-231.