Quadratic mode regression

Myoung-jae Lee *

Journal of Econometrics 57 (1993) 1-19. North-Holland

Received October 1989, final version received January 1992


Generalizing the mode regression of Lee (1989) with the rectangular kernel (RME), we try a quadratic kernel (QME), smoothing the rectangular kernel. Like RME, QME is most useful when the dependent variable is truncated. QME is better than RME in that it gives a N^{1/2}-consistent estimator and an asymptotic distribution which parallels that of Powell's (1986) symmetrically trimmed least squares (STLS). In general, the symmetry requirement of QME is weaker than that of STLS and stronger than that of RME. Estimation of the covariance matrices of both QME and STLS requires density estimation, but a variation of QME can provide an upper bound on the covariance matrix without the burden of density estimation. The upper bound can be made tight at the cost of computation time.

1. Introduction

In a linear model y = x'β + u, Lee's (1989) mode regression estimates β by maximizing

(1/T) Σ_t 1[ |y_t − x_t'b| < w ]

w.r.t. (with respect to) b, where w is a positive number and 1[A] is the zero-one indicator function of the event A. The name 'mode regression' comes from the observation that, assuming that the density f_{y|x} is strictly unimodal about x'β and symmetric around x'β up to ±w, β uniquely maximizes the objective function. The population version of the objective function is

E_x E_{y|x} 1[ |y − x'b| < w ] = E_x ∫ f_{y|x}(y) 1[ x'b − w < y < x'b + w ] dy.
Correspondence to: Myoung-jae Lee, Department of Economics, Pennsylvania State University, University Park, PA 16802, USA.

* I am grateful to C. Manski, J. Powell, J. Terza, the associate editor, and the referees for their helpful comments.

0304-4076/93/$05.00 © 1993 Elsevier Science Publishers B.V. All rights reserved


Thus, by capturing the most probability mass under f_{y|x} with an interval of length 2w centered at x'b, we maximize the population objective function. The indicator function is reminiscent of the rectangular kernel used in histogram construction. Generalizing the mode regression estimator with the rectangular kernel (RME, hereafter) under the assumption mode(y|x) = x'β, we can imagine a 'generalized kernel mode regression' maximizing

(1/T) Σ_t K(y_t − x_t'b),   (1.1)

w.r.t. b, where K(·) is a bounded kernel satisfying certain properties. Again, we want to capture the most probability mass under f_{y|x}, but this time the probability is smoothly weighted by K. The asymptotic distribution of RME has not been derived, and it appears that RME is N^{1/3}-consistent. Here, by employing a smooth kernel, we hope to get a N^{1/2}-consistent estimator with an asymptotic distribution. The class of estimators defined by (1.1) is a subclass of Huber's (1972) 'M-estimator', which maximizes (1/T) Σ_t ρ(y_t − x_t'b) for a general ρ. (1.1) is special in that K is bounded from above and below, which is desirable in the presence of outliers. There are two types of K: one is the 'soft rejector' with K(|z|) → 0 as |z| → ∞, and the other is the 'hard rejector' with K(|z|) = 0 if |z| exceeds a bound. Here the term 'rejector' refers to rejecting outliers. Using the same method as in Lee (1992a), it can be shown that the hard rejectors are robust with a high 'breakdown point' [see Hampel et al. (1986) for the details of the breakdown point]. The topic of robust estimation with the class (1.1) will be discussed in a separate paper. In this paper, we apply (1.1) to the truncated dependent variable case, which is certainly related to the issue of robustness, because truncation is a kind of contamination of the distribution. When applied to truncated data, we need some restrictions on the kernel such as bounded support and symmetry. In this paper, we examine mode regression with a quadratic kernel (quadratic mode regression estimator, QME, hereafter), which includes the least squares estimator (LSE) as a special case for nontruncated data. When y is truncated from below at a known point c, our estimator b_T is defined by maximizing

(1/T) Σ_t [w² − {y_t − max(x_t'b, c + w)}²] 1[ |y_t − max(x_t'b, c + w)| < w ]
  = (1/T) Σ_t max[w² − {y_t − max(x_t'b, c + w)}², 0],   (1.2)


where w is a fixed positive number. When c = −∞ (no truncation), (1.2) becomes

(1/T) Σ_t {w² − (y_t − x_t'b)²} 1[ |y_t − x_t'b| < w ] = (1/T) Σ_t max{w² − (y_t − x_t'b)², 0}.   (1.3)

If we set w = ∞, maximizing (1.3) w.r.t. b is equivalent to LSE. The modification of (1.3) to (1.2) for truncated data is sufficient for the identification of β. In moving from (1.3) to (1.2), the objective function is made continuous in b and flat over certain noninformative regions in the space of b. In its one-sample version, mode regression using a quadratic kernel is identical to Huber's (1964) form of the skipped mean. When it is applied to truncated data, QME resembles Powell's (1986) symmetrically trimmed least squares estimation (STLS, hereafter). When y is truncated from below at c, STLS is defined by minimizing

(1/T) Σ_t {y_t − max(0.5·y_t + 0.5·c, x_t'b)}².

Let c = −∞ for nontruncated y; then STLS = LSE. The main advantage of QME over STLS is that QME needs f_{u|x} to be locally symmetric up to ±w around 0, while STLS needs the symmetry up to ±x'β. Therefore STLS requires global symmetry if x is unbounded.

The paper is organized as follows. In the next section, we lay out the basic model assumptions. In section 3, we derive the asymptotic distribution of QME, using the same technique as in Lee (1992b). The proof of consistency is given in the appendix. A variation of QME which avoids the problem of density estimation for its asymptotic covariance matrix is presented in section 4, and the results of a small-scale simulation study comparing RME, QME, and STLS are reported in section 5. The final section is reserved for conclusions.

2. Basic model

Here we lay out the basic model assumptions to be used throughout the paper.

Assumption 1. Linear model and truncated random sample.
y*_t = x_t'β + u_t; (x_t', y*_t)' is observed only when y*_t > c, where x_t = (1, x_{2t}, ..., x_{kt})' and β = (β_1, β_2, ..., β_k)'. (x_t, y_t), t = 1, 2, ..., T, is a random sample, where y_t is y*_t truncated from below by c.


Assumption 2. Conditionally strictly unimodal regression.
mode(y*|x) = x'β, equivalently mode(u|x) = 0, due to the model in Assumption 1; f_{u|x}(λ₂) ≤ f_{u|x}(λ₁) if |λ₂| ≥ |λ₁| (unimodality), and the inequality is strict at λ₁ = 0 (strict unimodality at 0). Also, sup_x f_{u|x}(0) < ∞.

Assumption 3. Other restrictions on the distribution of u given x.
(i) f_{u|x} is symmetric around 0 up to ±w, w > 0. (ii) f_{u|x} is continuous at ±w. (iii) Eu² exists.

Assumption 4. Conditions on regressors.
(i) E{xx' 1[x'β > c + w]} is a positive definite matrix, which implies P(x'β > c + w) > 0. (ii) E|x|⁴ < ∞.

Assumption 5. Compact parameter space.
The true parameter β is an interior point of a compact parameter space B.

Assumption 6. Condition for the population maximand differentiability.
For some positive constants ν and M, E{1[ |x|·z > |c + w − x'b| ]} ≤ M·z, where |b − β| ≤ ν, 0 < z ≤ ν, and |·| is the Euclidean norm.

A distribution F on R is 'unimodal' about v if F is convex over (−∞, v) and concave over (v, ∞) [see Dharmadhikari and Joag-dev (1988)]. This definition does not exclude multiple modes (the uniform distribution is unimodal by this definition), and we strengthen the definition by requiring the mode to be unique in Assumption 2 for identification. Another concept of unimodality, strong unimodality, will appear later. In Assumption 3, the restriction on f_{u|x} is used in deriving the asymptotic distribution. Assumption 4(i) is a version of a full rank condition. Assumption 6, which is also needed for the distribution, is explained next.

The maximand of QME is nondifferentiable. Hence the standard technique for deriving the asymptotic distribution of smooth extremum estimators is not helpful. However, if the population version of the maximand is differentiable, the techniques of Huber (1967) and Pollard (1984) can be applied. The term that may be troublesome in this regard is the indicator function 1[x'b > c + w], but Assumption 6 implies the differentiability of the population maximand at β. Essentially, Assumption 6 is sufficient for the Lipschitz continuity of E1[x'b > c + w] near β: for two points a and b near β,

|E1[x'b > c + w] − E1[x'a > c + w]|
  = E{1[x'b > c + w > x'a] + 1[x'a > c + w > x'b]}
  = E{1[0 > c + w − x'b > x'a − x'b] + 1[x'a − x'b > c + w − x'b > 0]}
  ≤ E1[ |x'(b − a)| > |c + w − x'b| ]
  ≤ E1[ |x|·z > |c + w − x'b| ]   with z = |b − a|.

Consider the simple case in which x'b = b₁ + b₂x, where x has a density and the support of |x| is bounded by 1. Then

E1[ |x|·z > |c + w − x'b| ] ≤ P(z > |c + w − x'b|),

so Assumption 6 holds when the density of x'b is bounded around c + w.

In the nontruncated case, QME maximizes

(1/T) Σ_t {w² − (y_t − x_t'b)²} 1[ |y_t − x_t'b| < w ].   (2.1)

The population version of (2.1) is

E_x E_{y|x} {w² − (y − x'b)²} 1[ |y − x'b| < w ].   (2.2)

As stated before, for given x, maximizing (2.2) is equivalent to capturing the most probability under f_{y|x}, weighted by the kernel w² − (y − x'b)². The indicator function is present to make sure that the kernel is nonnegative, by restricting the range of |y − x'b|. It is of interest to realize that, in the one-sample case, (2.1) is Huber's (1964, p. 79) skipped LSE, which minimizes

(1/T) Σ_t ρ(y_t − θ)   (2.3)

w.r.t. θ, where ρ(s) is defined by

ρ(s) = s² if |s| ≤ w,  = w² otherwise.


Often w is taken to be proportional to the sample standard deviation of y to make the resulting location estimator scale-invariant. Rewriting (2.3) using the definition of ρ, we have

(1/T) Σ_t [(y_t − θ)² 1[ |y_t − θ| ≤ w ] + w² 1[ |y_t − θ| > w ]].

Now subtract this from w² to get

(1/T) Σ_t {w² − (y_t − θ)²} 1[ |y_t − θ| ≤ w ],   (2.4)

which is the one-sample version of (2.1). The idea of skipped LSE seems to have arisen to make estimation of the mean more robust. By converting it to a maximization problem, we offer a different perspective on this robust estimation procedure.

The choice of w in (2.3) is still an open question in the robust statistics literature. Perhaps we may say that w is chosen according to our desire to reject outliers. A large w makes (2.3) approach LSE, so attaining the Gauss-Markov efficiency, and a small w makes (2.3) more robust. So w is chosen in the trade-off between efficiency and robustness. If we focus on efficiency, the choice of w may be made by minimizing the covariance matrix. The basic problem is that a scalar w cannot minimize a matrix. Nevertheless, a couple of ad hoc ideas can be suggested. One is to define a norm on the covariance matrix and minimize it w.r.t. w. If we are interested in t-values, minimizing the Euclidean norm of the diagonal may be a good strategy. Another way to choose w is to get an initial estimate for b and generate bootstrap estimates to calculate MSE, then choose the w which yields the smallest MSE.

With the dependent variable truncated from below at c, x'b in (2.1) must be replaced by max(x'b, c + w), as done in RME. So the modified objective function is

Q_T(b) = (1/T) Σ_t [w² − {y_t − max(x_t'b, c + w)}²] 1[ |y_t − max(x_t'b, c + w)| < w ].   (2.5)

Note that if c = −∞, then x'b > c + w always, so that (2.5) becomes (2.1); that is, (2.5) covers the nontruncated case as a special case. The population version of (2.5) is

Q(b) = E[w² − {y − max(x'b, c + w)}²] 1[ |y − max(x'b, c + w)| < w ].   (2.6)

3. Asymptotic normality

In this section, we derive the asymptotic distribution of QME. Consistency is proved in the appendix. For an estimator defined by maximizing a smooth maximand, deriving its asymptotic distribution is straightforward [see, for instance, Amemiya (1985)]: Taylor-expand the first-order condition around b_T and evaluate it at β. The method is not directly applicable to QME, for it has a nondifferentiable maximand. Nevertheless, it is possible to derive its distribution, establishing steps similar to those in Amemiya (1985) for a smooth optimand. The proof is applicable to other extremum estimators whose population maximand Q(b) (= Eq(b)) is differentiable and whose q(b) is piecewise smooth.

Theorem 1. Asymptotic normal distribution of b_T: T^{1/2}(b_T − β) follows

N(0, (W − V)⁻¹ Z (W − V)⁻¹),

where

V = E{ 1[x'β > c + w] {2w f_{u|x}(w)/(1 − F_{u|x}(c − x'β))} xx' }.

Although QME has a nondifferentiable maximand, its population version is differentiable and satisfies ∂Q(β)/∂b = 0. Assumption 6 implies the differentiability. Similarly to Amemiya (1985), we will show the following three steps:

T^{−1/2} Σ_t r(β) − T^{−1/2} Σ_t r(b_T) − T^{−1} Σ_t Er'(b*_T) · T^{1/2}(β − b_T) = o_p(1),   (3.1)

T^{−1/2} Σ_t r(b_T) = o_p(1),   (3.2)

T^{−1} Σ_t r'(b*_T) − Er'(β) = o_p(1),   (3.3)

where b*_T is to be chosen later, and r(b) and r'(b) are defined by

Er(b) = dEq(b)/db and Er'(b) = dEr(b)/db.   (3.4)


As for QME,

r'(b) = 1[x'b > c + w] · 2xx' { w f_{y|x}(x'b + w) + w f_{y|x}(x'b − w) − 1[ |y − x'b| < w ] }.
The proofs that this form of r(b) and r'(b) is correct and that a b*_T satisfying (3.1) exists are available from the author upon request.

The approach taken here in deriving the asymptotic distribution of QME falls between Huber (1967) and Pollard (1984). Both of them deal with extremum estimators with a nonsmooth maximand. Huber's (1967) method, with quite a few regularity conditions to verify, goes a long way to establish (3.1). Pollard's (1984) method applies 'stochastic equicontinuity' (to be explained below) to the first-order approximation of the maximand and does not show the steps (3.1) to (3.3) explicitly. Here, by proving (3.1) to (3.3) as done for extremum estimators with a smooth maximand, we make the ideas behind Huber (1967) and Pollard (1984) easier to grasp.

The differentiability of q(b) is sufficient for (3.1) but not necessary. (3.1) can be established by 'stochastic equicontinuity'. Rewrite the last two terms of (3.1) as

T^{−1/2} Σ_t {r(b_T) − Er(b_T) + Er(b_T)} − T^{−1} Σ_t Er'(b*_T) · T^{1/2}(b_T − β).   (3.5)

Using the mean-value theorem, there exists b*_T such that

Er(b_T) = Er(β) + Er'(b*_T)(b_T − β) = Er'(b*_T)(b_T − β),

for Er(β) = 0. This is where b*_T is selected. Then (3.5) becomes

T^{−1/2} Σ_t {r(b_T) − Er(b_T)}.

Hence (3.1) is equivalent to

T^{−1/2} Σ_t {r(b_T) − Er(b_T)} − T^{−1/2} Σ_t {r(β) − Er(β)} = o_p(1).

By defining G(b) = T^{−1/2} Σ_t {r(b) − Er(b)}, we need to prove

G(b_T) − G(β) = o_p(1).   (3.6)

The relation (3.6) puts a certain continuity restriction on the empirical process G(b). But even if G(b) is continuous in b, the continuity alone is not enough for (3.6), since G is stochastic. Also we are not dealing with T^{−1} Σ_t but with T^{−1/2} Σ_t, which will converge at best to a limiting stochastic process, not to a deterministic entity. Since b_T is random, proving (3.6) directly is troublesome. Instead we prove the following uniform version, which is sufficient for (3.6): for any η, ε > 0, there exists a neighborhood of β, N(β), such that

P( sup_{a, b ∈ N(β)} |G(b) − G(a)| > η ) < ε.   (3.7)

If this is satisfied, G(b) is said to be 'stochastically equicontinuous' at β. The next two conditions are sufficient for (3.7) with a given r(b):

(i) There is an 'envelope' R(b) such that R(b) ≥ r(b) for all b, with ER(b)² < ∞, and r(b) is continuous at β in the L₂ norm.

(ii) The class {r(b)} indexed by b is a Euclidean class [Pakes and Pollard (1989)].

For QME, R(b) = 4w²|x|², and Assumption 6 implies the continuity of r(b) in the L₂ norm. Condition (ii) limits the variability of r(b). In most applications in econometrics, (ii) will be satisfied. For instance, if r(b) is a finite-dimensional polynomial in b, or indicator functions indexing 'simple sets', or a function of bounded variation, or a combination of these, then (ii) is satisfied. See Pollard (1984) or Pakes and Pollard (1989) for more details.

Verifying (3.2) is done by employing one-sided gradients. For a piecewise smooth q(b) as in QME, the one-sided derivative along one axis of R^k which agrees with r(b) a.e. exists, and near the maximum it is nonincreasing whether Q_T(b) is concave or not. This fact provides an o_p(1) bound for T^{−1/2} Σ_t r(b_T), as in Ruppert and Carroll (1980) and Powell (1984, 1986). For instance, for the least absolute deviation estimator (LAD) maximizing (1/T) Σ_t −|y_t − x_t'b|, Ruppert and Carroll (1980) bound |T^{−1/2} Σ_t r_i(b_T)|, where r_i(b) is the ith component of r(b), by an expression driven by the number of data points with y_t = x_t'b_T. Using the continuity of F_{y|x}, that number can be shown to be finite a.e.; then the right-hand side is less than a constant times T^{−1/2} max_t |x_{it}|, which is o_p(1). Similarly for QME, |T^{−1/2} Σ_t r_i(b_T)| is bounded by an analogous expression in which the kink events are |y_t − x_t'b_T| = w and x_t'b_T = c + w. Again the number of data points satisfying either one of these conditions is finite a.e., and this is o_p(1).


For a differentiable q(b), Amemiya (1985, p. 113) gives a sufficient condition for (3.3): the uniform convergence of T^{−1} Σ_t r'(b) to Er'(b) and the continuity of Er'(b). As for QME, Assumption 6 is sufficient for the continuity of Er'(b). The sufficiency of these two conditions for (3.3) can be seen easily: (3.3) is less than

|T^{−1} Σ_t r'(b*_T) − Er'(b*_T)| + |Er'(b*_T) − Er'(β)|.

Then the uniform convergence makes the first term o_p(1). The Cauchy-Schwarz inequality and Assumption 6 make the second term o(1).

The asymptotic distribution of STLS has the same structure as that of QME, written as (W − V)⁻¹ Z (W − V)⁻¹. With c = 0, Z of STLS is

E{ 1[x'β > 0, |u| < x'β] xx'u² }

and W − V of STLS is

E{ 1[x'β > 0] {1[ |u| < x'β ] − 2(x'β) f_{u|x}(x'β)/(1 − F_{u|x}(−x'β))} xx' }.

The term inside { } is:

area between ±x'β under the truncated density of u|x
  − rectangle with base [−x'β, x'β] and height the density at x'β.

In QME, W − V is

E{ 1[x'β > w] {1[ |u| < w ] − 2w f_{u|x}(w)/(1 − F_{u|x}(−x'β))} xx' }.

The term inside { } is:

area between ±w under the truncated density of u|x
  − rectangle with base [−w, w] and height the density at w.

From this, it is clear that the strict unimodality condition is essential. Viewing the two methods as versions of the skipped mean, QME always trims more than STLS does. One major difference between STLS and QME is in the strength of the symmetry requirement. STLS calls for f_{u|x} to be symmetric up to ±x'β, while QME requires it up to ±w. As the support of the regressors becomes larger, the symmetry requirement of STLS becomes stronger, while that of QME stays the same.

Estimating the asymptotic covariance matrices is problematic. Defining v_t = y_t − x_t'b_T, the following suffice as estimators of Z and W:

Z_T ≡ (1/T) Σ_t 1[x_t'b_T > c + w, |v_t| < w] x_t x_t' v_t²,

W_T ≡ (1/T) Σ_t 1[x_t'b_T > c + w, |v_t| < w] x_t x_t'.

But for V there is no natural candidate, for it involves density estimation. Using kernel density estimation as is done in STLS, we can estimate V by

V_T = (1/T) Σ_t 1[x_t'b_T > c + w] 2w x_t x_t' (1/2h){1[w − h < v_t < w] + 1[−w < v_t < −w + h]},

where h is a smoothing parameter. But instead, in the next section, we propose a version of QME which avoids density estimation.
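A sketch of these estimators in code, under our reading that Z_T weights x_t x_t' by v_t² inside the sum; b_T, w, c, and the bandwidth h are inputs, and the final line assembles the estimated variance of b_T implied by Theorem 1. Names and shapes are ours.

```python
# Sketch of Z_T, W_T, and the kernel-based V_T of this section.
import numpy as np

def qme_cov(y, X, b_T, w, c, h):
    T = X.shape[0]
    v = y - X @ b_T                              # residuals v_t
    xb = X @ b_T
    sel = (xb > c + w) & (np.abs(v) < w)         # 1[x'b_T > c+w, |v_t| < w]
    Z = (X[sel].T * v[sel]**2) @ X[sel] / T      # Z_T
    W = X[sel].T @ X[sel] / T                    # W_T
    edge = (xb > c + w) & (((w - h < v) & (v < w)) |
                           ((-w < v) & (v < -w + h)))
    V = (2 * w / (2 * h)) * (X[edge].T @ X[edge]) / T   # V_T
    S = np.linalg.inv(W - V)
    return S @ Z @ S / T                         # est. variance of b_T
```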

4. Covariance matrix without density estimation

As shown in the previous section, conventional estimation of V would require density estimation. Density estimation is, however, unappealing due to the arbitrariness in the choice of the smoothing parameter, as well as the computational burden it imposes. In this section, we propose a modified version of QME whose covariance matrix has an upper bound which is free of the density component. To simplify the notation, ignore the truncation and temporarily drop the subscript u|x in f_{u|x}. The main idea is the following: since V involves w·f(w), if we integrate the objective function w.r.t. w after multiplying it by 1/w, we may be able to replace f(w) by an expression involving the distribution function and so avoid estimating f(w). In practice, the integral should be approximated by a summation over a finite number of w's. As we add more terms to the sum, the approximation gets only better, at the cost of computation time. Assume f is symmetric up to ±w* and suppose we maximize the following w.r.t. b:

(1/T) Σ_t Σ_{i=1}^{I} {w_i² − (y_t − x_t'b)²} 1[ |y_t − x_t'b| < w_i ] (h/w_i),   (4.1)

where w_i = ih, h > 0, i = 1, 2, ..., I, I ≥ 2, and w_I ≤ w*. With I = 1, (4.1) becomes QME. In (4.1), we evaluate the QME objective function with different w_i's and sum them up with weight h/w_i = 1/i.
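A minimal sketch of the maximand (4.1) in code; the extension to the truncated case by the max(x'b, c + w_i) substitution described later in this section is indicated in a comment. Names are ours.

```python
# Sketch of the weighted QME maximand (4.1): QME pieces at w_i = i*h
# are summed with weight h/w_i = 1/i.
import numpy as np

def wqme_objective(b, y, X, h, I):
    r = y - X @ b
    total = 0.0
    for i in range(1, I + 1):
        w_i = i * h
        # truncated case: use r = y - np.maximum(X @ b, c + w_i) instead
        total += np.mean(np.maximum(w_i**2 - r**2, 0.0)) / i
    return total
```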


The asymptotic covariance matrix of the estimator maximizing (4.1) is (W_w − V_w)⁻¹ Z_w (W_w − V_w)⁻¹, where

Z_w = E{ xx' u² h² ( Σ_{i=1}^{I} 1[ |u| < w_i ]/w_i )² },

W_w = E{ xx' h Σ_{i=1}^{I} 1[ |u| < w_i ]/w_i },   V_w = E{ xx' 2h Σ_{i=1}^{I} f(w_i) }.   (4.2)

Now observe that

E{xx'} {F(w_I) − F(0)} ≥ E{xx'} h Σ_{i=1}^{I} f(w_i),

because h Σ_i f(w_i) is a histogram-type approximation fitted under f between 0 and w_I. Hence we can construct an upper bound of the covariance matrix with W_w − V_w replaced by W_w − V'_w such that

W_w − V'_w = E{ xx' ( h Σ_{i=1}^{I} 1[ |u| < w_i ]/w_i − (F(w_I) − F(−w_I)) ) }.

Since w_i = ih, Z_w and W_w − V'_w can be respectively estimated by

(1/T) Σ_t x_t x_t' v_t² ( Σ_{i=1}^{I} 1[ |v_t| < w_i ]/i )²

and

(1/T) Σ_t x_t x_t' { Σ_{i=1}^{I} 1[ |v_t| < w_i ]/i − 1[ |v_t| < w_I ] }.   (4.3)

The last matrix should be p.d. asymptotically, because

if |v_t| ≥ w_I, then {···} = 0,

and if |v_t| < w_1, then {···} = Σ_{i=1}^{I} (1/i) − 1 > 0, since I ≥ 2.

The upper bound becomes tighter as we use more w_i's, but at a minimum, two w_i's can provide an upper bound. It is important to note that the problem of choosing I is not the same as that of choosing the smoothing parameter in the nonparametrics literature. In nonparametric regression, the estimated curve swings between two extremes


depending on the smoothing parameter: if it is too large, the curve is a flat line, and if it is too small, the curve will be rugged so as to go through all the data points. In the modified QME, the approximation gets only better as I goes up. To put it differently, the covariance approximation is like an infinite sum convergent to a limit, while the nonparametric regression estimate is not convergent.

Going back to the truncated case, we should maximize (4.1) with x'b replaced by max(x'b, c + w). The asymptotic covariance matrix is the same as (4.2) with 1[x'β > c + w] present and f(w) replaced by f(w)/(1 − F(c − x'β)). The estimated upper bound is the same as (4.3) with 1[x_t'b_T > c + w_i] present; that is, for the truncated model, Z_w and W_w − V'_w are respectively estimated by

(1/T) Σ_t x_t x_t' v_t² ( Σ_{i=1}^{I} 1[x_t'b_T > c + w_i, |v_t| < w_i]/i )²

and

(1/T) Σ_t x_t x_t' { Σ_{i=1}^{I} 1[x_t'b_T > c + w_i, |v_t| < w_i]/i − 1[x_t'b_T > c + w_I, |v_t| < w_I] }.

It is not certain analytically how QME, with its concomitant modifications, fares with regard to relative efficiency.
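The following sketch puts the truncated-case estimators above into code, giving a covariance upper bound free of density estimation; names, shapes, and the per-observation construction are ours.

```python
# Sketch of the density-free covariance upper bound for weighted QME.
import numpy as np

def wqme_cov_bound(y, X, b_T, h, I, c):
    T = X.shape[0]
    v = y - X @ b_T
    xb = X @ b_T
    s = np.zeros(T)        # sum_i 1[x'b_T > c + w_i, |v_t| < w_i]/i
    for i in range(1, I + 1):
        w_i = i * h
        s += ((xb > c + w_i) & (np.abs(v) < w_i)) / i
    w_I = I * h
    last = ((xb > c + w_I) & (np.abs(v) < w_I)).astype(float)
    Z = (X.T * (v**2 * s**2)) @ X / T            # estimate of Z_w
    WV = (X.T * (s - last)) @ X / T              # estimate of W_w - V_w'
    S = np.linalg.inv(WV)
    return S @ Z @ S / T         # upper bound for est. variance of b_T
```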

5. A simulation study

In this section, we present a small-scale Monte Carlo study for QME. As revealed in the previous sections, QME may be regarded as a 'hybrid' between STLS and RME. It is close to STLS, as it trims the conditional density. It is close to RME, since QME selects data with 1[ |y − x'b| < w ] = 1 and gives the quadratic weight w² − (y − x'b)² to those data to determine b_T more finely than RME. So one may expect QME to behave similarly to STLS and RME. The following simulation study will shed light on this. The main model used for the results in table 1 is

y_t = 0 + x_t + u_t, where x_t ~ N(0, 1),

with T = 200 and 200 replications for each design. Random numbers are generated by a multiplicative-congruential method. The simplex algorithm [see Himmelblau (1972)] is used, which is not to be confused with the simplex method for linear programming. The algorithm does not use gradients.


Table 1
y_t = 0 + x_t + u_t, x_t ~ N(0, 1), T = 200, and 200 replications; only the slope is reported. For each design, the estimators compared are STLS, QME, and RME (Design 1 also reports QME with w = 0.5, 1.0, 1.5 and WQME), with columns BIAS, STD, RMSE, MED, LQ, UQ, and MAE.

Design 1: 50% truncation, standard normal.
Design 2: 25% truncation, standard normal.
Design 3: 50% truncation, standard normal.
Design 4: 50% truncation, standard logistic.
Design 5: 50% truncation, standard Cauchy.
Design 6: 50% truncation, gamma (2,1) mode.
Design 7: 50% truncation, gamma (3,1) mode.
Design 8: 50% truncation, symmetric and close to N(0,1) up to ±w, linear upper tail and rectangular lower tail.
Design 9: 50% truncation, u = (x'β)·N(0,1).
Design 10: 50% truncation, u = (1/x'β)·N(0,1).

(Numerical entries omitted.)
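As an illustration of the designs, here is a sketch of a data-generating process consistent with Design 1; the truncation point c = 0 is our inference from the 50% truncation figure, since y* = x + u ~ N(0, 2) has median 0. This is not the paper's code.

```python
# Sketch of a Design-1-style truncated sample (illustrative).
import numpy as np

def design1_sample(T=200, c=0.0, rng=np.random.default_rng(0)):
    xs, ys = [], []
    while len(ys) < T:                       # draw until T kept
        x = rng.standard_normal()
        y = 0.0 + x + rng.standard_normal()  # y* = 0 + x + u
        if y > c:                            # observe only if y* > c
            xs.append(x); ys.append(y)
    X = np.column_stack([np.ones(T), xs])    # regressors (1, x_t)
    return np.asarray(ys), X
```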


In all designs, only the result for the slope coefficient is reported. RMSE is the root mean squared error, LQ is the lower quartile, UQ is the upper quartile, and MAE is the median absolute error.

In Design 1, we check the problem of choosing w, with 50% truncation and the standard normal distribution, which will serve as a benchmark for our simulation study. With a given data set, there exists a reasonable range of w: if w is too large, data with |y − x'b| > w are underutilized, and if w is too small, data with x'b < c + w are underutilized. Hence we can find a reasonable range for w. For Design 1, w = 0.5 to 1.5 appears to be the range, so we try QME with w = 0.5, 1.0, 1.5. Also, QME in Design 2 has w = 0.8. Comparing QME with four w's, although the results vary, does not yield useful guidelines for how to choose w. The weighted version of QME (WQME) with w = 0.3, 0.6, 0.9, 1.2, 1.5 does reasonably well compared with the other QME's, but there seems to be no good reason to recommend WQME over QME other than the convenience in covariance matrix calculation. In practice, it is necessary to try several w's initially. Picking the w that minimizes a norm of the covariance matrix or the bootstrap MSE is one way to choose from among them. Also, as in Lee (1989), an average of the estimates for several w's may work, along with WQME applied to the reasonable range of w.

In Designs 2 and 3, the percentage of truncation varies from 25 to 50. Since STLS is closer to LSE than QME is, it does well in both designs. This, however, will change for distributions with thicker tails. The performance of QME is somewhere between STLS and RME. One thing to note is the asymmetry of the sampling distributions in Design 3.

In Designs 4 and 5, we check the sensitivity to the underlying distribution, particularly to its tail behavior. We consider the standard logistic and standard Cauchy distributions. In Design 4, none of the three methods does as well as with the normal, though the logistic is not much different from the normal. The good performance of RME is somewhat surprising, while that of QME and STLS deteriorates rapidly. Again, QME's performance is consistently between RME and STLS. In Design 5, QME does much better than STLS. QME is worse than RME in BIAS but better in STD, which is to be expected.

In Designs 6 and 7, two asymmetric distributions are tested. QME and STLS do worse than RME, whose performance does not change much throughout the experiment. Again the asymmetry of the sampling distribution is noteworthy.

In Design 8, we use a distribution which is quadratic in the middle up to ±1, linear in the upper tail, and rectangular in the lower tail. Random numbers for this particular case were generated by the 'rejection method'. The quadratic portion of the distribution is set such that it is very close to standard normal up to ±1. This design is deliberate, expecting that QME and RME will perform better than STLS, for both need symmetry only up to ±w. The value of w is set


at 0.8. The results confirm our expectation: the behavior of QME and RME is better than that of STLS and not much different from the result of Design 3.

In Designs 9 and 10, we examine two different heteroscedasticity cases: one with increasing heteroscedasticity (u = (x'β)·N(0,1)) and the other with decreasing heteroscedasticity (u = (1/x'β)·N(0,1)), as similarly done in Powell (1986). In Design 9, all three estimators fare badly. It is surprising to see that RME is better than QME and at least as good as STLS, in spite of its possible T^{1/3}-consistency. In Design 10, all three estimators do very well, even better than in Design 3. The relationship between the three estimators and heteroscedasticity seems to need further study.

In summary, RME's performance stays at about the same level for all distributions. STLS does best with normals, but as we go to other distributions, its performance deteriorates due to outliers (compare BIAS to MED and STD to MAE). The performance of QME seems to fall between RME and STLS.
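The bootstrap choice of w mentioned above can be sketched as follows, reusing the qme_fit sketch from section 2; the candidate grid, the number of bootstrap draws B, and the MSE criterion are illustrative assumptions, not the paper's procedure.

```python
# Sketch: pick w from a candidate grid by bootstrap MSE (illustrative).
import numpy as np

def choose_w(y, X, c, candidates, b0, B=100, rng=np.random.default_rng(0)):
    T = len(y)
    best_w, best_mse = None, np.inf
    for w in candidates:
        b_hat = qme_fit(y, X, w, c, b0)      # full-sample estimate
        draws = []
        for _ in range(B):
            idx = rng.integers(0, T, size=T) # bootstrap resample
            draws.append(qme_fit(y[idx], X[idx], w, c, b_hat))
        mse = np.mean((np.asarray(draws) - b_hat) ** 2)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w
```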

6. Conclusion

In this paper, we proposed a semiparametric estimation method, QME, for truncated data. The main assumption needed is the symmetry of f_{u|x} up to ±w. Compared with STLS, which is also good for truncated data, the advantages of QME are that its symmetry requirement is weaker than that of STLS, and its potential robustness. The disadvantage of QME from a user's viewpoint is in choosing w. While STLS needs density estimation for its covariance matrix, a weighted version of QME circumvents density estimation. Although we cannot derive an optimal w analytically, there exists a reasonable range for w with a given data set. Trying several w's in the range will be a worthwhile experiment, along with the weighted QME. This paper also offers an instructive way to derive asymptotic distributions of extremum estimators with nonsmooth maximands by adapting the work of Huber (1967) and Pollard (1984).

Appendix: Strong consistency of QME

First we prove identification in the sense that Q(b) attains its unique maximum at β. This can be done in two steps: first, the unimodality of Q(b) is derived, which proves the existence of a maximum; then we show that the maximum is unique at β.

Theorem A.1. Q(b) attains its unique maximum at b = β.

Proof.

Define Q_x(b), Q_{xb}(b), and Q_{x0} such that

Q(b) = E_x Q_x(b) = E_x( 1[x'b > c + w] Q_{xb}(b) + 1[x'b ≤ c + w] Q_{x0} ),

Q_{xb}(b) = ∫ {w² − (y − x'b)²} 1[ |y − x'b| < w ] f_{y|x}(y) dy,

Q_{x0} = ∫ {w² − (y − c − w)²} 1[ |y − c − w| < w ] f_{y|x}(y) dy.

We use the subscript b in Q_{xb}(b) and 0 in Q_{x0} to denote that Q_{xb}(b) is a function of b while Q_{x0} is not. We will show that x'β maximizes Q_x(b) and then invoke the full rank condition to show that only β maximizes Q(b). Q_x(β) − Q_x(b) can be written as

Q_x(β) − Q_x(b) = 1[x'β > c + w] Q_{xb}(β) + 1[x'β ≤ c + w] Q_{x0}
  − 1[x'b > c + w] Q_{xb}(b) − 1[x'b ≤ c + w] Q_{x0}.   (A.2)

The main point is that when both x'β and x'b > c + w, Q_{xb}(β) > Q_{xb}(b) unless b = β, for we capture the most probability under f_{y|x} by putting the trimmed quadratic kernel {w² − (y − x'b)²} 1[ |y − x'b| < w ] around the mode x'β. Formally, this can be proved using the convolution property of unimodal and strongly unimodal densities. View Q_{xb}(b) as the convolution density of y|x and z evaluated at x'b, where z is a random variable whose density is proportional to (w² − z²) 1[ |z| < w ]. A distribution F is defined to be strongly unimodal if the convolution F * S is unimodal for any unimodal distribution function S [see Dharmadhikari and Joag-dev (1988) for this and other statements regarding unimodality]. If F is continuous, strong unimodality is equivalent to the log-concavity of the density. Since (w² − z²) 1[ |z| < w ] is log-concave and f_{y|x} is unimodal, the convolution Q_{xb}(b) is unimodal. This proves the existence of a maximum in Q_{xb} when x'b > c + w. By differentiating Q_{xb}(b) w.r.t. x'b, we can verify that Q_{xb}(b) has a strict local maximum at x'β. Then x'β must be the global strict maximizer. Hence Q_{xb}(β) > Q_{xb}(b) when x'b and x'β > c + w.

With the main point made, there are four cases to consider in (A.2), depending on the direction of the inequalities:

x'β > c + w and x'b > c + w:  Q_{xb}(β) − Q_{xb}(b) > 0 unless x'β = x'b,
x'β > c + w and x'b ≤ c + w:  Q_{xb}(β) − Q_{x0} > 0,
x'β ≤ c + w and x'b > c + w:  Q_{x0} − Q_{xb}(b) uncertain,
x'β ≤ c + w and x'b ≤ c + w:  Q_{x0} − Q_{x0} = 0.


The only troublesome case is the third, for which we show below that x'β ≤ c + w and x'b > c + w imply Q_{x0} − Q_{xb}(b) ≥ 0, using the condition x'β ≤ c + w < x'b. What we proved above for the first case (x'β and x'b > c + w) is that we capture the most weighted probability by matching the center of the kernel, x'b, with that of f_{y|x}, namely x'β. The unimodality of Q_{xb}(b) shows that the captured probability does not decrease as x'b approaches x'β from above. But Q_{xb}(b) at x'b = c + w is Q_{x0}. Hence Q_{x0} ≥ Q_{xb}(b) when x'β ≤ c + w < x'b.

So far we have proved that Q_x(b) reaches its unique maximum when x'b = x'β, which implies that Q(b) attains its unique maximum when x'b = x'β. Now we show that P(x'β ≠ x'γ, x'β > c + w) > 0 for any γ ≠ β. This will prove that Q(β) > Q(γ) unless γ = β. Let d = β − γ; then from Assumption 4(i),

d'( E{xx' 1[x'β > c + w]} )d = E( d'xx'd · 1[x'β > c + w] ) = E( (x'd)² 1[x'β > c + w] ) > 0 for any d ≠ 0,

which indicates P(x'd ≠ 0, x'β > c + w) > 0 for any d ≠ 0; that is, P(x'β ≠ x'γ, x'β > c + w) > 0 for any γ ≠ β.  ∎

Next, we prove the strong consistency of QME.

Theorem A.2. The estimator b_T in QME is strongly consistent for β.

Proof. The proof is done by combining the identification theorem, the continuity of Q(b) in b, the compactness of the parameter space, and the a.e. uniform convergence of Q_T(b) to Q(b) [see, for instance, Huber (1967)]. Since the continuity of Q(b) is obvious, we need only prove the uniform convergence. The proof, which can be done using the combinatorial method as in Pollard (1984, ch. 2), is omitted; see Pakes and Pollard (1989) for more recent references on the method and its exposition. Lee (1989) employs the method to prove the consistency of RME. To get the general idea of the proof, note that the class of functions max{w² − (y − x'b)², 0} indexed by b has 'graphs' with 'polynomial discrimination'. This fact essentially establishes the uniform convergence. Pollard's (1984, p. 28) example 26 proves this result for a general class of functions including max{w² − (y − x'b)², 0}.  ∎

References

Amemiya, T., 1985, Advanced econometrics (Harvard University Press, Cambridge, MA).
Dharmadhikari, S. and K. Joag-dev, 1988, Unimodality, convexity, and applications (Academic Press, San Diego, CA).
Hampel, F.R., E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel, 1986, Robust statistics: The approach based on influence functions (Wiley, New York, NY).
Himmelblau, D.M., 1972, Applied nonlinear programming (McGraw-Hill, New York, NY).
Huber, P., 1964, Robust estimation of a location parameter, Annals of Mathematical Statistics 35, 73-101.
Huber, P., 1967, The behavior of maximum likelihood estimates under nonstandard conditions, Proceedings of the Fifth Berkeley Symposium 1, 221-233.
Huber, P., 1972, Robust statistics: A review, Annals of Mathematical Statistics 43, 1041-1067.
Lee, M.J., 1989, Mode regression, Journal of Econometrics 42, 337-349.
Lee, M.J., 1992a, Median regression for ordered discrete responses, Journal of Econometrics 51, 59-77.
Lee, M.J., 1992b, Winsorized mean estimator for censored regression, Econometric Theory, forthcoming.
Pakes, A. and D. Pollard, 1989, Simulation and the asymptotics of optimization estimators, Econometrica 57, 1027-1057.
Pollard, D., 1984, Convergence of stochastic processes (Springer-Verlag, New York, NY).
Powell, J., 1984, Least absolute deviations estimation for the censored regression model, Journal of Econometrics 25, 303-325.
Powell, J., 1986, Symmetrically trimmed least squares estimation for Tobit models, Econometrica 54, 1435-1460.
Romano, J.P., 1988, On weak convergence and optimality of kernel density estimates of the mode, Annals of Statistics 16, 629-647.
Ruppert, D. and R.J. Carroll, 1980, Trimmed least squares estimation in the linear model, Journal of the American Statistical Association 75, 828-838.