A comparison of local constant and local linear regression quantile estimators

Computational Statistics & Data Analysis 25 (1997) 159-166
Keming Yu*, M.C. Jones

Department of Statistics, The Open University, Walton Hall, Milton Keynes MK7 6AA, UK

Received 1 August 1995

Abstract

Two popular nonparametric conditional quantile estimation methods, local constant fitting and local linear fitting, are compared. We note the relative lack of differences in results between the two approaches. While maintaining the expected preference for the local linear version, the arguments in favour are relatively slight, at least in the interior, and not as compelling as may be thought. The main differences between the approaches lie at the boundaries. © 1997 Elsevier Science B.V.

Keywords: Boundary behaviour; Conditional quantile; Kernel smoothing; Local polynomial fit

1. Introduction

The estimation of conditional quantiles has many applications in hypothesis testing and interval estimation, robust estimation, and the provision of reference centiles in medicine. Local polynomial approaches to conditional quantile estimation and, in particular, the relative merits of local constant and local linear quantile estimation are the subject of this article. For a review of the literature on nonparametric conditional quantile estimation, together with a longer version of this note, see Yu (1997). Suppose that we have i.i.d. observations $\{(X_i, Y_i)\}_{i=1}^n$ of $(X, Y)$. The conditional $p$-quantile $q_p(x)$ is the $p$-quantile of the conditional distribution $F(y|x)$ of $Y$ given $X = x$.

* Corresponding author.


Koenker and Bassett (1978) suggested the use of the loss function $\rho_p(t) = \frac{1}{2}\{|t| + (2p - 1)t\}$ to define $q_p(x)$. A kernel-weighted implementation of this is to take as estimated conditional quantile

$\hat q_p(x) = \arg\min_a \sum_{i=1}^n \rho_p(Y_i - a)\, K(h^{-1}(x - X_i)).$   (1.1)

Here, $K$ is a kernel function with bandwidth $h$, taken to be a symmetric unimodal density function. Formula (1.1) is, in fact, a local constant version of conditional quantile estimation. Contrast this with the local linear version: $\tilde q_p(x) = \hat a$, where

$(\hat a, \hat b) = \arg\min_{a,b} \sum_{i=1}^n \rho_p(Y_i - a - b(X_i - x))\, K(h^{-1}(x - X_i)).$   (1.2)
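For concreteness, the following minimal sketch (our addition, not the authors' code) implements (1.1) and (1.2) in Python with a Gaussian kernel and a general-purpose optimiser; since the check function is piecewise linear, a dedicated quantile-regression solver (e.g. linear programming) would be preferable in serious use.

```python
# A minimal sketch of estimators (1.1) and (1.2): kernel-weighted
# check-function minimisation, with a Gaussian kernel assumed.
import numpy as np
from scipy.optimize import minimize

def rho(t, p):
    """Check loss rho_p(t) = (1/2){|t| + (2p - 1)t}."""
    return 0.5 * (np.abs(t) + (2.0 * p - 1.0) * t)

def local_quantile(x0, X, Y, p, h, degree=0):
    """Local constant (degree=0, eq. (1.1)) or local linear (degree=1,
    eq. (1.2)) conditional p-quantile estimate at x0, bandwidth h."""
    w = np.exp(-0.5 * ((x0 - X) / h) ** 2)     # kernel weights K(h^{-1}(x0 - X_i))
    if degree == 0:
        obj = lambda theta: np.sum(rho(Y - theta[0], p) * w)
        theta0 = [np.quantile(Y, p)]
    else:
        obj = lambda theta: np.sum(rho(Y - theta[0] - theta[1] * (X - x0), p) * w)
        theta0 = [np.quantile(Y, p), 0.0]
    return minimize(obj, theta0, method="Nelder-Mead").x[0]   # returns \hat{a}
```

For example, `local_quantile(0.5, X, Y, 0.9, 0.2, degree=1)` gives the local linear estimate of the conditional 0.9-quantile at $x = 0.5$.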

Both approaches have been proposed and investigated before. Also, in Yu and Jones (1997), we compare (1.2) with a further local linear quantile estimator based on a "double kernel" approach, and find marginally in favour of the latter. Nonetheless, the current context will suffice for comparative purposes. If $\rho_p(z)$ were replaced in (1.1) and (1.2) by $z^2$, then the well-known local constant (Nadaraya-Watson) and local linear regression mean estimators ensue. Advantages of local linear fitting over local constant fitting have been explored (e.g. Fan, 1992; Ruppert and Wand, 1994; Cleveland and Loader, 1997); the former is now well established as preferable. One naturally wonders whether the comparison of the two approaches to smoothing conditional quantiles above affords analogous results. The general answer, unsurprisingly, is "yes", but we also indicate ways in which the advantages of local linear fitting over local constant fitting are perhaps not quite as compelling as one might sometimes be led to believe. Section 2 gives a theoretical comparison of the mean squared errors (MSEs) of the two methods. Section 3 briefly indicates results of comparisons of practical performance based on data applications and simulations. A concluding summary is given in Section 4.

2. Theoretical comparison

In this section, we make a brief theoretical comparison of $\hat q_p(x)$ and $\tilde q_p(x)$ by looking at their asymptotic MSEs as $n \to \infty$, $h = h(n) \to 0$ and $nh \to \infty$. Three of the four MSEs that we will be interested in (when we divide our investigation into "interior" and "boundary" behaviours) already appear in the literature. The assumptions underlying these results are standard ones, assuming up to two continuous derivatives of quantities as necessary (Fan et al., 1994). In addition to notation already established, write $g(x)$ for the marginal density of the $X$'s, assumed to be continuous, and take the support of $g$ to be $[0, 1]$. Let the support of $K$ be $[-1, 1]$ and define the interior by $h \le x \le 1 - h$ and the boundary region as the complement of this.

Table 1. Pointwise bias and variance of $\hat q_p$ and $\tilde q_p$: $x$ in interior.

Local constant fit: bias $\frac{1}{2} h^2 k_2 \{ -F_{20}(q_p(x)|x)/f(q_p(x)|x) + 2 q_p'(x) g'(x)/g(x) \}$; variance $p(1-p) R(K) / \{n h\, g(x) f^2(q_p(x)|x)\}$.
Local linear fit: bias $\frac{1}{2} h^2 k_2\, q_p''(x)$; variance $p(1-p) R(K) / \{n h\, g(x) f^2(q_p(x)|x)\}$.

2.1. Asymptotic MSE: x in interior

The asymptotic biases and variances of $\hat q_p(x)$ and $\tilde q_p(x)$ are given in Table 1. Results for $\hat q_p(x)$ are taken from Jones and Hall (1990) and for $\tilde q_p(x)$ from Fan et al. (1994). MSEs follow, of course, by squaring biases and adding to variances. Write $k_2 = \int_{-1}^{1} z^2 K(z)\, dz$ and $R(K) = \int_{-1}^{1} K^2(z)\, dz$; also write $f(y|x)$ for the conditional density of $Y$ given $X = x$ and $F_{20}(y|x) = \partial^2 F(y|x)/\partial x^2$.

The two asymptotic variances are the same, and thus differences between local constant and local linear fits are concerned only with their biases. Had the term $F_{20}(q_p(x)|x)/f(q_p(x)|x)$ instead been $-q_p''(x)$, these biases would have followed precisely the form of the biases in the regression mean case (e.g. Fan, 1992), with $q_p$ replaced by the mean $m$. An unappealing property of the local constant bias is that the second term, not present in the local linear case, depends on the marginal density $g$. If the design is uniform, it is perhaps surprising that the two bias expressions still differ in general, but the formulae check out by various derivations. $q_p''(x)$ is as expected in the local linear case, but

$-\dfrac{F_{20}(q_p(x)|x)}{f(q_p(x)|x)} = q_p''(x) + \{q_p'(x)\}^2 \{\log f(q_p(x)|x)\}^{(0,1)} + 2 q_p'(x) \{\log f(q_p(x)|x)\}^{(1,0)}$   (2.1)

(Jones and Hall, 1990), where superscripts $(0,1)$ and $(1,0)$ denote first partial derivatives with respect to $y$ and $x$ respectively, is more naturally associated with methods which estimate conditional quantiles by first estimating the conditional distribution function and then inverting (e.g. Yu and Jones, 1997).

One special case where the biases are the same is $q_p'(x) = 0$ (flat portions, minima, maxima, points of inflection): then all terms but $q_p''(x)$ are zero. If, on the other hand, $g'(x) q_p'(x) = 0$ because $g$ is uniform, the last two terms in (2.1) also vanish if the conditional density $f(q_p(x)|x)$ is uniform, and the biases are again the same. If $Y = m(x) + \varepsilon$, where $\varepsilon$ has density $f$ free of $x$, then (2.1) simplifies further to $-F_{20}(q_p(x)|x)/f(q_p(x)|x) = q_p''(x) - \{q_p'(x)\}^2 \{\log f(q_p(x)|x)\}^{(0,1)}$. Thus, additionally, when $g'(x) q_p'(x) = 0$: (1) if $|q_p''(x)|$ is much larger than $|q_p'(x)|$, the two smoothers have approximately equal asymptotic bias, variance and MSE; (2) when $q_p''(x) = 0$, $\mathrm{MSE}_L(x) \le \mathrm{MSE}_C(x)$; and (3) more generally, when $q_p''(x) \{\log f(q_p(x)|x)\}^{(0,1)} < 0$, $|\mathrm{BIAS}_L(x)| \le |\mathrm{BIAS}_C(x)|$ and $\mathrm{MSE}_L(x) \le \mathrm{MSE}_C(x)$. Here, we have written $\mathrm{BIAS}_C$ ($\mathrm{MSE}_C$) and $\mathrm{BIAS}_L$ ($\mathrm{MSE}_L$) for the biases (MSEs) of the local constant and local linear fits, respectively. Note that it is quite possible for $\mathrm{MSE}_L(x) \le \mathrm{MSE}_C(x)$ when $q_p''(x) \{\log f(q_p(x)|x)\}^{(0,1)} > 0$, depending on the relative sizes of $q_p''(x)$ and $\{q_p'(x)\}^2 \{\log f(q_p(x)|x)\}^{(0,1)}$.
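To make statements (1)-(3) concrete, the following illustration (ours, with an arbitrarily assumed $m(x) = \sin 2x$ and $\sigma = 0.5$) evaluates the two interior bias constants under the location model with normal errors, for which $\{\log f(q_p(x)|x)\}^{(0,1)} = -z_p/\sigma$ with $z_p = \Phi^{-1}(p)$.

```python
# Numerical illustration (not from the paper) of the interior bias terms under
# Y = m(X) + eps, eps ~ N(0, sigma^2), uniform design: the local linear bias
# term is q_p''(x) = m''(x); the local constant term is
# q_p''(x) - {q_p'(x)}^2 {log f}^(0,1) = m''(x) + m'(x)^2 z_p / sigma.
import numpy as np
from scipy.stats import norm

m1 = lambda x: 2.0 * np.cos(2.0 * x)     # m'(x) for the assumed m(x) = sin(2x)
m2 = lambda x: -4.0 * np.sin(2.0 * x)    # m''(x)
sigma, p = 0.5, 0.9
z_p = norm.ppf(p)                        # normal errors: {log f}^(0,1) = -z_p/sigma

x = np.linspace(0.1, 0.9, 5)
bias_linear = m2(x)                              # local linear bias constant
bias_constant = m2(x) + m1(x) ** 2 * z_p / sigma # local constant bias constant
print(np.column_stack([x, bias_linear, bias_constant]))
```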


2.2. Asymptotic MSE: x near boundary

For the boundary case, write $x = ch$, $0 \le c < 1$ (of course, the other boundary could be accommodated in an analogous way). Then set

$a_l(c) = \int_{-1}^{c} u^l K(u)\, du.$

Further notation is

$B(c) = \dfrac{a_2^2(c) - a_1(c)\, a_3(c)}{a_0(c)\, a_2(c) - a_1^2(c)}$  and  $V(c) = \dfrac{\int_{-1}^{c} \{a_2(c) - a_1(c) u\}^2 K^2(u)\, du}{\{a_0(c)\, a_2(c) - a_1^2(c)\}^2}.$

The asymptotic biases and variances of $\hat q_p(x)$ and $\tilde q_p(x)$ for boundary points are given in Table 2. Results for $\hat q_p(x)$ are proved by adapting the work of Jones and Hall (1990) (proof omitted to save space), and those for $\tilde q_p(x)$ are taken from Fan et al. (1994).

Table 2. Pointwise bias and variance of $\hat q_p$ and $\tilde q_p$ near the boundary at zero.

Local constant fit: bias $h\, \{a_1(c)/a_0(c)\}\, q_p'(x)$; variance $p(1-p) \int_{-1}^{c} K^2(u)\, du \,/\, \{a_0^2(c)\, n h\, g(x) f^2(q_p(x)|x)\}$.
Local linear fit: bias $\frac{1}{2} h^2 B(c)\, q_p''(x)$; variance $p(1-p)\, V(c) / \{n h\, g(x) f^2(q_p(x)|x)\}$.

There is now, of course, a big difference between local constant and local linear fitting: the former has a boundary bias of order $h$, which compares unfavourably with the latter's $O(h^2)$ boundary bias. The local linear quantile estimator's boundary MSE constants are, by the way, the same as those familiar from the local linear mean estimation problem (Fan and Gijbels, 1992). As a referee reminded us, for small samples the direction in which the asymptotic results might mislead is clear: one would expect the bias of the local linear estimator to be less than that of the local constant one, because the former involves one more parameter than the latter; likewise, the variance of the local linear estimator should be the greater.
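The constants $a_l(c)$, $B(c)$ and $V(c)$ are easily evaluated numerically; the sketch below (our addition) does so for the Epanechnikov kernel, and checks that at $c = 1$ they collapse to the interior constants $k_2$ and $R(K)$.

```python
# Numerical evaluation (ours) of the boundary constants of Section 2.2,
# for the Epanechnikov kernel K(u) = (3/4)(1 - u^2) on [-1, 1].
import numpy as np
from scipy.integrate import quad

K = lambda u: 0.75 * (1.0 - u ** 2)

def a(l, c):
    # a_l(c) = int_{-1}^{c} u^l K(u) du
    return quad(lambda u: u ** l * K(u), -1.0, c)[0]

def B(c):
    return (a(2, c) ** 2 - a(1, c) * a(3, c)) / (a(0, c) * a(2, c) - a(1, c) ** 2)

def V(c):
    num = quad(lambda u: (a(2, c) - a(1, c) * u) ** 2 * K(u) ** 2, -1.0, c)[0]
    return num / (a(0, c) * a(2, c) - a(1, c) ** 2) ** 2

for c in (0.0, 0.25, 0.5, 1.0):
    print(f"c = {c:4.2f}   B(c) = {B(c):8.4f}   V(c) = {V(c):8.4f}")
# At c = 1 (an interior point) B(1) = k_2 = 0.2 and V(1) = R(K) = 0.6
# for this kernel.
```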

3. Practical comparison

Our study of the practical differences between local constant and local linear quantile fitting methods was based on application to five real data sets and various simulation contexts. Lack of space precludes anything but the briefest summary of these studies. The standard normal kernel was used throughout.

Fig. 1. Seven smoothed quantile curves for serum concentration (g l^{-1}) of immunoglobulin-G in children, plotted against age (yr), by local constant fitting (solid lines) and local linear fitting (dotted lines): 5th, 10th, 25th, 50th, 75th, 90th and 95th percentiles.

3.1. Summary of data set applications

The five practical data sets are the triceps skinfold data (Cole and Green, 1992) with n = 892, the serum concentration data (Isaacs et al., 1983) with n = 300, the heart transplant log survival time data with n = 184 that was originally taken from Crowley and Hu (1977), the girls' weight data (Cole and Green, 1992) with n = 4011, and the motorcycle impact data (Härdle, 1990) with n = 133. In each case, we chose bandwidths by the method of Yu and Jones (1997), which consists of utilising the algorithm of Ruppert et al. (1995) to select the bandwidth suitable for regression mean estimation and then making "rule-of-thumb" adjustments to obtain suitable bandwidths for each quantile estimate. This method is tailored to the local linear fitting case, but we use the same values for local constant fitting too, since a corresponding methodology is unavailable.
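To indicate the kind of adjustment involved, the sketch below (ours) rescales a mean-regression bandwidth for the $p$-th quantile using the normal-error factor $\{p(1-p)/\phi(\Phi^{-1}(p))^2\}^{1/5}$, which is how we read the Yu and Jones (1997) rule of thumb; the input h_mean is assumed to come from the Ruppert et al. (1995) selector, which is not reimplemented here.

```python
# Rule-of-thumb bandwidth rescaling for quantile estimation, assuming normal
# errors; h_mean is a bandwidth already chosen for regression mean estimation.
from scipy.stats import norm

def quantile_bandwidth(h_mean, p):
    """h_p = h_mean * {p(1 - p) / phi(Phi^{-1}(p))^2}^{1/5}."""
    phi_at_zp = norm.pdf(norm.ppf(p))
    return h_mean * (p * (1.0 - p) / phi_at_zp ** 2) ** 0.2

# The adjustment is mild near the median and grows for extreme quantiles:
for p in (0.5, 0.75, 0.9, 0.95):
    print(f"p = {p}: h_p / h_mean = {quantile_bandwidth(1.0, p):.3f}")
```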

Estimated quantiles for the serum concentration data are plotted in Fig. 1, with the local constant fits as solid lines and the local linear fits as dotted lines; the data points are also plotted. The general impression is of a very considerable similarity between the quantiles estimated by the two methods, except in some cases near the boundaries. In the interior, the differences would generally make no impact on qualitative conclusions and are, if anything, of smaller magnitude than the differences between the two versions of local linear fitting demonstrated in Yu and Jones (1997). In the case shown, differences at the boundaries are considerable, and even qualitatively different: in Fig. 1, a peak toward the right-hand edge (indicated by both local linear fits of Yu and Jones, 1997) disappears with local constant fitting.

3.2. Indication of simulation investigations

As part of a simulation study of several competing kernel-based regression quantile estimators to be published elsewhere, comparison of $\hat q_p$ and $\tilde q_p$ over replications of a variety of cases was made. General impressions were confirmed of the little difference between these methods in the interior and of substantial differences, to the detriment of $\hat q_p$, in boundary regions. We should add that in very smooth situations bandwidths will be large, and hence boundary regions also large. One indication of these results, in a form different from that to be published elsewhere, is given in Fig. 2. The simulation situation is n = 500 points from the model

$Y = 2 + \sin(2X) + 2\exp(-16X^2) + \varepsilon,$

where $X \sim N(0, 1)$ truncated to $[-2, 2]$ and $\varepsilon \sim N(0, 0.25)$, independently.

Fig. 2. The pointwise relative efficiency $\{\mathrm{MSE}(\tilde q_{0.9}(X))/\mathrm{MSE}(\hat q_{0.9}(X))\}^{1/2}$, averaged over 100 replications (with fixed $X$'s), for estimation of $q_{0.9}$, based on n = 500.

The bandwidths were chosen, separately for each estimator, by minimising the asymptotic integrated MSE for the given model. Clearly, differences between MSEs are small in the interior (except at isolated points where the bias of the local linear fit is especially small), but can be substantial near the boundaries (here, towards the right-hand end). Similar qualitative conclusions arise for other values of p, except p = 0.5, for which the quantile coincides with the mean in this model.
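For readers wishing to experiment, the following small-scale re-creation (ours, not the study reported above) of this simulation setting reuses local_quantile from the sketch in Section 1; the bandwidth here is fixed ad hoc rather than by minimising asymptotic integrated MSE, and the number of replications is kept small for speed.

```python
# Small-scale re-creation of the Section 3.2 setting: fixed design X from
# truncated N(0,1), Y = 2 + sin(2X) + 2 exp(-16 X^2) + eps, eps ~ N(0, 0.25),
# comparing the two fits for q_{0.9} over replications on a grid of x values.
import numpy as np
from scipy.stats import truncnorm, norm

rng = np.random.default_rng(0)
n, p, h, reps = 500, 0.9, 0.25, 20          # h, reps: ad hoc choices
X = truncnorm.rvs(-2.0, 2.0, size=n, random_state=rng)   # fixed across reps
grid = np.linspace(-1.9, 1.9, 39)
truth = 2.0 + np.sin(2.0 * grid) + 2.0 * np.exp(-16.0 * grid ** 2) \
        + 0.5 * norm.ppf(p)                 # true q_{0.9}(x): mean + 0.5 z_{0.9}

mse = np.zeros((2, grid.size))
for _ in range(reps):
    Y = 2.0 + np.sin(2.0 * X) + 2.0 * np.exp(-16.0 * X ** 2) \
        + rng.normal(0.0, 0.5, n)           # sd 0.5, i.e. variance 0.25
    for j, x0 in enumerate(grid):
        for d in (0, 1):                    # local constant, local linear
            est = local_quantile(x0, X, Y, p, h, degree=d)
            mse[d, j] += (est - truth[j]) ** 2 / reps

rel_eff = np.sqrt(mse[1] / mse[0])          # local linear vs. local constant
print(np.round(rel_eff, 2))
```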

4. Conclusions

We see from the previous theoretical analysis and empirical comparison that, while local linear fitting has a particularly appealing asymptotic mean squared error in terms of intuitive and mathematical simplicity, the practical differences from local constant fitting in the interior are not very great. This simplicity makes it easier to provide bandwidth selection rules, however. (This relates to a single overall bandwidth per quantile; local bandwidths have not been considered.) While the theory is right to point to areas of rapidly changing quantiles and rapidly changing design density as being where differences could be great, it is our experience that it is difficult to engineer such situations in simulations and still retain general practical relevance. (The analogous point goes for regression mean estimation too.) Detrimental boundary influence indeed exists when using local constant fitting in some cases, and it is this aspect which clinches the argument in favour of local linear smoothing. In addition, we would like to point out that: (1) "double kernel" versions of quantile estimation methodology seem even better (Yu and Jones, 1997) than the current check function methods; (2) our bandwidth selection strategy seems to work well; (3) local linear fitting can give quantile derivative estimates at the same time as estimating a quantile itself (but perhaps the bandwidth should be changed); and (4) there is a bigger cost in computing time with local linear than with local constant fitting.

Acknowledgements

The authors are happy to acknowledge the assistance of the editor and two referees in preparing this article for publication.

References

[1] Cleveland, W., Loader, C., 1997. Smoothing by local regression: principles and methods (with discussion). Comput. Statist., to appear.
[2] Cole, T.J., Green, P.J., 1992. Smoothing reference centile curves: the LMS method and penalized likelihood. Statist. Med. 11, 1305-1319.
[3] Crowley, J., Hu, M., 1977. Covariance analysis of heart transplant survival data. J. Amer. Statist. Assoc. 72, 27-36.
[4] Fan, J., 1992. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 87, 998-1004.
[5] Fan, J., Gijbels, I., 1992. Variable bandwidth and local linear regression smoothers. Ann. Statist. 20, 2008-2036.
[6] Fan, J., Hu, T.-C., Truong, Y.K., 1994. Robust nonparametric function estimation. Scandinavian J. Statist. 21, 433-446.
[7] Härdle, W., 1990. Applied Nonparametric Regression. Cambridge University Press, Cambridge.
[8] Isaacs, D., Altman, D.G., Tidmarsh, C.E., Valman, H.B., Webster, A.D.B., 1983. Serum immunoglobulin concentrations in preschool children measured by laser nephelometry: reference ranges for IgG, IgA, IgM. J. Clin. Pathol. 36, 1193-1196.
[9] Jones, M.C., Hall, P., 1990. Mean square error properties of kernel estimates of regression quantiles. Statist. Probab. Lett. 10, 283-289.
[10] Koenker, R., Bassett, G.S., 1978. Regression quantiles. Econometrica 46, 33-50.
[11] Ruppert, D., Sheather, S.J., Wand, M.P., 1995. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 90, 1257-1270.
[12] Ruppert, D., Wand, M.P., 1994. Multivariate locally weighted least squares regression. Ann. Statist. 22, 1346-1370.
[13] Yu, K., 1997. Smooth regression quantile estimation. Ph.D. Thesis, The Open University, UK.
[14] Yu, K., Jones, M.C., 1997. Local linear quantile regression. J. Amer. Statist. Assoc., to appear.