Statistics & Probability Letters 60 (2002) 81 – 89
Nonparametric comparison of regression curves by local linear fitting

Tue Gørgens
Economics RSSS, Australian National University, Canberra ACT 0200, Australia

Received March 2001; received in revised form July 2002
Abstract

This paper proposes a new nonparametric test for the hypothesis that the regression functions in two or more populations are the same. The test is based on local linear estimates using data-driven bandwidth selectors. The test is applicable to data with random regressors and heteroskedastic responses. Simulations indicate the test has good power.

MSC: C12; C14

Keywords: Nonparametric testing; Local polynomial fitting; Bias correction
1. Introduction

This paper proposes a new nonparametric test for the hypothesis that the regression functions in two or more populations are the same. The new test has several attractive properties. First, it requires very little structure in the data and therefore has a wide range of applications. In particular, it is applicable to data with unequal sample sizes, random nonuniform regressors, and heteroskedastic responses. Second, the new test is fully automated in the sense that it does not rely on an arbitrary choice of smoothing parameters. The test is compatible with asymptotically optimal data-driven bandwidth selectors. Finally, simulation results show that the new test has good level properties and good power against important alternatives.

The setup is the following. Suppose there are $m$ populations and that $n_j$ observations are available for the $j$th population. All observations are assumed to be independent. Let $Y_{ij}$ and $X_{ij}$ denote the $i$th
observation in the $j$th population of the response variable and the (univariate) regressor, respectively. Consider the model

$$Y_{ij} = \mu_j(X_{ij}) + \epsilon_{ij}, \qquad i = 1, 2, \ldots, n_j,\ j = 1, 2, \ldots, m, \qquad (1)$$
where $\epsilon_{ij}$ is an unobserved random variable with $E(\epsilon_{ij}|X_{ij}) = 0$ and $\mu_j$ is the mean function defined by $\mu_j(x) = E(Y_{ij}|X_{ij} = x)$. Assuming that $E(Y_{ij}^2|X_{ij}) < \infty$, define the variance function $\sigma_j$ by $\sigma_j(x) = \mathrm{Var}(Y_{ij}|X_{ij} = x)$. The objective is to test the hypothesis $H_0: \mu_1 = \mu_j$ for all $j$ against the alternative hypothesis $H_1: \mu_1 \neq \mu_j$ for some $j$.

The motivation for this paper was an empirical application where the sample sizes are different and the regressors not identically distributed between populations, and where there is considerable heteroskedasticity within each population. A review of the literature revealed a large number of tests which are not applicable to such data. For example, Hall and Hart (1990), King et al. (1991), Delgado (1993), Kulasekera (1995), Young and Bowman (1995) and Yatchew (1999) all assume homoskedastic responses within each population, and Baltagi et al. (1996) assume equal sample sizes. Munk and Dette (1998) consider unequal sample sizes and heteroskedastic responses, but only in the context of asymptotically uniform fixed regressors. Apparently, only two papers discuss testing with unequal sample sizes, random nonuniform regressors and heteroskedastic responses. Neither test is ideal for the application at hand. Practical use of Cabus' (1998) test is hampered by the lack of guidance on how to choose a smoothing parameter. Scheike's (2000) test does not involve smoothing parameters, but is designed specifically for uniformly ordered alternatives ($\mu_1 \le \mu_2$ or $\mu_1 \ge \mu_2$) and seems to have little power against other alternatives.

The main ingredients of the new test are local linear estimates, but the basic idea can be described without the estimation details. For a given evaluation point $x$, let $\tilde\mu_j(x)$ denote the estimator of $\mu_j(x)$ and define $P(x) = (\tilde\mu_1(x), \ldots, \tilde\mu_m(x))'$. Let $R$ be the $(m-1) \times m$ restriction matrix appropriate for testing the null hypothesis $H_0: \mu_1 = \mu_j$ for all $j$. (That is, $R = [\iota, -I]$ where $\iota$ is a vector of ones and $I$ is an identity matrix.) Intuitively, if the $\tilde\mu_j(x)$'s are asymptotically unbiased, then the asymptotic distribution of

$$t(x) = P(x)'R'[R\,\mathrm{Var}(P(x))\,R']^{-1}RP(x) \qquad (2)$$

is $\chi^2(m-1)$ under the null by virtue of asymptotic normality of $P(x)$. This local test statistic is turned into a global test statistic by integrating over the regressor space,

$$T = \int_{-\infty}^{\infty} t(x)\,\omega(x)\,dx, \qquad (3)$$
where $\omega$ is some nonnegative weight function. The purpose of the weight function is simply to ensure that only $x$ in the support of $X_{ij}$ contribute to $T$. The limiting distribution of $T$ is nonstandard, but can easily be simulated.

2. Test statistic

The idea of local polynomial regression is straightforward. To estimate the conditional mean at a given point $x$, simply fit a $p$th order polynomial to the data using weighted least squares with weights $K((x - X_{ij})/b_j)^{1/2}$, where $K$ is a kernel function and $b_j$ a bandwidth. To formalize this, define $Y = (Y_{1j}, \ldots, Y_{n_j j})'$ and $X = (X_{1j}, \ldots, X_{n_j j})'$, let $W$ be an $n_j \times n_j$ diagonal matrix whose $i$th
diagonal element is $K((x - X_{ij})/b_j)$, and let $X$ be an $n_j \times (p+1)$ matrix whose $(i,k)$th element is $(X_{ij} - x)^{k-1}$. Define $\hat\beta = (X'WX)^{-1}X'WY$. The estimator of $\mu_j(x)$ is $\hat\mu_j(x) = e_1'\hat\beta$, where $e_k$ denotes the $(p+1)$-vector whose components are all 0 except for the $k$th component which is 1.

A bandwidth sequence is said to be optimal if it minimizes the asymptotic mean square error of $\hat\mu_j(x)$, conditional on $p$ and $K$ (see e.g. Fan and Gijbels, 1995). If an optimal bandwidth is used ($b_j$ proportional to $n_j^{-1/(2p+3)}$), then $\hat\mu_j(x)$ will be asymptotically biased. It turns out that the leading terms of the biases cancel across the $m$ populations if the $\hat\mu_j(x)$ are computed using the same bandwidth for all $j$. Hence, if the sample sizes and the distributions of the regressor are similar for all populations, the test may be carried out using optimal bandwidths and $\tilde\mu_j(x) = \hat\mu_j(x)$. If they are not similar, then the optimal bandwidths will differ across populations and the asymptotic biases must be dealt with.

This paper uses bias-corrected estimates of $\mu_j(x)$. Following Fan and Gijbels (1995), a Taylor series expansion of order $a$ implies that the conditional bias of $\hat\beta$ is approximately $(X'WX)^{-1}X'WZ\theta$, where $\theta = (\theta_{p+1}, \ldots, \theta_{p+a})'$ with $\theta_k = \mu_j^{(k)}(x)/k!$ and $Z$ is an $n_j \times a$ matrix whose $(i,k)$th element is $(X_{ij} - x)^{p+k}$. Let $\hat\theta$ be an estimator of $\theta$ obtained by $(p+a)$th order local polynomial regression using an optimal bandwidth ($\tilde b_j$ proportional to $n_j^{-1/(2p+2a+3)}$). Define $\tilde\beta = \hat\beta - (X'WX)^{-1}X'WZ\hat\theta$. The bias-corrected estimator of $\mu_j(x)$ is then $\tilde\mu_j(x) = e_1'\tilde\beta$.

Under the assumption (see the appendix) that $\mu_j$ has $p + a + 1$ bounded derivatives, the asymptotic mean square error of a $(p+a)$th order polynomial smoother with the optimal bandwidth is $O(n_j^{-(2p+2a+2)/(2p+2a+3)})$. The asymptotic mean square error of the estimator considered in this paper is $O(n_j^{-(2p+2)/(2p+3)})$, because $a$ derivatives are reserved for estimating the optimal bandwidth and the bias correction. The estimator is therefore not optimal given the amount of smoothness assumed. However, the estimator is optimal conditional on $p$ and $K$, because optimal bandwidths are used at each stage.

The problem of estimating $\mathrm{Var}(P(x))$ reduces to estimating $\mathrm{Var}(\tilde\mu_j(x)|X)$, because the $m$ population samples are independent. Define the $(p+1) \times n_j$ smoother matrix $Q = (X'WX)^{-1}X'W$, let $\tilde Q$ be the equivalent $(p+a+1) \times n_j$ matrix using bandwidth $\tilde b_j$ instead of $b_j$, and let $\tilde e$ denote the $a \times (p+a+1)$ matrix such that $\hat\theta = \tilde e\tilde QY$. ($\tilde e$ is an $a \times (p+1)$ zero matrix next to an $a \times a$ identity matrix.) Then $\tilde\beta = QY - QZ\tilde e\tilde QY$, whence $\mathrm{Var}(\tilde\beta|X) = \sigma_j(x)(Q - QZ\tilde e\tilde Q)(Q - QZ\tilde e\tilde Q)'(1 + o(1))$. Several estimators of $\sigma_j(x)$ have been proposed (see e.g. discussions by Ruppert et al., 1997; Xia, 1998). The one used here is taken from Fan and Gijbels (1995), who suggested
$$\hat\sigma_j(x) = \frac{(Y - X\hat\beta)'\,W\,(Y - X\hat\beta)}{\mathrm{trace}\big(W - WX(X'WX)^{-1}X'W\big)}. \qquad (4)$$
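To make the estimation steps concrete, here is a minimal Python sketch (the function names and the numpy-only implementation are illustrative, not the author's code) of the local polynomial fit, the bias correction, and the residual-based variance estimate (4) at a single evaluation point $x$. The bandwidths $b_j$ and $\tilde b_j$ are taken as given; in practice they come from the data-driven selectors discussed in Section 3.

    import numpy as np

    def epanechnikov(u):
        # Epanechnikov kernel with support [-1, 1]
        return 0.75 * np.maximum(1.0 - u**2, 0.0)

    def local_poly_fit(x, X, Y, b, p):
        # p-th order polynomial fit at x by weighted least squares.
        W = np.diag(epanechnikov((x - X) / b))              # kernel weights
        D = np.vander(X - x, N=p + 1, increasing=True)      # columns (X_i - x)^k, k = 0, ..., p
        DtW = D.T @ W
        beta_hat = np.linalg.solve(DtW @ D, DtW @ Y)
        return beta_hat, D, W

    def bias_corrected_estimate(x, X, Y, b, b_tilde, p=1, a=2):
        # Local linear estimate of mu_j(x) with the bias correction described
        # in the text, plus the variance estimate (4). A sketch only.
        beta_hat, D, W = local_poly_fit(x, X, Y, b, p)
        # Pilot (p+a)-th order fit with bandwidth b_tilde estimates the
        # higher-order coefficients theta_{p+1}, ..., theta_{p+a}.
        beta_pilot, _, _ = local_poly_fit(x, X, Y, b_tilde, p + a)
        theta_hat = beta_pilot[p + 1:p + a + 1]
        # Bias correction: beta_tilde = beta_hat - (X'WX)^{-1} X'W Z theta_hat.
        Z = np.column_stack([(X - x)**(p + k) for k in range(1, a + 1)])
        DtW = D.T @ W
        beta_tilde = beta_hat - np.linalg.solve(DtW @ D, DtW @ (Z @ theta_hat))
        # Residual-based variance estimate, equation (4).
        resid = Y - D @ beta_hat
        H = D @ np.linalg.solve(DtW @ D, DtW)               # local "hat" matrix
        sigma_hat = (resid @ W @ resid) / np.trace(W - W @ H)
        return beta_tilde[0], sigma_hat                     # (mu_tilde_j(x), sigma_hat_j(x))

Plugging $\hat\sigma_j(x)$ into the expression for $\mathrm{Var}(\tilde\beta|X)$ above gives the variance estimator used in $t(x)$, as stated next.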
The estimator of the variance is thus $\widehat{\mathrm{Var}}(\tilde\beta|X) = \hat\sigma_j(x)(Q - QZ\tilde e\tilde Q)(Q - QZ\tilde e\tilde Q)'$, and $\widehat{\mathrm{Var}}(\tilde\mu_j(x)|X) = e_1'\,\widehat{\mathrm{Var}}(\tilde\beta|X)\,e_1$.

The use of random bandwidths complicates the derivation of the asymptotic distribution of $T$. The approach taken here is similar to that of Xia (1998). Let $[x_L, x_U]$ denote the support of the weight function $\omega$, which is strictly contained in the supports of $X_{ij}$ for $j = 1, \ldots, m$. Define

$$Y_j(x, b_j) = b_j^{-1/2}\int_{-\infty}^{\infty} K\!\left(\frac{u - x}{b_j}\right) dW_j(u), \qquad (5)$$
where $W_j(x)$ is the Wiener process on $(-\infty, \infty)$, and put $d_K = \int_{-\infty}^{\infty} K(u)^2\,du$. For simplicity, let $V_j$ denote the estimator $\widehat{\mathrm{Var}}(\tilde\mu_j(x)|X)$. If the regularity assumptions given in the appendix hold and if $p = 1$ and $a = 2$, then

$$\tilde\mu_j(x) - \mu_j(x) = V_j^{1/2} d_K^{-1/2}\, Y_j(x, b_j) + o(n_j^{-1/2} b_j^{-1/2}) \qquad (6)$$
almost surely uniformly over $x \in [x_L, x_U]$ and $(b_j, \tilde b_j) \in H_n$. The sets $H_1, H_2, \ldots$ are defined in the appendix and include the optimal bandwidth rates. It follows from (6) that the limiting distribution of $T$ is the same as the limiting distribution of

$$T^* = \int_{-\infty}^{\infty} t^*(x)\,\omega(x)\,dx, \qquad (7)$$
where

$$t^*(x) = P^*(x)'R'[R\,\mathrm{Var}(P(x))\,R']^{-1}RP^*(x) \qquad (8)$$

and $P^*(x) = (V_1^{1/2}d_K^{-1/2}Y_1(x, b_1), \ldots, V_m^{1/2}d_K^{-1/2}Y_m(x, b_m))'$. Asymptotic critical values for $T$ are given by the appropriate quantiles of the distribution of $T^*$.

Although the distribution of $T^*$ is nonstandard, it is easy to simulate. To do this, let $u_0, u_1, \ldots, u_L$ be an evenly spaced partition of $[x_L - b_j, x_U + b_j]$ with $u_0 = x_L - b_j$ and $u_L = x_U + b_j$. If $L$ is a large number, then

$$Y_j(x, b_j) \approx b_j^{-1/2}\sum_{l=1}^{L} K\!\left(\frac{(u_l + u_{l-1})/2 - x}{b_j}\right)(u_l - u_{l-1})^{1/2} Z_{jl} \qquad (9)$$
for some sequence $Z_{j1}, \ldots, Z_{jL}$ of independent standard normal variables. To draw a value from the distribution of $T^*$, simply generate iid normal variables $Z_{j1}, \ldots, Z_{jL}$ and calculate $Y_j(x, b_j)$ for $j = 1, \ldots, m$, then calculate $P^*(x)$, $t^*(x)$ and $T^*$. The integral in the definition of $T^*$ can be evaluated using standard numerical methods.
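As an illustration of this simulation step, the following sketch draws one realization of $T^*$ using the discretization (9), the restriction matrix $R = [\iota, -I]$, and a trapezoidal rule for the integral in (7). It assumes that the estimated variances $V_j(x)$, the bandwidths $b_j$ and the weight function $\omega$ have already been computed on a grid of evaluation points; the function name and interface are illustrative only.

    import numpy as np

    def draw_T_star(x_grid, V, b, omega, xL, xU, L=100, rng=None):
        # One draw from the distribution of T* (equations (7)-(9)).
        #   x_grid : evaluation points in [xL, xU]
        #   V      : (m, len(x_grid)) array of estimated variances V_j(x)
        #   b      : length-m array of bandwidths b_j
        #   omega  : weight function evaluated on x_grid
        rng = np.random.default_rng() if rng is None else rng
        x_grid = np.asarray(x_grid)
        m = len(b)
        dK = 0.6                                  # int K(u)^2 du for the Epanechnikov kernel
        K = lambda u: 0.75 * np.maximum(1.0 - u**2, 0.0)

        # Simulate Y_j(x, b_j) on the grid via the discretization (9).
        Y = np.empty((m, len(x_grid)))
        for j in range(m):
            u = np.linspace(xL - b[j], xU + b[j], L + 1)     # partition u_0, ..., u_L
            mid, du = (u[1:] + u[:-1]) / 2.0, np.diff(u)
            Z = rng.standard_normal(L)
            Y[j] = b[j]**(-0.5) * (K((mid[None, :] - x_grid[:, None]) / b[j]) @ (np.sqrt(du) * Z))

        # P*(x), t*(x) and T*.
        P_star = np.sqrt(V / dK) * Y
        R = np.hstack([np.ones((m - 1, 1)), -np.eye(m - 1)])
        t_star = np.empty(len(x_grid))
        for k in range(len(x_grid)):
            RP = R @ P_star[:, k]
            RVR = R @ np.diag(V[:, k]) @ R.T                 # R Var(P(x)) R'
            t_star[k] = RP @ np.linalg.solve(RVR, RP)

        # Trapezoidal approximation of the integral (7).
        integrand = t_star * omega
        return np.sum((integrand[1:] + integrand[:-1]) / 2.0 * np.diff(x_grid))

Repeating this many times and comparing the observed $T$ with the simulated draws yields critical values or a $p$-value.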
3. Simulation results

This section presents Monte Carlo simulation results. The data generating processes are taken to be of the form

$$Y_{ij} = \mu_j(X_{ij}) + \sigma_j^{1/2}(X_{ij})\,\epsilon_{ij}, \qquad i = 1, 2, \ldots, n_j,\ j = 1, 2. \qquad (10)$$
Experiments are carried out for two mean functions, the bump function $\mu_j(x) = x + 2\exp(-16x^2)$ and the sine function $\mu_j(x) = \sin(2\pi x)$. In the base experiments (labeled "Base" in the table), quantities other than $\mu_1$ and $\mu_2$ are the same for the two populations, namely $n_j = 500$, $X_{ij} \sim U[0,1]$, $\epsilon_{ij} \sim N(0,1)$, and $\sigma_j(x) = \sigma_{*j}^2$ where $\sigma_{*j}^2$ is determined such that $R^2 = \mathrm{Var}(\mu_j(X_{ij}))/\mathrm{Var}(Y_{ij}) = 0.80$. (Results for $R^2 = 0.40$ are available from the author.) Robustness is examined by varying the data generating process for the second population while leaving the first population unchanged: in the first variation (labeled "Heteroskedasticity") $\sigma_2(x) = \sigma_{*2}^2\,3^{1-x}$, in the second variation (labeled "Small sample") $n_2 = 300$, and in the third variation (labeled "Skewed regressor") $X_{i2} \sim \sqrt{5 - (U[0,1] - 2)^2} - 1$. The kernel is Epanechnikov. The bandwidths for estimating the mean function are chosen using the "refined" procedure developed by Fan and Gijbels (1995). The bandwidths for estimating the bias coefficients are chosen using Fan and Gijbels' "RSC" procedure. Finally, the weight function is $\omega(x) = 1_{[0.01, 0.99]}(x)$, there are 4000 replications per experiment, and $p$-values are simulated using $L = 100$ and 4000 draws of $Y_1$ and $Y_2$.
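For concreteness, the designs above could be generated as follows (a sketch with hypothetical names; the variance scale sigma2 is treated as given, with its calibration to the $R^2 = 0.80$ requirement omitted).

    import numpy as np

    def bump(x):
        return x + 2.0 * np.exp(-16.0 * x**2)

    def sine(x):
        return np.sin(2.0 * np.pi * x)

    def simulate_population(n, mu, sigma2, design="Base", rng=None):
        # Generate one sample (X, Y) from model (10) under the designs of this section.
        # sigma2 is the constant variance scale fixed by the R^2 requirement.
        rng = np.random.default_rng() if rng is None else rng
        U = rng.uniform(size=n)
        if design == "Skewed regressor":
            X = np.sqrt(5.0 - (U - 2.0)**2) - 1.0    # skewed transform of U[0, 1]
        else:
            X = U
        if design == "Heteroskedasticity":
            var = sigma2 * 3.0**(1.0 - X)
        else:
            var = sigma2 * np.ones(n)
        eps = rng.standard_normal(n)
        return X, mu(X) + np.sqrt(var) * eps

    # Example: base design for both populations with the bump mean function
    # (the "Small sample" variation simply uses n = 300 for the second population).
    # X1, Y1 = simulate_population(500, bump, sigma2=sigma2_star)
    # X2, Y2 = simulate_population(500, bump, sigma2=sigma2_star)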
Table 1
Monte Carlo rejection probabilities

                                 Bump function                      Sine function
                            New test         Scheike           New test         Scheike
                          0.100   0.050   0.100   0.050      0.100   0.050   0.100   0.050

$\mu_2(x) = \mu_1(x)$
  Base                    0.113   0.060   0.102   0.052      0.120   0.067   0.105   0.055
  Heteroskedasticity      0.118   0.063   0.101   0.048      0.120   0.064   0.098   0.054
  Small sample            0.110   0.058   0.103   0.050      0.115   0.059   0.100   0.049
  Skewed regressor        0.118   0.062   0.102   0.048      0.115   0.057   0.096   0.052

$\mu_2(x) = \mu_1(x) - 0.100$
  Base                    0.782   0.668   0.817   0.723      0.959   0.925   0.924   0.872
  Heteroskedasticity      0.819   0.719   0.819   0.724      0.974   0.950   0.926   0.873
  Small sample            0.660   0.534   0.689   0.566      0.895   0.831   0.856   0.766
  Skewed regressor        0.826   0.727   0.833   0.734      0.962   0.927   0.918   0.866

$\mu_2(x) = \mu_1(x) - 0.346(x - 1/2)$
  Base                    0.714   0.593   0.102   0.052      0.943   0.893   0.103   0.054
  Heteroskedasticity      0.760   0.631   0.099   0.046      0.964   0.927   0.097   0.055
  Small sample            0.597   0.460   0.103   0.051      0.853   0.768   0.100   0.048
  Skewed regressor        0.770   0.655   0.102   0.049      0.947   0.895   0.096   0.053

$\mu_2(x) = \mu_1(x + 0.015)$
  Base                    0.806   0.701   0.188   0.118      0.948   0.907   0.103   0.054
  Heteroskedasticity      0.822   0.726   0.187   0.114      0.967   0.932   0.099   0.055
  Small sample            0.686   0.573   0.187   0.111      0.882   0.801   0.098   0.049
  Skewed regressor        0.833   0.739   0.211   0.133      0.958   0.919   0.097   0.055
The simulated sizes for nominal sizes 10% and 5% are given in the top panel of Table 1 under the heading "New test". For comparison, simulated sizes are also presented for the test proposed by Scheike (2000). It can be seen that the new test has a slight tendency to overreject, but that the size distortions are small.

The remaining three panels in Table 1 show the power of the tests for alternatives of the form $\mu_2(x) = a_1 + a_2(x - 1/2) + \mu_1(x + a_3)$, where $a_1$ represents a vertical shift, $a_2$ a slope change, and $a_3$ a horizontal shift. The values of $a_1$, $a_2$ and $a_3$ are chosen such that $\int_0^1 (\mu_1(x) - \mu_2(x))^2\,dx = 0.01$. The new test has good power for all three alternatives. In contrast, Scheike's test has good power against a vertical shift, but virtually none against horizontal shifts and tilts. This reflects the fact that Scheike's test is designed to have power against uniformly ordered alternatives ($\mu_1 \le \mu_2$ or $\mu_2 \le \mu_1$), whereas the new test has power against all alternatives. It seems reasonable to conclude that the new test performs favorably.
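Schematically, each rejection probability in Table 1 is the fraction of replications in which the simulated $p$-value falls below the nominal level. A minimal sketch of the driver loop follows; simulate_data, compute_T and simulate_T_star are placeholders for the data generation, test-statistic and null-simulation steps sketched earlier, not names from the paper.

    import numpy as np

    def rejection_rate(n_rep, level, simulate_data, compute_T, simulate_T_star, n_draws=4000):
        # Monte Carlo rejection frequency at a given nominal level.
        rejections = 0
        for _ in range(n_rep):
            data = simulate_data()                                 # one pair of samples
            T_obs = compute_T(data)                                # observed statistic (3)
            T_null = np.array([simulate_T_star(data) for _ in range(n_draws)])
            p_value = np.mean(T_null >= T_obs)                     # simulated p-value
            rejections += int(p_value <= level)
        return rejections / n_rep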
Fig. 1. Scatter plot, 1967/1968.
4. Application

Recently De Fontenay et al. (2002) studied individual earnings data over the period 1967–1998 for US men. Central to this study were conditional distributions of changes in log-earnings from one year to the next given log-earnings in the first year. To illustrate the new test on a real data set, consider the hypothesis that the conditional mean of changes in log-earnings between 1967 and 1968 given log-earnings in 1967 is the same as the conditional mean of changes in log-earnings between 1997 and 1998 given log-earnings in 1997. There are 7961 observations for 1967/1968 and 7741 for 1997/1998. Fig. 1 shows a scatter plot of the first sample; the equivalent plot for the second sample is similar.

Fig. 2 shows the estimated conditional mean functions for the two years. There are two lines for each year; one is the raw estimate, the other the bias-corrected estimate. The functions are compared on the interval from 8.9 to 10.7, which corresponds to the 5th–95th percentile range for 1967 and roughly to the 10th–90th percentile range for 1997. Kernels, bandwidths, etc. are chosen as described in Section 3. Note that the estimated regression lines cross in Fig. 2. As demonstrated by the simulations presented in Section 3, this suggests a situation where Scheike's test is unsuitable and a more powerful test, such as the one proposed in this paper, is needed. As it turns out, the value of Scheike's test statistic is $-1.34$ with a $p$-value of 0.17, and the value of the new test is 2.07 with a $p$-value of about 0.28. The tests therefore agree in this case that there is no statistically significant difference between conditional yearly earnings growth during 1967/1968 and during 1997/1998.
Fig. 2. Means 1967/1968 (solid), 1997/1998 (dashed).
5. Concluding remarks

The framework presented in this paper lends itself to various generalizations. The extension to dependent data (within or across populations) is straightforward, building on the work by Xia (1998) and references therein. The extension to multivariate regressors and comparison of derivative functions is harder, because no uniform limit theory for multivariate and higher-order local polynomial estimators is currently available in the literature.

Appendix

This appendix sketches the proof of the central result (6). Since the proof is the same for $j = 1, \ldots, m$, the population subscript $j$ is omitted. Recall that $[x_L, x_U]$ denotes the support of $\omega$. Let $[x_L^*, x_U^*]$ denote a slightly larger interval, so that $x_L^* < x_L$ and $x_U < x_U^*$. The following assumptions are maintained throughout (and must hold for each of the $m$ populations).

Assumption 1.
1. $\{X_i, Y_i\}$ are independent and identically distributed.
2. The density function $f$ of $X_i$ satisfies $0 < c \le f(x) \le C < \infty$ for all $x \in [x_L^*, x_U^*]$ and some constants $c$ and $C$, and has a bounded continuous derivative $\partial f$.
3. The conditional mean function has a bounded $(p+a+1)$th order derivative $\partial^{p+a+1}\mu$ on $[x_L^*, x_U^*]$.
4. The conditional variance function is continuous on $[x_L^*, x_U^*]$.
5. All moments of $\epsilon_{ij}$ exist.
6. The kernel $K$ is a symmetric density function with a bounded continuous derivative $\partial K$ and compact support $[-1, 1]$.
7. The density function $g$ of $Y_i$ satisfies $b^{-3}(\log n)\int_{|y| > a_n} g(y)\,dy = O(1)$, where $\{a_n\}$ is a sequence of constants tending to infinity as $n \to \infty$.

Assumption 1.7 was used by Härdle (1989) to prove (16) below. The remaining assumptions are standard and need no discussion.

Define the bandwidth intervals $H_n = [c_1 n^{-\gamma_1}, c_2 n^{-\gamma_2}]$ where $c_1 > 0$, $c_2 > 0$, and $0 < \gamma_2 < \gamma_1 < 1$. To indicate uniform convergence, define the notation $A_n = \bar O(\delta_n)$ and $A_n = \bar o(\delta_n)$ to mean $\sup_{x \in [x_L, x_U],\, b \in H_n} |A_n| = O(\delta_n)$ almost surely and $\sup_{x \in [x_L, x_U],\, b \in H_n} |A_n| = o(\delta_n)$ almost surely, respectively. In addition, define $\tau_{0n} = n^{-1/2} b^{-1/2} \log n$ and $\tau_{kn} = b^k + n^{-1/2} b^{-1/2} \log n$ for $k > 0$, and put $c_j = \int_{-\infty}^{\infty} u^j K(u)\,du$. Finally, define the $n \times 1$ matrix $C_k = [(X_i - x)^k]_i$. It is well known in the literature (see e.g. Masry, 1996; Xia, 1998) that

$$\frac{1}{n}\sum_{i=1}^{n} \frac{1}{b} K\!\left(\frac{X_i - x}{b}\right)(X_i - x)^j = b^j c_j f(x) + b^{j+1} c_{j+1}\,\partial f(x) + \bar o(\tau_{1n} b^j) \qquad (11)$$

and

$$\frac{1}{n}\sum_{i=1}^{n} \frac{1}{b} K\!\left(\frac{X_i - x}{b}\right)(X_i - x)^j (Y_i - \mu(X_i)) = \bar O(\tau_{0n} b^j). \qquad (12)$$
These results can easily be proved, using for instance empirical process theory.

Define $M = E(Y|X)$. The first result is uniform consistency of the local polynomial estimator $\hat\beta$. A Taylor series expansion implies $M = X\beta + C_{p+1}(\theta_{p+1} + o(1))$ uniformly, where $\beta = (\theta_0, \ldots, \theta_p)'$ with $\theta_k = \mu^{(k)}(x)/k!$. This can be used with (11) and (12) to prove that $B(\hat\beta - \beta) = \bar O(\tau_{p+1,n}\,\iota)$, where $B = \mathrm{diag}(1, b, \ldots, b^p)$ and $\iota$ a vector of ones. With $p + a$ instead of $p$ this implies $\tilde b^{p+1}\tilde B_a(\hat\theta - \theta) = \tilde O(\tilde\tau_{p+a+1,n}\,\iota)$, where $\tilde\tau_{p+a+1,n} = \tilde b^{p+a+1} + n^{-1/2}\tilde b^{-1/2}\log n$, $\tilde B_a = \mathrm{diag}(1, \tilde b, \ldots, \tilde b^{a-1})$, and $\tilde O$ denotes the equivalent of $\bar O$ with $\tilde b$ replacing $b$. It follows that $\hat\theta - \theta = \tilde O(\tilde\tau_{p+a+1,n}\,\tilde b^{-p-1}\tilde B_a^{-1}\iota)$.

The second result is uniform consistency of the bias-corrected estimator $\tilde\beta$. A Taylor series expansion implies $M = X\beta + Z\theta + C_{p+a+1}O(1)$ uniformly. This can be used with (11) and (12) to prove that

$$B(\tilde\beta - \beta) = n^{-1} f(x)^{-1} S^{-1} B^{-1} X'W(Y - M) + \bar O(\tau_{1n}\tau_{0n}\,\iota) + \bar O(b^{p+a+1}\iota) + \bar O(b^{p+1}\iota_a' B_a)\,\tilde O(\tilde\tau_{p+a+1,n}\,\tilde b^{-p-1}\tilde B_a^{-1}\iota), \qquad (13)$$
where $S^{-1}$ is the inverse of the $(p+1) \times (p+1)$ matrix $S = [c_{i+j-2}]_{i,j}$.

The next task is to ensure that the remainder terms are $o(n^{-1/2}b^{-1/2})$ almost surely, uniformly over $x$, $b$, and $\tilde b$. To this end, for some positive constants $c_1$, $c_2$, $c_3$, $c_4$, $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$ with $\gamma_1 = 1/3$, $\gamma_1 > \gamma_2 > \gamma_3$ and $\gamma_4 = 1$, define

$$H_n = \{(b, \tilde b) : c_1 n^{-\gamma_1} < b < c_2 n^{-\gamma_2} < \tilde b < c_3 n^{-\gamma_3},\ b^5\tilde b^4 < c_4 n^{-\gamma_4}\}. \qquad (14)$$

It can be verified that if $p = 1$ and $a = 2$, then the remainder terms in (13) are $\bar{\bar o}(n^{-1/2}b^{-1/2})$, where $A_n = \bar{\bar o}(\delta_n)$ means $\sup_{x \in [x_L, x_U],\,(b,\tilde b) \in H_n} |A_n| = o(\delta_n)$ almost surely.
The definition of $H_n$ may not be immediately transparent, but note that if $b = cn^{-\gamma}$ and $\tilde b = \tilde c n^{-\tilde\gamma}$, then $\frac{1}{9} < \gamma < \frac{1}{3}$ and $0 < \tilde\gamma < \frac{1}{3}$ on $H_n$. Moreover, the condition involving $c_2$ and $\gamma_2$ implies that $b/\tilde b = O(n^{-\eta})$ for some $\eta > 0$, so $b_j/\tilde b_j \to 0$. The last condition, involving $c_4$ and $\gamma_4$, is a lower bound on the joint rate of convergence, ensuring that $b$ and $\tilde b$ do not converge too slowly at the same time. Note also that the optimal rates $b \propto n^{-1/5}$ and $\tilde b \propto n^{-1/7}$ are in $H_n$.

The first component of $B\tilde\beta - B\beta$ is $\tilde\mu(x) - \mu(x)$. For $p = 1$ and $a = 2$, (13) yields

$$\tilde\mu(x) - \mu(x) = \frac{1}{f(x)n}\sum_{i=1}^{n} \frac{1}{b} K\!\left(\frac{X_i - x}{b}\right)(Y_i - \mu(X_i)) + \bar{\bar o}(n^{-1/2}b^{-1/2}), \qquad (15)$$

where the sum on the right-hand side is the first component of $S^{-1}B^{-1}X'W(Y - M)$. Härdle (1989) proved that

$$\frac{1}{n}\sum_{i=1}^{n} \frac{1}{b} K\!\left(\frac{X_i - x}{b}\right)(Y_i - \mu(X_i)) = \left(\frac{\sigma(x)f(x)}{nb}\right)^{1/2} Y(x, b) + \bar o\!\left(n^{-1/2}b^{-1/2}\right). \qquad (16)$$

Uniform convergence of the (scaled) variance estimator $nbV$ to $\sigma(x)f(x)^{-1}d_K$ can be established using similar arguments. This together with (15) and (16) implies (6).

References

Baltagi, B.H., Hidalgo, J., Li, Q., 1996. A nonparametric test for poolability using panel data. J. Econometrics 75 (2), 345–367.
Cabus, P., 1998. A Kolmogorov–Smirnov test for checking the equality of two regression curves. C. R. Acad. Sci. Ser. I—Math. 327, 939–942.
De Fontenay, C., Gørgens, T., Liu, H., 2002. The role of mobility in offsetting inequality: a nonparametric exploration of the CPS. Rev. Income Wealth 48 (3), 347–370.
Delgado, M.A., 1993. Testing the equality of nonparametric regression curves. Statist. Probab. Lett. 17 (3), 199–204.
Fan, J., Gijbels, I., 1995. Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. J. Roy. Statist. Soc. Ser. B 57 (2), 371–394.
Hall, P., Hart, J.D., 1990. Bootstrap test for difference between means in nonparametric regression. J. Amer. Statist. Assoc. 85 (412), 1039–1049.
Härdle, W., 1989. Asymptotic maximal deviation of M-smoothers. J. Multivariate Anal. 29 (2), 163–179.
King, E., Hart, J.D., Wehrly, T.E., 1991. Testing the equality of two regression curves using linear smoothers. Statist. Probab. Lett. 12 (3), 239–247.
Kulasekera, K.B., 1995. Comparison of regression curves using quasi-residuals. J. Amer. Statist. Assoc. 90 (431), 1085–1093.
Masry, E., 1996. Multivariate local polynomial regression for time series: uniform strong consistency and rates. J. Time Ser. Anal. 17 (6), 571–599.
Munk, A., Dette, H., 1998. Nonparametric comparison of several regression functions: exact and asymptotic theory. Ann. Statist. 26, 2339–2368.
Ruppert, D., Wand, M.P., Holst, U., Hössjer, O., 1997. Local polynomial variance-function estimation. Technometrics 39 (3), 262–273.
Scheike, T.H., 2000. Comparison of non-parametric regression functions through their cumulatives. Statist. Probab. Lett. 46, 21–32.
Xia, Y., 1998. Bias-corrected confidence bands in nonparametric regression. J. Roy. Statist. Soc. Ser. B 60 (4), 797–811.
Yatchew, A.J., 1999. An elementary nonparametric differencing test of equality of regression functions. Econ. Lett. 62, 271–278.
Young, S.G., Bowman, A.W., 1995. Nonparametric analysis of covariance. Biometrics 51, 920–931.