STATISTICS & PROBABILITY LETTERS ELSEVIER
Statistics & Probability Letters 37 (1998) 287-293
Bandwidth selection for power optimality in a test of equality of regression curves¹
K.B. Kulasekera*, J. Wang
Department of Mathematical Sciences, Clemson University, Martin Hall, Clemson, SC 29634-1907, USA
Received 1 July 1996; received in revised form 1 May 1997
Abstract

We consider bandwidth selection in a test of equality of regression curves given by King et al. (1991). We propose two sub-sample methods that determine data-based bandwidths maximizing the power while keeping the asymptotic size of the test fixed at a given level. The optimality is proved and some simulation results are presented. © 1998 Elsevier Science B.V. All rights reserved

Keywords: Kernel estimator; Design variables; Nonparametric test
1. Introduction

In this article, we are interested in examining the effect of the bandwidth on the power of the test given by King et al. (1991) for the hypothesis

$$H_0: f = g \quad \text{vs} \quad H_1: f \neq g. \qquad (1)$$

We observe data in the form $\{(Y_{1i}, x_i),\ i = 1, \ldots, n\}$ and $\{(Y_{2i}, x_i),\ i = 1, \ldots, n\}$ with

$$Y_{1i} = f(x_i) + \varepsilon_i, \qquad Y_{2i} = g(x_i) + \eta_i, \qquad i = 1, \ldots, n. \qquad (2)$$
Here $\varepsilon_i$ and $\eta_i$, $i = 1, \ldots, n$, are independent random errors for the two groups. It is further assumed that within each group the errors are identically distributed with mean zero, but the distributions of the $\varepsilon$'s and $\eta$'s may be different. Also, let $\mathrm{Var}(\varepsilon_1) = \sigma^2$ and $\mathrm{Var}(\eta_1) = \tau^2$. In this testing problem, the two functions $f$ and $g$ are specified only up to some smoothness conditions. The domain of the covariate ($x$) measurements is taken as $[0, 1]$. The covariate values (design points) for the two samples are the same.

* Corresponding author.
1 This research was supported by Grant 1R15 GM51106-01 from NIH.
0167-7152/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved
PII S0167-7152(97)00129-6
King et al. (1991) proposed two tests for the above hypothesis, one assuming a normal error structure and the other without the normality assumption. Both tests are based on linear estimators of $f - g$, which involve user-defined bandwidths. The value of the bandwidth plays a major role in these tests, and a wrong choice can lead to negligible power. Bandwidth selection in a test of this type has been discussed by Kulasekera and Wang (1996), who propose criteria that depend on a parameter that is a function of the bandwidth $h$ and of $f - g$. To use these methods, one plugs in a preliminary estimate of $f - g$, which makes the procedures sensitive to the estimator of $f - g$. In this paper, we propose two methods that do not involve a parameter that is a function of $f - g$.

The first method divides the covariate domain into a few subdomains and tests the equality within these subdomains. We then maximize the proportion of rejections with respect to the bandwidth. Letting the number of subdomains grow with the sample size, we show that such a procedure does maximize the power in an asymptotic sense. The second approach is to take all possible subsamples of a given size $k < n$, where $k$ is chosen to satisfy some conditions, and test the equality based on all such subsamples. The asymptotic properties are the same in both cases, although the latter is more computationally involved.

One problem that surfaces here is that if one uses a data-based bandwidth, exact distributional results no longer hold, and therefore exact finite-sample critical points are difficult to obtain. However, for large samples we can show that the size is not much different, since the asymptotic distribution of the test statistic does not change. We observe that even with small samples, bandwidths selected in this way do not affect the size drastically.

The rest of the paper is laid out as follows. In Section 2 we give details of the proposed methods and their properties, and in Section 3 we present the results of a simulation study.
2. Selection rules

We first develop the notation. Let the covariates be from $[0, 1]$ and assume, without loss of generality, that the covariates are ordered. Define $D_i = Y_{1i} - Y_{2i}$, $i = 1, \ldots, n$. Let $M_n$ be a sequence such that $M = M_n \to \infty$ and $M_n/n \to 0$ as $n \to \infty$. Let $k = [n/M_n]$. The test statistic in King et al. (1991) (KHW for short) for testing (1) is based on a linear estimate (Eubank, 1988; Müller, 1987) of $f - g$. A linear estimator of $\mu(x) = f(x) - g(x)$ based on the $D$'s can be written as

$$\hat{\mu}(x) = \sum_{i=1}^{n} D_i w_i(x), \qquad (3)$$
where the weights $w_i(x)$ depend on the type of linear smoother being used. For example, in regular kernel smoothing, $w_i(x) = \int_{s_{i-1}}^{s_i} K((x - u)/h)/h \, du$, with $s_0 = 0$, $s_i = (x_i + x_{i+1})/2$ and $s_n = 1$. Here, $h$ is the bandwidth of the kernel estimator. The bandwidth is typically a function of the sample size. The test statistic is then defined as

$$T = \frac{D'W'WD/n}{D'G'GD}, \qquad (4)$$

where $D$ is the vector of $D_i = Y_{1i} - Y_{2i}$, $i = 1, \ldots, n$, $W$ is the matrix of weights $w_i(x_j)$ of the kernel estimator of $\mu$ at each design point $x_j$ with bandwidth $h$, and $G'G$ is the matrix of the quadratic form for the estimator $s^2$ of $\mathrm{Var}(D_i) = \gamma^2 = \sigma^2 + \tau^2$ (Hall et al., 1990). KHW use

$$s^2 = \frac{1}{2n} \sum_{i=1}^{n-1} (D_{i+1} - D_i)^2. \qquad (5)$$
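As an illustration of (3)-(5), the following Python sketch computes the kernel weights $w_i(x)$ and the statistic $T$. It is a minimal sketch under stated assumptions, not the authors' implementation: the Epanechnikov kernel is an assumed choice (the text only asks for a compactly supported symmetric density), and the function names `epanechnikov_cdf`, `kernel_weights`, `difference_variance` and `t_statistic` are hypothetical.

```python
def epanechnikov_cdf(u):
    """CDF of the (assumed) Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def kernel_weights(x, x0, h):
    """Weights w_i(x0): integral of K((x0 - u)/h)/h over [s_{i-1}, s_i]."""
    n = len(x)
    # Interval end points: s_0 = 0, s_i = (x_i + x_{i+1})/2, s_n = 1.
    s = [0.0] + [(x[i] + x[i + 1]) / 2.0 for i in range(n - 1)] + [1.0]
    # Substituting v = (x0 - u)/h turns the integral into a CDF difference.
    return [
        epanechnikov_cdf((x0 - s[i]) / h) - epanechnikov_cdf((x0 - s[i + 1]) / h)
        for i in range(n)
    ]


def difference_variance(d):
    """Estimator (5): s^2 = sum_{i=1}^{n-1} (D_{i+1} - D_i)^2 / (2n)."""
    n = len(d)
    return sum((d[i + 1] - d[i]) ** 2 for i in range(n - 1)) / (2.0 * n)


def t_statistic(x, d, h):
    """Statistic (4): average squared fitted value at the design points over s^2."""
    n = len(d)
    fitted_sq = 0.0
    for xj in x:
        w = kernel_weights(x, xj, h)
        mu_hat = sum(wi * di for wi, di in zip(w, d))  # estimator (3) at x_j
        fitted_sq += mu_hat ** 2
    return (fitted_sq / n) / difference_variance(d)
```

Because numerator and denominator are both quadratic forms in $D$, the value returned by `t_statistic` is unchanged when all differences are multiplied by a constant.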
The null hypothesis is rejected for large values of $T$, where critical points are found using the exact distribution of $T$ under a normal error structure. For unknown error distributions, asymptotic critical points are used. In the exact case, both the test statistic and its null distribution depend on $h$ for all sample sizes. In the asymptotic test, the test statistic depends on $h$. Thus, the power of the test depends on $h$. Our aim here is to choose $h$ so as to maximize the power of the test for a given size $\alpha$. We will need some of the following assumptions at various stages:

A1: The kernel function $K$ is a known probability density function symmetric about 0 with compact support $[-1, 1]$, $\kappa_1 = \int_{-1}^{1} u^2 K(u)\,du < \infty$ and $\kappa_2 = \int_{-1}^{1} K^2(u)\,du$.
A2: The functions $f$ and $g$ are twice continuously differentiable.
A3: The sequences of design points $\{x_i,\ i = 1, \ldots, n\}$ become dense in $[0, 1]$ as $n \to \infty$.
A4: The bandwidth sequence $h$ used in estimating $f - g$ satisfies $nh^2 \to \infty$.
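To make A1 and A4 concrete, the sketch below evaluates the kernel constants numerically and tracks $nh^2$ along a bandwidth sequence of the form $h = c\,n^{-\delta}$. The Epanechnikov kernel, the midpoint rule and the names `midpoint_integral`, `kernel_constants` and `n_h_squared` are illustrative assumptions and do not come from the paper. For this kernel the exact values are $\kappa_1 = 0.2$ and $\kappa_2 = 0.6$, and since $nh^2 = c^2 n^{1 - 2\delta}$, A4 holds for such a sequence exactly when $\delta < 1/2$.

```python
def midpoint_integral(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))


def epanechnikov(u):
    """Assumed example kernel: K(u) = 0.75 * (1 - u^2) on [-1, 1], 0 elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_constants(kernel):
    """Return (total mass, kappa_1, kappa_2) for a kernel supported on [-1, 1]."""
    mass = midpoint_integral(kernel, -1.0, 1.0)
    kappa1 = midpoint_integral(lambda u: u * u * kernel(u), -1.0, 1.0)
    kappa2 = midpoint_integral(lambda u: kernel(u) ** 2, -1.0, 1.0)
    return mass, kappa1, kappa2


def n_h_squared(n, c, delta):
    """Value of n * h^2 for the bandwidth h = c * n^(-delta)."""
    h = c * n ** (-delta)
    return n * h * h
```

With $\delta = 0.4$, `n_h_squared` grows like $n^{0.2}$, so the divergence required by A4 is slow.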
2.1. Procedure 1

The proposed procedure is the following: divide the covariate values into $M$ groups, the $j$th group containing the $k$ values $x_j, x_{j+M}, \ldots, x_{j+(k-1)M}$, $j = 1, \ldots, M$. We then use the differences $D_j, D_{j+M}, \ldots, D_{j+(k-1)M}$ to construct a test statistic similar to (4). In particular, let

$$T_j = \frac{D_j' W_j' W_j D_j / k}{D_j' G_j' G_j D_j}, \qquad j = 1, \ldots, M,$$

where, in this display, $D_j$ and $W_j$ denote the vector of differences and the weight matrix for the $j$th group, and $G_j'G_j$ is the quadratic form for the estimator of the variance of the differences based on the observations in the $j$th group. The bandwidth is kept the same for all groups. Now, define

$$A = \frac{1}{M} \sum_{j=1}^{M} I[T_j \ge c_j(\alpha)], \qquad (6)$$
where $c_j(\alpha)$ is the critical point for the size $\alpha$ test of (1) with the covariate values $x_j, x_{j+M}, \ldots, x_{j+(k-1)M}$ and the corresponding differences $D_j, D_{j+M}, \ldots, D_{j+(k-1)M}$. Let $H_n = [\theta n^{-\delta}, \beta n^{-\delta}]$, where $0 < \theta < \beta$ are two fixed numbers and $0 < \delta < 1$. The value of the bandwidth $\hat{h}_n = \hat{h} n^{-\delta}$, where $\hat{h}_k = \hat{h} k^{-\delta}$ maximizes $A$ in $H_k$, is taken as the optimal bandwidth for constructing the test statistic $T$. The following can be observed:

(1) $E(A) = \alpha$ and $\mathrm{Var}(A(h)) = \alpha(1 - \alpha)/M$ under $H_0$;
(2) $E(A) \to 1$ under $H_1$ and assumptions A1-A3.

Since the power is degenerate at 1 as the sample size increases when the two functions are sufficiently far apart, we examine the behavior of $A$ for local alternatives. KHW observe that with samples of size $n$, for alternatives of the type $f - g = \rho/(n^2 h_n)^{1/4}$ the test based on $T$ has non-trivial power. We examine the role of $A$ in selecting an $h_n$ under such alternatives. Let $\pi(h_n) = P[T > c(n, \alpha)]$, where $c(n, \alpha)$ is the critical point for a size $\alpha$ test based on a sample of size $n$. Also, let $P[T_j \ge c(k, \alpha)] = \pi_{1j}(h_k)$ and define

$$\pi_1(h_k) = \frac{1}{M} \sum_{j=1}^{M} \pi_{1j}(h_k).$$

Suppose $h_n^* = h^* n^{-\delta} \in H_n$ maximizes $\pi(h_n)$. Then we have

Theorem 2.1. Under assumptions A1-A3, $h^*/\hat{h} \to 1$ in probability as $n \to \infty$.

Proof. Let $h_{1,k}^* = h_1^* k^{-\delta} \in H_k$ maximize $\pi_1(h_k)$. Since

$$E(A) = \pi_1(h_k)$$

and

$$\mathrm{Var}(A) = \sum_{j=1}^{M} \pi_{1j}(h_k)\bigl(1 - \pi_{1j}(h_k)\bigr)/M^2 \le \frac{1}{4M},$$

we observe that $A - \pi_1(h_k) \to 0$ in probability as $n \to \infty$. This gives

$$\hat{h}/h_1^* \to 1 \text{ in probability as } n \to \infty. \qquad (7)$$

Now, let $\Delta = \int_0^1 (\mu(x))^2\,dx$. Then

$$\pi(h_n) = P[Z \ge z_\alpha - \Delta] + e_n(h_n), \qquad (8)$$

where $B = 2\int_{-2}^{2}\left(\int_{-1}^{1} K(z)K(z + y)\,dz\right)^2 dy$ and $e_n$ is the error of the normal approximation (King et al., 1991). Here $Z$ denotes a standard normal variate and $z_\alpha$ its upper $\alpha$ quantile. Now, consider a setup where $M$ independent replicates of the responses $D_i$, $i = 1, \ldots, n$, have been obtained at $x_i$, $i = 1, \ldots, n$. If we set $k = n$, then we can write

$$P[Z \ge z_\alpha - \Delta] = \frac{1}{M}\sum_{j=1}^{M} P[Z \ge z_\alpha - \Delta] = \frac{1}{M}\sum_{j=1}^{M}\bigl(P[T_j \ge c_j(\alpha)] + e_{nj}(h_n)\bigr), \qquad (9)$$

where $e_{nj}$ is the error of the normal approximation for the $j$th replicate. Combining (8) and (9), we can write

$$\pi(h_n^*) \le \pi_1(h_{1,n}^*) + \frac{1}{M}\sum_{j=1}^{M} e_{nj}(h^*) + e_n(h^*). \qquad (10)$$
Since $\pi$ is a continuous function of $h$ and the $e_n$'s tend to 0 as $n \to \infty$, we deduce that $h^*/h_1^* \to 1$ as $n \to \infty$, which together with (7) proves the theorem. This shows that the selection procedure is asymptotically optimal for local alternatives converging at a rate $1/(n^2 h_n)^{1/4}$. □

2.2. Procedure 2

In this approach, we let $k = n^{\nu}$, $0 < \nu < 1/2$, and take all possible subsamples of size $k$ from the $x_i$, $i = 1, \ldots, n$. Then, using the $D_i$'s associated with the selected $x$'s, we construct the test statistics $T_j$, $j = 1, \ldots, N$, where $N = \binom{n}{k}$. The empirical proportion of rejections is defined as

$$A^* = \frac{1}{N} \sum_{j=1}^{N} I[T_j \ge c_j(\alpha)], \qquad (11)$$

where $c_j(\alpha)$ is the exact critical point for the test using the sample of size $k$. Now, we maximize $A^*$ with respect to $h_k = h k^{-\delta}$, where $h_k \in H_k$. If the maximizing value is $\hat{h}_k = \hat{h} k^{-\delta}$, we use $\hat{h}_n = \hat{h} n^{-\delta}$ as the optimal bandwidth in a test with sample size $n$. Unlike the first procedure, the division of the original sample is not systematic and, therefore, non-uniform design densities can have a larger impact on the selection of the appropriate $h$.

Theorem 2.2. Under assumptions A1-A3, $h^*/\hat{h} \to 1$ in probability as $n \to \infty$.
Proof. Observe that $\mathrm{Var}(A^*) \to 0$ as $n \to \infty$, since $\binom{n-k}{k}/N \to 1$ under the conditions on $k$ (this ratio is the proportion of subsamples that are disjoint from a given subsample). Also, if the $x$'s become dense in $[0, 1]$ as $n \to \infty$, then the proportion of subsamples of size $k$ that become dense in $[0, 1]$ tends to 1 as $n \to \infty$. For example, if the $x$'s are uniformly distributed over $[0, 1]$, then for $\varepsilon > 0$ let $I_x = [x - \varepsilon/2, x + \varepsilon/2]$. The proportion of subsamples that have no point in $I_x$ is $\binom{n - n\varepsilon}{k}/N$, which tends to 0 as $n \to \infty$. This enables us to use a normal approximation similar to the one that leads to (9). The proof now follows along the same lines as the proof of Theorem 2.1.

Since the number of subsamples is very large even for moderate $n$ and $k$, we may use asymptotic critical points in each test using $T_j$ in (11). This reduces the computational burden significantly, although some accuracy is sacrificed. Here the asymptotic critical points are given by $z_\alpha \hat{\sigma}(T_j)$, where $z_\alpha$ is the upper $\alpha$ quantile of a standard normal variate and $\hat{\sigma}^2(T_j)$ is the estimated null variance of the test statistic $T_j$. If one estimates $\gamma^2$ by $s^2$, then $\hat{\sigma}^2(T_j) = (1/n^2 h)\,B s^2$, where $B = 2\int_{-2}^{2}\left(\int_{-1}^{1} K(z)K(z + y)\,dz\right)^2 dy$ (King et al., 1991).
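The enumeration behind (11) and the ratio $\binom{n-k}{k}/N$ used in the proof can be sketched in a few lines of Python. This is a hedged illustration only: `rejects` is a hypothetical user-supplied callable that returns the decision of the size $\alpha$ test on one subsample at a fixed bandwidth, and both function names are assumptions introduced here.

```python
from itertools import combinations
from math import comb


def rejection_proportion_all_subsamples(x, d, k, rejects):
    """A* in (11): share of all size-k subsamples on which `rejects` returns True.

    `rejects(xs, ds)` is a hypothetical callable wrapping the test decision
    for one subsample of design points xs and differences ds.
    """
    rejected = 0
    total = 0
    for idx in combinations(range(len(x)), k):
        total += 1
        if rejects([x[i] for i in idx], [d[i] for i in idx]):
            rejected += 1
    return rejected / total


def disjoint_fraction(n, k):
    """Share of size-k subsamples disjoint from a fixed one: C(n-k, k) / C(n, k)."""
    return comb(n - k, k) / comb(n, k)
```

Even for $n = 20$ and $k = 5$ the loop visits `comb(20, 5)` = 15504 subsamples, which illustrates why the full enumeration is only practical for small $n$ and $k$. Evaluating `disjoint_fraction` with $k$ of order $n^{0.4}$ for increasing $n$ gives values that move towards 1, in line with the condition $0 < \nu < 1/2$.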
3. Empirical results

In this section we present some simulation results for the proposed procedures. We studied several functions $f - g$ representing a variety of alternatives. We considered $h$ in an interval $H_k = [0.01, 0.25]$ and adjusted the bandwidth for a sample size $n$ using $\delta = 0.4$. In all simulations, the error structure was taken to be normal with mean 0 and variance 0.5. In the first part, the sample size was 100 for procedure 1, with $k = 10$. For procedure 2, we considered $n = 20$ and $k = 5$, since $N$ becomes very large for large $n$ and $k$ when all possible subsamples are taken. Tables 1 and 2 give the empirical power results based on 200 replications for procedures 1 and 2, respectively. In the second part of the simulations we examined the effect of $M$ and $k$ on the power of procedure 1 by taking $n = 120$ and changing the value of $M$. At the same time, we examined the power of a procedure similar to procedure 2 that takes $M$ subsamples (rather than all $N$ possible subsamples) of size $k$ from the original sample. The results of this study, with 1000 simulations for each combination, are given in Tables 3 and 4.

The use of a data-based bandwidth in a test can have an effect on the null distribution of the test statistic. The effect will diminish as the sample size gets larger. An inspection of Tables 1 and 2 shows that the size does not change very much from the intended level when the data-based bandwidth is used. The power seems to increase compared with the results in King et al. (1991) and Kulasekera and Wang (1996) for procedure 1 with a sample size of 100. For procedure 2, since the sample size is small, the power does not seem very high. Some calculations with $n = 50$ and $k = 49$ (although this value of $k$ does not satisfy the requirements for the asymptotic results to hold) show that the power in that case is much improved. The choice of $M$ does not seem to influence the power very much in the first procedure.
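One replicate of the procedure 1 selection rule, in a setting like $n = 100$ and $k = 10$, could be sketched as follows. Several ingredients are assumptions made for illustration and are not taken from the paper: the Epanechnikov kernel, an equally spaced design in the usage, a finite bandwidth grid in place of the interval $H_k$, and Monte Carlo critical values under normal errors in place of the exact critical points used by KHW. All function names are hypothetical.

```python
import random


def epanechnikov_cdf(u):
    """CDF of the assumed kernel K(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def weight_matrix(x, h):
    """Row j holds the weights w_i(x_j) of the kernel estimator at x_j."""
    n = len(x)
    s = [0.0] + [(x[i] + x[i + 1]) / 2.0 for i in range(n - 1)] + [1.0]
    return [
        [epanechnikov_cdf((xj - s[i]) / h) - epanechnikov_cdf((xj - s[i + 1]) / h)
         for i in range(n)]
        for xj in x
    ]


def t_statistic(w, d):
    """Statistic (4) for weights w and differences d (d must not be constant)."""
    n = len(d)
    fitted_sq = sum(sum(wi * di for wi, di in zip(row, d)) ** 2 for row in w)
    s2 = sum((d[i + 1] - d[i]) ** 2 for i in range(n - 1)) / (2.0 * n)
    return (fitted_sq / n) / s2


def null_critical_value(w, alpha, draws, rng):
    """Monte Carlo approximation (an assumption) to the upper-alpha point of T under H0."""
    n = len(w)
    sims = sorted(t_statistic(w, [rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(draws))
    return sims[min(draws - 1, int((1.0 - alpha) * draws))]


def rejection_proportion(x, d, m, h, alpha=0.05, draws=200, seed=0):
    """Statistic A in (6) for the m interleaved groups at a common bandwidth h."""
    rng = random.Random(seed)
    k = len(x) // m
    rejected = 0
    for j in range(m):
        idx = [j + r * m for r in range(k)]  # x_j, x_{j+M}, ..., x_{j+(k-1)M}
        w = weight_matrix([x[i] for i in idx], h)
        crit = null_critical_value(w, alpha, draws, rng)
        if t_statistic(w, [d[i] for i in idx]) >= crit:
            rejected += 1
    return rejected / m


def select_bandwidth(x, d, m, grid, delta=0.4, **kwargs):
    """Maximize A over a grid of h_k values, then rescale: h_n = h_k * (k/n)^delta."""
    n = len(x)
    k = n // m
    best_hk = max(grid, key=lambda h: rejection_proportion(x, d, m, h, **kwargs))
    return best_hk * (k / n) ** delta
```

The rescaling in `select_bandwidth` follows from $\hat{h}_k = \hat{h} k^{-\delta}$ and $\hat{h}_n = \hat{h} n^{-\delta}$. When several grid values give the same $A$, `max` returns the first of them, which is an arbitrary tie-breaking choice of this sketch.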
Table 1
Empirical power of size 0.05 tests with procedure 1; n = 100, k = 10

Function $f - g$       Power    Mean of $\hat{h}_n$    Std. dev. of $\hat{h}_n$
$0$                    0.048    0.19                   0.0375
$0.67\sin(9.42x)$      0.930    0.172                  0.0823
$0.5\cos^2(10x)$       0.798    0.183                  0.0359
$0.5\sin(4x)$          0.878    0.172                  0.0815
$0.5\cos(10x)$         0.604    0.176                  0.0749
$\sin(2\pi x)$         1.00     0.1236                 0.0785
$\cos(2\pi x)$         1.00     0.1363                 0.0602
$x^{1/3}$              1.00     0.176                  0.0852
$0.81 - 1.62x$         1.00     0.179                  0.086
$1$                    1.00     0.192                  0.0227

Table 2
Empirical power of size 0.05 tests with procedure 2; n = 20, k = 5

Function $f - g$       Power    Mean of $\hat{h}_n$    Std. dev. of $\hat{h}_n$
$0$                    0.05     0.0386                 0.0152
$0.67\sin(9.42x)$      0.19     0.0388                 0.0083
$0.5\cos^2(10x)$       0.12     0.0375                 0.0089
$0.5\sin^2(10x)$       0.11     0.0381                 0.0084
$0.5\sin(4x)$          0.2      0.0375                 0.0081
$0.5\cos(10x)$         0.14     0.0387                 0.0074
$\sin(2\pi x)$         0.45     0.0391                 0.0085
$\cos(2\pi x)$         0.49     0.0396                 0.0108
$x^{1/3}$              0.50     0.0384                 0.0152
$0.81 - 1.62x$         0.27     0.0369                 0.0087
$1$                    0.76     0.03842                0.0227

Table 3
Empirical power of size 0.05 tests: n = 120, kM = n

Function $f - g$       Procedure    k = 6, M = 20    k = 10, M = 12    k = 12, M = 10
$0$                    1            0.082            0.080             0.102
                       2            0.048            0.068             0.980
$0.5\cos^2(10x)$       1            0.770            0.774             0.784
                       2            0.584            0.544             0.540
$0.67\sin(9.42x)$      1            0.900            0.888             0.906
                       2            0.946            0.906             0.916
$0.81 - 1.62x$         1            0.998            0.998             0.992
                       2            0.950            0.926             0.880

Table 4
Empirical power of size 0.05 tests: n = 120, kM = n

Function $f - g$       Procedure    k = 15, M = 8    k = 20, M = 6    k = 30, M = 4
$0$                    1            0.094            0.086            0.082
                       2            0.080            0.062            0.050
$0.5\cos^2(10x)$       1            0.768            0.796            0.794
                       2            0.524            0.406            0.334
$0.67\sin(9.42x)$      1            0.900            0.904            0.892
                       2            0.914            0.840            0.688
$0.81 - 1.62x$         1            0.986            0.998            1.000
                       2            0.910            0.830            0.686

The second procedure with the number of subsamples equal to $M$ instead of $N$ seems to be comparable to the first procedure in most cases when $M$ was about the same as $k$, but the performance dropped when $M$ became smaller. This is reasonable behavior because when the number of subsamples becomes small (with $k$ getting large, as in our simulations), it is likely that many observations repeat. In practice, for procedure 2 it would be best to decide on $k$ and choose $M$ to be large. This would reduce the computational time while retaining the asymptotic properties. One way to do this would be to consider only those subsamples of size $k$ which have "little" overlap. The amount of overlap could be decided by the user (at most one $x$ value in common, etc.). If the sample size is sufficiently large, one may use procedure 1 without much reservation.
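The suggestion of using only subsamples with little overlap could be implemented, for instance, by simple rejection sampling. The sketch below is one possible reading of that suggestion rather than a procedure described in the paper. The function name `low_overlap_subsamples` and the parameters `max_common` and `max_tries` are hypothetical.

```python
import random


def low_overlap_subsamples(n, k, m, max_common=1, seed=0, max_tries=100000):
    """Draw up to m index subsamples of size k from range(n).

    A candidate is kept only if it shares at most `max_common` indices with
    every subsample already accepted. Fewer than m subsamples are returned
    if the limit `max_tries` is reached first.
    """
    rng = random.Random(seed)
    chosen = []
    tries = 0
    while len(chosen) < m and tries < max_tries:
        tries += 1
        candidate = set(rng.sample(range(n), k))
        if all(len(candidate & other) <= max_common for other in chosen):
            chosen.append(candidate)
    # Sort each subsample so the design points stay ordered, as in Section 2.
    return [sorted(sub) for sub in chosen]
```

Each returned list of indices can then be used to pick the corresponding $x_i$ and $D_i$ and to compute one $T_j$, so that the proportion of rejections is taken over the $M$ accepted subsamples instead of all $N$.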
References

Eubank, R.L., 1988. Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York.
Fan, J., 1992. Design adaptive nonparametric regression. J. Am. Statist. Assoc. 87, 998-1004.
Hall, P., Kay, J.W., Titterington, D.M., 1990. Asymptotically optimal difference based estimation of variance in nonparametric regression. Biometrika 77, 521-528.
King, E.C., Hart, J.D., Wehrly, T.E., 1991. Testing the equality of two regression curves using linear smoothers. Statist. Probab. Lett. 12, 239-247.
Kulasekera, K.B., Wang, J., 1996. Smoothing parameter selection for power optimality in testing of regression curves. J. Am. Statist. Assoc., accepted.
Müller, H.-G., 1987. Nonparametric Regression Analysis of Longitudinal Data. Springer, Berlin.