Wavelet designs for estimating nonparametric curves with heteroscedastic error

Journal of Statistical Planning and Inference 117 (2003) 323 – 343

www.elsevier.com/locate/jspi

Alwell J. Oyet∗
Department of Mathematics and Statistics, Memorial University of Newfoundland, St. John's, Newfoundland, Canada A1C 5S7

Received 21 January 2000; received in revised form 26 February 2002; accepted 28 May 2002

Abstract

In this paper, we discuss the problem of constructing designs in order to maximize the accuracy of nonparametric curve estimation in the possible presence of heteroscedastic errors. Our approach is to exploit the flexibility of wavelet approximations to approximate the unknown response curve by its wavelet expansion, thereby eliminating the mathematical difficulty associated with the unknown structure. It is expected that only finitely many parameters in the resulting wavelet response can be estimated by weighted least squares. The bias arising from this compounds the natural variation of the estimates. Robust minimax designs and weights are then constructed to minimize mean-squared-error-based loss functions of the estimates. We find the periodic and symmetric properties of the Euclidean norm of the multiwavelet system useful in eliminating some of the mathematical difficulties involved. These properties lead us to restrict the search for robust minimax designs to a specific class of symmetric designs. We also construct minimum variance unbiased designs and weights which minimize the loss functions subject to a side condition of unbiasedness. We discuss an example from the nonparametric literature. © 2002 Elsevier B.V. All rights reserved.

MSC: primary 62K05; 62G35; secondary 62G07; 41A30

Keywords: Daubechies wavelet; Minimax robust; Minimum variance unbiased; Multiwavelet system; Relative efficiency; Weighted least squares

 This research is supported by the Memorial University of Newfoundland and the Natural Sciences and Engineering Research Council of Canada. ∗ Tel.: +01-709-737-8075; fax: +01-709-737-3010. E-mail address: [email protected] (A.J. Oyet).

0378-3758/03/$ - see front matter © 2002 Elsevier B.V. All rights reserved. PII: S0378-3758(02)00370-1

1. Introduction

The design problem for regression models with a prespecified parametric form for the mean response curve has been discussed in great detail by several authors. On the other hand, studies on constructing designs for nonparametric curve estimation have been limited by difficulties associated with the unknown structure of the response function. Chan (1991) addressed this problem by employing first-order differences to construct designs for estimation of the error variance in nonparametric regression. Mitchell et al. (1994) adopted a Bayesian approach where the unknown response curve was represented by a random function. They found that designs based on asymptotics were easy to compute and were quite efficient over a wide range of parameters of the prior process. Discussions of traditional Bayesian design theory, where the prior for the response function is a random finite linear combination of known functions, can be found in Chaloner (1984). The basic idea behind these approaches is to construct nonparametric designs based on some structure imposed on the unknown response.

Our approach is to transform the classical nonparametric design problem into a robust minimax design problem with contamination, in the wavelet domain, through wavelet expansion of the unknown response θ(x). Due to the adaptive nature of wavelets, the exact structure of the response need not be known. The robust minimax design problem arises naturally from the fact that only a finite number of terms in the wavelet expansion can be estimated in actual computations. Thus, the remainder terms, say f(x), represent uncertainty about the exact nature of the unknown response function. We discuss the general theory in Section 2.

Classical design theory has focused on the attainment of some form of a minimum variance property, assuming the fitted model to be exactly correct.
This, in particular, was the approach adopted by Herzberg and Traves (1994) in constructing designs for the Haar wavelet regression model. Box and Draper (1959) drew attention to the fact that these classical designs will no longer be 'optimal' if the model is misspecified. Since then, various authors, including Huber (1975), Pesotchinsky (1982) and Wiens (1992), have constructed designs which are robust against various forms of model misspecification.

The design problem of interest arises in the following way: an experimenter plans to observe Y(x) at n not necessarily distinct values of the design variable x, chosen from a design space S, in order to maximize the accuracy of estimating the unknown regression function θ(x) based on wavelet representations. Since S contains more than n points, the problem is that of finding the 'best' design points at which Y(x) will be observed. In our study, we take S = [0, 1], and 'best' is defined with respect to loss functions which are monotonic scalar-valued functions of the mean squared error (MSE) matrix of the weighted least squares estimates of the parameters in the wavelet approximation. These measures are defined in Section 2.

A problem that arises from using weighted least squares is that of constructing the weights required for estimation. We find in Section 2 that for the N = 2 multiwavelet approximation (N being the number of scaling functions generating the multiwavelet system), the squared norm ‖q(x)‖² (see expression (2)) is symmetric about x = 1/2 and periodic with period 2^{−(m+1)}, where m is the finite order of approximation. These symmetric

Fig. 1. Daubechies primary wavelets: (a) ψ₃(x), (b) ψ₅(x), (c) ψ₈(x).

and periodic properties are used in Section 3 to construct minimax robust weights and designs for the N = 2 multiwavelet approximation. The complicated eigenvalue problem highlighted in Section 3 is eliminated by imposing the condition of unbiasedness. This leads to the construction of minimum variance unbiased (mvu) weights and designs in Section 4. In each section we provide plots to show the structure of the "optimal" designs and weights. An example is discussed in Section 4.1. We use relative efficiency calculations to show that the mvu designs are more efficient than the uniform design. An outline of the proofs of the theorems can be found in Appendix A.

Our study involves the multiwavelet system constructed by Alpert (1992) and the Daubechies (1992) wavelets. The scaling functions and the primary wavelets of the Daubechies wavelet systems, commonly represented as φ_L(x) and ψ_L(x) respectively, have no closed forms. They are constructed numerically for different values of the wavelet number L; see Fig. 1. The multiwavelet system is generated by several scaling functions φ₀, ..., φ_{N−1}. For N = 3 the scaling functions are

φ₀(x) = I_{[0,1)}(x),   φ₁(x) = 2√3 (x − 1/2) I_{[0,1)}(x),

φ₂(x) = 6√5 ((x − 1/2)² − 1/12) I_{[0,1)}(x).

The primary wavelets (see Fig. 2) for N = 2 and 3 are, respectively,

₂ψ₀(x) = √3 (4|x − 1/2| − 1) I_{[0,1)}(x),

₂ψ₁(x) = 2(1 − 3|x − 1/2|) (I_{[0,1/2)}(x) − I_{[1/2,1)}(x))

and

₃ψ₀(x) = ₂ψ₁(x),

₃ψ₁(x) = −√3 (30|x − 1/2|² − 16|x − 1/2| + 3/2) I_{[0,1)}(x),

₃ψ₂(x) = −√5 (24|x − 1/2|² − 12|x − 1/2| + 1) (I_{[0,1/2)}(x) − I_{[1/2,1)}(x)).
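These closed forms are straightforward to evaluate. The sketch below (our illustration, not part of the paper; the function names are ours) tabulates the N = 2 generators and confirms numerically that (φ₀, φ₁, ₂ψ₀, ₂ψ₁) is an orthonormal system on [0, 1).

```python
import numpy as np

# N = 2 multiwavelet generators on [0, 1), using the closed forms above.
def phi0(x): return np.where((0 <= x) & (x < 1), 1.0, 0.0)
def phi1(x): return np.where((0 <= x) & (x < 1), 2*np.sqrt(3)*(x - 0.5), 0.0)

def psi0(x):   # 2ψ0: the symmetric primary wavelet
    return np.where((0 <= x) & (x < 1), np.sqrt(3)*(4*np.abs(x - 0.5) - 1), 0.0)

def psi1(x):   # 2ψ1: the antisymmetric primary wavelet
    sgn = np.where(x < 0.5, 1.0, -1.0)
    return np.where((0 <= x) & (x < 1), 2*(1 - 3*np.abs(x - 0.5))*sgn, 0.0)

# Gram matrix of (φ0, φ1, 2ψ0, 2ψ1): should be the 4x4 identity.
x = np.linspace(0, 1, 100001)[:-1]
basis = np.vstack([phi0(x), phi1(x), psi0(x), psi1(x)])
gram = basis @ basis.T / x.size     # Riemann approximation of ∫ f g dx
print(np.round(gram, 3))            # ≈ identity matrix
```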

Statisticians with interest in other types of wavelets and in details relating to statistical applications may refer to Wegman et al. (1996), Antoniadis et al. (1994), Donoho and Johnstone (1994), Abramovich and Silverman (1998) and Härdle et al. (1998).

Fig. 2. Primary multiwavelets: (a) ₂ψ₀(x), (b) ₂ψ₁(x) = ₃ψ₀(x), (c) ₃ψ₁(x), (d) ₃ψ₂(x).

2. Mathematical development

Taking the multiwavelet system with N ≥ 1 into consideration, a generalized version of the wavelet representation of the unknown response function E(Y | x) = θ(x) is given by

θ(x) = Σ_{l=0}^{N−1} { d_l φ_l(x) + Σ_{j=0}^{m_l} Σ_{k=0}^{2^j−1} c_{ljk} ₙψ_{l;−j,k}(x) } + f(x).  (1)

Representation (1) transforms the nonparametric regression model into the approximately linear model

Y(x) = qᵀ(x)V₀ + f(x) + ε(x)  (2)

with contamination f. Only the first term in (2) can, however, be estimated. When N = 1, define c_{ljk} := c_{jk}, φ₀(x) := φ(x), ₙψ_l := ψ(x) and m₀ := m. Expression (2) is obtained by defining the vectors

V₀ = (d₀, c₀₀, c₁₀, c₁₁, ..., c_{m,2^m−1})ᵀ,
q(x) = (φ(x), ψ_{0,0}(x), ψ_{−1,0}(x), ψ_{−1,1}(x), ..., ψ_{−m,2^m−1}(x))ᵀ.

For N > 1, representation (2) can be obtained by appropriate definition of the vectors V₀ and q. A second wavelet representation of θ(x) drops the scaling functions from (1). In (2) we assume that E(ε(x)) = 0 and Var(ε(x)) = σ²g(x), where g ∈ G = {g | 0 ≤ ∫₀¹ g²(x) dx ≤ 1 = ∫₀¹ dx}. We also assume that f is an unknown member of the class

F = { f | ∫₀¹ f²(x) dx ≤ η², ∫₀¹ q(x)f(x) dx = 0 },  (3)

where η is some small but known constant. The second condition in F is implied by (1), which in turn implies identifiability of model (2), where f(·) represents the

remainder terms in the wavelet representation. The first condition is imposed to ensure that errors due to bias do not swamp errors due to random variation. Given n observations {x_i, y_i}_{i=1}^n, an experimenter plans to estimate V₀ by weighted least squares. Denote by ξ the empirical distribution function of {x_i}_{i=1}^n and define

B = B(w, ξ) = ∫₀¹ q(x)qᵀ(x) w(x) dξ(x),

b = b(f, w, ξ) = ∫₀¹ q(x) f(x) w(x) dξ(x),

D = D(g, w, ξ) = ∫₀¹ q(x)qᵀ(x) w²(x) g(x) dξ(x),

H = B D⁻¹ B.

Then Ŷ(x) = qᵀ(x)V̂₀ is a biased estimate of E(Y | x), where V̂₀ = B⁻¹ ∫₀¹ q(x)w(x)y(x) dξ(x) is the weighted least-squares estimate of V₀, with bias B⁻¹b and covariance matrix (σ²/n)H⁻¹. The mean squared error (MSE) matrix M(f, g, w, ξ) of V̂₀ can be written in the form

M(f, g, w, ξ) = B⁻¹bbᵀB⁻¹ + (σ²/n)H⁻¹.

We discuss solutions to the minimax design problem min_{w,ξ} max_{f,g} L(f, g, w, ξ), for some scalar-valued design criterion L(f, g, w, ξ) depending on f, g, w and ξ through M(f, g, w, ξ). There are two special cases of this general minimax problem that can be discussed. These are:

(a) min_ξ max_f L(f, 1, 1, ξ), corresponding to ordinary least squares with homoscedastic errors;
(b) min_{w,ξ} max_f L(f, 1, w, ξ), corresponding to weighted least squares with homoscedastic errors.

Oyet and Wiens (2000) have studied (a) with some restriction on ξ. Here, we extend that discussion to (b) and the general case. We observe that our results depend on η and σ² only through the quantity ν = σ²/(nη²). This quantity may be interpreted as representing the relative importance to the experimenter of variance versus bias: ν = 0 corresponds to a 'pure bias' problem, ν = ∞ to a 'pure variance' problem. The criteria we shall consider are

1. Integrated mean squared error (IMSE):

L_Q(f, g, w, ξ) = ∫₀¹ E[(Ŷ(x) − E(Y | x))²] dx = tr M(f, g, w, ξ) + ∫₀¹ f²(x) dx
= bᵀB⁻²b + (σ²/n) tr H⁻¹ + ∫₀¹ f²(x) dx.

2. Trace: L_A(f, g, w, ξ) = tr M(f, g, w, ξ) = bᵀB⁻²b + (σ²/n) tr H⁻¹.

3. Determinant: L_D(f, g, w, ξ) = |M(f, g, w, ξ)| = (σ²/n)^{N*} {[1 + (n/σ²) bᵀD⁻¹b] / |H|}, where

N* = Σ_{l=0}^{N−1} 2^{m_l + 1}.
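For a design ξ supported on finitely many points x_i with masses p_i, the matrices B, D and H are finite sums, and the variance parts of these criteria can be evaluated directly. A minimal sketch (our own illustration; the Haar regressors, the equal-mass design and g(x) = x are convenient assumptions, not choices made in the paper):

```python
import numpy as np

def haar_psi(x):
    return np.where((0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1), -1.0, 0.0))

def q_haar(x, m=1):
    """Regressor vector q(x) for the Haar (N = 1) approximation of order m."""
    rows = [np.ones_like(x)]                                 # scaling function φ
    for j in range(m + 1):
        for k in range(2**j):
            rows.append(2**(j/2) * haar_psi(2**j * x - k))   # ψ_{-j,k}
    return np.vstack(rows)

# Design: equal mass in each of the 2^{m+1} dyadic intervals (at midpoints),
# unit weights, and a heteroscedastic variance function g(x) = x.
xs = (np.arange(8) + 0.5)/8
p = np.full(8, 1/8)
w = np.ones(8)
g = xs

Q = q_haar(xs)                                  # 4 x 8 matrix of regressors
B = (Q * (w*p)) @ Q.T                           # B = Σ p_i w_i q(x_i) q(x_i)^T
D = (Q * (w**2 * g * p)) @ Q.T                  # D = Σ p_i w_i² g_i q q^T
H = B @ np.linalg.inv(D) @ B                    # H = B D^{-1} B

# This design gives B = I (see Section 3), so tr H^{-1} = tr D = 2 here.
print(np.round(B, 10))
print(np.trace(np.linalg.inv(H)))
```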

We shall assume, to avoid trivialities, that if there is a point x₀ ∈ S with q(x₀) = 0, then ξ({x₀}) = 0. Otherwise, since such a point x₀ would contribute nothing to b or B, we could remove it from S. Some modifications to the proof of Wiens (1992) show that a necessary condition for sup_{F,G} L(f, g, w, ξ) to be finite is the absolute continuity of ξ. We implement such a design by sampling design points from the distribution function of the optimal design density; see the example of Section 4.1. Denote the density ξ′(x) by p(x) and the largest characteristic root of a matrix by ch₁. Define h(x) = w(x)p(x) and l(x) = ‖B⁻¹q(x)‖². Then, in terms of C = ∫₀¹ q(x)qᵀ(x)h²(x) dx, G₁ = C − B² and G₂ = C − D, we find that

max_{f∈F, g∈G} L_A(f, g, w, ξ) = η² { ν [∫₀¹ (l(x)w(x)h(x))² dx]^{1/2} + ch₁(G₁B⁻²) },  (4)

max_{f∈F, g∈G} L_Q(f, g, w, ξ) = max_{f∈F, g∈G} L_A(f, g, w, ξ) + η²,  (5)

max_{f∈F, g∈G} L_D(f, g, w, ξ) = η² (σ²/n)^{N*−1} { [ν + ch₁(G₂D⁻¹)] / |H| }.  (6)

Note from (4) and (5) that Q- and A-minimax robust designs are necessarily identical. This is a consequence of the orthogonality property, A = ∫₀¹ q(x)qᵀ(x) dx = I, of orthogonal wavelet systems; it does not hold for nonorthogonal bases. We will make frequent use of this property.

2.1. Euclidean norm: multiwavelet system and Daubechies wavelet

The mvu designs for heteroscedastic errors constructed in this paper involve ‖q(x)‖². For the N = 2 multiwavelet approximation, there are two representations for the closed form of ‖q(x)‖². The first can be derived from the generalization provided by Oyet and Wiens (2000) for any value of N. The lemmas that follow provide a second representation which highlights some important properties of ‖q(x)‖². In order to emphasize the dependence of q(x) on m_l, l = 0, 1, ..., N − 1, we write q(x) as q(x; m₀, ..., m_{N−1}). It is not too difficult to show that for the Haar wavelet system (N = 1), ‖q(x; m)‖² = 2^{m+1}, x ∈ [0, 1). The mathematical results in this section are specifically for the multiwavelet system because the Daubechies wavelet can only be constructed numerically. However, we provide plots of ‖q(x; 1)‖² in Fig. 3 for approximations based on ψ₃(x), ψ₅(x) and ψ₈(x). We observe that the plots become smoother as the wavelet number increases.

Fig. 3. A plot of ‖q(x)‖² (m = 1) for the Daubechies wavelet approximation based on: (a) ψ₃(x), (b) ψ₅(x), (c) ψ₈(x).

Lemma 2.1. Let the components of the vector q(x; m₀, m₁) be the N = 2 multiwavelet system with m₀ = m₁ = m. Then,

‖q(x; m)‖² = Σ_{k=0}^{2^{m+1}−1} l_m(x, k) = 2^{m+3} min_{0≤k≤2^{m+1}−1} [1 − 3h_m(x, k)],  (7)

where

l_m(x, k) = 2^{m+3} [12(2^m x)² − 6(2^m x)(1 + 2k) + (1 + 3k(k + 1))] I_{[2^{−(m+1)}k, 2^{−(m+1)}(k+1))}(x),
h_m(x, k) = (2^{m+1}x − k)(k + 1 − 2^{m+1}x).

Lemma 2.2. The squared norm ‖q(x; m)‖² satisfies the inequality

2^{m+1} ≤ ‖q(x; m)‖² ≤ 2^{m+3} for all x ∈ [0, 1).  (8)

The function l_m(x, k) satisfies the identities

l_m(x + 2^{−(m+1)}, k + 1) = l_m(x, k), k = 0, 1, 2, ..., 2^{m+1} − 1  (9)

and

l_m(1 − x, r) = l_m(x, 2^{m+1} − (r + 1)), r = 0, 1, 2, ..., 2^m − 1.  (10)

One useful property we can deduce from (9) is the relation

‖q(x; m)‖² = ‖q(x + k·2^{−(m+1)}; m)‖² for k = 0, 1, 2, ..., 2^{m+1} − 1 and x ∈ [0, 2^{−(m+1)}).

This implies that the value of ‖q(x; m)‖² is completely determined by its value in any one of the intervals [2^{−(m+1)}k, 2^{−(m+1)}(k + 1)), k = 0, 1, 2, ..., 2^{m+1} − 1. In other

words, ‖q(x; m)‖² is a periodic function of x with period 2^{−(m+1)} on the interval [0, 1]. It can be verified from Lemmas 2.1 and 2.2 that ‖q(x; m)‖² is a paraboloid on each of the intervals, with a minimum value of 2^{m+1} attained at the midpoint 2^{−(m+2)}(1 + 2k) and a maximum value of 2^{m+3} attained at the endpoints. The second identity (10) shows that ‖q(x; m)‖² is symmetric about x = 1/2. Thus, designs based on the N = 2 multiwavelet approximation which are proportional to ‖q(x; m)‖² inherit these properties of periodicity and symmetry. In constructing the designs in Section 3, these properties of the N = 2 multiwavelet system motivated the restriction of the density h(x) to the class

H_S = {h(·) | h(x) = h(1/2 − x) = h(1/2 + x) = h(1 − x), 0 ≤ x ≤ 1/2}.  (11)
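Identity (7) and the bounds (8) are easy to confirm numerically. The check below (ours, not part of the paper; the m = 0, N = 2 system is written out explicitly) computes ‖q(x; m)‖² both directly and through the closed form:

```python
import numpy as np

sqrt3 = np.sqrt(3)

def q_multi(x):
    """q(x) for the N = 2 multiwavelet system with m0 = m1 = m = 0."""
    u = np.abs(x - 0.5)
    sgn = np.where(x < 0.5, 1.0, -1.0)
    return np.vstack([np.ones_like(x),        # φ0
                      2*sqrt3*(x - 0.5),      # φ1
                      sqrt3*(4*u - 1),        # 2ψ0
                      2*(1 - 3*u)*sgn])       # 2ψ1

m = 0
x = np.linspace(0, 1, 4001)[:-1]
direct = (q_multi(x)**2).sum(axis=0)          # ‖q(x; m)‖² componentwise

# Right-hand side of (7): 2^{m+3} min_k [1 - 3 h_m(x, k)]
k = np.arange(2**(m+1))[:, None]
h = (2**(m+1)*x - k)*(k + 1 - 2**(m+1)*x)
closed = 2**(m+3)*(1 - 3*h).min(axis=0)

assert np.allclose(direct, closed)            # identity (7)
print(direct.min(), direct.max())             # bounds (8): 2.0 and 8.0
```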

We believe that similar properties also hold when N > 2. At least, numerical results based on the generalization of Oyet and Wiens (2000) appear to indicate this to be true. They have shown that for an arbitrary number of scaling functions N with m_l = m for all l, the Euclidean norm is given by

‖q(x; m)‖² = (2^{m−1}/√(1 − 1/4N²)) (φ_{N−1}({2^{m+1}x}) φ′_N({2^{m+1}x}) − φ′_{N−1}({2^{m+1}x}) φ_N({2^{m+1}x})),

where {x} is the fractional part of x. The asymptotic value satisfies

lim_{N→∞} ‖q(x; m)‖²/N = (2^{m+1}/π) {2^{m+1}x}^{−1/2} (1 − {2^{m+1}x})^{−1/2}.
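The displayed closed form can be checked against the identity ‖q(x; m)‖² = 2^{m+1} Σ_{l=0}^{N−1} φ_l²({2^{m+1}x}) for these orthonormal systems. A numerical sketch (ours; the degree-3 orthonormal shifted Legendre polynomial φ₃, which extends the scaling sequence, is supplied by us):

```python
import numpy as np

# Orthonormal scaling polynomials on [0, 1] (unit L² norm), as in Section 1.
def phi(l, u):
    if l == 0: return np.ones_like(u)
    if l == 1: return 2*np.sqrt(3)*(u - 0.5)
    if l == 2: return 6*np.sqrt(5)*((u - 0.5)**2 - 1/12)
    if l == 3: return 20*np.sqrt(7)*((u - 0.5)**3 - 3*(u - 0.5)/20)
    raise NotImplementedError

def dphi(l, u, h=1e-6):            # numerical derivative of φ_l
    return (phi(l, u + h) - phi(l, u - h))/(2*h)

N, m = 3, 1
x = np.linspace(0, 1, 20001)[:-1]
u = np.mod(2**(m+1)*x, 1.0)        # fractional part {2^{m+1} x}

direct = 2**(m+1)*sum(phi(l, u)**2 for l in range(N))
formula = (2**(m-1)/np.sqrt(1 - 1/(4*N**2))
           * (phi(N-1, u)*dphi(N, u) - dphi(N-1, u)*phi(N, u)))
print(np.max(np.abs(direct - formula)))   # small: finite-difference error only
```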

Although it is difficult to infer the properties of symmetry and periodicity from this generalization, it provides "added flexibility" for computing the values of ‖q(x)‖² for any N.

Lemma 2.3. Let the components of the vector q(x; m₀, m₁) be the N = 2 multiwavelets with m₀ ≠ m₁. Then

‖q(x; m₀, m₁)‖² = ‖q(x; m₀)‖² + H(x; m₀, m₁) for m₀ < m₁,
‖q(x; m₀, m₁)‖² = ‖q(x; m₁)‖² + D(x; m₀, m₁) for m₀ > m₁,

where H(x; m₀, m₁) and D(x; m₀, m₁) satisfy the difference equations

H(x; m₀, m₁) = H(x; m₀, m₁ − 1) + b1(x; m₁), H(x; m₀, m₀) = 0,
D(x; m₀, m₁) = D(x; m₀ − 1, m₁) + b2(x; m₀), D(x; m₁, m₁) = 0,

and ‖q(x; m₀)‖², ‖q(x; m₁)‖² satisfy (7). For k = 0, 1, ..., 2^{m₁+1} − 1 and x ∈ [2^{−(m₁+1)}k, 2^{−(m₁+1)}(k + 1)), b1(x; m₁) is defined by

b1(x; m₁) = 2^{m₁} [6(2^{m₁}x − k/2) − 1]² if k is even,
b1(x; m₁) = 2^{m₁} [6(2^{m₁}x − (k − 1)/2) − 5]² if k is odd.

For k = 0, 1, ..., 2^{m₀+1} − 1 and x ∈ [2^{−(m₀+1)}k, 2^{−(m₀+1)}(k + 1)), b2(x; m₀) is defined by

b2(x; m₀) = 2^{m₀} [1 − 4(2^{m₀}x − k/2)]² if k is even,
b2(x; m₀) = 2^{m₀} [4(2^{m₀}x − (k − 1)/2) − 3]² if k is odd.

3. Minimax designs

We begin with some comments on the classical variance-optimal design for the Haar wavelet approximation of the unknown response curve θ(x). It has been shown by Oyet and Wiens (2000) that for the N = 1 multiwavelet approximation (1), any design ξ₀ with B(ξ₀) = I_{2^{m+1}} minimizes tr{B⁻¹(ξ)}. They found that the design ξ* which places mass 2^{−(m+1)} in each of the 2^{m+1} intervals {[(i − 1)2^{−(m+1)}, 2^{−(m+1)}i)}, i = 1, 2, ..., 2^{m+1}, has B(ξ*) = I_{2^{m+1}}, hence is Q- and A-variance optimal. The question that comes to mind is whether this is the only design with this property. The answer appears to lie in the structure of B(ξ). For any level of approximation l, the diagonal elements of B(ξ) are

2^l (ξ{[2^{−l}k, (2k + 1)2^{−(l+1)})} + ξ{[(2k + 1)2^{−(l+1)}, (k + 1)2^{−l})}).

The off-diagonal elements are either zero or

2^{l/2} (ξ{[2^{−l}k, (2k + 1)2^{−(l+1)})} − ξ{[(2k + 1)2^{−(l+1)}, (k + 1)2^{−l})}),

where k = 0, 1, ..., 2^l − 1. It is not too difficult to verify that for any design ξ₀ to satisfy B(ξ₀) = I_{2^{m+1}}, the measure ξ₀ has to place equal mass in each of the 2^{m+1} intervals.

3.1. Designs for multiwavelet approximations

This section is specifically designed to highlight some of the difficulties that may arise, even in very simple cases, when constructing designs based on wavelet approximations. We exhibit minimax robust designs for the representation which drops the scaling functions from (1) with m₀ = m₁ = 0, i.e. q(x) = (₂ψ₀(x), ₂ψ₁(x))ᵀ. We take g(x) = w(x) = 1, corresponding to ordinary least squares for an uncorrelated-error nonparametric regression model. In this case, the loss functions are simply functions of f and ξ. Also, h(x) ≡ p(x). We shall consider only symmetric designs, i.e. those satisfying p(x) = p(1 − x), x ∈ [0, 1].
For determinant loss one can show, as in Notz (1989), that non-symmetric designs are in fact inadmissible. For L_Q and L_A there is also some evidence for this inadmissibility. For symmetric p(·) the matrices B and C are diagonal, with diagonal elements

α_i = α_i(p) = ∫₀¹ ₂ψ_i²(x) p(x) dx, i = 0, 1,

β_i = β_i(p) = ∫₀¹ ₂ψ_i²(x) p²(x) dx, i = 0, 1,  (12)

respectively. Ignoring some multiplicative constants independent of p(·), we then find that

max_{f∈F} L_Q(f, ξ) ∝ ν(α₀⁻¹ + α₁⁻¹) + max(β₀α₀⁻², β₁α₁⁻²),

max_{f∈F} L_A(f, ξ) ∝ ν(α₀⁻¹ + α₁⁻¹) + max(β₀α₀⁻², β₁α₁⁻²) − 1,

max_{f∈F} L_D(f, ξ) ∝ (α₀α₁)⁻¹ (ν + max(β₀α₀⁻¹ − α₀, β₁α₁⁻¹ − α₁)).

Theorem 3.1. For β = (α₀, α₁)ᵀ define the density

p₀(x; β) = c (1 − a/₂ψ₀(x) − b/₂ψ₀²(x))⁺ (0 ≤ x ≤ 1),  (13)

where the nonnegative constants a = a(β), b = b(β), c = c(β) are determined by Eqs. (12) and the requirement

∫₀¹ p₀(x; β) dx = 1.  (14)

Then p₀(·; β) minimizes β₀(p) for fixed β. Define β_Q and β_D to be the minimizers of, respectively, (ν(α₀⁻¹ + α₁⁻¹) + β₀α₀⁻²)|_{p=p₀(·;β)} and ((α₀α₁)⁻¹(ν + β₀α₀⁻¹ − α₀))|_{p=p₀(·;β)}. Then

(i) p₀(·; β_Q) is minimax robust for L_Q and L_A, for those values of ν for which it satisfies β₀α₀⁻² ≥ β₁α₁⁻²;
(ii) p₀(·; β_D) is minimax robust for L_D, for those values of ν for which it satisfies β₀α₀⁻¹ − α₀ ≥ β₁α₁⁻¹ − α₁.

The inequality in (i) holds for ν ≥ 4.45; that in (ii) holds for ν ≥ 5.10.

For the actual computations it is simplest to write the support of p₀ as [0, k] ∪ [l, 1 − l] ∪ [1 − k, 1], to express everything in terms of k and l, and to then consider these as the variables. We first obtain

a = 2√3 (1 − 2(k + l)), b = 3(1 − 4k)(4l − 1),  (15)

c = {8k(1 − 2l) + (1 − 2(k + l)) ln[(1 − 4k)/(4l − 1)]}⁻¹.  (16)

Note that a and b will be nonnegative for k ≤ 1/4 ≤ l ≤ 1/2 − k. We then find

α₀ = c(1 + 16(l − k)³ − b),  (17)

β₀ = c{α₀ − b − a²c(l − k − 1) − (abc/(2√3)) ln[(1 − 4k)/(4l − 1)]},  (18)

α₁ = (1/4){1 + 3α₀ − 2√3 ac(l − k − 1) − bc ln[(1 − 4k)/(4l − 1)]}.  (19)

Table 1
Some A-, Q-optimal parameter values for designs (13) for multiwavelet approximations

ν      k       l      a      b     c     λ₀    λ₁
4.45   0.0732  0.395  0.221  1.23  7.37  1.67  1.67
4.88   0.07    0.398  0.224  1.28  7.84  1.72  1.71
9.10   0.05    0.415  0.243  1.58  12.3  2.15  2.03
19.5   0.03    0.433  0.254  1.94  22.1  2.98  2.60
53.3   0.01    0.453  0.254  2.34  50.8  4.94  3.77
88.1   0.001   0.462  0.258  2.53  78.8  6.36  4.62

Table 2
Some D-optimal parameter values for designs (13) for multiwavelet approximations

ν      k       l      a      b     c     λ₀    λ₁
5.10   0.0555  0.403  0.287  1.43  9.44  2.25  2.25
6.01   0.05    0.408  0.294  1.51  10.6  2.59  2.56
11.7   0.03    0.425  0.314  1.84  17.5  4.48  4.22
26.1   0.01    0.443  0.328  2.22  33.2  8.25  7.43
38.6   0.001   0.450  0.336  2.40  45.8  10.6  9.47

The quantities to be minimized by β_Q and β_D may now each be written in the form L(k, l) = νG(k, l) + H(k, l) for functions G and H. The requirement that (k, l) be a critical point of L for fixed ν yields two further equations, one of which is used to obtain l from k, the other to calculate ν. Explicitly, the algorithm is as follows:

1. Vary k over [0, 1/4], obtaining l ∈ [1/4, 1/2 − k] from the equation H₁(k, l)G₂(k, l) − H₂(k, l)G₁(k, l) = 0. The subscripts indicate partial differentiation. In all cases we have found that the left side of this equation is, for fixed k, a monotonic function of l.
2. Compute ν = −H₁(k, l)/G₁(k, l).
3. Compute the remaining quantities from (15) to (19) and then check inequalities (i) and (ii) of Theorem 3.1.

The algorithm is easily implemented in Maple, for which the code is available from the author. Some values of the constants are given in Tables 1 and 2. See Fig. 4 for plots of the Q- and D-minimax robust design densities for k = 0.05 and 0.03. In the tables, we denote the two eigenvalues by λ₀ = λ₀(p₀) and λ₁ = λ₁(p₀); e.g. for Q-robustness (λ₀, λ₁) = (β₀α₀⁻², β₁α₁⁻²). For the omitted range of ν, i.e. those values for which λ₀(p₀) < λ₁(p₀), one can obtain a minimax robust design as follows. As before, let p₀(x; β) minimize β₀ for fixed β; let p₁(x; β) minimize β₁ for fixed β. For each loss function L, let L₀(β) = max_{f∈F} L(f, ξ₀(·; β)) be the maximum loss associated with p₀, and define L₁(β) similarly. Define β*^(0) to be the minimizer of L₀(β) within the range defined by the condition λ₀(p₀) ≥ λ₁(p₀), and β*^(1) the minimizer of L₁(β) within

Fig. 4. Q-optimal (solid line) and D-optimal (broken line) densities p₀(x): (a) k = 0.05, (b) k = 0.03.

the range defined by the condition λ₁(p₁) ≥ λ₀(p₁). Then the minimax robust design density is

p*(x) = p₀(x; β*^(0)) if L₀(β*^(0)) ≤ L₁(β*^(1)), and p*(x) = p₁(x; β*^(1)) otherwise,

with min_ξ max_f L(f, ξ) = min(L₀(β*^(0)), L₁(β*^(1))). The numerical work is now somewhat more involved, and we do not provide explicit results for this case. We observe, however, that small values of ν correspond to large values of n, and that for an asymptotic analysis one would normally require η² to be O(n⁻¹), so that errors due to random variation and errors due to bias remain of the same order of magnitude. Thus ν will typically be bounded away from 0 as n → ∞, and so the results which we have exhibited are not severely incomplete.
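The three-step algorithm is easy to re-implement outside Maple. The sketch below (our re-implementation, with a grid search in place of a root-finder) uses the closed forms (15)–(19) for the Q-loss factors G(k, l) = α₀⁻¹ + α₁⁻¹ and H(k, l) = β₀α₀⁻², and recovers a row of Table 1 starting from k = 0.05:

```python
import numpy as np

sqrt3 = np.sqrt(3)

def moments(k, l):
    """alpha0, beta0, alpha1 from the closed forms (15)-(19)."""
    log_ratio = np.log((1 - 4*k)/(4*l - 1))
    a = 2*sqrt3*(1 - 2*(k + l))
    b = 3*(1 - 4*k)*(4*l - 1)
    c = 1.0/(8*k*(1 - 2*l) + (1 - 2*(k + l))*log_ratio)
    alpha0 = c*(1 + 16*(l - k)**3 - b)
    beta0 = c*(alpha0 - b - a**2*c*(l - k - 1) - a*b*c/(2*sqrt3)*log_ratio)
    alpha1 = 0.25*(1 + 3*alpha0 - 2*sqrt3*a*c*(l - k - 1) - b*c*log_ratio)
    return alpha0, beta0, alpha1

def G(k, l):                        # variance factor of the Q-loss
    a0, _, a1 = moments(k, l)
    return 1/a0 + 1/a1

def H(k, l):                        # bias factor of the Q-loss
    a0, b0, _ = moments(k, l)
    return b0/a0**2

def d1(f, k, l, h=1e-6): return (f(k + h, l) - f(k - h, l))/(2*h)
def d2(f, k, l, h=1e-6): return (f(k, l + h) - f(k, l - h))/(2*h)

k = 0.05
ls = np.linspace(0.30, 0.445, 2000)
crit = np.array([d1(H, k, l)*d2(G, k, l) - d2(H, k, l)*d1(G, k, l) for l in ls])
l = ls[np.argmin(np.abs(crit))]     # step 1: solve H1*G2 - H2*G1 = 0 for l
nu = -d1(H, k, l)/d1(G, k, l)       # step 2
print(round(l, 3), round(nu, 2))    # compare with Table 1: l = 0.415, ν = 9.10
```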

3.2. Minimax designs and weights

We construct minimax robust weights and designs for the N = 2 multiwavelet approximation with q(x) = (φ₀(x), φ₁(x), ₂ψ₀(x), ₂ψ₁(x))ᵀ. We consider two cases: (i) g(x) = 1, corresponding to a homoscedastic error model, and (ii) g(x) arbitrary. We restrict the density h(x) to the class H_S. These densities are symmetric within each of [0, 1], [0, 1/2] and [1/2, 1], hence are periodic with period 1/2. Our motivation is derived from the symmetric and periodic properties discussed in Section 2.1 and the simplicity they afford the matrices B and C. We use the notation h̄(x) to denote the 'doubly symmetrized' version of h(x), defined by

h̄(x) = [h(x) + h(1/2 − x) + h(1/2 + x) + h(1 − x)]/4.

We observe that h̄(x) = h(x) for any h ∈ H_S. Also, ∫₀¹ κ(x)h̄(x) dx = 4∫₀^{1/4} κ̄(x)h̄(x) dx for any κ(·) for which the integral exists.

Table 3
Some A-, Q-optimal parameter values for designs (21) for multiwavelet approximations

ν      τ₀*    r      c      s       loss
0.01   0.021  83.88  0.482  −19.45  1.037
0.1    0.022  76.58  0.475  −2.50   1.367
0.5    0.025  58.15  0.461  −1.12   2.760
2.0    0.028  42.17  0.443  −0.99   7.635
5.0    0.031  34.25  0.422  −1.06   16.613
10.0   0.032  29.44  0.396  −1.12   29.982
50.0   0.011  0.077  0.034  −89.5   21.228

Theorem 3.2. For fixed ν, let q(x) = (φ₀(x), φ₁(x), ₂ψ₀(x), ₂ψ₁(x))ᵀ, g(x) = 1, and let τ₀* be the τ₀ which minimizes r(τ₀ − s/16). Define the density

h₀(x) = ν( r[(x − 1/4)² − s/16] − 4c(τ₀*; ν)‖B⁻¹(τ₀*)q(x)‖ )⁺, 0 ≤ x ≤ 1/2,  (20)

where s = s(τ₀*; ν) ∈ (−∞, 1], r = r(τ₀*; ν) > 0 and c = c(τ₀*; ν) > 0 are chosen to satisfy

τ₀* = 4 ∫₀^{E(s)} (x − 1/4)² h₀(x) dx,

r = (E(s)/P(s)) [1 + 16cν ∫₀^{E(s)} ‖B⁻¹(τ₀*)q(x)‖ dx],

c = [rν ∫₀^{E(s)} {(x − 1/4)² − s/16} ‖B⁻¹(τ₀*)q(x)‖ dx] / [1 + 4ν ∫₀^{E(s)} ‖B⁻¹(τ₀*)q(x)‖² dx],

E(s) = 1/4 if s ≤ 0, and E(s) = (1 − √s)/4 if s > 0,

P(s) = ν(1 − 3s)/48 if s ≤ 0, and P(s) = ν(1 − 3s + 2s^{3/2})/48 if s > 0.

Then, the Q- and A-minimax robust weight and design density for the nonparametric regression model under the multiwavelet approximation with regressors q(x) are given by

w₀(x) = 4c/‖B⁻¹(τ₀*)q(x)‖ and p₀(x) = h₀(x)/w₀(x),  (21)

respectively.

The values of r, s, c and τ₀* are determined numerically for fixed ν. Some values of the constants obtained by implementing a quasi-Newton algorithm through an S-Plus code are given in Table 3 (Theorem 3.2) and Table 4 (Theorem 3.3). With sufficient

Table 4
Some A-, Q-optimal parameter values for designs (23) for multiwavelet approximations

ν      τ₀*     r       c      s       loss
0.005  0.0209  82.86   0.78   −39.02  1.019
0.01   0.021   86.435  0.78   −18.88  1.038
0.1    0.0217  83.537  0.776  −2.29   1.377
0.5    0.022   78.206  0.753  −0.79   2.796

Fig. 5. Q- and A-optimal weight and design densities (21) (solid line) and (23) (dotted line) for ν = 0.5: (a) w₀(x), (b) p₀(x).

effort and patience, we believe that other values of the parameters can be obtained. Fig. 5 is a plot of (21) and (23) for ν = 0.5.

Theorem 3.3. Let the assumptions of Theorem 3.2 hold with g(x) arbitrary. Let z₀(x) := h₀^{1/3}(x) ≥ 0 and define

z₀(x) = γ^{1/3}(x) − 2c(τ₀*; ν) ν ‖B⁻¹(τ₀*)q(x)‖^{4/3} / (3γ^{1/3}(x)),  (22)

where

γ(x) = (ν/2){ r[(x − 1/4)² − s/16] + γ₁(x) },

γ₁(x) = { (8νc³/27)(‖B⁻¹(τ₀*)q(x)‖^{4/3})³ + r²[(x − 1/4)² − s/16]² }^{1/2}

and s = s(τ₀*; ν) ∈ (−∞, 1], r = r(τ₀*; ν) > 0 and c = c(τ₀*; ν) > 0 are chosen to satisfy

τ₀* = 4 ∫₀^{E(s)} (x − 1/4)² z₀³(x) dx,

r = (E(s)/P(s)) [1 + 8νc ∫₀^{E(s)} ‖B⁻¹(τ₀*)q(x)‖^{4/3} z₀(x) dx],

c = { ∫₀^{E(s)} ‖B⁻¹(τ₀*)q(x)‖^{4/3} z₀⁴(x) dx }^{1/2}.

Then, the Q- and A-minimax robust weight and design density for the nonparametric regression model under the multiwavelet approximation with regressors q(x) are given by

w₀(x) = 4c² / (‖B⁻¹(τ₀*)q(x)‖^{4/3} z₀(x)) and p₀(x) = h₀(x)/w₀(x),  (23)

respectively.

4. Minimum variance unbiased designs

The results of the previous section are restricted to a certain class of designs due to the complexity of the eigenvalue problem arising from finding the least favourable function f(x). To obtain more general results, we consider the case where the density h(x) is fixed but not necessarily symmetric. The particular case where h(x) = κ, a constant, corresponds to imposing an unbiasedness condition, since

b(f, w, ξ) = ∫_S q(x)f(x)h(x) dx = κ ∫_S q(x)f(x) dx = 0,

by the orthogonality condition in F. This problem is equivalent to the minimax design problem with ν → ∞, the pure variance problem. We therefore call these designs mvu designs. An example is shown in Fig. 6. The results of Theorem 4.1 can be verified from the proof of Theorem 3.3.

Theorem 4.1. For any fixed density h₀(x), design space S, and arbitrary g(x), the weight function w₀(x) and design density p₀(x) given by

w₀(x) = [∫_S ‖B⁻¹q(x)h₀(x)‖^{4/3} dx] / ‖B⁻¹q(x)h₀^{1/4}(x)‖^{4/3}

and

p₀(x) = ‖B⁻¹q(x)h₀(x)‖^{4/3} / ∫_S ‖B⁻¹q(x)h₀(x)‖^{4/3} dx

minimize (4) and hence (5). In particular, if h₀(x) = κ, then the mvu weight and design are

w₀(x) = [∫_S ‖q(x)‖^{4/3} dx] / ‖q(x)‖^{4/3} and p₀(x) = w₀⁻¹(x).

For nonorthogonal wavelet systems, we replace q(x) by A^{−1/2}q(x).
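For the constant-h₀ case, Theorem 4.1 is immediate to compute once ‖q(x)‖ is available. A sketch (ours; the N = 2, m₀ = m₁ = 0 multiwavelet system is used because its norm has the closed form of Lemma 2.1):

```python
import numpy as np

def q_norm2(x):
    """‖q(x)‖² for the N = 2, m0 = m1 = 0 regressors (φ0, φ1, 2ψ0, 2ψ1)."""
    u = np.abs(x - 0.5)
    return 1 + 12*u**2 + 3*(4*u - 1)**2 + 4*(1 - 3*u)**2

x = np.linspace(0, 1, 100001)[:-1]
norm43 = q_norm2(x)**(2/3)              # ‖q(x)‖^{4/3}
const = norm43.mean()                   # ≈ ∫_S ‖q(x)‖^{4/3} dx, S = [0, 1]

w0 = const/norm43                       # mvu weight function
p0 = 1.0/w0                             # mvu design density, p0 = w0^{-1}

# h0(x) = w0(x) p0(x) is constant, which is the unbiasedness condition.
print(round(p0.mean(), 6), round((w0*p0).std(), 12))
```

Note that p₀ places the most mass near the dyadic endpoints, where ‖q(x)‖ peaks, and the weights compensate so that the product w₀p₀ stays flat.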

Fig. 6. Mvu weight and density for m = 1 based on the Daubechies wavelet, ψ₈(x), approximation: (a) w₀(x), (b) p₀(x).

So far, the methods used in the literature for constructing designs for nonparametric regression models have resulted in design criteria that are asymptotic functions of the mean squared error or variance of some estimator. For instance, Müller (1984) used the asymptotic integrated mean squared error of the Gasser–Müller kernel estimator of θ(x) to derive an optimal design density that is similar in form to the p₀(x) in Theorem 4.1. Chan (1991) found the uniform design to be the minimizer of the asymptotic variance of an estimator of scale. More recently, Xie and Fang (2000) also found the uniform design to be admissible and minimax among a "reasonable subclass of designs" for estimating nonparametric regression models. The Bayesian methods usually require the experimenter to specify a prior distribution for the response function. If it turns out that the prior distribution has been misspecified, the Bayesian designs will be completely unreliable. Two ways in which the wavelet designs discussed in Sections 3 and 4 differ from other nonparametric designs are: (1) the optimality criteria used in constructing the designs are exact expressions of the mean squared error; and (2) no prior distributions are required.

4.1. Example

In our example, we sample n = 25 design points from the distribution function of the optimal design density in Theorem 4.1 and generate observations at these points from the response

θ(x) = 2 − 2x + 3 exp{−(x − 0.5)²/0.01}, x ∈ [0, 1],

with Var(ε) = σ²g(x) = 0.2g(x). We then fit model (2) to the data using (i) g(x) = 1 and (ii) g(x) = x (see Fig. 7). The sampled design points are:

0.023, 0.058, 0.102, 0.141, 0.18, 0.219, 0.262, 0.297, 0.34, 0.383, 0.418, 0.461, 0.504, 0.539, 0.582, 0.617, 0.66, 0.699, 0.738, 0.781, 0.82, 0.859, 0.898, 0.938, 0.977.
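Sampling design points from the distribution function of the optimal density amounts to inverting a numerically tabulated CDF. The sketch below (ours) does this for the mvu density of the N = 2 multiwavelet system; the paper's 25 points come instead from a Daubechies-based design, so the values produced here are only illustrative.

```python
import numpy as np

def q_norm2(x):      # ‖q(x)‖² for the N = 2, m = 0 multiwavelet system
    u = np.abs(x - 0.5)
    return 1 + 12*u**2 + 3*(4*u - 1)**2 + 4*(1 - 3*u)**2

# mvu design density p0 ∝ ‖q(x)‖^{4/3} (Theorem 4.1), tabulated on a fine grid
grid = np.linspace(0, 1, 200001)
dens = q_norm2(grid)**(2/3)
cdf = np.cumsum(dens)
cdf = (cdf - cdf[0])/(cdf[-1] - cdf[0])          # normalized distribution function

n = 25
quantiles = (np.arange(n) + 0.5)/n
points = np.interp(quantiles, cdf, grid)          # inverse-CDF sampling
print(np.round(points, 3))
```

Because the density is symmetric about x = 1/2, the sampled points come out symmetric about 1/2 as well, just as the 25 points listed above are.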

A.J. Oyet / Journal of Statistical Planning and Inference 117 (2003) 323 – 343

Fig. 7. A plot of data generated from θ(x) and the Daubechies wavelet fit, with m = 3 and σ^2 = 0.2, based on (a) g(x) = 1; (b) g(x) = x.

Table 5
Relative efficiency of uniform design to mvu design, for five simulations

                                     Sim. 1   Sim. 2   Sim. 3   Sim. 4   Sim. 5
IV_mvu^(1)/IV_u^(0), g(x) = 1        0.1258   0.1998   0.0951   0.4012   0.1061
IV_mvu^(1)/IV_u^(1), g(x) = 1        0.0189   0.0343   0.0137   0.0934   0.0158
IV_mvu^(1)/IV_u^(0), g(x) = x        0.0721   0.0732   0.0255   0.1295   0.0344
IV_mvu^(1)/IV_u^(1), g(x) = x        0.0104   0.0113   0.0036   0.0226   0.0050

(0) denotes uniform weights; (1) denotes mvu weights.
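The efficiency comparison behind Table 5 can be sketched as follows, using the criterion IV = (σ̂²/n) tr H⁻¹ from the text. Everything concrete below is an assumption for illustration: H is taken to be the normalized weighted information matrix of a stand-in cubic polynomial basis rather than the paper's multiwavelet matrix, and the alternative design is a hypothetical endpoint-heavy one, not the mvu design.

```python
import numpy as np

def basis(x, deg=3):
    # Stand-in regression basis (the paper uses a multiwavelet system instead)
    return np.vander(x, deg + 1, increasing=True)

def integrated_variance(x, w, sigma2=0.2):
    # IV = (sigma^2 / n) tr(H^{-1}), with H an assumed normalized weighted
    # information matrix H = (1/n) X' diag(w) X
    X = basis(x)
    n = len(x)
    H = (X * w[:, None]).T @ X / n
    return sigma2 / n * np.trace(np.linalg.inv(H))

n = 25
x_u = np.linspace(0.05, 0.95, n)       # uniform design on [0.05, 0.95], as in the text
t = np.linspace(-1.0, 1.0, n)
x_a = 0.5 + 0.45 * np.sign(t) * np.abs(t) ** 0.7   # hypothetical endpoint-heavy design
w = np.ones(n)                          # uniform weights

rel_eff = integrated_variance(x_a, w) / integrated_variance(x_u, w)
print(rel_eff)   # a value below 1 would favour the alternative design
```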

The estimated response function is shown in Fig. 7. We compare the performance of the uniform design to the mvu design on the basis of relative efficiency values computed from rel. eff. = IV_mvu/IV_u, where IV = (σ̂²/n) tr H⁻¹ is the integrated variance of the estimated response. In order to avoid boundary effects, the uniform design points were taken in the interval [0.05, 0.95]. The mvu design was consistently more efficient than the uniform design in all the cases considered. Table 5 gives the results for only five simulations. Other studies on simulated data from the response function in this example can be found in Antoniadis et al. (1994) and Gasser and Müller (1984).

5. Concluding remarks

The unknown structure of the mean response curve in nonparametric regression models has been a major source of difficulty in nonparametric design problems. This difficulty has contributed significantly to restricting the level of research into design construction for estimating nonparametric models. Our results show that this restriction can be eliminated by using wavelets to approximate the response curve. Results not discussed in this paper indicate that the complicated eigenvalue problem highlighted in Section 3 can be avoided if the simulated annealing algorithm is used to construct exact integer-valued designs instead of the approximate continuous designs we have discussed. In that case, minimax designs for models more general than those discussed in Sections 3.1 and 3.2 can be obtained. Thus, the results outlined in this paper are likely the beginning of further research into wavelet designs for estimating nonparametric regression models.
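The simulated-annealing route to exact designs mentioned above can be illustrated schematically. Every concrete choice here is an assumption made for illustration: the criterion is a stand-in tr(H⁻¹) on a cubic basis rather than the paper's minimax loss, and the grid, move rule and cooling schedule are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def basis(x, deg=3):
    return np.vander(x, deg + 1, increasing=True)

def loss(design):
    # Stand-in design criterion: tr(H^{-1}); near-singular H is rejected as +inf
    X = basis(design)
    H = X.T @ X / len(design)
    if np.linalg.cond(H) > 1e12:
        return np.inf
    return np.trace(np.linalg.inv(H))

def anneal(grid, n, steps=2000, t0=1.0, cool=0.995):
    # An exact n-point design is a multiset of grid points; a move reassigns
    # one observation to a different grid point.
    design = rng.choice(grid, size=n)
    cur = loss(design)
    best, best_val = design.copy(), cur
    temp = t0
    for _ in range(steps):
        cand = design.copy()
        cand[rng.integers(n)] = rng.choice(grid)
        val = loss(cand)
        # Accept improvements always, uphill moves with Metropolis probability
        if val < cur or rng.random() < np.exp((cur - val) / temp):
            design, cur = cand, val
            if val < best_val:
                best, best_val = cand.copy(), val
        temp *= cool
    return np.sort(best), best_val

grid = np.linspace(0.0, 1.0, 21)
d, v = anneal(grid, n=10)
print(d, v)
```

Because the moves act on integer counts over the grid, no continuous design measure (and hence no eigenvalue computation over a function class) is needed; only the criterion must be evaluable at each candidate design.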

Appendix A. Derivations

Proof of Lemma 2.1. Define

a^T(x, m) = (2^{m/2} ψ_0(2^m x), ..., 2^{m/2} ψ_0(2^m x − (2^m − 1)), 2^{m/2} ψ_1(2^m x), ..., 2^{m/2} ψ_1(2^m x − (2^m − 1))).

Then q^T(x, m) = (φ_0(x), φ_1(x), a^T(x, m)). It is easily verified that

||q(x, m + 1)||^2 − ||q(x, m)||^2 = ||a(x, m + 1)||^2 − ||a(x, m)||^2 = e(x, m + 1),

where

e(x, j) = 2^j [4 − 36(2^j x − k*) + 84(2^j x − k*)^2] I_{[2^{−j} k*, 2^{−j}(k* + 1/2))}(x) + 2^j [52 − 132(2^j x − k*) + 84(2^j x − k*)^2] I_{[2^{−j}(k* + 1/2), 2^{−j}(k* + 1))}(x)

and k* is that value of k, 0 ≤ k ≤ 2^{m+1} − 1, for which x ∈ [2^{−(m+1)} k*, 2^{−(m+1)}(k* + 1/2)) or x ∈ [2^{−(m+1)}(k* + 1/2), 2^{−(m+1)}(k* + 1)), for any x ∈ [0, 1). Therefore, it suffices to show that

Σ_{r=0}^{2^{m+2} − 1} l_{m+1}(x, r) − Σ_{k=0}^{2^{m+1} − 1} l_m(x, k) = e(x, m + 1).   (A.1)

Note from (A.1) that, for an arbitrary value of m, if x ∈ [2^{−(m+2)} r, 2^{−(m+2)}(r + 1)), r = 0, 1, 2, ..., 2^{m+2} − 1, then

k = k* = r/2 if r is even;  k = k* = (r − 1)/2 if r is odd.

The result then follows from the fact that for any x ∈ [2^{−(m+2)} r, 2^{−(m+2)}(r + 1)) we have

e(x, m + 1) = ||q(x, m + 1)||^2 − ||q(x, m)||^2 = l_{m+1}(x, r) − l_m(x, k*).

The proof is completed by noting that

l_m(x, k) = 2^{m+3} [1 − 3 h_m(x, k)] I_{[k, k+1]}(2^{m+1} x),   (A.2)

where h_m(x, k) = (2^{m+1} x − k)(k + 1 − 2^{m+1} x).

Proof of Lemma 2.2. Observe from (A.2) that l_m(x, k) ≤ 2^{m+3}, since h_m(x, k) ≥ 0, with equality attained at the endpoints of the interval k ≤ 2^{m+1} x < k + 1. The fact that l_m(x, k) ≥ 2^{m+1} can be obtained by applying ideas from elementary calculus. Expression (8) then follows from the disjointness of the subintervals k ≤ 2^{m+1} x < k + 1 (k = 0, 1, ..., 2^{m+1} − 1). Identities (9) and (10) can be verified by direct substitution.

Proof of Lemma 2.3. For m_0 ≠ m_1, the vector q(x) is given by

q^T(x; m_0, m_1) = (φ_0(x), φ_1(x), 2^{m_0/2} ψ_0(2^{m_0} x), ..., 2^{m_0/2} ψ_0(2^{m_0} x − (2^{m_0} − 1)), 2^{m_1/2} ψ_1(2^{m_1} x), ..., 2^{m_1/2} ψ_1(2^{m_1} x − (2^{m_1} − 1))).

If m_0 < m_1 we have that

||q(x; m_0, m_1)||^2 = ||q(x, m_0)||^2 + Σ_{j=m_0+1}^{m_1} Σ_{k=0}^{2^j − 1} 2^j ψ_1^2(2^j x − k),

where ||q(x, m_0)||^2 is given by (7). Define b1(x, j) = Σ_{k=0}^{2^j − 1} 2^j ψ_1^2(2^j x − k), substitute for ψ_1^2(2^j x − k), and use the fact that for any x ∈ [2^{−(m_1+1)} k, 2^{−(m_1+1)}(k + 1)) (k = 0, 1, ..., 2^{m_1+1} − 1),

b1(x, m_1) = 2^{m_1} [6(2^{m_1} x − k/2) − 1]^2 if k is even;  b1(x, m_1) = 2^{m_1} [6(2^{m_1} x − (k − 1)/2) − 5]^2 if k is odd,

to obtain the result for m_0 < m_1. Similar arguments lead to the result for m_0 > m_1.

Proof of Theorem 3.1. To minimize ∫_0^1 w_0^2(x) p^2(x) dx for fixed β, it is sufficient to minimize

∫_0^1 w_0^2(x) p^2(x) dx − 2t ∫_0^1 w_0^2(x) p(x) dx + 2rt ∫_0^1 p(x) dx + 2st ∫_0^1 w_1^2(x) p(x) dx = ∫_0^1 [w_0^2(x) p^2(x) − 2t p(x)(w_0^2(x) − r − s w_1^2(x))] dx   (A.3)

for Lagrange multipliers r, s, t chosen to satisfy (12) and (14). The multipliers have been arranged in such a way that the integrand of (A.3) is minimized pointwise by

p_0(x; β) = t (1 − r/w_0^2(x) − s w_1^2(x)/w_0^2(x))^+.   (A.4)

Now define constants a, b, c in such a way that

c(w_0^2(x) − a w_0(x) − b) ≡ t(w_0^2(x) − r − s w_1^2(x)),   x ∈ [0, 1];

then (A.4) and (13) coincide. By the definition of β_Q, p_0(·; β_Q) will minimize max_{f∈F} L_Q(f, ξ) as long as the inequality in (i) holds; an analogous statement holds for determinant loss.

Proof of Theorem 3.2. When g(x) = 1, expression (5) reduces to

max_{f∈F} L_Q(f, w, ξ) = σ^2 {1 + ch_1(G_1 B^{−2})} + ν ∫_S l(x) w(x) h(x) dx.   (A.5)

Define

α_0 = 4 ∫_0^{1/4} (x − 1/4)^2 h(x) dx,   γ_0 = 4 ∫_0^{1/4} (x − 1/4)^2 h^2(x) dx,   γ = 4 ∫_0^{1/4} h^2(x) dx.   (A.6)

Oyet and Wiens (2000) have shown that the characteristic roots of the matrix G_1 B^{−2} are γ − 1 and γ_0/(48 α_0^2) − 1, each appearing with multiplicity two. Now minimize (A.5) with respect to w, subject to the constraint ∫_S h(x)/w(x) dx = 1, to obtain

min_w max_{f∈F} L_Q(f, w, ξ) = σ^2 {1 + max[γ − 1, γ_0/(48 α_0^2) − 1]} + ν (∫_S l^{1/2}(x) h(x) dx)^2.

The minimum is attained at w_0(x) = ∫_S l^{1/2}(x) h(x) dx / l^{1/2}(x). It is easily verified that ch_1(G_1 B^{−2}) = γ − 1 once the minimizing density h_0(x) is determined. To determine h_0(x), it suffices that h minimize

∫_0^{1/4} [h^2(x) − 2rν(x − 1/4)^2 h(x) + (rsν/8) h(x)] dx + ν (∫_0^{1/4} l^{1/2}(x) h(x) dx)^2,

where the multipliers r and s are chosen to satisfy the side conditions that α_0 is fixed and that h_0(x) is a density. The multipliers have been arranged in such a way that the minimizing h_0(x) is given by (20). We find that min_{h,w} max_f = σ^2 [νr(α_0 − s/16)], purely a function of α_0. For fixed ν we minimize this function with respect to α_0 and use h_0(x) and the definition of l(x) to obtain (21).

Proof of Theorem 3.3. For arbitrary g(x), the w which minimizes (4) subject to the constraint ∫_S h(x)/w(x) dx = 1 is

w_0(x) = ∫_S [l(x) h^2(x)]^{2/3} dx / [l^2(x) h(x)]^{1/3},

so that

min_w max_{f∈F, g} L_Q(f, w, ξ) = σ^2 {γ + ν (∫_S [l(x) h^2(x)]^{2/3} dx)^{3/2}}.   (A.7)

Again, by arranging the multipliers r and s suitably, it can easily be verified that the density h_0(x) which minimizes (A.7) must satisfy the equation

h_0(x) + 2ν c(α_0, ν) l^{2/3}(x) h_0^{1/3}(x) − rν [(x − 1/4)^2 − s/16] = 0,   h_0(x) ≥ 0.

The only real root of this equation, in terms of z_0(x) = h_0^{1/3}(x), is

z_0(x) = ρ^{1/3}(x) − 2ν c(α_0, ν) l^{2/3}(x) / (3 ρ^{1/3}(x)),

where ρ(x) is given by (22). Proceeding as in Theorem 3.2 completes the proof.

References

Abramovich, F., Silverman, B.W., 1998. Wavelet decomposition approaches to statistical inverse problems. Biometrika 85, 115–129.
Alpert, B.K., 1992. Wavelets and other bases for fast numerical linear algebra. In: Chui, C.K. (Ed.), Wavelets: A Tutorial in Theory and Applications. Academic Press, Boston, pp. 181–216.
Antoniadis, A., Gregoire, G., McKeague, I.W., 1994. Wavelet methods for curve estimation. J. Amer. Statist. Assoc. 89, 1340–1352.
Box, G.E.P., Draper, N.R., 1959. A basis for the selection of a response surface design. J. Amer. Statist. Assoc. 54, 622–654.
Chaloner, K., 1984. Optimal Bayesian experimental design for linear models. Ann. Statist. 12, 283–300.
Chan, L., 1991. Optimal design for estimation of variance in nonparametric regression using first order differences. Biometrika 78, 926–929.
Daubechies, I., 1992. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia, PA.
Donoho, D.L., Johnstone, I.M., 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.
Gasser, T., Müller, H., 1984. Estimating regression functions and their derivatives by the kernel method. Scand. J. Statist. 11, 171–185.
Härdle, W., Kerkyacharian, G., Picard, D., Tsybakov, A., 1998. Wavelets, Approximation, and Statistical Applications. Springer, New York.
Herzberg, A.M., Traves, W.N., 1994. An optimal experimental design for the Haar regression model. Canad. J. Statist. 22, 357–364.
Huber, P.J., 1975. Robustness and designs. In: Srivastava, J.N. (Ed.), A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, pp. 287–303.
Mitchell, T., Sacks, J., Ylvisaker, D., 1994. Asymptotic Bayes criteria for nonparametric response surface design. Ann. Statist. 22, 634–651.
Müller, H., 1984. Optimal designs for nonparametric kernel regression. Statist. Probab. Lett. 2, 285–290.
Notz, W., 1989. Optimal designs for regression models with possible bias. J. Statist. Plann. Inference 22, 43–54.
Oyet, A.J., Wiens, D.P., 2000. Robust designs for wavelet approximations of regression models. J. Nonparametric Statist. 12, 837–859.
Pesotchinsky, L., 1982. Optimal robust designs: linear regression in R^k. Ann. Statist. 10, 511–525.
Wegman, E.J., Poston, W.L., Solka, J.L., 1996. Wavelets and nonparametric function estimation. In: Brunner, E., Denker, M. (Eds.), Research Developments in Statistics and Probability. VSP, Utrecht, pp. 257–274.
Wiens, D.P., 1992. Minimax designs for approximately linear regression. J. Statist. Plann. Inference 31, 353–371.
Xie, M., Fang, K., 2000. Admissibility and minimaxity of the uniform design measure in nonparametric regression models. J. Statist. Plann. Inference 83, 101–111.
Optimal designs for regression models with possible bias. J. Statist. Plann. Inference 22, 43 –54. Oyet, A.J., Wiens, D.P., 2000. Robust designs for wavelet approximations of regression models. J. Nonparametric Statist. 12, 837–859. Pesotchinsky, L., 1982. Optimal robust designs: linear regression in Rk . Ann. Statist. 10, 511–525. Wegman, E.J., Poston, W.L., Solka, J.L., 1996. Wavelets and nonparametric function estimation. In: Brunner, E., Denker, M. (Eds.), Research Developments in Statistics and Probability. VSP, Utrecht, pp. 257–274. Wiens, D.P., 1992. Minimax designs for approximately linear regression. J. Statist. Plann. Inference 31, 353–371. Xie, M., Fang, K., 2000. Admissibility and minimaxity of the uniform design measure in nonparametric regression models. J. Statist. Plann. Inference 83, 101–111.