Elfving's Theorem Revisited


Journal of Statistical Planning and Inference 130 (2005) 85 – 94 www.elsevier.com/locate/jspi

W.J. Studden
Department of Statistics, Purdue University, 1399 Mathematical Science Building, West Lafayette, IN 47907-1399, USA

Received 11 April 2003; received in revised form 24 May 2003; accepted 30 May 2003; available online 29 July 2004

Dedicated to Professor Herman Chernoff on his 80th Birthday

Abstract

The importance of Elfving's geometrical result is illustrated by showing how his original proof can be used to prove a number of Elfving-type results without the use of equivalence theorems. It is indicated that some of the elementary equivalence theorems follow easily from the Elfving results. It is also shown that Elfving's Theorem is equivalent to a special case of a general theorem in approximation theory.
© 2004 Elsevier B.V. All rights reserved.

MSC: primary 60J15; secondary 33C45

Keywords: Elfving set

1. Introduction

The Elfving Theorem referred to in the title is an important, simple, elegant geometrical result which allows one to choose an experimental design that minimizes the variance of unbiased estimates of a single linear combination $c'\theta$ in an ordinary linear regression model $Y(z) = f'(z)\theta + \text{error}$, where $f'(z)\theta = \sum_{i=1}^{p} f_i(z)\theta_i$. Uncorrelated observations are taken at $z_1, \ldots, z_n$ from some available set $Z$, and the objective is to determine the optimal choice of the $z_i$. This result is from Elfving (1952) and will be described more fully below.

Professor Chernoff had many research interests, one of which was the design of experiments. During the years 1958–62, I was a graduate student at Stanford and took at least one course from Herman, which was on asymptotic theory.


Part of my TA/RA duties while at Stanford was to help prepare the graphs for the paper by Herman titled "Optimal Accelerated Life Designs for Estimation", which was published in Technometrics in 1962. (I actually did very little work, if any, on this project since the assistant doing the computations and graphics was very reliable.) In this paper the lifetime of some object depended on some variable, say z, in such a way that larger z values produced shorter lifetimes. In investigating this relationship, large z values are used to accelerate or shorten the test times due to cost considerations. The objective was to determine which z values should be used to run the tests. The paper uses a simple geometrical result of Elfving, which Herman explained to me. This paper contains the first good application of Elfving's result. I shortly became more involved with this result as Professors Karlin and Kiefer were also working on design problems. Herman wrote an article in 1999 in Statistical Science describing Elfving's impact on the design of experiments and describing the result with some examples. The same volume contains papers by Nordström and Fellman, also concerned with Elfving's work.

The purpose of the present paper is to reiterate the importance of Elfving's geometrical result. It will be shown that some of the results in design theory that rely on 'equivalence theorems' can be derived using Elfving's original argument. The basic result is described in Section 2. In Section 3 it will be indicated that Elfving's result is equivalent to some very fundamental results in approximation theory. A simple extension of Elfving's result is described in Section 4, and Dette's Elfving-type result is proven in Section 5 following Elfving's original argument. The relationship between Elfving's Theorem and some simple equivalence theorems is discussed briefly in Section 6.

There are a number of other results, not discussed here, where the Elfving set $\mathcal{E}$ (see (2.8) below) plays a dominant role. For example, the inball radius of $\mathcal{E}$ is related to E-optimality or Elfving's min–max criterion. This is discussed in Pukelsheim and Studden (1993), Dette and Studden (1993) and Heiligers et al. (1995). It should also be mentioned that Dette (1993) has included extensions of the main result discussed here, of which the result in Section 4 is a special case.

2. Elfving's Theorem

We are interested in estimating a single linear form $c'\theta$ in the model where observations are given by

$$Y(z_i) = f'(z_i)\theta + \varepsilon_i = \sum_{k=1}^{p} f_k(z_i)\theta_k + \varepsilon_i, \qquad i = 1, \ldots, n. \tag{2.1}$$

The observations are uncorrelated and have constant variance, which will be assumed to be one. A judicious choice of the $z_i$ from some set $Z$ to minimize the variance of the least squares estimate of $c'\theta$ is the main objective at this point. An approximate design is written as

$$\xi = \begin{pmatrix} z_1, & \ldots, & z_r \\ p_1, & \ldots, & p_r \end{pmatrix}, \tag{2.2}$$


so that if $n$ observations are allowed, $np_i$ of them are taken at $z_i$. All of the results to follow are concerned only with approximate designs. The normalized information matrix of the design is

$$M(\xi) = \sum_{i=1}^{r} p_i f(z_i) f'(z_i) \tag{2.3}$$

and the variance of the LSE of $c'\theta$ equals $c'M^-(\xi)c/n$, where $M^-(\xi)$ is any generalized inverse of $M(\xi)$. We assume that any design used will be such that $c'\theta$ is estimable. Rather than perturbing $\xi$ and arriving at an 'equivalence theorem' giving a condition on $\xi$ for it to be optimal, Elfving proceeds with a more elementary approach as follows.

Let $\bar Y_i$ denote the average of the observations taken at $z_i$ and let $\sum_{i=1}^{r} l_i \bar Y_i = l'\bar Y$ be a linear estimate of $c'\theta$. Unbiasedness of the estimate results in $\sum_{i=1}^{r} l_i f'(z_i) = c'$ or

$$c = \sum_{i=1}^{r} f(z_i) l_i \tag{2.4}$$

and the variance is

$$\frac{1}{n} \sum_{i=1}^{r} \frac{l_i^2}{p_i}. \tag{2.5}$$

From the Schwarz inequality

$$\sum_{i=1}^{r} \frac{l_i^2}{p_i} \geq \left( \sum_{i=1}^{r} |l_i| \right)^2 \tag{2.6}$$

and from (2.4)

$$\frac{c}{\sum_{i=1}^{r} |l_i|} = \frac{\sum_{i=1}^{r} |l_i|\, \varepsilon_i f(z_i)}{\sum_{i=1}^{r} |l_i|} = \sum_{i=1}^{r} p_i \varepsilon_i f(z_i), \tag{2.7}$$

where $\varepsilon_i = \pm 1$ and $p_i = |l_i| / \sum_{j=1}^{r} |l_j|$. The right-hand side of (2.7) is contained in the Elfving set

$$\mathcal{E} = \text{convex hull}\{\pm f(z) \mid z \in Z\}. \tag{2.8}$$

If $\beta$ is such that $\beta c$ is on the boundary of $\mathcal{E}$, then

$$\sum_{i=1}^{r} \frac{l_i^2}{p_i} \geq \left( \sum_{i=1}^{r} |l_i| \right)^2 \geq \frac{1}{\beta^2}. \tag{2.9}$$

Now equality can be achieved in both places if and only if we choose the $z_i$, $p_i$ and $\varepsilon_i$ in (2.7) to produce $\beta c$ and then take $l_i = \varepsilon_i p_i / \beta$.

Elfving's Theorem. The discrete design $\xi = \{z_i, p_i\}$ is optimal for estimating $c'\theta$ if and only if there exist $\varepsilon_i = \pm 1$ and $\beta$ such that

$$\beta c = \sum_i p_i \varepsilon_i f(z_i) \in \partial\mathcal{E}. \tag{2.10}$$


Moreover, $\beta^{-2} = c'M^-(\xi)c$ is the minimum variance. The reader can try out the theory by considering a simple linear regression $EY_i = \theta_0 + \theta_1 z_i$ with observations at $z_i \in [0, 1]$ and finding the optimal design for estimating $\theta_0 + \theta_1 z_0$ for $z_0 > 1$, in which case we are extrapolating to $z_0$.
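As a small numerical illustration (this example and the code are mine, not from the paper), the Elfving construction for the extrapolation problem above places mass $z_0/(2z_0-1)$ at $z = 1$ and $(z_0-1)/(2z_0-1)$ at $z = 0$, with $\beta = 1/(2z_0-1)$. The sketch below, assuming NumPy is available, checks that $c'M^{-1}(\xi)c = \beta^{-2}$ for this design and that no two-point competitor on a grid does better.

```python
import numpy as np

def variance(c, points, weights):
    """c' M^{-1}(xi) c for a two-point design in the model f(z) = (1, z)'."""
    M = sum(w * np.outer([1.0, z], [1.0, z]) for z, w in zip(points, weights))
    return float(c @ np.linalg.solve(M, c))

z0 = 2.0                          # extrapolation point, chosen only for illustration
c = np.array([1.0, z0])

# Design suggested by Elfving's construction: all mass on the endpoints 0 and 1.
p1 = z0 / (2 * z0 - 1)            # weight at z = 1
beta = 1 / (2 * z0 - 1)
elfving = variance(c, [0.0, 1.0], [1 - p1, p1])
print(elfving, 1 / beta**2)       # both equal (2*z0 - 1)^2 = 9

# Brute-force comparison over two-point designs supported in [0, 1].
best = min(
    variance(c, [a, b], [w, 1 - w])
    for a in np.linspace(0.0, 0.9, 10)
    for b in np.linspace(a + 0.1, 1.0, 10)
    for w in np.linspace(0.05, 0.95, 19)
)
print(best >= elfving - 1e-9)     # True: no competing two-point design does better
```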

3. Approximation theory

It is relatively well known that the minimization of $c'M^-(\xi)c$ with respect to the design $\xi$ is equivalent to a problem in the theory of approximation of functions. Thus,

$$c'M^-(\xi)c = \max_d \frac{(d'c)^2}{d'M(\xi)d} = \max_d \frac{(d'c)^2}{\int (d'f(z))^2 \, \mathrm{d}\xi(z)} \tag{3.1}$$

and

$$\min_\xi c'M^-(\xi)c = \frac{1}{\min_d \max_z (d'f(z))^2}, \tag{3.2}$$

where the min is now subject to $d'c = 1$. The interchange of min and max should be given some justification. In case $c = (0, \ldots, 0, 1)'$ we are approximating the last function $f_p$ by a linear combination of the rest. Considering this case is actually no loss of generality, since for the more general $c'\theta$ one can reparameterize by using $\gamma = K\theta$ for some nonsingular $p \times p$ matrix $K$ with last row equal to $c'$. The value of $\beta$ in Eq. (2.10) is the value of the best approximation, using a sup norm, of $f_p$ by a linear combination of the other functions.

On p. 178 of Singer (1970), Theorem 1.3 (2), we find the following result. Let $E = C(Z)$ denote the linear space of continuous functions on some compact space $Z$ with sup-norm, let $G$ be the $(p-1)$-dimensional subspace spanned by $\{g_1, \ldots, g_{p-1}\}$, and let $h$ be some function not in $G$. Then the function $g_0 \in G$ is the best approximation to $h$ if and only if there exist $r$ points $z_1, \ldots, z_r$ ($1 \leq r \leq p$) and $r$ numbers $\lambda_1, \ldots, \lambda_r \neq 0$ with $\sum |\lambda_j| = 1$ such that

$$\sum_j \lambda_j g_i(z_j) = 0, \quad i = 1, \ldots, p-1, \qquad \sum_j \lambda_j \bigl(h(z_j) - g_0(z_j)\bigr) = \|h - g_0\|. \tag{3.3}$$

The resemblance to Elfving's Theorem is clear. One half of the proof is not difficult. If conditions (3.3) hold, then for any $g \in G$,

$$\|h - g_0\| = \sum_i \lambda_i \bigl(h(z_i) - g(z_i)\bigr) \leq \sum_i |\lambda_i|\, \|h - g\| = \|h - g\|.$$

The other direction is more interesting, and one way to derive it is by finding the $\beta$ so that $\beta(0, \ldots, 0, 1)'$ is a boundary point of the Elfving set constructed with the functions $g_1, \ldots, g_{p-1}, h$.
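As a rough numerical illustration of this correspondence (my own sketch, not part of the paper; it assumes NumPy and SciPy), the best sup-norm approximation can be computed as a linear program and $\beta^{-2}$ compared with the minimum of $c'M^{-1}(\xi)c$. The example takes $h(z) = z^2$ and $G = \operatorname{span}\{1, z\}$ on $Z = [0, 1]$, so that $c = (0, 0, 1)'$ in quadratic regression; the weights $(1/4, 1/2, 1/4)$ at $0, 1/2, 1$ used in the check are the known optimal design for this case, quoted rather than derived here.

```python
import numpy as np
from scipy.optimize import linprog

# Best sup-norm approximation of h(z) = z^2 by span{1, z} on a grid of [0, 1].
# Variables (a0, a1, t); minimize t subject to |h(z) - a0 - a1*z| <= t at grid points.
z = np.linspace(0.0, 1.0, 201)
G = np.column_stack([np.ones_like(z), z])
h = z ** 2
A_ub = np.vstack([np.hstack([G, -np.ones((len(z), 1))]),
                  np.hstack([-G, -np.ones((len(z), 1))])])
b_ub = np.concatenate([h, -h])
res = linprog(c=[0, 0, 1], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0, None)])
a0, a1, beta = res.x
print(a0, a1, beta)        # ~ (-0.125, 1.0, 0.125): best approximation z - 1/8, error 1/8

# Correspondence with (3.2): beta^{-2} should equal the minimum of c'M^{-1}(xi)c for
# c = (0, 0, 1)' in quadratic regression; the minimizing design is (1/4, 1/2, 1/4)
# at z = 0, 1/2, 1.
pts, w = np.array([0.0, 0.5, 1.0]), np.array([0.25, 0.5, 0.25])
M = sum(wi * np.outer([1, zi, zi ** 2], [1, zi, zi ** 2]) for zi, wi in zip(pts, w))
cvec = np.array([0.0, 0.0, 1.0])
print(cvec @ np.linalg.solve(M, cvec), 1 / beta ** 2)     # both ~ 64
```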


In Singer, of course, no mention is made of Elfving and it is not used to prove the result. Singer considers his Theorem 3.1 as a special case of a more general result, which also appears in most books on functional analysis. This result says that if $E$ is a normed linear space and $G$ is a linear subspace, then $g_0$ is a best approximation to some $h$ in $E$ if and only if there exists a linear functional $l$ in the dual $E^*$ such that

$$\|l\| = 1, \qquad l(g) = 0 \ \text{for all } g \in G, \qquad \text{and} \qquad l(h) = \|h - g_0\|.$$

This result is geometrically evident, but in full generality it is a consequence of the Hahn–Banach Theorem. To obtain Theorem 3.1 of Singer one has to identify the dual $E^*$ for the space of continuous functions $C(Z)$ and show that the linear functional needed here is given by

$$l(e) = \sum_i \lambda_i e(z_i),$$

that is, $l$ is a linear combination of the point functionals $l_i(e) = e(z_i)$, with $\sum_i |\lambda_i| = 1$. The point functionals are extremal functionals amongst those of norm one, and the condition $\sum_i |\lambda_i| = 1$ is required to give $\|l\| = 1$.

4. A simple extension

In Studden (1971) a number of strange contortions were used to extend the simple Elfving result to the situation where one is interested in minimizing the sum of variances of unbiased estimates of $c_i'\theta$, $i = 1, \ldots, s$. This becomes somewhat trivial if Elfving's original idea is followed. In fact, with some slight change of notation the outline in Section 2 can be essentially copied.

If $l_i'\bar Y$ is an unbiased estimate of $c_i'\theta$, then, as before, $c_i = \sum_j f(z_j) l_{ij}$ or

$$C = \sum_j f(z_j) l_j', \tag{4.1}$$

where $C$ is the matrix with columns $c_1, \ldots, c_s$ and $l_j = (l_{1j}, \ldots, l_{sj})'$. Then the variance of $l_i'\bar Y$ is $\sum_j l_{ij}^2/p_j$ and the sum of the variances is

$$\sum_i \sum_j \frac{l_{ij}^2}{p_j} = \sum_j \frac{|l_j|^2}{p_j} \geq \Bigl( \sum_j |l_j| \Bigr)^2, \qquad |l_j|^2 = \sum_i l_{ij}^2.$$


We can now proceed as before and rewrite (2.7) in the form

$$\frac{C}{\sum_i |l_i|} = \frac{\sum_i |l_i|\, f(z_i)\varepsilon_i'}{\sum_i |l_i|} = \sum_i p_i f(z_i)\varepsilon_i', \tag{4.2}$$

where now $\varepsilon_i = l_i / |l_i|$ is an $s$-vector of length one ($\varepsilon_i'\varepsilon_i = 1$) and $p_j = |l_j| / \sum_k |l_k|$. It is now apparent that we should define an extended Elfving set of $p \times s$ matrices

$$\mathcal{E}_s = \text{convex hull}\{f(z)\varepsilon' \mid z \in Z,\ \varepsilon \in \mathbb{R}^s,\ \varepsilon'\varepsilon = 1\}.$$

It then follows that $\sum_j |l_j| \geq 1/\beta$, where $\beta$ is such that $\beta C$ is on the boundary of $\mathcal{E}_s$. Thus the sum of the variances is minimized when

$$\beta C = \sum_i p_i f(z_i)\varepsilon_i' \in \partial\mathcal{E}_s;$$

the optimal design and estimate are determined as before, and $\beta^{-2}$ gives the minimum sum of variances.
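As a concrete (hypothetical) illustration of this criterion, take $C = I_2$ for a straight line on $[0, 1]$, so the sum of variances is $\operatorname{tr} M^{-1}(\xi)$. The brute-force sketch below (mine, assuming NumPy) locates the minimizing two-point design; the minimum value $3 + 2\sqrt{2}$ is, by the result of this section, the $\beta^{-2}$ belonging to the boundary point $\beta C \in \partial\mathcal{E}_2$.

```python
import numpy as np

def sum_of_variances(points, weights):
    """tr M^{-1}(xi): the sum of variances for estimating theta_0 and theta_1."""
    M = sum(w * np.outer([1.0, z], [1.0, z]) for z, w in zip(points, weights))
    return float(np.trace(np.linalg.inv(M)))

# Brute force over two-point designs on [0, 1] (C = identity, s = p = 2).
grid = np.linspace(0.0, 1.0, 51)
best = min(
    (sum_of_variances([a, b], [w, 1 - w]), a, b, round(w, 2))
    for a in grid for b in grid if b > a
    for w in np.linspace(0.01, 0.99, 99)
)
print(best)                 # ~ (5.829, 0.0, 1.0, 0.59): weight close to 2 - sqrt(2) at z = 0
print(3 + 2 * np.sqrt(2))   # analytic minimum, equal to beta^{-2} for this C
```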

5. Dette's Theorem

Dette (1993) extended Elfving's result in a more interesting manner by considering $m$ different models with means

$$f_j'(z)\theta_j = \sum_{k=1}^{k_j} f_{jk}(z)\theta_{jk}$$

having the same design space $Z$. (Note here that $f_j(z)$ and $\theta_j$ are vectors of length $k_j$, i.e. $f_j(z) = (f_{j1}(z), \ldots, f_{jk_j}(z))'$ and $\theta_j = (\theta_{j1}, \ldots, \theta_{jk_j})'$.) For the $j$th model $c_j'\theta_j$ is to be estimated. The criterion here is changed to

$$V(\xi) = \sum_j \beta_j \ln\bigl(\operatorname{Var}(c_j'\hat\theta_j(\xi))\bigr), \qquad \beta_j > 0, \quad \sum_j \beta_j = 1, \tag{5.1}$$

which allows for investigating various models and robustness questions, and also produces an Elfving-type result for D-optimality. The D-optimality comes about by starting with one model, say $g'(z)\theta$, and then letting $\beta_j = 1/m$ and

$$f_1 = (g_1), \quad f_2 = (g_1, g_2), \quad \ldots, \quad f_m = (g_1, g_2, \ldots, g_m)$$

and $c_i = (0, \ldots, 0, 1)'$. If $M_i(\xi) = \int f_i(z) f_i'(z)\, \mathrm{d}\xi(z)$, $i = 1, \ldots, m$, then the optimality criterion easily reduces to the maximization of the determinant of the information matrix $M_m(\xi)$, since $c_i'M_i^-(\xi)c_i = |M_{i-1}|/|M_i|$.
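The determinant identity at the end of this reduction is easy to check numerically. The sketch below (my own, assuming NumPy; the three-point design is arbitrary, chosen only to make $M_3(\xi)$ nonsingular) verifies $c_i'M_i^{-1}(\xi)c_i = |M_{i-1}|/|M_i|$ for the nested polynomial models built from $g(z) = (1, z, z^2)'$, with $|M_0| = 1$.

```python
import numpy as np

pts = np.array([0.0, 0.4, 1.0])     # arbitrary support points
w = np.array([0.3, 0.3, 0.4])       # arbitrary weights summing to one

def info(i):
    """Information matrix M_i(xi) for the model with regressors 1, z, ..., z^(i-1)."""
    F = np.vander(pts, N=i, increasing=True)       # rows are f_i(z_j)'
    return (F * w[:, None]).T @ F

dets = [1.0] + [np.linalg.det(info(i)) for i in (1, 2, 3)]
for i in (1, 2, 3):
    c = np.zeros(i)
    c[-1] = 1.0                                    # c_i = (0, ..., 0, 1)'
    lhs = c @ np.linalg.solve(info(i), c)
    print(i, lhs, dets[i - 1] / dets[i])           # the last two columns agree
```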


As before, the design $\xi = \{z_j, p_j\}$ must be such that $c_i'\theta_i$ is estimable with respect to the $i$th model, so that there are $l_{ij}$ with

$$c_i = \sum_j f_i(z_j) l_{ij} \qquad \text{and} \qquad c_i'M_i^-(\xi)c_i = \sum_j \frac{l_{ij}^2}{p_j} = \gamma_i^{-2}. \tag{5.2}$$

Note that $\sum_i \beta_i \ln(\gamma_i^{-2})$ is minimized over the $p_j$ by choosing

$$p_j = \Bigl( \sum_i \beta_i l_{ij}^2 \gamma_i^2 \Bigr)^{1/2} \tag{5.3}$$

(first show that $p_j$ is proportional to the right-hand side and then show that the sum of the left-hand side is one using the Schwarz inequality). The $p_j$ are not defined explicitly by the above since the $\gamma_i^2$ depend on the $p_j$; however, proceeding as before we write

$$\gamma_i c_i = \sum_j p_j f_i(z_j)\varepsilon_{ij}, \qquad \text{where} \quad \varepsilon_{ij} = \frac{\gamma_i\, l_{ij}}{\bigl( \sum_i \beta_i l_{ij}^2 \gamma_i^2 \bigr)^{1/2}}. \tag{5.4}$$

Note that $\sum_i \beta_i \varepsilon_{ij}^2 = 1$, in which case the corresponding Elfving set is

$$\mathcal{E}(\beta) = \text{convex hull}\Bigl\{ \bigl(\varepsilon_1 f_1'(z), \ldots, \varepsilon_m f_m'(z)\bigr)' \,\Big|\, z \in Z \text{ and } \sum_i \beta_i \varepsilon_i^2 = 1 \Bigr\}. \tag{5.5}$$

The Elfving set is now written as one long vector since the models with the $f_i(z)$ are allowed to be of different dimensions. Hyperplanes bounding this Elfving set on one side are denoted by $(a_1', a_2', \ldots, a_m', 1)$, so that

$$\sum_i a_i' f_i(z)\varepsilon_i \leq 1 \quad \text{for all } z, \text{ whenever } \sum_i \beta_i \varepsilon_i^2 = 1. \tag{5.6}$$

Dette's Theorem. The design $\xi = \{z_j, p_j\}$ is $V$-optimal if and only if there exist $\gamma_i$ and $\varepsilon_{ij}$ with $\sum_i \beta_i \varepsilon_{ij}^2 = 1$ such that

$$\gamma_i c_i = \sum_j p_j f_i(z_j)\varepsilon_{ij}, \qquad i = 1, \ldots, m, \qquad \bigl(\gamma_1 c_1', \ldots, \gamma_m c_m'\bigr)' \in \partial\mathcal{E}(\beta) \tag{5.7}$$

and, for the hyperplane $(a_1', a_2', \ldots, a_m', 1)$ supporting $\mathcal{E}(\beta)$ at $(\gamma_1 c_1', \ldots, \gamma_m c_m')'$,

$$\gamma_i\, a_i' c_i = \beta_i. \tag{5.8}$$

(It is shown further that $\gamma_i^{-2} = c_i'M_i^-(\xi)c_i$.)

Proof. Dette uses the equivalence theorem conditions in (6.6) below. Our purpose here is to derive the result directly following Elfving and produce the equivalence theorem as a consequence. Suppose that the conditions in (5.7) and (5.8) hold. If we consider any other


competing design, with $\tilde\gamma_i^{-2}$ denoting the corresponding variances, then following (5.2) and (5.4) we have that $(\tilde\gamma_1 c_1', \ldots, \tilde\gamma_m c_m')'$ is in $\mathcal{E}(\beta)$, implying that

$$\sum_i \tilde\gamma_i\, a_i' c_i \leq 1.$$

This implies from (5.8) that

$$\sum_i \beta_i \frac{\tilde\gamma_i}{\gamma_i} \leq 1,$$

and hence

$$\prod_i \Bigl( \frac{\tilde\gamma_i}{\gamma_i} \Bigr)^{\beta_i} \leq \sum_i \beta_i \frac{\tilde\gamma_i}{\gamma_i} \leq 1,$$

implying that

$$\sum_i \beta_i \ln\frac{1}{\tilde\gamma_i^2} \geq \sum_i \beta_i \ln\frac{1}{\gamma_i^2}.$$

Thus the conditions in the theorem imply optimality.

Suppose now that $\xi = \{z_j, p_j\}$ is optimal. Then there are optimal estimates $l_i'\bar Y$ with variances $\gamma_i^{-2} = \sum_j l_{ij}^2/p_j$. We will show that (5.7) and (5.8) now hold with these $\gamma_i$. To begin, it is easy to show that if $\xi = \{z_j, p_j\}$ is optimal then $(\gamma_1 c_1', \ldots, \gamma_m c_m')'$ must be a boundary point of $\mathcal{E}(\beta)$. Then following Dette (1993, p. 758) one has the corresponding hyperplane $(a_1', \ldots, a_m', 1)$ for which

$$a_i = \beta_i \gamma_i M_i^-(\xi)c_i$$

for some generalized inverses $M_i^-(\xi)$. In this case

$$\gamma_i\, a_i' c_i = \beta_i \gamma_i^2\, c_i'M_i^-(\xi)c_i = \beta_i,$$

since we have chosen $\gamma_i^{-2} = c_i'M_i^-(\xi)c_i$.



6. Equivalence Theorems

An excellent detailed analysis of equivalence theorems for optimal designs is given in Pukelsheim (1993). Here we indicate that in simple situations the equivalence theorem will follow once one has an Elfving-type theorem. Comparing the equivalence theorem for D-optimality from Dette's result with the usual equivalence theorem for this case produces an interesting (well-known) identity for quadratic forms.

One of the simplest equivalence theorems is for estimating the single linear form $c'\theta$ described in Section 2. This states that the design $\xi$ is optimal for estimating $c'\theta$ if and only if there exists some generalized inverse $M^-(\xi)$ such that

$$\frac{(c'M^-(\xi)f(z))^2}{c'M^-(\xi)c} \leq 1 \qquad \text{for all } z \in Z. \tag{6.1}$$


This is related to Elfving's Theorem by the fact that the quantity

$$a' = \frac{c'M^-(\xi)}{\sqrt{c'M^-(\xi)c}}$$

describes the support plane $(a, 1)$ to the Elfving set $\mathcal{E}$ at the boundary point $\beta c$. Thus if

$$\beta c = \sum p_i \varepsilon_i f(z_i) \in \partial\mathcal{E}, \tag{6.2}$$

then there exists a support plane $(a, 1)$ such that

$$a'\varepsilon f(z) \leq 1 \quad \text{for all } \varepsilon = \pm 1 \text{ and } z \in Z, \qquad \text{and} \qquad a'(\beta c) = 1. \tag{6.3}$$

It is easily seen that the first part of (6.3) holds if and only if

$$(a'f(z))^2 \leq 1 \quad \text{for all } z \in Z. \tag{6.4}$$

If (6.3) holds then necessarily $a'\varepsilon_i f(z_i) = 1$, or $\varepsilon_i = a'f(z_i)$. If these $\varepsilon_i$ are inserted into (6.2) then

$$\beta c = M(\xi)a. \tag{6.5}$$

In this case there exists some generalized inverse $M^-(\xi)$ such that $a = \beta M^-(\xi)c$, and inserting this into (6.4) produces the equivalence condition (6.1). The main problem in finding the optimal design is to find boundary points $\varepsilon_i f(z_i)$ of $\mathcal{E}$ so that $c$ is in the convex cone generated by these points.
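As a quick check of (6.1) (my own sketch, assuming NumPy; not part of the paper), the extrapolation design of Section 2 can be tested directly: with $f(z) = (1, z)'$, $c = (1, z_0)'$ and the two-point design found there, the quadratic form $(c'M^{-1}(\xi)f(z))^2 / c'M^{-1}(\xi)c$ should stay below one on all of $Z = [0, 1]$, touching it at the support points.

```python
import numpy as np

z0 = 2.0                                     # illustrative extrapolation point
c = np.array([1.0, z0])
p1 = z0 / (2 * z0 - 1)                       # weight at z = 1 (rest at z = 0)
M = (1 - p1) * np.outer([1, 0], [1, 0]) + p1 * np.outer([1, 1], [1, 1])
Minv_c = np.linalg.solve(M, c)

zs = np.linspace(0.0, 1.0, 101)
vals = (Minv_c @ np.vstack([np.ones_like(zs), zs])) ** 2 / (c @ Minv_c)
print(vals.max() <= 1 + 1e-9)                # True on all of Z = [0, 1]
print(vals[0], vals[-1])                     # both ~ 1 at the support points 0 and 1
```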

We turn now to the equivalence theorem corresponding to Dette's result. This states that in the setting of Section 5 the design $\xi$ is $V$-optimal if and only if there exist generalized inverses $M_i^-(\xi)$ such that

$$\sum_i \beta_i \frac{(c_i'M_i^-(\xi)f_i(z))^2}{c_i'M_i^-(\xi)c_i} \leq 1 \qquad \text{for all } z \in Z. \tag{6.6}$$

This can be obtained in a manner similar to the above using the support plane $(a_1', \ldots, a_m', 1)$ and the fact that (5.6), which requires $\sum_i a_i'f_i(z)\varepsilon_i \leq 1$ whenever $\sum_i \beta_i \varepsilon_i^2 = 1$, is equivalent to

$$\sum_i \frac{(a_i'f_i(z))^2}{\beta_i} \leq 1 \qquad \text{for all } z. \tag{6.7}$$

A rather interesting result then emerges for the case of D-optimality described at the beginning of Section 5. Recall that for this situation $c_i$ is the vector $(0, \ldots, 0, 1)'$ of length $i$ and $f_i = (g_1, \ldots, g_i)'$. If one has only one model with regression functions $g = (g_1, \ldots, g_m)'$, then the design $\xi$ is D-optimal if and only if it maximizes the determinant of $M_g(\xi)$. The well-known equivalence theorem in this case says that $\xi$ is D-optimal if and only if

$$g'(z)M_g^{-1}(\xi)g(z) \leq m \qquad \text{for all } z \in Z. \tag{6.8}$$


The question arises as to whether (6.8) is equivalent to (6.6) with the $c_i$ and $f_i$ as described above and $\beta_i = 1/m$. Thus we have

$$\sum_{i=1}^{m} \frac{(c_i'M_i^-(\xi)f_i(z))^2}{c_i'M_i^-(\xi)c_i} = g'(z)M_g^{-1}(\xi)g(z).$$
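Before the algebraic argument, here is a quick numerical check of this identity (my own sketch, assuming NumPy; the design is arbitrary) for the nested polynomial models built from $g(z) = (1, z, z^2)'$.

```python
import numpy as np

pts = np.array([0.1, 0.5, 0.9])      # an arbitrary design with nonsingular M_g(xi)
w = np.array([0.2, 0.5, 0.3])

def M(i):
    F = np.vander(pts, N=i, increasing=True)         # rows f_i(z_j)' = (1, z_j, ..., z_j^{i-1})
    return (F * w[:, None]).T @ F

for z in np.linspace(0.0, 1.0, 7):
    g = np.array([1.0, z, z ** 2])
    rhs = g @ np.linalg.solve(M(3), g)               # g'(z) M_g^{-1}(xi) g(z)
    lhs = 0.0
    for i in (1, 2, 3):
        Mi_inv_c = np.linalg.solve(M(i), np.eye(i)[-1])   # M_i^{-1} c_i with c_i = (0,...,0,1)'
        lhs += (Mi_inv_c @ g[:i]) ** 2 / Mi_inv_c[-1]
    print(np.isclose(lhs, rhs))                      # True at every z
```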

That these are actually the same (as they must be) follows from taking an LU decomposition of $M_g(\xi) = LU$ with lower and upper triangular matrices $L$ and $U$. The details are left to the reader.

7. For further reading

The following references could also be of interest to the reader: Chernoff (1962, 1999); Fellman (1999); Nordström (1999).

References

Chernoff, H., 1962. Optimal accelerated life designs for estimation. Technometrics 4, 381–408.
Chernoff, H., 1999. Gustav Elfving's impact on experimental design. Statist. Sci. 14, 201–205.
Dette, H., 1993. Elfving's Theorem for D-optimality. Ann. Statist. 21, 753–766.
Dette, H., Studden, W.J., 1993. Geometry of E-optimality. Ann. Statist. 21, 416–433.
Elfving, G., 1952. Optimum allocation in linear regression. Ann. Math. Statist. 23, 255–262.
Fellman, J., 1999. Gustav Elfving's contribution to the emergence of the optimal experimental design theory. Statist. Sci. 14, 197–200.
Heiligers, B., Dette, H., Studden, W.J., 1995. Minimax designs in linear regression models. Ann. Statist. 23, 30–40.
Nordström, K., 1999. The life and work of Gustav Elfving. Statist. Sci. 14, 174–196.
Pukelsheim, F., 1993. Optimal Design of Experiments. Wiley.
Pukelsheim, F., Studden, W.J., 1993. E-optimal designs for polynomial regression. Ann. Statist. 21, 402–415.
Singer, I., 1970. Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces. Springer, Berlin.
Studden, W.J., 1971. Elfving's Theorem and optimal design for quadratic loss. Ann. Math. Statist. 42, 1613–1621.