Journal of Statistical Planning and Inference 52 (1996) 17-31

Bayesian optimal one point designs for one parameter nonlinear models

Holger Dette^a,*, H.-M. Neugebauer^b

^a Institut für Mathematik, Ruhr-Universität Bochum, Universitätsstr. 150, 44780 Bochum, Germany
^b Debis Aviation Leasing GmbH, Epplestr. 225, 70567 Stuttgart, Germany

Received 8 March 1993; revised 17 April 1995

Abstract

For nonlinear one parameter models and concave optimality criteria there always exists a locally optimal one point design. This can be proved by an application of Carathéodory's theorem (Läuter, Math. Operationsforsch. Statist. Ser. Statist. 5 (1974a) 625-636). If prior distributions with densities are used, this theorem gives no useful bound on the number of support points of a Bayesian optimal design. Chaloner (J. Statist. Plann. Inference 37 (1993) 229-236) gave a sufficient condition on the support of the prior distribution for the existence of a Bayesian optimal one point design. In this article, a condition on the shape of the prior density is given which is also sufficient for the existence of a Bayesian optimal one point design in nonlinear models with one parameter.

AMS Subject Classification: Primary 62K05; secondary 62F15, 62A10

Keywords: Bayesian design; Optimal design; Nonlinear models; Maximum likelihood estimation; Mixture distribution; Logistic regression

1. Introduction

We consider a class of nonlinear regression models where $Y(x)$ is a real valued response, $x \in \mathcal{X} \subseteq \mathbb{R}$ is the explanatory variable and $\mathcal{X}$ denotes a convex design space. The response depends on a single parameter $\theta \in \Theta \subseteq \mathbb{R}$, and different observations are assumed to be independent. If $n_i$ observations are taken at $x_i$ ($i = 1, \ldots, k$), $\sum_{i=1}^k n_i = n$, then we denote by
$$\sum_{i=1}^k n_i\, g^2(x_i, \theta)$$


the Fisher information of the design $\{x_i, n_i\}_{i=1}^k$. Every design of this type (exact design) can be represented by a probability measure on the design space with masses $n_i/n$ at the points $x_i$ ($i = 1, \ldots, k$). An approximate design is a probability measure $\eta$ with finite support on the design space $\mathcal{X}$ (see Silvey, 1980, p. 13), and the approximate (or normalized) Fisher information of an approximate design $\eta$ is denoted by

$$I(\theta, \eta) = \int_{\mathcal{X}} g^2(x, \theta)\, d\eta(x). \qquad (1.1)$$

Throughout this paper we consider approximate (or continuous) designs. In practice, for moderate sample size $n$, good exact designs can frequently be found by integer approximation of the optimal approximate design. A typical example of this type of model is the nonlinear regression model with one parameter

$$Y(x) = f(x, \theta) + \varepsilon, \qquad (1.2)$$

where $\varepsilon$ is a normally distributed error term with mean 0 and known variance $\sigma^2 > 0$, which is assumed to be 1 without loss of generality. If $f$ is differentiable with respect to $\theta$ with derivative $g(x, \theta) = \partial f(x, \theta)/\partial \theta$, then the Fisher information of an approximate design in the model (1.2) is given by (1.1). An optimal design maximizes a function of the Fisher information $I(\theta, \eta)$. The criteria we will use throughout this paper are concave Bayesian optimality criteria (see Pronzato and Walter, 1985; Chaloner, 1987; Chaloner and Larntz, 1989). More precisely, if $\xi$ is a prior distribution for the unknown parameter $\theta$ and $\varphi$ is a concave function, we call

$$\mathcal{C} = \{ E_\xi[\varphi(g^2(x, \theta))] \mid x \in \mathcal{X} \} \qquad (1.3)$$

the induced design space and assume that this set is bounded from above. A design is called Bayesian $\varphi$-optimal with respect to the prior $\xi$ if it maximizes the function

$$\Phi(\eta) = E_\xi[\varphi(I(\theta, \eta))] = \int_\Theta \varphi(I(\theta, \eta))\, d\xi(\theta). \qquad (1.4)$$

Here and throughout this paper it is assumed that all required integrals with respect to the prior distribution $\xi$ exist and that there exists at least one design which maximizes $\Phi$. If the prior distribution is concentrated at only one point, say $\theta_0$, one obtains the locally $\varphi$-optimal criterion, which requires a "best guess" $\theta_0$ for $\theta$ (see Chernoff, 1953). This means that a locally $\varphi$-optimal design maximizes $\Phi^{\theta_0}(\eta) := \varphi(I(\theta_0, \eta))$. There are many examples of locally optimal designs derived in closed form as functions of $\theta_0$ (see for example Kitsos et al., 1988; Ford et al., 1992; Dette and Haines, 1994). For nondegenerate prior distributions the situation is more complicated, and most of the Bayesian $\varphi$-optimal designs are derived numerically (see e.g. Chaloner and


Larntz, 1989, 1992). Some analytical results have recently been found by Mukhopadhyay and Haines (1995) and Dette and Neugebauer (1996). Examples of optimal designs for one parameter models, supported at one point, can be found in Chaloner (1993), who gave a sufficient condition for the existence of an optimal one point design. This condition requires a very concentrated support of the prior distribution. In contrast to these results, our work gives a sufficient condition for the existence of an optimal one point design which is based on the shape of the density of the prior distribution. Roughly speaking, this condition requires not 'too heavy' tails of the prior density. In Section 2 we give a short introduction to the general theory and show how a nonlinear design problem can be related to a design problem for a mixture of linear regression models (see Läuter, 1974a, b). This approach is used to derive upper bounds for the number of support points of the Bayesian $\varphi$-optimal designs. In Section 3 we present a sufficient condition which guarantees that a Bayesian $\varphi$-optimal one point design for a model with Fisher information (1.1) exists. Because one point designs are locally optimal, these results show that in certain cases locally optimal designs are robust with respect to some uncertainty in the best guess for the unknown parameter. In Section 4 we apply this result to the two examples discussed by Chaloner (1993). Here we consider an exponential regression and a logistic regression model and give sufficient conditions on the density of the prior distribution such that a one point design is Bayesian D-optimal. Furthermore, we obtain explicit expressions for the optimal designs, which show that the Bayesian D-optimal designs in these cases depend only on the expectation of the prior distribution. Finally, we present an example of a prior for which our sufficient condition is violated and no Bayesian optimal one point design exists. It is also demonstrated that our result does not provide a necessary condition for the existence of optimal one point designs.
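To make the criterion (1.4) concrete, here is a minimal numerical sketch (our illustration, not part of the paper): it approximates $\Phi(\eta)$ for a finitely supported design by combining the Fisher information (1.1) with a discretized prior. The function names and the uniform prior on $[0.2, 0.8]$ are our own choices; the example model $Y(x) = \theta^x + \varepsilon$ is treated in Section 4.1 below.

```python
import numpy as np

def bayes_criterion(design_x, design_w, g, prior_theta, prior_w, phi=np.log):
    """Approximate Phi(eta) = E_xi[phi(I(theta, eta))] from (1.4).

    design_x, design_w   -- support points and weights of the design eta
    g                    -- g(x, theta), derivative of the response in theta
    prior_theta, prior_w -- grid and weights discretizing the prior xi
    phi                  -- concave criterion; log gives Bayesian D-optimality
    """
    design_x, design_w = np.asarray(design_x), np.asarray(design_w)
    # I(theta, eta) = sum_i w_i g^2(x_i, theta), cf. (1.1)
    info = np.array([(design_w * g(design_x, th) ** 2).sum() for th in prior_theta])
    return float((prior_w * phi(info)).sum())

# Model Y(x) = theta^x + eps, so g(x, theta) = x * theta^(x - 1);
# uniform prior on [0.2, 0.8] discretized on an equally weighted grid.
theta = np.linspace(0.2, 0.8, 201)
w = np.full(theta.size, 1.0 / theta.size)
g = lambda x, th: x * th ** (x - 1)
print(bayes_criterion([0.5], [1.0], g, theta, w))  # one point design at x = 0.5
```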

2. The number of support points of Bayesian optimal designs

In this section we briefly discuss a result on the number of support points of a Bayesian $\varphi$-optimal design. Because this result can be stated for models with more than one parameter, we consider a class of nonlinear regression models with Fisher information matrix given by

$$I_m(\theta, \eta) = \int_{\mathcal{X}} g(x, \theta)\, g(x, \theta)^T\, d\eta(x) \in \mathbb{R}^{m \times m}, \qquad (2.1)$$

where $\theta = (\theta_1, \ldots, \theta_m) \in \Theta \subseteq \mathbb{R}^m$ denotes an $m$-dimensional vector of unknown parameters and $\eta$ is a design on the design space $\mathcal{X} \subseteq \mathbb{R}$. This class contains the general nonlinear regression model

$$Y(x) = f(x, \theta) + \varepsilon, \qquad (2.2)$$


where $\varepsilon$ is a random error with zero mean and variance $\sigma^2 > 0$ (which is assumed to be one without loss of generality). If $f$ is differentiable with respect to $\theta$, then the Fisher information of an (approximate) design for the model (2.2) is precisely given by (2.1), where $g(x, \theta) = \partial f(x, \theta)/\partial \theta$. A Bayesian $\varphi_m$-optimal design maximizes $\Phi_m(\eta) = E_\xi[\varphi_m(I_m(\theta, \eta))]$, where $\varphi_m$ is a concave function on the nonnegative definite matrices and the expectation is taken with respect to the prior distribution $\xi$ on the parameter space $\Theta$. Note that this model can also be obtained from a class of linear models
$$Y_\theta(x) = g(x, \theta)^T \beta + \varepsilon, \qquad (2.3)$$

where $\beta = (\beta_1, \ldots, \beta_m)^T$ is the vector of unknown parameters and $\theta$ is an index for the different models. The Fisher information matrix of an approximate design $\eta$ for estimating the parameter $\beta$ in the model with index $\theta$ is given by $I_m(\theta, \eta)$. For this reason the problem of finding an optimal design for the estimation of $\theta$ in the general nonlinear model with Fisher information matrix (2.1) is equivalent to the optimal design problem for the estimation of $\beta$ in the class of linear models given by (2.3). Optimum experimental designs for a class of different models have been studied by many authors (see e.g. Läuter, 1974a, b; Dette, 1990). By a similar argument as used in the proof of Theorem 4 in Läuter (1974a) we obtain the following result, which provides an upper bound for the number of support points of the Bayesian $\varphi_m$-optimal design.

Theorem 2.1. If the support of the prior distribution $\xi$ consists of $n$ points, then there exists a Bayesian $\varphi_m$-optimal design which is supported at at most $nm(m+1)/2$ different points.

Remark 2.2. In the case of nonlinear models with one parameter ($m = 1$) this means that the support of the Bayesian optimal design consists of no more points than the support of the prior distribution. In particular, there always exists a locally optimal design with one support point. Note also that Theorem 2.1 generalizes Theorem 3.1 in Lindsay (1983), who needed results of this type in the case $m = 1$ for the maximum likelihood estimation of mixture distributions.
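As a quick numerical check of Theorem 2.1 in the case $m = 1$, $n = 2$ (our sketch; the model $Y(x) = \theta^x + \varepsilon$ of Section 4.1 and the two point prior are our own choices for illustration), one may maximize the Bayesian D-criterion over two point designs, which by the theorem is a rich enough class:

```python
import numpy as np
from scipy.optimize import minimize

# Model Y(x) = theta^x + eps on X = [0, 1] with a two point prior (n = 2, m = 1);
# by Theorem 2.1 a Bayesian D-optimal design with at most 2 support points exists,
# so a design may be parametrized as (x1, x2, w) and the criterion maximized.
thetas, pw = np.array([0.3, 0.7]), np.array([0.5, 0.5])

def neg_criterion(p):
    x1, x2, w = p
    info = (w * (x1 * thetas ** (x1 - 1)) ** 2
            + (1 - w) * (x2 * thetas ** (x2 - 1)) ** 2)  # I(theta, eta)
    return -(pw * np.log(info)).sum()

res = minimize(neg_criterion, x0=[0.3, 0.9, 0.5],
               bounds=[(1e-6, 1), (1e-6, 1), (0, 1)])
print(res.x)  # coinciding points or a weight of 0/1 signal that one point suffices
```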

3. A sufficient condition for optimal one point designs

We now consider the nonlinear model with one parameter as described in Section 1. Chaloner (1993) presented a sufficient condition for the existence of a Bayesian $\varphi$-optimal one point design in this model. Roughly speaking, Chaloner's condition requires the range of the support of the prior distribution to be small. The goal of this section is the derivation of a sufficient condition for the existence of an optimal one point design which uses the shape of the prior density. For the formulation of our main result we need some notation and technical assumptions on the prior


distribution and the optimality criterion which are rather standard in the literature on optimum experimental design (see e.g. Silvey, 1980, pp. 17-19). Throughout this paper $\delta_x$ denotes the design with mass 1 at the point $x \in \mathcal{X}$. We assume further that the (concave) criterion $\varphi$ and the prior distribution $\xi$ guarantee the existence of the directional derivatives
$$d(\eta, x) := \lim_{\varepsilon \downarrow 0} \frac{\Phi((1-\varepsilon)\eta + \varepsilon\delta_x) - \Phi(\eta)}{\varepsilon},$$
$$d_\theta(\eta, x) := \lim_{\varepsilon \downarrow 0} \frac{\varphi(I(\theta, (1-\varepsilon)\eta + \varepsilon\delta_x)) - \varphi(I(\theta, \eta))}{\varepsilon} \qquad (3.1)$$

(for all $\theta \in \Theta$) such that the equality

$$d(\eta, x) = \int_\Theta d_\theta(\eta, x)\, d\xi(\theta) \qquad (3.2)$$

holds for all $x \in \mathcal{X}$ (note that (3.2) means that the limit and the integration in (3.1) can be interchanged). Furthermore, let $x \mapsto \Phi(\delta_x)$ be continuous on $\mathcal{X}$ and assume that the integration with respect to $\theta$ and the differentiation with respect to $x$ in the optimality criterion and the directional derivative $d(\eta, x)$ can be interchanged, that is,

$$\frac{d}{dx} \Phi(\delta_x) = \int_\Theta \frac{d}{dx} \varphi(I(\theta, \delta_x))\, d\xi(\theta) \qquad (3.3)$$
and
$$\frac{d}{dx} d(\eta, x) = \int_\Theta \frac{d}{dx} d_\theta(\eta, x)\, d\xi(\theta). \qquad (3.4)$$

Theorem 3.1. Let $p_\xi(\cdot)$ denote the density of the prior distribution $\xi$ on a convex parameter space $\Theta$, let $\Phi(\delta_x) > -\infty$ for all $x$ in the interior of a convex design space $\mathcal{X} \subseteq \mathbb{R}$, and let $\delta_{x^*}$ denote the Bayesian $\varphi$-optimal one point design, that is,

$$\Phi(\delta_{x^*}) \geq \Phi(\delta_x) \quad \text{for all } x \in \mathcal{X}. \qquad (3.5)$$

Assume further that the function
$$k(x, \theta) = [d_\theta(\delta_{x^*}, x) + \varphi'(I(\theta, \delta_{x^*}))\, I(\theta, \delta_{x^*})]\, p_\xi(\theta) \qquad (3.6)$$
$$\phantom{k(x, \theta)} = \varphi'(I(\theta, \delta_{x^*}))\, I(\theta, \delta_x)\, p_\xi(\theta) \qquad (3.7)$$
is logarithmically concave on $\mathcal{X} \times \Theta$. Then the Bayesian $\varphi$-optimal one point design $\delta_{x^*}$ is also Bayesian $\varphi$-optimal within the class of all designs.


Proof. By the definition of $d_\theta(\eta, x)$ we obtain for all $\theta \in \Theta$

$$d_\theta(\delta_{x^*}, x) = \lim_{\varepsilon \downarrow 0} \frac{\varphi((1-\varepsilon) I(\theta, \delta_{x^*}) + \varepsilon I(\theta, \delta_x)) - \varphi(I(\theta, \delta_{x^*}))}{\varepsilon}$$
$$= \lim_{\varepsilon \downarrow 0} \frac{\varphi(I(\theta, \delta_{x^*}) + \varepsilon(I(\theta, \delta_x) - I(\theta, \delta_{x^*}))) - \varphi(I(\theta, \delta_{x^*}))}{\varepsilon}$$
$$= \varphi'(I(\theta, \delta_{x^*}))\,(I(\theta, \delta_x) - I(\theta, \delta_{x^*})), \qquad (3.8)$$

which proves the equality between (3.6) and (3.7). The support point $x^*$ of the best one point design maximizes $\Phi(\delta_x)$ on $\mathcal{X}$. Therefore, $x^*$ must either be a root of $\frac{d}{dx}\Phi(\delta_x)$ or correspond to the upper boundary of the induced design space $\mathcal{C}$ defined in (1.3). In the following discussion we distinguish these two cases.

(1) Let $x^*$ be a root of the derivative of $\Phi(\delta_x)$, that is (note assumption (3.3)),
$$0 = \frac{d}{dx} \Phi(\delta_x)\Big|_{x=x^*} = \int_\Theta \frac{d}{dx} \varphi(I(\theta, \delta_x))\Big|_{x=x^*}\, p_\xi(\theta)\, d\theta. \qquad (3.9)$$

Combining (3.8) with (3.4) gives
$$\frac{d}{dx} d(\delta_{x^*}, x) = \int_\Theta \frac{d}{dx} \left[ \varphi'(I(\theta, \delta_{x^*}))\,(I(\theta, \delta_x) - I(\theta, \delta_{x^*})) \right] p_\xi(\theta)\, d\theta \qquad (3.10)$$
$$= \int_\Theta \varphi'(I(\theta, \delta_{x^*}))\, \frac{d}{dx} I(\theta, \delta_x)\, p_\xi(\theta)\, d\theta,$$
and from (3.9) and (3.10) it follows that
$$\frac{d}{dx} d(\delta_{x^*}, x)\Big|_{x=x^*} = 0 \quad \text{if and only if} \quad \frac{d}{dx} \Phi(\delta_x)\Big|_{x=x^*} = 0. \qquad (3.11)$$

Now, $k(x, \theta)$ is logarithmically concave, which implies by Theorem 2.16 in Dharmadhikari and Joag-dev (1988) that the function
$$h(x) = \int_\Theta k(x, \theta)\, d\theta = d(\delta_{x^*}, x) + \int_\Theta \varphi'(I(\theta, \delta_{x^*}))\, I(\theta, \delta_{x^*})\, p_\xi(\theta)\, d\theta \qquad (3.12)$$

is also logarithmically concave. Thus we obtain from (3.11) that $d(\delta_{x^*}, x)$ has a global maximum at the point $x^*$, that is (note (3.8)),
$$0 = d(\delta_{x^*}, x^*) = \sup_{x \in \mathcal{X}} d(\delta_{x^*}, x).$$
The assertion finally follows from Whittle's equivalence theorem in the Appendix.

(2) If $x^*$ is not a root of $\frac{d}{dx}\Phi(\delta_x)$, then $\frac{d}{dx}\Phi(\delta_x)$ has no zeros in $\mathcal{X}$ (because of concavity and (3.5)) and must be of constant sign. We consider only the case $\frac{d}{dx}\Phi(\delta_x) < 0$ for all $x \in \mathcal{X}$; the other case can be treated similarly. Thus $x^*$ has to be the


smallest point in $\mathcal{X}$, and it follows that
$$\frac{d}{dx} \Phi(\delta_x)\Big|_{x=x^*} = \int_\Theta \varphi'(I(\theta, \delta_x))\, \frac{d}{dx} I(\theta, \delta_x)\, p_\xi(\theta)\, d\theta\, \Big|_{x=x^*} < 0.$$

From formula (3.8) we have $d(\delta_{x^*}, x^*) = 0$, and the same argument as in the proof of the first part yields
$$\frac{d}{dx} d(\delta_{x^*}, x)\Big|_{x=x^*} = \frac{d}{dx} \Phi(\delta_x)\Big|_{x=x^*} < 0.$$

The logarithmic concavity of the function $h(x)$ in (3.12) implies that there is a global maximum of $d(\delta_{x^*}, x)$ at the point $x^*$. The assertion follows again by Whittle's equivalence theorem in the Appendix. $\square$

Remark 3.2. It follows from the preceding proof that Theorem 3.1 remains true if the function
$$h(x) = \int_\Theta [d_\theta(\delta_{x^*}, x) + \varphi'(I(\theta, \delta_{x^*}))\, I(\theta, \delta_{x^*})]\, d\xi(\theta) \qquad (3.13)$$

is logarithmically concave. Thus Theorem 3.1 is also applicable for discrete prior distributions, provided that the logarithmic concavity of the function $h$ in (3.13) can be established. An important example of such a situation is the problem of maximum likelihood estimation of mixture distributions (see Lindsay, 1983). In this model one assumes that independent identically distributed random variables $X_1, \ldots, X_n$ can be observed which have a mixture density
$$f_Q(x) = \int f_\theta(x)\, dQ(\theta),$$
where $Q$ denotes the mixing distribution on the parameter space $\Theta$. The maximum likelihood estimator of the mixing distribution $Q$ maximizes the function
$$L(Q) = \prod_{j=1}^k f_Q(y_j)^{k_j},$$
where $y_1, \ldots, y_k$ denote the different observations among $x_1, \ldots, x_n$ and
$$k_j = \#\{ i \in \{1, \ldots, n\} \mid x_i = y_j \}$$
is the number of $x$'s equal to $y_j$ ($j = 1, \ldots, k$). Now, for $\Theta \subseteq \mathbb{R}$ this problem coincides with the Bayesian D-optimal design problem ($\varphi(x) = \log x$, $d(x_1, x_2) = x_2/x_1 - 1$) for


the prior distribution with masses $k_j/n$ at the points $y_j$. Thus if the function
$$h(\theta) = \sum_{j=1}^k k_j\, \frac{f_\theta(y_j)}{f_{\theta^*}(y_j)}$$
is logarithmically concave (as a function of $\theta$), then the maximizing distribution $\hat{Q}$ of the likelihood $L(Q)$ is concentrated at the point $\theta^*$.
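As a numerical sketch of this connection (ours, with a standard normal kernel $f_\theta = N(\theta, 1)$ and artificial data chosen purely for illustration), the following code evaluates $h$ on a grid; if $\sup_\theta h(\theta) \leq n$, the equivalence theorem in the Appendix shows that the one point mixing distribution at $\theta^*$ maximizes the likelihood.

```python
import numpy as np
from scipy.stats import norm

# Mixture MLE check in the spirit of Remark 3.2; kernel and data are our choices.
y = np.array([-0.4, 0.1, 0.5])   # distinct observations y_1, ..., y_k
k = np.array([2, 3, 1])          # multiplicities k_j
n = k.sum()

theta_star = (k * y).sum() / n   # best one point mixing distribution (normal MLE)

def h(theta):
    """h(theta) = sum_j k_j f_theta(y_j) / f_theta*(y_j)."""
    return (k * norm.pdf(y, theta, 1) / norm.pdf(y, theta_star, 1)).sum()

grid = np.linspace(-5, 5, 2001)
print(max(h(t) for t in grid) <= n + 1e-9)  # True: delta_{theta*} is the MLE here
```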

4. Examples

In this section we discuss some applications of Theorem 3.1 for the Bayesian D-optimality criterion; other (concave) optimality criteria can be treated similarly. Bayesian D-optimal designs maximize the expectation of the logarithm of the determinant of the Fisher information matrix. In the model with one parameter we thus have to maximize the function
$$\Phi(\eta) = E_\xi[\log I(\theta, \eta)],$$
where the expectation is taken with respect to a prior distribution $\xi$ on the parameter space $\Theta$. Note that for the Bayesian D-optimality criterion the function $k(x, \theta)$ in Theorem 3.1 reduces to
$$k(x, \theta) = (d_\theta(\delta_{x^*}, x) + 1)\, p_\xi(\theta) = \frac{I(\theta, \delta_x)}{I(\theta, \delta_{x^*})}\, p_\xi(\theta) \qquad (4.1)$$

and for the application of this theorem we have to establish the logarithmic concavity of this function.

4.1. Exponential regression

Consider the exponential regression model
$$Y(x) = e^{-ax} + \varepsilon, \quad a > 0,$$
which we will use in the form ($\theta = e^{-a}$)
$$Y(x) = \theta^x + \varepsilon, \quad \theta \in \Theta \subseteq [0, 1]. \qquad (4.2)$$

Assume that the design space $\mathcal{X}$ is the interval $[0, 1]$ and let $p_\xi(\cdot)$ denote the density of the prior distribution on $\Theta$. Then a Bayesian D-optimal design maximizes the function
$$\Phi(\eta) = \int_\Theta \log\left( \int_{[0,1]} x^2 \theta^{2x-2}\, d\eta(x) \right) p_\xi(\theta)\, d\theta,$$


which reduces for a one point design $\delta_x$ to
$$\Phi(\delta_x) = \int_\Theta \log(x^2 \theta^{2x-2})\, p_\xi(\theta)\, d\theta = \int_\Theta [2 \log x + (2x - 2) \log \theta]\, p_\xi(\theta)\, d\theta.$$

Differentiating this function with respect to $x$ and equating to 0 gives the Bayesian D-optimal one point design $\delta_{x^*}$ with mass one at the point
$$x^* = \min\left\{ 1,\ \left[ \int_\Theta \log(\theta^{-1})\, p_\xi(\theta)\, d\theta \right]^{-1} \right\}. \qquad (4.3)$$

If $(d_\theta(\delta_{x^*}, x) + 1)\, p_\xi(\theta)$ is logarithmically concave on $(x, \theta) \in [0, 1] \times [0, 1]$, then the best one point design $\delta_{x^*}$ is Bayesian D-optimal, by Theorem 3.1 and (4.1). A simple calculation shows that
$$d_\theta(\delta_{x^*}, x) + 1 = \frac{x^2 \theta^{2x-2}}{x^{*2} \theta^{2x^*-2}}.$$

Now
$$\theta^2 (d_\theta(\delta_{x^*}, x) + 1) = \frac{x^2 \theta^{2x}}{x^{*2} \theta^{2x^*-2}}$$
is logarithmically concave (which follows readily by calculating the second derivatives), and consequently we obtain from the representation
$$k(x, \theta) = (d_\theta(\delta_{x^*}, x) + 1)\, p_\xi(\theta) = \theta^2 (d_\theta(\delta_{x^*}, x) + 1) \cdot \left( \theta^{-2} p_\xi(\theta) \right)$$

that the logarithmic concavity of the function $\theta^{-2} p_\xi(\theta)$ implies the logarithmic concavity of the function $k(x, \theta)$. Combining these arguments with Theorem 3.1 gives the following corollary.

Corollary 4.1. If the prior distribution $\xi$ has a density $p_\xi(\cdot)$ such that $\theta^{-2} p_\xi(\theta)$ is a logarithmically concave function on $\Theta \subseteq [0, 1]$, then the best one point design $\delta_{x^*}$, supported at the point $x^*$ in (4.3), is Bayesian D-optimal with respect to the prior $\xi$ in the exponential regression model (4.2).

Remark 4.2. (a) The design $\delta_{x^*}$ is locally D-optimal for the best guess $\theta_0 = e^{-1/x^*}$.

(b) An example of a class of prior distributions satisfying the condition of Corollary 4.1 are the Beta distributions
$$p_{\beta(a,b)}(\theta) = \frac{1}{\beta(a, b)}\, \theta^{a-1} (1 - \theta)^{b-1} \quad (0 < \theta < 1)$$
with parameters $a \geq 3$ and $b \geq 1$. Note that this prior is not covered by Chaloner's sufficient condition on the range of the support of the prior distribution (see Chaloner, 1993).
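Numerically, $x^*$ in (4.3) is easy to evaluate for a Beta$(a, b)$ prior, since $E_\xi[\log(1/\theta)] = \psi(a + b) - \psi(a)$ with the digamma function $\psi$ (a standard identity, not stated in the paper). A small sketch with our own variable names, cross checked by quadrature:

```python
import numpy as np
from scipy.special import digamma
from scipy.integrate import quad
from scipy.stats import beta

# x* from (4.3) for a Beta(a, b) prior; Corollary 4.1 needs a >= 3 and b >= 1.
a, b = 3.0, 20.0
x_star = min(1.0, 1.0 / (digamma(a + b) - digamma(a)))  # closed form for E[log(1/theta)]

# cross check E[log(1/theta)] by numerical integration
expected_log, _ = quad(lambda t: np.log(1.0 / t) * beta.pdf(t, a, b), 0, 1)
print(x_star, min(1.0, 1.0 / expected_log))  # both approx 0.456 for a = 3, b = 20
```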


4.2. Logistic regression

As a second example we consider a logistic regression model with known slope. Thus the responses at a value $x \in \mathcal{X} = \mathbb{R}$ of the explanatory variable are independent Bernoulli random variables with probability of response $p(x)$, where

$$p(x) = \frac{1}{1 + e^{-\beta(x - \theta)}} \qquad (4.4)$$

(see also Chaloner, 1993). The value of the slope $\beta$ is assumed to be known and without loss of generality we put $\beta = 1$. The normalized Fisher information of a design measure $\eta$ is given by
$$I(\theta, \eta) = \int_{\mathcal{X}} \frac{e^{x-\theta}}{(1 + e^{x-\theta})^2}\, d\eta(x).$$

For a Bayesian D-optimal design we have to maximize the function $\Phi(\eta) = \int_\Theta \log(I(\theta, \eta))\, d\xi(\theta)$. The directional derivative $d(\eta, x)$ is given by
$$d(\eta, x) = \int_\Theta d_\theta(\eta, x)\, p_\xi(\theta)\, d\theta = \int_\Theta \frac{I(\theta, \delta_x)}{I(\theta, \eta)}\, p_\xi(\theta)\, d\theta - 1.$$

Differentiating $\Phi(\delta_x)$ with respect to $x$ shows that the best one point design must be supported at $x^*$, where $x^*$ is a root of the equation

$$\int_\Theta \frac{2 e^{x-\theta}}{1 + e^{x-\theta}}\, p_\xi(\theta)\, d\theta - 1 = 0. \qquad (4.5)$$

Now, if $(d_\theta(\delta_{x^*}, x) + 1)\, p_\xi(\theta)$ is logarithmically concave, the Bayesian D-optimal design is a one point design at a root $x^*$ of (4.5). The logarithmic concavity of this function is equivalent to the fact that the matrix $-M$ is nonnegative definite, where
$$M = \begin{pmatrix} -q(x - \theta) & q(x - \theta) \\[4pt] q(x - \theta) & -q(x - \theta) + q(x^* - \theta) + \dfrac{p_\xi''(\theta)\, p_\xi(\theta) - p_\xi'^2(\theta)}{p_\xi^2(\theta)} \end{pmatrix}$$
is the matrix of the second derivatives of the function $\log[(d_\theta(\delta_{x^*}, x) + 1)\, p_\xi(\theta)]$. Because
$$q(x - \theta) = \frac{2 e^{x-\theta}}{(1 + e^{x-\theta})^2} > 0$$
for all $x, \theta$, this is equivalent to $\det(-M) \geq 0$. Computing this determinant and dividing by the term $q(x - \theta)$ gives the condition
$$\frac{p_\xi'^2(\theta) - p_\xi(\theta)\, p_\xi''(\theta)}{p_\xi^2(\theta)} \geq \frac{2 e^{x^*-\theta}}{(1 + e^{x^*-\theta})^2} = q(x^* - \theta).$$
Taking into account that $q(x^* - \theta) \leq \frac{1}{2}$ for all $x^*, \theta$ yields as a sufficient condition for the logarithmic concavity of the function $(d_\theta(\delta_{x^*}, x) + 1)\, p_\xi(\theta)$ that
$$-[\log p_\xi(\theta)]'' = \frac{p_\xi'^2(\theta) - p_\xi(\theta)\, p_\xi''(\theta)}{p_\xi^2(\theta)} \geq \frac{1}{2} \quad \text{for all } \theta \in \Theta. \qquad (4.6)$$

Corollary 4.3. Assume that the density $p_\xi(\theta)$ of the prior distribution satisfies (4.6). Then the Bayesian D-optimal design for the logistic regression model (4.4) is the one point design $\delta_{x^*}$ concentrated at the point $x^*$, where $x^*$ is the unique solution of (4.5).

Moreover, if $p_\xi(\cdot)$ is symmetric with respect to a point $c \in \mathbb{R}$, then the Bayesian D-optimal design is concentrated at the point $c = E_\xi[\theta]$.

Proof. Using the above arguments and Theorem 3.1, we only have to show that there exists a unique root of the equation

$$l(x) = 2 \int_\Theta \frac{e^{x-\theta}}{1 + e^{x-\theta}}\, p_\xi(\theta)\, d\theta - 1 = 0. \qquad (4.7)$$

This follows easily because $l$ is a strictly increasing function with $\lim_{x \to -\infty} l(x) = -1$ and $\lim_{x \to \infty} l(x) = 1$. In order to prove the second part of the assertion, we assume without loss of generality that $p_\xi(\cdot)$ is symmetric with respect to the point $c = 0$. Putting $x = 0$ in (4.7) we obtain

$$2 \int_\Theta \frac{e^{-\theta}}{1 + e^{-\theta}}\, p_\xi(\theta)\, d\theta - 1 = -\int_\Theta \frac{1 - e^{-\theta}}{1 + e^{-\theta}}\, p_\xi(\theta)\, d\theta = 0,$$

where the last equality follows from the fact that the integrand is an odd function in $\theta$, by the symmetry of $p_\xi(\cdot)$. Thus $x^* = c = 0$ is the unique root of (4.7) and the second assertion of the corollary follows. $\square$

As an example consider symmetric Beta distributions on $[-a + c, a + c]$, that is, $p_\xi(\theta) = g(\theta - c)$, where
$$g(\theta) = \frac{1}{a\, \beta(b + 1, b + 1)} \left( 1 - \left( \frac{\theta}{a} \right)^2 \right)^b, \quad |\theta| < a,$$
and $b > 0$. The inequality (4.6) reduces in this case to
$$-[\log g(\theta)]'' = \frac{2b}{a^2}\, \frac{1 + (\theta/a)^2}{(1 - (\theta/a)^2)^2} \geq \frac{1}{2}$$
on the interval $(-a, a)$,


which is satisfied for all $\theta$ if and only if $2\sqrt{b} \geq a$. In this case the Bayesian D-optimal design has mass 1 at the point $c = E_\xi(\theta)$. Note that Theorem 1 in Chaloner (1993) gives the upper bound $a \leq \log(2 + \sqrt{3})$ as a sufficient condition for the existence of a Bayesian D-optimal one point design, and that this condition does not depend on the parameter $b$. The preceding result improves this bound whenever $b \geq [\log(2 + \sqrt{3})]^2/4 \approx 0.4336$. Finally, putting $b = n$, $a = \sqrt{2n}\,\sigma$ and considering the limit $n \to \infty$, we obtain the normal density
$$p_\xi(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(\theta - c)^2}{2\sigma^2} \right), \qquad (4.8)$$
and the Bayesian D-optimal design with respect to this prior is a one point design whenever $\sigma^2 \leq 2$.
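Equation (4.7) is straightforward to solve numerically. The sketch below (ours; the quadrature truncation to $c \pm 8\sigma$ and all names are our choices) root-finds $l(x)$ for the normal prior (4.8) and recovers $x^* = c$, as Corollary 4.3 predicts for symmetric priors:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

c, sigma = 1.0, 1.2   # sigma^2 <= 2, so a one point design is Bayesian D-optimal

def l(x):
    """l(x) = 2 * int e^(x-t)/(1+e^(x-t)) p(t) dt - 1, cf. (4.7)."""
    integrand = lambda t: np.exp(x - t) / (1 + np.exp(x - t)) * norm.pdf(t, c, sigma)
    val, _ = quad(integrand, c - 8 * sigma, c + 8 * sigma)
    return 2 * val - 1

x_star = brentq(l, c - 20, c + 20)  # l increases from -1 to 1, so the root is unique
print(x_star)                       # approx 1.0 = c = E[theta]
```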

4.3. A Bayesian optimal design with more than one support point

In this example we illustrate a situation for which the Bayesian D-optimal design is not supported at one point, together with the nature of the logarithmic concavity of the function $k(x, \theta)$ defined in (3.6). To this end consider the logistic regression model (4.4) with Gaussian prior distribution (4.8) and $c = 0$. The Bayesian D-optimal one point design is concentrated at the point $x^* = 0$. The function $k(x, \theta)$ is given by
$$k(x, \theta) = (d_\theta(\delta_0, x) + 1)\, p_\xi(\theta) = \frac{e^{-\theta^2/(2\sigma^2)}}{\sqrt{2\pi}\,\sigma}\, \frac{e^x (1 + e^{-\theta})^2}{(1 + e^{x-\theta})^2},$$

and the logarithm of this function is depicted in Fig. 1 for $x = 5$ and $\sigma = 1, 1.5, 2$. If $k(x, \theta)$ is logarithmically concave as a function of $(x, \theta)$, then $k(x, \theta)$ has to be logarithmically concave as a function of $x$ (for fixed $\theta$) and as a function of $\theta$ (for fixed $x$). Consequently, we observe from Fig. 1 that $k(x, \theta)$ is not logarithmically concave for $\sigma = 1.5$ and $\sigma = 2$ (note that the logarithmic concavity in the case $\sigma = 1$ has already been established in Example 4.2). The "checking condition"

$$d(\delta_0, \delta_x) \leq 0 \quad \text{for all } x \in \mathcal{X}$$

of Whittle's equivalence theorem is depicted in Fig. 2 for the same values of $\sigma$. We observe that for $\sigma = 1$ and $\sigma = 1.5$ the Bayesian D-optimal design is supported at the point 0 (because the curves never exceed the value 1). The case $\sigma = 1.5$ shows that the logarithmic concavity of the function $k(x, \theta)$ is not a necessary condition for the existence of a Bayesian D-optimal one point design. Finally, for $\sigma = 2$ the equivalence condition of Whittle's theorem is not satisfied, and in this case the Bayesian D-optimal design has at least two support points.
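The qualitative content of Fig. 2 can be reproduced with a few lines of quadrature (our sketch; the grids and truncation are our choices): for the candidate $\delta_0$ one evaluates $d(\delta_0, \delta_x) + 1 = \int_\Theta [I(\theta, \delta_x)/I(\theta, \delta_0)]\, p_\xi(\theta)\, d\theta$ over a grid of $x$ values and checks whether it ever exceeds 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Whittle checking condition for the candidate design delta_0 in Section 4.3:
# d(delta_0, delta_x) + 1 must stay <= 1 for all x if delta_0 is optimal.
def whittle(x, sigma):
    ratio = lambda t: (np.exp(x) * (1 + np.exp(-t)) ** 2
                       / (1 + np.exp(x - t)) ** 2) * norm.pdf(t, 0, sigma)
    val, _ = quad(ratio, -8 * sigma, 8 * sigma)  # I(theta,delta_x)/I(theta,delta_0) averaged
    return val

xs = np.linspace(-7, 7, 141)
for sigma in (1.0, 1.5, 2.0):
    print(sigma, max(whittle(x, sigma) for x in xs))
# sigma = 1 and 1.5 never exceed 1 (one point design optimal); sigma = 2 exceeds 1
```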

Fig. 1. The function $k(5, \theta)$ in the logistic regression model (4.4) with normal prior distribution (4.8) ($c = 0$) for different values of the standard deviation: $\sigma = 1$ (solid line), $\sigma = 1.5$ (dashed line), $\sigma = 2$ (dotted line).

Fig. 2. The function $d(\delta_0, \delta_x) + 1$ of Whittle's equivalence theorem in the logistic regression model (4.4) with normal prior distribution (4.8) ($c = 0$) for different values of the standard deviation: $\sigma = 1$ (solid line), $\sigma = 1.5$ (dashed line), $\sigma = 2$ (dotted line).

Acknowledgements

The authors are grateful to an unknown referee and to the associate editor for their constructive comments on an earlier version of this manuscript. These comments led to a substantial improvement of this paper. Parts of this work were done while the first author was visiting the University of Göttingen, and this author would like to thank the Institut für Mathematische Stochastik for its hospitality. The same author would also like to thank the Institut für Mathematische Stochastik, Technische Universität Dresden, for the stimulating environment during his appointment in Dresden between 1993 and 1995.


Appendix

Let the criterion function $\Phi(\eta)$ be a concave function on the set of all designs. Define $d(\eta_1, \eta_2)$ to be the derivative of $\Phi(\eta)$ at $\eta_1$ in the direction of $\eta_2$, that is,
$$d(\eta_1, \eta_2) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \left[ \Phi((1 - \varepsilon)\eta_1 + \varepsilon\eta_2) - \Phi(\eta_1) \right].$$

Define further $\delta_x$ as the probability measure with mass one at the point $x$. The following equivalence theorem gives conditions for a design $\eta^*$ to be optimal. The result was proved by Chaloner and Larntz (1989) as an application of Whittle's equivalence theorem (see Whittle, 1973) and can be applied to any concave optimality criterion with existing directional derivatives.

Equivalence theorem (Whittle, 1973). (a) An optimal design $\eta^*$ can be equivalently characterized by any of the three conditions:
1. $\eta^*$ maximizes $\Phi(\cdot)$;
2. $\eta^*$ minimizes $\sup_{x \in \mathcal{X}} d(\eta, \delta_x)$;
3. $\sup_{x \in \mathcal{X}} d(\eta^*, \delta_x) = 0$.

(b) The point $(\eta^*, \eta^*)$ is a saddlepoint of the directional derivative $d$, that is,
$$d(\eta^*, \eta_1) \leq 0 = d(\eta^*, \eta^*) \leq d(\eta_2, \eta^*)$$
for all probability measures $\eta_1, \eta_2$.

References

Chaloner, K. (1987). An approach to design for generalized linear models. In: V. Fedorov and H. Läuter, Eds., Model Oriented Data Analysis, pp. 3-12. Springer, Berlin.
Chaloner, K. (1993). A note on optimal Bayesian design for nonlinear problems. J. Statist. Plann. Inference 37, 229-236.
Chaloner, K. and K. Larntz (1989). Optimal Bayesian experimental design applied to logistic regression experiments. J. Statist. Plann. Inference 21, 191-208.
Chaloner, K. and K. Larntz (1992). Bayesian design for accelerated life testing. J. Statist. Plann. Inference 33, 245-260.
Chernoff, H. (1953). Locally optimal designs for estimating parameters. Ann. Math. Statist. 24, 586-602.
Dette, H. (1990). A generalization of D- and D₁-optimal designs in polynomial regression. Ann. Statist. 18(4), 1784-1804.
Dette, H. and L. Haines (1994). E-optimal designs for linear and nonlinear models with two parameters. Biometrika 81, 739-754.
Dette, H. and H.-M. Neugebauer (1996). Bayesian D-optimal designs for exponential regression models. J. Statist. Plann. Inference, to appear.
Dharmadhikari, S. and K. Joag-dev (1988). Unimodality, Convexity, and Applications. Academic Press, New York.
Ford, I., B. Torsney and C.F.J. Wu (1992). The use of a canonical form in the construction of locally optimal designs for non-linear problems. J. Roy. Statist. Soc. Ser. B 54, 569-583.
Kitsos, C., D. Titterington and B. Torsney (1988). An optimal design problem in rhythmometry. Biometrics 44, 657-671.


Läuter, E. (1974a). Die Methode der Versuchsplanung für den Fall nichtlinearer Parametrisierung. Math. Operationsforsch. Statist. Ser. Statist. 5, 625-636 (in Russian).
Läuter, E. (1974b). Experimental design in a class of models. Math. Operationsforsch. Statist. Ser. Statist. 5, 379-398.
Lindsay, B.G. (1983). The geometry of mixture likelihoods: a general theory. Ann. Statist. 11, 86-94.
Mukhopadhyay, S. and L. Haines (1995). Bayesian D-optimal designs for the exponential growth model. J. Statist. Plann. Inference 44, 385-394.
Pronzato, L. and E. Walter (1985). Robust experimental design via stochastic optimization. Math. Biosciences 75, 103-120.
Silvey, S. (1980). Optimal Design. Monographs on Applied Probability and Statistics. Chapman and Hall, London.
Whittle, P. (1973). Some general points in the theory and construction of D-optimum experimental designs. J. Roy. Statist. Soc. Ser. B 35, 123-130.