Optimal discrimination designs for exponential regression models

Journal of Statistical Planning and Inference 137 (2007) 2579–2592

Stefanie Biedermann (a,*), Holger Dette (a), Andrey Pepelyshev (b)
(a) Fakultät für Mathematik, Ruhr-Universität Bochum, 44780 Bochum, Germany
(b) Department of Mathematics, St. Petersburg State University, St. Petersburg, Russia

Received 8 August 2005; accepted 26 March 2006; available online 14 January 2007

Abstract

We investigate optimal designs for discriminating between exponential regression models of different complexity, which are widely used in the biological sciences; see, e.g., Landaw [1985. Robust sampling designs for compartmental models under large prior eigenvalue uncertainties. Math. Comput. Biomed. Appl. 34, 181–187] or Gibaldi and Perrier [1982. Pharmacokinetics. Marcel Dekker, New York]. We discuss different approaches for the construction of appropriate optimality criteria, and find sharper upper bounds on the number of support points of locally optimal discrimination designs than those given by Caratheodory's Theorem. These results greatly facilitate the numerical construction of optimal designs. Various examples of optimal designs are then presented and compared to several other designs. Moreover, to protect the experiment against misspecifications of the nonlinear model parameters, we adapt the design criteria so that the resulting designs are robust with respect to such misspecifications and, again, provide several examples which demonstrate the advantages of our approach.

MSC: 62K05; 62J02

Keywords: Compartmental model; Model discrimination; Discrimination design; Locally optimal design; Robust optimal design; Maximin optimal design

1. Introduction

In the biological and chemical sciences, the expected response of an experiment at some experimental condition x is commonly modeled as a function in x depending nonlinearly on some unknown parameters. An important class within the nonlinear regression models is the class of exponential models of the form

    η_k(x, θ) = Σ_{i=1}^{k} a_i e^{−λ_i x},   k ≥ 1,  x ∈ [0, ∞),    (1)

which have applications in chemical kinetics (see Gibaldi and Perrier, 1982), in particular in toxicokinetic experiments (see Becka et al., 1992, 1993), and in microbiology (see Alvarez et al., 2003, who used a model of type (1) to describe Escherichia coli inactivation by pulsed electric fields). Landaw (1985) fitted an exponential model to describe open, noncyclic k-compartmental systems. Since statistical inference can be improved considerably by choosing appropriate experimental conditions at which the observations are taken, the class of models (1) has been considered in a few articles on optimal experimental design (see Ermakov and Melas, 1995; Dette et al., 2006a,b, where the properties of D-, c- and E-optimal designs are investigated).

In most articles on optimal design, the authors assume that the model under consideration is already known except for the values of the unknown parameters, thus dealing with optimality criteria that are efficient for parameter estimation within a fixed model. In many applications, however, the form of the regression function will not be known exactly, i.e., the experimenter will have to decide which of a set of competing classes of functions describes the data most adequately. The design problem for discriminating between competing nonlinear models has received much less attention in the literature than the design problem for parameter estimation (see, e.g., Atkinson and Fedorov, 1975a,b or Dette et al., 2005). For linear models, optimal designs for model discrimination have been discussed by several authors; see, e.g., Srivastava (1975, 1977), who mainly considered applications to 2^m factorial experiments. Stigler (1971) and Studden (1982) investigated designs which are, on the one hand, efficient for estimating the parameters in a given polynomial regression model and, on the other hand, allow testing this model against a polynomial of higher degree. Läuter (1974) proposed designs which are optimal for a given class of models, and this approach was successfully applied by Dette (1990, 1994), Zen and Tsai (2002) and Tsai and Zen (2004) to the class of polynomial models and by Zen and Tsai (2004) to the class of trigonometric regression models.

The goal of this article is to find designs that are efficient for discriminating between models of form (1), where the value of k, i.e., the number of exponential terms contributing to the model, is not fixed in advance. If, for example, a small model (small k) proves to be appropriate, the number of parameters to be estimated will also be small, so that estimation will be more efficient. Thus, either experimental costs can be reduced without losing precision in estimation by carrying out fewer runs, or the precision of the estimation will increase (for a fixed number of runs). Since optimal designs with respect to classical optimality criteria such as the D-criterion depend on the nonlinear model parameters, they are termed "locally" optimal. For model (1), the optimal discrimination designs will therefore depend on the value of λ = (λ_1, ..., λ_k). Application of these designs can therefore only be efficient if a good guess for λ is available. Locally optimal designs, however, provide a basis for the construction of designs which are more robust with respect to misspecifications of the nonlinear parameters, such as Bayesian or maximin optimal designs, which will be discussed later on.

The article at hand is organized as follows. Section 2 proposes several different optimality criteria that are suitable for finding discrimination designs for models of form (1) with different degrees. In Section 3, we present some analytical results, which facilitate the numerical construction of locally optimal designs substantially.
In particular, we derive explicit bounds on the number of support points of the locally optimal designs with respect to the optimality criteria discussed in Section 2. Various examples of optimal designs are displayed in Section 4, where we also investigate their performance compared to different uniform designs and to designs which are optimal for parameter estimation in the largest model. Furthermore, emphasis in this section will be on designs that are robust with respect to misspecifications of the unknown parameters and on their performance. For this purpose we determine maximin optimal designs and compare their performance with the locally optimal designs. The proofs of our results, finally, are deferred to an appendix.

2. Optimality criteria

We consider an experiment with N independent observations Y_i, i = 1, ..., N, at experimental conditions x_i, which are modeled by

    Y_i = η_k(x_i, θ) + ε_i,   i = 1, ..., N,    (2)

where the errors ε_i are assumed i.i.d. with zero expectation and finite common variance, and the regression function η_k(x, θ) is of form (1) for some parameter k ≥ 1. By θ we denote the vector of unknown parameters in (1), i.e., θ = (a_1, λ_1, ..., a_k, λ_k)^T. Without loss of generality we assume that a_j ≠ 0, λ_j > 0 and λ_j ≠ λ_i for j ≠ i, i, j = 1, ..., k. An approximate design

    ξ = ( x_1 ... x_n
          w_1 ... w_n )


is a probability measure with finite support, i.e., the observations are taken at the support points x_i of the measure in proportion to the corresponding masses w_i. For the model described in (2), under some regularity assumptions (see Jennrich, 1969) the least-squares estimator is asymptotically unbiased with asymptotic covariance matrix proportional to the inverse of the information matrix

    M_k(ξ, θ) = Σ_{i=1}^{n} w_i f_k(x_i, θ) f_k^T(x_i, θ) = ∫ f_k(x, θ) f_k^T(x, θ) dξ(x) ∈ R^{2k×2k},    (3)

where the vector f_k(x, θ) is defined by

    f_k(x, θ) = ∂η_k(x, θ)/∂θ = (e^{−λ_1 x}, −a_1 x e^{−λ_1 x}, e^{−λ_2 x}, −a_2 x e^{−λ_2 x}, ..., e^{−λ_k x}, −a_k x e^{−λ_k x})^T.
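As a numerical illustration (ours, not part of the original paper), the information matrix (3) of a discrete design can be assembled directly from this gradient; all design points, weights and parameter values below are arbitrary placeholders.

```python
import numpy as np

def grad_fk(x, a, lam):
    """Gradient f_k(x, theta) of eta_k in (1) w.r.t. theta = (a_1, lam_1, ..., a_k, lam_k)."""
    comps = []
    for ai, li in zip(a, lam):
        comps += [np.exp(-li * x), -ai * x * np.exp(-li * x)]
    return np.array(comps)

def info_matrix(supp, weights, a, lam):
    """Information matrix M_k(xi, theta) of (3) for a design with support supp and weights."""
    M = np.zeros((2 * len(a), 2 * len(a)))
    for x, w in zip(supp, weights):
        f = grad_fk(x, a, lam)
        M += w * np.outer(f, f)
    return M

# toy example with k = 2 and a placeholder four-point design
a, lam = [1.0, 1.0], [1.0, 0.5]
supp, weights = [0.0, 0.5, 2.0, 6.0], [0.25, 0.25, 0.25, 0.25]
print(np.linalg.det(info_matrix(supp, weights, a, lam)))
```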

The goal of this article is to find optimal designs for discriminating between models of type (2) with different numbers of exponential terms included in the regression function. Our main interest in this problem stems from the fact that the experimenter will often not know in advance which value of k must be chosen to describe the data adequately. In what follows, we discuss several optimality criteria that are appropriate for problems of this type.

We first consider the discrimination problem between two models η_s(x, θ) and η_k(x, θ), where without loss of generality s < k. Following Stigler (1971) and Studden (1982), who considered polynomial regression models, a reasonable approach for finding optimal discrimination designs for this problem in linear models is to use designs that are in some sense optimal for estimating the 2(k − s) parameters of the k − s highest terms in (1), in order to check whether fitting these terms is actually necessary, i.e., to apply D_{2(k−s)}-optimality; see, e.g., Silvey (1980). Although discriminating between nonlinear models is not exactly the same as estimating the parameters of the larger model not included in the smaller model (see Atkinson and Fedorov, 1975a), we use D_{2(k−s)}-optimality as one of our criteria in this article. We expect that this criterion will yield designs that are efficient for both aims, discrimination between models and parameter estimation in the larger model, since it is a kind of "compromise" between the D-criterion for estimating the full parameter vector and the optimal design for testing the linear hypothesis H_0: a_{s+1} = ··· = a_k = 0, which will be introduced in more detail in the subsequent paragraph.

Under some regularity assumptions, the noncentrality parameter of the likelihood ratio test for the null hypothesis H_0: a_j = 0, λ_j = 0, j = s + 1, ..., k, depends on the underlying design ξ through the matrix

    C_{A_{s,k}}(ξ, θ) = (A_{s,k}^T M_k^−(ξ, θ) A_{s,k})^{−1},   A_{s,k}^T = (0, I_{2(k−s)}) ∈ R^{2(k−s)×2k},

where M_k^−(ξ, θ) denotes a generalized inverse of M_k(ξ, θ), since this matrix is not necessarily of full rank; see, e.g., Seber and Wild (1989, Sections 5.3, 12.4). The power of this test can thus be increased by increasing the matrix C_{A_{s,k}}(ξ, θ) in the Loewner ordering (see Pukelsheim, 1993, Chapter 3). If one chooses the determinant as optimality criterion, one obtains the D_{2(k−s)}-optimal design, which maximizes the function det C_{A_{s,k}}(ξ, θ) = det(A_{s,k}^T M_k^−(ξ, θ) A_{s,k})^{−1}. Note that if M_k(ξ, θ) is nonsingular, the determinant of C_{A_{s,k}}(ξ, θ) can be expressed as

    det C_{A_{s,k}}(ξ, θ) = det M_k(ξ, θ) / det M_s(ξ, θ).
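Continuing the sketch above (reusing the hypothetical helper info_matrix), the nonsingular case of the criterion reduces to a determinant ratio; truncating θ to its first s component pairs yields M_s.

```python
import numpy as np

def det_C(supp, weights, a, lam, s):
    """det C_{A_{s,k}}(xi, theta) = det M_k / det M_s for nonsingular M_k (k = len(a))."""
    Mk = info_matrix(supp, weights, a, lam)          # helper from the previous sketch
    if s == 0:                                       # empty submodel: det M_0 = 1 by convention
        return np.linalg.det(Mk)
    Ms = info_matrix(supp, weights, a[:s], lam[:s])  # information matrix of the smaller model
    return np.linalg.det(Mk) / np.linalg.det(Ms)
```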

A more general criterion is needed if discrimination between more than two competing models is required. Following Läuter (1974), we call a design ξ an optimal discriminating design for a class of models {η_1, ..., η_k} if ξ maximizes the weighted geometric mean of the individual criteria, i.e., the function

    W(ξ, θ) = Σ_{l=1}^{k} β_l ln det C_{A_{l−1,l}}(ξ, θ) → max_ξ,

where the constants β_l are user-selected weights, chosen as a measure of the interest the experimenter has in discriminating η_l vs. η_{l−1}. This criterion simplifies to the D_{2(k−s)}-criterion when the weights β_l, l = 1, ..., k, are given by β_k = β_{k−1} = ··· = β_{s+1} = 1, β_s = β_{s−1} = ··· = β_1 = 0.
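In code, the compound criterion is a weighted sum of log-determinant ratios. The sketch below (ours, built on the placeholder helper det_C above) skips terms with zero weight; the term l = 1 is handled by the s = 0 convention in det_C.

```python
import numpy as np

def W_crit(supp, weights, a, lam, beta):
    """W(xi, theta) = sum_l beta_l * ln det C_{A_{l-1,l}} = sum_l beta_l * ln(det M_l / det M_{l-1})."""
    total = 0.0
    for l, b in enumerate(beta, start=1):
        if b != 0.0:
            total += b * np.log(det_C(supp, weights, a[:l], lam[:l], l - 1))
    return total
```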


Optimal discrimination designs for the class of polynomial regression models have been determined by Dette (1990, 1994), Zen and Tsai (2002) and Tsai and Zen (2004), whereas Zen and Tsai (2004) have found results for the class of trigonometric regression models. Optimal designs with respect to the criterion W(ξ, θ) with at least two positive weights β_l > 0 will also be referred to as multiple-objective or compound optimal designs.

The last optimality criterion to be investigated in this study is motivated by the following idea. To check whether the last k − s exponential terms in model (1) are necessary to fit the data adequately, one could perform a statistical test for the null hypothesis H_0: a_{s+1} = ··· = a_k = 0, or, equivalently, K_{s,k}^T θ = 0 ∈ R^{k−s} for the matrix K_{s,k} ∈ R^{2k×(k−s)}, where the ith column of K_{s,k} is given by the (2(s + i) − 1)th unit vector of dimension 2k. Under regularity assumptions, the noncentrality parameter of the likelihood ratio test for this null hypothesis depends on the design ξ through the matrix

    (K_{s,k}^T M_k^−(ξ, θ) K_{s,k})^{−1},

see, e.g., Seber and Wild (1989, Sections 5.3, 12.4), which motivates the function

    Ψ(ξ, θ) = det(K_{s,k}^T M_k^−(ξ, θ) K_{s,k})^{−1} → max_ξ    (4)

as another appropriate optimality criterion for discriminating between the two models η_s(x, θ) and η_k(x, θ). Note that for information matrices M_k(ξ, θ) of full rank this optimality criterion simplifies to

    Ψ(ξ, θ) = det M_k(ξ, θ) / det M_{[s+k]}(ξ, θ),

where the matrix M_{[s+k]} ∈ R^{(s+k)×(s+k)} is obtained from M_k by deleting the (2(s + i) − 1)th rows and columns, i = 1, ..., k − s, of the information matrix M_k.
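The nonsingular form of Ψ is again a determinant ratio. This sketch (ours, again using the info_matrix helper above) deletes the rows and columns of M_k indexed by the positions of a_{s+1}, ..., a_k.

```python
import numpy as np

def Psi_crit(supp, weights, a, lam, s):
    """Psi(xi, theta) = det M_k / det M_[s+k] for nonsingular M_k."""
    k = len(a)
    Mk = info_matrix(supp, weights, a, lam)
    # 0-based indices of a_{s+1}, ..., a_k in theta = (a_1, lam_1, ..., a_k, lam_k)
    drop = {2 * (s + i) - 2 for i in range(1, k - s + 1)}
    keep = [j for j in range(2 * k) if j not in drop]
    return np.linalg.det(Mk) / np.linalg.det(Mk[np.ix_(keep, keep)])
```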

From (4) it follows that the optimality criterion Ψ(ξ, θ) belongs to the class of D_A-optimality criteria (see, e.g., Silvey, 1980); the Ψ-optimal design is therefore the D-optimal design for estimating the subset {a_{s+1}, ..., a_k} of the full parameter vector θ.

All the optimality criteria discussed above are local criteria in the sense of Chernoff (1953), i.e., the optimal designs with respect to these criteria will depend on the unknown parameter vector θ. To overcome this problem, we will also consider maximin optimality criteria, which are robust with respect to the misspecification of θ. This approach only requires the specification of a certain range for the unknown parameters, which seems a realistic scenario in many practical applications. The optimality criteria are constructed as follows. Let Φ(ξ, θ) be some criterion function depending on the design ξ and some parameter vector θ. Since the use of standardized criteria is recommended to avoid different scalings (see Dette, 1997), we protect the experiment against the worst possible case by maximizing

    Φ_{−∞}(ξ) = min_{θ∈Ω} Φ(ξ, θ) / Φ(ξ*_θ, θ),    (5)

where Ω is the region of uncertainty for the unknown parameters, and ξ*_θ denotes the locally Φ-optimal design for the parameter value θ. The maximization of (5) is thus equivalent to maximizing the minimal Φ-efficiency, so that the resulting designs will be referred to either as maximin Φ-efficient designs or as standardized maximin optimal designs. In what follows, the generic notation Φ will be replaced by the respective criteria defined above. As an alternative, a Bayesian criterion with a non-informative prior, such as the uniform prior on Ω, could be used.

3. Analytical results

Our first result yields a considerable simplification of the locally optimal design problems at hand by stating an explicit relationship between locally optimal discrimination designs with different parameters. This result holds true for all three local optimality criteria discussed above.

Lemma 3.1. Let ξ*_λ = {x_i*(λ); w_i*(λ); i = 1, ..., n*(λ)} be a locally W- or Ψ-optimal discrimination design with respect to the parameter λ, where n*(λ) denotes the number of support points of ξ*(λ), and the x_i*(λ) are the support points with corresponding weights w_i*(λ). Then for any positive constant γ > 0 the locally optimal discrimination design with respect to the parameter γλ is given by

    x_i*(γλ) = (1/γ) x_i*(λ),   w_i*(γλ) = w_i*(λ).
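Lemma 3.1 is easy to verify numerically: comparing any two designs by the criterion gives the same ratio before and after rescaling, because the factor introduced by the matrix B in the proof cancels. The following check (our illustration, with placeholder designs) uses det_C from the sketches in Section 2.

```python
import numpy as np

gamma = 2.0
a, lam = [1.0, 0.8], [1.0, 0.5]
design1 = ([0.0, 0.4, 1.5, 5.0], [0.2, 0.3, 0.3, 0.2])
design2 = ([0.0, 0.8, 2.5, 7.0], [0.25, 0.25, 0.25, 0.25])

def rescale(design):
    """Scale the support points by 1/gamma, keep the weights."""
    supp, w = design
    return [x / gamma for x in supp], w

lam_g = [gamma * l for l in lam]
r_before = det_C(*design1, a, lam, 1) / det_C(*design2, a, lam, 1)
r_after = det_C(*rescale(design1), a, lam_g, 1) / det_C(*rescale(design2), a, lam_g, 1)
print(r_before, r_after)  # agree up to rounding: the ranking of designs is preserved
```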


As a consequence of Lemma 3.1, we can without loss of generality restrict ourselves to the derivation of locally optimal designs where one component of the vector of unknown parameters λ is set equal to one. A further simplification can be achieved by the assertion of the following lemma, which gives the value of one support point of an optimal design with respect to the criteria under consideration.

Lemma 3.2. Let ξ* be a locally W- or Ψ-optimal discrimination design with respect to the parameter λ. Then the point zero is contained in the support of ξ*.

From Caratheodory's Theorem (see, e.g., Karlin and Studden, 1966), we obtain that the number of support points n*(λ) of ξ* with respect to any local discrimination criterion is bounded by

    2(k − s) ≤ n*(λ) ≤ 2k(2k + 1)/2.

In the following theorem we derive considerably sharper upper bounds for n*(λ), which facilitate the numerical calculation of optimal designs.

Theorem 3.1. Let ξ* = {x_i*(λ); w_i*(λ); i = 1, ..., n*(λ)} be a locally W-optimal discrimination design with respect to the parameter λ, where n*(λ) denotes the number of support points of ξ*(λ). Then we obtain the following results:

(1) For any value of λ the inequality n*(λ) ≤ k(k + 1)/2 + 1 holds.
(2) For parameters λ of a special form we even find a sharper upper bound. Define a class Λ of vectors λ with components of the form

    λ_i = (λ_{i−1} + λ_{i+1})/2,   i = 2, ..., k − 1,    (6)

i.e., the parameters are uniformly distributed on some interval. Then there exists a neighborhood U(Λ) of Λ such that for all λ ∈ U(Λ) the number of support points n*(λ) of the W-optimal design is at most 2k.

We will now study another important property of the locally optimal designs with respect to the W- and Ψ-criterion. Consider λ in some neighborhood of the vector γ·(1, ..., 1)^T, where γ is some positive constant. As λ converges to γ·(1, ..., 1)^T, the information matrix M_k defined in (3) becomes singular. The corresponding locally optimal designs, however, are still weakly convergent to some limiting design. At the same time, the following theorem points out a connection between the discrimination design problems for the model at hand and the corresponding design problems in a heteroscedastic polynomial model. Define a regression model with i.i.d. errors by

    E[Y(x)] = η̃_{2k}(x) = Σ_{i=1}^{2k} c_i x^{i−1},   Var(Y(x)) = e^{2γx},  γ > 0,    (7)

where the value of γ is assumed to be known. The corresponding information matrix for the estimation of the polynomial coefficients c_i, i = 1, ..., 2k, is then given by

    M̃_{2k}(ξ) = ∫ e^{−2γx} f̃_k(x) f̃_k^T(x) dξ(x),   f̃_k^T(x) = (1, x, ..., x^{2k−1}).

Theorem 3.2. Let λ_i = γ − δ_i, δ_i = δ r_i, i = 1, ..., k, where the r_i ∈ R\{0} are fixed numbers with r_i ≠ r_j for i ≠ j; i.e., all components of the vector of nonlinear parameters λ tend to γ as δ → 0. If an optimal design ξ with respect to one of the respective criteria is supported on at least 2k points, so that the information matrix M_k(ξ, θ) is nonsingular, the following assertions hold:

(1) If δ → 0, and λ is of the form described above, the locally optimal discrimination design η_k vs. η_s converges weakly to the corresponding optimal discrimination design η̃_{2k} vs. η̃_{2s} for the heteroscedastic polynomial regression model (7) with unknown parameters c_i.
(2) More generally, under the same conditions, the W-optimal designs with respect to the criterion W(·, θ) converge weakly to the corresponding W-optimal design for model (7) as δ → 0.


(3) As δ → 0, the Ψ-optimal discrimination designs converge weakly to the discrimination design for testing the hypothesis H_0: c_{2k−1} = c_{2k−3} = ··· = c_{2s+1} = 0 in model (7) against the alternative that at least one of these parameters is nonzero.

The restriction to nonsingular information matrices in Theorem 3.2 is not a severe limitation of this result, since for all examples in our numerical study in the following section, as well as for numerous other examples not displayed here for brevity, we found that the optimal designs with respect to all criteria under consideration have at least 2k distinct support points. Furthermore, it is reasonable to consider only designs with at least 2k different support points, because such designs can also be used for the subsequent data analysis if the largest model η_k turns out to be the "true" model.
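To make the limiting model concrete, here is a small sketch (ours, not from the paper) of the weighted information matrix M̃_{2k}(ξ) for the heteroscedastic polynomial model (7); gamma, k and the design are placeholder inputs.

```python
import numpy as np

def info_matrix_poly(supp, weights, gamma, k):
    """Weighted information matrix of the heteroscedastic polynomial model (7)."""
    M = np.zeros((2 * k, 2 * k))
    for x, w in zip(supp, weights):
        f = np.array([x ** i for i in range(2 * k)])  # (1, x, ..., x^{2k-1})
        M += w * np.exp(-2.0 * gamma * x) * np.outer(f, f)
    return M
```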

4. Numerical results

In this section we apply the results obtained in the previous section to construct optimal designs for all criteria under consideration numerically. All designs presented in this section have been carefully checked for optimality by the corresponding equivalence theorems. For brevity, we restrict ourselves to the presentation of results for exponential models with at most six parameters, since these models are the most widely used within the whole class of models of type (1). It is well known among practitioners in the field of compartmental models that, even if there are many compartments, a model with a few terms fits the data well enough.
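Such an optimality check can be automated: by Theorem 5.1 in the appendix, the directional derivative (8) must be nonpositive everywhere, with equality at the support points. The sketch below (ours, reusing the hypothetical helpers grad_fk and info_matrix from Section 2) evaluates the left-hand side of (8) on a grid; it presumes the design has enough support points for the matrices M_l to be invertible.

```python
import numpy as np

def psi(x, supp, weights, a, lam, beta):
    """Left-hand side of (8); <= 0 on [0, inf) iff the design is W-optimal."""
    val = -2.0 * sum(beta)
    for l, b in enumerate(beta, start=1):
        if b == 0.0:
            continue
        for m, sign in ((l, +1.0), (l - 1, -1.0)):
            if m == 0:
                continue  # the empty model contributes nothing
            f = grad_fk(x, a[:m], lam[:m])
            Minv = np.linalg.inv(info_matrix(supp, weights, a[:m], lam[:m]))
            val += sign * b * (f @ Minv @ f)
    return val

# optimality check on a grid (candidate design and weights are placeholders):
# assert max(psi(x, supp, weights, a, lam, beta) for x in np.linspace(0, 20, 2001)) <= 1e-6
```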

4.1. Locally optimal designs

As a first example we choose the discrimination problem between the models η_3(x, θ) and η_2(x, θ), i.e., the question whether the third term in model (1) is actually necessary to fit the data. Table 1 displays several examples of optimal discrimination designs η_3 vs. η_2. As a consequence of Lemma 3.2, the point x_1 = 0 is contained in the support of any locally optimal design ξ*. We will thus not present it explicitly in what follows. The weight corresponding to x_1 = 0 will also be omitted for brevity.

Table 1
Selected locally D_{2(3−2)}-optimal discrimination designs for the problem of discriminating between the exponential regression models η_3 vs. η_2

λ1  λ2   λ3   x2     x3     x4    x5    x6     w2     w3     w4     w5     w6     eff_D
1   0.2  1.5  0.329  1.276  2.74  5.60  13.75  0.196  0.174  0.167  0.183  0.180  0.96
1   0.2  5    0.156  0.547  1.48  4.06  12.14  0.270  0.200  0.175  0.108  0.074  0.89
1   0.5  1.5  0.288  1.135  2.47  4.57   9.11  0.172  0.158  0.143  0.166  0.274  0.92
1   0.5  5    0.146  0.518  1.30  3.14   7.58  0.258  0.186  0.164  0.117  0.116  0.93
1   0.8  1.5  0.257  1.025  2.26  4.06   7.58  0.154  0.148  0.137  0.158  0.325  0.89
1   0.8  5    0.138  0.494  1.18  2.68   6.11  0.248  0.178  0.160  0.124  0.143  0.94

Table 2
Selected locally optimal discrimination designs for testing the hypothesis H_0: a_3 = 0 in the exponential regression model η_3

λ1  λ2   λ3   x2     x3     x4     x5     x6      w2     w3     w4     w5     w6     eff_D
1   0.2  5    0.128  0.571  1.621  4.250  12.342  0.135  0.282  0.270  0.177  0.126  0.87
1   0.5  1.5  0.246  1.020  2.448  4.880   9.696  0.129  0.143  0.163  0.193  0.311  0.87
1   0.8  1.5  0.223  0.918  2.182  4.266   8.145  0.120  0.129  0.147  0.182  0.364  0.84
1   0.8  5    0.113  0.489  1.273  2.851   6.311  0.135  0.216  0.218  0.177  0.214  0.91


For comparison, Table 2 shows some of the locally optimal designs for testing the hypothesis H_0: a_3 = 0 with respect to the same parameters λ_1, λ_2 and λ_3. Tables 1 and 2 also display the D-efficiency

    eff_D(ξ) = ( det M_k(ξ, θ) / det M_k(ξ_D, θ) )^{1/(2k)}

of the respective designs with respect to the D-optimal design ξ_D in the model η_k, to evaluate how the optimal discrimination design performs for estimating the model parameters if η_k has been found to be the true model. This gives us the opportunity to compare the different optimality criteria under consideration in this article. We find that the D_{2(3−2)}-optimal designs have somewhat higher D-efficiencies than the optimal designs for testing a_3 = 0, i.e., the D_{2(k−s)}-optimal designs are more efficient for the parameter estimation subsequent to the model discrimination procedure. A heuristic explanation may be that a design which is, by the construction of its optimality criterion, optimal for estimating 2(k − s) parameters of some model will be more efficient for estimating all 2k parameters than a design that is optimal for the estimation of only k − s parameters in the same model.

In the example λ = (1, 0.5, 1.5)^T, part (2) of Theorem 3.1 applies, so that the upper bound for n*(λ) in the design problem η_3 vs. η_2 is given by 2k = 6. From Table 1 we see that the optimal design attains this bound, i.e., the bound is sharp. We further observe from Tables 1 and 2 that if λ_1 and λ_2 are close, the optimal support points are less spread out on the real axis than for more distant values of λ_1 and λ_2, for both criteria.

From a heuristic point of view, it seems reasonable to use designs which allocate equal weights to some equidistant points on the real axis. In order to investigate the performance of such uniform designs, we calculated several of their D_{2(3−2)}-efficiencies, where the D_{2(k−s)}-efficiency of a design ξ is defined by

    eff_{D_{2(k−s)}}(ξ) = ( det C_{A_{s,k}}(ξ, θ) / det C_{A_{s,k}}(ξ*, θ) )^{1/(2(k−s))}.
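Both efficiencies are straightforward to compute once the reference designs are available; in the sketch below (our illustration, based on det_C and info_matrix from Section 2), the locally D-optimal and D_{2(k−s)}-optimal designs are assumed to be precomputed inputs.

```python
import numpy as np

def D_eff(design, design_D, a, lam):
    """D-efficiency of `design` relative to a precomputed D-optimal design `design_D`."""
    k = len(a)
    num = np.linalg.det(info_matrix(*design, a, lam))
    den = np.linalg.det(info_matrix(*design_D, a, lam))
    return (num / den) ** (1.0 / (2 * k))

def D2ks_eff(design, design_opt, a, lam, s):
    """D_{2(k-s)}-efficiency of `design` relative to the locally optimal `design_opt`."""
    k = len(a)
    ratio = det_C(*design, a, lam, s) / det_C(*design_opt, a, lam, s)
    return ratio ** (1.0 / (2 * (k - s)))
```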

We choose as competing designs three different uniform distributions whose supports contain the point zero, namely:

1. The seven point design ξ_7 allocating weight 1/7 to each of the points 0, 2, 4, 6, 8, 10, 12. This design was chosen since the upper bound for the number of support points of the optimal design for the problem under consideration is precisely seven. Often the experimenter does not want to take measurements at too many different experimental conditions, so ξ_7 seems attractive from this point of view.
2. The design ξ_16 assigning equal weight to the 16 points 0, 1, 2, ..., 15, which is more spread out on the real axis than ξ_7. This design could also be used for further model checks since it contains considerably more support points than the number of parameters to be estimated in the largest model η_3.
3. The uniform design ξ_100 on the 100 points (n − 1)/10, n = 1, 2, ..., 100, approximating the uniform distribution on [0, 10].

The parameter combinations for this example are given by λ = (1, 0.5, 1.5)^T, representing the situation of relatively "similar" exponential terms in model (1), and λ = (1, 0.2, 5)^T, representing very "different" exponential terms. The D_{2(3−2)}-efficiencies of the uniform designs defined above for these parameter values are shown in Table 3. For comparison, we also present the D_{2(3−2)}-efficiencies of the designs ξ*_3, which are locally optimal for testing H_0: a_3 = 0 in model η_3, as well as the D_{2(3−2)}-efficiencies of the locally D- and E-optimal designs in the largest model η_3, ξ_D and ξ_E, respectively.

Table 3
Selected D_{2(3−2)}-efficiencies of some uniform designs for the discrimination problem η_3 vs. η_2

λ1  λ2   λ3   ξ_7      ξ_16   ξ_100  ξ*_3   ξ_D   ξ_E
1   0.5  1.5  0.004    0.198  0.560  0.897  0.88  0.90
1   0.2  5    ≈ 10^−6  0.003  0.212  0.426  0.87  0.91

We learn from Table 3 that the different uniform designs under consideration perform very poorly when applied to the problem at hand. Only the design ξ_100 might yield reasonable results for particular choices of λ, but it is not very attractive from a practical point of view, since it requires taking measurements at 100 different experimental conditions.


Table 4
Selected locally D_{2(3−1)}-optimal discrimination designs for the problem η_3 vs. η_1

λ1  λ2   λ3   x2     x3     x4     x5     x6      w2     w3     w4     w5     w6     eff_D
1   0.2  5    0.166  0.574  1.751  4.185  10.104  0.146  0.133  0.130  0.239  0.250  0.939
1   0.5  1.5  0.340  1.135  2.600  4.662   8.177  0.134  0.117  0.181  0.245  0.250  0.909
1   0.8  1.5  0.311  1.008  2.302  4.082   6.896  0.139  0.124  0.170  0.239  0.250  0.922
1   0.8  5    0.151  0.507  1.301  2.808   5.279  0.150  0.155  0.133  0.203  0.248  0.959

Table 5
Selected locally optimal discrimination designs for testing the hypothesis H_0: a_2 = a_3 = 0 in model η_3

λ1  λ2   λ3   x2     x3     x4     x5     x6      w2     w3     w4     w5     w6     eff_D
1   0.2  5    0.118  0.607  1.800  3.907  11.558  0.087  0.196  0.203  0.219  0.290  0.593
1   0.5  1.5  0.272  1.128  2.668  4.809   8.866  0.107  0.126  0.154  0.171  0.392  0.807
1   0.8  1.5  0.247  1.016  2.362  4.257   7.434  0.110  0.124  0.142  0.163  0.409  0.803
1   0.8  5    0.122  0.542  1.387  2.762   5.787  0.092  0.175  0.177  0.169  0.370  0.710

Table 6
Selected locally W-optimal discrimination designs for the problems η_3 vs. η_2 (weight β_3 = 2/3) and η_2 vs. η_1 (weight β_2 = 1/3)

λ1  λ2   λ3   x2     x3     x4     x5     x6      w2     w3     w4     w5     w6     eff_D
1   0.2  5    0.163  0.565  1.677  4.248  10.503  0.189  0.157  0.152  0.195  0.180  0.983
1   0.5  1.5  0.318  1.142  2.555  4.621   8.534  0.147  0.132  0.167  0.219  0.257  0.926
1   0.8  1.5  0.288  1.017  2.295  4.064   7.160  0.144  0.134  0.159  0.209  0.277  0.921
1   0.8  5    0.147  0.503  1.261  2.808   5.531  0.185  0.161  0.145  0.183  0.204  0.978

The results with respect to the uniform designs are particularly interesting, since increasing the number of support points frequently means moving further away from the optimal design. It is therefore remarkable that, as the uniform design with 100 points is approached, the efficiency does not collapse entirely. The designs ξ*_3, ξ_D and ξ_E, in contrast, are expected to show a good performance. The use of an optimal design with respect to one of the criteria defined in the previous section, or of a design that is good for estimating the full parameter vector in the largest model under consideration, is thus strongly recommended.

We now turn to the discrimination problem between the models η_3 and η_1. Several examples for this problem are presented in Tables 4 and 5. For comparison, we also display the corresponding D-efficiencies eff_D. According to Lemma 3.2, the supports of the optimal designs again contain the point zero, but it is allocated considerably less weight than in the previous examples. The highest optimal support point, however, is given much more weight than before, particularly for the designs that are optimal for testing the hypothesis H_0: a_2 = a_3 = 0. Moreover, we can see from Tables 4 and 5 that for these design problems the optimal designs are also supported at exactly 2k = 6 points. We observed the same for numerous other examples, which are not displayed here for brevity.

In the next examples we are interested in designs which are good for discriminating between the models η_3 and η_2 on the one hand, and η_2 and η_1 on the other hand. Table 6 displays optimal discriminating designs with respect to the weights β_3 = 2/3 and β_2 = 1/3, i.e., the discrimination between the models η_3 vs. η_2 is considered twice as important as the problem η_2 vs. η_1.

4.2. Maximin optimal designs

In the last part of this section, we consider the problem of robustness of designs with respect to the choice of the initial parameters. The designs presented above are all locally optimal, i.e., optimal with respect to one particular parameter λ.


Table 7
Selected standardized maximin D_{2(2−1)}-optimal discrimination designs for the problem η_2 vs. η_1

λ1(l)  λ1(u)  λ2(l)  λ2(u)  x1  x2    x3    x4    x5    w1    w2    w3    w4    w5    min eff
0.8    1      1.1    1.3    0   0.47  1.78  4.08  –     0.12  0.23  0.22  0.44  –     0.9665
0.6    1      1.1    1.5    0   0.48  1.76  4.21  –     0.13  0.24  0.23  0.40  –     0.8658
0.4    1      1.1    1.7    0   0.46  1.46  3.17  5.94  0.14  0.22  0.19  0.25  0.20  0.7739
0.2    1      1.1    1.9    0   0.48  1.50  3.41  7.22  0.15  0.22  0.19  0.27  0.17  0.7194

Table 8
Selected standardized maximin D_{2(3−2)}-optimal discrimination designs for the problem η_3 vs. η_2

Ω    x1  x2    x3    x4    x5    x6    x7    w1     w2     w3     w4     w5     w6     min eff
Ω_1  0   0.24  0.94  2.07  3.82  7.18  –     0.083  0.162  0.153  0.144  0.166  0.293  0.923
Ω_2  0   0.24  0.94  2.09  3.77  6.76  –     0.070  0.142  0.141  0.140  0.165  0.342  0.932
Ω_3  0   0.24  0.93  2.03  3.66  5.18  7.23  0.081  0.157  0.146  0.142  0.145  0.069  0.817

It is not necessary to show the weight w_7 of the optimal design with respect to Ω_3, since w_7 = 1 − Σ_{i=1}^{6} w_i.

If not even the number k of exponential terms in model (1) is known, then presumably even less knowledge will be available about the values of the model parameters. We will therefore also present some examples of optimal discrimination designs for the case that only some range for each nonlinear parameter can be given in advance of the experiment.

As a first example, we consider the problem of designing an experiment for discriminating between the models η_2 and η_1, adopting the maximin approach described in (5). The criterion function Φ_{−∞}(ξ) is thus given by

    Φ_{−∞}(ξ) = min_{λ∈Ω} det C_{A_{1,2}}(ξ, λ) / det C_{A_{1,2}}(ξ*_λ, λ).
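Evaluating the maximin criterion numerically amounts to taking a minimum of local efficiencies over (a discretization of) Ω. The sketch below is ours: det_C is the helper from Section 2, and the locally optimal designs for the grid points are assumed to be precomputed.

```python
import itertools
import numpy as np

def maximin_value(design, grid, local_optima, a, s=1):
    """min over lam in the grid of det C(design, lam) / det C(locally optimal design, lam)."""
    return min(det_C(*design, a, list(lam), s) / det_C(*opt, a, list(lam), s)
               for lam, opt in zip(grid, local_optima))

# e.g. a crude 3 x 3 grid over the region [0.8, 1] x [1.1, 1.3] for eta_2 vs. eta_1
grid = list(itertools.product(np.linspace(0.8, 1.0, 3), np.linspace(1.1, 1.3, 3)))
```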

Some selected examples of optimal designs with respect to this standardized maximin criterion are given in Table 7. The indices l and u denote the lower and upper boundaries, respectively, of the interval in which a parameter λ_i, i = 1, 2, is assumed to lie. The right column of Table 7 shows the minimal efficiency of the maximin optimal design over the rectangular region [λ_1(l), λ_1(u)] × [λ_2(l), λ_2(u)]. We observe that the maximin optimal designs yield reasonable efficiencies over a broad range of the parameter space. Even for the large parameter range [0.2, 1] × [1.1, 1.9], corresponding to high uncertainty with respect to the model parameters, the efficiency is about 72%.

From Theorem 3.1 we conclude that locally optimal designs for the problem η_2 vs. η_1 have at most k(k + 1)/2 + 1 = 4 support points. For the maximin optimal designs this is in general not true. Table 7 shows that for relatively small parameter regions the standardized maximin optimal designs are also supported at four points, whereas for larger regions, i.e., more uncertainty about the actual position of the parameters, more than four support points are necessary. This phenomenon is widely observed in the literature for Bayesian and maximin optimal design problems; for a theoretical explanation we refer to Braess and Dette (2007).

As a second example we finally consider the problem of discriminating between the models η_3 and η_2. Some selected standardized maximin optimal designs for this problem are displayed in Table 8, where the parameter regions are specified by

    Ω_1 = [0.6, 0.8] × [1.1, 1.3] × [1.6, 1.8],
    Ω_2 = [0.8, 1] × [1.1, 1.3] × [1.4, 1.6],
    Ω_3 = [0.6, 0.9] × [1, 1.4] × [1.5, 1.8].

Again, we observe that for larger parameter regions the number of support points of the standardized maximin optimal design increases. Moreover, even in the case of three unknown parameters the maximin optimal designs yield reasonable efficiencies over the different parameter spaces.


Acknowledgment

The support of the Deutsche Forschungsgemeinschaft (SFB 475, Komplexitätsreduktion in multivariaten Datenstrukturen, Teilprojekt A2, Sachbeihilfe De 502/18-1, 436 RUS 113/712/0-1) is gratefully acknowledged.

Appendix A

Proof of Lemma 3.1. Note that for each choice of γ > 0 and each design ξ = (x_1, ..., x_l; w_1, ..., w_l) we obtain

    M_k(x_1, ..., x_l, w_1, ..., w_l, λ) = B M_k(x_1/γ, ..., x_l/γ, w_1, ..., w_l, γλ) B,

where B = diag(1, γ, ..., 1, γ) ∈ R^{2k×2k}. The assertion of the lemma then follows from elementary properties of determinants and matrix inversion. □

Proof of Lemma 3.2. We consider a design of the form

    ξ_h = ( x_1 + h  ...  x_n + h
            w_1      ...  w_n ),

with h ≥ 0. The determinant of the Fisher information of such a design can be expressed as

    det M_l(ξ_h, θ) = e^{−4(λ_1 + ··· + λ_l)h} det M_l(ξ_0, θ).

It is now obvious that the discrimination criteria considered in this article are decreasing in h. Hence, for each design whose minimal support point is greater than zero we can find a design that is better with respect to the optimality criteria and whose minimal support point equals zero. Consequently, the minimal support point of an optimal design must be equal to zero. □

Proof of Theorem 3.1. One of the basic tools in the field of optimal experimental design is the equivalence theorem; see, for example, Kiefer (1974). As we will need the equivalence theorem for the W-optimality criterion in the proof of Theorem 3.1, it is stated below. It is given here in the (simpler) form for designs with at least 2k support points (so that the information matrix M_k of the largest model under consideration is invertible), since this form serves our purpose best.

Theorem 5.1. A design ξ* with at least 2k support points is an optimal design for discriminating in the class {η_1, ..., η_k} with respect to the prior {β_1, ..., β_k} if and only if the inequality

    ψ(x) = Σ_{l=1}^{k} β_l ( f_l^T(x, θ) M_l^{−1}(ξ*, θ) f_l(x, θ) − f_{l−1}^T(x, θ) M_{l−1}^{−1}(ξ*, θ) f_{l−1}(x, θ) ) − 2 Σ_{l=1}^{k} β_l ≤ 0    (8)

holds for all x.

Moreover, there is equality in (8) at every support point of the optimal design ξ*.

Since the criterion for discrimination between the models η_k and η_s is a special case of the compound criterion W(ξ), the corresponding equivalence theorem is obtained from Theorem 5.1 with β_l = 1 for l > s and β_l = 0 otherwise. For the proof of Theorem 3.1 we will also need the following auxiliary lemma.

Lemma 5.1. Consider the regression model η̄_k(x, θ) = Σ_{i=1}^{k} (θ_{2i−1} e^{−λ_i x} + θ_{2i} e^{−(λ_i + Δ)x}), where θ_1, ..., θ_{2k} are unknown parameters and λ_1, ..., λ_k and Δ are known values such that 0 < Δ < min_{i≠j} |λ_i − λ_j|. Then the number n of support points of the W-optimal discrimination design in this model is bounded by n ≤ k(k + 1)/2 + 1. If, in addition, λ lies in a neighborhood U(Λ) of a class Λ as defined in (6), an upper bound is given by n ≤ 2k.

Proof. We assume that the optimal design is supported at n different points x_1*, ..., x_n*. Denote by ψ̄(x) the directional derivative from the equivalence theorem for the model η̄_k. It then follows that the equations

    ψ̄(x_i*) = 0,  i = 1, ..., n,    ψ̄′(x_i*) = 0,  i = 2, ..., n − 1,    (9)

hold,


i.e., the function ψ̄(x) has at least 2n − 2 roots, counting multiplicities. We consider the set {λ_i + λ_j}_{i,j=1,...,k} and refer to its distinct elements as u_1, ..., u_r (without loss of generality u_1 < ··· < u_r). Obviously, r ≤ k(k + 1)/2, and for λ ∈ Λ in the sense of (6) we obtain r = 2k − 1. The function ψ̄(x) can be rewritten as a sum of 2r + 2 functions as follows:

    ψ̄(x) = Σ_{i=1}^{2r+1} ḡ_i(x) − ḡ_0(x),

where

    ḡ_0(x) = 2 Σ_{l=1}^{k} β_l,
    ḡ_1(x) = c_0 e^{−u_1 x},
    ḡ_2(x) = −b_1 e^{−(u_1 + Δ)x},
    ḡ_{2l+1}(x) = a_l e^{−(u_l + 2Δ)x} + c_l e^{−u_{l+1} x},   l = 1, ..., r − 1,
    ḡ_{2l}(x) = −b_l e^{−(u_l + Δ)x},   l = 2, ..., r,
    ḡ_{2r+1}(x) = a_{r+1} e^{−(u_r + 2Δ)x}.

In what follows we will show that the coefficients of these functions are all positive, i.e., a_l, b_l and c_l > 0. It is then easy to see that the 2r + 2 functions ḡ_0(x), ..., ḡ_{2r+1}(x) form a Tchebycheff system; see Karlin and Studden (1966, p. 9). The function ψ̄(x) can thus have at most 2r + 1 roots. Since it follows from (9) that the number of roots is at least 2n − 2, we obtain the inequality 2n − 2 ≤ 2r + 1, which implies n ≤ r + 3/2. Plugging in the upper bounds for r derived above, we obtain n ≤ k(k + 1)/2 + 1 for general λ. If λ ∈ Λ is of form (6), we get the bound n ≤ 2k. From continuity considerations it then follows that this bound is also valid in some neighborhood of Λ.

It remains to show that a_l > 0, b_l > 0 and c_l > 0 for all l. For simplicity we assume that β_k = 1, β_{k−1} = ··· = β_1 = 0. (Otherwise we simply obtain a sum of similar terms.) Note that the function from the equivalence theorem,

    ψ̄(x) = f̄_k^T(x) M̄_k^{−1}(ξ*) f̄_k(x) − f̄_{k−1}^T(x) M̄_{k−1}^{−1}(ξ*) f̄_{k−1}(x) − 2,

can be rewritten in the form

    ψ̄(x) = f̄_k^T(x) A f̄_k(x) − 2,

where

    A = M̄_k^{−1}(ξ*) − ( M̄_{k−1}^{−1}  0
                          0             0 ).

The first goal is to show that sign(A)_{i,j} = (−1)^{i+j}. For this we partition the matrices M̄_k and A into blocks of the same size,

    M̄_k(ξ) = ( M_11  M_12
                M_21  M_22 ),   A = (  M_11^{−1} M_12 P^{−1} M_21 M_11^{−1}   −M_11^{−1} M_12 P^{−1}
                                      −P^{−1} M_21 M_11^{−1}                  P^{−1} ),    (10)

with M_11 = M̄_{k−1}(ξ). The representation of the matrix A follows from the inversion formula for block matrices, where P = M_22 − M_21 M_11^{−1} M_12 is the Schur complement of M_11. As a consequence of Karlin and Studden (1966, p. 9) and the Cauchy–Binet formula we find that sign(M̄_k^{−1})_{i,j} = (−1)^{i+j} or, in other words, J M̄_k^{−1} J >_c 0, where J = diag{+1, −1, ..., +1, −1} ∈ R^{2k×2k}, and the notation '>_c 0' means that all entries of a matrix are greater than 0. From the representation of A in (10) we conclude that

    J A_22 J >_c 0,   J A_12 J = −J M_11^{−1} M_12 P^{−1} J >_c 0   and   J A_21 J >_c 0.


Furthermore, from a straightforward but tedious calculation it follows that

    J A_11 J = J M_11^{−1} M_12 P^{−1} · P · P^{−1} M_21 M_11^{−1} J >_c 0,

and consequently the result J A J >_c 0 holds for the matrix A. It can therefore easily be checked that a_l > 0, b_l > 0 and c_l > 0 in the expressions for the ḡ_i(x). □

The same assertion as in Lemma 5.1 remains to be proven for models of form (1) to obtain the results of Theorem 3.1.

Proof of Theorem 3.1. We denote by n*(λ) the number of distinct support points x_1*, ..., x*_{n*(λ)} of the optimal design ξ*. Then it follows from the equivalence theorem that the equations

    ψ(x_i*) = 0,  i = 1, ..., n*(λ),    ψ′(x_i*) = 0,  i = 2, ..., n*(λ) − 1,

hold.

The directional derivative ψ(x) therefore has at least 2n*(λ) − 2 roots (counting multiplicities). Using the equalities

    lim_{Δ→0} L^T f̄_k(x) = f_k(x),    lim_{Δ→0} L^T M̄_l L = M_l,  l = 1, ..., k,

    lim_{Δ→0} det(L^T M̄_k L) / det(L^T M̄_s L) = det M_k / det M_s,

where

    L = diag{L_0, ..., L_0},   L_0 = ( 1  −1/Δ
                                       0   1/Δ ),

we obtain ψ(x) = lim_{Δ→0} ψ̄(x).

Due to continuity, the number of roots of the function ψ(x), as the limit of the functions ψ̄(x) for Δ → 0, cannot be larger than the number of roots of ψ̄(x) for some fixed value of Δ. From Lemma 5.1 it then follows that for general λ we obtain the upper bound n*(λ) ≤ k(k + 1)/2 + 1 for the number of support points of the optimal design. For λ in a class Λ as defined in (6) we even get n*(λ) ≤ 2k. By continuity, again, the bound n*(λ) ≤ 2k remains valid for values of λ in some neighborhood U(Λ) of Λ. □

Proof of Theorem 3.2. The proof is similar to the proofs in Dette et al. (2004) for more general nonlinear models, so we give only a sketch including the most important steps. By a Taylor expansion of the vector f_k(x) we obtain the representation

    f_k(x) = L_k f̃_k(x) + H(δ),

where H(δ) = (o(δ^{2k−1}), o(δ^{2k−2}), ..., o(δ^{2k−1}), o(δ^{2k−2}))^T, f̃_k(x) = (1, x, x^2, ..., x^{2k−1})^T e^{−γx}, and the matrix L_k is given by

    L_k = ( 1  δ_1  δ_1^2/2!  δ_1^3/3!  ...  δ_1^{2k−1}/(2k−1)!
            0  1    δ_1       δ_1^2/2!  ...  δ_1^{2k−2}/(2k−2)!
            ...
            1  δ_k  δ_k^2/2!  δ_k^3/3!  ...  δ_k^{2k−1}/(2k−1)!
            0  1    δ_k       δ_k^2/2!  ...  δ_k^{2k−2}/(2k−2)! )


and det L_k is proportional to Π_{i<j} (δ_i − δ_j)^4, so that L_k is nonsingular whenever the δ_i are distinct. Consequently, M_l(ξ, θ) can be approximated by L_l M̃_{2l}(ξ) L_l^T, where M̃_{2l}(ξ) denotes the information matrix for the parameter vector in the regression model η̃_{2l}(x) defined in (7). For values of δ near zero, we obtain

    det A_{s,k}^T M_k^{−1}(ξ, θ) A_{s,k} = det M_s(ξ, θ) / det M_k(ξ, θ) ∼ (det L_s)^2 det M̃_{2s}(ξ) / ((det L_k)^2 det M̃_{2k}(ξ)),

which proves the first assertion of Theorem 3.2, since the D_{2(k−s)}-criterion minimizes the expression det A_{s,k}^T M_k^{−1}(ξ, θ) A_{s,k}. For the W-optimal design we use a similar argument and rewrite the criterion in the form

    Π_{l=1}^{k} ( det M_l(ξ, θ) / det M_{l−1}(ξ, θ) )^{β_l} → max_ξ.

Consequently, for values of δ near zero,

    arg max_ξ Σ_{l=1}^{k} β_l ln ( det M_l(ξ, θ) / det M_{l−1}(ξ, θ) ) ∼ arg max_ξ Σ_{l=1}^{k} β_l ln ( det M̃_{2l}(ξ) / det M̃_{2l−2}(ξ) ).

(det Lk )2 (det L[s+k] )

2

·

2k () det M ,  det Ms+k ()

which completes the proof of the theorem.



References

Alvarez, I., Virto, R., Raso, J., Condon, S., 2003. Comparing predicting models for the Escherichia coli inactivation by pulsed electric fields. Innovative Food Science and Emerging Technologies 4, 195–202.
Atkinson, A.C., Fedorov, V.V., 1975a. The design of experiments for discriminating between two rival models. Biometrika 62, 57–70.
Atkinson, A.C., Fedorov, V.V., 1975b. Optimal design: experiments for discriminating between several models. Biometrika 62, 289–303.
Becka, M., Bolt, H.M., Urfer, W., 1992. Statistical analysis of toxicokinetic data by nonlinear regression (example: inhalation pharmacokinetics of propylene). Arch. Toxicol. 66, 450–453.
Becka, M., Bolt, H.M., Urfer, W., 1993. Statistical evaluation of toxicokinetic data. Environmetrics 4, 311–322.
Braess, D., Dette, H., 2007. On the number of support points of maximin and Bayesian D-optimal designs in nonlinear regression models. Ann. Statist. 35 (2), in press.
Chernoff, H., 1953. Locally optimal designs for estimating parameters. Ann. Math. Statist. 24, 586–602.
Dette, H., 1990. A generalization of D- and D1-optimal designs in polynomial regression. Ann. Statist. 18, 1784–1805.
Dette, H., 1994. Discrimination designs for polynomial regression on a compact interval. Ann. Statist. 22, 890–904.
Dette, H., 1997. Designing experiments with respect to standardized optimality criteria. J. Roy. Statist. Soc. Ser. B 59 (1), 97–110.
Dette, H., Melas, V.B., Pepelyshev, A., 2004. Optimal designs for a class of nonlinear regression models. Ann. Statist. 32 (5), 2142–2167.
Dette, H., Melas, V.B., Wong, W.K., 2005. Optimal design for goodness-of-fit of the Michaelis–Menten enzyme kinetic function. J. Amer. Statist. Assoc. 100 (472), 1370–1381.
Dette, H., Melas, V.B., Pepelyshev, A., 2006a. Local c- and E-optimal designs for exponential regression models. Ann. Inst. Statist. Math. 58, 407–426.
Dette, H., Melas, V.B., Wong, W.K., 2006b. Locally D-optimal designs for exponential regression models. Statist. Sinica 16, 789–803.
Ermakov, S.M., Melas, V.B., 1995. Design and Analysis of Simulation Experiments. Kluwer Academic Publishers, Dordrecht, London.
Gibaldi, M., Perrier, D., 1982. Pharmacokinetics. Marcel Dekker, New York.
Jennrich, R.I., 1969. Asymptotic properties of non-linear least squares estimators. Ann. Math. Statist. 40, 633–643.
Karlin, S., Studden, W.J., 1966. Tchebycheff Systems: With Applications in Analysis and Statistics. Wiley, New York.
Kiefer, J., 1974. General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849–879.


Landaw, E.M., 1985. Robust sampling designs for compartmental models under large prior eigenvalue uncertainties. Math. Comput. Biomed. Appl. 34, 181–187.
Läuter, E., 1974. Experimental design for a class of models. Math. Oper. Statist. 5, 379–398.
Melas, V.B., 1978. Optimal designs for exponential regression. Math. Oper. Statist. Series Statistics 9, 45–59.
Pukelsheim, F., 1993. Optimal Design of Experiments. Wiley, New York.
Seber, G.A.F., Wild, C.J., 1989. Nonlinear Regression. Wiley, New York.
Silvey, S.D., 1980. Optimal Design. Chapman & Hall, London.
Srivastava, J.N., 1975. Designs for searching non-negligible effects. In: Srivastava, J.N. (Ed.), A Survey of Statistical Design and Linear Models. Elsevier Science B.V., Amsterdam, North-Holland, pp. 507–519.
Srivastava, J.N., 1977. Optimal search designs or designs optimal under bias-free optimality criteria. In: Gupta, S.S., Moore, D.S. (Eds.), Proceedings of International Symposium on Decision Theory. Academic Press, New York, pp. 375–409.
Stigler, S.M., 1971. Optimal experimental design for polynomial regression. J. Amer. Statist. Assoc. 66, 311–318.
Studden, W.J., 1982. Some robust-type D-optimal designs in polynomial regression. J. Amer. Statist. Assoc. 77, 916–921.
Tsai, M., Zen, M., 2004. Criterion robust optimal designs for model discrimination and parameter estimation: multivariate polynomial regression case. Statist. Sinica 14 (2), 591–601.
Zen, M., Tsai, M., 2002. Some criterion-robust optimal designs for the dual problem of model discrimination and parameter estimation. Sankhya, Ser. B 64 (3), 322–338.
Zen, M., Tsai, M., 2004. Criterion-robust optimal designs for model discrimination and parameter estimation in Fourier regression models. J. Statist. Plann. Inference 124 (2), 475–487.