Optimal Experimental Design


Anthony C Atkinson, Department of Statistics, London School of Economics, London, UK
© 2015 Elsevier Ltd. All rights reserved.

Abstract

The methods of optimal experimental design allow cogent comparisons of designs and provide algorithms for the construction of designs in nonstandard cases. This article covers optimal design for simple regression and then develops the general theory. A sequence of designs for specific models of increasing complexity includes the $2^m$ factorials and their fractions and generalized linear models, particularly the logistic linear model for binary data. Results on parameter uncertainty are applied to designs for conjoint choice experiments and nonlinear regression. Reference is made to several recent books and to sources of numerical algorithms.

Introduction

Theoretical results from optimal experimental design allow the comparison of proposed designs and the selection of the design that is most efficient for the specific purpose of the experiment. For example, the power of important tests of hypotheses may be maximized. Methods derived from the theory provide algorithms for the construction of designs in nonstandard cases, such as those when certain treatment combinations are impossible or when responses are nonnormal.

Many experiments in behavioral science have a simple treatment structure. Study 3 of Correll et al. (2007) on the decision to shoot by police officers uses a four-factor design with all factors at two levels, that is, a full $2^4$ factorial design. In contrast, the choice experiments described in Section Conjoint Choice Designs of this article have a complicated structure, the selection of items to be compared depending on prior beliefs about parameter values. Optimal design algorithms are essential for the efficient design of such experiments.

Although the factorial design of Correll et al. (2007) has a simple structure, Sherman and Strang (2004) stress the close relationship between experimental designs in behavioral science and those in agriculture and medicine, where there may be important heterogeneity between experimental subjects. The data analysis of Correll et al. (2007) is for the 58 participants who completed the full set of factorial tasks, for whom some information is given on gender and ethnicity. However, there is no analysis of the heterogeneity in responses using these covariates, perhaps because of serious imbalance: only one participant was black. Atkinson and Biswas (2014: Chapter 6) describe optimal design methods for balancing treatment allocation over covariates in clinical trials.

This article starts in Section Simple Regression with optimal design for simple regression. The next three sections develop the general theory of optimal design, before attention turns to designs for specific models of increasing complexity. Section $2^m$ Factorial Designs and Fractions covers the $2^m$ factorials and their fractions. Generalized linear models are in Section Generalized Linear Models, with a discussion of the effects of parameter uncertainty on designs for nonlinear models in Section Optimal Design and Parameter Uncertainty. Section Locally Optimal Designs for Logistic Regression in One Variable illustrates these topics with designs for the linear logistic model. Designs for the related conjoint choice experiments are in Section Conjoint Choice Designs, with designs for nonlinear regression models in Section Nonlinear Regression Models. The article concludes with a brief mention of the algorithms for design construction that are such an important consequence of the theory of optimal design.

Simple Regression

Many optimal designs minimize the variances of estimated parameters, or the variances of functions of the parameters, such as predictions. We therefore need to specify a model. The simplest interesting case is linear regression in one variable, which serves to introduce some important ideas. Section Locally Optimal Designs for Logistic Regression in One Variable describes designs for models with binary responses.

There is a single response y, the expected value of which depends linearly on the value of the variable x through the relationship $E(y) = \beta_0 + \beta_1 x$. The design problem is to choose N values of x at which measurements are to be taken. Although there are N observations, the number of distinct values of x used in the experiment, n, may be much less than N. These n values of x must lie in the experimental region $\mathcal{X}$, determined by experimental constraints. For a single variable we can scale the values of x such that $-1 \le x \le 1$, where $\pm 1$ correspond to the minimum and maximum values of the variable.

When the variance of the observations is constant, that is $\operatorname{var}(y) = \sigma^2$, least squares is the appropriate method of parameter estimation. The least squares estimators of the parameters are $\hat\beta_0$ and $\hat\beta_1$, with

$$\operatorname{var}(\hat\beta_1) = \sigma^2 \Big/ \sum (x_i - \bar{x})^2 \qquad [1]$$

where the sample average $\bar{x} = \sum x_i / N$. Here and in eqn [1], the summations are over all N values $x_i$. From eqn [1], the variance of $\hat\beta_1$ is minimized when $\sum (x_i - \bar{x})^2$ is maximized. For a fixed number of trials N this occurs when all trials are at $+1$ or $-1$. If N is even, exactly half the trials will be at $x = +1$ and the other half at $x = -1$. This design has $\bar{x} = 0$, so that $\operatorname{var}(\hat\beta_0) = \sigma^2/N$, which does not depend any further on the design. This design is therefore optimal for both parameters.

International Encyclopedia of the Social & Behavioral Sciences, 2nd edition, Volume 17

http://dx.doi.org/10.1016/B978-0-08-097086-8.42041-6

This design exhibits several features of optimal designs. The number of support points of the design n, here two, may be appreciably less than the number of trials N. The fine structure of the design depends on the value of N, here whether it is even or odd; when N is odd, one more trial is put at one end of the design region than at the other, and it does not matter which. A third feature is that the design covers the whole of the design region, using the most extreme values of x available. In any application of the design, the order in which the two treatments $x = -1$ and $x = 1$ are applied should be randomized to avoid any confounding with omitted variables. The optimal design found here for parameter estimation also has a minimax property for prediction: it minimizes the maximum variance of the prediction $\hat{y}(x)$ over $\mathcal{X}$.

Optimal designs are tailored to the model that is assumed; particularly when there is one factor, the designs often have the same number of design points as there are parameters. An example is the design at two points for the two-parameter regression model. An objection to such a design is that it does not provide for checking the model, for which trials at three or more values of x are needed. This suggests that the design should also be efficient in case the relationship is a second-order model

$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 \qquad [2]$$

It is always possible that an even higher-order model is required, but third- or higher-order models are rarely needed. If such models seem to be required, data transformations, such as the power transformations analyzed by Box and Cox (1964), are often called for.

Design for the three-parameter second-order model, eqn [2], introduces further aspects of the dependence of the optimal design on the precise purpose of experimentation. Unlike the two-point design for the first-order model, there is now no overall optimal design for all parameters. Testing whether the first-order model is adequate is equivalent to testing whether $\beta_2 = 0$. The variance of $\hat\beta_2$ is minimized by a design that puts half the trials at $x = 0$ and divides the remaining half between $x = -1$ and $x = 1$. But the design that provides the best estimates of all three parameters, in the sense of D-optimality described in the next section, puts one-third of the trials at each of these three points. Satisfactorily, the optimal design thus depends on the precise purpose of the experiment.
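As a rough numerical illustration of eqn [1], the sketch below compares the factor multiplying $\sigma^2$ in the variance of the estimated slope for two designs with N = 6 trials. The function name and both example designs are illustrative assumptions and do not come from the article.

```python
def slope_variance_factor(xs):
    """Return var(beta1_hat) / sigma^2 = 1 / sum((x_i - xbar)^2), following eqn [1]."""
    xbar = sum(xs) / len(xs)
    return 1.0 / sum((x - xbar) ** 2 for x in xs)


# Two hypothetical designs with N = 6 trials on the scaled region [-1, 1].
two_point = [-1, -1, -1, 1, 1, 1]               # half the trials at each end
equally_spaced = [-1, -0.6, -0.2, 0.2, 0.6, 1]  # trials spread over the region

for name, design in [("two-point", two_point), ("equally spaced", equally_spaced)]:
    print(name, slope_variance_factor(design))
```

Under these assumptions the two-point design gives the smaller factor, in line with the argument that the sum of squares about the mean is largest when all trials are at the ends of the region.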

Criteria of Optimality

This section mainly describes the criterion of D-optimality, which provides designs minimizing the generalized variance of the estimated parameters. We consider a general experiment in which there are m factors and write the linear model as

$$E(y) = F\beta \qquad [3]$$

Here y is the $N \times 1$ vector of responses, $\beta$ is a vector of p unknown parameters, and F is the $N \times p$ extended design matrix. The ith row of F is $f^T(x_i)$, a known function of the m explanatory variables. For the quadratic model eqn [2], for which m = 1 and p = 3, $f^T(x_i) = (1 \;\; x_i \;\; x_i^2)$. The design is determined by the experimental values of x. The extended design matrix reflects not only the design but also the model to be fitted.


For the moment we continue to assume that the observational errors are independent with constant variance $\sigma^2$. The least squares estimator of the parameters can be written as

$$\hat\beta = (F^T F)^{-1} F^T y \qquad [4]$$

where the $p \times p$ matrix $F^T F$ is the information matrix for $\beta$. The larger $F^T F$, the greater is the information in the experiment. With $\sigma^2$ constant, the covariance matrix of the least squares estimator is

$$\operatorname{var}(\hat\beta) = \sigma^2 (F^T F)^{-1} \qquad [5]$$

When interest is in the comparison of experimental designs, the value of $\sigma^2$ is not relevant, since it is the same for all proposed designs for a particular experiment.

The volume of the confidence ellipsoid for all p elements of $\beta$ is inversely proportional to the square root of the determinant $|F^T F|$. Designs which maximize $|F^T F|$ minimize the generalized variance of $\hat\beta$ and are called D-optimal (for Determinant).

Interest in the fitted model may be not only in the parameters, but also in the quality of the predictions. The predicted value at x of the fitted response for the p-parameter model eqn [3] is

$$\hat{y}(x) = \hat\beta^T f(x) \qquad [6]$$

with variance

$$\operatorname{var}\,\hat{y}(x) = \sigma^2 f^T(x) (F^T F)^{-1} f(x) \qquad [7]$$

Designs which minimize the maximum over $\mathcal{X}$ of $\operatorname{var}\,\hat{y}(x)$ are called G-optimal. As the theorem of Section The General Equivalence Theorem and Design Efficiency indicates, as N increases, D-optimality and G-optimality give increasingly similar designs.

An alternative to this minimax approach to $\operatorname{var}\,\hat{y}(x)$ is to find designs that minimize the average value of the variance over $\mathcal{X}$. Such designs are variously called I-optimal or V-optimal, from Integrated Variance. Another criterion of importance in applications, particularly for nonlinear models, is that of c-optimality, in which the variance of the linear combination $c^T \hat\beta$ is minimized, where c is a $p \times 1$ vector of constants. When c is a vector of zeroes and ones, the criterion selects a subset of parameters and reduces to $D_S$-optimality. We do not exhibit such designs in this article, but focus on D- and G-optimality. Details of these and other design criteria are in Chapter 10 of Atkinson et al. (2007). Goos and Jones (2011: Section 4.3.5) stress the advantages of I-optimality.
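The D-criterion can be illustrated with a short calculation. The sketch below, which assumes the quadratic model of eqn [2] and two hypothetical exact designs with N = 6, builds the information matrix $F^T F$ and compares determinants. The helper names are illustrative only.

```python
def extended_row(x):
    """f(x) for the quadratic model of eqn [2]: (1, x, x^2)."""
    return [1.0, x, x * x]


def information_matrix(xs):
    """F^T F for an exact design with one trial at each value listed in xs."""
    rows = [extended_row(x) for x in xs]
    return [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical exact designs with N = 6 on [-1, 1].
thirds = [-1, -1, 0, 0, 1, 1]              # two trials at each of -1, 0, 1
spaced = [-1, -0.6, -0.2, 0.2, 0.6, 1]     # six equally spaced values

det_thirds = det3(information_matrix(thirds))
det_spaced = det3(information_matrix(spaced))
print(det_thirds, det_spaced)
```

A larger determinant corresponds to a smaller generalized variance, so under these assumptions the three-point design is preferred on the D-criterion.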

Exact and Continuous Designs

Instead of finding exact optimal designs for specific values of N, there are mathematical and computational advantages in finding the limiting continuous design independent of N. Such designs can both easily be checked for optimality and readily be approximated to provide good exact designs for many N. The theory was introduced in Kiefer (1959) and developed in a series of further papers. Kiefer's work on optimal design is collected in Brown et al. (1985).

A basic idea is to write designs as probability measures on the n support points of the design

$$\xi = \begin{Bmatrix} x_1 & x_2 & \ldots & x_n \\ w_1 & w_2 & \ldots & w_n \end{Bmatrix} \qquad [8]$$

where the weights $w_i > 0$ and $\sum w_i = 1$. The measure $\xi$ therefore specifies the experimental conditions and the proportion of experimental effort at each condition. The information matrix can be written in terms of this measure as

$$M(\xi) = \sum w_i f(x_i) f^T(x_i) \qquad [9]$$

The variance of the predicted response can be written in the standardized form

$$d(x, \xi) = f^T(x) M^{-1}(\xi) f(x) \qquad [10]$$

At the end of Section Simple Regression it was stated that the D-optimal design for the quadratic model in one variable with $\mathcal{X} = [-1, 1]$ puts weight one-third at each of the three levels of x: $-1$, 0, and 1. For any N that is a multiple of three, the exact optimal design puts N/3 trials at each of these sets of experimental conditions. For other values of N a good design is to distribute the trials as evenly as possible over these three points.
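Eqns [9] and [10] translate directly into code. The following sketch represents a design measure as parallel lists of support points and weights and evaluates $M(\xi)$ and $d(x, \xi)$ for the first-order model with $f(x) = (1, x)$. The small linear solver and all function names are assumptions made for illustration; no particular software from the literature is implied.

```python
def information_matrix(points, weights, f):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T, as in eqn [9]."""
    p = len(f(points[0]))
    m = [[0.0] * p for _ in range(p)]
    for x, w in zip(points, weights):
        fx = f(x)
        for i in range(p):
            for j in range(p):
                m[i][j] += w * fx[i] * fx[j]
    return m


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def standardized_variance(x, points, weights, f):
    """d(x, xi) = f(x)^T M(xi)^{-1} f(x), as in eqn [10]."""
    fx = f(x)
    z = solve(information_matrix(points, weights, f), fx)
    return sum(fi * zi for fi, zi in zip(fx, z))


def first_order(x):
    return [1.0, x]


# Two-point design for simple regression: weight 1/2 at each of -1 and 1.
points, weights = [-1.0, 1.0], [0.5, 0.5]
for x in (-1.0, 0.0, 0.5, 1.0):
    print(x, standardized_variance(x, points, weights, first_order))
```

For this design $M(\xi)$ is the identity matrix, so $d(x, \xi) = 1 + x^2$, which reaches its maximum of 2 (the number of parameters) at the two support points.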

The General Equivalence Theorem and Design Efficiency

D-optimal continuous designs maximize $|M(\xi)|$. Kiefer and Wolfowitz (1959) provide an equivalence theorem, relating D- and G-optimal designs, that provides a method of checking the D-optimality of a continuous design. If $\xi^*$ is a D-optimal design measure, then $\xi^*$ also minimizes the maximum over $\mathcal{X}$ of the standardized variance of prediction $d(x, \xi)$ given by eqn [10]; that is, $\xi^*$ is also G-optimal. If we write this maximum as

$$\bar{d}(\xi) = \max_{x \in \mathcal{X}} d(x, \xi) \qquad [11]$$

then $\bar{d}(\xi^*) = p$, the number of parameters of the linear model. For any nonoptimal design $\xi$ the maximum over $\mathcal{X}$ of $d(x, \xi)$ is greater than p, which provides a method for checking for D-optimality.

The theorem holds for continuous optimal designs and, of course, for exact designs with replications that are an integer multiple of the weights of the continuous designs. But, for example, the exact D-optimal design for the quadratic model when N is not a multiple of three will not satisfy the theorem.

To compare a continuous design $\xi$ with the optimal design $\xi^*$ we define the D-efficiency as

$$D_{\mathrm{eff}} = \left\{ \frac{|M(\xi)|}{|M(\xi^*)|} \right\}^{1/p} \qquad [12]$$

The comparison of information matrices for designs that are measures removes the effect of the number of observations. Taking the pth root of the ratio of the determinants in eqn [12] results in an efficiency measure which has the dimensions of variance, irrespective of the dimension of the model. Since the variance of estimated regression coefficients is inversely proportional to the number of observations, two replicates of a design measure for which $D_{\mathrm{eff}} = 50\%$ would be as efficient as one replicate of the optimal measure.

Quadratic Model in One Variable

The claim that the final design of Section Simple Regression with weights 1/3 is D-optimal for the quadratic model when $\mathcal{X} = [-1, 1]$ can be checked using the General Equivalence Theorem through calculation of the variance of the predicted response. For this design $d(x, \xi^*)$ is the quartic given by

$$d(x, \xi^*) = 3 - 9x^2/2 + 9x^4/2 \qquad [13]$$

which has a maximum over $\mathcal{X}$ of 3 at $x = -1$, 0, or 1, the three design points. Further, this maximum value is equal to the number of parameters p. The plot in Figure 1 exhibits the optimality of the design since the values of three at the design points are indeed the maxima over $\mathcal{X}$. Equivalence theorems for other criteria are given in Chapter 10 of Atkinson et al. (2007).

[Figure 1: plot titled "General equivalence theorem", showing $d(x, \xi^*)$ on the vertical axis (1.5 to 3.0) against x on the horizontal axis (−1.0 to 1.0).]

Figure 1 Quadratic model in one variable; $d(x, \xi^*)$ over $\mathcal{X}$ for the design putting equal weights at $x = -1$, 0, and 1. Since the maximum value is three, which is the number of parameters in the model, the design is D-optimal.
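The D-efficiency of eqn [12] can be computed in a few lines. As a sketch, the code below compares a hypothetical design with weights (1/4, 1/2, 1/4) at $x = -1$, 0, 1 against the equal-weight D-optimal design for the quadratic model. The competing design and the helper names are assumptions chosen for illustration.

```python
def information_matrix(points, weights):
    """M(xi) for the quadratic model with f(x) = (1, x, x^2)."""
    rows = [[1.0, x, x * x] for x in points]
    return [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(3)] for i in range(3)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def d_efficiency(weights, optimal_weights, points, p=3):
    """D_eff = {|M(xi)| / |M(xi*)|}^(1/p), as in eqn [12]."""
    ratio = det3(information_matrix(points, weights)) / det3(
        information_matrix(points, optimal_weights))
    return ratio ** (1.0 / p)


points = [-1.0, 0.0, 1.0]
optimal = [1 / 3, 1 / 3, 1 / 3]
alternative = [0.25, 0.5, 0.25]      # hypothetical competing design

d_eff = d_efficiency(alternative, optimal, points)
print(d_eff, 1.0 / d_eff)            # efficiency and relative number of trials needed
```

The reciprocal of the efficiency indicates roughly how many more trials the alternative design would need to match the optimal measure, following the interpretation of $D_{\mathrm{eff}}$ given in the text.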


$2^m$ Factorial Designs and Fractions

The first-order model in m factors with main effects and all possible interactions may be written as

$$E(y) = \beta_0 + \sum_{i=1}^{m} \beta_i x_i + \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \beta_{ij} x_i x_j + \cdots + \beta_{ijk\ldots} x_i x_j x_k \cdots \qquad [14]$$

When $\mathcal{X}$ is cuboidal, the D-optimal continuous design is the equireplicated $2^m$ factorial, which has $n = 2^m$ points of support at all combinations of values of $\pm 1$ for each factor. If the full model eqn [14] is fitted, $p = N$ and $M(\xi^*)$ is the identity matrix, so that $d(x, \xi^*) = p$ at the design points and the equivalence theorem for continuous D-optimality is satisfied. Because of the orthogonality of the design, reflected in the diagonal nature of $M(\xi^*)$, the same design is optimal if terms are omitted from eqn [14] or if the design is projected by the omission of one or more factors.

Provided all the terms in the model can be estimated, the regular fractions of the design, that is the series of $2^{m-f}$ fractional factorials, are also the continuous D-optimal designs. The $2^{m-f}$ fractional factorials are often used when it is known that some interactions are not important and the full factorial design will lead to unnecessarily precise estimation of the parameters, with a consequent waste of resources. However, these orthogonal fractions are restricted to values of N that are powers of 2. Goos and Jones (2011: Section 8.2.2) demonstrate the use of algorithmic methods for the construction of optimal designs that find fractional factorials for a specified value of N. Although the designs are not quite orthogonal, highly efficient parameter estimates are obtained.
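The orthogonality argument can be checked by direct computation. The sketch below constructs the $2^3$ factorial and a half fraction, forms the information matrix with equal weights, and tests whether it is the identity. The choice m = 3, the defining relation I = ABC for the fraction, and all function names are illustrative assumptions.

```python
from itertools import combinations, product
from math import prod


def model_terms(m, max_order):
    """Index tuples for the constant, main effects, and interactions up to max_order."""
    return [t for order in range(max_order + 1)
            for t in combinations(range(m), order)]


def information_matrix(points, terms):
    """M(xi) with equal weight 1/n on each design point."""
    n, p = len(points), len(terms)
    rows = [[prod(x[i] for i in t) for t in terms] for x in points]
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(p)] for i in range(p)]


def is_identity(mat, tol=1e-12):
    size = len(mat)
    return all(abs(mat[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(size) for j in range(size))


full = list(product([-1, 1], repeat=3))                 # 2^3 factorial, 8 points
half = [x for x in full if x[0] * x[1] * x[2] == 1]     # 2^(3-1) fraction, I = ABC

print(is_identity(information_matrix(full, model_terms(3, 3))))  # full model, p = 8
print(is_identity(information_matrix(half, model_terms(3, 1))))  # main effects, p = 4
print(is_identity(information_matrix(half, model_terms(3, 2))))  # aliased interactions
```

In the last case the two-factor interactions are aliased with main effects in the half fraction, so not all terms can be estimated and the information matrix is no longer the identity.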

Generalized Linear Models

The Family of Models

The family of generalized linear models (McCullagh and Nelder, 1989) extends normal theory regression to several useful distributions, including the gamma, Poisson, and binomial. The models have three components:

- A distribution for the univariate response y with mean $\mu$.
- A linear predictor, $\eta = \theta^T f(x)$. As for regression, f(x) is a vector of p known functions of the explanatory variables.
- A link function, $g(\mu) = \eta$, relating x to the mean $\mu$.

The distribution of y determines the relationship between the mean and the variance of the observations. The information matrix for a single observation at x is

$$M(x; \theta) = u(x) f(x) f^T(x) \qquad [15]$$

with the weights u(x) for individual observations depending on both the distribution of y and the link function. Since eqn [15] is the information matrix for weighted least squares, there are important relationships between optimal designs for regression models and those for other generalized linear models.

Binomial Data

In models for binomial data with R successes in n observations, the response y is defined to be R/n. The link function should be such that, however the values of x and $\theta$ vary, the mean $\mu$ satisfies the physically meaningful constraint $0 \le \mu \le 1$. The most commonly used link function is the logistic link in which

$$\eta = \log\{\mu/(1 - \mu)\} \qquad [16]$$

In this model the log odds is therefore equal to the linear predictor. The weight is $u(x) = \mu(1 - \mu)$. The weights for other links are more complicated.

Poisson Data

For Poisson data we require that $\mu > 0$. The log link, $\log \mu = \eta$, is standard for the analysis of Poisson data in areas such as medicine, social science, and economics; see, for example, Agresti (2002: Chapter 9). This link leads to models with $\mu = \exp \eta$, which satisfy the constraint on values of $\mu$, and weights $u(x) = \mu$.

Gamma Data

The gamma family is one in which the correct link is often in doubt. The physical requirement is again that $\mu > 0$. A useful, flexible family of links that obeys this constraint is the Box and Cox family. (See Box and Cox, 1964 for the use of this function in the transformation of data.) For the special case of the log link, the weights u(x) equal one. Therefore optimal designs for gamma models with this link are identical to optimal designs for regression models with the same linear predictors.

Optimal Design and Parameter Uncertainty

The optimal designs for regression in the earlier sections of this article maximized a function of the information matrix, $M(\xi)$, which did not depend on the values of the parameters of the model. However, for generalized linear models, the information matrix depends on $\theta$ through the weight function u(x). We accordingly write the information matrix as $M(\xi; \theta)$ and consider maximization of some function, $\Psi\{M(\xi; \theta)\}$. The numerical values of $\theta$ required for this optimization can be specified in several ways.

Locally Optimal Designs

The optimal design is found for $\theta_0$, a point prior estimate of $\theta$; that is, we maximize $\Psi\{M(\xi; \theta_0)\}$. In the conjoint choice designs of Section Conjoint Choice Designs these values serve to indicate the signs of effects, the designs being relatively insensitive to the precise values of $\theta_0$.
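To make the dependence on the prior estimate concrete, the sketch below evaluates the weighted information matrix of a two-point design for a one-variable linear predictor $\eta = \theta_0 + \theta_1 x$, using the weights stated in the text for the logistic link (binomial), the log link (Poisson), and the log link (gamma). The family labels, function names, design, and prior values are assumptions for illustration.

```python
import math


def weight(eta, family):
    """Model weight u at linear predictor eta, for the three cases given in the text."""
    if family == "binomial-logit":
        mu = 1.0 / (1.0 + math.exp(-eta))
        return mu * (1.0 - mu)          # u = mu (1 - mu)
    if family == "poisson-log":
        return math.exp(eta)            # u = mu = exp(eta)
    if family == "gamma-log":
        return 1.0                      # u = 1
    raise ValueError(f"unknown family: {family}")


def local_information(points, weights, theta0, family):
    """M(xi; theta0) = sum_i w_i u(x_i) f(x_i) f(x_i)^T with f(x) = (1, x)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x, w in zip(points, weights):
        fx = [1.0, x]
        u = weight(theta0[0] + theta0[1] * x, family)
        for i in range(2):
            for j in range(2):
                m[i][j] += w * u * fx[i] * fx[j]
    return m


points, weights = [-1.0, 1.0], [0.5, 0.5]
m_logit_1 = local_information(points, weights, (0.0, 1.0), "binomial-logit")
m_logit_2 = local_information(points, weights, (0.0, 2.0), "binomial-logit")
m_gamma = local_information(points, weights, (0.0, 2.0), "gamma-log")
print(m_logit_1, m_logit_2, m_gamma)
```

The same design carries different amounts of information for different prior values under the logistic link, whereas with the gamma log link the matrix coincides with the unweighted regression information matrix.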

Bayesian Designs

Let the prior distribution of $\theta$ be $p(\theta)$. Bayesian designs are found to maximize

$$\Psi\{M(\xi)\} = E_\theta \Psi\{M(\xi; \theta)\} = \int \Psi\{M(\xi; \theta)\}\, p(\theta)\, d\theta \qquad [17]$$

The ease of calculation depends on the form of the prior $p(\theta)$ and of $\Psi(\cdot)$ as well as, often crucially, on the region of integration. The easiest case is when $\theta$ has a multivariate normal distribution over ...


Sequential Designs

Where it is possible, sequential experimentation can provide an efficient strategy in the absence of knowledge of plausible parameter values. A locally optimal, or other initial, design is followed by alternating phases of parameter reestimation and locally optimal design based on the revised parameter estimates. The procedure stops when sufficient accuracy is obtained or the experimental resources are exhausted. An early example, for nonlinear regression, is Box and Hunter (1965), extended by Atkinson et al. (2007: Section 17.7). Dror and Steinberg (2008) developed a Bayesian sequential design methodology for generalized linear models.

Minimax and Maximin Designs

This approach replaces dependence of designs on the unknown value of $\theta$ by finding the best design for the worst case when the parameter $\theta$ belongs to a set $\Theta$. A design $\xi^*$ is found for which

$$\Psi\{M(\xi^*)\} = \max_{\xi} \min_{\theta \in \Theta} \Psi\{M(\xi; \theta)\} \qquad [18]$$

A potential objection to these designs is that the minimax or maximin design is often close to a combination of locally optimal designs for values of $\theta$ at the edges of the parameter space. If a prior distribution is available, such points may have a very low probability; their importance in the design criterion may therefore be overemphasized by the minimax criterion. Providing adequate performance in these unlikely worst case scenarios may greatly affect overall design performance. These designs can also be hard to find. Numerical procedures are described in detail by Pronzato and Pázman (2013: Section 9.3). Minimax designs for generalized linear models have been found by Sitter (1992) and by King and Wong (2000).
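The Bayesian criterion of eqn [17] and the maximin criterion of eqn [18] can be contrasted on a deliberately simplified problem. The sketch below makes several assumptions that are not in the article: a logistic model with $\theta = (0, \theta_1)$, designs restricted to equal weights at $x = -a$ and $x = a$, the log determinant as the criterion, a discrete uniform prior on three values of $\theta_1$, and a grid search over a.

```python
import math


def log_det(a, theta1):
    """log|M| for equal weights at -a and a: M = u * diag(1, a^2), u = mu (1 - mu)."""
    mu = 1.0 / (1.0 + math.exp(-theta1 * a))
    u = mu * (1.0 - mu)
    return 2.0 * math.log(u) + 2.0 * math.log(a)


thetas = [0.5, 1.0, 2.0]              # hypothetical parameter set
prior = [1 / 3, 1 / 3, 1 / 3]         # hypothetical discrete prior on that set
grid = [i / 100 for i in range(1, 501)]   # candidate values of a in (0, 5]


def bayes_criterion(a):
    """Prior expectation of the criterion: a discrete version of eqn [17]."""
    return sum(p * log_det(a, t) for p, t in zip(prior, thetas))


def maximin_criterion(a):
    """Worst case of the criterion over the parameter set, as in eqn [18]."""
    return min(log_det(a, t) for t in thetas)


a_bayes = max(grid, key=bayes_criterion)
a_maximin = max(grid, key=maximin_criterion)
print(a_bayes, a_maximin)
```

In this restricted setting the maximin choice coincides with the locally optimal design for the largest value of $\theta_1$, an instance of the behavior at the edge of the parameter space noted above, while the Bayesian choice lies between the locally optimal designs for the extreme values.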

Small Effects

If the effects of the factors are slight, the means $\mu_i$ of the observations will be similar, and so will the corresponding model weights $u(x_i)$. The information matrix will then, apart from a scaling factor, be close to the unweighted information matrix $F^T F$, and the optimal designs for the weighted and unweighted matrices will be close. Optimal designs for generalized linear models with small effects will therefore be close to optimal designs for regression models with the same linear predictor (Cox, 1988).

Locally Optimal Designs for Logistic Regression in One Variable

With a single explanatory variable and the logistic link

$$\log\{\mu/(1 - \mu)\} = \eta = \theta_0 + \theta_1 x \qquad [19]$$

it is clear that observations at values of x for which the responses are virtually all 0 or 1 will be uninformative about the values of the parameters. This intuitive result indeed follows from the weight $u(x) = \mu(1 - \mu)$.

We start by finding the locally D-optimal design for the canonical case of $\theta = (0, 1)^T$. With a response symmetrical about zero, the design can also be assumed symmetrical about zero. It is thus simple to find the D-optimal design as

$$\xi^* = \begin{Bmatrix} -1.5434 & 1.5434 \\ 0.5 & 0.5 \end{Bmatrix} \qquad [20]$$

which puts equal weights at two points symmetrical about $x = 0$, provided $\mathcal{X}$ is sufficiently large. At these support points, $\mu = 0.176$ and $1 - 0.176 = 0.824$.

Although designs for other values of the parameters can likewise be found numerically, design problems for a single x can often be solved in such a canonical form, yielding a structure for the designs independent of the particular parameter values. For the upper design point in eqn [20] the linear predictor $\eta = 0 + 1 \times x$ has the value 1.5434, which is the value needed for the optimal design whatever the parameterization. Solving eqn [19] for the x that gives this value of $\eta$, the upper support point of the design is $x^* = (1.5434 - \theta_0)/\theta_1$.

For linear regression the D-optimal design puts half of the design weight at each of the two extreme values of $\mathcal{X}$, whereas, for logistic regression, the design does not span $\mathcal{X}$, provided the region is sufficiently large. Note that as $\theta_1 \to 0$, the value of $x^*$ increases without limit. This is an example of the result of Cox (1988) mentioned above. In practice, the design region will not be unlimited and, depending on $\theta$, the optimal design may put weight on one boundary point and an internal point, or on the two boundary points of $\mathcal{X}$.

Ford et al. (1992) provide similar results on the structure of designs for other single-variable generalized linear models. Optimal designs for models with more than one explanatory variable usually have to be found numerically. The optimality of continuous designs can again be checked using an equivalence theorem. For D-optimality, eqn [10] is replaced by the weighted form $d(x, \xi) = u(x) f^T(x) M^{-1}(\xi) f(x)$. Chapter 22 of Atkinson et al. (2007) provides examples of the numerical calculation of locally optimal designs for logistic and gamma models with two explanatory variables. Atkinson and Woods (to appear) collect results on optimal designs for models with multifactor first-order linear predictors. They also calculate Bayesian and other designs when there is uncertainty about the values of the parameters, and discuss designs for dependent data arising from random effects in the model.
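The value 1.5434 in eqn [20] can be recovered numerically. For the canonical case $\theta = (0, 1)^T$ and equal weights at $-x$ and $x$, the information matrix is $u(x)\,\mathrm{diag}(1, x^2)$, so maximizing the determinant amounts to maximizing $\log u(x) + \log x$. The sketch below does this with a simple ternary search; the search routine, its bracket, and the function names are assumptions for illustration rather than the method used in the cited literature.

```python
import math


def log_criterion(x):
    """log u(x) + log x, half the log determinant for equal weights at -x and x."""
    mu = 1.0 / (1.0 + math.exp(-x))
    return math.log(mu * (1.0 - mu)) + math.log(x)


def maximize_unimodal(fun, lo, hi, tol=1e-9):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fun(m1) < fun(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


eta_star = maximize_unimodal(log_criterion, 0.1, 5.0)
mu_star = 1.0 / (1.0 + math.exp(-eta_star))


def support_points(theta0, theta1):
    """Map the canonical points -eta* and eta* back to x = (eta - theta0) / theta1."""
    return ((-eta_star - theta0) / theta1, (eta_star - theta0) / theta1)


print(eta_star, mu_star)
print(support_points(1.0, 2.0))      # hypothetical prior values theta0 = 1, theta1 = 2
```

The function `support_points` applies the canonical-form argument of the text to both support points, so a design for other parameter values follows without a new optimization, provided the design region is large enough to contain both points.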

Conjoint Choice Designs

Conjoint choice experiments are frequently performed in marketing and other fields of behavioral research in order to establish individual preferences between items. The factors are typically discrete and at a few levels. As an example, factors in the choice between laptop computers might include discretized values of weight, RAM, screen size, computational speed, and price. The design region $\mathcal{X}$ consists of all feasible combinations of the factors (and so may not form a full factorial array, since such extremes as cheap fast computers with large screens are not a practical reality).

A single observation comes from comparing the desirability of J items formed by choosing J rows from $\mathcal{X}$ to give a choice set, s. The utility attached to $x_j$, the jth row of $\mathcal{X}$, is customarily modeled as $x_j^T \beta$ plus an error term. Since only comparisons between items are meaningful, the first-order model excludes a constant. The response to the experiment might be a scoring or ranking of the J items. However, the literature summarized by Kessels et al. (2006) indicates that results closest to purchasing behavior are obtained by asking the subject to choose one of the J items. The multinomial logit model then gives the probability that profile j in choice set s is chosen as

$$p_{js} = \exp(x_{js}^T \beta) \Big/ \left\{ \sum_{i=1}^{J} \exp(x_{is}^T \beta) \right\} \qquad [21]$$

The design matrix for choice set s is the $J \times p$ matrix $X_s$, with rows $x_{is}^T$ $(i = 1, \ldots, J)$. In the simplest case, each subject performs evaluations of N choice sets. The information matrix for the exact design for a single subject is then

$$N M(\xi_N) = \sum_{s=1}^{N} X_s^T (P_s - p_s p_s^T) X_s \qquad [22]$$

where $p_s = (p_{1s}, \ldots, p_{Js})^T$, $P_s = \operatorname{diag}(p_{1s}, \ldots, p_{Js})$, and $p_{js}$ is given by eqn [21]. The number N of choice sets seen by each subject is determined by how quickly subjects are expected to tire. The number of subjects is determined by N, the available resources, and the desired accuracy. The remaining optimal design problem is then to determine the N choice sets of size J that maximize the exact information matrix eqn [22]. Often J is taken as two.

Clearly, maximization of a function of eqn [22] is related to the optimal design problems for generalized linear models of Section Generalized Linear Models. In particular, the design depends on the unknown parameters $\beta$ through the probabilities $p_{js}$ in eqn [21]. The problems caused by parameter uncertainty described in Section Optimal Design and Parameter Uncertainty, and the methods for their solution, therefore also apply to choice experiments.

The locally optimal design with all elements of $\beta = 0$ is called utility neutral, since there is no prior expression of any preference between elements of $\mathcal{X}$. The optimal designs are then related to the factorials and their fractions of Section $2^m$ Factorial Designs and Fractions. Kuhfeld and Tobias (2005) describe algorithms for the construction of such designs. However, many of the sets selected by utility neutral designs may result in noninformative choices. For example, it is clear in comparing computers that higher speed and lower weight are both desirable. Accordingly, Kessels et al. (2006) provide algorithms for D-optimal and other designs for these models.

Two further points are important. The model of utility used in eqn [21] assumes a fixed effect, that is, that all subjects have the same value for the parameter $\beta$. However, parameters are likely to vary between subjects, leading to random effects models (Yu et al., 2009). Further, the experiments are sequential, so that it should be possible to learn about a subject's parameter value in real time. However, at present, the algorithms are not fast enough to allow updating without severely delaying the subject's flow of answers.

Nonlinear Regression Models

The dependence of designs for generalized linear models on the parameters $\theta$ entered only through the weight function, u(x). In nonlinear models, such as those derived from chemical kinetics, the response function may depend nonlinearly on all the parameters in a more complicated way. For example, in the compartmental model presented by Box and Lucas (1959) the mean of the regression model is

$$E(Y) = \eta(t, \theta) = \theta_3 \{\exp(-\theta_2 t) - \exp(-\theta_1 t)\} \quad (t \ge 0) \qquad [23]$$

with all three parameters positive and $\theta_1 > \theta_2$. The exponential structure of such models reflects an underlying scheme of first-order chemical reactions.

Locally optimal designs are found by Taylor series expansion. Let $\theta$ be of dimension $p \times 1$, with $\theta_0$ the vector of p prior values of $\theta$. Then the vector of partial derivatives of the model, often called parameter sensitivities, is

$$f^T(t; \theta_0) = \left. \left( \frac{\partial \eta(t, \theta)}{\partial \theta_1} \;\; \frac{\partial \eta(t, \theta)}{\partial \theta_2} \;\; \ldots \;\; \frac{\partial \eta(t, \theta)}{\partial \theta_p} \right) \right|_{\theta = \theta_0}$$

which forms the terms of the p-variable linearized multiple regression model. The vector of sensitivities for observation i is written $f^T(t_i; \theta)$ and forms row i of the extended design matrix F for this linearized model; the information matrix is $F^T F$. Note that $\theta_3$ occurs linearly in eqn [23], so that optimal designs will not depend on the value of this parameter. Chapter 17 of Atkinson et al. (2007) describes a variety of approaches to design for nonlinear models. Because of the dependence of the designs on unknown parameter values, the results of Section Optimal Design and Parameter Uncertainty are again pertinent.
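The parameter sensitivities of the compartmental model eqn [23] have simple closed forms, and the remark that the design does not depend on $\theta_3$ can be checked numerically. In the sketch below the prior values, the two candidate sets of sampling times, and all function names are hypothetical choices made for illustration.

```python
import math


def eta(t, theta):
    """Mean response of eqn [23]."""
    t1, t2, t3 = theta
    return t3 * (math.exp(-t2 * t) - math.exp(-t1 * t))


def sensitivities(t, theta):
    """Partial derivatives of eta with respect to theta1, theta2, theta3."""
    t1, t2, t3 = theta
    return [t3 * t * math.exp(-t1 * t),
            -t3 * t * math.exp(-t2 * t),
            math.exp(-t2 * t) - math.exp(-t1 * t)]


def finite_difference(t, theta, k, h=1e-6):
    """Central-difference check of the k-th sensitivity."""
    up, down = list(theta), list(theta)
    up[k] += h
    down[k] -= h
    return (eta(t, up) - eta(t, down)) / (2.0 * h)


def det_information(times, theta):
    """|F^T F| for the linearized model with one observation at each time."""
    rows = [sensitivities(t, theta) for t in times]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


theta_a = (0.7, 0.2, 1.0)             # hypothetical prior values
theta_b = (0.7, 0.2, 4.0)             # same rates, different theta3
times_1 = [1.0, 3.0, 10.0]            # two hypothetical sets of sampling times
times_2 = [0.5, 1.0, 2.0]

ratio_a = det_information(times_1, theta_a) / det_information(times_2, theta_a)
ratio_b = det_information(times_1, theta_b) / det_information(times_2, theta_b)
print(ratio_a, ratio_b)
```

Changing $\theta_3$ rescales the determinant by the same factor for every design, so the ratio between two designs, and hence their ranking on the D-criterion, is unchanged.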

Discussion and Literature

The principles and theory of optimal experimental design can yield insight into the form of efficient experiments. However, application of the methods to the more complicated, parameter-dependent designs in Sections Generalized Linear Models; Optimal Design and Parameter Uncertainty; Locally Optimal Designs for Logistic Regression in One Variable; Conjoint Choice Designs; and Nonlinear Regression Models may require powerful numerical algorithms.

Atkinson et al. (2007) rely on SAS software for the numerical calculation of designs. A general discussion of some algorithms for continuous designs harnessing the power of standard optimization methods is in Sections 9.4 and 9.5 of that book. Chapter 12 covers the rather different algorithms needed for the construction of exact D-optimal designs. The book of Goos and Jones (2011) proceeds by numerical examples using JMP software to find exact designs and illustrate the methods of optimal design. The basic method is the coordinate-exchange algorithm introduced by Meyer and Nachtsheim (1995).

The first exposition of optimal design in its modern form is Kiefer (1959), although the subject goes back to Smith (1918). In addition to the books already cited in this article, recent books on optimal design include Fedorov and Leonov (2014), who provide a solid introduction to the theory of optimal design illustrated with pharmaceutical applications, particularly dose finding. Pronzato and Pázman (2013) present the theory of optimal design for nonlinear models. Berger and Wong (2009) are more concerned with applications.

See also: Binary Response Models and Logistic Regression; Crossover Designs; Experimental Design: Large-Scale Social Experimentation; Experimental Design: Overview; Experimental Design: Randomization and Social Experiments; Generalized Linear Mixed Models; Linear Mixed Models; Model Selection and Model Averaging.

Bibliography

Agresti, A., 2002. Categorical Data Analysis, second ed. Wiley, New York.
Atkinson, A.C., Biswas, A., 2014. Randomised Response-adaptive Designs in Clinical Trials. Chapman and Hall/CRC Press, Boca Raton, FL.
Atkinson, A.C., Donev, A.N., Tobias, R.D., 2007. Optimum Experimental Designs, with SAS. Oxford University Press, Oxford.
Atkinson, A.C., Woods, D. Designs for generalized linear models. In: Bingham, D., Dean, A., Morris, M., Stufken, J. (Eds.), Handbook of Experimental Design. Chapman and Hall/CRC Press, Boca Raton, FL, to appear.
Berger, M., Wong, W.K., 2009. An Introduction to Optimal Designs for Social and Biomedical Research. Wiley, New York.
Box, G.E.P., Cox, D.R., 1964. An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B 26, 211–246.
Box, G.E.P., Hunter, W.G., 1965. Sequential design of experiments for nonlinear models. In: Proceedings IBM Scientific Computing Symposium: Statistics. IBM, New York, pp. 113–137.
Box, G.E.P., Lucas, H.L., 1959. Design of experiments in nonlinear situations. Biometrika 46, 77–90.
Brown, L.D., Olkin, I., Sacks, J., Wynn, H.P. (Eds.), 1985. Jack Carl Kiefer Collected Papers III. Wiley, New York.
Chaloner, K., Verdinelli, I., 1995. Bayesian experimental design: a review. Statistical Science 10, 273–304.
Correll, J., Park, B., Judd, C.M., Wittenbrink, B., Sadler, M.S., Keesee, T., 2007. Across the thin blue line: police officers and racial bias in the decision to shoot. Journal of Personality and Social Psychology 92, 1006–1023.
Cox, D.R., 1988. A note on design when response has an exponential family distribution. Biometrika 75, 161–164.
Dror, H.A., Steinberg, D.M., 2008. Sequential experimental designs for generalized linear models. Journal of the American Statistical Association 103, 288–298.
Fedorov, V.V., Leonov, S.L., 2014. Optimal Design for Nonlinear Response Models. Chapman and Hall/CRC Press, Boca Raton, FL.
Ford, I., Torsney, B., Wu, C.F.J., 1992. The use of a canonical form in the construction of locally optimal designs for non-linear problems. Journal of the Royal Statistical Society, Series B 54, 569–583.
Goos, P., Jones, B., 2011. Optimal Design of Experiments: A Case Study Approach. Wiley, New York.
Kessels, R., Goos, P., Vandebroek, M., 2006. A comparison of criteria to design efficient choice experiments. Journal of Marketing Research 43, 409–419.
Kiefer, J., 1959. Optimum experimental designs (with discussion). Journal of the Royal Statistical Society, Series B 21, 272–319.
Kiefer, J., Wolfowitz, J., 1959. Optimum designs in regression problems. Annals of Mathematical Statistics 30, 271–294.
King, J., Wong, W.K., 2000. Minimax D-optimal designs for the logistic model. Biometrics 56, 1263–1267.
Kuhfeld, W., Tobias, R., 2005. Large factorial designs for product engineering and marketing research applications. Technometrics 47, 132–141.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, second ed. Chapman and Hall, London.
Meyer, R.K., Nachtsheim, C.J., 1995. The coordinate exchange algorithm for constructing exact optimal experimental designs. Technometrics 37, 60–69.
Pronzato, L., Pázman, A., 2013. Design of Experiments in Nonlinear Models. Springer, New York.
Sherman, L., Strang, H., 2004. Verdicts or inventions? Interpreting results from randomized controlled experiments in criminology. American Behavioral Scientist 47, 575–607.
Sitter, R.R., 1992. Robust designs for binary data. Biometrics 48, 1145–1155.
Smith, K., 1918. On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations. Biometrika 12, 1–85.
Yu, J., Goos, P., Vandebroek, M., 2009. Efficient conjoint choice designs in the presence of respondent heterogeneity. Marketing Science 28, 122–135.