Efficient Bayesian designs under heteroscedasticity

Efficient Bayesian designs under heteroscedasticity

Journal of Statistical Planning and Inference 104 (2002) 469–483 www.elsevier.com/locate/jspi E&cient Bayesian designs under heteroscedasticity Liev...

213KB Sizes 0 Downloads 64 Views

Journal of Statistical Planning and Inference 104 (2002) 469–483

www.elsevier.com/locate/jspi

E&cient Bayesian designs under heteroscedasticity Lieven Tack ∗ , Peter Goos, Martina Vandebroek Faculty of Economics and Applied Economics, Quantitative Methods, Katholieke Universiteit Leuven, Naamsestraat 69, B-3000 Leuven, Belgium Received 7 July 2000; received in revised form 1 June 2001; accepted 26 June 2001

Abstract In optimum design theory designs are constructed that maximize the information on the unknown parameters of the response function. The major part deals with designs optimal for response function estimation under the assumption of homoscedasticity. In this paper, optimal designs are derived in case of multiplicative heteroscedasticity for either response function estimation or response and variance function estimation by using a Bayesian approach. The e&ciencies of the Bayesian designs derived with various priors are compared to those of the classic designs with respect to various variance functions. The results show that any prior knowledge about the sign of the variance function parameters leads to designs that are considerably more e&cient c 2002 Elsevier Science B.V. All than the classic ones based on homoscedastic assumptions.  rights reserved. MSC: 62K05 Keywords: Bayesian design; D-optimality; Experimental design; Heteroscedasticity

1. Introduction Since the late 1950s Taguchi (Taguchi, 1986, 1987) has introduced several new statistical tools and concepts of quality improvement that depends more on statistical theory for design of experiments. He highlighted the necessity to develop experimental strategies to achieve some target values for the expected value of certain characteristics while at the same time minimizing their variance. This so-called robust design is a method for making a manufacturing process less sensitive to manufacturing variations and is extensively discussed in Nair (1992). An interesting approach assumes that the response variance, or a suitable transformation thereof, may be well approximated by a linear model in the independent variables. Vining and Myers (1990) introduced the idea of estimating separate linear models for the response and variance structure. Separate models have the advantage of providing the analyst with a better scienti?c ∗ Corresponding author. E-mail address: [email protected] (L. Tack).

c 2002 Elsevier Science B.V. All rights reserved. 0378-3758/02/$ - see front matter  PII: S 0 3 7 8 - 3 7 5 8 ( 0 1 ) 0 0 2 5 6 - 7

470

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

understanding of the total process. The analyst is better able to see what levels of the independent factors will lead to acceptable response values as well as acceptable variability. Most literature on optimal design assumes errors with constant variance. In this paper, the assumption of a constant variance is weakened and optimal designs will be constructed under the assumption of multiplicative heteroscedasticity for either response function estimation or response and variance function estimation. The resulting designs depend on the unknown parameters of the variance function and Bayesian experimental design theory provides design criteria reCecting prior knowledge of these parameters. The basic idea is to give a more thorough analysis of the impact of the prior information on the e&ciencies of Bayesian designs. Various priors will be used and the e&ciencies of the resulting designs will be compared to the classic ones, based on homoscedastic assumptions, with respect to various variance functions. 2. Bayesian D-optimal designs This section starts with the formulation of the assumed heteroscedastic model for both response and variance function estimation. Henceforth y denotes the response of interest and x and z are the p×1 and q×1 vectors of control variables presumed to inCuence the response function and variance function, respectively. Denote by f(x) the pr ×1 vector representing the polynomial expansion of x for the response model and by g(z) the qv ×1 vector representing the polynomial expansion of z for the variance model. Both f(x) and g(z) contain an intercept. With  and  the pr ×1 and qv ×1 vectors of unknown parameters, the following heteroscedastic model is assumed:  y = f  (x) + v[g (z)]: (1) Note that v¿0 is a known twice-continuously diJerentiable function. The error term  is standardized such that E() = 0 and VAR() = 1, yielding the following mean and variance functions: E(y) = f  (x);

(2)

VAR(y) = v[g (z)]:

(3)

The optimum design problem is concerned with the selection of the number of observations at the diJerent design points at which to observe the response y in order to provide maximal information on (; ) or some subset thereof. One of the many optimality criteria is the widely used D-optimality criterion in which designs with the largest determinant of the total information matrix for the parameters will be preferred. With the signi?cance level, the 100(1 − )% con?dence region for all elements of  or  is an ellipsoid in the pr - or qv -dimensional space, respectively, of which the volume is inversely proportional to the square root of the determinant of the information matrix. In consequence, the D-optimality criterion minimizes the volume of the

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

471

ellipsoid representing the con?dence region. The per observation information matrix on (; ) equals  2  2  I(x; z) = − E 

@ log L @2 2

@ log L @@

@ log L @@ 2

@ log L @2

 ;

(4)

where L is the likelihood function. Under the assumption of independent standard normal error terms ; L becomes  1 1 [y − f  (x)]2 √ exp − L(y; x; z|; ) =  : (5) 2 v[g (z)] v[g (z)] 2 Substituting the expression of the likelihood function (5) into the expression of the per observation matrix (4) yields  f(x)f  (x)  0 v[g (z)]   I(x; z) =  (6) ;

  2 v [g (z)]  1 g(z)g (z) 0 v[g (z)] 2 where v indicates the ?rst derivative of v with respect to . After summation of the per observation matrices (6) over all N observations, the total information in the design {x; z}Ni=1 equals  N f(xi )f  (xi )  0  (z )] i=1 v[g i N   I(xi ; zi ) =  (7) : N 1 v [g (zi )] 2 i=1  g(z )g (z ) 0 i i i=1 2 v[g (zi )] The determinant of the information matrix (7) becomes





N 2

N

   N





I(xi ; zi ) = f(xi )f (xi ) 1

v [g (zi )] g(zi )g (zi )

:

v[g (zi )] 2qv

v[g (zi )] i=1 i=1 i=1

(8)

Considering the case of multiplicative heteroscedasticity by replacing function v by the exponential function is conventional and quite useful since v becomes a nonnegative function that is very Cexible. Expression (8) then simpli?es to



N

 N



N



I(xi ; zi ) = f(xi )f (xi ) 1 g(zi )g (zi ) : (9)

exp[g (zi )] 2qv

i=1 i=1 i=1 Unfortunately, the determinant of the information matrix depends on the unknown vector . One possibility to solve this problem is a Bayesian approach to design optimality. An extensive review of developments in Bayesian design theory can be found in Chaloner and Verdinelli (1995). Dette and Wong (1998) analytically derive Bayesian D-optimal designs for multiplicative heteroscedastic models in one variable and with one support point more than the degree of the polynomial model. Chaloner (1984) derives Bayesian design optimality criteria for the linear model when the parameters of

472

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

the mean function are supposed to follow a normal distribution. Chaloner and Larntz (1989) extend these ?ndings to nonlinear models. Dumouchel and Jones (1994) propose Bayesian D-optimal designs with reduced dependence on regressor speci?cation and Atkinson and Cook (1995) suppose a discrete prior distribution for . Vining and Schaub (1996) make use of a multivariate normal distribution N (0 ; I ) as a prior for  and propose a semi-Bayesian approach in that they maximize the determinant of the expected information matrix. Vandebroek and Goos (1997) study the impact of the mean 0 of the prior distribution and of the uncertainty  about this mean for that approach. Vining and Schaub (1996) use a semi-Bayesian approach by maximizing the determinant of the expected information matrix rather than using the correct criterion of maximization of the expected determinant of the information matrix. Furthermore, they omit the parameter  in the determinant of the expected information matrix for no obvious reason. Maximization of the expected determinant of the information matrix is elaborated in this paper. It is worth mentioning that both approaches give rather diJerent results. In the remainder of this exposition a discrete approximation of the multivariate normal distribution N (0 ; I ) is used as a prior distribution for . Using a discrete approximation avoids computationally intensive routines and is less time consuming than using the continuous normal distribution. This discrete approximation contains g support points 0j and corresponding probabilities p (0j ), with j ∈ {1; : : : ; g}. The expected determinant of the information matrix (9) then becomes



g N



N f(xi )f  (xi ) 1

 g(zi )g (zi )

: (10) p (0j )  (z ) ]

2qv

exp[g i 0j i=1 i=1 j=1 When the design is built up with l diJerent design points xi and design point xi is replicated ni times, the expected determinant in expression (10) can be rewritten as



g l



l ni f(xi )f  (xi ) 1

 p (0j ) ni g(zi )g (zi )

: (11)  (z ) ]

2qv

exp[g i 0j j=1 i=1 i=1 l Note that i=1 ni = N . In (11) equal importance is attached to the parameters of the response function and the variance function. However in practice, the researcher may have interest in estimating the model parameters with unequal importance. Based on Dette and Wong (1998), this is possible through restating (11) as the compound criterion





g l





l ni f(xi )f  (xi ) 

p (0j ) log

+ (1 − ) log (ni =2)g(zi )g (zi )

; 

i=1 exp[g (zi )0j ] j=1 i=1 (12) where a larger value of  ∈ [0; 1] puts more weight on making inference about the parameters of the response function. In the sequel of this paper we will work with criterion (11) which is equivalent to setting  = 0:5 in (12). The degree of uncertainty attached to the prior is expressed by parameter  with increasing  indicating

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

473

higher uncertainty. Each value of  yields another discrete prior distribution for  with probabilities p (0j ) for j ∈ {1; : : : ; g}. Imposing the condition  = 0 is nothing else than stating that the parameter vector  is exactly known in advance, whereas the case  = ∞ corresponds to a uniform prior distribution with p (0 ) = 1=g. The classic homoscedastic approach corresponds to choosing the mean 0 of the prior distribution equal to (1; 0; : : : ; 0) and  = 0. The experiment is then planned assuming a constant variance over the region of interest. We developed an algorithm to compute D-optimal designs for diJerent prior distributions for . The algorithm is based on complete enumeration rather than preferring an exchange algorithm since the latter approach often ends up with a locally optimal design. The input to the algorithm consists of the number of observations n, the factors inCuencing the mean and the variance function, respectively, the polynomial expansion for the mean and the variance model, the parameter  and the prior mean 0 . The list of l candidate or support points is also user-speci?ed or computed as in Atkinson and Donev (1992). After reading the input, the design problem boils down to the selection with replacement of n design points out of the list of l candidate points. With n design points and l candidate points, (n + l − 1)!={n!(l − 1)!} possible designs have to be investigated. The design leading to the largest criterion value (11) will be preferred. In the next sections, the computed designs will be compared to the classic designs, in order to stipulate whether any prior information leads to considerably more e&cient designs.

3. E ciencies under multiplicative heteroscedasticity The degree to which the experimental goals are achieved with the number of observations allotted is measured by the e&ciency. The D-e&ciency of an arbitrary design, represented by a measure  referring to the chosen design points and their corresponding number of replicates, is de?ned in Atkinson and Donev (1992) as 1=p  |M ()| DeJ = ; (13) |M (∗ )| where M denotes the information matrix, ∗ stands for a measure referring to the D-optimal design and p is the number of parameters in the model. Taking the ratio of the determinants in (13) to the power 1=p results in an e&ciency measure which is proportional to design size, irrespective of the dimension of the model. This means for instance that two replicates of a design measure for which DeJ = 0:5 would be as e&cient as one replicate of the optimum measure. Denote the number of observations at design point (xi ; zi ) for the D-optimal design under heteroscedasticity and for the D-optimal design in case of homoscedasticity respectively as ni (0 ; ) and nci , where c refers to the constant variance. Note that for the Bayesian design the numbers ni depend on the prior mean 0 and the degree of uncertainty . For sake of briefness, from this time on, the dependence of ni on 0 and  will no longer be rendered. The

474

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

D-e&ciency of the computed Bayesian D-optimal design for a prior distribution of the parameter vector  relative to the classic design and for a particular variance function vector  equals *



1=(pr +qv )



l f(xi )f (xi )ni 1qv

l g(zi )g (zi )ni

i=1   i=1 exp[g (zi )* ] 2 

: (14)

   l f(x )f  (x )nc l i i 1 i  (z )nc

g(z )g

 q i i i i=1

i=1 exp[g (zi )* ] 2 v Expression (14) measures how well the Bayesian D-optimal design derived with prior mean 0 and degree of uncertainty  (numerator) performs relative to the classic design (denominator), for a particular variance function vector  that is unknown in practice * and assessed by the discrete distribution with prior mean 0 . When we con?ne ourselves to response function estimation, the e&ciency becomes

1=pr 



l f(xi )f (xi )ni i=1 exp[g (z ) ]

i  *    (15)  l f(x )f  (x )nc  : i i i



i=1 exp[g (zi )* ] This paper deals with the question whether any prior knowledge about the variance function leads to considerably more e&cient designs compared to the classic ones where constant variance is assumed. Henceforth, for reasons of graphical simpli?cation, x = z is supposed. In other words, the factors possibly inCuencing the mean and variance function are the same. Of course, the approach can easily be extended to practical situations in which x = z. Besides, only two factor cases will be reported of which the design region is of a discrete form and restricted to a 3×3 grid on  = [ − 1; 1]2 . The adopted mean function is the full-second order polynomial de?ned by f  (x) = (1; x1 ; x2 ; x12 ; x22 ; x1 x2 ):

(16)

This paper deals with both ?rst order variance functions without interaction and ?rst order variance functions augmented with an interaction term x1 x2 . The corresponding polynomials are given by g (x) = (1; x1 ; x2 );

(17)

g (x) = (1; x1 ; x2 ; x1 x2 );

(18)

respectively. D-optimal designs will be computed for diJerent prior distributions for  with prior mean 0 and degree of uncertainty . Special attention will be given to the case where constant variance is assumed and no possible misspeci?cation ( = 0) of the variance function parameters is taken into account. The vector representations for the parameters of the ?rst order variance function and the parameters of the ?rst order variance function with interaction eJect are  = (1;

(1)

;

(2)

);

 = (1;

(1)

;

(2)

;

(19) (3)

);

(20)

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

475

Fig. 1. Discrete prior distributions for  ∈ {0; 0:015; 0:04; 0:1; 0:3; 3}.

respectively. For (19), the g = w2 support points 0j of the prior distribution are assumed to be on a square w×w grid, whereas for (20) the g = w3 support points 0j are assumed to be on a cubic w×w×w grid. The prior means in the middle of the support (2) (1) (2) (3) grid are represented as 0 = (1; (1) 0 ; 0 ) or (1; 0 ; 0 ; 0 ) depending on whether an interaction term is present or not. For example, for the ?rst order variance function (17) without interaction eJect a number of discrete prior distributions is given in Fig. 1. The ?gure shows six 7×7 grids, representing six prior distributions with g = w2 = 49 grid points 0j . The midpoint of each grid represents the prior mean 0 and the length of a bar is proportional to the probability attached to the corresponding support point. As already said, increasing  means introducing larger variance in the discrete approximation of the multivariate distribution N (0 ; I ) and corresponds to an increased uncertainty about the prior mean. For instance, the case  = 3 in Fig. 1 corresponds to an almost uniform prior distribution. In the sequel of this paper, D-optimal designs will be calculated for various values of  and the e&ciency will be measured relative to the classic D-optimal design, for (2) (3) particular variance function vectors  from which the components (1) ∗ , ∗ and ∗ * vary between −3 and +3. The number w and the constant distance between any two grid points are user-speci?ed.

4. Discussion of results This section deals with the calculation of D-optimal designs for various prior distributions which are compared with the classic designs that assume homoscedasticity. The D-optimal designs are computed for N = 36 observations but the results can be

476

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

Fig. 2. E&ciency plots for prior mean (1; 0:40; 0:23) , response function.

generalized to other numbers of observations. DiJerent values for w were investigated but these numbers only inCuence the results to a very small extent and the main conclusion of the paper remains the same. In the sequel of this paper only the cases w = 3 and w = 7 are reported for variance functions with and without interaction, respectively. 4.1. First order variance functions without interaction In this section we con?ne ourselves to ?rst order variance functions with the polynomial expansion given in (17). Several degrees of uncertainty  between 0 and 4 were investigated. Only those degrees of uncertainty at which changes in the computed D-optimal design occurred will be discussed. By this, ?ve degrees of uncertainty about the mean of the prior distribution are mentioned, namely  ∈ {0; 0:3; 0:5; 1; 3}. The clas(2) sic D-optimal designs assume constant variance ( (1) 0 = 0 = 0) and do not take into account any misspeci?cation ( = 0). DiJerent prior means were investigated but only a few examples are taken in for discussion. Suppose that due to prior knowledge, one assumes that the variance function vector  has mean 0 = (1; 0:40; 0:23) . The D-optimal designs for response function estimation and for response and variance function estimation depend on the uncertainty attached to the prior mean. For increased  the design points show the tendency to move to the midpoints of the outer sides of the design region. It was observed that when also the variance function is of interest, the observations tend to shift towards the cornerpoints of the design region. By this, more information for variance function estimation is obtained because a ?rst order variance function is assumed. The same insights were observed for other prior means. According to (15) and (14), Fig. 2 displays for various vectors  with components belonging to [ − 3; 3] the e&ciency ratios for * response function estimation and for response and variance function estimation respectively. Fig. 2a shows that the Bayesian D-optimal design for prior mean (1; 0:40; 0:23) is more e&cient than the classic one (e&ciency ratio (15) larger than one) for variance (2) function vectors  with components (1) ∗ and ∗ that have the same sign as those of *

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

477

Table 1 −p=2 in volume of con?dence region Reduction 1 − DeJ DeJ

p=6

p=9

p = 10

1.02 1.04 1.06 1.08 1.10 1.12

0.06 0.11 0.16 0.21 0.25 0.29

0.09 0.16 0.23 0.29 0.35 0.40

0.09 0.18 0.25 0.32 0.38 0.43

the prior mean, i.e. both components positive. Maximal e&ciency improvements of about 9.6% were observed. When uncertainty about the prior mean grows, the surface across which the Bayesian design is more e&cient expands (Figs. 2b and c). The importance of such e&ciency improvements can be seen when deriving the relationship between e&ciency improvement and reduction in the volume of the con?dence ellipsoid of the parameter estimates. It was already stated that the 100(1 − )% con?dence region for the parameter estimates is an ellipsoid of which the volume V is inversely proportional to the square root of the determinant of the information matrix of a design : 1 V ˙  : (21) |M ()| Furthermore, according to (13), it follows that for two design measures 1 and 2 , p |M (1 )| = DeJ |M (2 )|:

(22)

From (21) and (22): −p=2 V1 ˙ DeJ V 2 ;

(23)

p=2 times smaller than V2 if DeJ ¡1. Note that for response or the volume V1 is DeJ function estimation and for response and variance function estimation p = 6 and 9; respectively, for the polynomial expansions given in (16) and (17). The parameter p equals 10 when an interaction term is introduced in the polynomial expansion of the variance function. Table 1 shows the reduction in volume of the con?dence region for diJerent values of DeJ . It follows that reductions of 20% and more are easily attainable with limited e&ciency improvements. The e&ciency plots for the D-optimal designs for mean (1; 0:40; 0:23) for response and variance function estimation are shown in Fig. 3. Again, the Bayesian design performs better (e&ciency ratio (14) larger than one) for variance function vectors  * (2) with positive components (1) ∗ and ∗ . E&ciency improvements of up to 10% can be observed in Fig. 3a. A number of standard second order response surface designs will be compared to the computed Bayesian designs for response function and response and variance function estimation, computed with prior mean (1; 0:4; 0:23) . This is done in terms

478

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

Fig. 3. E&ciency plots for prior mean (1; 0:40; 0:23) , response and variance.

of the D-e&ciencies (15) and (14), respectively. The numerator in both expressions then relates to the classic design whereas the denominator refers to the computed Bayesian design. The comparison will be made for several actual variance function vectors (1; ∗ ; ∗ ) , with ∗ ∈ [ − 3; 3]. However, in order to make valid comparisons, the D-e&ciencies (15) and (14) have to be corrected for design size by multiplying both expressions by N=Nc , where N and Nc , respectively, denote the size of the Bayesian design and the standard design. We con?ne ourselves to the case  = 0:5, but similar results were obtained for other degrees of uncertainty. Among the standard second order designs, we consider the full 32 -factorial and four central composite designs: √ • a rotatable CCD with Nc = 36 observations and axial spacing equal to 2, • an orthogonal CCD with Nc = 36 observations, 12 center points and axial spacing equal to 1.21, • an orthogonal rotatable CCD with Nc = 32 observations, 16 center points and axial √ spacing 2, √ • and a uniform-precision CCD with axial spacing 2; Nc = 39 observations and 15 center points. A uniform precision CCD is a CCD for which the variance of the predicted response is the same at the design centre and a unit from the centre. To fairly compare the Bayesian designs with the central composite designs, the latter designs have to be scaled to the same region of interest such that the star points are on a radius equal to one. The D-e&ciencies of the standard designs relative to the computed Bayesian designs are shown in Fig. 4 for several actual variance function parameters ∗ . The left ?gure shows the e&ciencies for response function estimation only, whereas the ?gure on the right displays the D-e&ciencies when interest is also in the variance function. It follows that the computed Bayesian designs outperform the standard designs for positive values of ∗ , that is when the signs of the variance function parameters are speci?ed correctly. Alternatively stated, standard designs often perform poorly since

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

479

Fig. 4. E&ciencies of standard designs relative to Bayesian designs for variance functions without interaction.

they do not take into account knowledge about the sign of the variance function parameters. The outperformance of the Bayesian designs especially comes true when the Bayesian designs are compared with the central composite designs. However, when the variance function parameters are strongly misspeci?ed, i.e. for negative values of 2 ∗ in the example considered, the full 3 -factorial performs better than the computed Bayesian design. Furthermore, for response and variance function estimation, the central composite designs tend to become more e&cient than the Bayesian design for ∗ much lower than −3. 4.2. First order variance functions with interaction In this section a ?rst order variance function with interaction term x1 x2 is used as given in (18). This corresponds to a vector for the variance function parameters composed of four components as in (20). Again, several degrees of uncertainty  increasing from 0 to the one representing a nearly uniform prior distribution were investigated. Only those degrees of uncertainty at which changes in the computed D-optimal design occurred, will be discussed. Five degrees of uncertainty about the mean of the prior distribution are mentioned, namely  ∈ {0; 0:015; 0:02; 0:04; 1}. The question is whether the above conclusion for ?rst order variance functions without interaction also holds for variance functions with an interaction term. Fig. 5 displays the e&ciency plots for response function estimation for prior mean (1; 0:4; 0:23; 0:2) and  ∈ {0; 0:015; 0:02} for various values of the interaction eJect (3) ∗ . One establishes that, similar to the ?rst order case without interaction, the Bayesian

480

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

Fig. 5. E&ciency plots for Prior mean (1; 0:4; 0:23; 0:2) , response function,  ∈ {0; 0:015; 0:02}. (2) design performs better than the classic one for components (1) ∗ and ∗ with the same (1) (2) sign as the components 0 and 0 of the prior mean. This conclusion holds for any in(3) teraction eJect (3) ∗ (even if it is opposite signed to the interaction eJect 0 of the prior mean) and especially comes true when the interaction eJect (3) ∗ is in the neighbourhood of the interaction eJect (3) of the prior mean. Maximal e&ciency improvements 0 of about 10% were observed. Increased uncertainty about the prior mean—reCected by rised —again leads to expanded surfaces across which the e&ciency ratio becomes larger than one. Fig. 6 displays the e&ciencies of the standard second order response surface designs with respect to the Bayesian design computed with prior mean (1; 0:4; 0:23; 0:2) and  = 0:5. The actual variance function parameters ∗ in (1; ∗ ; ∗ ; ∗ ) range between −3 and 3 and other degrees of uncertainty  gave similar results. It is shown that for positive values of ∗ , the Bayesian designs again outperform the standard designs. As opposed to the central composite designs, the full 32 -factorial becomes more e&cient than the Bayesian design for negative values of ∗ , i.e. when the variance function is strongly misspeci?ed. Other interaction eJects, estimation of both response and variance function and prior means close to the origin gave similar results for the e&ciency plots and for the

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

481

Fig. 6. E&ciencies of standard designs relative to Bayesian designs for variance functions with interaction.

comparison with the standard second order designs. As a conclusion, classic response surface designs may perform poorly since they ignore prior knowledge of the sign of the variance function parameters.

5. Example This example is based on an experiment reported by Derringer (1974) in which the ◦ response is the viscosity of styrene butadiene rubber blends at 100 C. Most commercial elastomer formulations contain various amounts and types of ?llers, all of which exert major eJects on the viscosity of the system. The ?ller used is Silica A. The ?rst quantitative factor is the ?ller level and the second one is the level of naphtenic oils, both measured in parts per hundred of the pure elastomer. The factor levels are 0, 30, 60 and 0, 10, 20, respectively. The number of observations equals 36. One can learn from experience that the variance of the measured viscosity increases with increased ?ller level and reduced level of naphtenic oils. Since the magnitude of the variance function parameters is unknown, the process engineer supposes little deviation from constant variance, expressed by a prior mean 0 = (1; 0:03; −0:03) and  = 1. The classic D-optimal design and the Bayesian one are depicted in Fig. 7. Both designs diJer in the number of replications at four design points. E&ciency improvements of more than 5% were obtained, resulting into a reduction of up to 20% in the volume of the con?dence ellipsoid of the parameter estimates.

482

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

Fig. 7. D-optimal designs in viscosity example.

6. Conclusion In this paper the bene?t of prior knowledge about the variance function parameters in D-optimal design construction was analyzed. For discrete approximations of multivariate normal distributions of the variance function parameters, D-optimal designs were derived for response function estimation and both response and variance function estimation. The e&ciencies of the computed designs were compared with those obtained by the assumption of constant variance, the so-called homoscedastic classic designs. The eJect of the presence of an interaction term in the variance function on the design e&ciency was also studied. It has turned out that prior knowledge about the sign of the variance function parameters leads to D-optimal designs that outperform the classic ones for a wide range of actual variance function parameters. Consequently, there is a real danger that classic designs perform poorly since they do not allow for knowledge about the sign of the variance function parameters. The outperformance of the Bayesian designs was even stronger when uncertainty about the prior mean was taken into account. In practice, this means that incorporating knowledge about the sign of the parameters of the variance function leads to designs that outperform the designs based on the assumption of constant variance. Acknowledgements This research was carried out while the second author was a research assistant of the Fund for Scienti?c Research-Flanders (Belgium). References Atkinson, A.C., Cook, R.D., 1995. D-optimal designs for heteroscedastic linear models. J. Amer. Statist. Assoc. 90, 204–212. Atkinson, A.C., Donev, A.N., 1992. Optimum Experimental Designs. Clarendon Press, Oxford. Chaloner, K., 1984. Optimal Bayesian experimental design for linear models. Ann. Statist. 12, 283–300. Chaloner, K., Larntz, K., 1989. Optimal Bayesian design applied to logistic regression experiments. J. Statist. Plann. Inference 21, 191–208.

L. Tack et al. / Journal of Statistical Planning and Inference 104 (2002) 469–483

483

Chaloner, K., Verdinelli, I., 1995. Bayesian experimental design: a review. Statist. Sci. 10, 273–304. Derringer, G.C., 1974. An empirical model for viscosity of ?lled and plasticized elastomer compounds. J. Appl. Polym. Sci. 18, 1083–1101. Dette, H., Wong, W.K., 1998. Bayesian D-optimal designs on a ?xed number of design points for heteroscedastic polynomial models. Biometrika 85, 869–882. Dumouchel, W., Jones, B., 1994. A simple Bayesian modi?cation of D-optimal designs to reduce dependence on an assumed model. Technometrics 36, 37–47. Nair, V.N., 1992. Taguchi’s parameter design: a panel discussion. Technometrics 34, 127–161. Taguchi, G., 1986. Introduction to Quality Engineering: Designing Quality into Products and Processes. Asian Productivity Organization, Tokyo. Taguchi, G., 1987. System of Experimental Design, Vol. 1 and 2. Asian Productivity Organization, Tokyo. Vandebroek, M., Goos, P., 1997. Semi-Bayesian D-optimal designs and estimation procedures for mean and variance functions. Research Report No. 9745, DTEW, K.U.Leuven. Vining, G., Myers, R.H., 1990. Combining Taguchi and response surface philosophies: a dual response approach. J. Qual. Tech. 22, 38–45. Vining, G., Schaub, D., 1996. Experimental designs for estimating both mean and variance functions. J. Qual. Tech. 28, 135–147.