Journal of Statistical Planning and Inference 144 (2014) 63–68
Maximin distance optimal designs for computer experiments with time-varying inputs and outputs

Max D. Morris

Department of Statistics, Iowa State University, United States
Department of Industrial and Manufacturing Systems Engineering, Iowa State University, United States

Abstract
Available online 27 September 2012
Computer models of dynamic systems produce outputs that are functions of time; models that solve systems of differential equations often have this character. Time-indexed inputs, such as the functions that describe time-varying boundary conditions, are also common with such models. Morris (2012) described a generalization of the Gaussian process often used to produce "meta-models" when inputs are finite-dimensional vectors, that can be used in the functional input setting, and showed how the maximin distance design optimality criterion (Johnson et al., 1990) can also be extended to this case. This paper describes an upper bound on the maximin distance criterion for functional inputs. A class of designs that are optimal under certain conditions is also presented; while these designs are of limited practical value, they show that the derived bound cannot be improved in the general case. © 2012 Elsevier B.V. All rights reserved.
Keywords: Computer experiment; Dynamic model; Gaussian stochastic process; Maximin distance design; Meta-model; Surrogate
1. Introduction and notation

The statistical treatment of computer experiments, in which a computer model of a physical system is the object of experimental focus, has received substantial attention in the statistical literature over the last two decades. The computer model is often an implementation of a complicated deterministic function for which evaluation requires the specification of quantities that complete the definition of a problem to be solved. Because the function is complex, it is often difficult or impossible to understand analytically, and so it is implemented as a computer model $y = f(x)$, where x represents the variables for which values must be specified, or inputs; f is the computer model; and y represents the variables that constitute the problem solution, or outputs. f is generally also complex, and its evaluation may require substantial computational effort. A common final or intermediate goal of a computer experiment is the development of a surrogate or meta-model: an approximation of the computer model, based on a finite number of executions of f, that can be used to quickly predict what the outputs would be at new values of the inputs, along with useful quantification of the uncertainty associated with these predictions. A few of the broadly cited papers on this topic are Sacks et al. (1989), Currin et al. (1991), Kennedy and O'Hagan (2001), and Higdon et al. (2004). In each of these papers, and many others, a Gaussian stochastic process (GaSP) is used as the statistical basis for formulating a surrogate for predicting a scalar- or vector-valued y from a scalar- or vector-valued x.

In many applications, the computer models of greatest interest are written to mimic dynamic physical processes; the literature describing methodology based on GaSP meta-models includes examples involving DNA population dynamics
Correspondence address: Department of Statistics, Iowa State University, Ames, IA 50014, United States. Tel.: +1 515 291 2775; fax: +1 515 294 4040. E-mail address: [email protected]
0378-3758 © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jspi.2012.09.005
(e.g. Henderson et al., 2010), the design of helicopter rotor blades (Booker et al., 1999), and the global climate (e.g. Tebaldi et al., 2005). These computer models are often based on systems of ordinary or partial differential equations that require specification of time-varying quantities as inputs, e.g. boundary conditions or so-called "forcing functions", and their solutions are outputs that are also functions of time (and sometimes also of space). Smooth functional outputs can often be reduced to vectors of low dimension for practical analysis through, for example, principal components representation, e.g. Higdon et al. (2008).

Morris (2012) introduced a generalization of the GaSP meta-model for the case in which the inputs x are expressed as a function of time t, along with a corresponding generalization of the maximin distance design criterion (Johnson et al., 1990), along the following lines. An experimental design is a collection of n functional inputs, $D = \{x_1(t), x_2(t), \ldots, x_n(t)\}$, defined for $t \in [0,T]$. The GaSP treats the responses at a particular time s that result from different input functions $x_i$ and $x_j$ as correlated, with $\mathrm{Corr}(y_i(s), y_j(s)) = \exp(-\theta d_{ij}^s)$, where $d_{ij}^s$ is called the correlation distance between $x_i$ and $x_j$ over $[0,s]$, but treats responses at any two distinct times as independent, whether they are generated by the same input function or not. One way to look at this is to say that the correlation distance associated with any two responses at different times is infinite. This modeling assumption is admittedly non-intuitive, since functional output that is continuous in time suggests dependence across time, but incorporating such dependence is not straightforward in this case.
For models that have vector-valued inputs and functional outputs, temporal correlations can be incorporated in a simple way through the separable or product correlation structure, with one factor reflecting temporal correlation over the time interval $(s_1, s_2)$, and the remaining factors expressing correlation associated with the inputs used in modeling output at either $s_1$ or $s_2$. However, in our context $\mathrm{Corr}(y_i(s_1), y_j(s_1)) \ne \mathrm{Corr}(y_i(s_2), y_j(s_2))$, preventing the use of this approach. As a practical matter, the assumption of temporal independence may be less critical than it initially seems. For example, where input-time product correlations are appropriate and observed data include outputs at each of a collection of times S for selected inputs, predictive interpolation for any time in S depends only on data observed at that time, i.e. data at other times are not additionally informative. (See, for example, Drignei and Morris, 2006.)

Generalizing a commonly used product correlation model described by Sacks et al. (1989) for vector-valued inputs, the correlation distance between inputs $x_i$ and $x_j$ through time s is defined as
$$d_{ij}^s = \int_0^s w_s(s-u)\,|x_i(u) - x_j(u)|^{\alpha}\,du$$
for a specified $\alpha \in (0,2]$ and $w_s$ defined over $[0,s]$. A popular choice is $\alpha = 2$, the so-called "Gaussian" correlation form, which is particularly appropriate for modeling outputs that are smooth functions of inputs, a condition that is generally true for models that implement systems of partial differential equations. The positive-valued weight functions $w_s(\cdot)$, one defined for each time for which output is to be predicted, generalize the individual weights associated with each dimension of the usual vector-input model. Effective specification of $w_s$ requires at least some understanding of the computer model under study.
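As a concrete illustration (not part of the original development), the correlation distance integral can be approximated by simple quadrature. A minimal Python sketch, using the weight function of Example 1 below, $w_s(s-u) = 2u/s^2$; the two constant inputs are illustrative only:

```python
import numpy as np

def trap(y, x):
    """Trapezoid rule (written out to avoid NumPy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def correlation_distance(xi, xj, s, w, alpha=2.0, ngrid=2001):
    """Approximate d_ij^s = int_0^s w_s(s-u) |x_i(u) - x_j(u)|^alpha du."""
    u = np.linspace(0.0, s, ngrid)
    return trap(w(s, u) * np.abs(xi(u) - xj(u)) ** alpha, u)

# Weight function from Example 1 (Section 3.1): w_s(s-u) = 2u/s^2,
# which satisfies the normalization (2) below.
w = lambda s, u: 2.0 * u / s**2
x_lo = lambda u: np.zeros_like(u)  # illustrative input held at its minimum
x_hi = lambda u: np.ones_like(u)   # illustrative input held at its maximum
d = correlation_distance(x_lo, x_hi, s=1.0, w=w, alpha=2.0)
# |x_lo - x_hi| = 1 everywhere, so d equals the weight's integral, 1.
```

Because the integrand here is linear in u, the trapezoid rule is exact and `d` comes out to 1 up to rounding.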
For example, for systems in which outputs are more dependent on recent inputs than those in the more distant past, reasonable weight functions generally take larger values for small arguments (recent times) and smaller values for larger arguments; the example of Section 3.1 features a very simple weight function of this type. In contrast, for systems that are sensitive to early inputs but become less influenced by inputs as time progresses, larger values of ws may be associated with larger values of its argument (more distant times). Suppose we want to predict outputs over a set of times S. The maximin (Mm) distance criterion for D over output times in S, under a model of independence between outputs at any two distinct times, is
$$\phi = \min_{1 \le i < j \le n,\; s \in S} d_{ij}^{s} \qquad (1)$$
and a Mm distance design for a given value of n is any D for which $\phi$ is maximized. Morris (2012) constructed a Mm distance design numerically for an experiment involving a dynamic model of the reaction of stem cell "kinetics" to exposure to ionizing radiation. In Section 2 of this paper, we derive a general upper bound for $\phi$ for even values of n. In Section 3, we present a class of designs for $n \equiv 0 \pmod 4$ that achieve this bound. The designs in this class are of limited practical value, but serve to show that the bound of Section 2 cannot be improved in the general case.
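The criterion (1) is straightforward to evaluate numerically for any candidate design. A Python sketch, again using the Example 1 weight function; the three step inputs and all names are illustrative, not from the paper:

```python
import itertools
import numpy as np

def trap(y, x):
    """Trapezoid rule over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def phi(design, S, w, alpha=2.0, ngrid=2001):
    """Mm criterion (1): the minimum correlation distance over all
    pairs of inputs in the design and all prediction times s in S."""
    best = float("inf")
    for s in S:
        u = np.linspace(0.0, s, ngrid)
        wu = w(s, u)
        for xi, xj in itertools.combinations(design, 2):
            best = min(best, trap(wu * np.abs(xi(u) - xj(u)) ** alpha, u))
    return best

# Example 1 weight function; illustrative 3-function design.
w = lambda s, u: 2.0 * u / s**2
design = [lambda u: (u < 0.5).astype(float),
          lambda u: (u >= 0.5).astype(float),
          lambda u: np.full_like(u, 0.5)]
val = phi(design, S=[0.5, 1.0], w=w, alpha=1.0)
# The pairs involving the constant 0.5 input differ by 0.5 everywhere,
# so the criterion evaluates to 0.5.
```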
2. An upper bound for $\phi$

Suppose for simplicity that the problem is scaled so that T = 1, and that each input function is constrained so that $0 \le x \le 1$, the latter being required so that distance is bounded. Further, require that for each $s \in S$
$$\int_0^s w_s(s-u)\,du = 1 \qquad (2)$$
so that distances $d_{ij}^s$ are comparable across all $s \in S$.

Result. Restrict n to be an even integer. Then an upper bound on $\phi$ (as defined in Eq. (1)) is $\frac{1}{2}\,n/(n-1)$.

Proof. Define the total distance between the elements of a design at any time s to be
$$c^s = \sum_{i<j} d_{ij}^s.$$
At any particular time $u \in [0,s]$ and for even n, the largest value $\sum_{i<j} |x_i(u) - x_j(u)|^{\alpha}$ can take for any $\alpha \in (0,2]$ is $n^2/4$, which corresponds to designs for which
$$x_i(u) = 0 \text{ for } n/2 \text{ elements } i \in \{1,2,\ldots,n\}, \qquad x_i(u) = 1 \text{ for } n/2 \text{ elements } i \in \{1,2,\ldots,n\}.$$
As a result, because each $w_s$ is restricted by (2), $c = \max_{s \in S} c^s$ can be no larger than $n^2/4$; this is achieved by any design for which the above condition is met at every time $u \in [0,1]$ except on a set of integration measure zero. It follows that $\phi$ can be no larger than the average correlation distance between two input functions, $(n^2/4)/\binom{n}{2} = \frac{1}{2}\,n/(n-1)$, and that this can only be achieved for designs for which all $d_{ij}^s$ are equal (to $\frac{1}{2}\,n/(n-1)$) at each $s \in S$. □

3. Optimal designs

First, consider the case in which there is only one time at which model output is to be predicted, $S = \{s\}$, and restrict attention to the case of $n \equiv 0 \pmod 4$. Let Z be the $n \times (n-1)$ design matrix for any balanced, orthogonal, main-effects-saturated, 2-level design, with coding levels 0 and 1 in each column, e.g. for n = 4:
$$Z = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
(A historic reference for experimental designs of this form is Plackett and Burman, 1946.) Let $0 < t_1 < t_2 < \cdots < t_{n-2} < s$ be the set of values that divides the integral of $w_s$ evenly over $[0,s]$; that is,
$$\int_0^{t_1} w_s(s-u)\,du = \int_{t_1}^{t_2} w_s(s-u)\,du = \cdots = \int_{t_{n-2}}^{s} w_s(s-u)\,du = \frac{1}{n-1}.$$
Finally, define an n-function design D to be the set of functions $x_i$ over $(0,s]$ for which
$$x_i(t) = Z_{i,1} \text{ for } t \in (0,t_1], \qquad x_i(t) = Z_{i,j} \text{ for } t \in (t_{j-1},t_j],\; j = 2,3,\ldots,n-2, \qquad x_i(t) = Z_{i,n-1} \text{ for } t \in (t_{n-2},s].$$
(The values taken by $x_i(t)$ for $t > s$ are immaterial.) Because each $d_{ij}^s = \frac{1}{2}\,n/(n-1)$ (the upper bound for $\phi$), this design is Mm-optimal.

Now let $S = \{s_1, s_2, \ldots, s_m\}$. For given n, determine an optimal design for prediction at $s_1$ as above (i.e. as determined by Z applied within subintervals delineated by appropriate values now denoted by $t_i^1$, $i = 1,2,\ldots,n-2$). Define $s_1 < t_1^2 < t_2^2 < \cdots < t_{n-2}^2 < s_2$ such that
$$\int_0^{t_1^1} w_{s_2}(s_2-u)\,du + \int_{s_1}^{t_1^2} w_{s_2}(s_2-u)\,du = \int_{t_1^1}^{t_2^1} w_{s_2}(s_2-u)\,du + \int_{t_1^2}^{t_2^2} w_{s_2}(s_2-u)\,du = \cdots = \int_{t_{n-2}^1}^{s_1} w_{s_2}(s_2-u)\,du + \int_{t_{n-2}^2}^{s_2} w_{s_2}(s_2-u)\,du = \frac{1}{n-1}. \qquad (3)$$
Continue the definition of input functions in D through $t \in (s_1, s_2]$ as
$$x_i(t) = Z_{i,1} \text{ for } t \in (s_1, t_1^2], \qquad x_i(t) = Z_{i,j} \text{ for } t \in (t_{j-1}^2, t_j^2],\; j = 2,3,\ldots,n-2, \qquad x_i(t) = Z_{i,n-1} \text{ for } t \in (t_{n-2}^2, s_2].$$
That is, over $(s_1,s_2]$, repeat the pattern with which the input functions evaluate to 0 or 1 in $(0,s_1]$, but over subintervals of different lengths so as to satisfy requirement (3). Again, the total correlation distance among all pairs of input functions over $(0,s_2]$ is $c^{s_2} = (n/2)^2$, the correlation distance between any pair of input functions over this interval is the same, $d_{ij}^{s_2} = \frac{1}{2}\,n/(n-1)$, and the design is Mm-optimal for prediction times $s_1$ and $s_2$ together.

Continue this pattern, defining input functions through the remaining intervals $(s_{i-1}, s_i]$, $i = 3,4,\ldots,m$. Within $(s_{i-1}, s_i]$ find values $t_1^i < t_2^i < \cdots < t_{n-2}^i$ such that
$$\sum_{j=1}^{i} \int_{s_{j-1}}^{t_1^j} w_{s_i}(s_i-u)\,du = \sum_{j=1}^{i} \int_{t_1^j}^{t_2^j} w_{s_i}(s_i-u)\,du = \cdots = \sum_{j=1}^{i} \int_{t_{n-2}^j}^{s_j} w_{s_i}(s_i-u)\,du = \frac{1}{n-1}, \qquad (4)$$
where $s_0 = 0$, for each $i = 3,4,\ldots,m$. By extension of the arguments above, any design constructed in this manner is Mm-optimal.
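The two-interval step of this construction can be verified numerically in the same way. A sketch for n = 4, again assuming (for simplicity, not from the text) uniform weights $w_s(v) = 1/s$, under which requirement (3) reduces to equal spacing within each interval:

```python
import itertools
import numpy as np

def trap(y, x):
    """Trapezoid rule over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

n, s1, s2 = 4, 0.5, 1.0
Z = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [1, 1, 1]], dtype=float)

# Uniform weights satisfy (2); each of [0, s1] and (s1, s2] is split
# into n-1 equal subintervals on which the columns of Z are replayed.
breaks1 = np.linspace(0.0, s1, n)
breaks2 = np.linspace(s1, s2, n)

def x(i, u):
    u = np.asarray(u, dtype=float)
    out = np.empty_like(u)
    first = u <= s1
    j1 = np.clip(np.searchsorted(breaks1, u[first], side="left") - 1, 0, n - 2)
    j2 = np.clip(np.searchsorted(breaks2, u[~first], side="left") - 1, 0, n - 2)
    out[first] = Z[i, j1]
    out[~first] = Z[i, j2]
    return out

u = np.linspace(0.0, s2, 8001)
wu = np.full_like(u, 1.0 / s2)  # w_{s2} over [0, s2]
dists = [trap(wu * np.abs(x(i, u) - x(j, u)), u)
         for i, j in itertools.combinations(range(n), 2)]
# Every pairwise distance through s2 should again be near (1/2) n/(n-1) = 2/3.
```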
Prediction sets S and weight functions $w_s$ can be constructed for which this scheme will not work because no $t_i^j$ exist satisfying the integral requirements (4). Here is one set of weight functions for which the construction works for any finite set S. Define a positive function W on $[0,1]$, and let $\Omega_i = \int_0^{s_i} W(1-u)\,du$. Define
$$w_{s_i}(s_i-u) = W(1-u)/\Omega_i \quad \text{for } u \in [0,s_i].$$
Over the interval $[0,s_i]$, these functions $w_{s_j}$, $j \ge i$, differ only by a constant factor. As a result, the integral requirements for the definition of $t_i^j$ can be satisfied within each interval individually. Specifically, so long as
$$\int_{s_{i-1}}^{t_1^i} w_{s_i}(s_i-u)\,du = \int_{t_1^i}^{t_2^i} w_{s_i}(s_i-u)\,du = \cdots = \int_{t_{n-2}^i}^{s_i} w_{s_i}(s_i-u)\,du = \frac{1}{n-1} \int_{s_{i-1}}^{s_i} w_{s_i}(s_i-u)\,du \qquad (5)$$
for all $i = 1,2,\ldots,m$, the construction described above will result in a Mm-optimal design with the correlation distance between any two points $d_{ij}^s = \frac{1}{2}\,n/(n-1)$ for each $s \in S$.

3.1. Example 1

Define
$$W(1-u) = u, \quad 0 < u \le 1.$$
Then $\Omega_i = \frac{1}{2} s_i^2$, and
$$w_{s_i}(s_i-u) = 2u/s_i^2, \quad u \in [0,s_i].$$
Select $t_1^i < t_2^i < \cdots < t_{n-2}^i$ such that (5) is satisfied, reducing in this case to
$$(t_1^i)^2 - s_{i-1}^2 = (t_2^i)^2 - (t_1^i)^2 = \cdots = s_i^2 - (t_{n-2}^i)^2 = \frac{1}{n-1}\,(s_i^2 - s_{i-1}^2)$$
for the change times in $[s_{i-1}, s_i]$. These are quadratic polynomials in the t's, and so can be easily solved. Fig. 1 is a display of change times for an optimal design with this weight function, n = 8, m = 5, and $S = \{0.4, 0.5, 0.6, 0.8, 1.0\}$. The specific set of input functions that are required to change values at each change time could be determined, for example, using any regular $2^{7-4}$ fractional factorial design of resolution III.

3.2. Example 2

Fig. 2A displays a design of eight input functions that may be more typical of the forcing functions used in dynamic models than those comprising the optimal designs described above. These eight functions are defined by
$$x_i(t) = (\cos(2\pi b t) + 1)/2, \quad t \in [0,1]$$
for $b \in \{0, 1, 2, \ldots, 7\}$. The designs shown in Fig. 2B and C are more "exaggerated" versions of these curves, transformed so that x(t) is "closer" to either 0 or 1, or equal to either 0 or 1. Specifically, for Fig. 2B, the curves are defined by
$$x_i(t) = \begin{cases} ([\cos(2\pi b t)]^{1/4} + 1)/2 & \text{if } \cos(2\pi b t) \ge 0, \\ (-[-\cos(2\pi b t)]^{1/4} + 1)/2 & \text{if } \cos(2\pi b t) < 0 \end{cases}$$
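These cosine designs can be evaluated against criterion (1) directly. A sketch for design 1A (waveform set A, α = 1), assuming frequency set 1 is read as $b \in \{0, 1, \ldots, 7\}$, with the Example 1 weights and the prediction-time set used for Table 1:

```python
import itertools
import numpy as np

def trap(y, x):
    """Trapezoid rule over grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Waveform set A with frequency set 1: x_i(t) = (cos(2*pi*b*t) + 1)/2.
design = [lambda u, b=b: (np.cos(2.0 * np.pi * b * u) + 1.0) / 2.0
          for b in range(8)]

# Example 1 weights; S = {0.5, 0.55, ..., 1.0} as used for Table 1.
S = [0.5 + 0.05 * k for k in range(11)]
w = lambda s, u: 2.0 * u / s**2

def phi(design, S, w, alpha, ngrid=4001):
    best = float("inf")
    for s in S:
        u = np.linspace(0.0, s, ngrid)
        wu = w(s, u)
        for xi, xj in itertools.combinations(design, 2):
            best = min(best, trap(wu * np.abs(xi(u) - xj(u)) ** alpha, u))
    return best

val = phi(design, S, w, alpha=1.0)
# val should land well below the bound (1/2)(8/7) = 0.5714; Table 1
# reports phi = 0.3339 for design 1A with alpha = 1.
```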
Fig. 1. Change times (filled circles) and prediction times (open circles) for a Mm distance optimal design as described in the text. (Horizontal axis: time, 0 to 1.)
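The change times plotted in Fig. 1 follow in closed form from the quadratic relations of Example 1; a short sketch reproducing them:

```python
import numpy as np

n = 8
S = [0.4, 0.5, 0.6, 0.8, 1.0]

def change_times(S, n):
    """Change times for the Example 1 weight w_{s_i}(s_i - u) = 2u/s_i^2:
    within (s_{i-1}, s_i], equal increments of t^2 of size
    (s_i^2 - s_{i-1}^2)/(n-1) give t_j in closed form."""
    times = []
    s_prev = 0.0
    for s in S:
        for j in range(1, n - 1):  # j = 1, ..., n-2
            times.append(np.sqrt(s_prev**2 + j * (s**2 - s_prev**2) / (n - 1)))
        s_prev = s
    return times

t = change_times(S, n)
# m = 5 intervals times (n-2) = 6 gives 30 change times, strictly
# increasing and interior to (0, 1).
```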
Fig. 2. Designs 1A–1C as described in the text. (Three panels, each plotting x against time over [0,1].)
Table 1
Values of $\phi$ for designs described in the text, with $\alpha$ = 1 and 2.

             Waveform set A        Waveform set B        Waveform set C
Freq. set    α = 1     α = 2       α = 1     α = 2       α = 1 or 2
1            0.3339    0.1810      0.3428    0.2580      0.3045
2            0.3339    0.1810      0.3590    0.2745      0.3473
3            0.3339    0.1810      0.3712    0.2873      0.3756
4            0.3339    0.1853      0.3774    0.2893      0.3990
and for Fig. 2C, the curves are defined by
$$x_i(t) = \begin{cases} 1 & \text{if } \cos(2\pi b t) \ge 0, \\ 0 & \text{if } \cos(2\pi b t) < 0 \end{cases}$$
for the same set of values of b, which we denote by "frequency set 1". We denote the three "waveform sets" by A, B, and C, and specify these three designs as 1A, 1B, and 1C, respectively. Designs 2A–2C were constructed in the same way using frequency set 2: $b \in \{0, 2, 3, \ldots, 8\}$; designs 3A–3C were constructed with frequency set 3: $b \in \{0, 3, 4, \ldots, 9\}$; and designs 4A–4C were constructed with frequency set 4: $b \in \{0, 4, 5, \ldots, 10\}$. For each design, the Mm criterion function $\phi$ (Eq. (1)) was computed for $S = \{0.5, 0.55, 0.6, \ldots, 1\}$ and weight functions as described in Example 1; results are displayed in Table 1. Note that the upper bound on $\phi$ for these designs is $\frac{1}{2} \cdot \frac{8}{7} = 0.5714$; functions in the optimal designs described earlier in this section would oscillate between the values of 0 and 1 approximately 44 times over the simulated time interval. The design in this collection that is best by the Mm criterion is 4C, based on the highest frequencies and most exaggerated waveforms, for which $\phi = 0.3990$. While this value is substantially less than the bound, it should be remembered that $\phi$ is not a statistical index that directly quantifies, say, precision, so $2((n-1)/n)\phi$ is not related to "efficiency" in the usual sense.

4. Practical limitations and conclusion

The construction outlined in Section 3 demonstrates that there are designs for which the maximin distance bound of Section 2 is reached, proving that no tighter general bound exists. However, the optimal designs constructed here would be of limited practical value in many cases because they require that each x(t):
- take only values that are the maximum or minimum allowed at each time (except over a range of zero integration measure), and
- oscillate between these two extreme values approximately nm/2 times over $t \in [0,1]$.

It may be possible to find optimal designs consisting of functions that oscillate between extreme values fewer times. However, any design for which one or more $0 < x_i(t) < 1$ over sets of t that have non-zero integration measure cannot attain the general Mm bound. Still, in many applications, the only input functions that are physically realistic are "smooth", or at least vary less radically, and the optimal input sets described here would have limited credibility. Near-optimal designs for more constrained cases may be constructed numerically, as in the example of Morris (2012). While more difficult than the derivation presented here, it would be of interest to construct alternative criterion bounds that are conditioned on smoothness or magnitude of the derivatives of x(t).
Acknowledgments

The author gratefully acknowledges the contributions of the Associate Editor and Referees; their comments have led to significant improvements in this paper.

References

Booker, A.J., Dennis, J.J., Frank, P., Serafini, D., Torczon, V., Trosset, M., 1999. A rigorous framework for optimization of expensive functions by surrogates. Structural Optimization 17, 1–13.
Currin, C., Mitchell, T., Morris, M., Ylvisaker, D., 1991. Bayesian prediction of deterministic functions with application to the design and analysis of computer experiments. Journal of the American Statistical Association 86, 953–963.
Drignei, D., Morris, M.D., 2006. Empirical Bayesian analysis for computer experiments involving finite-difference codes. Journal of the American Statistical Association 101, 1527–1536.
Henderson, D.A., Boys, R.J., Wilkinson, D.J., 2010. Bayesian calibration of a stochastic model using multiple data sources. Biometrics 66, 249–256.
Higdon, D., Gattiker, J., Williams, B., Rightley, M., 2008. Computer model calibration using high dimensional output. Journal of the American Statistical Association 103, 570–583.
Higdon, D., Kennedy, M., Cavendish, J., Cafeo, J., Ryne, R.D., 2004. Combining field observations and simulations for calibration and prediction. SIAM Journal on Scientific Computing 26, 448–466.
Johnson, M.E., Moore, L.M., Ylvisaker, D., 1990. Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26, 131–148.
Kennedy, M.C., O'Hagan, A., 2001. Bayesian calibration of computer models. Journal of the Royal Statistical Society, Series B 63, 425–464.
Morris, M.D., 2012. Gaussian surrogates for computer models with time-varying inputs and outputs. Technometrics 54, 42–50.
Plackett, R.L., Burman, J.P., 1946. The design of optimum multifactorial experiments. Biometrika 33, 305–325.
Sacks, J., Welch, W., Mitchell, T., Wynn, H., 1989. Design and analysis of computer experiments. Statistical Science 4, 409–423.
Tebaldi, C., Smith, R.L., Nychka, D., Mearns, L.O., 2005. Quantifying uncertainty in projections of regional climate change: a Bayesian approach to the analysis of multimodel ensembles. Journal of Climate 18, 1524–1540.