Simulation Modelling Practice and Theory 54 (2015) 86–100
Interval metamodels for the analysis of simulation Input–Output relations

Bogumił Kamiński, Warsaw School of Economics, Al. Niepodległości 162, 02-554 Warszawa, Poland
Article history: Received 23 March 2014; Received in revised form 5 March 2015; Accepted 25 March 2015; Available online 7 April 2015.

Keywords: Stochastic simulation; Metamodeling; Model verification; Bayesian inference
Abstract. We analyze the use of simulation metamodels for understanding the relationship between the inputs and outputs of stochastic simulation models. Such metamodels are expected to have a simple structure, but can be conditionally biased. In order to overcome this limitation, we propose the use of interval metamodels that assume that the prediction of the metamodel is an interval rather than a single point. We develop a Bayesian procedure that allows a researcher to verify how well an interval metamodel covers a response surface of a simulation model. The proposed method is illustrated by application to Schelling's segregation model. The example shows how the procedure can help a researcher to identify non-obvious characteristics of a simulation model response surface. © 2015 Elsevier B.V. All rights reserved.
1. Introduction

Complex stochastic simulations are often viewed as black boxes in the way they transform their input parameters into output characteristics of modeled systems. Therefore, it is often proposed to use their simpler approximations, called simulation metamodels [1]. The relevant literature identifies various reasons why metamodels can be useful for a researcher [2,3]. They can be summarized in three major groups: (i) understanding the shape of the relationship between the inputs and outputs of a model, (ii) prediction and (iii) optimization. These different usage scenarios imply different approaches to the selection of a functional specification of a metamodel, simulation experiment design and parameter estimation. A review and comparison of metamodel types applied in practice is presented in [4].

In this work we concentrate on metamodels developed to improve the understanding of simulation models. In this case a researcher usually requires that the metamodel has a simple structure that is easy to understand – like linear regression or a low-order polynomial [2]. However, the simplicity comes at a cost, and predictions from such models can be biased in subdomains of the input parameter space (this situation is called conditional bias). The reason is that these types of metamodels usually have a very constrained specification (which is desirable from the interpretation point of view), but because of this they capture global input–output relationships and are unable to fit the potentially complex local shape of a response surface of a simulation model. This situation can be contrasted with metamodels used for prediction, where a very accurate approximation of a response surface is required even at the cost of metamodel interpretability. We encounter such a situation, for example, in cases when a metamodel is to be used as a surrogate building block of some larger scale simulation [3].
Let us assume that a researcher requires a simple metamodel and accepts the loss in its goodness of fit. This, however, does not mean that objective metamodel acceptance criteria should be abandoned. For stochastic simulation metamodels,
Kleijnen and Sargent [2] suggest the following general procedure. Once a metamodel is estimated, a new set of simulations should be performed to generate validation observations. Such a data set will contain information about simulation outputs for different input parameters. Additionally, because the simulation is stochastic, it should be replicated several times for the same set of input parameter values. The random nature of simulation implies that standard measures of model fit – like residual mean squared error or mean absolute error – are not fully adequate, as they mix simulation variance with metamodel bias.

Let us explain the above issue by considering a fixed input d and denoting by Y(d) a random variable representing the output of the simulation for input d. Now by μ_Y(d) = E(Y(d)) we define the expected value of the simulation output¹ and Ŷ(d) is its prediction given by the metamodel. A single replication of the simulation gives us one sample of the random variable Y(d). Using the above notation we have:

    D²(Ŷ(d) − Y(d)) = (Ŷ(d) − μ_Y(d))² + D²(Y(d)),    (1)

so the variance of the deviation of the metamodel from a single simulation output is equal to the sum of the squared bias of the metamodel and the simulation output variance.

In order to address the simulation output variability problem, Kleijnen and Sargent [2] recommend, as one of the possible procedures, testing whether the absolute deviation of a metamodel prediction from a mean of simulation outputs for fixed input values is below a given threshold (Eq. (3) in [2]) and suggest that the results of these tests for different input values can be combined into a simultaneous test using the Bonferroni inequality.

In this paper we propose to extend this approach using the Bayesian framework. The key idea is to accept that a simple metamodel can be conditionally biased.² However, we want to control the size of this bias. Assume that a researcher accepts that the expected value of simulation μ_Y(d) lies in the interval B̂(d) = [Ŷ(d) − l(d), Ŷ(d) + h(d)], so that the bias is not greater than l(d) from below and h(d) from above. In this setup we can think of a metamodel as returning an interval B̂(d), not a single value Ŷ(d). We will call B̂ an interval metamodel. Now we can pose two questions: (1) what is the probability that μ_Y(d) ∈ B̂(d) for fixed d and (2) how can we generalize
this probability by allowing d to vary over the whole input parameter space.

It should be stressed here that in this paper we are concerned with the deviation of the metamodel Ŷ from the response surface μ_Y (which is the expected value of the simulation output). Therefore, the probability that μ_Y(d) falls into the interval metamodel prediction B̂(d) should not be mixed with the probability that a single observation of the simulation output (a realization of the random variable Y(d)) falls into this interval, which is a different question and, in general, will be lower.³

In summary, we assume that we have a metamodel that has a simple functional form, but consider not a point but an interval prediction and call it the interval metamodel. This allows us to retain a clear interpretation of the metamodel and simultaneously control the level of its bias. In the paper we develop a Bayesian procedure that allows for validation of interval metamodels.

The structure of the paper is as follows. In Section 2 we define interval metamodels and propose a methodology for their validation. Next, in Section 3 we present an example application of the proposed methodology to the classical Schelling's segregation model. Following the standard requirements for the reproducibility of computational models [5] and the Simulation Modelling Practice and Theory journal guideline that theoretical results should be presented in the context of their applicability, http://bogumilkaminski.pl/pub/intmeta.zip provides the source codes of all the computations presented in the paper. For all implementations we used R version 3.0.0 [6] with additional packages: poibin [7] and gtools [8]. Subsequent listings assume that earlier ones were executed in the R console. Two data sets, train.txt and test.txt, are required to run listing8.r and can be downloaded from http://bogumilkaminski.pl/pub/schelling.zip.

2. Validation of interval metamodels

By a simulation model we understand a mapping Y : D → 𝒴, where D is a set of input parameters to the simulation model and 𝒴 is a space of random variables called simulation model results. Random variables in 𝒴 have support on the real numbers ℝ.⁴ A single run of a simulation for d ∈ D produces a simulation output, which is a realization of the random variable Y(d). The element d ∈ D will be called the input parameters of the simulation model. Again we will focus on the most common case and assume that D ⊆ ℝᵏ. In such a case the input parameters form a k-dimensional real vector.

¹ In the following text μ_X will denote the expected value of a random variable X.
² By conditional bias we understand the deviation Ŷ(d) − μ_Y(d) for a fixed input parameter d (bias can be positive or negative). However, for typical metamodels (e.g. linear regression estimated using the least squares method) the average of the conditional bias over the whole input parameter space D will be close to 0 because positive and negative conditional biases will roughly cancel out. We want to stress that it is possible for a simple metamodel to be unbiased on average, but it usually will be biased for fixed d. This situation is illustrated in Figs. 1 and 8, which are discussed in Sections 2 and 3 respectively.
³ A single simulation output has additional noise due to the stochastic nature of the simulation, as shown in Eq. (1).
⁴ It is possible to extend the presented analysis to more general sets than ℝ, but real-valued random variables are most common in applications so we focus only on them.
With the given simulation model Y we associate a metamodel Ŷ : D → ℝ. The metamodel is usually estimated using data gathered in a simulation experiment, but it can also be specified by a researcher based on her assumptions or domain knowledge. For input parameters d ∈ D the value of the metamodel Ŷ(d) aims to predict E(Y(d)).⁵ We will call μ_Y(d) = E(Y(d)) a response surface of model Y.

With the metamodel Ŷ we associate a confidence bandwidth⁶ given by two functions l : D → ℝ₊ and h : D → ℝ₊. We will say that in point d the metamodel approximates μ_Y(d) in the bandwidth B̂(d) = [Ŷ(d) − l(d), Ŷ(d) + h(d)]. Observe that we assume that l and h take strictly positive values. This implies that the interval B̂(d) always has a strictly positive length. The mapping B̂ will be called an interval metamodel. The following example presents the introduced concepts.

Example. Assume that D = [0, π] and Y₁(d) ∼ N(1 + d + sin(10d)/4, 1).⁷
The response surface of this model is μ_Y₁(d) = 1 + d + sin(10d)/4. Now assume that we use a simplified metamodel Ŷ₁(d) = 1 + d. This would be a natural choice of a metamodel if we did not know the formula for μ_Y₁(d).⁸ However, usually we are unsure what the precision of our metamodel is, so we define a confidence bandwidth by assuming l₁(d) = h₁(d) = 0.125 and thus B̂₁(d) = [Ŷ₁(d) − 0.125, Ŷ₁(d) + 0.125].⁹ □
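The example model is easy to reproduce in code. The paper's listings are in R; the sketch below is an illustrative Python version (the function names are ours):

```python
import math
import random

def mu_Y1(d):
    # response surface: expected simulation output at input d
    return 1 + d + math.sin(10 * d) / 4

def Y1(d, rng):
    # one stochastic run of the simulation model: N(mu_Y1(d), 1)
    return rng.gauss(mu_Y1(d), 1)

def Y1_hat(d):
    # the simplified linear metamodel
    return 1 + d

def in_band(d, l=0.125, h=0.125):
    # C(d): is the response surface inside the interval metamodel at d?
    return Y1_hat(d) - l <= mu_Y1(d) <= Y1_hat(d) + h

rng = random.Random(42)
sample_mean = sum(Y1(0.0, rng) for _ in range(10_000)) / 10_000
print(round(sample_mean, 2))   # close to mu_Y1(0) = 1
print(in_band(0.0))            # True, so C(0) = 1
print(in_band(math.pi / 40))   # False: the bias sin(pi/4)/4 exceeds 0.125
```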
It is pertinent to consider the question regarding the accuracy of the interval metamodel, that is: can we expect that the response surface lies within it? This concept is formalized below.

First let us analyze the case of a single d ∈ D. We want to answer the question if μ_Y(d) ∈ B̂(d). Let us define a function C : D → {0, 1}. In point d ∈ D it will be equal to 1 if μ_Y(d) ∈ B̂(d) and equal to 0 otherwise. We will call C(d) the interval metamodel confidence at point d.

Next, we will associate the space D with a probability measure ν implying a cumulative distribution function F_D : D → ℝ.¹⁰ Then it is natural to try to measure the probability that μ_Y(d) ∈ B̂(d) for a random element d drawn from D. It can be calculated as:

    C = ∫_{d∈D} C(d) dF_D(d).    (2)
C will be called the interval metamodel confidence. Note that we assume that the set {d ∈ D : C(d) = 1} is measurable with respect to ν. Let us remark here that if the researcher is interested in a specific point d then it is enough for her to calculate C(d). However, very often one wants to verify the metamodel over some region of parameters, so then the calculation of C can be applied. It is worth noting that although we assumed that the integral given in Eq. (2) is calculated over the whole input parameter space D, the calculation of C for some subset D′ ⊆ D representing a region of input space that is interesting for the researcher can be performed using the same formulas after a conditional update of the measure ν, provided that D′ is ν-measurable.

Example. For our example simulation model Y₁ and the associated metamodel Ŷ₁ with bounds l₁ and h₁ it is easy to calculate C(d) and C analytically, assuming that the probability measure ν₁ implies a uniform distribution of d on the interval [0, π]. Observe that μ_Y₁(d) − Ŷ₁(d) = sin(10d)/4. This value is in absolute value greater than 0.125 for

    d ∈ Z = ⋃_{k=0}^{9} [(1 + 6k)π/60, (5 + 6k)π/60].

Therefore C(d) = 0 for d ∈ Z and otherwise it is equal to 1. Under the measure ν₁ we get C = 1/3. The situation is depicted in Fig. 1. □

The purpose of this paper is to present methods that allow a researcher to approximate C(d) and C using simulation. We shall apply the Bayesian methodology and follow the classical approach, as defined by Diaconis and Freedman [9].¹¹

⁵ Again it is possible to extend the analysis to statistical parameters other than the expected value, but metamodels predicting the expected value of a simulation are most common in practice.
⁶ We use the term confidence because it expresses the researcher's assumption about the range of the expected value of Y(d) – similarly to confidence intervals in estimation.
⁷ By N(μ, σ) we denote a random variable having a normal distribution with mean μ and standard deviation σ.
⁸ Here we assume, for the clarity of the exposition, that the metamodel is given. In practice we would take that the model has a linear form a₀ + a₁d and estimate (a₀, a₁) using a sample of the simulations.
⁹ The bandwidth will usually be identified using some statistical procedure; see Section 3 for an example.
¹⁰ The measure ν might be implied by a known empirical distribution of the input parameters. If such a distribution is not known, the uniform distribution assumption is the simplest one.
¹¹ A similar approach is used in Chapter 4 of [10] (in Polish), which contains the author's initial results on Bayesian analysis of a simulation model's input–output relations.
[Fig. 1. Relationship between response surface μ_Y₁(d) (thick black line), metamodel Ŷ₁(d) (dashed gray line) and interval metamodel B̂₁(d) (gray lines).]
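The analytic value C = 1/3 can be checked numerically by integrating C(d) over [0, π] under the uniform measure. A minimal Python sketch (the paper's own code is in R):

```python
import math

def mu_Y1(d):
    # response surface of the example model
    return 1 + d + math.sin(10 * d) / 4

def C_point(d, l=0.125, h=0.125):
    # interval metamodel confidence C(d) for the metamodel 1 + d
    bias = mu_Y1(d) - (1 + d)   # equals sin(10 d) / 4
    return 1 if -l <= bias <= h else 0

# midpoint-rule approximation of C = integral of C(d) dF(d), F uniform on [0, pi]
N = 300_000
C = sum(C_point((k + 0.5) * math.pi / N) for k in range(N)) / N
print(C)  # approximately 1/3
```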
First we focus on the approximation of C(d). Assume that we have some a priori assumption about the distribution of μ_Y(d) given by a cumulative distribution function F_{d,0} : ℝ → [0, 1]. Now assume that we are able to simulate the model Y at point d independently n ∈ ℕ times. We assume that μ_Y(d) is a known function of some unknown parameters θ ∈ Θ that uniquely define the distribution Q_θ of Y(d).¹²

Next, closely following [9], consider a family of probabilities {Q_θ : θ ∈ Θ} on ℝ. By Q_θⁿ we denote the product measure on ℝⁿ. This means that the n samples are independent with the common distribution Q_θ. We assume that Θ is a Borel subset of a complete separable metric space (typically it will be a subset of ℝᵏ, where k is the number of parameters). Let ξ₀ be a prior probability on Θ. Given this notation, let P_{ξ₀} be the joint distribution of the parameters θ and an n-element sample: P_{ξ₀}(A × B) = ∫_A Q_θⁿ(B) ξ₀(dθ), where A and B are Borel sets. Given this probabilistic structure it is possible to calculate the posterior probability on Θ given the n-element sample, which we denote as ξ_n.

Under this formalism F_{d,0} can be derived from ξ₀ via a known link between θ and μ_Y(d). Following this, the information taken from the n-element sample allows us to compute ξ_n using the Bayes theorem to obtain a posteriori beliefs about the distribution of μ_Y(d), represented by a cumulative distribution function F_{d,n}. We assume that F_{d,n} is consistent in the sense of [9]. Given this information we can calculate the posterior probability that C(d) is equal to 1, conditional on the a priori distribution F_{d,0} and the n-element sample, as:

    c_n(d) = Pr(−l(d) ≤ μ_Y(d) − Ŷ(d) ≤ h(d)) = ∫_{Ŷ(d)−l(d)}^{Ŷ(d)+h(d)} dF_{d,n}(x),    (3)
where the subscript n indicates that we have taken a sample of size n. Note that c_n(d) is a real number that we can calculate after taking the sample. Of course c_n(d) will vary conditional on the sample. Therefore we define a random variable C_n(d) describing the distribution of c_n(d) prior to taking the sample.

Example. We give an example of a calculation of c_n(d) and an approximation of C_n(d) for the simulation model Y₁ and the interval metamodel B̂₁. We consider input parameter d = 0 and sample size equal to n = 200. Note that C(0) = 1. Here, and in the following examples, we assume that a researcher knows that Y₁(d) is normally distributed with standard deviation equal to 1 and only its mean is unknown.¹³ We assume uniform improper prior beliefs about μ_Y(d). In such a case, after taking an n-element sample of Y(d), the posterior distribution of μ_Y(d) under the uninformative prior is normal with mean equal to the sample mean and variance equal to 1/n (see for example [11]). Let Φ denote the cumulative distribution function of the standard normal distribution. Then for a given sample s₁(d) with mean m̄ we have:

    c_n(d) = Φ(√n (Ŷ₁(d) + h₁(d) − m̄)) − Φ(√n (Ŷ₁(d) − l₁(d) − m̄)).    (4)

After the evaluation of the above expression using the code from listing1.r we obtain c₂₀₀(0) ≈ 0.8855. Now we switch to the approximation of C₂₀₀(0). Numerically we perform it by a replicated calculation of c₂₀₀(0). We can do it by running the code from listing2.r. The result is depicted in Fig. 2. We calculate that the approximated mean of C₂₀₀(0) is equal to 0.7880, but it has a significant variance due to the small sample size n. □
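Eq. (4) is straightforward to implement. Below is a Python transcription of the computation that listing1.r performs in R; plugging in a sample mean equal to the true mean μ_Y₁(0) = 1 is our illustrative assumption, so the value differs from the paper's sample-dependent c₂₀₀(0) ≈ 0.8855:

```python
import math

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def c_n(y_hat, l, h, sample_mean, n):
    # Eq. (4): posterior probability that mu_Y(d) lies in
    # [y_hat - l, y_hat + h] when the posterior is N(sample_mean, 1/n)
    s = math.sqrt(n)
    return Phi(s * (y_hat + h - sample_mean)) - Phi(s * (y_hat - l - sample_mean))

# d = 0: metamodel prediction 1, bandwidth l = h = 0.125, n = 200 replications;
# sample_mean = 1.0 is our choice (the true mean), not the paper's random draw
c = c_n(1.0, 0.125, 0.125, sample_mean=1.0, n=200)
print(round(c, 4))  # 0.9229
```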
¹² Very often μ_Y(d) is simply one of these parameters. For example, if we assumed that Y(d) is normally distributed then θ is a pair of two parameters representing the mean value and variance of the distribution, and μ_Y(d) is the first of these parameters.
¹³ In Section 3 we provide a more realistic study, but here we want to restrict our analysis to a simple case.
[Fig. 2. Histogram of C₂₀₀(0) for simulation model Y₁ and interval metamodel B̂₁.]
Next we analyze the relationship between μ_Y(d) and B̂(d). We have three possibilities:

(A1) μ_Y(d) ∈ (Ŷ(d) − l(d), Ŷ(d) + h(d));
(A2) μ_Y(d) ∉ [Ŷ(d) − l(d), Ŷ(d) + h(d)];
(A3) μ_Y(d) ∈ {Ŷ(d) − l(d), Ŷ(d) + h(d)}.
We have assumed that F_{d,n} is consistent, so for almost all sampling sequences:

    ∀ε > 0 : lim_{n→+∞} ∫_{μ_Y(d)−ε}^{μ_Y(d)+ε} dF_{d,n}(x) = 1.    (5)
Therefore in case (A1) for almost all sampling sequences lim_{n→+∞} c_n(d) = 1, as there exists an ε > 0 such that [μ_Y(d) − ε, μ_Y(d) + ε] ⊆ B̂(d). This means that C_n(d) almost surely converges to C(d) = 1. Analogously, in case (A2) we get that C_n(d) almost surely converges to C(d) = 0. The case (A3) is problematic, as in this situation even asymptotically we do not know the value of F_{d,n}(Ŷ(d) + h(d)) − F_{d,n}(Ŷ(d) − l(d)). Thus it is not guaranteed that C_n(d) almost surely converges to C(d) = 1.

The above analysis implies that we can test whether μ_Y(d) ∈ B̂(d) only if μ_Y(d) is not on the edge of B̂(d). This is exactly the reason why we require that l(d) and h(d) are strictly positive. If l(d) = h(d) = 0 and Ŷ(d) = μ_Y(d) (we have a perfect metamodel in point d) then we would not be able to conclude that C_n(d) almost surely converges to 1. The following example illustrates the above considerations.

Example. We consider three input parameters d ∈ {0, π/60, π/40}. Note that C(0) = C(π/60) = 1 and C(π/40) = 0. The difference between points 0 and π/60 is that in point π/60 we have that μ_Y₁(π/60) lies on the boundary of B̂₁(π/60). We take n ∈ {10, 100, 1000, 10000} and report the mean value and the standard deviation of the approximation of C_n(d). This can be calculated by executing the code from listing3.r. The results are presented in Table 1. We can observe that for d ∈ {0, π/40} an increase in sample size makes the distribution converge to the proper value of C(d). However, for d = π/60 the distribution does not converge because this is a boundary point. □

There is one issue related to the diagnostics for the evaluation of c_n(d). If it is low (not close to 1) we can have three possibilities: (a) strong evidence (high posterior probability) that μ_Y(d) < Ŷ(d) − l(d), (b) strong evidence that μ_Y(d) > Ŷ(d) + h(d), and (c) the sample size n is low and the posterior distribution has a large spread relative to the width of B̂(d) and encompasses this interval from above and below. In the third situation both Pr(μ_Y(d) < Ŷ(d) − l(d)) and Pr(μ_Y(d) > Ŷ(d) + h(d)) are high. In order to identify this third undesired possibility, we define the following diagnostic measure to detect it:
    v_n(d) = min{ ∫_{−∞}^{Ŷ(d)−l(d)} dF_{d,n}(x), ∫_{Ŷ(d)+h(d)}^{+∞} dF_{d,n}(x) }.    (6)

If v_n(d) is high (observe that this implies that c_n(d) cannot be high simultaneously) we may conclude that the sample size n is insufficient relative to the assumed width of B̂(d). The way that v_n(d) can be used in practice is presented in Section 3.
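Under the normal-posterior setting of the examples, v_n(d) is the smaller of the two tail masses outside the band. A hypothetical Python helper (not one of the paper's listings) makes the small-n pathology visible:

```python
import math

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def v_n(y_hat, l, h, sample_mean, n):
    # Eq. (6): the smaller of the two posterior tail masses outside the band,
    # the posterior of mu_Y(d) being N(sample_mean, 1/n)
    s = math.sqrt(n)
    below = Phi(s * (y_hat - l - sample_mean))        # mass below y_hat - l
    above = 1 - Phi(s * (y_hat + h - sample_mean))    # mass above y_hat + h
    return min(below, above)

# small n: the posterior spread dwarfs the band, so both tails are heavy
print(round(v_n(1.0, 0.125, 0.125, sample_mean=1.0, n=4), 4))    # about 0.4
# large n: the posterior concentrates inside the band
print(round(v_n(1.0, 0.125, 0.125, sample_mean=1.0, n=400), 4))  # near 0
```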
Table 1. The mean value and the standard deviation of the approximation of C_n(d) for d ∈ {0, π/60, π/40} and n ∈ {10, 100, 1000, 10,000}.

    d       n        Mean      sd
    0       10       0.2203    0.0838
            100      0.6230    0.1777
            1000     0.9946    0.0200
            10,000   >0.9999   <0.0001
    π/60    10       0.2120    0.0884
            100      0.4615    0.2455
            1000     0.5007    0.2879
            10,000   0.4999    0.2890
    π/40    10       0.2036    0.0921
            100      0.3403    0.2477
            1000     0.1239    0.1689
            10,000   0.0001    0.0016
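The non-convergence at the boundary point can be reproduced with a short Monte Carlo study, a Python analogue of what listing3.r computes in R (the seed and replication count are our choices). Since μ_Y₁(π/60) sits exactly on the upper edge of B̂₁(π/60), c_n(π/60) tends to a uniform draw on (0, 1), so its mean stays near 0.5 no matter how large n gets:

```python
import math
import random

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def c_n(y_hat, l, h, sample_mean, n):
    s = math.sqrt(n)
    return Phi(s * (y_hat + h - sample_mean)) - Phi(s * (y_hat - l - sample_mean))

rng = random.Random(1)
d = math.pi / 60
mu = 1 + d + math.sin(10 * d) / 4    # = (1 + d) + 0.125: exactly on the upper edge
n, reps = 10_000, 2_000
vals = []
for _ in range(reps):
    m_bar = rng.gauss(mu, 1 / math.sqrt(n))   # distribution of the sample mean
    vals.append(c_n(1 + d, 0.125, 0.125, m_bar, n))
print(round(sum(vals) / reps, 2))  # stays close to 0.5, as in Table 1
```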
Having finished the discussion for a single element of the input parameter set, we proceed to the evaluation of the overall interval metamodel confidence C.

Consider a set A3 ⊆ D consisting of points d where the distribution F_{d,n} is not consistent or μ_Y(d) ∈ {Ŷ(d) − l(d), Ŷ(d) + h(d)}, case (A3) above. We assume that ν(A3) = 0. This means that almost surely we can omit those cases where C_n(d) does not converge to C(d). We take it that the prior distributions of μ_Y(d) in different points are independent and identical. This reflects the assumption that there are no prior assumptions about the shape of μ_Y(d).¹⁴

We independently sample m points from the set D with respect to the measure ν. By D_i we denote a random variable representing the result of the i-th draw. Define X_{i,n} = C_n(D_i). Note that because the D_i are independent and have an identical distribution, the X_{i,n} are also independent and have an identical distribution. The random variable X_{i,n} can be interpreted as a distribution of c_n(d) before the drawing of d and taking n samples from Y(d). Therefore, the set of computed values c_n(d_i), where d_i is a realization of D_i, can be seen as m independent samples of the random variable X_{i,n}, and the empirical cumulative distribution function built using them approximates the cumulative distribution function of X_{i,n}. Denote the average of the c_n(d_i) as:

    c_{n,m} = (1/m) Σ_{i=1}^{m} c_n(d_i).    (7)

It is a realization of a random variable:

    C_{n,m} = (1/m) Σ_{i=1}^{m} X_{i,n}.    (8)
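Eqs. (7) and (8) suggest a direct Monte Carlo estimator of C. The following Python sketch mirrors, under our own choice of n, m and seed, what listing4.r does in R for the running example:

```python
import math
import random

def Phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def c_n(y_hat, l, h, sample_mean, n):
    # Eq. (3) for a normal likelihood with sd 1 and uniform improper prior
    s = math.sqrt(n)
    return Phi(s * (y_hat + h - sample_mean)) - Phi(s * (y_hat - l - sample_mean))

rng = random.Random(7)
n, m = 10_000, 500
cs = []
for _ in range(m):
    d = rng.uniform(0, math.pi)                # d_i drawn from the uniform measure
    mu = 1 + d + math.sin(10 * d) / 4          # true response surface of Y1
    m_bar = rng.gauss(mu, 1 / math.sqrt(n))    # sample mean of n runs of Y1(d)
    cs.append(c_n(1 + d, 0.125, 0.125, m_bar, n))
c_nm = sum(cs) / m                             # Eq. (7)
print(round(c_nm, 2))  # close to the analytic C = 1/3
```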
In what follows we analyze the asymptotic properties of X_{i,n} and C_{n,m}. Consider a set of elementary events Ω = D × S^∞. It denotes a choice of d ∈ D and then an infinite sequence of draws from the random variable Y(d).¹⁵ We define a random variable W : Ω → {0, 1}, where W(ω) = C(d_ω) for the d_ω implied by ω ∈ Ω. Observe that W has a Bernoulli distribution with the probability of success equal to C. We will show that as n tends to infinity, X_{i,n} almost surely converges to W.

Lemma 1. If ν(A3) = 0 then X_{i,n} almost surely converges to a Bernoulli random variable with the probability of success equal to C as n tends to infinity.

Proof. First we split the set Ω into three disjoint subsets A1, A2 and A3. By d_ω we will denote the element of D implied by ω. Define A1 as the set of such ω where the distribution F_{d_ω,n} is consistent and μ_Y(d_ω) ∈ (Ŷ(d_ω) − l(d_ω), Ŷ(d_ω) + h(d_ω)). Analogously define A2 as the set of such ω where the distribution F_{d_ω,n} is consistent and μ_Y(d_ω) ∉ [Ŷ(d_ω) − l(d_ω), Ŷ(d_ω) + h(d_ω)].
¹⁴ This assumption is often made in the Bayesian approach to ranking and selection problems [12]. The difference here is that these methods assume that only a few alternatives are compared, while we allow the decision space D to be an infinite subset of ℝᵏ.
¹⁵ We consider infinite sequences of draws in order to be able to perform asymptotic analysis. In practice a researcher will observe only finite sequences.
A3 follows the definition of A3 given earlier. Observe that A1, A2 and A3 are disjoint and A1 ∪ A2 ∪ A3 = Ω. By assumption ν(A3) = 0, so we can concentrate on the sets A1 and A2. If ω ∈ A1 ∪ A2 then we have W(ω) = C(d_ω). But then X_{i,n}(ω) = C_n(d_ω)(ω) and we have shown above that C_n(d_ω) almost surely converges to C(d_ω). This implies that X_{i,n} almost surely converges to W as n tends to infinity. □

Note that this result implies that lim_{n→+∞} E(X_{i,n}) = C, and consequently lim_{n→+∞} E(C_{n,m}) = C. However, for finite m, even as n tends to infinity, C_{n,m} is a random variable (in the limiting case it is an average of m Bernoulli distributed random variables). Now we move on to the analysis of the case where both n and m tend to infinity.

Theorem 1. If ν(A3) = 0 then C_{n,m} converges in probability to C as n and m jointly tend to infinity.

Proof. The statement of the theorem can be equivalently expressed as:

    ∀ε > 0 : lim_{(n,m)→(+∞,+∞)} Pr(|C_{n,m} − C| ≥ ε) = 0.    (9)
Observe that:

    0 ≤ Pr(|C_{n,m} − C| ≥ ε) ≤ Pr(|C_{n,m} − E(C_{n,m})| + |E(C_{n,m}) − C| ≥ ε).    (10)

We have shown that lim_{n→+∞} E(C_{n,m}) = C, so there exists an n₀ such that:

    ∀n > n₀ : |E(C_{n,m}) − C| < ε/2.    (11)

Note that n₀ is independent of m as E(C_{n,m}) = E(X_{i,n}). Thus our problem reduces to proving that for n > n₀:

    ∀ε > 0 : lim_{(n,m)→(+∞,+∞)} Pr(|C_{n,m} − E(C_{n,m})| ≥ ε/2) = 0.    (12)

From Chebyshev's inequality we have that:

    Pr(|C_{n,m} − E(C_{n,m})| ≥ ε/2) ≤ 4 D²(C_{n,m})/ε² = 4 D²(X_{i,n})/(mε²).    (13)

Now observe that X_{i,n} is limited to the [0, 1] interval, so D²(X_{i,n}) ≤ 1/4, and thus we have:

    Pr(|C_{n,m} − E(C_{n,m})| ≥ ε/2) ≤ 1/(mε²).    (14)
Substituting this back into Eq. (12) we can see that for n > n₀ we get the required convergence to 0 as m tends to infinity. This conclusion finishes the proof. □

Note that in the proof we need both n and m to jointly tend to infinity. From the proof it follows that an increase of n reduces the bias of C_{n,m} from C and an increase of m reduces its variance. In order to converge we need to make both the bias and the variance tend to 0.

In the proposed procedure we assume no structure of the simulation model Y because we seek to validate the existing metamodel. This approach is expressed by the assumption that the a priori distributions are uninformative and independent for all the elements of the set D. It should be noted at this point that there exists literature presenting models where an assumption about the correlation structure of the response surface μ_Y(d) for different input parameters is made; see for example [13,12] for the Bayesian setting or [14,15] for the frequentist treatment. Such approaches allow authors to get stronger results, mainly in terms of reduced requirements for the number of samples and the ability to perform statistical inference about input parameters d ∈ D that were not sampled, yet at the cost of additional assumptions that might not be met by the response surface μ_Y(d).

In summary, we have the following conclusions:

(1) the measure c_{n,m} can be used to assess the probability that the response surface μ_Y lies in the range defined by the interval metamodel B̂;
(2) the empirical cumulative distribution function calculated using the c_n(d_i) approximates the cumulative distribution of X_{i,n}; in combination with the diagnostic values v_n(d_i) this allows a researcher to identify points d_i for which the sample size n was insufficient to precisely determine if μ_Y(d_i) ∈ B̂(d_i);
(3) it is possible to build a model that explains c_n(d_i) by d_i; this can help a researcher to identify a subspace of D where the interval metamodel B̂ does not fit the response surface μ_Y well.
Example. Let us present the above properties for the simulation model Y₁ and the interval metamodel B̂₁ taken from the earlier examples. First we want to follow the procedure described above for n = 10,000 and m = 1000. The code performing this computation is given in listing4.r. We get the value c_{n,m} ≈ 0.3001, which approximates the value 1/3 that was calculated analytically above. The empirical cumulative distribution function of c_{n,m} is presented in Fig. 3. We note that in most of the points we get a strong indication to accept the hypothesis that μ_Y(d) ∈ B̂(d) (the jump in the empirical distribution function at 1) or reject it (a similar jump around 0). There are, however, some points d_i in which the value of c_n(d_i) is not close to these values. We investigate them in Fig. 4. By comparing it with Fig. 1 we can note that the problems occur in points where μ_Y₁(d) is near the boundary of the interval defined by B̂₁(d). In such points more samples per point would be required to accurately verify if μ_Y₁(d) ∈ B̂₁(d).

Now we turn to the analysis of the bias and variance of C_{n,m}. We take n ∈ {10, 100, 10,000}, m ∈ {10, 50, 100} and report the mean value and the standard deviation of C_{n,m} computed using the simulation given in the code from listing5.r. The results are presented in Table 2. We note three things. Firstly, the bias is reduced by the increase of n and is unaffected by m. Secondly, the standard deviation decreases with m. Lastly, the standard deviation increases with n. □

The first two observations from the above example are intuitive and follow the proof of convergence of C_{n,m} to C. The third result may be surprising and is explained by the fact that by increasing n we make c_n(d) closer to 0 and 1 (we are able to identify if μ_Y(d) ∈ B̂(d) with greater confidence), thus increasing the variance. For fixed m and n = 0 we have zero variance (we use identical a priori distributions for all d_i that are unaffected by any observations) and in the limiting case n → ∞ we obtain the variance of a binomial distribution with m trials that have the probability of success C. This observation implies that the standard deviation of C_{n,m} would not be a good measure of the uncertainty associated with the measurement of C. In the following part of the section a superior measure is proposed.

Consider the result of running the simulation Y in each of m points n times. We obtain m points d_i and probabilities c_n(d_i) that μ_Y(d_i) ∈ B̂(d_i). Therefore, given our posterior beliefs in the fixed points d_i, the random variable Z describing the number of observed points where the response surface μ_Y lies in the bandwidth defined by the interval metamodel B̂ follows the Poisson binomial distribution¹⁶ with probability parameters c_n(d_i). The distribution of Z captures our posterior uncertainty about the number of actually observed points where μ_Y(d_i) ∈ B̂(d_i). Observe that D²(Z) = Σ_{i=1}^{m} (1 − c_n(d_i))c_n(d_i), so the more precise the evaluation of the probabilities we have (the closer they are to 0 or 1), the lower the uncertainty of Z.

Now assume that we have a prior on C having a Beta distribution with parameters a = 1, b = 1.¹⁷ We make this assumption to allow for Bayesian inference about points in D that were not sampled for the calculation of c_{n,m}. Some kind of additional assumption is needed because in the above analysis we assumed that the distributions of μ_Y(d) are independent for different d-s, and the assumption proposed here is relatively weak. Observe that it is less restrictive than, for example: (a) the need to specify a correlation function of a Gaussian random field used in different varieties of stochastic kriging models [13,14], (b) kernel specification in local regression models or smoothness structure assumptions in spline models [17], (c) specification of the functional form of the input–output relationship in Bayesian models [18]. All those models assume some type of regularity of the function μ_Y(d).¹⁸ In practice the response surface can be more complex than the shapes allowed under these restrictions – for example discontinuous, as is the case for the simulation model presented in Section 3 – so we prefer not to make such assumptions in a procedure that is aimed at validating interval metamodels. Additionally, setting a prior on C allows us to tackle those cases where D is a subset of a high-dimensional space (as it is independent from the specification of D), while the above mentioned alternatives have a limited application in such cases.

Furthermore, we denote the probability density function of the beta distribution with parameters a and b by p(x; a, b). Assume that we have sampled m points d_i and in each point drawn n observations of the simulation model Y. Using this data we have derived the distribution of Z following the procedure described above. We want to calculate the posterior probability density function of C conditional on Z. It will be denoted as p(x|Z). Observe that:
p(x|Z) = Σ_{z=0}^{m} Pr(Z = z) p(x|Z = z) = Σ_{z=0}^{m} Pr(Z = z) p(x; 1 + z, 1 + m − z).
Therefore p(x|Z) is the probability density function of a mixture of beta distributions. We can calculate the expected value of this distribution:

μ_C = Σ_{z=0}^{m} ((1 + z)/(2 + m)) Pr(Z = z) = (1 + E(Z))/(2 + m) = (1 + m c̄_{n,m})/(2 + m)
^16 See [16] for the definition of this distribution.
^17 In this text we assume a uniform a priori distribution, but in practice other priors for (a, b) can of course be used.
^18 See also, for example, [19], which gives, in particular, a review of the restrictions introduced by metamodels based on Gaussian processes for deterministic simulations; the case of stochastic simulations, as considered in this paper, is more complex, so all the problems described there are also valid here.
Fig. 3. Empirical cumulative distribution function of C_{10,000;1000}.
Fig. 4. Scatterplot of (d, c_n(d)).

Table 2
Estimated mean value and standard deviation of C_{n,m} for n ∈ {10; 100; 10,000} and m ∈ {10; 50; 100}.

m     n       Mean     sd
10    10      0.2045   0.0291
10    100     0.3722   0.0852
10    10,000  0.3343   0.1426
50    10      0.2042   0.0131
50    100     0.3719   0.0378
50    10,000  0.3346   0.0648
100   10      0.2044   0.0091
100   100     0.3723   0.0273
100   10,000  0.3338   0.0454
and its variance:

σ²_C = Σ_{z=0}^{m} Pr(Z = z) [(3 + m)(z − m c̄_{n,m})² + (1 + z)(1 + m − z)] / [(2 + m)²(3 + m)].

The mean value μ_C has a small bias relative to c̄_{n,m} that results from the assumption of a uniform prior on C.
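These formulas can be cross-checked numerically. The sketch below (an illustration only; the original study's computations used R scripts such as listing6.r, and all function names here are ours) builds the Poisson binomial distribution of Z by direct convolution and evaluates μ_C and σ²_C for a vector of evaluated probabilities c_n(d_i):

```python
import numpy as np

def poisson_binomial_pmf(probs):
    """PMF of Z = number of successes among independent Bernoulli(probs[i]) trials,
    computed by repeated convolution; pmf[z] = Pr(Z = z)."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.convolve(pmf, [1.0 - p, p])
    return pmf

def posterior_C(probs):
    """Mean and variance of the mixture-of-beta posterior p(x|Z)
    under the uniform Beta(1, 1) prior on C."""
    m = len(probs)
    pmf = poisson_binomial_pmf(probs)
    z = np.arange(m + 1)
    # mu_C = sum_z Pr(Z=z) (1+z)/(2+m) = (1 + m*cbar)/(2+m)
    mu = float(np.sum(pmf * (1 + z) / (2 + m)))
    cbar = float(np.mean(probs))  # \bar c_{n,m}, since E(Z) = m*cbar
    var = float(np.sum(pmf * ((3 + m) * (z - m * cbar) ** 2 + (1 + z) * (1 + m - z))
                       / ((2 + m) ** 2 * (3 + m))))
    return mu, var
```

For instance, with probs = [0.1, 0.9, 0.5, 1.0, 0.0] we have m = 5 and c̄ = 0.5, so μ_C = (1 + 2.5)/7 = 0.5, in agreement with the closed form above.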
Example. We use the simulation model Y_1 and the interval metamodel B̂_1. First we visualize how the distribution of Z and the derived density p(x|Z) look. Using the code from listing6.r we calculate them for n = 100 and m = 20. The results are presented in Fig. 5. Now let us investigate how p(x|Z) changes when n and m change. Using the code from listing7.r we simulate it for n, m ∈ {100; 500; 2500}. In Table 3 we report the means of μ_C and σ²_C. Similarly to the results presented in Table 2, we observe that increasing m does not significantly influence the mean of μ_C^19 and increasing n reduces bias.

^19 The only influence is that by increasing m we reduce the deviation of μ_C from c̄_{n,m}.
Fig. 5. Distribution of Z/m for n = 100 and m = 20 (gray points) and the density p(x|Z) derived from it (black line).
Table 3
Simulation of p(x|Z) for n, m ∈ {100; 500; 2500}.

m     n     Mean of μ_C   Mean of σ²_C
100   100   0.3729        0.0616
100   500   0.3542        0.0546
100   2500  0.3411        0.0499
500   100   0.3709        0.0280
500   500   0.3560        0.0249
500   2500  0.3363        0.0226
2500  100   0.3717        0.0125
2500  500   0.3541        0.0111
2500  2500  0.3361        0.0101
On the other hand, the dispersion (variance) of the posterior distribution of C decreases with both n and m; however, m has a much larger influence on σ²_C. This gives an example of the result shown earlier in the proof of Theorem 1: the choice between n and m can be seen as a bias-variance trade-off. □

In summary, the following interval metamodeling procedure can be proposed:

1. simulate model Y and estimate metamodel Ŷ and interval metamodel B̂ using the sample;
2. execute new independent simulations to perform interval metamodel B̂ verification using the methods presented in this section.

There are two practical points worth mentioning. Firstly, we have assumed in the analysis that n is fixed between design points d_i. However, our Bayesian procedure can easily be adjusted to allow the number of simulation replications to vary between design points, in order to ensure that the evaluated point posterior probabilities are sufficiently close to 0 or 1. A potential drawback of this approach is that in some cases very large values of n would be needed (when μ_Y(d) is very close to the boundary of B̂(d)), which might not be acceptable due to simulation time requirements.

The second point is that step two of the proposed procedure can be performed iteratively. When we perform an initial verification of the interval metamodel we obtain the values of v_n(d_i) and the variance σ²_C. In the cases where v_n(d_i) is high we conclude that n was too small and additional samples are needed (it is impossible to give a general rule for how many more samples are required, but it is possible to compute it using formula (6) when the specification of F_{d,n} is known). If σ²_C is too high from the perspective of the researcher, then m should be increased.
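The iterative increase of m can be sketched as a small planning helper (an illustration under the simplifying assumption, also used in the sample-size discussion that follows, that each additional test point lies within the bandwidth with probability c̄_{n,m}; the function names are ours, not from the original study):

```python
import numpy as np

def sigma2_C(probs):
    """Posterior variance of C under the Beta(1, 1) prior, for evaluated
    probabilities probs[i] = c_n(d_i); Z is Poisson binomial."""
    m = len(probs)
    pmf = np.array([1.0])
    for p in probs:                      # Poisson binomial PMF by convolution
        pmf = np.convolve(pmf, [1.0 - p, p])
    z = np.arange(m + 1)
    cbar = float(np.mean(probs))
    return float(np.sum(pmf * ((3 + m) * (z - m * cbar) ** 2 + (1 + z) * (1 + m - z))
                        / ((2 + m) ** 2 * (3 + m))))

def required_m(probs, target_var, max_m=2000):
    """Smallest total number of test points m for which the posterior variance of C
    drops to target_var or below, assuming each extra point lies inside the
    bandwidth with probability cbar (a hypothetical planning helper)."""
    cbar = float(np.mean(probs))
    probs = list(probs)
    while len(probs) <= max_m:
        if sigma2_C(probs) <= target_var:
            return len(probs)
        probs.append(cbar)
    return None
```

For example, starting from 10 points that were each evaluated at c_n(d_i) = 0.9, the helper appends hypothetical points with probability 0.9 until the variance target is met.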
If the researcher knows the target value of the variance of C, then using the formula for σ²_C it is possible to check directly how an increase of m changes the variance (we treat c̄_{n,m} as known and assume that the additional points on average have this probability of lying within the interval metamodel bandwidth; this is a typical approach used in sample size calculation procedures).

A general observation is that the proposed method is conservative in the sense that it biases c_n(d) away from the true values of 0 and 1 towards the interior of the [0, 1] interval. This means that we can expect the procedure to have the highest bias of c̄_{n,m} if μ_Y(d) ∈ B̂(d) everywhere or μ_Y(d) ∉ B̂(d) everywhere. If μ_Y(d) ∈ B̂(d) only in some subset of D, then the upward bias in the cases where μ_Y(d) ∉ B̂(d) partially cancels out the downward bias in the cases where μ_Y(d) ∈ B̂(d), which reduces the mean bias. Of course this behavior is not guaranteed, as it depends on the assumed a priori distribution of μ_Y(d).

Lastly, note that the whole analysis can be reinterpreted in terms of input modeling [20]. In this text we have not made a distinction between directly observable inputs and parameters that are obtained using statistical inference [21]. With this distinction, it can be observed that the measure defined on D may be interpreted as a representation of the uncertainty of
measurement of the simulation parameters. Thus the proposed methodology can be seen as a way of assessing the impact of input uncertainty on simulation output.^20

3. Example application

We use the classical Schelling segregation model [23] and its implementation [24] in NetLogo 5.0.3 [25] as an example. The simulation model has two parameters: n is the number of agents in the simulation and sw is the percentage of similar agents wanted by every agent in her neighborhood. In our experiment we set sw ∈ [20, 70] and n ∈ [500, 2000]. The simulation is run until no agents want to move.^21 The simulation output parameter is the percentage of neighbors similar to the agent, averaged over all agents (sp).

In order to collect the data required to estimate the metamodel we generate 2601 sample points using a grid design over the space [500, 2000] × [20, 70], where each dimension is divided into 51 equidistant values. For each point one simulation is run. The collected data is stored in the train.txt file. The observed sp values vary from 56.77 to 100. Using this data we estimate a polynomial of degree 2 with interactions on the parameters sw and 1/ln(n) to obtain the following sp prediction formula^22 (all the computations are performed using the code from listing8.r):
Ŷ(sw, n) = ŝp = −145.4795 + 3.9919 sw − 0.0137 sw² + 1492.4188/ln(n) − 1803.7888/ln(n)² − 14.3349 sw/ln(n)   (15)
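For convenience, Eq. (15) can be written as a small function (a sketch; the helper name sp_hat is ours, and the negative signs on the intercept, sw², 1/ln(n)² and interaction coefficients are assumptions recovered from the fitted model's reported output range of roughly 57 to 100):

```python
import math

def sp_hat(sw, n):
    """Polynomial metamodel of Eq. (15): predicted mean percentage of similar
    neighbours (sp) for similarity threshold sw and number of agents n."""
    u = 1.0 / math.log(n)  # the second explanatory variable, 1/ln(n)
    return (-145.4795 + 3.9919 * sw - 0.0137 * sw ** 2
            + 1492.4188 * u - 1803.7888 * u ** 2 - 14.3349 * sw * u)
```

With these signs the predictions stay within the observed sp range, e.g. sp_hat(70, 2000) evaluates to roughly 100, matching the upper end of the observed values.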
We define our interval metamodel by setting h(sw, n) = l(sw, n) = 3. Note that these values will usually not be strongly related to the standard confidence intervals that can be calculated for linear regression predictions. This is because we know that our model is probably conditionally biased, while confidence intervals in linear regression are calculated under the assumption that the model is properly specified. For example, in our model the mean width of the 95% confidence interval of E(sp|sw, n) obtained after estimation is around 0.4 and its maximum value is around 0.9.^23 Both these values are much lower than 3 and, as we will see later, are overly optimistic, because the metamodel given by Eq. (15) is conditionally biased.

Having specified the interval metamodel, let us move to its validation. In order to do this we uniformly draw a sample of 64 integer tuples such that sw ∈ [20, 70] and n ∈ [500, 2000]. At each sample point we replicate the simulation 64 times. The collected observations are stored in the test.txt file. For the computation of the posterior distributions F_{(sw_i, n_i),64}, and consequently the probabilities c_64(sw_i, n_i) at the m = 64 sampled points, we used the Bayesian bootstrap [26] with 10,000 replications. We obtain c̄_{64,64} = 0.8907656. Table 4 presents the counts of the calculated values of c_64(sw_i, n_i). Some of the values are exactly equal to 0 or 1 due to the fact that the computations are performed with limited precision and the Bayesian bootstrap used a finite number of replications. However, we have 6 observations with intermediate values.

First we run a diagnostic test to check whether the posterior distributions have a much higher dispersion than the interval width l(d) + h(d) = 6. In order to do this we calculate v_64(sw_i, n_i) following the formula given in Eq. (6). We find that for all (sw_i, n_i) this metric evaluates to 0, so we do not have this problem (the sample size is large enough to avoid it). Therefore, these 6 points are cases where the true value of μ_Y(sw, n) is close to the boundary of the interval B̂(sw, n) and 64 samples per point were not enough to give us sufficient information whether μ_Y(sw, n) ∈ B̂(sw, n). This case can be considered typical, especially for expensive simulations. The procedure proposed in this paper allows us to handle this issue. We calculate the posterior distribution of C using the mixture of beta distributions, as explained in Section 2. The results are presented in Fig. 6. The mean and the standard deviation of this distribution are μ_C = 0.8789242 and σ_C = 0.04098572.

Observe that a simplified way to approximate μ_C and σ_C would be to assume that c̄_{64,64} was calculated as a mean of exact observations of whether μ_Y(sw, n) ∈ B̂(sw, n) or not. Then, using standard Bayesian inference, we would again evaluate the mean as μ_C, but the standard deviation would be lower and amount to 0.03985569; this value could be called the intrinsic dispersion. It is 2.76% lower than in the case of the proposed procedure. The higher value of σ_C is a result of the extra uncertainty in the evaluation of the test μ_Y(sw, n) ∈ B̂(sw, n) in the 6 cases that were not conclusive. Obviously, the larger the number of inconclusive probabilities c_64(sw_i, n_i), the higher the inflation of variance. One could imagine an extreme case where for all observations we have c_64(sw_i, n_i) = c̄_{64,64} = 0.8907656. In such a situation we would get even more dispersion, since in this case σ_C = 0.05474093, which is 37.35% higher than the intrinsic dispersion. Note that an increase in the number of samples per point reduces the inflation of the intrinsic dispersion. On the other hand, an increase in the number of different sampled points brings down the intrinsic dispersion itself.

^20 A wider discussion of input uncertainty assessment can be found in [22] and references therein.
^21 Given the assumed range of parameters such a situation is always possible. For very high values of the parameters, e.g. n = 2500 and sw = 100, the simulation might not terminate in a fixed state.
^22 We use 1/ln(n) rather than n, ln(n) or 1/n as the second explanatory variable in the formula as it provides the best fit of the metamodel without increasing its complexity.
^23 The confidence intervals reported here have a frequentist interpretation, because we used least squares estimation and standard statistical inference. However, it can be checked that Bayesian estimation of this regression with standard priors would yield similar, overly optimistic, values.
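The Bayesian bootstrap evaluation of c_n(d) can be sketched as follows (a Python illustration rather than the study's original R code; the function name and seeding are ours). Following Rubin [26], posterior draws of the mean response at a design point are formed as Dirichlet-weighted averages of the n observed outputs, and c_n(d) is the fraction of draws falling inside the bandwidth:

```python
import numpy as np

def c_hat(samples, lo, hi, reps=10_000, seed=0):
    """Bayesian-bootstrap estimate of Pr(mu_Y(d) in [lo, hi] | data)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(samples, dtype=float)
    w = rng.dirichlet(np.ones(len(y)), size=reps)  # (reps, n) resampling weights
    post_means = w @ y                              # draws from the posterior of the mean
    return float(np.mean((post_means >= lo) & (post_means <= hi)))
```

For a test point with simulated outputs y and metamodel prediction p, the probability c_64(sw, n) for the bandwidth l = h = 3 would then be approximated by c_hat(y, p - 3, p + 3): near 1 when the posterior of the mean sits well inside the band, near 0 when it sits well outside, and intermediate when the mean is close to the band boundary.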
Table 4
The counts of the observed values of c_64(d_i) in the example given in Section 3.

c_64(sw_i, n_i)   0.000   0.095   0.132   0.914   0.934   0.959   0.974   1.000
Count             5       1       1       1       1       1       1       53
Fig. 6. Posterior distribution of C (gray line) and point evaluation c̄_{64,64} (vertical solid black line).
The proposed methodology allows us to visualize where the interval metamodel does not fit the simulation data well. Fig. 7 presents a scatter plot of the sampled points with a visualization of c_64(sw_i, n_i). We can observe that the points where this probability is evaluated as lower than 1 concentrate around vertical lines (0 values are observed for sw equal to 25, 33, 34 and 50). This suggests that μ_Y(sw, n) might have jumps near these values.

We verify the above hypothesis for the example case of n = 1000. The results of the calculations are presented in Fig. 8. We observe that the averages of 10 simulations (black points) have jumps when we change sw between the values 20–21, 25–26, 33–34, 40–41, 50–51, 60–61 and 66–67. However, only when the jumps are large do the sampled value averages fall outside the interval metamodel B̂(sw, n), as for sw equal to 25, 33, 34 and 50 in Fig. 7. Similarly, in the same figure, the light gray points (probabilities greater than 0 and less than 1) are also located near the jumps (sw values 21, 25, 26, 32, 36, 49).^24 It should be noted that it would be possible to create a metamodel that properly models these jumps; however, it would have a much more complex structure than a simple polynomial of degree 2 in the variables sw and 1/ln(n).

To conclude this section, we shall discuss how interval metamodels can be built in practice. The construction of the interval metamodel involves two steps: (1) selection and estimation of the base metamodel Ŷ and (2) selection of h(d) and l(d). The step of building the base metamodel can be performed using standard procedures, as discussed for example by Kleijnen and Sargent [2]. However, as was already remarked in the introduction, when building interval metamodels we are most concerned with improving the understanding of the relationship between the input and output of simulation models, so linear or low-order polynomial metamodels are the most natural choice. In the example that follows we consider the metamodel given by Eq. (15) and compare it to a simpler metamodel, a linear regression on sw and n estimated using the same data:
Ŷ(sw, n) = ŝp = 62.2122 + 0.7204 sw − 0.0059 n.   (16)
In the second step we have to choose l(d) and h(d). The methodology proposed in this text is general and allows them to be arbitrary functions of d. However, in practice it can be expected that the most frequent choice will be a symmetric and constant bandwidth, i.e. l(d) = h(d) = const.^25 In this way the interval metamodel formula is as simple as the original metamodel. The choice of the appropriate constant can be aided by an analysis of max_i v_n(d_i) and c̄_{n,m}. In general max_i v_n(d_i) should be very close to 0, as otherwise we know that at some points our sample size is too small. The choice of the value for c̄_{n,m} is more subjective and depends on the objective of the researcher. A natural choice is a value similar to a standard p-value threshold, equal to 90%, 95% or 99%.

This process is presented in Fig. 9. We can see that for the polynomial metamodel (Eq. (15)) the values of max_i v_64(sw_i, n_i) indicate that the constant value of l(d) and h(d) should not be less than 0.5. Similarly, we get 90% coverage by setting this value to 3, as was done in the first part of this section. For the linear metamodel (Eq. (16)), max_i v_64(sw_i, n_i) indicates that the minimum constant should be 0.75, and when it reaches 5 then c̄_{64,64} obtains the value 0.8681.

^24 The reason for the presence of response surface jumps in the NetLogo implementation of the Schelling segregation model is that each agent has a finite number of neighbors present (at most 8), so a change of agent decision is only possible for sw thresholds equal to i/j, where j ∈ {1, 2, ..., 8} and i ∈ {0, 1, ..., j}.
^25 Another plausible option would be to use a bandwidth that is equal to a certain percentage of the metamodel prediction: l(d)/Ŷ(d) = h(d)/Ŷ(d) = const.
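The coverage curve underlying this kind of plot can be approximated with a few lines of code (a simplified sketch that ignores the posterior uncertainty at each point, i.e. it treats every c_n(d_i) as exactly 0 or 1; the function name and inputs are ours):

```python
import numpy as np

def coverage_curve(residual_means, bandwidths):
    """Fraction of test points whose observed mean response lies within +/- b of
    the metamodel prediction, for each candidate constant bandwidth b.
    residual_means[i] = mean simulated output at point i minus the prediction."""
    r = np.abs(np.asarray(residual_means, dtype=float))
    return np.array([float(np.mean(r <= b)) for b in bandwidths])
```

Evaluating such a curve for two competing metamodels over a common grid of bandwidths reproduces the kind of comparison shown in Fig. 9: the metamodel whose curve reaches the target coverage (e.g. 90%) at a smaller bandwidth fits the data more tightly.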
Fig. 7. Classification of the points sampled in the test phase in the example given in Section 3. Black points indicate the situation where the posterior probability that μ_Y(sw, n) ∈ B̂(sw, n) is equal to 1. Gray circles represent the case where this probability is equal to 0. Light gray points indicate intermediate probability values.
Fig. 8. Test of the relationship between sw and sp for n = 1000. Black points represent the means of 10 sample simulations made for a given sw value. The solid line represents the metamodel given by Eq. (15). Dashed lines give the bandwidth of the interval metamodel where l(sw, n) = h(sw, n) = 3.
Fig. 9. Comparison of the metamodels given by Eqs. (15) and (16) with varying interval bandwidth l(d) = h(d) (the two panels show c̄_{64,64} and max_i v_64(sw_i, n_i) as functions of l(d) = h(d)). The solid line represents the results for the interval metamodel based on Eq. (15) and the dashed line for the interval metamodel based on Eq. (16).
Lastly, it is worth remarking that the plots in Fig. 9 allow the researcher to quickly compare the fit of two metamodels. For example, the researcher can see that the polynomial metamodel with bandwidth 2 has roughly equivalent coverage to the linear metamodel with bandwidth 4.3. Given such information it is possible to balance the complexity of the metamodel against its goodness of fit.
4. Concluding remarks

In this paper we have analyzed the use of simulation metamodels for understanding the relationship between the inputs and outputs of stochastic simulation models. Such metamodels are expected to have a simple structure, as this guarantees that they can be easily understood and interpreted by a researcher. However, this simplicity can lead to the problem that the predictions from the metamodel are conditionally biased. In order to overcome this limitation and capture the level of conditional bias in such models, we propose the use of interval metamodels, which assume that the prediction is an interval rather than a single point. The aim of this procedure is to ensure that the response surface of the simulation model lies within the bandwidth defined by the interval metamodel. In this way we can retain a simple structure for the metamodel and offset the problem of its conditional bias.

We have proposed a procedure that allows us to verify how well the interval metamodel covers the response surface of the simulation model. It uses the Bayesian framework and allows a researcher to:

(1) verify the probability c_n(d) that for a fixed input d the interval metamodel contains the expected value of the simulation output; additionally, the diagnostic measure v_n(d) can be used to help identify situations when the selected sample size n is too small;
(2) calculate the evaluation of the probability c̄_{n,m} that at a random point of the input space the interval metamodel covers the response surface;
(3) assess the uncertainty associated with the value of c̄_{n,m} due to sampling (the posterior distribution of C).

We have illustrated the proposed method on an example application to the Schelling segregation model and have shown that it can help a researcher to uncover non-obvious characteristics of the simulation model response surface by identifying the points where the simple metamodel is most biased.
In the discussion given in the text we have assumed that the metamodel Ŷ is estimated from training data, but the proposed methodology does not preclude a situation in which it is defined by the researcher based on the research question at hand. For example, one might want to verify whether the expected value of the simulation output is greater than some fixed value independent of the input parameter. We would then have Ŷ(d) = const, l(d) = 0 and h(d) = +∞ for all d ∈ D.

Interval metamodels can be used in two different ways. In the first approach a researcher may know the levels of acceptable bias of the metamodel and thus define l and h before the analysis. Subsequently, she will be interested in the evaluation of the interval metamodel confidence C. In the second approach l and h may be undefined beforehand and the researcher may wish to test several specifications of the bandwidth. In this way she can learn the appropriate width of the bandwidth (for example, to obtain 90%, 95% or 99% coverage of μ_Y by B̂) and in what subspace of D the interval metamodel does not fit the simulation well (an example of such an analysis was given in Section 3).

The proposed method uses the most straightforward approach to the design of an interval metamodel verification experiment. It assumes that m points are sampled randomly from the input parameter space and that n replications of the simulation are performed at each point. This totals m · n simulations. However, it is clear that in practice this computational budget could be allocated more efficiently. First of all, n does not have to be fixed but could be chosen dynamically, conditional on the collected data. Furthermore, one could consider a more efficient design for the sampling of the m points from the input parameter space. These issues of optimal allocation of the computing budget in an interval metamodel verification problem are an open area for further investigation.

Lastly, in this paper we show how to validate interval metamodels but give only basic guidance on methods for their estimation. Therefore, there are several open methodological questions for further work. The first is how to select the bandwidths l and h so that they give the assumed interval metamodel confidence C and have some additional desired properties (e.g. they give the shortest interval). The second is whether knowing that we want to obtain an interval metamodel would affect the way we estimate the base metamodel Ŷ.

Acknowledgment

Thanks are given to two anonymous reviewers who have provided detailed comments that helped to improve the paper.

References

[1] R.R. Barton, Metamodels for simulation input–output relations, in: J. Swain, D. Goldsman, R. Crain, J. Wilson (Eds.), Proceedings of the 1992 Winter Simulation Conference, IEEE, 1992, pp. 289–299.
[2] J.P. Kleijnen, R.G. Sargent, A methodology for fitting and validating metamodels in simulation, Eur. J. Oper. Res. 120 (1) (2000) 14–29.
[3] I.R. Santos, P.R. Santos, Simulation metamodels for modeling output distribution parameters, in: S. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J. Tew, R. Barton (Eds.), Proceedings of the 2007 Winter Simulation Conference, IEEE, 2007, pp. 910–918.
[4] G.G. Wang, S. Shan, Review of metamodeling techniques in support of engineering design optimization, J. Mech. Des. 129 (4) (2006) 370–380.
[5] R. Peng, Reproducible research in computational science, Science 334 (6060) (2011) 1226–1227.
[6] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013.
[7] Y. Hong, On computing the distribution function for the Poisson binomial distribution, Comput. Stat. Data Anal. 59 (2012) 41–51.
[8] G.R. Warnes, B. Bolker, T. Lumley, gtools: Various R Programming Tools, R package version 3.3.1, 2014.
[9] P. Diaconis, D. Freedman, On the consistency of Bayes estimates, Ann. Stat. 14 (1) (1986) 1–26.
[10] B. Kamiński, Podejście wieloagentowe do modelowania rynków (in Polish), Oficyna Wydawnicza SGH, 2012.
[11] A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin, Bayesian Data Analysis, Chapman & Hall/CRC, 2004.
[12] W.B. Powell, I.O. Ryzhov, Optimal Learning, Wiley, 2012.
[13] S.H. Ng, J. Yin, Bayesian kriging analysis and design for stochastic simulations, ACM Trans. Model. Comput. Simul. 22 (3) (2012) 17–26.
[14] B. Ankenman, B. Nelson, J. Staum, Stochastic kriging for simulation metamodeling, Oper. Res. 58 (2) (2010) 371–382.
[15] P. Salemi, J. Staum, B. Nelson, Generalized integrated Brownian fields for simulation metamodeling, in: R. Pasupathy, S.-H. Kim, A. Tolk, R. Hill, M. Kuhl (Eds.), Proceedings of the 2013 Winter Simulation Conference, IEEE, 2013, pp. 543–554.
[16] Y.H. Wang, On the number of successes in independent trials, Statistica Sinica 3 (2) (1993) 295–312.
[17] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2011.
[18] S.E. Chick, Bayesian analysis for simulation input and output, in: S. Andradóttir, K. Healy, D. Withers, B. Nelson (Eds.), Proceedings of the 1997 Winter Simulation Conference, IEEE, 1997, pp. 253–260.
[19] L.S. Bastos, A. O'Hagan, Diagnostics for Gaussian process emulators, Technometrics 51 (4) (2009) 425–438.
[20] S. Vincent, Input data analysis, in: J. Banks (Ed.), Handbook of Simulation, John Wiley & Sons, 1998, pp. 55–92.
[21] B. Zeigler, Theory of Modelling and Simulation, Wiley/Interscience, 1976.
[22] E. Song, B.L. Nelson, A quicker assessment of input uncertainty, in: R. Pasupathy, S.-H. Kim, A. Tolk, R. Hill, M. Kuhl (Eds.), Proceedings of the 2013 Winter Simulation Conference, IEEE, 2013, pp. 474–485.
[23] T. Schelling, Dynamic models of segregation, J. Math. Sociol. 1 (2) (1971) 143–186.
[24] U. Wilensky, NetLogo Segregation model, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, 1997.
[25] U. Wilensky, NetLogo, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, 1999.
[26] D.B. Rubin, The Bayesian bootstrap, Ann. Stat. 9 (1) (1981) 130–134.