
Highlights

• An approach for calculating the discrepancy between a model and data is presented.
• The discrepancy is a function of experimental setting and the spatiotemporal field.
• This discrepancy model uses an additive bias correction and a Gaussian process model.
• The evaluation of the discrepancy is performed in a two-step process.
• Two case studies illustrate the feasibility of the proposed approach.


Model Discrepancy Calibration Across Experimental Settings

Kathryn A. Maupin∗, Laura P. Swiler

Sandia National Laboratories, P.O. Box 5800, Albuquerque, NM 87185-0101

Abstract

Despite continuing advances in the reliability of computational modeling and simulation, model inadequacy remains a pervasive concern across scientific disciplines. Further challenges are introduced into the already complex problem of “correcting” an inadequate model when experimental data is collected at varying experimental settings. This paper introduces a general approach to calibrating a model discrepancy function when the model is expected to perform for multiple experimental configurations and give predictions as a function of temporal and/or spatial coordinates.

1. Introduction

Making predictions about future events is the driving force behind the advancement of computational modeling and simulation. These predictions necessarily occur in the presence of uncertainties, which arise from the observational data, model parameters, and the model form itself. Here, we investigate the approximation of model form uncertainty, and how it may be used to improve the interpolatory and extrapolatory predictive capability of a computational model.

The calibration of a computational model may not always result in a perfect match between the model and the observational data. Structural or systemic differences between the model predictions and observations are sometimes referred to as “model error,” “model form uncertainty,” or “structural error.” In this case, a model discrepancy formulation may be used. Oberkampf et al. [22] distinguish between the terms “uncertainty” and “error” in modeling and simulation. They argue that the separation is valuable when accounting for different influences in model validation.

Several approaches to accounting for model error or model form error are attributed to the validation community. Rebba et al. [27] decompose the total prediction error in the observed difference between model and data into numerical errors, model parameter uncertainty, model form error, and experimental error. Park and Grandhi [23] use a maximum likelihood estimation approach to quantify model form uncertainty using the measured differences of experimental and model outcomes. They compare this to a fully Bayesian estimation approach to demonstrate its effectiveness.

The issue of explicitly modeling the discrepancy between a simulation model and observational data is widely credited to Kennedy and O’Hagan [16], who model the difference between the simulation and observation with a Gaussian process that is a function of the experimental scenario (we will refer to this as the KOH approach).

∗Corresponding author.
Email addresses: [email protected] (Kathryn A. Maupin), [email protected] (Laura P. Swiler)


The KOH approach involves a Bayesian framework in which the computer model parameters (the calibration parameters) and the discrepancy function parameters are estimated simultaneously given prior distributions on both and observational data. The KOH approach further relies on a Gaussian process (GP) for the simulation model, which introduces emulator uncertainty and the necessity to estimate hyperparameters for the emulator GP. The use of Gaussian processes as emulators for computer models is described, for example, in [28] and [30]. Related Bayesian approaches are given by [5], [9], [12], and [29]. The Bayesian calibration plus discrepancy framework was extended to high-dimensional field data by Higdon et al. in [14], [13].

Additional approaches for the inclusion of model discrepancy are given by [7], [17], and [31]. These vary in the formulation of the discrepancy. Discrepancy is referred to as “external” if it is an explicit term added to (or multiplied by) the model. In this case, the discrepancy is separated from the simulation model. Discrepancy is referred to as “internal” or “embedded” if it involves adding and/or changing some parameters within the simulation model itself to account for the model form error.

The challenge with incorporating model discrepancy in statistical inverse problems is that it becomes confounded with calibration parameters. One approach to address this confounding is to decouple the estimation of the hyperparameters for the emulator model from the discrepancy, as advocated by [18]. Gramacy et al. [10] also propose a modular approach, where they focus on generating local GP emulators separately from the bias estimation. As they state, “fully Bayesian joint inference in the KOH framework unproductively confounds emulator uncertainty with bias discrepancy.” In addition to the statistical confounding issues, Gramacy et al. argue that, “decoupling has significant computational advantages.” Their approach involves local approximate GP emulator models and performs maximum a posteriori point inference for the calibration parameters via the induced fits for the bias under a GP prior.

The critical problem of non-identifiability of the calibration parameters and the model discrepancy is discussed in many works, including [3], [17], and [10]. A first-order Taylor series expansion method to determine the identifiability of model parameters is presented in [17], and [3] presents a way of determining identifiability based on the smoothness of the estimated discrepancy function at the optimal values of the calibration parameters. In Reference [3], the authors state, “identifiability is a highly nuanced and difficult problem, but not impossible.”

In this paper, we employ a modularization strategy, where we estimate the emulator parameters separately from (and preceding) the parameter estimation for the bias. We take an optimal value of the calibration parameters, which may either be the mean or the maximum a posteriori value of the posterior parameter distributions, and then estimate the discrepancy. The decoupling and the use of the mean estimate help address the identifiability issue. Our focus in this paper is to present a robust approach to discrepancy modeling for field data, where the discrepancy is a function of both a configuration or scenario parameter as well as a temporal and/or spatial coordinate.
Our contribution is the development and demonstration of a practical, modular, and computationally feasible framework for estimating discrepancy functions for field responses given data from multiple experimental configurations. These discrepancy functions can then be used to correct model predictions at new experimental configurations and/or spatiotemporal coordinates, thus facilitating safety and reliability analyses at untested locations.


2. Preliminaries

Given a set of observational data d and a computational model q with parameters θ, the goal of model calibration is to find the optimal set(s) of parameter values which minimize the difference between the observed data points di and the corresponding model responses qi(θ). That is, one seeks to minimize the misfit,

    M(d, q(θ)) = ‖d − q(θ)‖,    (1)

for some appropriate norm ‖ · ‖. However, the observational data is corrupted by experimental noise εi. Thus, even in the case where a model is exactly able to reproduce reality, the difference

    di − qi(θ) = εi    (2)

remains.

Often, the calibrated model provides an insufficient fit to the experimental data. Model error is unavoidable in many situations due to an incomplete understanding of underlying physics, likely in addition to large and possibly poorly characterized uncertainties in the calibration data. This type of discrepancy is generally attributed to model form error or structural error. Arguably, such models can be corrected to some extent by including a model discrepancy term. The seminal work by Kennedy and O’Hagan [16] introduced an additive discrepancy formulation,

    di = qi(θ) + δi + εi,    (3)

with δi representing the model discrepancy. More recently, embedded formulations [31, 32, 21, 25] have also been proposed.

The introduction of the discrepancy term δ presents new philosophical and implementation issues, in addition to those which exist for a basic model calibration scenario. For example, what model form of δ is appropriate? Generally, the formulation of the discrepancy is phenomenological since known physics and uncertainties are incorporated into the computational model q. The work of Ling et al. [17] examines the use of model selection techniques, which traditionally apply to the selection of the optimal model for q, in selecting the “best” functional form of δ.

For any formulation of the discrepancy, there may be significant confounding between the estimates of θ and δ in the calibration process. The physical interpretations of the original model q and its parameters may be altered by the presence of a discrepancy term. It can be argued that joint or simultaneous estimation of θ and δ mitigates the amount of bias introduced into the parameter estimates. The sequential or modular approach that we use, however, makes no effort to reduce bias that may be present in estimates of the parameters θ during the calibration of the model q. Fitting a discrepancy term after determining the calibration parameter values may correct the predictions of outputs of interest, but the parameter bias remains and it may make the parameter estimates less physically meaningful. If one wants to transfer the parameter estimates to an entirely different domain, the discrepancy calibrated with the sequential approach may not apply. Thus, understanding the extent to which the calibrated θ and δ are intertwined is critical. Even with these limitations, we argue there are practical advantages to the sequential approach involving computational efficiency and identifiability. The papers [3], [18], [10] discuss the identifiability issue and propose various ways to modularize pieces of the model estimation and discrepancy calibration to mitigate its effect.

Despite the philosophical change in the physical interpretation of q and the increased complexity in the calibration process, including a discrepancy term during model calibration may be essential to recovering accurate parameters and model predictions. Brynjarsdottir and O’Hagan [7] argue that ignoring model discrepancy leads to biased parameter estimates for both interpolatory and extrapolatory model predictions. They also state that not accounting for model discrepancy may lead to overconfident parameter estimates and predictions. However, they caution that it is equally important for model discrepancy to be informed by physical information and a carefully selected prior. Their paper highlights the importance of selecting the prior distributions for the discrepancy model.

There continues to be interest in developing new techniques to better separate and identify the contributions of calibration parameter versus discrepancy in modeling. A recent paper by Plumlee argues for using a discrepancy which is orthogonal to the gradient of the computer model [24]. This 2017 paper by Plumlee first requires the optimal parameters to be estimated by minimizing an L2 loss function.

In some cases, it may also be appropriate to include an experimental bias or discrepancy term analogous to δ. However, the philosophical issues discussed above, especially that of parameter identifiability, become compounded if both experimental and model discrepancy terms are included in the problem formulation. In a Bayesian setting, known experimental bias can be built into the likelihood function with careful consideration. In the present work, we assume that experimental bias is not a significant issue.

In our approach, we begin with Bayesian calibration of the computer model. Given prior distributions on the model parameters, π(θ), the data d is used to update the parameters through Bayes’ rule,

    π(θ|d) ∝ π(d|θ)π(θ).    (4)

We make the common assumption that the experimental or measurement error is normally distributed with zero mean such that ε ∼ N(0, Σd), leading to a Gaussian likelihood,

    π(d|θ) = (1/√((2π)^n |Σd|)) exp(−(1/2)(d − q(θ))^T Σd^(-1) (d − q(θ))).    (5)
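The following sketch illustrates how the Gaussian log-likelihood in (5) and the Bayesian update in (4) might be implemented for a simple case. The model q, the synthetic data, the prior bounds, and the random-walk proposal scale are placeholder assumptions used only for illustration; the calibrations in this paper are performed with Dakota rather than with a hand-written sampler.

import numpy as np

rng = np.random.default_rng(0)

def q(theta, t):
    # placeholder simulation model: a simple exponential decay in time
    return theta[0] * np.exp(-theta[1] * t)

t = np.linspace(0.0, 1.0, 20)                  # field coordinates
theta_true = np.array([1.0, 2.0])
Sigma_d = 0.05**2 * np.eye(t.size)             # measurement covariance
d = q(theta_true, t) + rng.multivariate_normal(np.zeros(t.size), Sigma_d)

Sigma_d_inv = np.linalg.inv(Sigma_d)
log_det = np.linalg.slogdet(Sigma_d)[1]

def log_likelihood(theta):
    # Gaussian likelihood of Eq. (5), written in log form for numerical stability
    r = d - q(theta, t)
    return -0.5 * (r @ Sigma_d_inv @ r + log_det + t.size * np.log(2.0 * np.pi))

def log_prior(theta):
    # uniform prior on a box, as used in the examples of Section 4
    return 0.0 if np.all((theta > 0.0) & (theta < 10.0)) else -np.inf

# generic random-walk Metropolis sampling of the posterior in Eq. (4)
theta = np.array([0.5, 0.5])
lp = log_prior(theta) + log_likelihood(theta)
samples = []
for _ in range(5000):
    prop = theta + 0.05 * rng.standard_normal(2)
    lp_prop = log_prior(prop) + log_likelihood(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta.copy())
samples = np.array(samples)
theta_star = samples[1000:].mean(axis=0)       # posterior mean as the "optimal" value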

The Bayesian update of the parameters is performed using Markov chain Monte Carlo. We then proceed to the calculation of discrepancy, which is a function of scenario and “field data” coordinates such as time and/or spatial location. We incorporate uncertainty from the model, as well as from the discrepancy term, in the development of the discrepancy function as outlined below. We employ a Gaussian process to model the discrepancy function in part because of its capability to provide uncertainty estimates on its predictions. This allows us to provide a “corrected” model prediction with uncertainty.

2.1. Formulation

The formulation of model discrepancy can be quite involved depending on the form of the model response. Typically, the data is presented as a function of configuration or scenario variables, denoted by x. For example, experiments may be run at different pressures or under different loading forces, where pressure or load would be a configuration variable. If the model response is a scalar response, then we have the discrepancy formulation

    di(xi) = qi(θ, xi) + δi(xi) + εi.    (6)


It may be that the model produces a temporal and/or spatial response involving multiple response values at various locations over time. We refer to this case as a “field” response. We then extend the model discrepancy formulation to be a function of the independent spatial and/or temporal coordinate, denoted by t,

    dij(xi, tj) = qij(θ, xi, tj) + δij(xi, tj) + εij.    (7)

In this “field” version of model discrepancy, the discrepancy is both a function of the configuration variable(s) and the time and/or spatial coordinates of the field where the responses are measured. Note that we have written the configuration variables x and the spatial/temporal coordinates t in bold font. They may each be a single variable; for example, x could be pressure and t could be time in a problem when the response is only a function of time and not spatial location. However, they could be multi-variate: x could be pressure and temperature, for example, and t could be a set of four variables involving time and Eulerian coordinates representing a location in space. For generality, we represent them as vectors of variables although each may be a single variable.

Although the extension of the model discrepancy from a scalar response in (6) to a field response in (7) is not difficult conceptually, it poses additional challenges in the calculation of the likelihood and the construction of the discrepancy term. Specifically, the number of training points for the discrepancy term may become intractable, given I experiments, indexed by configuration parameter i = 1, ..., I, and Ji points per experiment. For example, if the data is taken at very high frequencies, e.g. 10,000 Hz, at many locations over a structure, the sum-of-squares error (SSE) term in the likelihood could become very large. If we denote the residual vector by R and the covariance matrix of the measurement errors by Σ, the SSE term R^T Σ^(-1) R could involve a covariance matrix of size 100K elements or more, becoming computationally intractable.

Our formulation requires one to filter or select a subset of data points from finely resolved temporal/spatial data to address tractability. The question of how one filters the data and at what points one compares the temporal/spatial response to the model is critical in the discrepancy formulation. This may be guided by numerical considerations (taking a subset of data points that makes the matrix inversion tractable), subject matter expertise, and/or the use of an autocorrelation function: one can find the spatial and/or temporal autocorrelation lag, w, and take data at every wth point. In addition, other dimension reduction methods may be used, such as principal component analysis (PCA) [26] or compressive sensing. It is also likely that there will need to be interpolation between the model and observational responses, adding further error to the calibration process.

Our argument for filtering the data and explicitly representing the discrepancy as a function of configuration and spatial/temporal variables is that functional data taken at high temporal frequency or at finely resolved spatial resolutions is often highly correlated, so a small subset of the data will be sufficient to obtain a reasonable discrepancy model. Our approach differs from the approach taken by [6], who address model discrepancy in field data. Instead of modeling a discrepancy term, Brown and Hund adjust for autocorrelation in the residuals by scaling the likelihood based on the length of the time lag in the autocorrelation of the residuals. We have chosen to explicitly construct the discrepancy as a function of the independent coordinate(s), t, in part because the formulation allows one to calculate a discrepancy error as a function of the spatial/temporal parameters. The filtering process is an extra step, but it is often beneficial, helping the user understand patterns in their data and how correlated they are as a function of time or spatial location.
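As an illustration of the autocorrelation-based filtering described above, the sketch below estimates the autocorrelation lag w of a finely resolved signal and keeps every wth point. The signal and the correlation threshold are illustrative assumptions; the paper does not prescribe a particular algorithm for choosing w.

import numpy as np

def autocorr_lag(y, threshold=0.2):
    """Return the first lag at which the sample autocorrelation falls below threshold."""
    y = y - y.mean()
    denom = np.dot(y, y)
    for lag in range(1, y.size):
        rho = np.dot(y[:-lag], y[lag:]) / denom
        if rho < threshold:
            return lag
    return y.size - 1

# finely resolved field data (placeholder signal sampled at high frequency)
t = np.linspace(0.0, 1.0, 10000)
y = np.sin(2.0 * np.pi * 3.0 * t) + 0.05 * np.random.default_rng(1).standard_normal(t.size)

w = autocorr_lag(y)             # spatial/temporal autocorrelation lag
t_sub, y_sub = t[::w], y[::w]   # keep every wth point for the discrepancy calculation
print(f"lag w = {w}, kept {t_sub.size} of {t.size} points")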

3. Making Predictions with Model Discrepancy

3.1. Model Discrepancy Calibration

The original Kennedy and O’Hagan formulation involved the simultaneous estimation of the calibration parameters θ and the discrepancy term δ. We use a sequential approach in which the model parameters θ are calibrated first and then used in the calculation and calibration of the model discrepancy. Theoretically, the model discrepancy is a phenomenological or statistical representation of the mismatch between the computational model and the observational data. The computational model, on the other hand, may have a strong physical basis. When the sequential approach is used, the calibrated model parameters represent the best fit to the experimental data, and the discrepancy is merely a correction, meant to improve the predictive power of the model, with no physical interpretation. We argue that there is still much value in constructing an explicit discrepancy term and using it to improve the predictive accuracy of models in situations where it is well-motivated, e.g. when one has sufficient data across scenarios and there exists systemic model form error.

In the proposed framework, we first perform a Bayesian calibration of the model parameters and estimate the optimal parameters, θ∗, which may either be the maximum a posteriori (MAP) point or the mean of the posterior parameter distribution. The MAP point can be obtained from maximizing the log of the posterior distribution, defined as the sum of the log of the prior distribution plus the log of the likelihood function. Then, conditional on that optimal value for the parameters, there are explicit values for δ by rearranging (6) for scalar model responses,

    δi(xi) = di(xi) − qi(θ∗, xi),    (8)

or rearranging (7) for field responses,

    δij(xi, tj) = dij(xi, tj) − qij(θ∗, xi, tj).    (9)
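A minimal sketch of assembling the discrepancy build points of Equation (9) is given below. The simulation model, the configurations, the filtered field grid, the synthetic data, and the value of θ∗ are all placeholder assumptions for illustration.

import numpy as np

rng = np.random.default_rng(2)

def q(theta, x, t):
    # placeholder simulation model evaluated at configuration x and field coordinate t
    return theta[0] * x * np.exp(-theta[1] * t)

theta_star = np.array([1.1, 1.9])        # posterior mean (or MAP) from the Bayesian calibration
configs = np.array([0.5, 1.0])           # configuration values x_i used for calibration
t_grid = np.array([0.1, 0.5, 0.9])       # filtered field coordinates t_j

# synthetic "observations" standing in for the experimental data d_ij
d = np.array([[q([1.0, 2.0], x, tj) + 0.02 * rng.standard_normal() for tj in t_grid]
              for x in configs])

# Eq. (9): discrepancy build points, one per (configuration, field coordinate) pair
delta = np.array([[d[i, j] - q(theta_star, x, tj)
                   for j, tj in enumerate(t_grid)]
                  for i, x in enumerate(configs)])

X_train = np.array([[x, tj] for x in configs for tj in t_grid])   # inputs (x_i, t_j)
y_train = delta.ravel()                                           # targets delta_ij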

We have assumed that the experimental noise ε has zero mean. The calculated values of δ given in Equations (8) and (9) are used to calibrate the discrepancy model. It is important to note that δ depends explicitly only on the configuration and independent variables; its dependence on the model parameters is implicit. The functional form that is most appropriate for δ depends on the application at hand. This work uses a Gaussian process model, such that the discrepancy at each point (x, t) is a normally distributed random variable,

    δ(x, t) = N(µδ(x, t), Σδ((x, t), (x′, t′))).    (10)

Here, µδ is a quadratic mean trend function and Σδ is a squared exponential covariance. The hyperparameters governing these functions in the GP are calibrated using least squares approximation and a global optimization method called DIRECT. See [1, 2, 8] for more information.

Once the discrepancy model has been calibrated, the traditional model q is replaced by the corrected model q̂ = q + δ in any predictive setting. This corrected model is subject to the same scrutiny as the traditional model.
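A sketch of constructing the discrepancy GP and the corrected prediction q̂ = q + δ is given below. The paper calibrates the GP hyperparameters with a least-squares approach and the DIRECT optimizer inside Dakota; here, as a stand-in, the squared-exponential (RBF) GP is fit by scikit-learn's maximum-likelihood procedure, and the quadratic mean trend is approximated by detrending with an ordinary least-squares quadratic fit. The build-point values are placeholders.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# discrepancy build points (x_i, t_j) -> delta_ij; placeholder values for illustration
X_train = np.array([[x, t] for x in (0.5, 1.0, 1.5) for t in (0.1, 0.5, 0.9)])
y_train = np.array([0.05, 0.12, 0.08, 0.10, 0.20, 0.15, 0.14, 0.26, 0.21])

# quadratic mean trend mu_delta(x, t), fit here by ordinary least squares
poly = PolynomialFeatures(degree=2)
trend = LinearRegression().fit(poly.fit_transform(X_train), y_train)
residual = y_train - trend.predict(poly.transform(X_train))

# squared exponential (RBF) covariance; hyperparameters are fit by maximum likelihood
# here rather than by the least-squares/DIRECT procedure used in the paper
kernel = RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=1e-4)
gp = GaussianProcessRegressor(kernel=kernel).fit(X_train, residual)

def corrected_prediction(q_value, x, t):
    """q_hat = q + delta, also returning the GP standard deviation of the discrepancy."""
    X_new = np.array([[x, t]])
    mean, std = gp.predict(X_new, return_std=True)
    delta = trend.predict(poly.transform(X_new))[0] + mean[0]
    return q_value + delta, std[0]

q_hat, delta_std = corrected_prediction(q_value=1.3, x=0.75, t=0.3)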

The validity of the new model for predicting targeted quantities of interest and the certainty with which these predictions can be made must be assessed to determine if the corrected model is adequate for the intended use case. The adequacy assessment is typically performed by calculating validation metrics which quantify the difference of the model results with respect to specified data sets. Model validation is an active field of research; see [20, 19] and the references therein. Generally, a validation scenario is defined with an intended application in mind, validation data points are collected, and a reasonable measure of model accuracy is identified with a preset validation tolerance [4]. If the metric yields a value less than the validation tolerance, the model is said to be valid. Otherwise, the model is rendered invalid and further efforts must be made to improve the model.

3.2. Uncertainty Quantification

The uncertainty analysis of the corrected model q̂ involves analyzing the uncertainty of both the traditional and discrepancy models. The sequential treatment of the calibration process can be extended to the approximation of the uncertainty in the corrected model. In this framework, the total uncertainty in the corrected model can be decomposed into contributions from the uncorrected computational model and the discrepancy model,

    Σtotal(x, t) = Σq(θ, x, t) + Σδ(x, t),    (11)

where Σq is the covariance of the uncorrected model and Σδ is the covariance of the discrepancy model, which represent the uncertainties in the uncorrected model and the discrepancy, respectively. The independent variable t can be excluded for scalar responses. Note that the experimental or observation error ε is implicitly included in Σq, as it is included in the calibration of the posterior distribution of θ. For example, if the model parameters are assumed to have a normal prior distribution, π(θ) ∼ N(µθ, Σθ), and a normal likelihood distribution (5) is used, the parameter posterior has the form

    π(θ|d) ∝ exp(−(1/2)(µθ − θ)^T Σθ^(-1)(µθ − θ) − (1/2)(d − q(θ))^T Σd^(-1)(d − q(θ))).    (12)

The parameter posterior thus depends on the experimental or measurement error ε ∼ N(0, Σd). One way of approximating the model uncertainty Σq is to sample the posterior parameter distribution and perform a forward evaluation of the model at these samples. Thus, the inherent dependence of the parameter posterior on Σd is implicitly captured in Σq. The uncertainty of the uncorrected model predictions Σq can be derived using any number of uncertainty analysis methods, either analytically or by sampling. For example, the posterior distributions resulting from the Bayesian calibration can be propagated through the model to yield Σq. If a GP model is used to represent the discrepancy, as we do in this work, the covariance matrix Σδ is an inherent component of the GP itself. Thus, we take the total uncertainty in the corrected model to be the combination of the propagated uncertainty of the parameter posteriors through the uncorrected model and the covariance of the GP discrepancy model.
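The decomposition in Equation (11) can be approximated by sampling, as sketched below: posterior draws of θ are propagated through the model to estimate Σq at each prediction point, the predictive variance of the discrepancy GP supplies Σδ, and the two are combined into a 95% prediction interval for the corrected model. The model, the posterior draws, and the GP quantities shown here are placeholders.

import numpy as np

rng = np.random.default_rng(3)

def q(theta, x, t):
    # placeholder simulation model, vectorized over posterior samples of theta
    return theta[..., 0, None] * x * np.exp(-theta[..., 1, None] * t)

# posterior samples of theta from the Bayesian calibration (placeholder draws)
theta_samples = np.array([1.1, 1.9]) + 0.05 * rng.standard_normal((2000, 2))

x_new, t_new = 0.75, np.linspace(0.0, 1.0, 18)

# Sigma_q: variance of the uncorrected model under the parameter posterior
q_samples = q(theta_samples, x_new, t_new)          # shape (2000, 18)
q_mean, q_var = q_samples.mean(axis=0), q_samples.var(axis=0)

# Sigma_delta: predictive mean/variance of the discrepancy GP (placeholder values here;
# in practice these come from the fitted GP, e.g. gp.predict(..., return_std=True))
delta_mean = 0.05 * np.ones_like(t_new)
delta_var = 0.01**2 * np.ones_like(t_new)

# Eq. (11): total variance, and a 95% prediction interval for the corrected model
total_var = q_var + delta_var
q_hat = q_mean + delta_mean
lower, upper = q_hat - 1.96 * np.sqrt(total_var), q_hat + 1.96 * np.sqrt(total_var)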

Caution must be exercised when making predictions for extrapolated conditions. Although this is true for any computational model, phenomenological models, such as the discrepancy model described herein, rely on rote characterization of the data, using, for example, the shape of the data, rather than physics, to make predictions. Thus, their predictive capabilities are limited by how similarly the system behaves in new scenarios as compared to those used for calibration. Interpolatory scenarios are less likely than extrapolatory scenarios to deviate drastically from the calibration data. However, the covariance estimates of the discrepancy model generally capture this increased uncertainty.

4. Examples

Here we present two examples illustrating our proposed approach to model discrepancy calibration. In both examples, the models produce field responses. It is important to note that the complexity of the models themselves is not the focus of this demonstration. In both examples, the original models are relatively simple. The principal investigation of this study is the calculation of the discrepancy across experimental conditions, and how it performs in challenging scenarios. The underlying physics in each example are vastly different, thus demonstrating the robust and general nature of the proposed approach. The calibration of the model parameters using Bayes’ rule and of the discrepancy model is performed using the code Dakota [1, 2]. The results of the proposed sequential approach are compared to those produced by the KOH approach, in which the model parameters and the discrepancy parameters are calibrated simultaneously. Readers interested in performing comparative studies can contact the authors for relevant data.

4.1. Thermal Battery

Consider first an example in which the temperature of a thermal battery is measured for multiple points in time. A thermal battery is used to store and release thermal energy. One example is a molten salt battery, which is used as a power source. Molten salt batteries are activated by raising the temperature of the electrolyte above its melting temperature using pyrotechnic heat pellets. The battery will remain active as long as the electrolyte is molten. The thermal processes within the components and interactions between them are critical to the overall performance of thermal batteries [15, 11].

In this example, we have at our disposal ten sets of experimental data collected at three different initial temperature conditions, which we classify as “low,” “medium,” and “high.” The physics-based computational model is relatively expensive; we therefore use a surrogate model in its place. The surrogate computational model, denoted by q̃, depends on seven parameters and one configuration variable. In this example, the configuration variable is the initial temperature condition and is thus a continuous variable, despite its discrete labels. A GP surrogate with a quadratic trend function is built for the thermal battery model using 3000 samples of the parameters and configuration variable from the full computational model (a simplified construction of this kind is sketched below). Each computational model run produces a field response of length 18, meaning there are 18 temperature values at 18 different time points.

Following the procedures outlined in Section 3, a Bayesian calibration of the model parameters is performed. Uniform priors and a Gaussian likelihood are used. The range of the prior distributions and the measurement error, used as the variance in the likelihood calculations, were obtained through expert elicitation. Although these choices impact the results of the Bayesian calibration, this topic is beyond the scope of the present work. From the ten sets of experimental data, one data set is arbitrarily chosen to represent each initial condition. From these representative data sets, data points from three points in time are extracted to be used as calibration data. These three points are chosen to be well-spaced in time and allow for both interpolation and extrapolation of the corrected model with respect to time.
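One simple way to construct such a surrogate is sketched below, where a placeholder "expensive" model is sampled over the parameters and configuration variable and a single GP is fit with the field coordinate stacked into the inputs. The sample sizes, the two-parameter stand-in for the seven-parameter battery model, and the scikit-learn maximum-likelihood fit are illustrative assumptions; the surrogate in the paper is built within Dakota with a quadratic trend.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(4)

def expensive_model(theta, x, t):
    # placeholder for the full thermal battery simulation; returns the response at time t
    return (300.0 + 100.0 * x) * (1.0 - np.exp(-(theta[0] + theta[1] * x) * t))

t_grid = np.linspace(0.0, 1.0, 18)               # 18 field points per run
thetas = rng.uniform(0.5, 2.0, size=(40, 2))     # parameter samples (two here, seven in the paper)
xs = rng.uniform(0.0, 1.0, size=40)              # configuration (initial temperature) samples

# stack (theta, x, t) into a single input so one GP emulates the whole field response
X = np.array([[*th, x, tj] for th, x in zip(thetas, xs) for tj in t_grid])
y = np.array([expensive_model(th, x, tj) for th, x in zip(thetas, xs) for tj in t_grid])

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(4)),
                                     normalize_y=True).fit(X, y)
prediction = surrogate.predict(np.array([[1.0, 1.0, 0.5, 0.25]]))   # (theta1, theta2, x, t)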


Figure 1: Comparison of experimental data to uncorrected calibrated model in three calibration scenarios. All ten sets of experimental data are shown using solid lines, with color indicating the initial condition used. (a) Model is calibrated to three data points each from the “low” and “medium” initial temperature condition experimental data sets. (b) Model is calibrated to three data points from the “low” and “high” data sets. (c) Model is calibrated to three data points from the “medium” and “high” data sets. The 18 field responses produced by the calibrated yet uncorrected model for the three initial temperature conditions are shown for each scenario.

We consider three calibration scenarios: one in which we calibrate the surrogate model using the calibration data from the “low” and “medium” initial conditions, another in which the “low” and “high” calibration data are used, and a third which calibrates to the “medium” and “high” calibration data. The uncorrected calibrated model is compared to all ten sets of experimental data in Figure 1.

We take as validation data all available experimental data points, except those used for calibration. Although more extensive validation studies can be performed [20], we consider a simple validation test by calculating the percentage error at each validation datum and computing the average over all points. The average percentage error is summarized in Table 1. With a validation tolerance of 3.5%, this model would be rendered invalid. Specifically, the uncorrected model in all three calibration scenarios overpredicts the experimental data for time points closer to t = 0 and underpredicts the experimental data later in time. This type of pathological model error, where a clear pattern of inadequacy can be seen, is indicative of model form error and is therefore the target of our discrepancy implementation.

Continuing with the procedures described above, the discrepancies between the calibration data and the corresponding model responses are calculated according to Equation (9), with the optimal parameters taken to be the mean of the posterior parameter distributions. To be specific, there are three values of the configuration variable, x1 = low, x2 = medium, and x3 = high. Only two of the three are used in each of the three calibration scenarios. Furthermore, the model outputs a field response of length 18, but a subfield of length three is used for calibration: t4, t10, t16. Thus, in each calibration scenario, there are six total δij upon which the GP of the model discrepancy is built, with j = 4, 10, 16, and either i = 1, 2, i = 1, 3, or i = 2, 3. The remaining values of δij are calculated as predictions, yielding the corrected model q̂ij = qij + δij for i = 1, 2, 3, j = 1, ..., 18.

For purposes of comparison, a separate calibration of the corrected model is performed using a concurrent approach. Again, uniform priors and a Gaussian likelihood are used, the same calibration scenarios are considered with the calibration data described above, and the discrepancy model is assumed to be a GP with a quadratic trend function and a squared exponential covariance.



Figure 2: Comparison of experimental data to the uncorrected calibrated model and the corrected models in the calibration scenario in which calibration data is taken from the “low” and “medium” initial temperature conditions. The corrected model shown in (a) is that calibrated using the proposed sequential discrepancy calibration. The corrected model shown in (b) is calibrated using the concurrent approach.

As before, the discrepancy GP is a function of two parameters: the configuration variable (temperature) and the independent variable (time). Thus, the discrepancy GP has as its primary parameters the correlation lengths of the temperature and time. The Bayesian calibration is modified to include the original seven model parameters and these two additional parameters.

The corrected models for the calibration scenarios are shown in Figures 2, 3, and 4. In each figure, the ten experimental data sets are shown in gray. The uncertainty estimates are calculated following Section 3.2, with Σq calculated by propagating the Bayesian posterior distributions of the parameters and Σδ calculated from the discrepancy GP. Overall, the model corrected with the proposed (sequential) approach is improved, and the overpredictions for smaller t and underpredictions for larger t are lessened. For all prediction points, the experimental data lies within the 95% prediction intervals calculated using Σtotal. The average percentage errors are summarized and compared to the uncorrected model in Table 1, from which we can see that the model corrected using the proposed approach improves the uncorrected model and satisfies the preset validation tolerance of 3.5% in all three cases. In Case 3, the results for which are shown in Figure 4, the prediction configuration is an extrapolation of the calibration configurations. Although the corrected model is the least accurate in this case, the confidence bounds are also the widest and account for the increased uncertainty in the prediction of the discrepancy model.

This does not appear to be generally true of the model corrected with the KOH (simultaneous) approach. In the first two cases, the KOH corrected model improves the uncorrected model. However, extrapolation to the third experimental condition in Case 3 yields uniformly poor predictions. The model corrected with the KOH approach has smaller variances along the temperature conditions containing calibration data, and some of the validation data fall outside of the 95% confidence intervals. A quantitative comparison of the performance of both approaches is given in Table 1.


Figure 3: Comparison of experimental data to the uncorrected calibrated model and the corrected models in the calibration scenario in which calibration data is taken from the “low” and “high” initial temperature conditions. The corrected model shown in (a) is that calibrated using the proposed sequential discrepancy calibration. The corrected model shown in (b) is calibrated using the concurrent approach.


Figure 4: Comparison of experimental data to the uncorrected calibrated model and the corrected models in the calibration scenario in which calibration data is taken from the “medium” and “high” initial temperature conditions. The corrected model shown in (a) is that calibrated using the proposed sequential discrepancy calibration. The corrected model shown in (b) is calibrated using the concurrent approach.


                          Case 1    Case 2    Case 3
    Uncorrected Model      3.24%     3.88%     4.22%
    Corrected Model        1.74%     2.26%     3.03%
    KOH Corrected Model    1.96%     2.32%     4.47%

Table 1: Summary of average percentage errors for three scenarios of the thermal battery calibration. In Case 1, the model is calibrated to three data points from the “low” and “medium” initial temperature condition experimental data and extrapolated to the “high” temperature condition. In Case 2, the model is calibrated to three data points from the “low” and “high” initial conditions and interpolated to the “medium” data set. In Case 3, the model is calibrated to three data points from the “medium” and “high” initial temperature conditions and extrapolated to the “low” initial condition. All ten experimental data sets and all 18 field responses, except those used as calibration data, are used in the calculation of this error.
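As a concrete illustration of the average percentage error reported in Table 1, the sketch below computes the metric over a set of validation points and compares it to the preset tolerance. The numerical values are placeholders; only the 3.5% tolerance follows the text.

import numpy as np

def average_percentage_error(predictions, observations):
    """Mean absolute percentage error over all validation data points."""
    return 100.0 * np.mean(np.abs(predictions - observations) / np.abs(observations))

# placeholder validation data and corrected-model predictions
observations = np.array([510.0, 495.0, 470.0, 455.0, 430.0])
predictions = np.array([500.0, 498.0, 465.0, 452.0, 440.0])

err = average_percentage_error(predictions, observations)
tolerance = 3.5  # percent, as in Section 4.1
print(f"average percentage error = {err:.2f}% -> {'valid' if err < tolerance else 'invalid'}")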

The temperature conditions that contain no calibration data have much larger prediction variances when using the KOH approach. The correlation length of the discrepancy GP for the configuration parameter was found to be much larger when calibrated simultaneously with the model parameters than during sequential calibration. The difference in this parameter is likely due to the difficulty in exploring the full parameter space. Recall that the correlation parameter optimization is being performed in a 9-dimensional parameter space (θ, which has length seven, plus the correlations governing the two discrepancy parameters, x and t) in the KOH approach and in a 2-dimensional parameter space (containing solely the correlation lengths for x and t) in the proposed approach.

It is also important to note that the KOH corrected model takes much longer to calibrate than the model corrected with the sequential approach. The calibration time of the KOH corrected model is 49%, 38%, and 41% higher than the sequentially corrected model for Cases 1, 2, and 3, respectively. Adding two dimensions to the parameter space (temperature condition, x, and time, t) makes MCMC sampling more challenging. Furthermore, the concurrent approach requires that the full corrected model is run at each MCMC sample. The correlation lengths change for each sample, so a new GP must be constructed for each sample. The proposed approach requires the optimization and construction of a single GP.

4.2. Material Failure

As a second example, we consider the stress-strain relationship of a material at several temperatures. The experimental data is shown in Figure 5. This data presents a challenge to material models due to the change in shape of the stress-strain curves and the change in the point of material failure as the temperature increases. The goal is to use a single model and one characterization of the parameters for stress calculations at a wide range of temperatures. To this end, we define a completely phenomenological model,

    q(θ, x, t) = θ1 x^0.5 (log(100t + 1) − 0.2) / (100t − 1.05(x/100 − 7.73)^2 θ2)^2,    (13)

where t is the strain (the independent variable), θ = {θ1, θ2} are the parameters to be calibrated, and x is the configuration variable, which is, in this case, temperature. This model performs quite well when calibrated to a single temperature. However, the goal is to investigate a realistic scenario in which experimental data is available from multiple temperatures and a single model is needed for use at new temperature conditions.

Because an estimate of the experimental error was not available, it was approximated using the multiple data sets present for each temperature.

Figure 5: Experimental stress-strain data, obtained at temperatures ranging from 296.15 to 1073.15K.

From the multiple sets of experimental data at x = 373.15K, x = 673.15K, and x = 973.15K, the average and standard deviations are calculated and used as calibration data and experimental error, respectively. The experimental data sets from the four remaining temperatures will be used for validation purposes. The calibration data and the uncorrected model are shown in Figure 6. It is clear from the figure that the model is inadequate, as it underpredicts the stress at lower values of strain, overpredicts the stress at higher values of strain, and is unable to capture the overall shape of the curves or the points of failure.

For x2 = 373.15K, x4 = 673.15K, and x6 = 973.15K, and for all values of the independent coordinate tj, which varies in length for each configuration, the build points of the discrepancy are calculated according to Equation (9). As before, the discrepancy model is represented by a GP, and the corrected model q̂ is calculated for xi, i = 1, ..., 7, and for all tj.

The corrected model for the calibration data is also shown in Figure 6, and that for the prediction configurations is shown in Figure 7. For the calibration configurations, the discrepancy model corrects q such that the experimental data is exactly reproduced. Two of the prediction configurations are interpolatory, and two are extrapolatory. The corrected model performs quite well for the interpolated configurations, both in capturing the shape and predicting the point of failure. The corrected model is nearly able to reproduce the shape of the curves for the extrapolated configurations, but is unable to accurately predict the point of failure, and the corrected model would be rendered invalid due to the poor performance of the extrapolation scenarios. The need for improvements in the model is further indicated by the large prediction intervals. The fact that the corrected model performs well for interpolation and poorly for extrapolation illustrates the words of warning concluding Section 3.

Again, for the purposes of comparison, a separate model calibration is conducted using the simultaneous approach. The data and settings used in the Bayesian calibration detailed above remain the same, with two additional parameters representing the correlation lengths for the configuration (temperature) and independent (strain) coordinates.

Figure 6: Comparison of calibration data to calibrated model as well as the corrected model. It is apparent that the calibrated model is inadequate and could benefit from the use of model discrepancy. The corrected model is somewhat difficult to see due to the fact that it overlaps exactly with the calibration data.

Figure 8 compares both corrected models to the experimental data in the four validation scenarios. The prediction variances for the KOH corrected model are not shown due to their width; they are much wider than those of the model corrected using sequential discrepancy calibration, as in the previous example. Although the sequentially corrected model rendered by the proposed approach performs poorly in the extrapolation scenarios, that produced by the KOH approach is much worse. While the odd behavior of the KOH corrected model is not exhibited in the interpolation scenarios, it is apparent that the model corrected using the proposed approach continues to better match the validation data. Furthermore, as in the previous example, the concurrent calibration of the material and discrepancy models is much more costly than that performed sequentially.

5. Conclusion

In this paper, we have developed and demonstrated one approach to calculating model discrepancy which circumvents simultaneous estimation of the model parameters and the discrepancy function. Our approach incorporates experimental data from multiple configurations and allows “field” predictions at new experimental configurations as a function of spatial and/or temporal coordinates. The field data aspect can be used with other types of independent coordinates, for example in stress-strain curves where material model parameters are to be calibrated. This approach can incorporate experimental data across a variety of test configurations (e.g. different temperatures or load conditions) and produce a discrepancy-corrected material model to predict stress as a function of strain.

We illustrated the approach and its practical utility through two different examples. With a mere six calibration data points, the behavior of the thermal battery was captured at three different initial conditions. With field data for three temperatures of a material failure experiment, whose stress-strain curves vary greatly in shape and character, experimental data was shown to fall within the uncertainty bounds of the corrected model.


Figure 7: Uncorrected calibrated model and corrected model for prediction configurations (temperatures) of material stress-strain curves. (a) and (d) represent extrapolations of the uncorrected and corrected models, with temperatures equal to 296.15K and 1073.15K, respectively. (b) and (c) represent interpolations of the uncorrected and corrected models, with temperatures equal to 473.15K and 873.15K, respectively.


Figure 8: Comparison of experimental data to the uncorrected calibrated model and the corrected models in the (a) extrapolatory and (b) interpolatory prediction scenarios of the material failure example.


For both examples, the performance of the proposed approach was compared to an analogous implementation of the concurrent approach proposed by Kennedy and O’Hagan [16]. Not only did the approach proposed herein better match the validation data, the computational cost was much lower. We have therefore demonstrated the robust, practical, and predictive capabilities of a sequential model discrepancy calibration approach.

Our examples illustrated that the choices around modeling a discrepancy term require careful consideration. The KOH approach may do a better job at finding the underlying model parameters with simultaneous estimation of the model and the model discrepancy function. However, we expect model extrapolation to generally perform better in our approach because the discrepancy term captures and corrects the shape of the model prediction, given a set of calibrated parameters. When choosing an approach, the modeler must keep in mind the goal of their calibration: is it the determination of the model parameters or making accurate model predictions? Although these two options are not mutually exclusive, they must be prioritized when the underlying model is known to be inadequate.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

References

[1] B. M. Adams, W. J. Bohnhoff, K. R. Dalbey, J. P. Eddy, M. S. Ebeida, M. S. Eldred, J. R. Frye, G. Geraci, R. W. Hooper, P. D. Hough, K. T. Hu, J. D. Jakeman, M. Khalil, K. A. Maupin, J. A. Monschke, E. M. Ridgway, A. Rushdi, J. A. Stephens, L. P. Swiler, J. G. Winokur, D. M. Vigil, and T. M. Wildey. Dakota, a multilevel parallel object-oriented framework for design optimization, parameter estimation, uncertainty quantification, and sensitivity analysis: Version 6.7 users manual. Technical Report SAND2014-4633, Sandia National Laboratories, Albuquerque, NM, Updated May 2017. Available online from http://dakota.sandia.gov/documentation.html.

[2] B. M. Adams, W. J. Bohnhoff, K. R. Dalbey, J. P. Eddy, M. S. Ebeida, M. S. Eldred, J. R. Frye, G. Geraci, R. W. Hooper, P. D. Hough, K. T. Hu, J. D. Jakeman, M. Khalil, K. A. Maupin, J. A. Monschke, E. M. Ridgway, A. Rushdi, J. A. Stephens, L. P. Swiler, J. G. Winokur, D. M. Vigil, and T. M. Wildey. Dakota, a multilevel parallel object-oriented framework for design optimization, parameter estimation, uncertainty quantification, and sensitivity analysis: Version 6.7 theory manual. Technical Report SAND2014-4253, Sandia National Laboratories, Albuquerque, NM, Updated May 2017. Available online from http://dakota.sandia.gov/documentation.html.

[3] P. D. Arendt, D. W. Apley, and W. Chen. Quantification of model uncertainty: Calibration, model discrepancy, and identifiability. ASME Journal of Mechanical Design, 134(10):1–12, 2012.

[4] I. Babuška, F. Nobile, and R. Tempone. A systematic approach to model validation based on Bayesian updates and prediction related rejection criteria. Computer Methods in Applied Mechanics and Engineering, 197(29):2517–2539, 2008. Validation Challenge Workshop.

[5] Maria J Bayarri, James O Berger, Rui Paulo, Jerry Sacks, John A Cafeo, James Cavendish, Chin-Hsu Lin, and Jian Tu. A framework for validation of computer models. Technometrics, 49(2):138–154, 2007.

[6] JL Brown and LB Hund. Estimating material properties under extreme conditions by using Bayesian model calibration with functional outputs. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2018.

[7] J. Brynjarsdottir and Anthony O’Hagan. Learning about physical parameters: the importance of model discrepancy. Inverse Problems, 30(11):114007, 2014.

[8] J. Gablonsky. Direct version 2.0 userguide technical report. Technical Report CRSCTR01-08, North Carolina State University, Center for Research in Scientific Computation, Raleigh, NC, 2001.

[9] Michael Goldstein and Jonathan Rougier. Probabilistic formulations for transferring inferences from mathematical models to physical systems. SIAM Journal on Scientific Computing, 26(2):467–487, 2004.

[10] Robert B Gramacy, Derek Bingham, James Paul Holloway, Michael J Grosskopf, Carolyn C Kuranz, Erica Rutter, Matt Trantham, R Paul Drake, et al. Calibrating a large computer experiment simulating radiative shock hydrodynamics. The Annals of Applied Statistics, 9(3):1141–1168, 2015.

[11] Ronald A. Guidotti and Patrick Masset. Thermally activated (thermal) battery technology: Part I: An overview. Journal of Power Sources, 161(2):1443–1449, 2006.

[12] Gang Han, Thomas J Santner, and Jeremy J Rawlinson. Simultaneous determination of tuning and calibration parameters for computer experiments. Technometrics, 51(4):464–474, 2009.

[13] Dave Higdon, James Gattiker, Brian Williams, and Maria Rightley. Computer model calibration using high-dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.

[14] Dave Higdon, Marc Kennedy, James C Cavendish, John A Cafeo, and Robert D Ryne. Combining field data and computer simulations for calibration and prediction. SIAM Journal on Scientific Computing, 26(2):448–466, 2004.

[15] S Hodson, ES Piekos, R Sayer, SA Roberts, and CC Roberts. Thermal characterization of molten salt battery materials. Technical Report 2016-3378C, Sandia National Laboratories, Albuquerque, NM, 2016.

[16] M. C. Kennedy and A. O’Hagan. Bayesian calibration of computer models. Journal of the Royal Statistical Society, 63:425–464, 2001.

[17] You Ling, Joshua Mullins, and Sankaran Mahadevan. Selection of model discrepancy priors in Bayesian calibration. Journal of Computational Physics, 276:665–680, 2014.

[18] Fei Liu, MJ Bayarri, JO Berger, et al. Modularization in Bayesian analysis, with emphasis on analysis of computer models. Bayesian Analysis, 4(1):119–150, 2009.

[19] Y. Liu, W. Chen, P. Arendt, and H.Z. Huang. Toward a better understanding of model validation metrics. ASME Journal of Mechanical Design, 133(7), 2011.

[20] Kathryn A Maupin, Laura P Swiler, and Nathan W Porter. Validation metrics for deterministic and probabilistic data. Journal of Verification, Validation and Uncertainty Quantification, 3(3):031002, 2018.

[21] R. Morrison, T. Oliver, and R. Moser. Representing model inadequacy: A stochastic operator approach. SIAM/ASA Journal on Uncertainty Quantification, 6(2):457–496, 2018.

[22] William L. Oberkampf, Sharon M. DeLand, Brian M. Rutherford, Kathleen V. Diegert, and Kenneth F. Alvin. Error and uncertainty in modeling and simulation. Reliability Engineering and System Safety, 75(3):333–357, 2002.

[23] Inseok Park and Ramana V. Grandhi. A Bayesian statistical method for quantifying model form uncertainty and two model combination methods. Reliability Engineering and System Safety, 129:46–56, 2014.

[24] Matthew Plumlee. Bayesian calibration of inexact computer models. Journal of the American Statistical Association, 112(519):1274–1285, 2017.

[25] Teresa Portone, Damon McDougall, and Robert D. Moser. A stochastic operator approach to model inadequacy with applications to contaminant transport. arXiv:1702.07779 [cs.CE].

[26] James O Ramsay and Bernard W Silverman. Applied functional data analysis: methods and case studies. Springer, 2007.

[27] Ramesh Rebba, Sankaran Mahadevan, and Shuping Huang. Validation and error estimation of computational models. Reliability Engineering and System Safety, 91(10):1390–1397, 2006.

[28] Jerome Sacks, William J Welch, Toby J Mitchell, and Henry P Wynn. Design and analysis of computer experiments. Statistical Science, pages 409–423, 1989.

[29] Shankar Sankararaman and Sankaran Mahadevan. Integration of model verification, validation, and calibration for uncertainty quantification in engineering systems. Reliability Engineering and System Safety, 138:194–209, 2015.

[30] Thomas J Santner, Brian J Williams, and William I Notz. The design and analysis of computer experiments. Springer Series in Statistics, 2003.

[31] K. Sargsyan, H. N. Najm, and R. Ghanem. On the statistical calibration of physical models. International Journal of Chemical Kinetics, 47(4):246–276, 2015.

[32] Khachik Sargsyan, Xun Huan, and Habib N. Najm. Embedded model error representation for Bayesian model calibration. arXiv:1801.06768 [stat.CO].
