Journal of Hydrology 564 (2018) 191–207

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Emulator-enabled approximate Bayesian computation (ABC) and uncertainty analysis for computationally expensive groundwater models




Tao Cui a,⁎, Luk Peeters b, Dan Pagendam c, Trevor Pickett a, Huidong Jin d, Russell S. Crosbie b, Matthias Raiber a, David W. Rassam a, Mat Gilfedder a

a CSIRO Land and Water, GPO Box 2583, Brisbane, QLD 4001, Australia
b CSIRO Land and Water, PMB 2, Glen Osmond, SA 5064, Australia
c CSIRO Data61, GPO Box 2583, Brisbane, QLD 4001, Australia
d CSIRO Data61, GPO Box 1700, Canberra, ACT 2601, Australia

ARTICLE INFO

ABSTRACT

This manuscript was handled by G. Syme, Editor-in-Chief, with the assistance of Bellie Sivakumar, Associate Editor

Bayesian inference provides a mathematically elegant and robust approach to constrain numerical model predictions with system knowledge and observations. Technical challenges, such as evaluating a large number of models with long runtimes, have restricted the application of Bayesian inference to groundwater modeling. To overcome such technical challenges, we use Gaussian process emulators to replace a transient regional groundwater MODFLOW model for computing objective functions during model constraining. The regional model is designed to assess the potential impact of a proposed coal seam gas (CSG) development on groundwater levels in the Richmond River catchment, Clarence-Moreton Basin, Australia. The emulators were trained using 4000 snapshots derived from the MODFLOW model and subsequently used to replace the MODFLOW model in an Approximate Bayesian Computation (ABC) scheme. ABC was deemed the more appropriate choice as it relaxes the need to derive an explicit likelihood function that the formal Bayesian analysis requires. The study demonstrated the flexibility of the Gaussian process emulators, which can accurately reproduce the original model behavior at a fraction of the computational cost (from hours to seconds). The gain in computational efficiency using the proposed approach allows the global calibration and uncertainty algorithms to become more feasible for computationally demanding groundwater models. Based on the ABC analysis, the probability for the simulated CSG development causing a water table change of more than 0.2 m was less than 5%. In addition to a probabilistic estimate of the prediction, an added value of emulator-assisted ABC inference is providing information on the extent to which observations can constrain parameters and predictions, as well as the flexibility to include various quantitative and qualitative parameter constraining information.

Keywords: Gaussian process emulators; Surrogate model; Groundwater modelling; Data worth; Clarence-Moreton Basin; Coal seam gas

1. Introduction

Doherty (2011) states that “… a model cannot promise the right answer. However, if properly constructed, a model can promise that the right answer lies within the uncertainty limits which are its responsibility to construct”. This statement reflects the need to shift the focus of groundwater modeling from seeking a single optimal prediction to a prediction distribution that encompasses the range of predictions consistent with observations. Uncertainty analysis can be considered the process of deriving such a distribution to support evidence-based decision making. Predictive uncertainty can be quantified through (i) forward propagation of input uncertainty or (ii) an inverse assessment of parameters where historical measurements are



available to constrain the model (Beven, 2007; Refsgaard et al., 2007). Most practical groundwater modeling applications belong to the latter type, aiming to inform decision-making for water resource management. The inverse assessment of uncertainty requires sampling the prior distribution of parameters to yield posterior parameter distributions conditioned on observations. A multitude of methods for assessing the predictive uncertainty using various sampling strategies have been reported in the literature, such as pure Monte Carlo (MC) sampling, stratified sampling, importance sampling, projection-based sampling, or combinations of them (Beven, 2008). Among those, the null space Monte Carlo (NSMC) (Herckenrath et al., 2015; James et al., 2009; Sepúlveda and Doherty, 2015) is probably the most widely used method in groundwater

Corresponding author. E-mail address: [email protected] (T. Cui).

https://doi.org/10.1016/j.jhydrol.2018.07.005 Received 22 December 2017; Received in revised form 16 May 2018; Accepted 2 July 2018 Available online 05 July 2018 0022-1694/ Crown Copyright © 2018 Published by Elsevier B.V. All rights reserved.


modelling, especially for computationally expensive groundwater models, either because of large spatial and temporal scales or because of coupled multiple processes. Although the projection-based NSMC can significantly reduce the computational cost and allow a non-linear uncertainty analysis for computationally intensive groundwater models, the posterior distribution always surrounds a pre-calibrated parameter set obtained with a gradient-based sampling algorithm. The method is sensitive to the initial parameter assignment and prone to local minima when the model is not linear (or not approximately linear), although the bias may be reduced to some extent by multiple-starting-point NSMC (Keating et al., 2010; Tavakoli et al., 2013). Meanwhile, with the continuous rise in computing power and improvements in data measurement, Bayesian analysis based on holistic sampling algorithms has become increasingly popular for posterior parameter inference in other fields (Sadegh and Vrugt, 2014; Vrugt and Sadegh, 2013). However, two key issues have hindered the application of Bayesian inference in groundwater modelling, although other factors also exist (Pappenberger and Beven, 2006), such as a steep learning curve and a lack of well-documented, robust tools with a user-friendly interface.

The first factor is the heavy computational burden for most practical groundwater models. Despite advances in algorithmic sampling efficiency (Maier et al., 2014; J. A. Vrugt et al., 2009), the number of model runs required to accurately approximate the posterior distributions is counted in the tens to hundreds of thousands for highly parameterized groundwater models (Keating et al., 2010). Constraining wide and uninformative prior parameter distributions for a complex regional groundwater model through Bayesian inference using a holistic search algorithm therefore requires an often prohibitively large number of model runs. Model emulation has great potential to overcome this particular issue.

The principle of model emulation is to use computationally efficient algorithms to replace computationally demanding models. Emulators are also known as surrogate models, meta-models, reduced models, proxy models, lower-fidelity models, and response surfaces (Razavi et al., 2012). Numerous model emulation techniques have been explored in various disciplines, and they can be broadly categorized into three classes: data-driven methods, projection-based methods, and multi-fidelity methods (Asher et al., 2015; Robinson et al., 2008). The emulator is typically a black-box or statistical model that is trained on a set of model inputs and their corresponding outputs. A well-trained emulator can yield relatively accurate and precise predictions for new inputs, such as parameter values and forcing variables that were not part of the original training set. Emulators have gained popularity for performing tasks such as model calibration, sensitivity analysis and uncertainty analysis, where a model must be run a large number of times. Some popular choices in the literature have included Gaussian processes (GPs) (Kennedy and O’Hagan, 2001; Liu and West, 2009; O’Hagan, 2006; Sacks et al., 1989), Neural Networks (Kourakos and Mantoglou, 2009; Yan and Minsker, 2006), Random Forests (Hooten et al., 2011; Leeds et al., 2014) and Generalized Additive

Models (GAMs) (Stanfill et al., 2015; Storlie et al., 2009; Strong et al., 2014). Table 1 summarizes recent applications of model emulation in applied groundwater modeling. Although these previous studies significantly advanced our understanding of groundwater model emulation, to the authors’ knowledge GP emulators have not previously been applied to regional groundwater models that are used to provide environmental impact assessments of deep resource development in a sedimentary basin. Such models need to include both the shallow unconsolidated aquifers, where most receptors are located, and the deep porous rock formations where resource development occurs (Raiber et al., 2015); they are complex and highly non-linear (Cui et al., 2018a; Sreekanth et al., 2018). Gaussian process emulators (O’Hagan, 2006) are mathematically very close to Kriging interpolation (Kleijnen, 2009); they are robust and well-established tools for the accurate representation of complex response surfaces derived from numerical models. They were chosen in this study because they: (1) provide a probabilistic estimate of the uncertainty in the emulated prediction (Rasmussen and Williams, 2006), (2) are straightforward and quick to generate, (3) can be easily tailored to individual predictions, and (4) allow flexible parameterization (Bastos and O’Hagan, 2009). Razavi et al. (2012) and Asher et al. (2015) provide thorough reviews of the application of surrogate models in water resources, along with a comprehensive comparison of different types of emulators.

The second critical issue that limits the application of Bayesian inference is the difficulty of defining an explicit likelihood function for complex and non-linear groundwater models. Although different likelihood functions can be derived under certain assumptions, for example that the error residuals are normally distributed with a variance related to the observation uncertainty, this observation uncertainty is often very difficult to establish and the assumptions are often not realistic (Hill and Tiedeman, 2007). The Approximate Bayesian Computation (ABC) framework (Nott et al., 2012; Vrugt and Sadegh, 2013) relaxes the need for computing an explicit likelihood function by using summary statistics or multiple objective functions. ABC has its roots in rejection sampling, in which only those parameter combinations that meet specific, predefined acceptance criteria are accepted. This is in contrast with formal Bayesian analysis, where parameter combinations are accepted based on a likelihood function. ABC is also superior for diagnostic model calibration, as multiple summary statistics can be defined that capture different aspects of the modelled system (Vrugt and Sadegh, 2013). ABC has recently been applied in hydrology by Nott et al. (2012) and Vrugt and Sadegh (2013); however, to the authors’ knowledge, it has not yet been applied in groundwater modelling.

Another critical challenge that classical groundwater model calibration faces is the design of the objective function. Different weight factors are usually applied to different types of numerical observations to ensure that every type of observation influences the model calibration; however, determining the weights is not straightforward and is usually subjective (Doherty, 2015). Qualitative

Table 1
Summary of recent applications of model emulation in applied groundwater modeling.

Paper | Emulation technique | Numerical model | Discretization | Study area | Purpose
Laloy et al. (2013) | Polynomial chaos expansion (PCE) | Steady-state groundwater flow | 113 km2, 14 layers | Nete Basin, Belgium | Posterior inference of a highly parameterized groundwater flow model
Wu et al. (2014) | Polynomial chaos expansion (PCE) | Coupled SW-GW model | 9106 km2, 5 layers | Zhangeye Basin, China | Monte Carlo uncertainty analysis
Wu et al. (2015) | Support vector machines (SVM) | Coupled SW-GW model | 9106 km2, 5 layers | Zhangeye Basin, China | Optimize SW/GW ratio in irrigation
Xu et al. (2017) | Support vector regression (SVR) | 3D transient groundwater model | 844 km2, 3 layers | Spokane Valley-Rathdrum Prairie, United States | Investigate the impact of structure error on calibration and prediction
Rajabi and Ketabchi (2017) | Gaussian process emulator (GP) | 3D transport model | 90.5 km2, 2 layers | Kish Island, Iran | Optimization of coastal groundwater management
Cui et al., 2018a,b (this paper) | Gaussian process emulator (GP) | 3D transient groundwater model | 8230 km2, 6 layers | Clarence-Moreton Basin, Australia | Improve computational efficiency for global model calibration and uncertainty analysis


Fig. 1. Study area with major geographic information including topography, major towns, rivers, groundwater model boundary, groundwater level observation bores and locations of modelled CSG development fields.

observations (e.g., direction of flow) usually cannot be included in such a composite objective function. ABC, on the other hand, places no limitation on taking different types of numerical observations into account, as they can be included in the summary statistics. Qualitative data can also be implemented as acceptance criteria.

The objective of this paper is to explore whether an emulator-enabled ABC framework can serve as a practical tool for a comprehensive and transparent assessment of groundwater model predictive uncertainty; it also aims to tackle some of the challenges encountered by current groundwater model calibration methods and by the application of Bayesian inference in groundwater modelling. The framework is applied to a transient regional groundwater model developed to assess the potential impact of coal seam gas (CSG) development in the Richmond River catchment, Clarence-Moreton Basin in eastern Australia (Fig. 1). The next section introduces the methods used in the study, followed by a description of the study area and the numerical

MODFLOW model. The results section evaluates the performance of the emulator and the resulting probabilistic estimates of model predictions, while the discussion focuses on the practical implementation and transparency. The findings are summarized in the conclusions section along with recommendations for further research.

2. Method

2.1. Numerical groundwater modelling

The groundwater modelling aims to probabilistically evaluate the potential changes in the relevant groundwater system due to a proposed CSG development in the Richmond river basin, NSW, Australia. The direct impact is represented by the difference in groundwater head between the CSG development scenario and the baseline. MODFLOW-NWT (Niswonger et al., 2011), a Newton formulation of MODFLOW-2005, was chosen to execute the model in order to reduce the numerical instabilities associated with cell drying and rewetting in the uppermost layer of the model. The model area is characterised by high-relief topography, which usually results in cell drying and rewetting in the unconfined parts of a groundwater model; this frequent drying and rewetting is one of the major causes of convergence difficulties. MODFLOW-NWT was designed to address drying and rewetting problems more efficiently, and thus to provide a more stable solution than MODFLOW-2005, by treating the nonlinearities of cell drying and rewetting as a continuous function of groundwater head. The GMRES matrix solver was used to solve the large system of equations generated by the groundwater model.

2.2. Gaussian process emulator

GPs can be thought of as a prior distribution over the functions of interest to emulate. In the context of emulation, the function in question is the response of a model’s output, y, over vectors of p-dimensional model inputs, x. The finite-dimensional distributions of the model outputs can be described through a mean μ(x) and a positive-definite covariance matrix Σ, with (i, j)th entry σ^2_{i,j} = τ^2 K(x_i, x_j). The covariance terms are the product of a variance parameter, τ^2, and a correlation function, K(·,·), that describes the decline in the correlation between two model outputs y_i and y_j as the distance between their respective input vectors x_i and x_j increases. The popularity of GPs stems from two main features: (i) the predictive distributions from the emulator have a simple analytical form; and (ii) as a result of this analytical tractability, inference for the model parameters is relatively simple.

Gramacy and Apley (2015) improved the computational efficiency of GPs through the use of local approximate Gaussian processes (laGPs). To overcome the computational burden encountered with standard GPs (which model the data through a stationary covariance function), we used laGPs, in which the emulator at an input vector x is a function only of the training data within a local neighborhood of x. The local neighborhood is defined as the N input vectors near x, chosen one by one in an iterative manner to maximize the reduction in prediction variance, a scheme referred to as Active Learning Cohn or “ALC” by Gramacy and Apley (2015).

As recommended by Gramacy (2016), empirical Bayes was adopted for fitting the laGPs in the current study. A reference prior for the variance τ^2 (algebraically equivalent to an Inverse-Gamma distribution with shape and scale parameters both equal to zero), combined with the multivariate normal likelihood of the GP, makes integrating out τ^2 analytically tractable. The resulting marginal likelihood for the remaining parameters has the form

p(Y_N | K_θ(·,·)) = Γ(N/2) / ((2π)^{N/2} |K_N|^{1/2}) · (ψ_N/2)^{−N/2},   (1)

where θ is the length-scale parameter used in the isotropic Gaussian correlation function (Eq. (5)) used in this paper, and ψ_N = Y_N^T K_N^{−1} Y_N. This expression is analytically differentiable and therefore lends itself to fast, Newton-based methods for maximum likelihood estimation of the parameters in the laGP. The resulting predictive distribution from the laGP is a Student-t distribution with N degrees of freedom, with mean defined as

μ(x | D_N, K_θ(·,·)) = k_N^T(x) K_N^{−1} Y_N,   (2)

and variance defined as

Var[Y(x) | D_N, K_θ(·,·)] = N σ^2(x | D_N, K_θ(·,·)) / (N − 2),   (3)

where

σ^2(x | D_N, K_θ(·,·)) = ψ_N [K_θ(x, x) − k_N^T(x) K_N^{−1} k_N(x)] / N,   (4)

ψ_N is the scale parameter, K_N is the N by N matrix of pairwise correlations between each of the x_i with (i, j)th entry K_θ(x_i, x_j), and k_N(x) is a vector of length N whose ith element K_θ(x, x_i) is the correlation between x and x_i. In the above, D_N denotes the set of training data for the laGP, which is comprised of pairs of predictor vectors and response variables. The correlation function is modeled as an isotropic Gaussian of the form

K_θ(x_i, x_j) = exp{ −Σ_{k=1..p} (x_{i,k} − x_{j,k})^2 / θ },   (5)

where x_{i,k} denotes the kth component of x_i. In practice, a nugget can be added, whereby K_{θ,η}(x, x_i) = K_θ(x, x_i) + η·1{x = x_i}. The standard approach taken in the laGP package is to set η = 10^{−6}, which offers enough flexibility to aid parameter estimation whilst providing a high level of fidelity to the observed model runs at each of the x_i. The specific algorithm for fitting the laGP (as implemented in the “laGP” R package) is outlined in Gramacy (2016), but for ease of reference it is repeated below.

Multi-stage approximate local GP modeling algorithm:

1. Choose a sensible starting global θ_x = θ_0 for all x.
2. Calculate local designs X_n(x, θ_x) based on ALC, independently for each x:
   a) Choose a nearest-neighbor design X_{n0}(x) of size n_0.
   b) For j = n_0, …, n−1 set

      x_{j+1} = argmax over x_{j+1} ∈ X_N \ X_j(x) of [v_j(x; θ_x) − v_{j+1}(x; θ_x)],   (6)

      where v_{j+1}(x; θ) = [K_{j+1}(x, x) − k_{j+1}^T(x) K_{j+1}^{−1} k_{j+1}(x)]; that is, x_{j+1} is the input vector from the predefined local nearest-neighbor set X_N that is not yet included in the current local design and has approximately the greatest potential to reduce the predictive variance according to ALC. Then update D_{j+1}(x, θ_x) = D_j(x, θ_x) ∪ (x_{j+1}, y(x_{j+1})) and similarly X_{j+1}(x) = X_j(x) ∪ x_{j+1}.
3. Also independently, calculate the maximum likelihood estimate θ̂_n(x) | D_n(x, θ_x), thereby explicitly obtaining a globally non-stationary predictive surface. Set θ_x = θ̂_n(x).
4. Repeat steps 2–3 as desired.
5. Output predictions Y(x) | D_n(x, θ_x) for each x.

To construct a laGP emulator, n runs of the target model are required in order to obtain a set of training data. Ideally, these model runs should span a wide range of vectors across the space of model inputs and attempt to fill this space with the limited number of model runs that can feasibly be obtained with the available resources. Unsurprisingly, the experimental designs used to build model emulators are broadly referred to as “space-filling designs”, such as Latin Hypercube Designs (LHDs) (Santner et al., 2003). In this study we chose the maximin LHD, which is implemented in the lhs package in R (Carnell, 2016). The maximin Latin Hypercube design is generated like a standard Latin Hypercube design, one design point at a time, but with each new point selected to maximise the minimum Euclidean distance between design points in the parameter space. Points in the design span the full range of parameter values in each dimension of the parameter space, but also avoid redundancy amongst points by maximising the Euclidean distance between points (since nearby points are likely to have similar model output).

GPs tend to perform better if the output variable is close to a Gaussian distribution. As the groundwater model outputs of the case study were typically not normally distributed but heavily skewed, the quantity to be emulated was transformed using a normal quantile transform (Bogner et al., 2012) before training the emulator.
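To make the emulation step concrete, the following R sketch applies a rank-based normal quantile transform (the individual steps are listed in the next subsection) and then builds local approximate GP predictions with the laGP package referenced above. The objects X (DOE parameter matrix, one model run per row), y (corresponding model outputs) and X_new (new parameter combinations) are hypothetical placeholders, not objects from the study; this is a minimal sketch under those assumptions, not the authors' implementation.

```r
## Minimal laGP emulation sketch (assumed inputs: X, y, X_new as above).
library(laGP)

## normal quantile transform: rank-based plotting positions i/(n + 1)
## mapped through the inverse standard normal CDF
nqt <- function(y) qnorm(rank(y) / (length(y) + 1))

## train-and-predict helper: ALC local designs are built around each row of X_new
emulate <- function(X, y, X_new) {
  z <- nqt(y)                                  # transformed training outputs
  aGP(X = X, Z = z, XX = X_new, start = 6, end = 50, method = "alc")
}

## usage (hypothetical objects): fit <- emulate(X, y, X_new)
## fit$mean gives emulated (transformed) values, fit$var the predictive variances
```

The inverse of the empirical quantile mapping would then be applied to bring emulated values back to the original scale before comparing them with acceptance thresholds.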


Fig. 2. Framework of the proposed surrogate-based Bayesian uncertainty analysis for complex groundwater models: (a) emulator training; (b) Approximate Bayesian Computation (ABC).

The following steps are required to carry out such a normal quantile transformation of a sample X:

1. Sort the sample X from smallest to largest: x_1, …, x_n.
2. Estimate the cumulative probabilities p_1, …, p_n using a plotting position such as p_i = i/(n + 1), so that p_i ≈ P(X ⩽ x_i).
3. Transform each value x_i of X into y_i = Q^{−1}(p_i) of the normal variate Y, where Q^{−1} is the inverse of the standard normal distribution, using a discrete mapping.

2.3. Approximate Bayesian computation (ABC)

The ABC framework was first published in the statistical literature by Diggle and Gratton (1984). Recently, the method has seen increasing application in population and evolutionary genetics, mainly due to the limitations of likelihood-based inference and the rapid generation of large amounts of data (Beaumont et al., 2002; Csilléry et al., 2010). Inspired by this increasing application in evolutionary genetics, Vrugt’s team introduced the ABC approach into hydrologic modelling (Vrugt and Sadegh, 2013) and implemented it in the DREAM suite for Bayesian inference (Sadegh and Vrugt, 2014). ABC provides a more flexible mechanism to define the nature and number of evaluation criteria, and various kinds of information can be formalized into the summary statistics. The process of defining summary statistics can be subjective (Beven et al., 2008), but it is more transparent, which makes it easier to engage with stakeholders. This provides ample opportunity to integrate expert knowledge in the model evaluation and even to tailor model evaluation to specific predictions, which is receiving increased attention internationally (de Boer-Euser et al., 2017; Gupta et al., 2012).

The proposition underlying ABC is that a parameter set, θ*, should lie in the posterior distribution when the distance between the observed and simulated data is less than a small positive value, ε (Vrugt and Sadegh, 2013). The distance is defined as ρ(Ỹ, Y(θ*)), where Ỹ is the observation vector and Y(θ*) is the simulated data corresponding to the observations. For complex models, such as the 3D transient regional groundwater model used in this study, the chance of meeting the acceptance criteria is very low, which results in a very low acceptance rate during model evaluation. To avoid this difficulty, the distance between summary statistics, ρ(S(Ỹ), S(Y(θ*))), is usually used during inference. A set of summary statistics can be defined to extract different types of information from the observation data (Csilléry et al., 2010) when necessary. Another way to increase the information available to the ABC sampler is to incorporate multiple context-specific state variables and their relationships into a single summary statistic. Only parameter combinations sampled from the priors that satisfy these criteria are accepted into the posterior distributions.

The widely used ABC algorithms can be grouped into three categories based on the sampling technique: the basic rejection algorithm and its variations, the ABC-MCMC algorithm (Sadegh and Vrugt, 2014), and algorithms using sequential Monte Carlo (Turner and Van Zandt, 2012). The sampling procedure starts from a candidate θ* drawn from the priors, and continues until the moments of the posterior distribution stabilize. Regardless of which sampling algorithm is chosen, the process of ABC inference can be described by the following generic algorithm.

Generic algorithm of ABC:


for i = 1 to N do
  while ρ(S(Ỹ), S(Y(θ*))) > ε do
    Sample θ* from the prior, θ* ∼ p(θ)
    Simulate data Y using θ*, Y ∼ Model(θ*)
    Calculate ρ(S(Ỹ), S(Y(θ*)))
  end while
  Accept θ* as θ_i, i.e. set θ_i ← θ*
end for

where ρ(S(Ỹ), S(Y(θ*))) is the distance function between the two summary statistics. The major focus of the present study is to demonstrate the applicability of ABC to emulated complex groundwater models. The basic rejection algorithm was chosen because it is straightforward to understand and easy to implement, despite the availability of more sophisticated algorithms. For a more detailed discussion of ABC algorithms, interested readers are referred to Nott et al. (2012), Vrugt and Sadegh (2013) and Sadegh and Vrugt (2014).
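For illustration, the R sketch below mirrors this rejection step with the trained objective-function emulators standing in for the groundwater model. The function names emulate_head and emulate_csg, the prior table priors (with Min/Max columns) and the thresholds are hypothetical placeholders for the study's actual emulators and acceptance criteria (Section 3.4); it is a minimal sketch, not the authors' implementation.

```r
## Minimal rejection-ABC sketch: candidate parameter sets are drawn uniformly
## from the prior ranges and kept only if both emulated objective functions
## satisfy their acceptance thresholds.
abc_rejection <- function(n_accept, priors, emulate_head, emulate_csg,
                          head_threshold, csg_threshold) {
  accepted <- list()
  while (length(accepted) < n_accept) {
    ## one candidate theta* drawn from the (uniform) priors
    theta <- runif(nrow(priors), min = priors$Min, max = priors$Max)
    ## emulators replace the MODFLOW model (seconds instead of hours per run)
    if (emulate_head(theta) <= head_threshold &&
        emulate_csg(theta) <= csg_threshold) {
      accepted[[length(accepted) + 1]] <- theta   # accept theta* into posterior
    }
  }
  do.call(rbind, accepted)   # one accepted parameter set per row
}
```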

2.4. Framework

The workflow for the Gaussian process emulator-enabled ABC is shown in Fig. 2. It consists of two major modules: developing and training the emulator (Fig. 2a) and the approximate Bayesian inference (Fig. 2b).

1. Problem definition and process-based groundwater model development

The workflow starts with defining the modelling objective. Although the objective can be reviewed and refined throughout the modelling process, it should be clearly stated at the initial stage of a modelling project (Anderson et al., 2015). Once the objective has been defined and the benefits are clearly justified, a groundwater model can be developed. The defined objective needs to be revisited iteratively during the model development process to justify the selection of various parameters, boundary conditions, observations and processes.

2. Define model outputs

GPs capture the functional behavior of a numerical model, i.e., how a model output changes as a function of varying model parameters. Therefore, an essential prerequisite for model emulation is that a finite set of model outputs is defined for which an emulator will be created. These model outputs may correspond to historical observations or be the model predictions for which the model is being developed.

3. Design of experiment

The primary purpose of the design of experiment (DOE) is to efficiently sample the parameter space, within the limited time and computing resources, to provide training data for the emulators. The groundwater model should be evaluated for a wide range of parameter combinations chosen in a systematic and efficient manner. The first stage of this evaluation applies a "stress test" involving a limited number of model runs to evaluate the stability of the model over a wide range of parameters. This test helps to identify minimum and maximum parameter ranges over which the model remains stable, while ensuring that sampling from within this range explores the full range of predictive uncertainty. The exercise also helps identify redundant parameters that do not (or only slightly) affect the model predictions. Based on the outcomes of the stress test, the model parameterisation is further reviewed to either remove the redundant parameters that did not affect the predictions or tie these parameters to other independent parameters.

Once the model passes the stress test, it can be used to generate snapshots for emulator training. Depending on the context, different sampling methods can be used to generate the required parameter combinations. In the current study, 4000 parameter combinations were generated across the entire parameter space of the groundwater model using the maximin Latin Hypercube design (Santner et al., 2003, p. 138).

4. Emulator training

The results of the model runs from the DOE, including the simulated equivalents to the observations and the predictions of interest, were used to train the statistical emulators, which replace the MODFLOW model in the ABC analysis. In the current study, emulators were only trained for the objective functions/summary statistics; the final predictions were generated by the MODFLOW model to avoid additional uncertainty from the emulators. The hyperparameters of the laGPs were fitted by empirical Bayes.

5. ABC inference

In the ABC process, random parameter combinations were generated from the parameter prior distributions. These were evaluated with the emulators trained on the outputs corresponding to historical observations. Only those parameter combinations that met the acceptance criteria were accepted into the posterior parameter distributions. In this study, the key prediction was the maximum drawdown due to CSG development at various locations in the model domain. A set of relevant model evaluation criteria were formalized into objective functions and their acceptance thresholds. The objective functions were calculated for all of the DOE model results, and the corresponding emulators were trained for these objective functions. This set of emulators was subsequently used in a Monte Carlo sampling of the prior parameter distributions.

6. Posterior prediction

The final step in the procedure is to generate the posterior ensembles of predictions. In the current study, prediction ensembles were generated by the MODFLOW model with 200 samples randomly selected from the posterior parameter distribution.

3. Case study

3.1. Study area and groundwater model

The Clarence-Moreton Basin is an intracratonic sedimentary basin covering approximately 38,000 km2 onshore in south-east Queensland and north-east New South Wales (Rassam et al., 2014). It was filled with sedimentary sequences of Middle and Late Triassic to Lower Cretaceous age with a combined thickness of up to approximately 3500 to 4000 m. The climate across the basin falls within the temperate and subtropical climate groupings. The mean annual rainfall varies from 800 to 2716 mm. Rainfall is highest during the warmer months of January to March and lowest during the colder months of July to September (Rassam et al., 2014). “It contains nationally important wetlands, numerous national parks and forest reserves, and sites of international importance for bird conservation. It includes potential habitat for 432 threatened species under Queensland, NSW and Commonwealth legislation” (Rassam et al., 2017).

Although the Clarence-Moreton Basin contains several major river catchments, the current study is focused on the Richmond River catchment, where CSG development plans had been proposed (Raiber et al., 2015) (Fig. 1). The potential CSG project is referred to as the West Casino Gas Project in this study. CSG operations can induce changes in groundwater that depend on varying factors such as hydraulic properties and CSG development schemes. The impact in turn can potentially


Fig. 3. Hydrogeological units in the Clarence-Moreton Basin and the layer configuration in the groundwater model.

reduce the quantity of water that discharges to surface water bodies as base flow, or may cause a decline in water levels in groundwater bores. There are around 4000 groundwater bores in the model domain, more than 90% of which were drilled for irrigation and domestic/stock consumption (McJannet et al., 2015). The groundwater model was developed to assess the potential impact of the West Casino Gas Project on the surrounding groundwater systems, and in particular on the operational groundwater bores in the study area. The model is the first comprehensive model covering the entire Richmond River catchment, based on a newly developed 3D geological model (Cui et al., 2018b; Raiber et al., 2017b). The remainder of this section only briefly describes the model development; Cui et al. (2017) provides more details on the groundwater model.

Seven of the ten hydrostratigraphic units identified in the geological model (Raiber et al., 2017b) were represented with six numerical layers (Fig. 3) in the groundwater model. The upper, unconfined layer contains the major aquifers targeted for agricultural and domestic groundwater extraction, especially the Richmond River Alluvium and the Lamington Volcanics. Layers 2–5 comprise sedimentary bedrock units, separating the unconfined aquifers from the coal seams targeted for CSG production. These coal seams are represented in layer 6. The model domain was discretized into 248 rows and 184 columns, resulting in 33,352 active cells in each of the six layers. Grid spacing varied from 200 m inside the CSG development field to 800 m outside it. The transient model spans a 120-year period from 1983 to 2102, divided into monthly stress-periods, where the first 30 years are used to compare simulated heads and fluxes with observations. A steady-state stress period was added to generate the background context for the transient groundwater model, resulting in 1441 stress-periods in total.

3.2. Boundary conditions and model parameterization

General-head boundaries were implemented on the northern and southern sides of the groundwater model, where the simulated aquifers continue beyond the model extents. A general-head boundary was also applied in parts of the eastern side of the groundwater model where it intersects with the Richmond River alluvium. Other lateral boundaries of the model were assigned as no-flow boundaries.

The long-term average recharge was estimated using chloride mass balance analysis (Crosbie et al., 2015; Raiber et al., 2017a). The point-scale estimates were up-scaled, coupled with Australian Water Resources Assessment (AWRA) recharge outputs (Vaze et al., 2013) and normalized to yield a spatially explicit time series of recharge, which was implemented using the MODFLOW Recharge package and varied via multipliers during the uncertainty analyses. Perennial reaches of the Richmond River network were explicitly simulated using the MODFLOW River package by imposing stage-height time series derived from river-gauging sites. Estimates for the riverbed hydraulic conductivity and thickness were informed by the riverbed topography. Initial riverbed conductance was calculated using these two parameters along with river reach width and length. The initial values were adjusted with the aid of interpolation points during the uncertainty analyses.

The CSG extraction wells in the coal seams of layer 6 were simulated using the MODFLOW Drain package, with a variable conductance that aims at matching the independently evaluated volumes of extracted


water and achieves the target pressure head. The proposed West Casino Gas Project consists of two potential gas fields (one to the south and one to the west of Casino) (Fig. 1). There were 90 wells in the proposed development plan for the gas field west of Casino when the study was conducted, which were to be drilled into the Walloon Coal Measures over a period of 20 years starting in 2018. The gas field south of Casino was assumed to be a small project with five wells. Other existing groundwater bores that were not drilled for CSG extraction were simulated using the MODFLOW Multiple Node Well package.

3.3. Model parameterization

Depending on data availability, two different approaches were adopted to assign horizontal hydraulic conductivities (Kx) in the groundwater model. The Kx values for the top layer were interpolated from available Kx values from 130 existing pumping test records, using the TGUESS approach (Bradbury and Rothschild, 1985). The Kx values for the deeper layers were computed from fitted Kx-depth relationships using mainly drill stem test (DST) data, as well as a few pumping tests. Due to the lack of sedimentary bedrock aquifer data in the Clarence-Moreton Basin, relevant data were sourced from the linked Surat Basin, where hydrogeological units are expected to show similar hydraulic properties to their counterparts in the Clarence-Moreton Basin.

Porosity/permeability-depth relationships have been extensively studied within the petroleum industry (e.g., Magara, 1980; Nelson, 1994). Many studies have shown a linear or exponential decline in porosity down to depths of about 3 to 4 km, depending on rock composition and geological setting. As hydraulic conductivity is generally correlated with porosity, a similar trend is deemed to exist between hydraulic conductivity and depth. The following log(Kx)-depth linear relationship was implemented for layers 2, 3 and 4:

log(Kx) = −0.0009 h + m,   (7)

where h is depth, and m is the value of log(Kx) close to the ground surface. For layers 5 and 6, the following exponential decay relationship was implemented:

log(Kx) = c (1 − b^(−3h/a)) + d,   (8)

where a, b, c and d are four coefficients that can be adjusted during modelling. The adjustment of the parameters in the two log(Kx)-depth relationships is restricted by the boundary lines, that is, the log(Kx)-depth lines are not allowed to move outside the boundary lines in Fig. 4.

The "stress test" of the groundwater model revealed that many of the 63 parameters chosen for the initial parameterization of the model hardly affect the prediction of interest, and that a parameterization scheme with 38 independent parameters is sufficient to explore the water table change due to the CSG development. Table 2 lists the 38 parameters that were independently varied in the uncertainty analysis and how they relate to the hydraulic properties, stresses and boundary conditions of the groundwater model. The 'Note' column provides a short description of each parameter, and the 'Minimum' and 'Maximum' columns provide the range over which the parameter is sampled. The chosen parameter ranges are based on expert knowledge, the local hydrogeological setting and the results of a series of model stability tests. The parameter ranges were sampled uniformly in the DOE to ensure sufficient coverage of the parameter space for emulator training.

3.4. Observations and objective functions

For the current study, two types of observations were used to constrain the parameters: groundwater levels and independently forecast CSG water production rates from the CSG industry. The total CSG water production is a function of the integrated groundwater dynamics over tens of years in the prediction period of the model. It was expected that the water production would provide some constraint on the hydraulic parameters of the deep layers, especially the targeted coal seam formation. Groundwater level observations, on the other hand, are mainly available in the unconfined part of the model, and their dynamics are dominated by shallow processes such as recharge and SW-GW interaction. Two objective functions and their thresholds were defined based on the available observations:

1. The total simulated CSG water production is less than 1.2 GL over the simulation period, which is twice the value previously estimated by the CSG industry. The total CSG water production was computed for all successful runs of the DOE; these values were used to create a CSG water production emulator.
2. The absolute weighted difference between observed and simulated groundwater levels is less than 2 m, where observations close to the development location receive greater weight and observations close to rivers receive less weight. A total of 900 head records from 43 bores were selected as the head observations (Fig. 1).

The weighting of individual head observations depends on the distance between the observation and the prediction and the distance between the observation and the river boundary, calculated as follows:

O_h = Σ_{j=1..m} r_j f_w(d_j) (1/n_j) Σ_{i=1..n_j} (h_{o,i} − h_{s,i})^2,   (9)

with O_h the head objective function, m the number of observation bores, n_j the number of observations at one specific observation bore j, r_j the distance of observation bore j to the nearest watercourse line network, h_{o,i} the head observation and h_{s,i} the simulated equivalent. f_w(d_j) is a distance weighting function defined as

f_w(d_j) = 1 − tanh(d_j / D),   (10)

where d_j is the distance (in km) between observation bore j and the central location of the development field. The coefficient D controls the distance at which the weight of observation bore j drops to zero. To account for transient observations, the weights are divided by n_j, the number of observations at the observation location. The tanh function allows the weight of an observation to decrease almost linearly with distance and to gradually become zero at a distance of approximately 3D. In the current study, the value of D is set equal to 7 km, which means that an observation has a weight of zero if the distance between the prediction and the observation is more than 21 km. The threshold T_h, the upper limit of the head objective function, is defined as

T_h = Σ_{j=1..m} r_j f_w(d_j) (1/n_j) Σ_{i=1..n_j} (2)^2.   (11)

This means that any parameter combination that results in an average difference between observed and simulated groundwater levels less than or equal to 2 m is deemed acceptable.
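As an illustration of Eqs. (9)–(11), the R sketch below computes the head objective function and its acceptance threshold from a hypothetical observation table; the object name obs and its column names (bore, r, d, h_obs, h_sim) are placeholders for this sketch, not the study's actual data structures.

```r
## Sketch of the head objective function O_h (Eq. (9)) and threshold T_h
## (Eq. (11)), assuming a data.frame 'obs' with one row per head record and
## hypothetical columns: bore (bore id), r (distance to nearest watercourse),
## d (distance to the development field, km), h_obs and h_sim (observed and
## simulated heads, m). D = 7 km as in the text.
head_objective <- function(obs, D = 7) {
  w       <- 1 - tanh(obs$d / D)                       # Eq. (10), per record
  res2    <- (obs$h_obs - obs$h_sim)^2                 # squared head residuals
  by_bore <- function(x) tapply(x, obs$bore, function(v) v[1])
  mean_res2 <- tapply(res2, obs$bore, mean)            # (1/n_j) * sum over i
  sum(by_bore(obs$r) * by_bore(w) * mean_res2)         # O_h
}

head_threshold <- function(obs, D = 7) {
  ## Eq. (11): the same weighting applied to a uniform 2 m misfit, (2 m)^2 = 4
  by_bore <- function(x) tapply(x, obs$bore, function(v) v[1])
  sum(by_bore(obs$r) * by_bore(1 - tanh(obs$d / D)) * 4)
}
```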


Fig. 4. Hydraulic conductivity versus depth for (a) BMO Group, Gubberamunda Sandstone, Westbourne Formation, Springbok Sandstone, and (b) Walloon Coal Measures within the Surat and Clarence-Moreton basins. Boundary lines of the log(Kx)-depth relationship are highlighted using the dark blue lines. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4. Results

4.1. Emulator performance

Within the operational constraints (time and computational resources) of the study, 4000 parameter sets were uniformly sampled from the parameter ranges using the above-mentioned maximin Latin Hypercube method. The two objective functions were calculated for all the models in the DOE. These DOE objective function values and their corresponding parameter values formed the training dataset for the two objective function laGP emulators.

For emulators to be applicable in Bayesian inference, they need to be sufficiently accurate to ensure that the added uncertainty of replacing the original model with a surrogate is acceptably small. The accuracy of an emulator, i.e., the degree to which it can reproduce the relationship between parameters and predictions, depends greatly on the density of sampling of the parameter space (Liu, 2005). A cross-validation was conducted by partitioning the DOE samples into a training set and a test set (250 samples). Fig. 5 provides a comparison between the modelled objective functions and the emulated ones based on the 250 test samples. For both objective functions, the absolute values of the test errors depend on the magnitude of the objective functions. The test errors of the head objective function (Fig. 5a) are more variable than those of the water production objective function (Fig. 5b), particularly for lower values. The maximum test errors are around 30% of the simulated objective functions, meaning that an uncertainty of up to 30% of the objective function value can be introduced by the emulation process. Although it is a subjective decision, the authors argue that, by using an emulator with this accuracy in the ABC analysis, the risk of wrongly accepting or rejecting a parameter combination is acceptable for regional groundwater modelling.
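The hold-out check described above can be reproduced along the following lines; X, y and test_idx are hypothetical stand-ins for the DOE parameter matrix, the corresponding objective-function values and the indices of the 250 test runs (any output transform is omitted for brevity).

```r
## Sketch of the emulator accuracy check in Section 4.1: fit the laGP emulator
## on the DOE runs excluding 'test_idx' and compare its predictions with the
## modelled objective-function values on the held-out runs.
library(laGP)

emulator_test_error <- function(X, y, test_idx) {
  fit <- aGP(X = X[-test_idx, ], Z = y[-test_idx], XX = X[test_idx, ],
             start = 6, end = 50, method = "alc")
  err <- fit$mean - y[test_idx]                # emulated minus modelled values
  max(abs(err) / abs(y[test_idx]))             # worst-case relative error
}
```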

4.2. Posterior parameter distribution

A posterior parameter distribution was generated through the generic rejection sampling algorithm under the ABC framework, as described in the method section. During this Monte Carlo simulation, random samples from the prior parameter distribution were generated, and only the samples that meet both acceptance criteria were accepted as part of the posterior distribution. This process was repeated until a predefined number of samples (in this case 10,000) had been accepted. The green histograms in Fig. 6 show the resulting posterior parameter distributions based on the two emulators of the objective functions.

Some of the parameters are not constrained significantly by the objective functions, and the posterior distributions for these are almost identical to the prior distributions.


Table 2
The 38 adjustable parameters during uncertainty analysis and their corresponding ranges.

Group | Parameter name | Minimum | Maximum | Note
General-head boundary | GHB_N | 0.01 | 1000 | Multiplier for conductance of the north general-head boundary
General-head boundary | GHB_S | 0.001 | 10,000 | Multiplier for conductance of the south general-head boundary
General-head boundary | GHB_E | 0.1 | 1,000,000 | Multiplier for conductance of the east general-head boundary
Recharge package | RCH_1 | 0.2 | 2 | Multiplier for alluvium recharge zone
Recharge package | RCH_2 | 0.3 | 2 | Multiplier for recharge zone 2
Recharge package | RCH_3 | 0.2 | 5 | Multiplier for recharge zone 3
Recharge package | RCH_4 | 0.2 | 5 | Multiplier for recharge zone 4
Recharge package | RCH_5 | 0.2 | 5 | Multiplier for recharge zone 5
Recharge package | RCH_6 | 0.3 | 5 | Multiplier for recharge zone 6
Recharge package | RCH_7 | 0.2 | 5 | Multiplier for recharge zone 7 (Walloon Coal Measures)
Recharge package | RCH_8 | 0.2 | 5 | Multiplier for recharge zone 8 (Walloon Coal Measures)
Recharge package | RCH_9 | 0.5 | 5 | Multiplier for recharge zone 9 (Walloon Coal Measures)
Drain package | DRN_C_1 | 0.01 | 5 | Multiplier for conductance of drain cells for CSG wells
Drain package | DRN_C_2 | 0.2 | 5 | Multiplier for conductance of drain cells in layer 1
River package | Rinc | –0.5 | 0.5 | Parameter to stochastically change river stages (m)
River package | rp1 | 0.1 | 1000 | Riverbed conductance at pilot point 1 (m2/d)
Hydraulic conductivity | Kx_1_a | 0.1 | 2 | Multiplier for initial Kx of the alluvial units in layer 1
Hydraulic conductivity | Kx_1_o | 0.02 | 5 | Multiplier for initial Kx of the non-alluvial units in layer 1
Hydraulic conductivity | Kx_2 | –2.2 | 1.5 | X-intercept of log(Kx)-depth relationship for layer 2
Hydraulic conductivity | Kx_3_d | –3 | 0.5 | X-intercept of log(Kx)-depth relationship for layer 3
Hydraulic conductivity | Kx_4 | –2.2 | 1.5 | Intercept of log(Kx)-depth relationship for layer 4
Hydraulic conductivity | Kx_5_c | –4.5 | –3.5 | Parameter c in equation P2
Hydraulic conductivity | Kx_5_d | –1 | 0.5 | Parameter d in equation P2
Hydraulic conductivity | Kx_6_c | –4.5 | –3.5 | Parameter c in equation P2
Hydraulic conductivity | Kx_6_d | –1 | 0.5 | Parameter d in equation P2
Hydraulic conductivity | Kx_Kz_1 | 0.01 | 100 | Multiplier for Kx/Kz ratio of layer 1
Hydraulic conductivity | Kx_Kz_2 | 0.01 | 100 | Multiplier for Kx/Kz ratio of layer 2
Hydraulic conductivity | Kx_Kz_3 | 0.01 | 100 | Multiplier for Kx/Kz ratio of layer 3
Hydraulic conductivity | Kx_Kz_4 | 0.01 | 100 | Multiplier for Kx/Kz ratio of layer 4
Hydraulic conductivity | Kx_Kz_5 | 0.01 | 100 | Multiplier for Kx/Kz ratio of layer 5
Hydraulic conductivity | Kx_Kz_6 | 0.01 | 100 | Multiplier for Kx/Kz ratio of layer 6
Storativity | Sy_1 | 0.5 | 1.5 | Multiplier for specific yield of layer 1
Storativity | Ss_1 | 0.01 | 100 | Multiplier for specific storage of layer 1
Storativity | Ss_2 | 0.01 | 100 | Multiplier for specific storage of layer 2
Storativity | Ss_3 | 0.01 | 100 | Multiplier for specific storage of layer 3
Storativity | Ss_4 | 0.01 | 100 | Multiplier for specific storage of layer 4
Storativity | Ss_5 | 0.01 | 100 | Multiplier for specific storage of layer 5
Storativity | Ss_6 | 0.01 | 100 | Multiplier for specific storage of layer 6
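To make the prior sampling step concrete, the sketch below generates a maximin Latin Hypercube design on the unit hypercube and rescales it to parameter ranges like those in Table 2 above; the data frame ranges (with columns Parameter, Minimum, Maximum) is a hypothetical container for that table, not an object from the study.

```r
## Sketch of the DOE generation: maximin LHS samples over the 38 parameter
## ranges of Table 2, using the lhs package referenced in Section 2.2.
library(lhs)

generate_doe <- function(ranges, n_runs = 4000) {
  unit <- maximinLHS(n_runs, nrow(ranges))            # samples in [0, 1]
  doe  <- sweep(unit, 2, ranges$Maximum - ranges$Minimum, `*`)
  doe  <- sweep(doe, 2, ranges$Minimum, `+`)          # rescale to [Min, Max]
  colnames(doe) <- ranges$Parameter                   # one column per parameter
  doe
}
```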

The parameters for which the posterior distributions are significantly different from the prior distributions are the recharge multiplier of recharge zone 1 (RCH_1), the multiplier for the drain conductance of the CSG wells (DRN_C_1), the multiplier for the conductance of drain cells in the top layer (DRN_C_2), and the hydraulic conductivity parameters for layers 2, 3, 4 and 6 (Kx_2, Kx_3_d, Kx_4, Kx_6_c and Kx_6_d). Parameters related to the hydraulic properties of layer 5, the aquitard between the shallow aquifers and the targeted coal seam formation, are only slightly constrained. However, these parameters, particularly the vertical/horizontal hydraulic conductivity ratio, are expected to have a strong impact on the drawdown prediction in the shallow aquifer. For most of the well-constrained parameters (except for Kx_2 and Kx_4), the posterior distributions are skewed towards the upper or lower ends of the parameter ranges. This differs from the assumption of normally distributed parameters widely used in formal Bayesian analyses.

A bivariate plot (Fig. 7) was also compiled to reveal the correlations between a set of 12 selected parameters. These 12 parameters were selected because they are constrained to some extent by the two objective functions. The drain conductance is positively correlated with the recharge multipliers, particularly RCH_1 (alluvium): higher recharge can be counterbalanced by a higher drain conductance in the top layer. The

Fig. 5. Comparison between the modelled and emulated values of (a) the head objective function and (b) the CSG water production objective function. The graph is based on the results from 250 test parameter sets.


Fig. 6. Histograms of the posterior parameter distributions of the groundwater model. Table 2 provides a detailed description of the parameters.

drain conductance of the CSG well cells needs to be lower when the hydraulic conductivity in layer 6 is high in order to meet the water production criterion. When the hydraulic conductivity parameter of layer 2 (Kx_2) is close to its lower end, the hydraulic conductivity parameter of layer 3 (Kx_3_d) cannot be close to its upper end. The hydraulic properties of layers 3 and 4 are positively correlated.

4.3. Posterior prediction

Although various types of prediction can be made with the


Fig. 7. Bivariate plots of the ABC-derived posterior samples for 12 selected parameters among the 38 parameters adjusted during the ABC analysis.

developed groundwater model, in the current paper we focus only on the additional drawdown due to the simulated CSG development, evaluated using the posterior parameter distribution generated in the previous step. The predictive posterior distribution was generated by running the MODFLOW model with 200 parameter sets from the posterior parameter distribution.

Figs. 8 and 9 provide two different visualizations of the predictive posterior distributions. Fig. 8 shows the probability of drawdown exceeding 0.2 m for predictions in layer 1, while Fig. 9 shows the 95th percentile of these distributions. The 0.2 m threshold corresponds to an impact threshold in the NSW aquifer interference policy (NSW DPI, 2012) for impact on high-priority groundwater-dependent ecosystems. The results show that there are only two noticeable areas where it is possible to have CSG drawdown exceeding 0.2 m, although the probability is everywhere below 5% (Fig. 8). One area is the central part of the proposed CSG development field, surrounded by alluvium. The other is a north-south strip along the border between the Kangaroo Creek Sandstone and the MacLean Sandstone aquitard. The low permeability of the MacLean Sandstone limits the spreading of the drawdown impact, resulting in an accumulated water level change along the border. The highest probability of exceeding 0.2 m drawdown recorded in the top layer is 4.8%.

The 95th percentiles of additional drawdown are generally higher in the central part of the development field, with a maximum of around 0.15 m (Fig. 9). The impact gradually becomes weaker with increasing distance from the proposed CSG development field. As the probability map also suggests, another high-impact zone occurs along the border between the Kangaroo Creek Sandstone and the MacLean Sandstone about 5–7 km west of the development area. The impact extends more to the west, as the targeted coal seam unit deepens towards the east. The map also shows the mitigating effect of the major rivers on the additional drawdown (Fig. 9).

Fig. 10 shows the time series of the additional drawdown due to CSG development over the simulated period at two representative locations (node_324 and node_1291, from the center and the boundary of the CSG development field, respectively) (Fig. 1). The two locations represent two domestic bores in layer 1 and layer 3, respectively. Although the CSG bores were activated following the same scheme from 2018 to 2036, the drawdown evolution shows different characteristics among different parameter sets. For some parameter sets, the peak impact was observed within the simulation period from 1983 to 2102; for other posterior realizations, the drawdown impact keeps increasing until the end of the simulation, and a longer modelling time would be required to investigate the peak drawdown impact. In summary, given the current parameterization and assumptions in the groundwater model, it is very likely that the maximum additional drawdown is less than the 0.2 m threshold at the two locations.


Fig. 8. Probability of exceeding 0.2 m additional drawdown due to the CSG development in layer 1. The impact zone west of the development field is along the border between the Kangaroo Creek Sandstone (layer 4) and the MacLean Sandstone aquitard (layer 5). The contours are based on 200 samples randomly drawn from the posterior parameter distribution.

5. Discussion

5.1. How many samples are enough for the training dataset?

The performance of emulators is largely determined by the density of sampling of the parameter space and the complexity of the response surfaces of the predictions of interest. For the groundwater model in this study, with tens of parameters (38 adjusted parameters) and runtimes of hours (2–8 h), the laGPs can capture the major input-output relationships of the MODFLOW model based on a DOE of 4000 samples; the ratio between samples and the number of parameters is around 100. The emulator performance was validated by comparing the emulated outputs with the modelled outputs for the head and CSG water production objective functions. However, caution is needed in assuming that this ratio can be directly applied to other groundwater models. Emulator performance is model-specific and prediction-specific. For models with more parameters and more complicated input-output relationships, a denser sampling is required. When this occurs after performance validation, adaptive sampling schemes (Gorissen et al., 2010; Shields et al., 2015) can be considered, in which samples are incrementally added to the DOE until a predefined or optimal emulator accuracy is achieved, as well as dimension reduction techniques.

A computational drawback of building a GP emulator is that the evaluation of the predictive distribution requires inverting, and computing the determinant of, an n by n covariance matrix (operations whose complexity is O(n3)). In other words, for large values of n, constructing a GP emulator may become computationally prohibitive. In the context of model emulation, n corresponds to the number of model runs that were performed and used to train the emulator. As the number of parameters in the process-based model increases, the number of model runs required to support emulator training grows dramatically, although laGPs have the potential to overcome this. As pointed out in the reviews by Razavi et al. (2012) and Asher et al. (2015), other types of emulators may be better suited to problems with very high parameter dimensions, such as polynomial chaos surrogates using adaptive sparse grids and projection-based emulators.

Fig. 9. 95th percentile of the possible maximum drawdown due to CSG development between 1983 and 2102 at the regional water table. The contours are based on 200 random samples from the posterior parameter distribution.

5.2. Other beneficial usage of the DOE

The sampling scheme for the DOE results in an ensemble of model runs used to train the emulators. This ensemble, however, has great potential to provide additional insight into the model and to identify structural or conceptual model issues. For instance, the same dataset can be used for a formal global sensitivity analysis, such as the delta moment-independent sensitivity analysis outlined by Plischke et al. (2013). Peeters et al. (2018) show how this kind of analysis can be used to identify both the parameters that can be constrained by the observations and the parameters that the predictions are sensitive to. This is a critical step in a data-worth analysis to identify the observations with the most potential to reduce predictive uncertainty. The model runs can also be used to compute the fraction of runs that meet the selected acceptance criteria, which helps identify deficits in the model conceptualization and parameterization.
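As a simple illustration of reusing the DOE ensemble, the sketch below computes the fraction of training runs that meet hypothetical acceptance thresholds and applies a crude distribution-comparison screen (two-sample Kolmogorov–Smirnov distances) to flag parameters that the acceptance criteria constrain. This is a simpler stand-in for, not an implementation of, the delta moment-independent measure of Plischke et al. (2013); the file names and thresholds are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical DOE outputs: parameter matrix and two objective-function values
# (head misfit and CSG water-production misfit) for every training run.
params = np.load("doe_parameters.npy")        # shape (n_runs, n_par)
head_obj = np.load("doe_head_objective.npy")  # shape (n_runs,)
csg_obj = np.load("doe_csg_objective.npy")    # shape (n_runs,)

# Acceptance criteria (illustrative thresholds, not those used in the study).
accepted = (head_obj < 1.5) & (csg_obj < 0.3)
print(f"fraction of runs meeting the acceptance criteria: {accepted.mean():.2f}")

# Crude screen: parameters whose distribution among the accepted runs departs
# most from the full design are the ones the acceptance criteria constrain.
for j in range(params.shape[1]):
    stat, _ = ks_2samp(params[accepted, j], params[:, j])
    print(f"parameter {j:2d}: KS distance = {stat:.2f}")
```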

5.3. Applications of emulators in groundwater management

In the current study, emulators were trained to predict the change in hydrological variables resulting from CSG development. The impact is calculated from a model batch that includes a baseline model and a model with CSG development; the two models are identical except for the CSG development scheme. Further advantages of emulators can be gained when different types of models need to be coupled in a modeling framework that requires short runtimes and/or a simplified model interface (Gorelick and Zheng, 2015). This may be the case, for example, where the output from a groundwater model needs to be coupled with an ecological model to study the impact of new water management policies on ecosystems. Emulators can be trained to capture the relationship between the inputs of the two models and the final predictions of interest directly, and decision makers can then experiment with the simpler emulators rather than the process-based groundwater and ecological models, which require more skill and effort. Emulators can also be used in agent-based water resource management platforms to explore the behavioral interactions between different entities, such as stakeholders, the environment, and government (Castilla-Rho et al., 2015).

Fig. 10. Temporal evolution of the drawdown due to CSG development at two representative model locations: (a) node_324, the central point of the simulated CSG field, and (b) node_1291, representing a domestic bore screened in layer 3 on the northern boundary of the CSG field. The two locations are shown in Fig. 1.

5.4. Comments on ABC

The use of GP emulators with short runtimes made the simple rejection sampling scheme computationally tractable and allowed the generation of a posterior parameter distribution of 10,000 samples within hours. More sophisticated sampling methods, such as population Monte Carlo (Turner and Van Zandt, 2012) and Markov chain Monte Carlo (MCMC) as implemented in DREAM(ABC) (Sadegh and Vrugt, 2014), are needed, however, if the ABC scheme is to be used without emulators or with computationally more demanding surrogate approaches, such as the multi-fidelity surrogates used by Doherty and Christensen (2011) and Burrows and Doherty (2015). The ABC methodology has theoretical drawbacks similar to those of the GLUE methodology (Mantovan and Todini, 2006; Stedinger et al., 2008; Vrugt et al., 2009): the choice of acceptance criteria is subjective and, as a result, the method does not strictly adhere to probabilistic Bayesian theory. A great advantage, however, is that the explicit formulation of evaluation criteria and thresholds increases transparency. It becomes much easier to formalize what modelers, experts and stakeholders consider to be a ’good’ model, and in turn model confidence is increased because the model evaluation criteria can be unambiguously and transparently documented.
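For reference, a minimal rejection-sampling ABC loop of the kind made tractable by fast emulators is sketched below. The emulator is replaced by a cheap placeholder function and the acceptance tolerances are illustrative only; the study's actual emulators, objective functions and thresholds are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
n_par, n_draws = 8, 200_000   # number of prior draws is an assumption

def emulator_objectives(theta):
    """Placeholder for the trained emulators: return the head and CSG
    water-production objective-function values for one parameter set."""
    head = abs(np.sin(theta[:3]).sum()) + 0.1 * theta[3:].sum()
    csg = abs(theta[0] - 0.4) + 0.05 * theta[4]
    return head, csg

# Illustrative acceptance tolerances (not the thresholds used in the study).
HEAD_TOL, CSG_TOL = 1.0, 0.5

# Rejection-sampling ABC: draw from the prior and keep parameter sets whose
# emulated objective functions fall within the tolerances.
prior_draws = rng.uniform(0.0, 1.0, size=(n_draws, n_par))
posterior = []
for theta in prior_draws:
    head, csg = emulator_objectives(theta)
    if head < HEAD_TOL and csg < CSG_TOL:
        posterior.append(theta)
posterior = np.array(posterior)

print(f"accepted {len(posterior)} of {n_draws} prior draws "
      f"({100.0 * len(posterior) / n_draws:.1f}%)")
```

With emulator evaluations taking fractions of a second, screening a large number of prior draws to retain an accepted sample of this size becomes feasible within hours, consistent with the runtime reported above.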

6. Conclusions

To conduct a probabilistic assessment of the potential impact of coal seam gas depressurization in the Clarence-Moreton Basin on the relevant groundwater systems, emulator-assisted Approximate Bayesian Computation (ABC) was applied. The major conclusions that can be drawn from this study are:

• Given the current model conceptualization and simulated development scheme, the water table change due to the West Casino Gas Project in the Richmond River catchment is unlikely to exceed 0.2 m.
• The proposed strategy of using computationally efficient emulator/surrogate models to calculate objective functions during model constraining allows global calibration and uncertainty analysis to be applied to computationally demanding groundwater models. The runtime of the MODFLOW model varies between 2 h and 8 h depending on the parameter values, with an average of about 4 h excluding post-processing, whereas the laGP emulators run in less than a minute.
• ABC not only relaxes the need to explicitly define a likelihood function, it also provides flexibility in the choice of objective functions and a diagnostic ability with respect to model structural deficiencies. ABC is straightforward to apply, and its use in practical groundwater modeling should be encouraged.

Although the application of emulators and Bayesian inference in practical groundwater modeling is still at a preliminary stage, they have shown great potential. More research is required to (1) compare the effectiveness and efficiency of different model-emulating algorithms, (2) develop new emulating algorithms, particularly for highly parameterized models (hundreds to thousands of parameters), and (3) develop well-documented and user-friendly software packages to further lower the barrier to applying emulator-enabled Bayesian inference.

Acknowledgments

This research is part of the Bioregional Assessment Programme, which is funded by the Australian Government Department of the Environment and Energy. The Bioregional Assessment Programme is a transparent and accessible program of baseline assessments that increase the available science for decision making associated with the impacts of coal seam gas and coal mining development on water resources and water-dependent assets. Bioregional assessments are being undertaken in collaboration between the Department of the Environment and Energy, the Bureau of Meteorology, CSIRO and Geoscience Australia. For more information: www.bioregionalassessments.gov.au. We also acknowledge funding from CSIRO Land and Water through the strategic appropriation research project “Next generation methods and capability for multiscale cumulative impact and management”. We are grateful to Lei Gao and Warrick Dawes for their constructive reviews, and to David Post and Anthony Swirepik for their kind support.


References

Inference for Physical Parameters in Nonlinear Mechanistic Models. J. Agric. Biol. Environ. Stat. 16, 475–494. https://doi.org/10.1007/s13253-011-0073-7. James, S.C., Doherty, J., Eddebbarh, A.-A., 2009. Practical postcalibration uncertainty analysis: Yucca Mountain, Nevada. Ground Water 47, 851–869. https://doi.org/10. 1111/j.1745-6584.2009.00626.x. Keating, E.H., Doherty, J., Vrugt, J.A., Kang, Q., 2010. Optimization and uncertainty assessment of strongly nonlinear groundwater models with high parameter dimensionality. Water Resour. Res. 46, 1–18. https://doi.org/10.1029/ 2009WR008584. Kennedy, M.C., O’Hagan, A., 2001. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B 63, 425–464. https://doi.org/10.1111/1467-9868.00294. Kleijnen, J.P.C., 2009. Kriging metamodeling in simulation: A review. Eur. J. Oper. Res. 192, 707–716. https://doi.org/10.1016/j.ejor.2007.10.013. Kourakos, G., Mantoglou, A., 2009. Pumping optimization of coastal aquifers based on evolutionary algorithms and surrogate modular neural network models. Adv. Water Resour. 32, 507–521. https://doi.org/10.1016/j.advwatres.2009.01.001. Laloy, E., Rogiers, B., Vrugt, J.A., Mallants, D., Jacques, D., 2013. Efficient posterior exploration of a high-dimensional groundwater model from two-stage Markov chain Monte Carlo simulation and polynomial chaos expansion. Water Resour. Res. 49 (5), 2664–2682. https://doi.org/10.1002/wrcr.20226. Leeds, W.B., Wikle, C.K., Fiechter, J., 2014. Emulator-assisted reduced-rank ecological data assimilation for nonlinear multivariate dynamical spatio-temporal processes. Stat. Methodol. 17, 126–138. https://doi.org/10.1016/j.stamet.2012.11.004. Liu, F., West, M., 2009. A Dynamic Modelling Strategy for Bayesian Computer Model. Bayesian Anal. 1–21. https://doi.org/10.1214/09-BA415. Liu, L., 2005. Could Enough Samples be more Important than Better Designs for Computer Experiments? In: 38th Annual Simulation Symposium. IEEE, pp. 107–115. doi:10. 1109/ANSS.2005.17. Magara, K., 1980. Comparison of porosity-depth relationships of shale and sandstone. J. Pet. Geol. 3, 175–185. https://doi.org/10.1306/BF9AB5C4-0EB6-11D78643000102C1865D. Maier, H.R., Kapelan, Z., Kasprzyk, J., Kollat, J., Matott, L.S., Cunha, M.C., Dandy, G.C., Gibbs, M.S., Keedwell, E., Marchi, A., Ostfeld, A., Savic, D., Solomatine, D.P., Vrugt, J.A., Zecchin, A.C., Minsker, B.S., Barbour, E.J., Kuczera, G., Pasha, F., Castelletti, A., Giuliani, M., Reed, P.M., 2014. Evolutionary algorithms and other metaheuristics in water resources: current status, research challenges and future directions. Environ. Model. Softw. 62, 271–299. https://doi.org/10.1016/j.envsoft.2014.09.013. Mantovan, P., Todini, E., 2006. Hydrological forecasting uncertainty assessment: incoherence of the GLUE methodology. J. Hydrol. 330, 368–381. https://doi.org/10. 1016/j.jhydrol.2006.04.046. McJannet, D., Raiber, M., Gilfedder, M., Cui, T., Marvanek, S., Rassam, D., 2015. Current water accounts and water quality for the Clarence-Moreton bioregion. Product 1.5 from the Clarence-Moreton Bioregional Assessment. Department of the Environment, Bureau of Meteorology, CSIRO and Geoscience Australia, Australia. Nelson, P.H., 1994. Permeability-porosity relationships in sedimentary rocks. Log Anal. 35, 38–62. Niswonger, R.G., Panday, S., Motomu, I., 2011. MODFLOW-NWT, A Newton Formulation for MODFLOW-2005, U.S. Geological Survey Techniques and Methods 6–A37. U.S. Geological Survey. Nott, D.J., Marshall, L., Brown, J., 2012. 
Generalized likelihood uncertainty estimation (GLUE) and approximate Bayesian computation: What’s the connection? Water Resour. Res. 48, 1–7. https://doi.org/10.1029/2011WR011128. NSW DPI, 2012. NSW aquifer interference policy: NSW Government policy for the licensing and assessment of aquifer interference activities. NSW Department of Primary Industries. O’Hagan, A., 2006. Bayesian analysis of computer code outputs: a tutorial. Reliab. Eng. Syst. Saf. 91, 1290–1300. https://doi.org/10.1016/j.ress.2005.11.025. Pappenberger, F., Beven, K.J., 2006. Ignorance is bliss: Or seven reasons not to use uncertainty analysis. Water Resour. Res. 42, 1–8. https://doi.org/10.1029/ 2005WR004820. Peeters, L.J.M., Pagendam, D., Crosbie, R.S., Viney, N.R., Rachakonda, P.K., Dawes, W.R., Gao, L.S.M., Zhang, Y.Q., McVicar, T.R., 2018. Determining the initial spatial extent of an environmental impact assessment with a probabilistic screening methodology. Submitt. Environ. Model Softw. Plischke, E., Borgonovo, E., Smith, C.L., 2013. Global sensitivity measures from given data. Eur. J. Oper. Res. 226, 536–550. https://doi.org/10.1016/j.ejor.2012.11.047. Raiber, M., Cui, T., Pagendam, D., Rassam, D., Gilfedder, M., Crosbie, R., Marvanek, S., Hartcher, M. (Eds.), 2017. Observations analysis, statistical analysis and interpolation for the Clarence-Moreton bioregion. Product 2.1-2.2 from the Clarence-Moreton Bioregional Assessment. Department of the Environment, Bureau of Meteorology. Geoscience Australia, Australia. Raiber, M., Murray, J., Bruce, C., Rassam, D., Ebner, B., Henderson, B., O’Grady, T., Gilfedder, M., Cui, T., 2017b. Conceptual modelling for the Clarence-Moreton bioregion. Product 2.3 from the Clarence-Moreton Bioregional Assessment. Department of the Environment, Bureau of Meteorology. Geoscience Australia, Australia. Raiber, M., Rassam, D., Cui, T., Pagendam, D., Janardhanan, S., 2015a. Development of a 3D geological model of the Clarence-Moreton Basin: on the challenge of integrating petroleum systems and groundwater systems approaches. APPEA J. 55, 464. https:// doi.org/10.1071/AJ14099. Raiber, M., Rassam, D., Hartcher, M., 2015. Coal and coal seam gas resource assessment for the Clarence-Moreton bioregion, Product 1.2 from the Clarence-Moreton Bioregional Assessment. Department of the Environment and Energy, Bureau of Meteorology, CSIRO and Geoscience Australia, Australia. Rajabi, M.M., Ketabchi, H., 2017. Uncertainty-based simulation-optimization using Gaussian process emulation: Application to coastal groundwater management. J. Hydrol. 555, 518–534. https://doi.org/10.1016/j.jhydrol.2017.10.041.

Anderson, M.P., Woessner, W.W., Hunt, R.J., 2015. Applied groundwater modeling: simulation of flow and advective transport, 2nd ed. Elsevier, San Diego, United States. Asher, M.J., Croke, B.F.W., Jakeman, A.J., Peeters, L.J.M., 2015. A review of surrogate models and their application to groundwater modeling. Water Resour. Res. 51, 5957–5973. https://doi.org/10.1002/2015WR016967. Bastos, L.S., O’Hagan, A., 2009. Diagnostics for Gaussian Process Emulators. Technometrics 51, 425–438. https://doi.org/10.1198/TECH.2009.08019. Beaumont, M.A., Zhang, W., Balding, D.J., 2002. Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035. Beven, K., 2008. Environmental Modelling: An Uncertain Future? CRC Press. Beven, K., 2007. Towards integrated environmental models of everywhere: uncertainty, data and modelling as a learning process. Hydrol. Earth Syst. Sci. 11, 460–467. https://doi.org/10.5194/hess-11-460-2007. Beven, K.J., Smith, P.J., Freer, J.E., 2008. So just why would a modeller choose to be incoherent? J. Hydrol. 354, 15–32. https://doi.org/10.1016/j.jhydrol.2008.02.007. Bogner, K., Pappenberger, F., Cloke, H.L., 2012. Technical Note: the normal quantile transformation and its application in a flood forecasting system. Hydrol. Earth Syst. Sci. 16, 1085–1094. https://doi.org/10.5194/hess-16-1085-2012. Bradbury, K.R., Rothschild, E.R., 1985. A computerized technique for estimating the hydraulic conductivity of aquifers from specific capacity data. Ground Water 23, 240–246. https://doi.org/10.1111/j.1745-6584.1985.tb02798.x. Burrows, W., Doherty, J., 2015. Efficient Calibration/Uncertainty Analysis Using Paired Complex/Surrogate Models. Groundwater 53, 531–541. https://doi.org/10.1111/ gwat.12257. Carnell, R., 2016. Latin Hypercube Samples: Provides a number of methods for creating and augmenting Latin Hypercube Samples. R package version 0.14. http://lhs.rforge.r-project.org/. Castilla-Rho, J.C., Mariethoz, G., Rojas, R., Andersen, M.S., Kelly, B.F.J., 2015. An agentbased platform for simulating complex human–aquifer interactions in managed groundwater systems. Environ. Model. Softw. 73, 305–323. https://doi.org/10.1016/ j.envsoft.2015.08.018. Crosbie, R., Raiber, M., Cui, T., Viney, N., 2015. Blending field observations and AWRA outputs to estimate groundwater recharge in the Clarence-Moreton basin, eastern Australia. In: 21st International Congress on Modelling and Simulation, Gold Coast, pp. 2033–2039. Csilléry, K., Blum, M.G.B., Gaggiotti, O.E., François, O., 2010. Approximate Bayesian Computation (ABC) in practice. Trends Ecol. Evol. 25, 410–418. https://doi.org/10. 1016/j.tree.2010.04.001. Cui, T., Moore, C., Raiber, M., 2018a. Probabilistic assessment of the impact of coal seam gas development on groundwater: Surat Basin. Hydrogeol. J. Australia 10.1007/ s10040-018-1786-2. Cui, T., Peeters, L., Rassam, D., Raiber, M., Crosbie, R., Gilfedder, M., Pickett, T., Hartcher, M., Marvanek, S., Bruce, C., Davies, P., 2017. Groundwater numerical modelling for the Clarence-Moreton bioregion, Product 2.6.2 from the ClarenceMoreton Bioregional Assessment. Department of the Environment and Energy, Bureau of Meteorology, CSIRO and Geoscience Australia, Australia. Cui, T., Raiber, M., Pagendam, D., Gilfedder, M., Rassam, D., 2018b. Response of groundwater level and surface-water/groundwater interaction to climate variability: Clarence-Moreton Basin, Australia. Hydrogeol. J. 26, 593–614. https://doi.org/10. 1007/s10040-017-1653-6. 
de Boer-Euser, T., Bouaziz, L., De Niel, J., Brauer, C., Dewals, B., Drogue, G., Fenicia, F., Grelier, B., Nossent, J., Pereira, F., Savenije, H., Thirel, G., Willems, P., 2017. Looking beyond general metrics for model comparison - lessons from an international model intercomparison study. Hydrol. Earth Syst. Sci. 21, 423–440. https://doi.org/10. 5194/hess-21-423-2017. Diggle, P.J., Gratton, R.J., 1984. Monte Carlo Methods of Inference for Implicit Statistical Models. J. R. Stat. Soc. Ser. B 46, 193–227. https://doi.org/10.1079/IVPt200454IN. Doherty, J., 2015. Calibration and uncertainty analysis for complex environmental models, 1st ed. Watermark Numerical Computing, Brisbane, Australia. Doherty, J., Christensen, S., 2011. Use of paired simple and complex models to reduce predictive bias and quantify uncertainty. Water Resour. Res. 47, 1–21. https://doi. org/10.1029/2011WR010763. Doherty, J.E., 2011. Modeling: Picture perfect or abstract art? Ground Water 49, 455. https://doi.org/10.1111/j.1745-6584.2011.00812.x. Gorelick, S.M., Zheng, C., 2015. Global change and the groundwater management challenge. Water Resour. Res. 51, 3031–3051. https://doi.org/10.1002/2014WR016825. Gorissen, D., Couckuyt, I., Demeester, P., Dhaene, T., Crombecq, K., 2010. A Surrogate Modeling and Adaptive Sampling Toolbox for Computer Based Design. J. Mach. Learn. Res. 11, 2051–2055. Gramacy, R.B., 2016. laGP : Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. J. Stat. Softw. 72. https://doi.org/10.18637/jss.v072.i01. Gramacy, R.B., Apley, D.W., 2015. Local Gaussian process approximation for large computer experiments. J. Comput. Graph. Stat. 24, 561–578. https://doi.org/10. 1080/10618600.2014.914442. Gupta, H.V., Clark, M.P., Vrugt, J.a., Abramowitz, G., Ye, M., 2012. Towards a comprehensive assessment of model structural adequacy. Water Resour. Res. 48, 1–16. https://doi.org/10.1029/2011WR011044. Herckenrath, D., Doherty, J.E., Panday, S., 2015. Incorporating the effect of gas in modelling the impact of CBM extraction on regional groundwater systems. J. Hydrol. 523, 587–601. https://doi.org/10.1016/j.jhydrol.2015.02.012. Hill, M.C., Tiedeman, C.R., 2007. Effective Groundwater Model Calibration. John Wiley & Sons Inc Hoboken, NJ, USA. https://doi.org/10.1002/0470041080. Hooten, M.B., Leeds, W.B., Fiechter, J., Wikle, C.K., 2011. Assessing First-Order Emulator


Storlie, C.B., Swiler, L.P., Helton, J.C., Sallaberry, C.J., 2009. Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models. Reliab. Eng. Syst. Saf. 94, 1735–1763. https://doi.org/ 10.1016/j.ress.2009.05.007. Strong, M., Oakley, J.E., Brennan, A., 2014. Estimating multiparameter partial expected value of perfect information from a probabilistic sensitivity analysis sample: a nonparametric regression approach. Med. Decis. Making 34, 311–326. https://doi.org/ 10.1177/0272989X13505910. Tavakoli, R., Yoon, H., Delshad, M., ElSheikh, A.H., Wheeler, M.F., Arnold, B.W., 2013. Comparison of ensemble filtering algorithms and null-space Monte Carlo for parameter estimation and uncertainty quantification using CO2 sequestration data. Water Resour. Res. 49, 8108–8127. https://doi.org/10.1002/2013WR013959. Turner, B.M., Van Zandt, T., 2012. A tutorial on approximate Bayesian computation. J. Math. Psychol. 56, 69–85. https://doi.org/10.1016/j.jmp.2012.02.005. Vaze, J., Viney, N., Stenson, M., Renzullo, L., Dijk, a Van, Dutta, D., Crosbie, R., Lerat, J., Penton, D., Vleeshouwer, J., Peeters, L., Teng, J., Kim, S., Hughes, J., Dawes, W., Zhang, Y., Leighton, B., Joehnk, K., Yang, A., Wang, B., Frost, A., Elmahdi, A., Smith, A., Daamen, C., 2013. The Australian Water Resource Assessment Modelling System (AWRA). In: 20th International Congress on Modelling and Simulation, Adelaide, Australia, pp. 1–6. Vrugt, J.A., Sadegh, M., 2013. Toward diagnostic model calibration and evaluation: Approximate Bayesian computation. Water Resour. Res. 49, 4335–4345. https://doi. org/10.1002/wrcr.20354. Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., Robinson, B.A., Hyman, J.M., Higdon, D., 2009a. Accelerating Markov Chain Monte Carlo Simulation by Differential Evolution with Self-Adaptive Randomized Subspace Sampling. Int. J. Nonlinear Sci. Numer. Simul. 10. https://doi.org/10.1515/IJNSNS.2009.10.3.273. Vrugt, J.A., ter Braak, C.J.F., Gupta, H.V., Robinson, B.A., 2009b. Equifinality of formal (DREAM) and informal (GLUE) Bayesian approaches in hydrologic modeling? Stoch. Environ. Res. Risk Assess. 23, 1011–1026. https://doi.org/10.1007/s00477-0080274-y. Wu, B., Zheng, Y., Tian, Y., Wu, X., Yao, Y., Han, F., Liu, J., Zheng, C., 2014. Systematic assessment of the uncertainty in integrated surface water-groundwater modeling based on the probabilistic collocation method. Water Resour. Res. 50 (7), 5848–5865. https://doi.org/10.1002/2014WR015366. Wu, B., Zheng, Y., Wu, X., Tian, Y., Han, F., Liu, J., Zheng, C., 2015. Optimizing water resources management in large river basins with integrated surface water-groundwater modeling: A surrogate-based approach. Water Resour. Res. 51 (4), 2153–2173. https://doi.org/10.1002/2014wr016653. Xu, T., Valocchi, A.J., Ye, M., Liang, F., 2017. Quantifying model structural error: Efficient Bayesian calibration of a regional groundwater flow model using surrogates and a data-driven error model. Water Resour. Res. 5375–5377. https://doi.org/10. 1002/2016WR019831. Yan, S., Minsker, B., 2006. Optimal groundwater remediation design using an Adaptive Neural Network Genetic Algorithm. Water Resour. Res. 42. https://doi.org/10.1029/ 2005WR004303.

Rasmussen, C.E., Williams, C.K.I., 2006. Gaussian processes for machine learning. MIT Press. Rassam, D., Beringen, H., Raiber, M., Cui, T., Gilfedder, M., Schmidt, R., Post, D., Henderson, B., Lewis, S., 2017. Assessing impacts of coal resource development on water resources in the Clarence-Moreton bioregion: key findings. Product 5: Outcome synthesis from the Clarence‑Moreton Bioregional Assessment. Department of the Environment and Energy, Bureau of Meteorology, CSIRO and Geoscience Australia, Australia. Rassam, D., Raiber, M., McJannet, D., Janardhanan, S., Murray, J., Gilfedder, M., Cui, T., Matveev, V., Doody, T., Hodgen, M., Ahmad, M. (Eds.), 2014. Context statement for the Clarence-Moreton bioregion. Product 1.1 from the Clarence-Moreton Bioregional Assessment. Department of the Environment, Bureau of Meteorology. Geoscience Australia, Australia. Razavi, S., Tolson, B.a., Burn, D.H., 2012. Review of surrogate modeling in water resources. Water Resour. Res. 48, 1–32. https://doi.org/10.1029/2011WR011527. Refsgaard, J.C., van der Sluijs, J.P., Højberg, A.L., Vanrolleghem, P., a.,, 2007. Uncertainty in the environmental modelling process – a framework and guidance. Environ. Model. Softw. 22, 1543–1556. https://doi.org/10.1016/j.envsoft.2007.02. 004. Robinson, T.D., Eldred, M.S., Willcox, K.E., Haimes, R., 2008. Surrogate-Based Optimization Using Multifidelity Models with Variable Parameterization and Corrected Space Mapping. AIAA J. 46, 2814–2822. https://doi.org/10.2514/1. 36043. Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P., 1989. Design and Analysis of Computer Experiments. Stat. Sci 10.1214/ss/1177012413. Sadegh, M., Vrugt, J.A., 2014. Approximate Bayesian Computation using Markov Chain Monte Carlo simulation: DREAM (ABC). Water Resour. Res. 50, 6767–6787. https:// doi.org/10.1002/2014WR015386. Santner, T.J., Williams, B.J., Notz, W.I., 2003. The design and analysis of computer experiments. Springer Science & Business Media. Sepúlveda, N., Doherty, J., 2015. Uncertainty Analysis of a Groundwater Flow Model in East-Central Florida. Groundwater 53, 464–474. https://doi.org/10.1111/gwat. 12232. Shields, M.D., Teferra, K., Hapij, A., Daddazio, R.P., 2015. Refined Stratified Sampling for efficient Monte Carlo based uncertainty quantification. Reliab. Eng. Syst. Saf. 142, 310–325. https://doi.org/10.1016/j.ress.2015.05.023. Sreekanth, J., Cui, T., Pickett, T., Rassam, D., Gilfedder, M., Barrett, D., 2018. Probabilistic modelling and uncertainty analysis of flux and water balance changes in a regional aquifer system due to coal seam gas development. Sci. Total Environ. 634, 1246–1258. https://doi.org/10.1016/j.scitotenv.2018.04.123. Stanfill, B., Mielenz, H., Clifford, D., Thorburn, P., 2015. Simple approach to emulating complex computer models for global sensitivity analysis. Environ. Model. Softw. 74, 140–155. https://doi.org/10.1016/j.envsoft.2015.09.011. Stedinger, J.R., Vogel, R.M., Lee, S.U., Batchelder, R., 2008. Appraisal of the generalized likelihood uncertainty estimation (GLUE) method. Water Resour. Res. 44. https:// doi.org/10.1029/2008WR006822.
