Journal of Hydrology 557 (2018) 826–837
Contents lists available at ScienceDirect
Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol
Research papers
A surrogate-based sensitivity quantification and Bayesian inversion of a regional groundwater flow model Mingjie Chen ⇑, Azizallah Izady, Osman A. Abdalla, Mansoor Amerjeed Water Research Center, Sultan Qaboos University, Muscat, Oman
a r t i c l e
i n f o
Article history: Received 16 November 2017 Accepted 30 December 2017 Available online 2 January 2018 Keywords: MODFLOW Bayesian Inverse modeling Surrogate Bagging MARS
a b s t r a c t Bayesian inference using Markov Chain Monte Carlo (MCMC) provides an explicit framework for stochastic calibration of hydrogeologic models accounting for uncertainties; however, the MCMC sampling entails a large number of model calls, and could easily become computationally unwieldy if the highfidelity hydrogeologic model simulation is time consuming. This study proposes a surrogate-based Bayesian framework to address this notorious issue, and illustrates the methodology by inverse modeling a regional MODFLOW model. The high-fidelity groundwater model is approximated by a fast statistical model using Bagging Multivariate Adaptive Regression Spline (BMARS) algorithm, and hence the MCMC sampling can be efficiently performed. In this study, the MODFLOW model is developed to simulate the groundwater flow in an arid region of Oman consisting of mountain-coast aquifers, and used to run representative simulations to generate training dataset for BMARS model construction. A BMARSbased Sobol’ method is also employed to efficiently calculate input parameter sensitivities, which are used to evaluate and rank their importance for the groundwater flow model system. According to sensitivity analysis, insensitive parameters are screened out of Bayesian inversion of the MODFLOW model, further saving computing efforts. The posterior probability distribution of input parameters is efficiently inferred from the prescribed prior distribution using observed head data, demonstrating that the presented BMARS-based Bayesian framework is an efficient tool to reduce parameter uncertainties of a groundwater system. Ó 2018 Elsevier B.V. All rights reserved.
1. Introduction Numerical simulation of a groundwater flow system requires knowledge of hydrologic characteristics of the aquifers, such as hydraulic conductivity and specific storage of hydrogeologic units. Complete knowledge of these properties are usually very hard to obtain in the subsurface, and the incurred uncertainty will reduce the predictive capability of the groundwater model and impair the reliability of groundwater resources management based on model predictions. To improve the model accuracy and reduce uncertainties, many inverse modeling methods have been developed to estimate hydrologic parameters by matching modeled results to corresponding field measurements (e.g., Carrera, 1988; Dai and Samper, 2004; Carrera et al., 2005; Hill and Tiedeman, 2007; Tonkin and Doherty, 2009; Sadeghi-Tabas et al., 2016; Shang et al., 2016). Whereas, applications of these methods are usually limited by considerable number of forward model simulations ⇑ Corresponding author at: Sultan Qaboos University, P.O. Box 17, Postal Code 123, Al-Khoud, Oman. E-mail addresses:
[email protected],
[email protected] (M. Chen). https://doi.org/10.1016/j.jhydrol.2017.12.071 0022-1694/Ó 2018 Elsevier B.V. All rights reserved.
required during inverse modeling (Wagner, 1995; Jordan et al., 2015). Owing to the insufficient knowledge of the aquifers, considerable uncertainties might be introduced into groundwater modeling systems yielding ill-posed inverse problems with non-unique solutions (de Marsily et al., 1999). To cope with this issue, Bayesian inference has been studied and applied extensively as an effective inverse framework yielding a parameter posterior probability distribution instead of a deterministic solution from its prior distribution (Oliver et al., 1997; Kavetski et al., 2006; Fu and GomezHernandez, 2009; Tarantola, 2004). The analytical solutions of posterior distribution are usually very difficult to obtain in Bayesian inversion (Khaleghi et al., 2013). Instead, Markov Chain Monte Carlo (MCMC) methods provide a practical tool to draw samples from the specified target distribution to numerically approximate posterior distributions for groundwater model parameters (Hassan et al., 2009; Efendiev et al., 2011; Laloy et al., 2013). However, when dealing with large-scale high-fidelity groundwater models, MCMC methods may become computationally burdensome because of the considerable number of uncertain parameters to be estimated and many times of repeated forward model calls
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
for posterior distribution to reach convergence (Smith and Marshall, 2008). There are primarily four promising ways to address this issue include (1) parallel calculation and grid computing (e.g., Kerrou et al., 2010), (2) improvement of MCMC sampling algorithms (e.g., Vrugt et al., 2009; Rajabi and Ataie-Ashtiani, 2016), (3) reduction of inverse model dimension, that is, excluding insensitive physical properties and only keep key parameters in Bayesian inversion (e.g., Kerrou and Renard, 2010; Chen et al., 2014), and (4) development of surrogate statistic model to substitute high-fidelity physical model in MCMC sampling (e.g., Zeng et al., 2012). In this study, we will combine the latter two approaches to develop a surrogate-based Bayesian framework, in which the primary behavior of a regional groundwater flow model is replicated with a cheaper surrogate model using fewer but sensitive parameters for application in MCMC sampling. Surrogate models have been developed to approximate physical models using a variety of statistical algorithms, such as kriging, radial basis functions (RBFs), polynomials, support vector machines (SVMs), artificial neural networks (ANNs), sparse grid interpolation (e.g., Simpson and Mistree, 2001; Regis and Shoemaker, 2007; Fen et al., 2009; Zhang et al., 2009; Behzadian et al., 2009; Zeng et al., 2012). Razavi et al. (2012) reviewed 48 applications of surrogate models in hydrology covering most of these approximation techniques. Chen et al. (2013, 2015) introduced multivariate adaptive regression spline (MARS) (Friedman, 1991) into a simulation-optimization methodology for underground gas and geothermal reservoir modeling. Dai et al. (2014) and Keating et al. (2016) further applied MARS surrogate model to a risk assessment of carbon geological storage. MARS algorithm, which adaptively develops local models in local regions for flexible regression modeling of high dimensional data, shows super performance in handling high-dimensional data, and its stability can be enhanced by bootstrap aggregating, namely Bagging (Bühlmann and Yu, 2002). Bagging is a kind of smoothing operation which could be used to improve MARS model prediction (Bühlmann, 2003). In our recent study, Bagging MARS (BMARS) surrogate model was coupled with an optimizer to numerically calibrate a complicated MODFLOW for simulation of groundwater flow in mountain hardrock and unconfined alluvial aquifers in northwest Oman (Chen et al., 2017). In this study, we extend from deterministic optimization to integrate BMARS modeling approach to MCMC sampling process to achieve an efficient Bayesian inversion of a regional-scale groundwater flow model. Sensitivity measures input parameters’ relative contributions to the uncertainty of an output. Sensitivity Analysis (SA) is very useful in model parameter importance ranking, uncertainty quantification and management (Wei et al., 2013). Local SA methods have been widely used in groundwater modeling studies (e.g., Sanz and Voss, 2006). They had been implemented in standard calibration softwares UCODE (Poeter and Hill, 1999) and PEST (Doherty, 2005) and become readily accessible for users. Compared to traditional local SA, the emerging global SA methods take into account the entire uncertain range of input parameters, as well as their interactions between them (Oladyshkin et al., 2012). Since input parameters (e.g., hydraulic conductivity) of groundwater models are usually varying across several orders of magnitude, the global SA is best suitable and essential for comprehensive sensitivity evaluations. However, demanding computational cost has prevented the global SA from widespread use as local SA methods. To tackle this challenge, Polynomial Chaos Expansions (PCEs) have been extensively studied to efficiently calculate the global sensitivity indices (e.g., Sudret, 2008; Sochala and Le Maître, 2013; Rajabi et al., 2015). In this study, the developed BMARS model is introduced into Sobol’ method (Sobol’, 1993) for highly efficient global SA of MODFLOW model parameters. According to importance ranking, the insensitive parameters are thus identified and
827
excluded from Bayesian inversion. Without losing key physical characteristics of the original model, the dimension of inverse modeling is reduced and computing efforts are focused on the most important parameters. The presentation of the study is organized as follows. In Section 2, BMARS based Bayesian inverse modeling framework methodology is illustrated and its major components including physical MODFLOW model, BMARS model, Bayesian inference, Sobol’ sensitivity are described. Section 3 gives details on the MODFLOW model for the Al-Fara area in Oman. Results and subsequent analysis are presented in Section 4, and concluding remarks are drawn in Section 5.
2. Methodology The purpose of this study is to develop a surrogate-based Bayesian inversion of a MODFLOW model system using BMARS algorithm, entailing high-dimensional parameter space sampling, groundwater flow model development, BMARS model construction, Sobol’ sensitivity analysis, and Bayesian inference using BMARS model. In this section, the methodology framework is illustrated firstly, followed by detailed description of core components.
2.1. BMARS-based Bayesian inversion framework The proposed inversion framework proceeds as follows (Fig. 1): 1. M input parameters are chosen with uncertainties defined by their distributions and ranges, which are representative of prior knowledge of the groundwater flow system. In this study, the parameters are hydrogeologic characteristics of the aquifers, including hydraulic conductivity, specific storage and specific yield. 2. N input M-vectors are sampled from their prior probability density functions (PDF) using Latin-Hypercube (L-H) method (McKay et al., 1979). These N realizations are used to generate input files to run N MODFLOW model simulations and their corresponding responses (in our case below, heads at observation wells) are obtained. 3. The N input vector-response pairs (shaded in Fig. 1) are used to train and cross-validate BMARS surrogate model. Leave-one-out cross validation (LOOCV) approach is used for surrogate model validation (Picard and Cook, 1984). 4. The importance of the input parameters of the MODFLOW model is ranked according to Sobol’ total order sensitivity indices (Sobol’ 1993), which is efficiently computed using cheap BMARS model. Only top sensitive parameters are considered influential to the hydrogeologic system and included in Bayesian inversion. 5. The BMARS model is utilized in place of the MODFLOW model within a Bayesian inversion scheme, from which posterior distributions of the sensitive parameters are inferred by comparisons with observed head data. The posterior PDFs represent a subset of the prior data that yield indeterminate solutions with highest probabilities. The BMARS-driven Bayesian inference framework is programmed in Python script by coupling MODFLOW model with various algorithms mentioned above, i.e., L-H sampling, BMARS approximation, LOOCV, Sobol’ method, and Bayesian inference. The suite of MODFLOW simulations are distributed to and executed on high performance computing facility at Sultan Qaboos University in Oman.
828
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
Fig. 1. Schematic diagram of the BMARS-based Sobol’ sensitivity analysis and Bayesian inversion framework. The gray-shaded part shows that surrogate training data consists of N inputs-response pairs generated from N high-fidelity MODFLOW model simulations. In the case study, M = 15, N = 1000, K = 8.
2.2. MODFLOW model In MODFLOW, the groundwater mass balance equation governing transient groundwater flow is written as (Bear, 1972)
@ @h @ @h @ @h @h þ þ ¼ Ss Kx Ky Kz W; @x @x @y @y @z @z @t
ð1Þ
where t is the time (T), W is the sink/source of water (1/T), h is the hydraulic head (L), and Kx, Ky, Kz is the hydraulic conductivity (L/T) in x, y, z directions respectively. Specific storage Ss = Sy /b (1/L), and Sy is the specific yield () and b is the aquifer thickness (L). According to Eq. (1), K, Ss, and Sy are important parameters to hydraulic dynamics and hence initially treated as uncertain parameters to be inferred (Fig. 1). The MODFLOW-NWT version is chosen in this study because it is packed with Newton solver and upstreamweighting module, and superior to deal with drying-rewetting process of the unconfined aquifer flow problem in the test case (Niswonger et al., 2011).
indices) and their interactions (2nd and higher-order indices). The Sobol’ sensitivity indices (SI) can be calculated as follows (Sobol’, 2001)
SIi ¼ Var½Eðyjxi Þ=VarðyÞ;
ð4Þ
SIij ¼ fVar½Eðyjxi ; xj Þ Var½Eðyjxi Þ Var½Eðyjxj Þg=VarðyÞ;
ð5Þ
X XX SIij þ SIijk þ ; SITi ¼ SIi þ
ð6Þ
j–i
j–i k>j
where SIi , SIij , SIijk , and SITi are the 1st, 2nd, 3rd, and total order indices of input parameter xi respectively. Sobol’ total order sensitivity measures the contribution of each input parameter to the uncertainty of output y, including all variances caused by its interactions with any other parameters in all the orders. Traditional Sobol’ method calculates output response y using physical model f ðxÞ, and usually is very time-consuming. In our surrogate-based Sobol’ method, BMARS model ^f BMARS is used in place of MODFLOW model f ðxÞ to achieve fast estimation of Sobol’ total order sensitivity indices.
2.3. Bagging multivariate adaptive regression spline (BMARS) 2.5. BMARS-based Bayesian inversion A MARS model can be expressed as
^f ðxÞ ¼
Ji k X Y ci ½Sji ðxv ðj;iÞ t ji Þ; i¼1
ð2Þ
j¼1
where, x is the M-dimensional input parameter vector. k and ci are the number and coefficients of associated basis functions. Ji is the interaction order of basis function, i.e., the number of input parameters included in the basis function, Sji ¼ 1 is the sign indicators, v (j,i) is the index of the parameter x which is split on knots tji. Detailed description of MARS can be found in Friedman (1991). BMARS model is the bootstrap expectation of MARS models, and usually approximated by Monte Carlo method below (Breiman, 1996): B X ^f BMARS 1 ^f ðxÞ; b B b¼1
ð3Þ
where ^f b ðxÞ is the MARS model, and B is defined as the instantiation number of MARS model for Monte Carlo approximation. 2.4. Sobol’ sensitivity indices The Sobol’ method is a variance-based global sensitivity calculation approach. The variance of response y is decomposed into fractions which are attributed to each input parameter (1st order
Bayesian inversion aims to reduce parameter uncertainty by integrating prior information (in form of prior PDF) and the observations through the inference of the posterior PDF. According to Bayes theorem, posterior distribution pðxjyÞ can be expressed as:
pðxjyÞ ¼ pðyjxÞpðxÞ=pðyÞ;
ð7Þ
where x is the parameter vector to be inversed, y is the observations, pðxÞ is the parameter prior probability, pðyÞ is the marginal distribution and usually a normalized constant, and pðyjxÞ is the likelihood function, which measure the fitness between predictions and observations. The errors between simulated and observed head is calculated as
e ¼ y f ðxÞ;
ð8Þ
where f ðxÞ is the predictions from groundwater flow models. The errors e is usually assumed to follow multivariate normal distribution with covariance matrix C, resulting in Gaussian likelihood function defined as (Zeng et al., 2012):
pðyjxÞ ¼ expðeT C1 e=2Þ=½ð2pÞn=2 jCj1=2 ;
ð9Þ
where jCj is the determinant of C, and n is the number of observations. In this study, BMARS surrogate model ^f BMARS is used as an approximation of physical MODFLOW model f ðxÞ to evaluatepðyjxÞ. As a result, the posterior probability distribution pðxjyÞ can be
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
calculated without executing expensive MODFLOW models during MCMC sampling, and thus significantly improves Bayesian inversion efficiency. 3. Al-Fara regional groundwater flow model To illustrate the proposed framework methodology, a hydrogeologic system consisting of mountain hardrock and coastal alluvium in Al-Fara area of Oman is chosen as the test case due to available data and knowledge from the extensive studies of this region (Amerjeed, 2016). 3.1. Study area The study area, wadi Al Fara catchment, covers a 1172 km2 area extending from 522,700 to 570,700 E in longitude and from
829
2,559,400 to 2,633,900 N in latitude in north Oman. As shown in Fig. 2a, it originates from the northwest slopes of the mountain towards northern coastal plain for about 70 km to the sea. The average winter temperatures range between 20 and 28 °C in the study area, while it may reach over 40 °C in summer. The average annual rainfall is between 73 and 270 mm/year from coastal to mountain zone, while the potential evapotranspiration may reach 2300 mm/year in this area. Ground surface elevations drop rapidly from a maximum of 3000 m at the southern mountain to 100 m at the base of the piedmont, and then to sea-level at the coast (Fig. 2b). The mountainous zone generally lacks vegetation with little soil coverage. Wadi beds and their banks are inhabited with limited agriculture developed on terraces irrigated by Aflaj originating from springs. The coastal plain is made up of relatively uniform alluvial sediments, and is the most extensive agricultural regions irrigated by groundwater.
Fig. 2. Study area Al-Fara: (a) Location in north Oman; (b) Vertical cross-section A-A0 . Adapted from Amerjeed (2016). The unit index and properties for the 6 materials are shown in Table 1.
830
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
vium, Upper Fars, and Lower Fars. Hajar Super Group comprises carbonate rocks and dolomite, siltstones and sandstones. It is highly heterogeneous, fractured and karstified. As the upper mountain zone, its surface covers 340 km2 and receives most of the rainfall in the target area. Ophiolite is characterized as fracture aquifer covered by Wadi Alluvium in valleys. This middle valley zone covers about 350 km2 of area. The Plain Alluvium is deposited as varicoloured sand interbedded with cemented and silty gravels. It covers the total coastal plain, an area of 480 km2, and serves as the most important aquifer in the study area. Beneath the Plain Alluvium is the Upper Fars, which comprised dolomites, limestone with interbeds of thin siltstones towards the base, and it is the second important aquifer of Al-Fara area. The underlying Lower Fars produces little water and mainly serves as the impermeable bedrock. It is characterized by mudstone, claystone, variably cemented gravels, and calcareous shale. 3.3. Numerical model
Fig. 3. Boreholes used to construct stratigraphy of the study area.
3.2. Geologic model Digital elevation model obtained from NASA Shuttle Radar Topography Mission and stratigraphy data from 57 borehole logs in the target area (Geo-Resources Consultants, 2006) are used to build the 3D geological model (Fig. 3). As shown in Fig. 2b, the geological structure consists of six primary hydrostratigraphic units, that is, Hajar Super Group, Ophiolite, Wadi Alluvium, Plain Allu-
Based upon the geological model, the model domain is discretized into 150 rows, 97 columns and 6 layers in MODFLOW. Every layer consists of 14,550 cells whose area is 500 m by 500 m each. Cells that fall out of the model boundary are flagged as ‘‘inactive” in model simulations and removed from Fig. 4. The model is bounded by groundwater divide along the mountain ridge as the no-flow southern boundary, and the coast of the Oman Sea as the northern boundary with specified zero constant head. Previous studies show that the general groundwater flow is from the south highland to the north coast plain, hence eastern and western boundaries are assumed parallel with groundwater flow direction and no-flow. The bottom layer is treated no flow boundary as impermeable bedrock. Time-variant recharge estimated from rainfall, evapotranspiration, abstraction is specified on the top layer. As shown in Fig. 5, Hajor Super Group, Ophiolite, Wadi Alluvium and Plain Alluvium are distributed from southern mountain area to the coast plain on the top layer. Head measurements are obtained from the 15 observations wells located in the target area as green solid circles shown in Fig. 5. Alluvium material, including both Wadi and Plain Alluvium units, disappear on the underlying layers. Impermeable Low Fars occurs from Layer 3, indicating the Layer 1 and 2 are the main flow path for groundwater from mountains to the coast (Fig. 6). Owing to the data availability, the model simulates the groundwater flow in the target area over 20 years from January 1993 to September 2013, with 248 monthly stress periods totally. The initial head data for the model are obtained from the measure-
Fig. 4. Three-dimensional MODFLOW grid of the study area. The unit index and properties for the 6 materials are shown in Table 1.
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
831
tions, along with the corresponding input sample values, form the training dataset for BMARS model development, as illustrated as gray area of Fig. 1. 4. Results and discussions 4.1. BMARS model development During the bagging process, 100 MARS models (B = 100 in Eq. (3)) are instantiated to approximate the BMARS model, and each MARS model is developed by 100 basis functions and 8 variables (k = 100, Ji = 8 in Eq. (2)). A well-fitted BMARS model does not necessarily mean good predictive ability due to over-fitting, and model validation is required for quality control. The cross-validation of BMARS models in this study is conducted by LOOCV method, which can use the existing training dataset without requiring additional test dataset (Picard and Cook, 1984). The quality of the BMARS models is examined by scatter plots comparing true data from MODFLOW model and predicted data by BMARS model. Fig. 7 gives an example for head output at well RQ-1 (see Fig. 5 for location) at the last simulation stress to examine how BMARS model approximates MODFLOW model. The 1000 points associated with predicted head values from BMARS are plotted against ‘‘true” data from 1000 MODFLOW simulations. The well-fitted BMARS model (fitting R2 = 0.9954) show excellent predictive performance (validation R2 = 0.9976) during model cross-validation. The scaled Root Mean Square Errors (RMSE) for fitting and validating BMARS model are 0.01298 and 0.02165 respectively, suggesting a remarkable goodness of fit of surrogate with physical models. Standard deviations for each head prediction by BMARS model are also calculated based on the associated 100 instantiations of MARS models. In Fig. 7b, only those heads whose true value (from MODFLOW) falls outside the range (+/ one standard deviation) of the predicted values (from BMARS) are plotted with error bar for a clear display. As a result, none of those error bars intersect with the diagonal in the Fig. 7b. Fig. 5. Hydrogeologic unit distribution, observation wells, and initial head contour on the top layer (Layer 1) of the Al-Fara MODFLOW model.
ments in January 1993 using empirical Bayesian kriging interpolation method (Izady et al., 2017). 3.4. MODFLOW simulations In the proposed method, BMARS models are constructed using a training dataset, which is composed of sample inputs and associated responses from MODFLOW simulations. Considering parameter uncertainties, the scenarios of these simulations should represent various conditions of the physical system as complete as possible. In this case, the uncertainties are represented by range and distribution of key properties of five hydrogeologic units, that is, hydraulic conductivity, specific storage, and specific yield, resulting in 15 uncertain parameters totally. Note that the 6th unit Lower Fars is considered impermeable bedrock and its properties are given fixed values. As presented in Table 1, the ranges of the 15 parameters account for our best knowledge about Al-Fara groundwater flow system and expected to be improved by numerical method of Bayesian inversion using monthly head measurements from 15 observation wells. Following the procedure in Fig. 1, the 15 uncertain parameters as a vector are sampled 1000 times within their ranges using L-H method, yielding 1000 MODFLOW models. The log-transformed hydraulic conductivity and the other two properties are assumed to follow uniform random distribution in their ranges. The outputs of the 1000 model simula-
4.2. Sensitivity analysis As discussed in Introduction section, sensitivity analysis provides a useful tool to identify the key properties that control the groundwater flow system. The purpose of sensitivity analysis in this study is to (1) exclude insensitive parameters from Bayesian inversion, and (2) evaluate hydrogeologic system spatiotemporally. Thanks to the efficient BMARS-based Sobol’ method, the total order Sensitivity Indices (SI) of head outputs at 15 well locations and 20 yearly time points to 15 input parameters, i.e., total 15 2015 = 4500 SIs, are conveniently computed. Preliminary investigation suggests that these SIs are variant spatially and temporally for each input parameter. It makes sense since these parameters are associated with spatially distributed hydrogeologic units whose properties may mostly affect the hydraulic heads at wells located within or near them. This spatiotemporal variability of SIs raises a challenge on how to evaluate overall importance of a parameter for screen purpose. The spatially maximal SI of each parameter at 15 different well locations is sorted out and summarized in Table 2 for each of 20 yearly times. The occurrence of SI >0.2, which is considered sensitive, are counted for each parameter over the 20 continuous years and shown in the 2nd row of Table 2. Obviously, hydraulic conductivity of unit 1, 2, 4, and 5 (HK_1, HK_2, HK_4, HK_5) are highly sensitive based on the fact that spatial maximal SIs of all 20 yearly times are greater than 0.2. These four units are either primary aquifers or flow pathways of the groundwater flow in Al-Fara area. Although only at two times are
832
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
Fig. 6. Hydrogeologic unit distribution map on the Layer 2–5 of the Al-Fara MODFLOW model.
SI greater than 0.2 for HK_3, the hydraulic conductivity of Wadi Alluvium, this parameter is still somewhat sensitive and cannot be excluded from Bayesian inference. Wadi Alluvium overlies only a small portion of Ophiolite in the valley surface, explaining that its HK is not as important as that of other four units for the entire model system. Regarding specific storage, Ss_2 remains sensitive
throughout 20 years of simulation, while Ss_5 is so at 8 times, indicating that both of them play an important role in groundwater storage. In contrast, none of Ss_1, Ss_3, and Ss_4 seems influential at any of 20 checked times according to the zero occurrence of SI >0.2. Specific yield, on the other hand, has little effect for all the hydrogeologic units except Plain Alluvium. There is only one SI of
833
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837 Table 1 Hydrogeological unit parameters of the Al-Fara MODFLOW model. Unit Index
Unit Name
Hydraulic conductivity (m/d) Kh
Range
Ss
Range
Sy
Range
1
Hajar Super Group
a
0.01–10
5 105
106–104
0.05
0.01–0.1
2
Ophiolite
0.016
0.001–1
0.05
0.01–0.1
Wadi Alluvium
1.139
1–100
9.52 104 5 104
105–103
3
105–103
0.1
0.02–0.2
4
Plain Alluvium
1.486
1–100
5 104
105–103
5
Upper Fars
0.1–10
0.01–0.1
Lower Fars
9.94 104 106
105–103
6
3.417 0.001
0.158 0.05
0.834
Specific storage (1/m)
Specific yield (–)
0.02–0.2
0.001
a
The 8 underlined are parameter values at peak of likelihood from Bayesian inversion. The other 7 parameters for the 1st five units are insensitive according to sensitivity analysis and assigned mean value during inversion. The parameter values for Lower Fars (as bedrock) are fixed in sensitivity analysis and Bayesian inversion.
Fig. 7. Scatter plots of 1000 head outputs at location of well RQ-1 on Sep. 1, 2013 between MODFLOW and BMARS model based on 1000 representative simulations, which measure the goodness of fit of BMARS model during (a) training, and (b) cross-validation respectively. The blue vertical lines in (b) stand for error bars of approximated heads by BMARS model.
Sy_4 greater than 0.2 at the first year of simulation, though Sy_4 is still considered sensitive to some degree. Based on the above analysis, eight input parameters, that is, HK_1, HK_2, HK_3, HK_4, HK5, Ss_2, Ss_5, and Sy_4 are treated as controlling parameters of AlFara groundwater flow system to be included in the BMARS-based Bayesian inversion discussed in the next sub-section (See Table 3.). The spatially maximal SIs for the eight sensitive parameters are visualized with the simulation time in Fig. 8. Roughly, HK_1, HK_2,
HK_4, and Ss_2 are highly sensitive as most of their SIs are greater than 0.5, and HK_3, HK_5, Ss_5, and Sy_4 are second-tier in sensitivity since SIs are below 0.4 at most times. With respect to the spatial variability, SIs for the eight sensitive parameters at 15 well locations at simulation time of year 10 and 20 are presented in Fig. 9. The wells from left to right at abscissa are sequenced from southern mountain to northern coastal plain (see Fig. 5 for well locations). As shown in Fig. 9a, SIs for HK_1 are very high at RQ1 and HZ-5, the two southmost wells at highlands, but drastically drop to near zero at subsequent wells. It clearly indicates that hydraulic conductivity of unit# 1 (Hajar Super Group) can only affect areas as far as HZ_5 and won’t reach HZ-1 and beyond. Note RQ-1 and HZ-5 are located at Ophiolite zones, but are the two closest wells to Hajar Super Group. Just the opposite, SIs for HK_4 are zero at RQ-1 and HZ-5, and rise to 0.5 at HZ-1, and finally reach above 0.8 at locations of alluvium areas. This finding suggests hydraulic conductivity of Plain Alluvium won’t influence flow in highland but plays a dominant role in the flow system in plain area. HK_2 and Ss_2, hydraulic conductivity and specific storage of Ophiolite, follow the similar spatial pattern in terms of SI. Both parameters affect area including RQ-1, HZ-5, HZ-1, JT-17, DP-5, and JT-14, with highest SI at HZ-5. This result is reasonable considering the relative close locations of these wells to Ophiolite zone. HK_30 s role is very limited in the entire flow system, and only shows minor influence at HZ-5. It is mostly due to the small volume portion of Wadi Alluvium. HK_5 and Ss_5 have little influence at the southmost three wells, and show limited role at other wells northwards. It can be explained by spatial distribution of Upper Fars zone, which is located in northern part of domain. It is noted that HK_5 becomes more important in areas near coast as its SIs keep rising when approaching coast. Vertical cross-section (Fig. 2b) of the model domain shows that the thickness of Upper Fars keeps increasing towards coast, and occupies most area of the northern boundary. This boundary is a constant head boundary allowing inflow and outflow, which could be partially controlled by hydrogeologic properties of Upper Fars. As for specific yield, only Sy_4 is somewhat sensitive, because Plain Alluvium is the primary unconfined aquifer in Al-Fara area, and specific yield, instead of specific storage, takes the leading role in such hydrogeologic conditions according to flow equations of MODFLOW. The spatial SIs at year 20 shows a pattern without significant difference to those at year 10, indicating temporal variability of SI is not as significant as spatial variability. As a summary for spatial variability, HK_1 controls highland area including RQ-1 and HZ-5, and HK_4 dominates the left plain zones. HK_2 and Ss_2 affect areas between mountain and plain zones, including 6 wells from RQ-1 to JT-14. On the other hand, HK_3, HK_5, Ss_5, and Sy_4 are less sensitive than the four parameters discussed above. However, HK_5 shows its innegligible effect near coast area. In other words, hydraulic conductivity of all the five units is sensitive. Storage properties, either specific storage or specific yield, are sen-
834
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837 Table 2 Spatially maximal Sobol’ total order Sensitivity Indices (SI) of head outputs to 15 parameters at 20 yearly times. The presented SI is the maximum value drawn from 15 observation well locations. The SI >0.2 (gray shaded) are considered sensitive, the number of which are summed for each parameter over 20 time points.
Year # (SI>0.2) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 a
HK_1a
HK_2
HK_3
HK_4
HK_5
Ss_1
Ss_2
Ss_3
Ss_4
Ss_5
Sy_1
Sy_2
Sy_3
Sy_4
Sy_5
20
20
2
20
20
0
20
0
0
8
0
0
0
1
0
0.33 0.47 0.55 0.58 0.65 0.65 0.66 0.71 0.73 0.75 0.79 0.76 0.84 0.78 0.79 0.80 0.81 0.78 0.79 0.72
0.87 0.84 0.86 0.85 0.83 0.82 0.92 0.82 0.78 0.81 0.68 0.76 0.72 0.65 0.65 0.65 0.68 0.64 0.63 0.68
0.11 0.12 0.09 0.11 0.07 0.11 0.20 0.08 0.05 0.07 0.08 0.09 0.09 0.34 0.04 0.10 0.04 0.04 0.15 0.26
1.00 1.01 0.95 0.95 0.92 0.91 0.91 0.89 0.89 0.91 0.89 0.88 0.88 0.87 0.87 0.86 0.86 0.86 0.85 0.85
0.22 0.26 0.29 0.30 0.31 0.32 0.32 0.35 0.34 0.34 0.35 0.36 0.36 0.36 0.37 0.38 0.38 0.38 0.38 0.40
0.00 0.02 0.01 0.01 0.00 0.02 0.03 0.01 0.03 0.02 0.05 0.00 0.02 0.19 0.00 0.01 0.02 0.01 0.01 0.03
0.60 0.43 0.38 0.41 0.53 0.57 0.50 0.71 0.71 0.70 0.86 0.91 0.92 0.76 0.68 0.70 0.72 0.67 0.64 0.82
0.03 0.00 0.01 0.00 0.01 0.02 0.01 0.00 0.01 0.08 0.02 0.00 0.12 0.12 0.01 0.08 0.02 0.01 0.04 0.06
0.11 0.07 0.04 0.05 0.04 0.08 0.03 0.04 0.02 0.15 0.01 0.02 0.01 0.04 0.01 0.01 0.01 0.03 0.01 0.03
0.69 0.46 0.34 0.30 0.24 0.30 0.19 0.18 0.19 0.24 0.29 0.18 0.18 0.18 0.17 0.18 0.17 0.17 0.17 0.17
0.02 0.01 0.11 0.00 0.00 0.10 0.11 0.02 0.00 0.00 0.07 0.00 0.06 0.09 0.01 0.00 0.01 0.06 0.01 0.01
0.02 0.03 0.01 0.01 0.04 0.01 0.01 0.01 0.00 0.00 0.02 0.04 0.01 0.06 0.02 0.01 0.03 0.07 0.01 0.03
0.00 0.01 0.04 0.01 0.01 0.01 0.02 0.01 0.01 0.01 0.06 0.00 0.02 0.00 0.02 0.01 0.05 0.05 0.05 0.01
0.31 0.17 0.11 0.10 0.08 0.15 0.08 0.06 0.06 0.13 0.12 0.05 0.04 0.09 0.04 0.10 0.04 0.04 0.08 0.03
0.01 0.01 0.00 0.01 0.01 0.01 0.00 0.00 0.01 0.01 0.16 0.01 0.02 0.09 0.01 0.02 0.06 0.00 0.01 0.00
HK_1 denotes horizontal hydraulic conductivity of Hajar Super Group (indexed as unit 1 as shown in Table 1).
Table 3 Statistics of the 8 parameter posterior distributions. Parameter
Std dev
Mean
Peaka
Remark
HK_1 HK_2 HK_3 HK_4 HK_5 Ss_2 Ss_5 Sy_4
0.219 0.183 0.296 0.103 0.173 0.156 0.126 0.211
0.482 0.478 0.466 0.106 0.667 0.814 0.851 0.752
0.640 0.406 0.028 0.086 0.767 0.951 0.994 0.765
Log-normalized Log-normalized Log-normalized Log-normalized Log-normalized Normalized Normalized Normalized
a Parameter values at peak of likelihood, which are converted back as underlined in Table 1.
Fig. 8. Spatially maximal Sobol’ total order Sensitivity Indices (SI) head outputs to 8 sensitive input parameters in 20 yearly time points (data shown in Table 2). Parameter name is made up of symbol and unit index. For example, HK_1 denotes horizontal hydraulic conductivity of Hajar Super Group (indexed as 1 in Table 1).
Fig. 9. Sobol’ Sensitivity Indices of head outputs to 8 sensitive input parameters at 15 observation wells in (a) year 10, and (b) year 20. Parameter name is made up of symbol and unit index. For example, Ss_2 denotes specific storage of Ophiolite (indexed as 2 shown in Table 1).
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
sitive for unit 2, 4, and 5, which are the major three aquifers of the target region. Unit 1, Hajar Super Group consists of hardrock and bedrock, while Unit 3, Wadi Alluvium is sporadically deposited alluvium on gorges of Ophiolite. Consequently, neither of the two units could retain large volume of groundwater. 4.3. BMARS-based Bayesian inference BMARS model is called 10,000 times during MCMC (Markov Chain Monte Carlo) burn-in phase, a procedure for parameter distribution to relax from the initial state (Gilks et al., 1996). MCMC sampling procedure then run another 10,000 BMARS models as a sample increment to create parameter posterior distributions, and continues until the Markov chain reaches convergence, which is regarded to be reached if the posteriors won’t change any more. In this study, the chain reaches convergence after four sample increments, which means total model calls during MCMC is 50,000. Note the maximum MCMC sampling size is set as 106. The time cost running BMARS model and the associated Bayesian inversion is negligible, and the major efforts are made in 1000 representative MODFLOW model simulations for generating training dataset. As a result, the BMARS-based Bayesian inference improves inverse efficiency by 50,000/1000 = 50 times against the traditional approach in this case. A single simulation of Al-Fara MODFLOW
835
model takes about 8 min averagely in a normal computer. That said, the power of the proposed method can be more significant when a higher resolution model grid is needed for more realistic and complicated flow or reactive transport simulations. Fig. 10 presents posterior PDF (Probability Density Function) of the 8 parameters, as well as 2D priors and posteriors between these parameters. Parameter values are normalized between 0 and 1 to facilitate effective inversion. Since prior distribution is assumed uniform for all the 8 parameters, the 2D priors between any two of them show random likelihood. However, the 2D posteriors, updated from priors with head observations, reveal maximum likelihood area. The corresponding posterior PDFs of each parameter are inferred from their uniform prior PDFs shown as red horizontal lines in Fig. 10. Visually, posterior distribution spread most widely for HK_3, suggesting little additional knowledge on the parameter is gained by inversion, and least for HK_4, indicating significant uncertainty reduction achieved by incorporating head observations. The low and high identifiability of HK_3 and HK_4 is consistent with their low and high sensitivity, as discussed in sensitivity analysis in the previous section. The dispersion of the distribution can be measured quantitatively by standard deviation. The smaller the deviation is, the less the spread is. Table 2 summarizes statistics for the posterior PDFs of the 8 parameters including standard deviation, mean, and the value at
Fig. 10. BMARS-based Bayesian inversion of 8 sensitive parameters. Diagonal bar plots are inferred posterior probability distributions for 8 parameters. Lower and upper diagonal images are 2D priors and posteriors between 8 parameters respectively. The ranges of 8 parameters are normalized between 0 and 1, and the 5 hydraulic conductivity parameters are log-normalized. The red horizontal lines in diagonal bar plots are uniform prior probability (0.05 = 1/20 for 20 bars).
836
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837
Fig. 11. Simulated and observed head history at well HZ-5. Simulated heads are from MODFLOW model using parameter values at peak of likelihood as shown in Table 1.
peak likelihood. As shown in the table, standard deviation for HK_3 and HK_4 is 0.296, the biggest, and 0.103, the smallest, respectively, while those for the other 6 parameters are between them. The MODFLOW model with 8 parameter values at peak of likelihood from Bayesian inference as listed in Table 1 is executed, and the simulated heads at well HZ-5 is compared to 248 monthly observations. As shown in Fig. 11, although the model didn’t perform perfectly at those times with drastic head changes, the simulated values match the general trend of head history well. In fact, the mean and maximum abstract error between simulated and observed head is only 1.11 and 4.87 m respectively, which are very small considering head values are over 300 m at HZ-5. This comparison demonstrates the accuracy of the proposed stochastic inversion methodology. The relatively large discrepancy during sharp head changes are possibly caused by significant heterogeneity of each hydrogeologic unit, which is treated homogeneous in MODFLOW model in this study. As a result, the MODFLOW model is not sensitive enough to track the abrupt changes of flow conditions. In other words, the relatively high error originates from the MODFLOW model, not the stochastic inversing modeling.
5. Conclusions In this paper, a surrogate model approximating a regional groundwater flow model was developed using bagging multivariate adaptive regression spline (BMARS) algorithms. The cheap BMARS model replaced a high-fidelity groundwater flow model in Sobol’ sensitivity analysis and Bayesian MCMC inversion, both of which require a large number of model iterations. The BMARSbased Bayesian framework was successfully implemented and demonstrated by stochastically calibrating a regional MODFLOW model in Al-Fara area in Oman. The simulated heads with parameter values at peak of likelihood show good match with observations in well HZ-5, demonstrating reasonable accuracy of developed stochastic inversing modeling method. In the test case, BMARS-based Bayesian inference entailing 50 thousands model runs was accomplished in no time, while a single MODFLOW model simulation can cost around 8 min. The developed stochastic optimization framework was demonstrated computationally highefficient, indicating that more complicated hydrologic system can be addressed when the cost of physical model becomes unaffordable. Sobol’ total order sensitivity indices can also be rapidly calculated using the BMARS model, providing a powerful variance-based global sensitivity analysis tool. Future work will be focused on extending the BMARS technique to more realistic and complicated problems that incorporate higher-resolution grids, variable density flow conditions (e.g. seawater intrusion), contaminant reactive transport simulation, or
water resources management and contamination risk assessment, aspects that could impossibly be coped with physical hydrologic models. It is because the developed framework methodology is model-independent, and any model other than MODFLOW, or a chain of models, could be readily and flexibly integrated. In addition to hydrogeologic property parameters, operational parameters such as well number, location and abstraction rate, can be treated as uncertain variables to be optimized stochastically to facilitate groundwater resources management. This operation-related optimization will also be better constrained, once a hydrologic model in natural condition is achieved as described above. The obtained posterior PDFs can be used as prior PDF for another deterministic or stochastic optimization. While only hydraulic head observation data are used in this case study, various data sources, such as flux, could be jointly inversed. This kind of extension can be accomplished by developing a new BMARS model using the updated training dataset generated from the existing representative MODFLOW simulations, where flux is already simulated. It has been emphasized in the context that the primary computational cost for BMARS-based Bayesian framework is for running the suite of representative simulations, and the time cost for BMARS model fitting, validation and the inversion process is trivial. With the new data, the surrogate-based inversion can be conducted to update posterior distributions conveniently without requirement to run physical models. It should be noted that BMARS model induces additional errors from MODFLOW model. Although it is cross-validated against MODFLOW model in this study, it would be interesting to investigate its impact and propagation to posterior distributions by taking advantage of Bayesian inference (Smith et al., 2015). In this study, the errors between BMARS predicted and observed data are assumed to follow multivariate normal distribution to simplify the Bayesian treatment, which are usually correct for large number of samples. However, a more rigorous discussion on likelihood functions is expected in future application of this Bayesian framework (Schoups and Vrugt, 2010). Acknowledgements This work was performed under grant #SR/SCI/ETHS/11/01. We would like to thank Center for Information System and Water Research Center at Sultan Qaboos University for research facility management. We also appreciate the Ministry of Regional Municipalities and Water Resources of Oman for field data. References Amerjeed, M., 2016. Estimation of Groundwater Recharge in Northern Oman Mountains Using Multiple Techniques. Sultan Qaboos University, Oman (Ph.D. thesis). Bear, J., 1972. Dynamics of Fluids in Porous Media. Elsevier, New York, p. 764.
M. Chen et al. / Journal of Hydrology 557 (2018) 826–837 Behzadian, K., Kapelan, Z., Savic, D., Ardeshi, R.A., 2009. Stochastic sampling design using a multi-objective genetic algorithm and adaptive neural networks. Environ. Model. Software 24 (4), 530–541. Breiman, L., 1996. Bagging predictors. Mach. Learn. 24, 123–140. Bühlmann, P., Yu, B., 2002. Analyzing bagging. Ann. Stat. 30, 927–961. Bühlmann, P., 2003. Bagging, subagging and bragging for improving some prediction algorithms. In: Akrita, M.G., Politis, D.N. (Eds.), Recent Advances and Trends in Nonparametric Statistics. JAI Press (Elsevier), pp. 19–34. Carrera, J., 1988. State of the art of the inverse problem applied to the flow and solute transport problems, groundwater glow and quality modeling. NATO ASI Ser. 224, 549–583. Carrera, J., Alcolea, A., Medina, A., Hidalgo, J., Luit, J., 2005. Inverse problem in hydrogeology. Hydrogeol. J. 13, 206–222. https://doi.org/10.1007/s10040-0040404-7. Chen, M., Sun, Y., Fu, P., Carrigan, C.R., Lu, Z., Tong, C., Buscheck, T.A., 2013. Surrogate-based optimization of hydraulic fracturing in pre-existing fracture networks. Comput. Geosci. 58, 69–79. https://doi.org/10.1016/j. cageo.2013.05.006. Chen, M., Tompson, A.F.B., Mellors, R.J., Ramirez, A.L., Dyer, K.M., Yang, X., Wagoner, J.L., 2014. An efficient Bayesian inversion of a geothermal prospect using a multivariate adaptive regression spline method. Appl. Energy 136, 619–627. https://doi.org/10.1016/j.apenergy.2014.09.063. Chen, M., Tompson, A.F.B., Mellors, R.J., Abdalla, O., 2015. An efficient optimization of well placement and control for a geothermal prospect under geological uncertainty. Appl. Energy 137, 352–363. https://doi.org/10.1016/j. apenergy.2014.10.036. Chen, M., Izady, A., Abdalla, O., 2017. An efficient surrogate-based simulationoptimization method for calibrating a regional MODFLOW model. J. Hydrol. 544, 591–603. https://doi.org/10.1016/j.jhydrol.2016.12.011. Dai, Z., Samper, J., 2004. Inverse problem of multicomponent reactive chemical transport in porous media: formulation and applications. Water Resour. Res. 40, W07407. https://doi.org/10.1029/2004WR003248. Dai, Z., Middleton, R., Viswanathan, H., Fessenden-Rahn, J., Bauman, J., Pawar, R., Lee, S., McPherson, B., 2014. An integrated framework for optimizing CO2 sequestration and enhanced oil recovery. Environ. Sci. Technol. Lett. 1, 49–54. de Marsily, G., Delhomme, J., Delay, F., Buoro, A., 1999. 40 years of inverse problem in hydrogeology. CR Acad. Sci. Ser. II A Earth Planet Sci. 329 (2), 73–87. Doherty, J., 2005. PEST: Model Independent Parameter Estimation, User Manual. Watermark Numerical Computing. Efendiev, Y., Datta-Gupta, A., Ginting, V., Ma, X., Mallick, B., 2011. An efficient twostage Markov chain Monte Carlo method for dynamic data integration. Water Resour. Res. 41, W12423. https://doi.org/10.1029/2004WR003764. Fen, C.S., Chan, C.C., Cheng, H.C., 2009. Assessing a response surface-based optimization approach for soil vapor extraction system design. J. Water Resour. Plann. Manage. 135 (3), 198–207. Friedman, J.H., 1991. Multivariate adaptive regression splines. Ann. Stat. 19 (1), 1–67. Fu, J., Gomez-Hernandez, J., 2009. Uncertainty assessment and data worth in groundwater flow and mass transport modeling using a blocking Markov Chain Monte Carlo method. J. Hydrol. 364, 328–341. Geo-Resources Consultants, 2006. Drilling and Aquifer Testing in the Northern Batinah. Gilks, W., Richardson, S., Spiegelhalter, D., 1996. Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC. 486 pp. Hassan, A.E., Bekhit, H.M., Chapman, J.B., 2009. Using Markov Chain Monte Carlo to quantify parameter uncertainty and its effect on predictions of a groundwater flow model. Environ. Modell. Software 24 (6), 749–763. Hill, M.C., Tiedeman, C.R., 2007. Effective Groundwater Model Calibration, with Analysis of Data, Sensitivities, Predictions, and Uncertainty. John Wiley, New York, p. 480. Izady, A., Abdalla, O., Ahmadi, T., Chen, M., 2017. An efficient methodology to design optimal groundwater level monitoring network in Al-Buraimi region. Oman. Arab. J. Geosci. 10 (26), 1–14. https://doi.org/10.1007/s12517-016-2802-2. Jordan, A.B., Stauffer, P.H., Harp, D., Carey, J.W., Pawar, R., 2015. A response surface model to predict CO2 and brine leakage along cemented wellbores. Int. J. Greenh. Gas Control 33, 27–39. https://doi.org/10.1016/j.ijggc.2014.12.002. Kavetski, D., Kuczera, G., Franks, S.W., 2006. Bayesian analysis of input uncertainty in hydrologic modeling: 2. Application. Water Resour. Res. 42, W03408. https:// doi.org/10.1029/2005WR004376. Keating, E.H., Harp, D.H., Dai, Z., Pawar, R.J., 2016. Reduced order models for assessing CO2 impacts in shallow unconfined aquifers. Int. J. Greenh. Gas Control 46, 187–196. https://doi.org/10.1016/j.ijggc.2016.01.008. Kerrou, J., Renard, P., 2010. A numerical analysis of dimensionality and heterogeneity effects on advective dispersive seawater intrusion processes. Hydrogeol. J. 18 (1), 55–72. https://doi.org/10.1007/s10040-009-0533-0. Kerrou, J., Renard, P., Lecca, G., Tarhouni, J., 2010. Grid-enabled Monte Carlo analysis of the impacts of uncertain discharge rates on seawater intrusion in the Korba aquifer (Tunisia). J. Sci. Hydrol. 55 (8), 1325–1336. https://doi.org/10.1080/ 02626667.2010.519706. Khaleghi, B., Khamis, A., Karray, F.O., Razavi, S.N., 2013. Multisensor data fusion: a review of the state-of-the-art. Inform. Fusion 14 (1), 28–44. Laloy, E., Rogiers, B., Vrugt, J.A., Mallants, D., Jacques, D., 2013. Efficient posterior exploration of a high-dimensional groundwater model from two-stage Markov chain Monte Carlo simulation and polynomial chaos expansion. Water Resour. Res. 49, 2664–2682. https://doi.org/10.1002/wrcr.20226.
837
McKay, M., Beckman, R., Conover, W., 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21 (2), 239–245. R.G. Niswonger, S. Panday, M. Ibaraki, MODFLOW-NWT, A Newton formulation for MODFLOW-2005: U.S. Geological Survey Techniques and Methods 6–A37, 2011, 44 p. Oladyshkin, S., de Barros, F.P.J., Nowak, W., 2012. Global sensitivity analysis: a flexible and efficient framework with an example from stochastic hydrogeology. Adv. Water Resour. 37, 10–22. https://doi.org/10.1016/j. advwatres.2011.11.001. Oliver, D., Cunha, L., Reynolds, A., 1997. Markov chain Monte Carlo methods for conditioning a log permeability field to pressure data. Math. Geosci. 29, 61–91. Picard, R.R., Cook, R.D., 1984. Cross-validation of regression models. J. Am. Stat. Assoc. 79, 575–583. Poeter, E.P., Hill, M.C., 1999. UCODE, a computer code for universal inverse modeling. Comput. Geosci. 25, 457–462. https://doi.org/10.1016/S0098-3004 (98)00149-6. Rajabi, M.M., Ataie-Ashtiani, B., Simmons, C.T., 2015. Polynomial chaos expansions for uncertainty propagation and moment independent sensitivity analysis of seawater intrusion simulations. J. Hydrol. 520, 101–122. https://doi.org/ 10.1016/j.jhydrol.2014.11.020. Rajabi, M.M., Ataie-Ashtiani, B., 2016. Efficient fuzzy Bayesian inference algorithms for incorporating expert knowledge in parameter estimation. J. Hydrol 536, 255–272. https://doi.org/10.1016/j.jhydrol.2016.02.029. Razavi, S., Tolson, B.A., Burn, D.H., 2012. Review of surrogate modeling in water resources. Water Resour. Res. 48, W07401. https://doi.org/10.1029/ 2011WR011527. Regis, R.G., Shoemaker, C.A., 2007. A stochastic radial basis function method for the global optimization of expensive functions. Informs J. Comput. 19 (4), 497–509. Sadeghi-Tabas, S., Akbarpour, A., Pourreza, Bilondi M., Zahra, Samadi S., 2016. Toward reliable calibration of aquifer hydrodynamic parameters: characterizing and optimization of arid groundwater system using swarm intelligence optimization algorithm. Special Issues Water Resour. Arid Areas Arab. J. Geosci. 9, 719. Sanz, E., Voss, C.I., 2006. Inverse modeling for seawater intrusion in coastal aquifers: insights about parameter sensitivities, variances, correlations and estimation procedures derived from the Henry problem. Adv. Water Resour. 29, 439–457. https://doi.org/10.1016/j.advwatres.2005.05.014. Shang, H., Wang, W., Dai, Z., Duan, L., Zhao, Y., Zhang, J., 2016. An ecology-oriented exploitation mode of groundwater resources in the northern Tianshan Mountain, China. J. Hydrol. 543, 386–394. https://doi.org/10.1016/j. jhydrol.2016.10.012. Schoups, G., Vrugt, J.A., 2010. A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resour. Res. 46 (10), W10531. https://doi.org/ 10.1029/2009WR008933. Simpson, T.W., Mistree, F., 2001. Kriging models for global approximation in simulation-based multidisciplinary design optimization. AIAA J. 39 (12), 2233– 2241. Smith, T.J., Marshall, L.A., 2008. Bayesian methods in hydrologic modeling: a study of recent advancements in Markov chain Monte Carlo techniques. Water Resour. Res. 44, W00B05. Smith, T.J., Marshall, L.A., Sharma, A., 2015. Modeling residual hydrologic errors with Bayesian inference. J. Hydrol. 528, 29–37. https://doi.org/10.1016/j. jhydrol.2015.05.051. Sobol, I.M., bol’ 1993. Sensitivity estimates for non-linear mathematical models. Math. Modeling Comput. Exp 4, 407–414. Sobol, I.M., 2001. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55, 271–280. https://doi. org/10.1016/S0378-4754(00)00270-6. Sochala, P., Le Maître, O.P., 2013. Polynomial chaos expansion for subsurface flows with uncertain soil parameters. Adv. Water Resour. 62, 139–154. https://doi. org/10.1016/j.advwatres.2013.10.003. Sudret, B., 2008. Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 93, 964–979. https://doi.org/10.1016/j.ress.2007.04.002. Tarantola, A., 2004. Inverse problem theory and method for model parameter estimation. SIAM, Philadelphia, PA. Tonkin, M., Doherty, J., 2009. Calibration-constrained Monte Carlo analysis of highly-parameterized models using subspace techniques. Water Resour. Res. 45 (12), W00B10. Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., Higdon, D., Robinson, B.A., Hyman, J.M., 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. Int. J. Nonlin. Sci. Numer. Simul. 10 (3), 273–290. Wagner, B.J., 1995. Recent advances in simulation-optimization groundwater management modeling. Rev. Geophys. 33 (S2), 1021–1028. Wei, P., Lu, Z., Yuan, X., 2013. Monte Carlo simulation for moment-independent sensitivity analysis. Reliab. Eng. Syst. Saf. 110, 60–67. https://doi.org/10.1016/j. ress.2012.09.005. Zeng, L., Shi, L., Zhang, D., Wu, L., 2012. A sparse grid based Bayesian method for contaminant source identification. Adv. Water Resour. 37, 1–9. Zhang, X.S., Srinivasan, R., Van Liew, M., 2009. Approximating SWAT model using artificial neural network and support vector machine. J. Am. Water Resour. Assoc. 45 (2), 460–474.