Advances in Water Resources 36 (2012) 51–63
Contents lists available at ScienceDirect
Advances in Water Resources journal homepage: www.elsevier.com/locate/advwatres
Impact of hydrogeological data on measures of uncertainty, site characterization and environmental performance metrics Felipe P.J. de Barros a,⇑, Souheil Ezzedine b, Yoram Rubin c a
Dept. of Geotechnical Engineering and Geo-sciences, Universitat Politecnica de Catalunya-BarcelonaTech, Jordi Girona 1-3, 08034 Barcelona, Spain NSED, Lawrence Livermore National Laboratory (LLNL), 7000 East Ave., L-188, Livermore, CA 94550, USA c Dept. of Civil and Environmental Engineering, University of California, Berkeley, 627 Davis Hall, Berkeley, CA 94720-1710, USA b
a r t i c l e
i n f o
Article history: Available online 20 May 2011 Keywords: Stochastic hydrogeology Human health risk assessment Conditioning Information theory Uncertainty reduction Contaminant transport
a b s t r a c t The significance of conditioning predictions of environmental performance metrics (EPMs) on hydrogeological data in heterogeneous porous media is addressed. Conditioning EPMs on available data reduces uncertainty and increases the reliability of model predictions. We present a rational and concise approach to investigate the impact of conditioning EPMs on data as a function of the location of the environmentally sensitive target receptor, data types and spacing between measurements. We illustrate how the concept of comparative information yield curves introduced in de Barros et al. [de Barros FPJ, Rubin Y, Maxwell R. The concept of comparative information yield curves and its application to risk-based site characterization. Water Resour Res 2009;45:W06401. doi:10.1029/2008WR007324] could be used to assess site characterization needs as a function of flow and transport dimensionality and EPMs. For a given EPM, we show how alternative uncertainty reduction metrics yield distinct gains of information from a variety of sampling schemes. Our results show that uncertainty reduction is EPM dependent (e.g., travel times) and does not necessarily indicate uncertainty reduction in an alternative EPM (e.g., human health risk). The results show how the position of the environmental target, flow dimensionality and the choice of the uncertainty reduction metric can be used to assist in field sampling campaigns. Ó 2011 Elsevier Ltd. All rights reserved.
1. Introduction Subsurface formations are, in general, heterogeneous and often characterized through a sparse number of measurements of head and hydraulic conductivity or through a series of geophysical measurements and surveys [1–3]. The spatial variability of hydrogeological properties together with the high cost of data acquisition prohibit full detailed site characterization, thus leading scientists and engineers to rely on probabilistic methods for assessing contaminant plume migration and its impact on human health [4–6]. For such reasons, the probabilistic paradigm for subsurface characterization has been receiving increasing attention in the scientific community as well as by environmental regulation agencies (e.g., [7,8,5,9–11]). In a probabilistic framework, aquifer properties are considered as Space Random Functions (SRFs) [6]. Data collected during site characterization is used to: (1) infer the hydrogeological parameters and to estimate their corresponding uncertainty and (2) to update conceptual models used for prediction, for example, using geostatistical methods [12–14,6,15,16]. Subsequently, these esti⇑ Corresponding author. E-mail addresses: (F.P.J. de Barros).
[email protected],
[email protected]
0309-1708/$ - see front matter Ó 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.advwatres.2011.05.004
mated parameters and their associated uncertainty serve as input to hydraulic conductivity random field generators, flow and transport models. However, one of the main challenges in hydrogeological sciences and in applications is in quantifying the worth of data acquisition in terms of improved model prediction and reduction in uncertainty. For instance, de Barros et al. [17] showed how characterization needs vary according to the concentration sampling volume, plume size and Péclet number using a risk-driven approach. Moreover, de Barros and Rubin [9] and de Barros et al. [17] demonstrated that under certain transport conditions, human health risk uncertainty reduction could benefit more from characterizing health-related parameters and populational (behavioral) statistics compared to geological site investigation. Along a similar line, Nowak et al. [18] illustrated that improving the prediction of two distinct metrics (concentration and arrival time variances) require different sampling schemes of head and hydraulic conductivity measurements. These works indicate that further research efforts should be invested in understanding the relationship between prediction goals, site characterization and quantifying the associated uncertainty. Our goal is to investigate how the value of hydrogeological data varies according to: (i) the prediction to be modeled and (ii) the choice of metric used to quantify the associated uncertainty. The results from the present study will help answer the following
52
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
relevant questions: How does the information gained from a sampling scheme translates into uncertainty reduction as a function of the prediction goal, position of the environmentally sensitive target and data types available? How does this information, obtained through sampling, propagates through different measures of uncertainty reduction (variances, coefficient of variations, the spread of probability distribution functions, etc.)? In the current work, predictions that we shall evaluate against data include concentrations, travel times and human health risk. We will refer to these predictions as environmental performance metrics (EPMs). We will illustrate how alternative measures of uncertainty for these EPMs yield different interpretations of uncertainty reduction for a given sampling scheme. In addition, we will show how the contribution of site characterization towards uncertainty reduction within an EPM varies with the position of the environmentally sensitive target relative to the source and data types. To analyze this problem, the concept of comparative information yield curves (CIYC) provided in de Barros et al. [17] will be used allowing one to graphically map the expected value of the uncertainty reduction in the SRF parameters in the EPMs.
Although other EPMs types exist, such as solute fluxes [22], in this work we will focus only on C, s and r. We also assume that the conceptual model for a is known. In the following subsections, we will investigate each of the above mentioned EPMs and their relevant sources of uncertainty.
2. Model predictions and uncertainty
To compute the concentration from an instantaneously released contaminant pulse of dimensions 2Li in the ith direction, with i = 1, . . . , m at time to = 0, we will make use of the Lagrangian framework [23–26]. Modeling concentration through the Eulerian framework is an alternative as shown in [8,27]. A solute particle situated at the position vector a within the contaminant source with initial concentration Co(a) [M/L3] will travel along a streamtube following a trajectory X(tja). Here, we are interested in evaluating conservative estimates in risk-related problems, therefore we neglect effects due to pore-scale dispersion and transport is considered to be purely advective. The purely advective formulation also provides an upper bound in the concentration uncertainty. The concentration at any point in space x and time t is given by:
Suppose we wish to determine the outcome of a prediction denoted by a. The prediction a will be referred as an environmental performance metric (EPM). Here, a can represent human health risk, well drawdown, travel time or the concentration of a contaminant that can be evaluated at any point in space x and time t. The EPM a is determined from a set of input parameters h inferred from information I o . For example, I o can include actual measurement values (e.g., hydraulic conductivity). The EPM a is related to h and I o by a transfer function s (for instance, a governing partial differential equation or a risk model):
a ¼ sðhjI o Þ:
ð1Þ
Since we are interested in hydrogeological applications, h contains relevant parameters such as porosity, head gradients, hydraulic conductivity, chemical reaction rates or health parameters (needed to evaluate adverse health effects due to groundwater contamination). Due to incomplete characterization, h is uncertain and the uncertainty in h is propagated to a through s. More information in I o leads to better estimates of h, and consequently tighter uncertainty bounds on a. I o could also be used to update the prediction model, e.g., s in Eq. (1). Because of incomplete site characterization, we can only estimate the statistical moments of a such as the mean hai and its variance r2a . Whenever possible, the ultimate goal is determining the probability density function (pdf) of the EPM. From the statistical moments of a, we can measure the amount of uncertainty in the EPM through some appropriate metric Ma ðI o Þ f ðs; I o Þ. For instance, Ma ðI o Þ can be represented by the coefficient of variation CVa = ra/hai or the variance r2a . In this work, we will aim at problems related to groundwater contamination and we will investigate the following EPMs: 1. Concentration: In most studies, the main focus is on the magnitude of the concentration a = C and if it exceeds a pre-defined maximum concentration level (e.g., MCL, [19]). 2. Travel times: In some cases, authorities are concerned if a contamination will reach a control plane before a specified time. In this case, the EPM is travel time a = s [20]. 3. Human health risk: The most important matter from the socioeconomical perspective is if the contamination, together with the physiological and behavioral characteristics of the exposed population, will cause adverse health effects [7,5,9,21]. In this case, the EPM is human health risk a = r [19,11].
3. Problem formulation Consider an aquifer with spatially variable log-conductivity Y = ln K and uniform porosity ne. Let x = (x1, . . . , xm) (with m 6 3) denote the Cartesian vector with m being the dimensionality of the flow domain. We assume Y to be statistically stationary and characterized by its mean mY, variance r2Y and covariance function CY (assumed to be known). The integral scale of Y is given by IY,i with i = 1, . . . , m. For simplicity, we assume that flow is at steadystate and uniform-in-the-average along the longitudinal direction with mean velocity U = (U1, 0, 0). Because of heterogeneity, the plume will meander with the spatially variable velocity v(x). 3.1. Predicting concentrations
Cðx; tÞ ¼
Z
Z
L1
Lm
C o ðaÞd½x XðtjaÞda;
... L1
ð2Þ
Lm
where d is the Dirac operator. The concentration C in Eq. (2) will be characterized by its mean, hCi and variance r2c . In this case, the parameter vector h represents the SRF parameters, i.e., h ¼ ðmY ; r2Y ; IY;i Þ inferred from the information available I o . To obtain closed form expressions for hCi and r2c , we will make use of the first-order expansion results from the literature [28,25]. Alternatives approaches can be found in the literature (e.g., [29,30]). Assuming Co to be constant and with the aid of the Gaussian particle displacement pdf, the expected concentration becomes [24]:
hCðx; tjI o ; hÞi ¼ C o
m Y
ui ðx; tjI o ; hÞ;
ð3Þ
i¼1
where
" # " # 1 xi þ Li U i ðI o ; hÞt 1 xi Li U i ðI o ;hÞt ui ðx; tjI o ; hÞ ¼ erf pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi erf pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; 2 2 2X ii ðtjI o ;hÞ 2X ii ðtjI o ; hÞ ð4Þ where Xii is the one-particle displacement covariance function and ui is an auxiliary function. The variance r2c is easily obtained and provides upper bounds for the concentration fluctuations (e.g., [28]):
r2c ðx; tjI o ; hÞ ¼ hCðx; tjI o ; hÞi2
Co 1 : hCðx; tjI o ; hÞi
ð5Þ
With the expressions above, we can evaluate the mean and variance of the concentration at any point in space. Eqs. (3) and (5) are valid for very small sampling volumes and r2Y K 1. For larger
53
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
sampling volumes, refer to [28,24,26,31]. The information provided in Eqs. (3) and (5) can be used as input for a defined C pdf model [32,31]. In the present case, the C pdf is bimodal (see [23]). 3.2. Predicting travel times The travel time s between the contaminant source and the control plane situated at a distance Lcp (perpendicular to the mean flow direction) is a significant quantity for modeling solute transport. Under the stochastic paradigm, s is a random variable. For a solute released at time to = 0, its cumulative distribution function (cdf) Gs(s) is given by [23,33,6]:
( ) 1 Lcp U 1 ðI o ; hÞt p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi Gs ðsjI o ; h; Lcp Þ ¼ erfc 2 2X 11 ðtjI o ; hÞ
ð6Þ
with mean and variance given by:
sðLcp jI o ; hÞ ¼
Z
r2s ðLcp jI o ; hÞ ¼ 2
1
Z0
rðxo ; tp jb; I o ; hÞ ¼ b C e ðxo ; t p jI o ; hÞ ;
r2r ðxo ; tp jb; I o ; hÞ ¼ b2 Var C e xo ; tp jI o ; h ;
F R ðr jb; I o ; hÞ ¼ Pr½r 6 r jb; I o ; h:
ð13Þ
One of the challenges in predicting human health risk lies on the fact that an additional source of uncertainty arises from the human health parameters (denoted here by b, see Eq. (10)). To account for its uncertainty, we marginalize Eq. (13) with the pdf for b given by fb(b):
Z
F R ðr jb; I o ; hÞfb ðbÞdb:
ð14Þ
b
1 Gs ðtjI o ; h; Lcp Þ dt;
t½1 Gs ðtjI o ; h; Lcp Þdt
ð11Þ ð12Þ
with Var[] denoting the variance operator. Following de Barros and Rubin [9], we assume that risk is log-normally distributed with pdf fR ðrjb; I o ; hÞ and cdf F R ðrjb; I o ; hÞ. Given F R ðrjb; I o ; hÞ, one can evaluate the probability that risk is below a threshold value r⁄ stipulated by an environmental agency:
F R ðr jI o ; hÞ ¼
1
ð7Þ
sðLcp jI o ; hÞ 2 :
ð8Þ
0
The model for Gs(s) was derived assuming a Gaussian particle displacement pdf. This is a reasonable assumption because of the Central Limit Theorem: The total displacement can be viewed as the sum of many small displacements [34]. Furthermore, since we are dealing with uniform-in-the-average flow condition along the longitudinal direction, the relevant component is i = 1. Although our choice for Gs(s) and its moments are based on the works of Dagan [23] and Rubin and Dagan [35], the literature provides a wide range of different models [36–39]. 3.3. Predicting human health risk In many applications, human health risk is the final and ultimate EPM [19,40,7,41]. In fact, human health risk may be perceived as the EPM that decision makers are most concerned since it relates contamination to adverse health effects. In subsequent analysis, we make use of the commonly-used carcinogenic linear risk model for the groundwater ingestion pathway [19]:
r ¼ b C e with : IR ED EF b ¼ fmo ; BW AT
ð9Þ ð10Þ
where IR is the ingestion rate of water (l/day), BW body weight (kg), AT is the average time of the expected lifetime (days), ED is the exposure duration (years), EF is the daily exposure frequency (day/year) and fmo metabolized fraction of a certain carcinogenic contaminant from ingestion [19,7,5]. For notation purposes, we lump all these health parameters within b such that we can make a distinction between classes of parameters. Ce is the concentration of exposure at the location of the environmentally sensitive target xo at time tp in which the peak concentration occurs [24]. In the worst case scenarios, Ce is equal to the peak of the resident concentration such as the one defined in Section 3.1 but it can also represent other types of concentrations measured in field such as volume-averaged, flux-averaged concentrations or averaged over the exposure duration. Additional discussion on choice of the concentration average (both in space and time) and its implications in human health risk can be found in Maxwell and Kastenberg [41], Hassan et al. [42] and de Barros and Rubin [9]. Given the uncertainty in hydrogeological properties and transport parameters, risk can be characterized through its first two statistical moments:
We focus only on the uncertainty of b for a single individual. Issues related to inter-individual variability were addressed in [41]. Summarizing, uncertainty in r is due to uncertainty in both the concentration of exposure and health-related parameters (cancer potency factors, metabolized fractions, dose–response models, etc.) and has been subject of study in the literature [7,41,5,17,43,44]. 4. Conditioning and uncertainty reduction 4.1. Conditioning particle displacement statistics As more data is collected, the estimates for the model parameters could change as well as their corresponding uncertainties. In the expressions given in Sections 3.1, 3.2 and 3.3, the hydrogeological field data is manifested through X ii ðtjI o ; hÞ and U i ðI o ; hÞ. Here, our database consists of N log-conductivity data I o ¼ ðY 1 ; Y 2 ; . . . ; Y N Þ where Y j ¼ Yðxj Þ; j ¼ 1; . . . ; N and xj represent the sample locations (see [45]). With the additional assumption that Yj and the velocity are jointly multi-Gaussian, we can express the conditional particle displacement covariance X ij ðtjI o ; hÞ in terms of the conditional velocity covariance ucij and the cross-covariance between Ui and Y denoted by C U i Y , as follows (see Rubin [45] and Chapters 6 and 10 of Rubin [6]):
X cij X ij ðtjI o ; hÞ ¼
Z Z t
t
h Ei D 0 00 ucij X ci ðt0 Þ ; X cj ðt 00 Þ dt dt ;
ð15Þ
where
ucii ðx; x0 Þ ¼ uii ðx; x0 Þ
N X
kp ðxÞC Ui Y ðx0 ; xp Þ;
ð16Þ
p¼1
and N X
kp ðxÞC Y ðxp ; xk Þ ¼ C Ui Y ðx; xk Þ:
ð17Þ
p¼1
The assumption of multi-Gaussianity for Yj and the velocity is reasonable for uniform-in-the-average flow condition for r2Y K 1 as shown in [45,46]. Expressions for uii ; C U i Y and CY are reported in the literature [45,47–49,6] and we shall not replicate them here. The superscript ‘‘c’’ denotes conditioning and the coefficients kp (with p = 1, . . . , N) are the solutions of the linear system in Eq. (17). The integration procedure necessary to evaluate Eqs. (15)–(17) is a cumbersome task and time consuming. Ezzedine [48] provides a fast procedure for calculating the three-dimensional, axi-symmetric, covariance
54
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
and cross-covariance functions needed to evaluate Eqs. (15)–(17). The conditioning methodology chosen here aligns well within the selected analytical framework. However, there is a wide spectrum of methods that can be found in the literature [50,13,51,15,52,53] and the choice of method for conditioning depends on the application as well as the available computational resources. 4.2. Parametric distribution, information yield curves and metrics of uncertainty reduction The joint use of the prior SRF parameters’ pdf, denoted here by pprior(h), with the measurement pdf, given by fI o jh ðI o jhÞ (with I o representing measurements), allows one to estimate the posterior SRF function pdf parameters through Bayes theorem:
fhjI o ðhjI o Þ ¼ R
fI o jh ðI o jhÞpprior ðhÞ
: fI o jh I o j~h pprior ~h d~h
ð18Þ
Here, pprior(h) is determined using the Maximum Relative Entropy (MRE) principle [54,33,55]. Because we are interested in assessing the impact of measurements on concentration moments or on travel times pdf, it is more appropriate to relate a measure of uncertainty in the EPM estimates (e.g., its coefficient of variation or inter-percentile range) to a single scalar value that lumps together all the information contained in h. We will use of the relative entropy of h, denoted by REh [12,56,6], to quantify the impact of additional measurements on fhjI o ðhjI o Þ relative to the prior pprior(h). Following Christakos [12], we have:
REh ðI o Þ ¼
Z h
"
# fhjI o ðhjI o Þ dh P 0; fhjI o ðhjI o Þ ln pprior ðhÞ
ð19Þ
REh ðI o Þ ¼ 0 iff f hjI o ðhjI o Þ pprior ðhÞ: For each amount of information I o , we can compute a scalar value of REh ðI o Þ. When fhjI o ðhjI o Þ is equal to pprior ðhÞ; REh ðI o Þ is equal to zero. As more data is added to I o ; REh ðI o Þ becomes larger in magnitude [12]. Thus REh ðI o Þ represents the ‘‘distance’’ between fhjI o ðhjI o Þ and pprior(h). Higher values of REh imply larger differences between the conditional and unconditional stages. Consequently, for a given sampling scheme, we can relate REh ðI o Þ to the following quantities: 1. The EPM a ¼ sðhjI o Þ, see Eq. (1), with a = (C, s, r) and its statistical moments of a, i.e., hai and r2a . 2. A measure Ma ðI o Þ that quantifies the gain of information (or decrease of uncertainty) of the EPM at information level I o . By plotting REh ðI o Þ versus a measure of uncertainty reduction Ma ðI o Þ for a, we can evaluate graphically the impact of measurements in reducing the uncertainty of the EPM. These curves, Ma ðI o Þ versus REh ðI o Þ, are called Comparative Information Yield Curves (CIYC) and its main properties and concepts were developed in de Barros and Rubin [9] and de Barros et al. [17] within a riskdriven approach for site characterization. The CIYC was also applied in an optimal design context by Nowak et al. [18]. Its objective is to evaluate, graphically, the uncertainty reduction in the EPM a as a function of the reduction of uncertainty in h. We will use the following two metrics for Ma ðI o Þ:
unc ra rca ; runc
e ¼
CV a CV unc a
5. Illustration and discussion 5.1. Data used in simulations We perform 2D and 3D simulations to illustrate the concepts presented in the current work. The analysis is done for both axisymmetric 3D and isotropic 2D aquifers for comparison using an exponential Y covariance function. Here, we denote IY,1 = IY,2 = IY and IY,3 = IY,v as the horizontal and vertical Y integral scales, respectively. The anisotropic ratio is given by f = IY,v/IY. The statistics of the EPMs, a = (C, s, r), are obtained first from unconditional simulations by using some prior knowledge. Afterward, measurements of both log-conductivity Y and velocity V (at specified dimensionless locations n ¼ x1 =IY ) are collected. Sampling locations are spaced in terms of integral scales in order to avoid measurement redundancy and for demonstration purposes. We do not account for measurement errors in the current analysis. For a framework that yields optimal sampling locations refer to [18]. At each field measurement protocol, denoted by {m1}, {m2}, {m3} and {m4}, we obtain new values for the statistics of a at specific locations within the domain. Measurement protocols {m1}, {m2} and {m3} contain 3, 5 and 9Y samples respectively while {m4} contains 9Y and 9V samples. For illustration purposes, we consider only r2Y to be the uncertain SRF parameter within h. The relevant parameters and dimensionless groups used in the simulations are summarized in Table 1. Table 2 provides the dimensionless locations of the conditioning points and specifies the measurement protocols {mi}. We point
Table 1 Parameters used for analysis (in dimensionless numbers). Input data Source dimensions for 2D: (L1/IY, L2/IY) Source dimensions for 3D: (L1/IY, L2/IY, L3/IY) Dimensionless coordinates: (x1/IY, x2/IY, x3/IY) Anisotropy ratio, f = IY,v/IY Covariance function for log-conductivity
(1, 0.5) (1, 0.5, 0.1) (n, g, f) 0.5 and 1 Exponential
Table 2 Sampling locations (in dimensionless numbers – normalized by the horizontal integral scale IY : n ¼ x1 =IY ; g ¼ x2 =IY and f = x3/IY) and data types (Y and V). Sampling densities are denoted by {mi} with i = 1, 2, 3 and 4.
ð20Þ
Longitudinal sampling locations n⁄ at g = f = 0
ð21Þ
{m1} {m2} {m3} and {m4}
n⁄ = (0, 5, 10) n⁄ = (0, 2.5, 5, 7.5, 10) n⁄ = (0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10)
Measurement types {m1}, {m2} and {m3} {m4}
Measurements of Y Measurements of Y and velocity
a
c
w¼
metric (w) shows the change in the coefficient of variations (ratio between conditional CV ca and unconditional CV unc a ). A difference between Eq. (20) and Eq. (21) is that w lumps together the information of the first two moments whereas e measures the changes in the second moment. Other measures of uncertainty are formally summarized in Nowak [57]. In Section 3, the Y-field is assumed to be stationary. The same field can be made conditional to measurements, which would then lead to a conditional mean field which does not have to be stationary. This is well aligned with the Bayesian approach where the prior is taken to be stationary and it is then updated (with the measurements) to produce a non-stationary posterior. The measurements are viewed as realizations of the random process given by the same prior. This approach has been widely used in the literature [58–61,16].
with a = (C, s, r). The first metric (e) quantifies to the changes in EPM variance (relative to the unconditional case) whereas the second
55
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
out that these sampling locations were not obtained by optimal design as done in [18]. This sampling grid is used in order to investigate the effect of conditioning a on data. With measurement locations given in Table 2 and a synthetic generation of Y samples, we can now condition our EPMs (see Section 4.1). The outcome results in the conditional particle displacement covariance (Xij) as well as the updated parameters (U1 and r2Y ) necessary for the evaluation of the moments of a = (C, s, r). Fig. 1 displays conditional and unconditional Xii for the isotropic 2D case (f ? 1). 3D results are displayed in Fig. 2 for f = 0.5. Conditional estimates of the 2D and 3D displacements are lower than the unconditional ones. Fig. 3 illustrates the displacement covariances for different data types (log-conductivity and velocity). The results in Fig. 3 will be used later to compare the effects of different data types on the EPMs. By making use of Eq. (18) and the assumption of multi-Gaussian measurement distribution in I o , we can infer the pdf for h based on prior information. The result is shown in Fig. 4. These distributions will be used to evaluate REh ðI o Þ from Eq. (19) and are necessary for the construction of the CIYC (see Section 4.2). The prior is assumed to be uniformly distributed (between 0.1 and 0.8) given that it is common that prior information is given in terms of bounding values. It is worth mentioning that the prior can also be estimated using Maximum Relative Entropy (MRE) principle as presented in [54,33,55]. MRE provides an approach for modeling the pdf from
(a)
(b)
(a)
(c)
Fig. 2. Conditional and unconditional one-particle covariance functions: (a) longitudinal, (b) transverse and (c) vertical directions. Results for f = 0.5 (3D). Conditional curves for I o ¼ fm1 g; fm2 g and {m3}. See Table 2.
(b)
Fig. 1. Conditional and unconditional one-particle displacement covariance functions: (a) longitudinal and (b) transverse directions. Results for f ? 1 (2D). Conditional curves for I o ¼ fm1 g; fm2 g and {m3}. See Table 2.
information that characterizes the probability distribution incompletely (e.g., statistical moments or bounds) and selects the pdf with the highest information entropy with respect to the prior information (while satisfying the given information allowing to employ the distribution model that is least subjective). The procedure used here is detailed in Chapter 13 of Rubin [6] and applications can be found in the literature [62,16].
56
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
(a)
Fig. 4. Probability density functions (pdf) for SRF parameters (in this case r2Y ). Results obtained by making use of Bayes theorem (see Eq. (18) and procedure detailed in Section 4).
(a = C, s and r) and its dependency on the uncertainty reduction metric Ma ðI o Þ.
(b)
(c)
Fig. 3. Conditional one-particle covariance functions: (a) longitudinal, (b) transverse and (c) vertical directions. Results for f = 0.5 (3D). Comparison between data types (log-conductivity and velocity). Conditional curves for I o ¼ fm3 g and {m4}. See Table 2.
5.2. Dependence of the gain of information on EPMs and uncertainty reduction metrics For the subsequent Sections 5.2.1 and 5.2.2, we will illustrate how the gain of information varies according to different EPMs
5.2.1. Different EPMs We evaluate the worth of data for the following EPMs: resident concentrations, travel times and human health risk. For this subsection, we quantify uncertainty reduction in the EPM through Ma ðI o Þ ¼ e, see Eq. (20). We will illustrate results for 2D and 3D flow cases for comparison. It is useful to define the following dimensionless coordinates: For 2D, we have n = x1/IY, g = x2/IY whereas for the 3D simulations we have n = x1/IY, g = x2/IY and f = x3/IY (see Table 1). 3D results were evaluated for f = 0.5 (Table 1). We will evaluate the EPMs at the plume’s center line (g = 0 in 2D and g = f = 0 in 3D) for short and intermediate travel distances (n = 2.5 and 10). Fig. 5 shows the CIYC for concentration in 2D and 3D. It depicts how concentration statistics are affected by the sampling scheme adopted at different transverse positions (plume’s centerline g = 0 and off-set g = 0.45). For each stage of information I o (unconditional, {m1}, {m2} and {m3}, see Table 2), we have a corresponding REh (Eq. (19)). The horizontal axis quantifies the information present in the distribution fhjI o ðhjI o Þ (Eq. (18)) relative to the unconditional information pprior(h) (see Eqs. 18 and 19). We can map each REh (Eq. (19)) to a corresponding value of e (Eq. (20)) at different locations of the domain. By comparing the curves in Fig. 5, we quantify the effect of additional measurements on e (Eq. (20)). Fig. 5 shows that the CIYC is a useful tool for planning of data acquisition, because it shows how the EPM (at different locations) could be affected by the data. Analyzing the 2D isotropic case depicted in Fig. 5a, we observe that changes in e between transverse positions g = 0 and g = 0.45 are large for small travel distance n = 2.5 when compared to the curves for the intermediate distance n = 10. This is because at small travel distances, the plume is not that distorted since it did not fully sample all the heterogeneity of the aquifer. In other words, it is still strongly conditioned by its initial conditions and a few Y measurements help reduce the concentration variance. The difference between the curves at g = 0 and g = 0.45 for n = 2.5 is also due to the fact that the concentration variance is bimodal at small travel distances as shown in Figs. 6 and 7 [28,26]. The target position (n, g) = (2.5, 0.45) is near the plume’s fringe on
57
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63 2D, f → ∞, α = C
1
3D, f = 0.5, α = C
1.4
(ξ, η, ζ) = (2.5, 0, 0)
1.2
0.8
(ξ, η, ζ) = (2.5, 0.45, 0) (ξ, η, ζ) = (10, 0, 0)
1
(ξ, η, ζ) = (10, 0.45, 0)
0.6 ε
ε
0.8 (ξ, η) = (2.5, 0)
0.4
0.6
(ξ, η) = (2.5, 0.45) (ξ, η) = (10, 0)
0.4
(ξ, η) = (10, 0.45)
0.2
0.2
(a) 0 0
0.1
0.2 REθ
0.3
(b)
0 0
0.4
0.1
0.2 REθ
0.3
0.4
Fig. 5. Comparative information yield curves (CIYC) for concentration (a = C). Plots obtained for: (a) 2D and (b) 3D simulations as a function of travel distance n and transverse positions g and f. Sensitive target located along plume’s centerline (continuous curves) and fringe (dashed curves). Uncertainty reduction measured by e (Eq. 20) versus REh (Eq. 19).
Fig. 6. Normalized concentration variances in 2D, Var½C r2C =C 2o , as a function of the spatial domain at dimensionless time tU/IY = 2.5. Surface plots shown for: (a) unconditional and (b) conditional on {m3}. Off-set target position (n, g) = (2.5, 0.45). Transverse contaminant source dimension is equal to L2/IY = 0.5 to L2/IY = +0.5.
0.30
0.30
Unc.
2.5 0
0.25
(a)
(b)
0.20
0.15
Var C
Var C
0.20 m1
0.10
m3
Unc.
0.15
m2 0.10
0.05 0.00
2.5 0.45
0.25
m2
m3 0
1
2
3
4
5
6
m1
0.05
7
tU IY
0.00
0
1
2
3
4
5
6
7
tU IY
Fig. 7. Illustration of the normalized concentration variances, Var½C r2C =C 2o , as a function of dimensionless time tU/IY. Results for unconditional and all conditional cases. Concentration variance for: (a) Target aligned with the contaminant source (n = 2.5 and g = 0) and (b) target at the off-set position from the source (n = 2.5 and g = 0.45).
the halo of the variance and far from the measurements (see Fig. 6). As explained in Rubin [28], the halo shape in the concentration variance, depicted in Figs. 6 and 7, has strong implications in site characterization. For instance, Fig. 7 illustrates how conditioning changes the concentration variance as a function of the environmentally sensitive target position and time. As the position of the target approaches the plume’s fringe (g = 0.45), the differences between concentration variances computed for different sampling densities {mi} are small as depicted in Fig. 7b. This small difference leads to a small change in e for the curve evaluated at (n, g) = (2.5, 0.45) as shown in Fig. 5a. The opposite occurs for the 3D anisotropic case: in Fig. 5b, the curves at transverse positions (g, f) = (0, 0) and (g, f) = (0.45, 0) are closer together at small travel distance n = 2.5 when compared to the 2D case. Comparison between Fig. 5a and b at n = 2.5 shows that the same sampling scheme yields different changes in e
(Eq. (20)). In Fig. 5b, we can clearly observe the effect of the third dimension projected in the information value: In the 2D case (depicted in Fig. 5a), the particle pathway at small travel distance is limited when compared to the 3D trajectories. Consequently, uncertainty reduction is most likely to be smaller in 3D when compared to the 2D case. In 2D, the uncertainty is better constrained; while in 3D, more data may be needed to accomplish the same uncertainty reduction as in 2D. This explains why the curves for the 2D case at n = 10 seems to indicate a reduction in the slope when compared to the 3D curves at n = 10 (compare Fig. 5). Next, we wish to show how the value of information also changes according to the EPM. In Fig. 5 we used resident concentration statistics to evaluate gain of information. For the subsequent case, we use travel times as an alternative EPM. In addition, we also show how parametric uncertainty in h plays an important role and cannot be ignored in such analysis as pointed
58
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
out in Rubin and Dagan [35]. To obtain the CIYC for travel times, we make use of the expressions in Section 3.2. Fig. 8 depicts the CIYC in 2D and 3D for travel times to a control plane situated at n Lcp = 2.5 and 10, respectively (using Eqs. 7, 8). Results are shown with and without parametric uncertainty in h (denoted as PU in the legend of Fig. 8). Comparing Figs. 8a and 8b, we see how the same sampling scheme yields larger changes in the travel time variance in the 2D case compared to the 3D case. This is related to the difference in the dimensionality of the domain: In 3D, solute particles have more meandering pathways because of the flexibility for upward/downward movement and consequently, more measurements are needed to achieve an e equivalent to the one shown in the 2D plot (Fig. 8a). Fig. 8 illustrates how parametric uncertainty affects the CIYC: curves that contain parametric uncertainty have smaller values of e for a given REh. Comparing the results in Fig. 8 with the ones obtained for resident concentration in Fig. 5, one can see how the same sampling strategy could have different effects on the EPM variances of concentration and travel times. One of the main differences between the EPM used in Fig. 5 and the one used in Fig. 8 is that in the travel time EPM, we are only concerned with the arrival of a solute particle to a control plane Lcp (in this case, perpendicular to the main flow direction [20]). In the travel time approach, we are not so concerned where the streamline crosses the control plane, whereas for the concentration EPM, the opposite occurs: determining if a streamline will hit an environmentally sensitive target (with a small sampling window) becomes relevant. This result illustrates how the sampling design has to be aligned with the prediction to be modeled. The main implication is that there is not a sampling strategy that fits all predictions, and this is why the CIYC is important as a design and protocol tool. In addition, the velocity is correlated over much larger distances than the correlation length of the concentration (which varies more erratically) [63,64]. Fig. 8 shows that, for small travel distances, a small number of measurements is enough to sample the heterogeneity of the aquifer whereas, for increasing Lcp, more measurements are needed to capture fast flow conduits or stagnation points. The last EPM we shall investigate is the human health risk (a = r). Fig. 9 depicts how the alternative sampling schemes could affect human health risk variances. To compute human health risk, we used the resident concentration predictions from Section 3.1 while accounting for the uncertainty in the human health related parameters b using Eq. (14). We assumed, for illustration purposes, that b is log-normally distributed with mean and standard deviation equal to 5.5 and 0.6, respectively (both quantities in logarithm space). These values are based on a simple Monte Carlo simulation using data from the literature [41,5,11,21]. Many different distribu1
tions can be assigned to b as well as to each of the parameters within b, see Eq. (10) [41]. Here, we have computed risk using the peak of the mean concentration and its corresponding variance. A more appropriate procedure would be to evaluate the concentration peak at each single realization and compute its associated risk as done in [41]. An important quantity for risk assessment is the statistics of the peak exposure concentration [42,27]. In our work, we evaluated the peak of the mean concentration in lieu of the mean of the peak concentration, which could be obtained following [42,65,27]. As discussed in Hassan et al. [42], to capture the statistics of the peak exposure concentration, the relative dispersion framework [66] is more adequate. Andricevic [27] provides the formulation to obtain exposure concentration within the relative dispersion framework while a comparative analysis between analytical solutions (in relative and absolute dispersion framework) with numerical simulations is given in Hassan et al. [65]. Comparing Figs. 5, 8 and 9, we see how the choice of the EPM can influence the information gain. For instance, Figs. 9a and b contains an additional source of uncertainty related to the health parameters (characterized by b). Comparing the curves for small travel distances (n = 2.5 at both g = 0 and 0.45) in Figs. 5 and 9, we observe that these curves are closer together when the EPM is human health risk. This is expected since, for the human health risk EPM, hydrogeological data increases its significance for increasing distance from the contaminant source [9]. At locations near the contaminant source, risk uncertainty reduction benefits more from characterizing the health parameters b since the probability that the plume will bypass the environmental target is small [9,17]. 5.2.2. Alternative measures of uncertainty reduction If we select another metric for Ma ðI o Þ to quantify uncertainty reduction, such as w from Eq. (21), we could observe how the value of information manifests differently when compared to e from Eq. (20). Fig. 10 depicts the uncertainty reduction in w for a = C. As explained in Section 4.2, w measures the ratio of ‘‘dispersion’’ of the conditional concentration prediction to the unconditional one. Smaller values of w imply less uncertainty. The advantage of w over e is that the first measure lumps together the information of the first two moments (see Eq. (21)) whereas the former just quantifies the changes in the second moment (see Eq. (20)). Consequently, w is provides information differently than e. In Fig. 10a, we observe that the curves in the 2D simulation are spaced from each other whereas in the 3D results depicted in Fig. 10b, the curves are somewhat lumped together. Similar to Fig. 5a, the 2D curves (g = 0 and 0.45) in Fig. 10a for short travel distances (n = 2.5) are well separated. This is because concentration statistics are sensitive to the location of the target since the
2D, f → ∞, α = τ
0.8
(a)
0.7
0.8
(b)
0.6 0.5 ε
ε
0.6 ξ = 2.5
0.4
ξ = 10
0.2 REθ
0.3
ξ = 2.5
0.2
ξ = 10 (PU)
0.1
0.4 0.3
ξ = 2.5 (PU)
0.2
0 0
3D, f = 0.5, α = τ
ξ = 2.5 (PU) ξ = 10
0.1 0.4
0 0
ξ = 10 (PU) 0.1
0.2 REθ
0.3
0.4
Fig. 8. Comparative information yield curves (CIYC) for travel times (a = s). Plots obtained for: (a) 2D and (b) 3D simulations as a function of early and intermediate travel distance (n = 2.5 and 10). Results with and without parametric uncertainty (PU) in h (dashed and continuous curves, respectively). Uncertainty reduction measured by e (Eq. 20) versus REh (Eq. 19). Measurement locations for: (i) {m1}, n⁄ = (0, 5, 10); (ii) {m2}, n⁄ = (0, 2.5, 5, 7.5, 10); and (iii) {m3}, n⁄ = (0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10).
59
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63 2D, f → ∞, α = r
2
(a)
(ξ, η) = (2.5, 0) (ξ, η) = (2.5, 0.45) 1.5
3D, f = 0.5, α = r
2
(ξ, η, ζ) = (2.5, 0.45, 0) 1.5
(ξ, η) = (10, 0)
(ξ, η, ζ) = (10, 0, 0) (ξ, η, ζ) = (10, 0.45, 0)
1
1
ε
ε
(ξ, η) = (10, 0.45)
0.5
0.5
0 0
(b)
(ξ, η, ζ) = (2.5, 0, 0)
0.1
0.2 REθ
0.3
0 0
0.4
0.1
0.2 REθ
0.3
0.4
Fig. 9. Comparative information yield curves (CIYC) for health risk (a = r). Plots obtained for: (a) 2D and (b) 3D simulations as a function of travel distance n and transverse positions g and f. Sensitive target located along plume’s centerline (continuous curves) and fringe (dashed curves). Uncertainty reduction measured by e (Eq. 20) versus REh (Eq. 19). Uncertainty in b in Eq. (10) is included. Measurement locations for: (i) {m1}, n⁄ = (0, 5, 10); (ii) {m2}, n⁄ = (0, 2.5, 5, 7.5, 10); and (iii) {m3}, n⁄ = (0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10).
2D, f → ∞, α = C
1
(ξ, η) = (2.5, 0)
0.9
(ξ, η, ζ) = (2.5, 0, 0) (ξ, η, ζ) = (2.5, 0.45, 0)
0.9
(ξ, η) = (10, 0) (ξ, η) = (10, 0.45)
0.7
(ξ, η, ζ) = (10, 0, 0)
0.85
0.6
ψ
ψ
0.95
(ξ, η) = (2.5, 0.45)
0.8
3D, f = 0.5, α = C
1
(ξ, η, ζ) = (10, 0.45, 0)
0.8 0.75
0.5 0.4
0.7
0.3
0.65
0.2 0
(a) 0.1
0.2 REθ
0.3
0.4
(b)
0
0.1
0.2 REθ
0.3
0.4
Fig. 10. Comparative information yield curves for concentration (a = C). Plots obtained for: (a) 2D and (b) 3D simulations as a function of travel distance n and transverse positions g and f. Sensitive target located along plume’s centerline (continuous curves) and fringe (dashed curves). Uncertainty reduction measured by w (Eq. 21) versus REh (Eq. 19). Measurement locations for: (i) {m1}, n⁄ = (0, 5, 10); (ii) {m2}, n⁄ = (0, 2.5, 5, 7.5, 10); and (iii) {m3}, n⁄ = (0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10).
2D, f → ∞, α = r
0.35
(ξ, η) = (2.5, 0)
0.3
(ξ, η) = (2.5, 0.45)
0.2
(a)
(ξ, η) = (10, 0)
0.6
(ξ, η) = (10, 0.45)
0.5
rcrit = 10−4
0.15
(ξ, η, ζ) = (2.5, 0, 0)
0.4
(ξ, η, ζ) = (2.5, 0.45, 0) (ξ, η, ζ) = (10, 0, 0)
0.3
0.1
(ξ, η, ζ) = (10, 0.45, 0)
0.2
0.05 0 0
(b)
0.7
FR(r)
FR(rcrit)
0.25
3D, f = 0.5, α = r
0.8
0.1 0.1
0.2 REθ
0.3
0.4
0 0
0.1
0.2 REθ
0.3
0.4
Fig. 11. Comparative information yield curves for health risk (a = r). Plots obtained for: (a) 2D and (b) 3D simulations as a function of travel distance n and transverse positions g and f. Sensitive target located along plume’s centerline (continuous curves) and fringe (dashed curves). Uncertainty reduction measured by e (Eq. 20) versus REh (Eq. 19). Uncertainty in b in Eq. (10) is included. Measurement locations for: (i) {m1}, n⁄ = (0, 5, 10); (ii) {m2}, n⁄ = (0, 2.5, 5, 7.5, 10); and (iii) {m3}, n⁄ = (0, 1.25, 2.5, 3.75, 5, 6.25, 7.5, 8.75, 10).
bimodal shape of the concentration variance plays a strong role in the uncertainty quantification (especially at short travel distance). As shown in Fig. 10a, in the 2D case, a small number of measurements is capable of reducing w. This reduction is significant at short travel distance n = 2.5 and when the target is aligned with the source (g = 0). This is due to the fact that the sampling scheme we selected is aligned with the contaminant source’s centerline. At
larger travel distances (n = 10), more data is needed in order to diminish the uncertainty of plume position, concentration values and the corresponding fluctuations around its mean value. In this case, the information gain in Ma ðI o Þ ¼ w is quite different than the plots given in Fig. 5 where Ma ðI o Þ ¼ e. This highlights the importance of considering the EPM as well as the metric used to quantify uncertainty reduction, Ma ðI o Þ. This is clearly illustrated
60
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
by comparing Figs. 10a (using a = C and Ma ðI o Þ ¼ w) and 9a (using a = r and Ma ðI o Þ ¼ e). At short distance and target aligned with the contaminant source, e.g., (n, g) = (2.5, 0), Fig. 9a depicts small changes in the variance of human health risk (total change in e 0) since the human physiological uncertainty overwhelms the uncertainty from the hydrogeological component at locations near the contaminant source [9]. On the other hand, in Fig. 10a, the curve evaluated at the same point n = 2.5 and g = 0 (for a = C) represents the situation with the largest uncertainty reduction (total change in w 0.7). In this case, for a = C and Ma ðI o Þ ¼ w, hydrogeological sampling plays a much more significant role. Finally, we wish to illustrate the use of another important metric commonly used in probabilistic human health risk assessment: the risk cdf FR given by Eq. (13). It informs the probability of risk being below a threshold value stipulated by environmental agencies (e.g., rcrit = 104): FR(rcrit) = Pr[r 6 rcrit]. Fig. 11 provides the CIYC using FR(rcrit) as a uncertainty metric for both 2D and 3D simulations as a function of REh. Fig. 11a and b depicts the differences between 2D and 3D results: The probability of risk being below rcrit is larger for 3D than in 2D since in 3D, the chances the plume can bypass the environmental target are higher. This is clearly observed in Fig. 11a (2D case) at short travel distance n = 2.5 where the probability the plume hitting the target is high leading to very low values of FR(rcrit). The results in Fig. 11 also illustrate how the probability of risk being below rcrit changes with the position of the target for a fixed REh. The probability of having lower risk values increases if the sensitive target is placed at the fringe of the plume (where uncertainty is highest). For the prior information used in this simulation, see Tables 1 and 2, there is always a probability of risk exceedance (see value for FR when REh = 0). This result is important because it shows how the probability of risk being below rcrit varies according to the information available and the position of the target. We can see from the results shown in this section that the utility of a given scheme varies significantly according to the physical setup as well as the dimensionality of the flow domain. Moreover, depending on how one quantifies uncertainty reduction (through e or w), information yields are different and for such reason, the choice of the uncertainty reduction metric Ma ðI o Þ should be done depending on the application at hand and the EPM of interest. Similar analysis can be done for other EPMs (travel times and human health risk) and results (not shown here) also varied according to the metric used to quantify uncertainty reduction Ma ðI o Þ. If one is interested in the magnitude of concentration (relative to a threshold value, e.g., MCL), then the uncertainty reduction metric must account for information that includes the mean concentration value, its variance or preferably the full concentration cdf. If the concern is in predicting the concentration (or travel time) fluctuations around its mean value, than a metric including the variance is sufficient (see [18]). Similar issues were explored in de Barros et al. [17] where the value of information was addressed together with the CIYC in a human health risk-driven context. In that work, the authors showed how the value of information stemming from the health component varied according to the EPM (which depends on environmental regulations, e.g., solute discharge, flux-averaged concentrations, etc.) therefore, providing recommendations for setting up site characterization priorities.
groundwater velocity measurement probe described in [67]. For the upcoming results, we will condition our EPM on {m4}: 9Y and 9V (see Table 2 and Fig. 3). Results depicted here are also given for a 3D anisotropic formation (f = 0.5, see Table 1). Figs. 12a and b shows the impact of conditioning the first two moments of concentration given by Eqs. (3)–(5) on {m4}. Adding velocity samples leads to narrower curves for the concentration mean and variance. However, for the physical scenario at hand, the difference between the two conditional curves is moderate.
(a)
(b)
(c)
5.3. Use of different data types Up to now, we have conditioned the EPMs on Y measurements. However, other data types can be used. At this point we wish to illustrate how conditioning on both flow (V) and log-conductivity (Y) data affects the statistics of concentration and travel time. For this demonstration, the velocity is measured at a very small scale (equivalent to the scale of Y). This can be achieved by using the
Fig. 12. Plots (a) and (b): concentration moments obtained using 9 joint logconductivity and flow measurements ({m4}: 9Y and 9V). Comparison with unconditional simulations at the plume’s centerline. Plot (c): travel time cdf for unconditional and conditional simulations. Results for 3D anisotropic formation. Results conditional on: (i) 9 log-conductivity measurements (9Y) and (ii) 9 joint logconductivity and flow measurements ({m4}: 9Y and 9V).
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
Now we show how conditioning the travel time cdf Gs, Eq. (6), on V and Y measurements yields only a very marginal improvement as opposed to the concentration moments. From Fig. 12c, we see that conditioning on V, in addition to Y samples, does not improve much the EPM estimates. One would expect the V measurements to be more beneficial than Y measurements since, in order to compute the velocity, knowledge of porosity and the hydraulic heads are needed thus introducing additional errors on top of the measurement errors associated with Y samples. For our scenario and model assumptions (using linear theory), our results only showed a marginal gain. We believe that this is because, in our illustration, the velocity is modeled as a linear function of the hydraulic conductivity (due to the small variance approximation) and the fact that we did not assume any error in the heads and in the porosity. This is a clear example of when additional sampling efforts yield a marginal return in the EPM. In such situations, resource allocation to hydrogeological characterization may not be as efficient in reducing uncertainty. For instance, if one is addressing human health risk due to groundwater contamination evaluated using travel time statistics [7,9], then such resources could be better allocated towards better understanding of the population’s behavioral and physiological characteristics [5,9,17]. 6. Conclusions In this work, we addressed issues concerning site characterization and its relation to both environmental performance metrics (EPMs) and uncertainty reduction metrics. One of the main points illustrated was the importance of considering the sampling scheme according to the target position, data types and EPMs. We highlight the following observations: 1. Sensitivity of the information gain to the EPM and target position: The translation of the utility of a sampling scheme into uncertainty reduction is dependent on the EPM, the position of the environmental target and data type. Knowing what needs to be predicted is a key element in driving a sampling campaign and might help in setting priorities towards site characterization. This is consistent with the health risk-driven approach towards identifying characterization needs provided in de Barros et al. [17] and with the optimal design study given by Nowak et al. [18]. We have shown that hydrogeological measurements manifest differently if one uses human health risk as a EPM as opposed to concentrations or travel times. This issue poses serious challenges to scientists, regulators and decision makers since resolutions related to resource allocation clearly depend on the EPM under consideration. For example, the information gain for the concentration EPM is particularly sensitive if the environmental target is positioned at the fringe or at the centroid of the plume at short travel distances because of the halo shape present in the concentration variance. As shown in the results, reducing the estimation variance for the concentration at the halo is a challenging task due to the high concentration fluctuations at the contaminant plume’s fringe. 2. Choice of the uncertainty reduction metric: For a fixed EPM, we showed how the information obtained through sampling is propagated differently according to the choice of the uncertainty reduction metric. Our results highlight the challenge underlying the ambiguity and inconsistency between these metrics that quantify uncertainty reduction. Reduction of uncertainty in a particular metric does not necessarily imply uncertainty reduction in others or an overall improvement. The quantification of uncertainty reduction should be aligned with the characterization goals, which in turn should be related to a specific EPM.
61
3. Information manifests differently according to the flow dimensionality: The utility of a given sampling scheme varies according the dimensionality of the flow domain. As expected, results for 2D are different to the ones in 3D. Particular attention should be given to the risk cdf since the CIYC showed completely distinct behavior between 2D and 3D. When setting up a sampling design, one must be careful when taking assumptions related to flow dimensionality and should use full knowledge of the main physical characteristics in order to define characterization needs. Furthermore, our results showed how the EPM is also sensitive to the flow dimensionility. 4. Uncertainty in the SRF parameters: Including parametric uncertainty can yield different information gain and must be accounted. The effect of parametric uncertainty is clearly reflected in the slopes of the CIYC. Finally, we showed that, according to the physical scenarios investigated, additional samples and different types of information can at times yield only limited decrease in uncertainty. This has implications in real applications since this point is related to questions concerning the expansion of existing measurement networks or issues regarding selecting between alternative targets for additional investment. As pointed out in de Barros et al. [17], such efforts may not always be justified, because they can potentially yield only marginal improvement in the predictive capability. This issue is of practical interest when dealing with human health risk since financial resources could be invested towards better understanding of the physiological and behavioral component of the risk assessment process. A wide selection of sampling techniques are available that vary in many different resolutions and offer direct or indirect information on the parameters that are relevant for the EPM [2,3]. Thus having rational guidance to manage these alternatives is relevant because resources are limited. We highlight that the different EPMs used in this work are not expected to behave similarly. Our aim is to show that different metrics lead to different behavior and that we have a rational approach to illustrate the importance of goal-oriented site characterization. Our results are based on the assumption that the conceptual model is known. This is seldom the case in real practical cases. Neuman [14] showed how model uncertainty can be accounted in site characterization by making use of Bayesian model averaging. Nowak et al. [18] provided a optimal design framework which made use of the Matérn covariance family to address model uncertainty in the geostatistical model through continuous Bayesian model averaging. Incorporating model uncertainty (through Bayesian model averaging) into the current framework can be done and may lead to interesting conclusions. Furthermore, the results obtained in here were not cast within an optimal design framework as done in Nowak et al. [18]. Sampling locations were selected in order to show the impact of data on the EPMs. For a full site characterization protocol, our framework would need to account for the ideas presented in Fienen et al. [68]. We wish to emphasize that the analytical models and conditioning techniques used in this work served for the purpose of addressing the issues posed in the introduction. Many other methods ranging from numerical to analytical tools are available in the literature [69–71,27]. The expressions used in the current work are easily expandable to finite Péclet numbers as well as large sampling volumes [26]. The quantitative results shown in this work are also dependent on the sampling scheme and the assumptions adopted in the physical model. For instance, variability in the porosity (assumed uniform in this work) could have an effect on the travel time pdf. Results here are valid for a divergence-free flow (free of boundary effects) and for low to mild heterogeneity ðr2Y K 1Þ. Ranges of applicability are reported in [46,72]. Different
62
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63
measurement sampling configurations, as well as other data types, may result in different slopes of the CIYC. Acknowledgments The first author acknowledges helpful discussions with Wolfgang Nowak at the early stages of this work and the German Research Foundation (DFG) for financial support of the project within the Cluster of Excellence in Simulation Technology (EXC 310/1) at the University of Stuttgart. The support of the Spanish Ministry of Science via the ‘‘Juan de la Cierva’’ program is also acknowledged. This study has been funded by the US DOE Office of Biological and Environmental Research, Environmental Remediation Science Program (ERSP), through DOE-ERSP Grant DE-FG0206ER06-16 as part of Hanford 300 Area Integrated Field Research Challenge Project. The second author would like to acknowledge that this work was partially performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. We acknowledge the comments made by three anonymous reviewers. References [1] Ezzedine S, Rubin Y, Chen J. Bayesian method for hydrogeological site characterization using borehole and geophysical survey data: theory and application to the Lawrence Livermore National Laboratory Superfund site. Water Resour Res 1999;35(9):2671–83. [2] Hubbard S, Rubin Y. Hydrogeological parameter estimation using geophysical data: a review of selected techniques. J Contam Hydrol 2000;45(1–2):3–34. [3] Rubin Y, Hubbard S. Hydrogeophysics. Springer Verlag; 2005. [4] James B, Gorelick S. When enough is enough: the worth of monitoring data in aquifer remediation design. Water Resour Res 1994;30(12):3499–513. [5] Maxwell RM, Kastenberg WE, Rubin Y. A methodology to integrate site characterization information into groundwater-driven health risk assessment. Water Resour Res 1999;35(9):2841–85. [6] Rubin Y. Applied stochastic hydrogeology. Oxford: Oxford University Press; 2003. [7] Andricevic R, Cvetkovic V. Evaluation of risk from contaminants migrating by groundwater. Water Resour Res 1996;32(3):611–21. [8] Kapoor V, Kitanidis P. Advection–diffusion in spatially random flows: formulation of concentration covariance. Stoch Hydrol Hydraul 1997;11(5):397–422. [9] de Barros FPJ, Rubin Y. A risk-driven approach for subsurface site characterization. Water Resour Res 2008;44:W01414. doi:10.1029/ 2007WR006081. [10] Bolster D, Barahona M, Dentz M, Fernandez-Garcia D, Sanchez-Vila X, Trinchero P, et al. Probabilistic risk analysis of groundwater remediation strategies. Water Resour Res 2009;45. [11] USEPA, Risk assessment guidance for Superfund: Volume III – Part A, Process for conducting probabilistic risk assessment, Tech. Rep. Rep.EPA 540/R-02/002, December 2001. [12] Christakos G. Random field models in earth sciences. 4th ed. New York: Dover; 1992. [13] Zimmerman D, De Marsily G, Gotway C, Marietta M, Axness C, Beauheim R, et al. A comparison of seven geostatistically based inverse approaches to estimate transmissivities for modeling advective transport by groundwater flow. Water Resour Res 1998;34(6):1373–413. [14] Neuman S. Maximum likelihood Bayesian averaging of uncertain model predictions. Stoch Environ Res Risk Assess 2003;17(5):291–305. [15] Hendricks Franssen H, Alcolea A, Riva M, Bakr M, van der Wiel N, Stauffer F, et al. A comparison of seven methods for the inverse modelling of groundwater flow. Application to the characterisation of well catchments. Adv Water Resour 2009;32(6):851–72. [16] Rubin Y, Chen X, Murakami H, Hahn M. A Bayesian approach for inverse modeling, data assimilation, and conditional simulation of spatial random fields. Water Resour Res 2010;46(9):W09520. [17] de Barros FPJ, Rubin Y, Maxwell RM. The concept of comparative information yield curves and its application to risk-based site characterization. Water Resour Res 2009;45:W06401. doi:10.1029/2008WR007324. [18] Nowak W, de Barros F, Rubin Y. Bayesian geostatistical design: task-driven optimal site investigation when the geostatistical model is uncertain. Water Resour Res 2010;46(3):W03535. [19] USEPA, Risk assessment guidance for Superfund Volume I: Part A, Human health manual.Tech. Rep. Rep.EPA/540/1-89/002, December 1989. [20] Rubin Y, Ezzedine S. The travel times of solutes at the Cape Cod tracer experiment: data analysis, modeling, and structural parameters inference. Water Resour Res 1997;33(7):1537–47. [21] Maxwell RM, Carle SF, Tompson AFB. Contamination, risk, and heterogeneity: on the effectiveness of aquifer remediation. Environ Geol 2008;54(8):1771–86.
[22] Dagan G, Cvetkovic V, Shapiro A. A solute flux approach to transport in heterogeneous formations: 1. The general framework. Water Resour Res 1992;28(5):1369–76. [23] Dagan G. Flow and transport in porous formations. Berlin: Springer Verlag; 1989. [24] Rubin Y, Cushey M, Bellin A. Modeling of transport in groundwater for environmental risk assessment. Stoch Hydrol Hydraul 1994;8(1):57–77. [25] Fiori A, Dagan G. Concentration fluctuations in aquifer transport: a rigorous first-order solution and applications. J Contam Hydrol 2000; 45(1–2):139–63. [26] Tonina D, Bellin A. Effects of pore-scale dispersion, degree of heterogeneity, sampling size, and source volume on the concentration moments of conservative solutes in heterogeneous formations. Adv Water Resour 2008;31(2):339–54. [27] Andricevic R. Exposure concentration statistics in the subsurface transport. Adv Water Resour 2008;31(4):714–25. [28] Rubin Y. Transport in heterogeneous porous media: prediction and uncertainty. Water Resour Res 1991;27(7):1723–38. [29] Cirpka O, Schwede R, Luo J, Dentz M. Concentration statistics for mixingcontrolled reactive transport in random heterogeneous media. J Contam Hydrol 2008;98(1–2):61–74. [30] Tartakovsky D, Dentz M, Lichtner P. Probability density functions for advective-reactive transport with uncertain reaction rates. Water Resour Res 2009;45(7):W07414. [31] Schwede R, Cirpka O, Nowak W, Neuweiler I. Impact of sampling volume on the probability density function of steady state concentration. Water Resour Res 2008;44:12433. [32] Bellin A, Tonina D. Probability density function of non-reactive solute concentration in heterogeneous porous formations. J Contam Hydrol 2007;94(1–2):109–25. [33] Woodbury A, Rubin Y. A full-Bayesian approach to parameter inference from tracer travel time moments and investigation of scale effects at the Cape Cod experimental site. Water Resour Res 2000;36(1):159–71. [34] Dagan G, Nguyen V. A comparison of travel time and concentration approaches to modeling transport by groundwater. J Contam Hydrol 1989;4(1):79–91. [35] Rubin Y, Dagan G. Conditional estimation of solute travel time in heterogeneous formations: impact of transmissivity measurements. Water Resour Res 1992;28(4):1033–40. [36] Cvetkovic A, Shapiro AM, Dagan G. A solute flux approach to transport in heterogeneous formations: 2. Uncertainty analysis. Water Resour Res 1992;28(5):1377–88. doi:10.1029/91WR03085. [37] Guadagnini A, Sánchez-Vila X, Riva M, De Simoni M. Mean travel time of conservative solutes in randomly heterogeneous unbounded domains under mean uniform flow. Water Resour Res 2003;39(3):1050. [38] Sanchez-Vila X, Rubin Y. Travel time moments for sorbing solutes in heterogeneous domains under nonuniform flow conditions. Water Resour Res 2003;39(4):1086. [39] Sanchez-Vila X, Guadagnini A. Travel time and trajectory moments of conservative solutes in three dimensional heterogeneous porous media under mean uniform flow. Adv Water Resour 2005;28(5):429–39. [40] USEPA. Risk assessment guidance for Superfund Volume I: Part B, Human health evaluation. Tech. Rep. Rep.PA/540/R-92/003, December 1991. [41] Maxwell RM, Kastenberg WE. Stochastic environmental risk analysis: an integrated methodology for predicting cancer risk from contaminated groundwater. Stoch Environ Res Risk Assess 1999;13(1):27–47. [42] Hassan A, Andricevic R, Cvetkovic V. Computational issues in the determination of solute discharge moments and implications for comparison to analytical solutions. Adv Water Resour 2001;24(6):607–19. [43] Siirila E, Navarre-Sitchler A, Maxwell R, McCray J. A quantitative methodology to assess the risks to human health from CO2 leakage into groundwater. Adv Water Resour 2012;36:146–64. doi:10.1016/j.advwatres.2010.11.005. [44] de Barros FPJ, Bolster D, Sanchez-Vila X, Nowak W. A divide and conquer approach to cope with uncertainty, human health risk, and decision making in contaminant hydrology. Water Resour Res. doi:10.1029/2010WR009954. [45] Rubin Y. Prediction of tracer plume migration in disordered porous media by the method of conditional probabilities. Water Resour Res 1991;27(6):1291–308. [46] Bellin A, Salandin P, Rinaldo A. Simulation of dispersion in heterogeneous porous formations: statistics, first-order theories, convergence of computations. Water Resour Res 1992;28(9):2211–27. [47] Rubin Y, Dagan G. A note on head and velocity covariances in threedimensional flow through heterogeneous anisotropic porous media. Water Resour Res 1992;28(5):1463–70. [48] Ezzedine S. Fast computation of head and velocity covariances in threedimensional statistically axisymmetric heterogeneous porous media. Water Resour Res 1997;33(1):267–70. [49] Butera I, Tanda M. Solute transport analysis through heterogeneous media in nonuniform in the average flow by a stochastic approach. Transp Porous Media 1999;36(3):255–91. [50] Bellin A, Rubin Y. HYDRO_GEN: a spatially distributed random field generator for correlated properties. Stoch Hydrol Hydraul 1996;10(4):253–78. [51] Nowak W, Cirpka O. Geostatistical inference of hydraulic conductivity and dispersivities from hydraulic heads and tracer data. Water Resour Res 2006;42(8):8416. [52] Nowak W. Best unbiased ensemble linearization and the quasi-linear Kalman ensemble generator. Water Resour Res 2009;45.
F.P.J. de Barros et al. / Advances in Water Resources 36 (2012) 51–63 [53] Castagna M, Bellin A. A Bayesian approach for inversion of hydraulic tomographic data. Water Resour Res 2009;45(4):W04410. [54] Woodbury A, Ulrych T. Minimum relative entropy and probabilistic inversion in groundwater hydrology. Stoch Hydrol Hydraul 1998;12(5):317–58. [55] Hou Z, Rubin Y. On minimum relative entropy concepts and prior compatibility issues in vadose zone inverse and forward modeling. Water Resour Res 2005;41(W12425). doi:10.1029/2005WR004082. [56] Kullback S. Information theory and statistics. Dover Pubns; 1997. [57] Nowak W. Measures of parameter uncertainty in geostatistical estimation and design. Math Geosci, in press. doi:10.1007/s11004-009-9245-1. [58] Kitanidis P, Vomvoris E. A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations. Water Resour Res 1983;19(3):677–90. [59] Kitanidis P. Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resour Res 1986;22(4):499–507. [60] Copty N, Rubin Y, Mavko G. Geophysical-hydrological identification of field permeabilities through Bayesian updating. Water Resour Res 1993;29(8):2813–25. [61] Chen J, Hubbard S, Rubin Y. Estimating the hydraulic conductivity at the South Oyster Site from geophysical tomographic data using Bayesian techniques based on the normal linear regression model. Water Resour Res 2001;37(6):1603–13. [62] Murakami H, Chen X, Hahn M, Liu Y, Rockhold M, Vermeul V, et al. Bayesian approach for three-dimensional aquifer characterization at the Hanford 300 Area. Hydrol Earth Syst Sci 2010;14:1989–2001.
63
[63] Bellin A, Rubin Y, Rinaldo A. Eulerian–Lagrangian approach for modeling of flow and transport in heterogeneous geological formations. Water Resour Res 1994;30(11):2913–24. [64] Bellin A, Rubin Y. On the use of peak concentration arrival times for the inference of hydrogeological parameters. Water Resour Res 2004;40(7):W07401. [65] Hassan A, Andricevic R, Cvetkovic V. Evaluation of analytical solute discharge moments using numerical modeling in absolute and relative dispersion frameworks. Water Resour Res 2002;38(2):1009. [66] Andricevic R, Cvetkovic V. Relative dispersion for solute flux in aquifers. J Fluid Mech 1998;361:145–74. [67] Labaky W, Devlin J, Gillham R. Probe for measuring groundwater velocity at the centimeter scale. Environ Sci Technol 2007;41(24):8453–8. [68] Fienen M, Clemo T, Kitanidis P. An interactive Bayesian geostatistical inverse protocol for hydraulic tomography. Water Resour Res 2008;44:W00B01. [69] Ashby S, Falgout R. A parallel multigrid preconditioned conjugate gradient algorithm for groundwater flow simulations. Nucl Sci Eng 1996;124(1):145–59. [70] Fiori A, Berglund S, Cvetkovic V, Dagan G. A first-order analysis of solute flux statistics in aquifers: the combined effect of pore-scale dispersion, sampling, and linear sorption kinetics. Water Resour Res 2002;38(8):1137. [71] Gotovac H, Andricevic R, Gotovac B. Multi-resolution adaptive modeling of groundwater flow and transport problems. Adv Water Resour 2007;30(5):1105–26. [72] Bellin A, Rinaldo A, Bosma W, van der Zee S, Rubin Y. Linear equilibrium adsorbing solute transport in physically and chemically heterogeneous porous formations: 1. Analytical solutions. Water Resour Res 1993;29(12):4019–30.