A stochastic approach to model validation

A stochastic approach to model validation

Advances in Water Resources 15 (1992) 15-32 A stochastic approach to model validation Steven J. Luis* & Dennis McLaughlin Ralph M. Parsons Laboratory...

3MB Sizes 8 Downloads 131 Views

Advances in Water Resources 15 (1992) 15-32

A stochastic approach to model validation Steven J. Luis* & Dennis McLaughlin Ralph M. Parsons Laboratory, Department of Civil Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

This paper describes a stochastic approach for assessing the validity of environmental models. In order to illustrate basic concepts we focus on the problem of modeling moisture movement through an unsaturated porous medium. We assume that the modeling objective is to predict the mean distribution of moisture content over time and space. The mean moisture content describes the large-scale flow behavior of most interest in many practical applications. The model validation process attempts to determine whether the model's predictions are acceptably close to the mean. This can be accomplished by comparing small-scale measurements of moisture content to the model's predictions. Differences between these two quantities can be attributed to three distinct 'error sources': (1) measurement error, (2) spatial heterogeneity, and (3) model error. If we adopt appropriate stochastic descriptions for the first two sources of error we can view model validation as a hypothesis testing problem where the null hypothesis states that model error is negligible. We illustrate this concept by comparing the predictions of a simple two-dimensional deterministic model to measurements collected during a field experiment carried out near Las Cruces, New Mexico. Preliminary results from this field test indicate that a stochastic approach to validation can identify model deficiencies and provide objective standards for model performance.

1 INTRODUCTION

that data limitations complicate the modeling process. Subsurface environments are typically observed only at a limited number of times and locations and the soil properties that control moisture movement are difficult to measure. As a result, we have very little objective information about the performance of unsaturated flow and transport models over time and space scales much larger than those studied in artificial laboratory situations. In this paper we propose a systematic approach for analyzing the performance of unsaturated flow models in realworld applications where uncertainties are large and data are limited. We believe that the basic concepts are sufficiently general to be used in many other applications. Our approach to model validation begins by identifying the factors which contribute to differences between predictions and observations. These 'error sources' can be grouped in various ways, depending on the model's objectives. For purposes of illustration we concentrate on models that attempt to predict large-scale trends in moisture content. Also, we assume that the data used for validation are small-scale moisture content measurements (e.g. neutron probe measurements collected at discrete points along a vertical borehole). In this case it is convenient to identify the following error sources:

One of the most basic requirements of the scientific method is the need to demonstrate in an objective way that a theory (or model) provides an acceptable description of reality. Generally speaking, this implies that the model should be able to predict experimental observations with some degree of accuracy. In environmental applications this fairly straightforward concept is complicated by a number of factors which are familiar to both modelers and experimenters. First, natural systems are often very heterogeneous and difficult to characterize in any simple way. Second, the field data needed to formulate and test theories about such systems are difficult to obtain. As a result, most environmental modeling efforts are plagued by data limitations. This applies both to the input data used to generate model predictions and to the validation data needed to check model performance. The problem of modeling moisture movement through unsaturated soil provides a good illustration of the way *Current address: Cambridge Environmental Inc, 58 Charles Street, Cambridge, Massachusetts 02141, USA. Advances in Water Resources 0309-1708/92/$05.00 © 1992 Elsevier Science Publishers Ltd.

(1) Measurement 15

Error:

differences

between

the

16

S.J. Luis, D. McLaughlin measured and true small-scale values of moisture content. (2) Spatial Heterogeneity: differences between the largescale trend we wish to predict and the true smallscale values of moisture content. (3) Model Error: differences between the model's predictions and the actual large-scale trend.

These three error sources need to be treated quite differently in an analysis of model performance. Measurement error depends only on the methods used to measure small-scale moisture content and has nothing to do with the model per se. Spatial heterogeneity is responsible for differences between small-scale moisture content values and large-scale trends. These differences are not really errors but are, rather, a reflection of the difference in scale between the measured and predicted variables. Model error is, on the other hand, a reflection of a model's inability to correctly predict the large-scale trend which is of primary interest. Model error can have many sources, including inappropriate assumptions or approximations, errors in estimating model parameters, and numerical errors. Since no mathematical model of the natural environment is perfect, we can expect that model errors will always contribute to differences between predictions and measurements. There are two closely related ways to analyze such errors. The first, which we refer to as model validation, uses field data to determine whether or not model error is 'significant'. The second, which we refer to as accuracy assessment, uses probabilistic concepts to predict the magnitude of model errors over time and space. Model validation addresses the question of whether or not a model adequately represents observed phenomena while accuracy assessment addresses the larger question of how well a model will perform under conditions that have not yet been observed. It might be argued that accuracy assessment is really what we care about - - model validation is just a means to that end. Although this is probably true, the reality is that assessment of model accuracy usually depends on many assumptions (see McLaughlin and Wood T M for a detailed discussion). One of the most significant assumptions made in such assessments is that the model's basic structure (e.g. its set of governing equations) is correct. In this case, accuracy assessment reduces to an evaluation of the effect of parameter estimation errors. 4'22 Although the assumption of structural perfection is convenient it is not necessarily justified in most practical applications. Our view is that a model validation (i.e. a comparison of model predictions with field data) should first be used to establish that the model is able to explain observed phenomena. If the model passes a set of reasonable validation criteria, we can then proceed with an accuracy assessment which assumes that structural errors are negligible. This process is not foolproof since models that pass a validation test based on limited data can still be

erroneous. Nevertheless, it does provide a way to screen out models that are clearly inadequate. With these considerations in mind, we can now turn to the question of how a validation test should be structured. One way to proceed is to estimate the effects of measurement error and spatial heterogeneity and then assume that discrepancies between measurements and predictions which cannot be explained by these factors must be due to model error. This is essentially the approach we take here. We adopt probabilistic descriptions of measurement error and spatial heterogeneity which enable us to construct confidence intervals and other test statistics. These statistics are then used to test the hypothesis that model errors are significant at a specified level of confidence (or risk). Although we view model validation as a hypothesis testing problem, the results presented in our example indicate that we cannot blindly apply standard hypothesis testing techniques. The spatially and temporally distributed nature of environmental modeling problems considerably complicates the validation process. In particular, we may find that models which perform well at some times and locations do not perform well at others. The next section of this paper describes our general approach to model validation. This is followed by a detailed discussion of a field example. The example compares predictions from a relatively simple deterministic model to moisture content measurements obtained during an infiltration experiment conducted near Las Cruces, New Mexico. 26 The Las Cruces experiment provides an unusually extensive set of soils data and validation measurements collected over horizontal and vertical distances of several metres and over timescales of a few years. We believe that this data set will enable us to compare a number of different modeling approaches using a common set of validation criteria. The validation analysis described here is the first in this series of tests.

2 GENERAL APPROACH Our approach to model validation is based on the recognition that the subsurface environment is heterogeneous and complex. Continuum descriptions of this environment are complicated by the presence of fractures, macropores, and other discontinuities which are difficult to m a p and characterize. Hydrologically significant soil properties (e.g. moisture retention and unsaturated hydraulic conductivity curves) vary significantly over scales on the order of tens of centimetres. ~ t3.2~ Moreover, the hydrologic fluxes which influence moisture movement (e.g. infiltration and evapotranspiration) vary over both time and space. Since subsurface data are difficult to obtain, it is practically impossible to construct an accurate description of small-scale fluctuations in soil properties and hydrologic

A stochastic approach to model validation fluxes. It is not reasonable to expect an unsaturated flow model to accurately predict detailed fluctuations in moisture content, particularly since our primary interest is often in large-scale behavior. If we want to focus on an unsaturated flow model's ability to predict large-scale behavior we must come up with appropriate measures of large-scale moisture movement. Some possibilities include bulk quantities such as spatial moments or smoothed quantities such as the overall trend in moisture content. Here we focus on the moisture content trend, primarily because it can be conveniently compared to uninterpolated neutron probe measurements. If we adopt a stochastic view of heterogeneity which presumes that the local fluctuation length scale is much smaller than the characteristic scale of the trend, the trend value at any time and location can be approximated by the ensemble mean (or expected value) of the moisture content. ]' The ensemble mean has a number of desirable statistical properties which make it a good choice for estimating uncertain variables such as moisture content. 7 Our emphasis on the mean moisture content is, however, not essential. The basic concepts described in this section are equally applicable to many other measures of model performance. Our stochastic description of spatial heterogeneity assumes that spatial variations in moisture content are caused, in part, by small-scale fluctuations in the moisture retention and hydraulic conductivity functions which appear in Richard's equation: (70 _ (?t

C(~b) 8~_~ = 8t

V - {K(~k)[V(~9 - z)]}

(1)

where ~(x,t) [L] is the soil water tension at vector location x [L] and time t [T], 0(x,t) [1] is the soil moisture content, C(~) = -a0(~,)/a~, [L '] is the soil moisture capacity, 0(~) [1] is the soil moisture retention, K(~) [LT t] is the unsaturated hydraulic conductivity, and z [L] is elevation (positive upwards). Note that both 0(~) and K(~) are functions of the soil water tension ~, which is defined to be positive in unsaturated soils. The shape of each of these functions varies over space in response to local changes in the physical and chemical composition of the soil. Figure 1 shows a typical set of log conductivity curves at several different locations. These curves are generally steeper where the soil is coarse and flatter where the soil is fine. Spatial variations in the magnitude and local slope of the hydraulic conductivity curves can have significant effects on the large-scale behavior of soil moisture, particularly in soils with thin highly contrasting layers. IH3 This is a consequence of the nonlinear structure of Richard's equation. Stochastic theories of unsaturated flow account for spatial heterogeneity by expressing soil properties as functions of a small number of locationdependent random parameters. These random parameters are typically characterized by their mean and spatial covariance.

17

InK(I/')I InKo(~/'o)

InK(~)I",,,,

I '~'%"

--i ~

Fig. 1. Typical log hydraulic conductivity curves from different locations at a hypothetical site. Inset shows definitions of log hydraulic conductivity slope and intercept.

Once we have adopted a stochastic description of soil heterogeneity we can derive the statistical properties of soil tension and moisture content, at least in principle, by solving eqn (1). One way to accomplish this is to carry out a Monte Carlo simulation, using many different realizations of 0(~) and K(~) obtained from a synthetic random field generator such as the turning bands algorithm. 24 Since Monte Carlo simulation of threedimensional unsaturated flow is not yet computationally feasible for problems of realistic size, we adopt the small perturbation approach suggested by Mantoglou and Gelhar,~ L~and Polmann. 21 In this case eqn (1) is linearized about the mean tension. Fourier transform techniques are then used to relate the spectral density of ~p to the spectral densities of random parameters appearing in the functions 0(q)) and K(~b). The spectral density of ~ is used to estimate the ensemble variances of tension and moisture content about their respective means. Although our modeling objective is to reproduce the ensemble mean of the moisture content distribution, the model we use does not have to be based on stochastic concepts. Most unsaturated flow models produce predictions which are smoother than reality and which are generally interpreted as trends. It is useful to have a common framework for comparing such models. So, for example, we may want to know whether a simple vertical infiltration model based on an analytical solution to Richard's equation provides a statistically acceptable fit to the ensemble mean. The same question could be asked of a numerical model which adopts a more complex layered representation of the flow domain 56 or of a stochastic model which assumes that soil properties are random fields. 'H3 All of these models attempt to approximate large-scale trends in moisture content by using 'effective' soil properties that are supposed to capture the bulk effects of small-scale spatial variability. For validation purposes we need not concern ourselves with the theoretical rationale used to derive effective properties. We only need a systematic way to evaluate

S.J. Luis, D. McLaughlin

18

the model's p e r f o r m a n c e for a given set o f effective parameters. In practice we must base our evaluation o f model p e r f o r m a n c e on c o m p a r i s o n s between model predictions and m e a s u r e m e n t s at a few scattered times and locations. This implies that we need to infer the model's ability to predict the ensemble mean indirectly, by analyzing the statistical properties of m e a s u r e m e n t residuals. One way to do this is to d e c o m p o s e the residuals into terms which account for the three error sources identified earlier. 14We begin by defining the jth m e a s u r e m e n t residual ej, observed at location xj and time tJ (for j = 1. . . . J, where J is the total n u m b e r of moisture content measurements used for validation): e(X/, ti) =

c,i =

0* - 0j(~), Vj

(2)

where 0* = 0*(xj, t;) [1] is the m e a s u r e m e n t of moisture content at (xj, tj) and 0s = 0(x;, tjl6) [1] is the model prediction at the same location and time. This prediction is c o m p u t e d from a set of estimated model p a r a m e t e r s which are assembled in the vector ~. Here we will generally use the symbol ^ to refer to estimated quantities. Equation (2) m a y be rewritten so that the three error sources a p p e a r explicitly:

- 0j] + [ 0 , - 0i] + [ 0 i - oj(o], vj

= meas.

error

natural heterogeneity

model error

(3) where 0j = O(xj, tj) [1] is the true moisture content at (xj, tj) and 0/ = Oj(xj, tj) [1] is the ensemble m e a n of 0/ (we use the overbar symbol t h r o u g h o u t to indicate the m a t h e m a t i c a l expectation operation). E q u a t i o n (3) clearly defines the separate errors which contribute to differences between m e a s u r e m e n t s and predictions. We now consider the hypothesis that the model's prediction is equal to the ensemble mean. This is equivalent to assuming that the model error term in (3) is zero. In statistical terms we a d o p t the following null hypothesis: H0: Model error is negligible, 0j(O) =

0j, V/

(4)

If we wish to apply the statistical theory of hypothesis testing to the model validation p r o b l e m we must find test statistics which can be used to check the hypothesis defined in eqn (4). These statistics should depend on available moisture content m e a s u r e m e n t s and should be designed to minimize the risk associated with m a k i n g erroneous decisions. Decision errors can be classified as either Type I (rejecting the hypothesis when it is true) or Type II (accepting the hypothesis when it is false). I f the test is very stringent it will have a small Type II error and a large T y p e I error (i.e. it will tend incorrectly to reject good models). If, on the other hand, the test is less stringent it will have a large T y p e II error and a small Type I error (i.e. it will tend incorrectly to accept bad models). There is no rigorous way to develop an ' o p t i m a l ' test

for the spatially and temporally distributed hypothesis testing p r o b l e m posed above. Instead, we need to propose a n u m b e r o f 'reasonable' tests which capture different sorts of model inadequacies. One of the most basic qualitative tests is to examine the m e a s u r e m e n t residuals graphically to check for obvious trends or anomalies. A m o r e quantitative a p p r o a c h is to determine whether statistics such as the sample m e a n and covariance of the residuals are consistent with hypothesis H0. W h e n the hypothesis is true the mean m e a s u r e m e n t residual is: =

tot -

03 + [Oi -

0,],

vi

(5)

The second term on the right-hand side is zero by construction. The first term is simply the mean m e a s u r e m e n t error (the m e a s u r e m e n t bias). In the absence of any information to the contrary we assume that the instrument used to measure moisture content is properly calibrated so that the m e a s u r e m e n t bias is zero. In this case, we expect the measurement residuals to have a zero mean at all sample times and locations: ~, =

0, Vj

(6)

This condition is fairly easy to test. In order to derive the variance of the m e a s u r e m e n t residual we introduce some additional assumptions. If the mean m e a s u r e m e n t residual is zero and we assume that m e a s u r e m e n t errors are uncorrelated with errors due to spatial heterogeneity the covariance between two different m e a s u r e m e n t residuals is: P~:,;(j, k) = [0" - 0/][0~' - Ok] + [0j - 0j][0 k - 0t]

(7) If we further assume that m e a s u r e m e n t errors at different times and locations are uncorrelated with one another and have a c o m m o n variance we can write eqn (7) as:

e,:,:(j, k) =

G + G,(J, k)

(8)

where rr~, is the measurement error variance, Poo(J, k) is the covariance between the true point moisture contents 0j and Ok, and 6jk is the K r o n e c k e r delta function (equal to one if j = k, equal to zero otherwise). The desired m e a s u r e m e n t residual variance can then be written as: ~r~, =

~r~. + aa,

(9)

It m a y be argued that m e a s u r e m e n t errors should be correlated with errors due to soil heterogeneity (e.g. because it is m o r e difficult to obtain representative m e a s u r e m e n t s in regions where the soil is very dry). By the same token, m e a s u r e m e n t errors at different locations m a y also be correlated with one another. If we knew the structure of these correlations they could be taken into account in the derivation leading to eqn (9). In fact, we rarely have enough data to derive such correlations. Consequently, we assume zero correlation here, primarily to simplify our presentation of basic concepts.

19

A stochastic approach to model validation

If reliable correlation information is available in a given application it would be advisable to include it. The moisture content variance in eqn (9) plays a key role in our approach since it defines how much variability we should expect around the model's predictions when the model structure and measurements are both perfect. That is, this variance establishes a type of lower bound on the model's ability to predict point values of moisture content. If we can derive the moisture content variance directly from the small-scale flow equation presented in eqn (1), we can use eqn (9) to evaluate the measurement residual variance to be expected when hypothesis H0 is true. If the actual residual variance is clearly much larger we can presume that H0 is not true (i.e. that model structure errors are significant). A practical method for estimating the moisture content variance is discussed later in this paper. For now, we briefly consider some statistical tests which may be used to test H0. Equations (6), (8) and (9) suggest a few simple test statistics. ~~ 2.1 M e a n residual test

Since the expectation of ej should be zero at every time and location, a sample mean computed from m a n y measurement residuals should be close to zero if Hypothesis H0 is true. This implies a test of the following form: 1 s

Decide H0 is true if: m,: =

~j__~ ~j • =

0"~/

< v

(10)

where v [1] is a test threshold selected to give the desired two-sided Type I error probability (or 'significance level'). If the hypothesis is true and the measurements are sufficiently far apart for the residuals to be uncorrelated, m,: will have a mean of zero and a standard deviation of 1/~/-JJ. If we assume that m~ is normally distributed (based on central limit considerations) the threshold value may be readily obtained from a standard normal probability table. Although the Type II error is difficult to evaluate explicitly, it will decrease as J becomes larger, for a specified significance level. If some of the measurements are too close for spatial correlations to be ignored, the test sample size (J) may be reduced to account in an approximate way for correlation effects. 2.2 M e a n squared residual test

If the measurement residuals conform to a particular probability distribution, we would expect a certain percentage to lie outside confidence bounds derived from this distribution. If, for example, the residuals are assumed to be normally distributed, the following curves define a 95% region: 01 =

0j _+ 1.96~

where a~j is obtained from eqn (9). If a significant number of the measurements 0* lie outside this region, we should

reject Hypothesis H 0. A somewhat more convenient version of the same concept relies on the following meansquared error test: 23 Decide Ho is true if: Z2 =

e7 < v -'v

/=l O'~/

(1 1)

Here v [1] is a test threshold selected to give the desired Type I error probability (or 'significance level'). If the hypothesis is true and the measurements are sufficiently far apart for the residuals to be uncorrelated normally distributed random variables, the test statistic Z2 follows a chi-squared probability distribution with J degrees of freedom. As with the mean test, the Type II error can be expected to decrease as J becomes larger, for a specified significance level. Also, the number of degrees of freedom may be reduced to account for correlation effects when the measurements are closely spaced. 2.3 Spatial structure test

If a significant number of the moisture content measurements are close enough to one another to be correlated it is possible to check whether or not the measurement residuals have the statistically stationary spectral density predicted by a stochastic perturbation analysis of eqn (1). One of the best ways to do this is to pass the residuals through a spatial 'whitening filter', z2 The output of this filter should be an uncorrelated series of adjusted measurement residuals if Hypothesis H0 is true. The spatial whitening approach is most easily implemented in one dimension (e.g. along a single neutron probe access tube). In this case the whiteness (or lack of correlation) of the filter outputs may be checked with the following test: 1 ,1

Decide H0 is true if: r e =

I

~ =~ ~:/%~1 < v

(12)

where e~ [1] is the filter output at location/', ef+~ [1] is the filter output at the next sampled location,/ + 1, and v [1] is a threshold selected to obtain a desired Type I error probability. I f J is reasonably large and the hypothesis is true this test statistic should be close to zero and v should be much smaller than 1.0. Further details are discussed in Schweppe. 22 The three tests outlined above consider different aspects of the validation problem. The mean residual test checks for systematic biases (e.g. models which give moisture contents which are consistently too high). The mean-squared residual test checks for overall fit (e.g. models which misplace the location of infiltration fronts). The spatial structure test checks for more subtle spatial features (e.g. the shape of a wetting front). Moreover, these tests can be applied to all available measurements or to selected subsets such as all measurements taken at a particular time or along a particular transect. This range of possibilities complicates the task of reaching an unequivocal yes or no conclusion about the results of a model validation. Our view is that it is

20

S.J. Luis, D. McLaughlin

wise to examine as many performance criteria or test statistics as possible in order to establish an overall picture of model performance. This is illustrated in the example discussed in the next section. The general approach outlined above provides a systematic framework for validating spatially distributed unsaturated flow models.~4 It should be pointed out that the particular methods we propose for evaluating the moisture content variance rely on assumptions that may not be justifiable in some applications. Also, further study is needed to determine the statistical properties (Type I and II error probabilities) of spatially distributed hypothesis tests. Nevertheless, we believe that the procedure we propose could make the model validation process much more objective. It also provides a way to relate our ability to validate to the amount of data we have available (recall, for example, the role of the sample size J in our suggested hypothesis tests). We discuss some of the broader implications of our approach at the end of this paper.

3 VALIDATION OF A T W O - D I M E N S I O N A L M O D E L OF T H E LAS CRUCES T R E N C H EXPERIMENT This section describes a relatively simple application of our model validation approach. Our objective is to illustrate validation methodology rather than to advocate a particular model or modeling philosophy. The model we consider is based on a two-dimensional version of Richard's equation with spatially uniform 'effective' soil properties. It is used to predict subsurface moisture contents during an extended (two-year) infiltration and redistribution experiment. The model's 'effective' properties are inferred from a large number of soil samples taken from the experimental site prior to the start of infiltration. The model validation is based on neutron probe measurements of moisture content collected from several access tubes distributed throughout the site. The entire process of model development and validation is intended to mimic a typical modeling application. The following subsections provide more details on the problem formulation, the statistical methods used to describe spatial heterogeneity, and the results of the model validation.

3.1 The Las Cruces field experiment The field experiment selected for our example is the first infiltration experiment conducted at the Las Cruces Trench Site. This site is run by investigators from the University of Arizona and New Mexico State University at Las C r u c e s . 19"2°'26"27 The Las Cruces trench experiment has a number of exceptional features which make it particularly attractive for a model validation study. First, it was designed expressly to provide data for model

validation. Second, large numbers of spatially distributed soil samples were collected before the experiment started. This data set provides a good picture of spatial heterogeneity at the site. Third, boundary conditions were carefully controlled throughout the experiment in an effort to isolate the effects of soil variability on moisture movement. Fourth, the time and space scales of the experiment are greater than those used in most other investigations of unsaturated flow. The experiment has excited considerable interest among numerical modelers and is one of several included in the I N T R A V A L model validation project. ~8 The experimental site is located 40km north-east of Las Cruces, New Mexico. Like much of the Southwest, the region is arid, receiving 23cm/yr of precipitation annually. Access to the subsurface environment is provided by an excavated east-west trench 26-4 m long by 4-8 m wide by 6.0m deep and by a number of vertical boreholes. During excavation a large number of soil samples and in-situ permeameter measurements were collected on the north face of the trench. After this, the trench was covered and provided with circulated air in order to moderate temperature fluctuations. The ground surface around the experimental plots was also covered in order to prevent infiltration and evaporation from the area lying outside the irrigation strip. Las Cruces Experiment 1 is located on the south side of the trench and consists of a 4.0 by 9.0 m irrigation strip extending southward, perpendicular to the trench. The site has been instrumented with neutron probe access tubes which are used to monitor water movement. Access tubes were arranged in three rows. One row was placed along the centerline of the strip and two were placed perpendicular to the strip, defining two cross sections. Here we limit our attention to moisture data collected from the cross section farthest from the trench. In this cross section, the eight neutron tubes are arranged from 8.11 m to the left of the centerline to 5.75 m to the right. A perspective view of the site looking at the south side of the trench with the experimental plot and access tubes used here is shown in Fig. 2a. We have adopted the coordinate system employed on Experiment 1 by workers at the site. The origin is at the soil surface where the centerline of the wetting strip meets the face of the trench. The x coordinate increases to the right, y increases away from the trench along the centerline of the wetting strip and z is positive upward. The soil at the site is composed primarily of consolidated loamy sand and sandy loam. The profile exposed by trench excavation reveals irregular horizontal layering. Moisture retention curves and saturated hydraulic conductivity values have been estimated for about 450 soil cores taken from the north face of the trench. 27 Unsaturated hydraulic conductivity values are not currently available. The saturated conductivity values do not correlate well with the textural properties of the exposed soil layers. Figure 3a shows that these

A stochastic approach to model validation

21

2m

'

2 meters Uniform applied f l u x @



I

Z e r o - f I ux



"i rrigated ~trip

i i

I/

Symmetry boundary; zero-flux

Y

Zero-flux

-X k r/

J , - ....

B o u n d a r y of c o n s t a n t tension

z

: -24

m

x : +10 m

x=O

(b)

(a)

i i i i

Fig. 2. Las Cruces Experiment 1 site (a) and associated simulation domain (b).

values tend to span the same range in all of the measured layers, with the notable exception of the top layer. Figure 3b shows a contour plot of available log hydraulic conductivity values. For all practical purposes, the Las Cruces saturated log hydraulic conductivities could have been generated randomly. The infiltration experiment consisted of a spatially

uniform application of water throughout the strip at a rate of 1.8cm/day for 82 days. Water movement was monitored during infiltration and for a considerable time afterward. The moisture content data used here cover a period of about two years after the start of infiltration, m These data were collected at depth intervals of about 25cm in each neutron access tube, initially at time

[

I +

+

+

>.,

g o

+

~,

+

Ii

2

T

O

,

2

Y~

+*

,

+

o

+

1

#

0

(a)

, o

I -2

,

,

+

,

I -4

-6

Elevation (m)

r--lbelow

1.05

1.05to 1.46 1.46 to 1.88 I~ 1.88 to 2.29 2.29 to 2.70 ~above 2.70

(b)

~- v

c -; o >~ -" ~ w -' 4 -( qO.O

-8.0

-6.0

-4.0

-2.0

0.0

2.0

4.0

6,0

8.0

tO.O

Distance from Centerline (m)

Fig. 3. Saturated log hydraulic conductivity measurements from the Las Cruces trench.

22

S.J. Luis, D. McLaughlin /-I/

-9 n

. I . . . I

. - .

I . - -

I . - .

ii~ii~~i~i~ii~ii~@~~ ~ i i i ~ i i i : : ~ ~ i~:~:,~:.~................. ~

o

. . - I

I - .

- I - .

, I - .

-I

:,~i~i:i:~i~ii:!~i:.i~i~:i~i:~@;:'~:t~-:~

. ~;~:.:^.:..~::." ....

-4.o -5.o -6.0 ~

.11::;'::~.....

-10.0 - 8 . 0 -6.0

- 4 . 0 -2.0

0.0

2.0

4.0

6.0

Distance from Centerline (m)

t=-2 days

8.0

10.0

~

0.05 0.05 to 0.08 0.08 to 0.10 0.10 to 0.12 0.12 to 0.15

below

!

0.15 to 0.17 0.17 to 0.20 0.20 to 0,22 0.22 to 0.25 labove 0.25

Fig. 4. Initial moisture content distribution for Las Cruces Experiment 1. intervals of days and later at time intervals of months. Initial values of volumetric moisture content obtained before the start of infiltration range from about 5 to 10 per cent. Figure 4 shows a contour plot of the moisture content measured at a time t = - 2 (two days before infiltration began) in the vertical plane formed by the neutron probe access tubes. 3.2 The two-dimensional simulation model

Las Cruces Experiment 1 has been simulated by a number of investigators, including Bouloutas 2 and Kool et al. 9 The two-dimensional deterministic simulation model considered here is based on the model developed by Bouloutas. It uses a mixed formulation of Richard's equation which is linearized with a modified Picard approximation and is spatially discretized using a Galerkin finite element method with linear basis functions. Temporal discretization is based on a two-step fully implicit method which provides greater accuracy and better stability than more c o m m o n one-step methods. Timestep size varies adaptively, increasing when convergence is rapid and decreasing when convergence is slow. The set of algebraic equations obtained after spatial and temporal discretization is solved with a preconditioned conjugate gradient method which gives greatly improved performance over solution techniques traditionally employed in this type of application. The mixed Richard's equation formulation ensures conservation of mass. The Bouloutas model performs a two-year simulation in about two hours on a DECstation 3100. The two-dimensional simulation domain used in our modeling application was designed to represent the right half of the vertical plane formed by the second row of neutron probe access tubes used in Experiment 1 (see Fig. 2b). The selection of a two-dimensional domain is equivalent to ignoring flow in the y direction (horizontal flow normal to the trench face). This is obviously a potential source of model error which could influence the outcome of a validation investigation. The domain dimensions of 10 m wide by 24 m deep were selected to minimize the influence of boundary conditions at the bottom and sides. As a result, the simulation domain is much larger than the region over which data were actually collected. H a l f of the wetting strip is shown at the upper left-hand corner of the domain. Infiltration

from the strip is represented by a constant flux which is set to 1.8 cm/day for the first 82 days and zero thereafter. Zero flux boundary conditions are maintained along the left side, which represents the centerline of the experimental cross section, and along the top and right-hand boundaries where water is not applied. The constant tension condition imposed at the lower boundary of the simulation domain has relatively little effect on the validation results since this boundary is far beneath the monitored region. For purposes of illustrating and comparing simulation results, the results for the right half of the cross section are simply reflected onto the left half. The model's boundary and initial conditions are summarized in Table 1. In order to apply our unsaturated flow model we need to specify moisture retention and hydraulic conductivity curves over the range of tensions observed in the infiltration experiment. Since we have only one (saturated) conductivity value at each soil sampling location, we know only one point of the corresponding hydraulic conductivity curve. As a result, we need to introduce additional assumptions before we can obtain the required model inputs. These assumptions probably have a significant impact on our analysis but are unavoidable until we can obtain independent measurements of unsaturated hydraulic conductivity. Mualem ~7has developed a capillary bundle theory for deriving unsaturated hydraulic conductivity curves from moisture retention and saturated hydraulic conductivity data. We adopt the version of this theory proposed by van Genuchten. 25 In particular, we develop a moisture

Table 1. Inputs for the two-dimensional deterministic model of Las Cruces Experiment number 1

Ks = 268.8cm/day 0s = 0.322 0r = 0.0501 ~vc = 0.0532 ~vG = 0-0499 0(+) ~n = 1.42 , /~(+) I n = 1.49 Initial conditions: ~p = 10000cm Lower tension boundary condition: ~ = 10000cm Wetting strip flux boundary condition: q = 1.8 cm/day Zero flux conditions are imposed on all other boundaries

A stochastic approach to model validation >

23

5

o

0 Q.___ o

~g "10 _o~-10

0.40

o

8 •~

0.30

0.20 0.10 0.00

,

,

,

i

,

2000

,

,

i

,

.

,

4000

i

,

,

,

6000 Tension

i

. 10000

8000

(cm)

Fig. 5. Mean (effective) moisture retention and log hydraulic conductivity functions derived from the Las Cruces trench soil samples. content curve for each soil sample by fitting (in a leastsquares sense) the following function to the corresponding unsaturated moisture content data: 0i(l~/) =

0ri "{- ( 0 s i - 0ri)[] 4- ((~vGi0)nil mi

(13)

here 0i(~) is the moisture content [1] at tension ff [1], 0ri is the residual moisture content [1], 0s~ is the moisture content at saturation [1], and ~vG~[L-I], n~ [1] and m~ [1] are fitting parameters with m~ = 1 - l/ni. The index i refers to the soil sample collected at location x~ [L]. The values of 0,~ and 0r~ are reported for each Las Cruces soil sample and are held constant during the fitting process. The least-squares estimates of ~vG~ and n~ are obtained with a Marquardt search algorithm. 4 The complete population of 450 curves obtained by fitting all available soil sample measurements provides the information needed to construct a statistical description of moisture retention variability. In particular, we can estimate the sample mean and standard deviation taken over this population at any given tension. The sample mean moisture retention curve does not generally have a van Genuchten form. It is, however, convenient for simulation purposes to fit this mean by a van Genuchten function having the same general form as eqn (13). The mean has its own set of fitting parameters which have roles similar to the parameters C~vGi,0,, 0s~, and n~which appear in eqn (13). The fitted mean moisture retention function obtained from the Las Cruces data set is plotted in Fig. 5. The moisture content standard deviation function can be fitted by a simple decreasing function of tension such as a decaying exponential. 2~ Details are provided in Luis. ~° The van Genuchten version of Mualem's theory uses an unsaturated hydraulic conductivity function which depends on the moisture retention coefficients 0tvG~,mi, and ni as well as the saturated hydraulic conductivity K~i.

This function has the following form: 25 l

K~(O) =

Ksi

-

l

txn i 1

t~vOiqJ)

[1 +

(~vG,~)"']

. . . . i,mi/2 [1 + t~vG~qO l

mi]2 (14)

Since ~vai, mi, ni, and Ksi are available for each Las Cruces soil sample, eqn (14) can be used to construct an unsaturated hydraulic conductivity function for each sampling location. In practice, it is often convenient to work with the natural log of this function. We can estimate the sample log conductivity mean and variance taken over the resulting population of 450 curves. The sample mean log hydraulic conductivity function does not generally have a van Genuchten form. It is convenient, however, to fit this mean with a function having the same general form as the natural log of eqn (14). The fitting parameters have roles similar to the C~vc~iand n~ which appear in eqn (14) (Ksi is held fixed at the value measured for soil sample i). Note that these parameters are not the same as those used to fit the mean moisture content curve. The fitted mean log hydraulic conductivity function is plotted in Fig. 5. Further details are provided in Luis. l° We somewhat loosely interpret the mean moisture retention and mean log hydraulic conductivity curves obtained from the above procedure as 'effective' soil properties. That is, our deterministic model treats the site as if it were composed of a single uniform soil described by these curves. The ad hoc technique we use to obtain the 'effective' soil properties is intended to duplicate the type of input estimation encountered in practical applications where the advantages of simplicity may outweigh those of theoretical consistency. Some of the problems associated with the concept of effective parameters are discussed by Dagan and Bresler. 3 The effective soil properties used in our deterministic

S.J. Luis, D. McLaughlin

24

model depend on the fitting parameters used to approximate the mean moisture content and log hydraulic conductivity curves. These parameters (which are represented by the vector ~ in eqn (2)) are estimated from soil sample measurements. The values used in Las Cruces Experiment 1 are summarized in Table 1. It can be expected that the estimated parameters will be closer to the unknown 'true' values if the sample size is large and representative. This suggests that the likelihood of obtaining acceptable predictions depends both on the validity of the effective parameter formulation used in the model and on the quantity and quality of the data set used to estimate the effective parameters. Both factors can contribute to the 'model error' which we consider in the model validation process. 3.3 Characterizing soil heterogeneity at the Las Cruces site The deterministic model described above provides a way to simulate large-scale moisture movement at the Las Cruces site. In order to specify how close the model's predictions should be to small-scale measurements we need to quantify the effects of spatial heterogeneity and measurement error. Following the approach discussed earlier we introduce stochastic descriptions for both 'error sources'. This enables us to develop hypothesis tests and related measures of model performance. The stochastic methods we use to describe soil heterogeneity are based on concepts outlined in Mantoglou and Gelhar ~t-~3 and Polmann. 21 The basic idea is to express small-scale moisture retention and log hydraulic conductivity curves as functions which depend both on tension and a few location-dependent parameters. The location-dependent parameters are assumed to be three-dimensional random fields characterized by specified mean and spatial covariance functions. Random variations in the soil parameters induce random variations in the moisture retention and log hydraulic conductivity functions which, in turn, induce random variations in moisture content and tension. The description of log hydraulic conductivity adopted by Polmann 2~is based on an extension of the well-known Gardner expression: InK(0)

=

lnK0(O0) - ~($o)$

(15)

where $o [L] is a fixed reference tension, a($0) [L l] is the slope of a straight line drawn tangent to the log hydraulic conductivity at $0, and In K0($o) [L --t ] is the zero tension intercept of the same line. This extended expression is quite general since it can accommodate nonlinear In K($) curves. In addition, its quasilinear form is particularly convenient for derivations of moisture content statistics from Richard's equation. ~l ~3 It should be noted that ln Ko($0) # ln K($o) is not the saturated conductivity but is rather the value obtained when the tangent line is extended back to the zero tension ordinate. This is illus-

trated in Fig. l. Equation (15) essentially defines two new soil property curves - - the slope function ~(tp) and the intercept function ln K0(O). These functions, together with the moisture retention function 0(~), provide the information needed to characterize the hydraulic behavior of an unsaturated soil at any given location. The effects of spatial heterogeneity can be introduced by writing the soil property function at location i as the sum of a mean term and a random fluctuation: lnK0~(O) = e~(O) =

F0(O) + /~F0(O)F~)Ni

(16a)

fi(O) + /~,(O)e~

(16b)

0M,) = 0(~,) +/~0(4,)0~

(16c)

where F0(O) [1], a(0) [L ~], and 0(~) [1] are the soil property means,/~v0(O) [1], /~(O) [L I], and/~0(O) [1] are the soil property standard deviations, and F(~hi[1], ~ i [1], and 0~ [1] are zero mean, unit variance random fluctuations with specified spatial correlation functions. The mean and standard deviations functions in eqn (16) depend on tension but not on location. The random fluctuations depend on location but not on tension. We assume that the random fluctuations are statistically anisotropic (i.e. their vertical correlation scales are much less than their horizontal correlation scales) and statistically stationary (i.e. their statistical properties are invariant under spatial translation). Available soils data indicate that this assumption is justified for the Las Cruces example considered here. 27 The populations of van Genuchten moisture retention and log hydraulic conductivity curves described in the last section can be used to derive the mean and standard deviation functions in eqn (16). The moisture retention functions 0(~) and/~o(~b) are approximated by the van Genuchten mean and exponential standard deviation curve fits mentioned in the discussion following eqn (13). The other functions are obtained indirectly from the log hydraulic conductivity population. Log conductivity slope and intercept functions may be constructed for each soil sample by differentiating eqn (14). These functions, which depend on the sample's van Genuchten parameters, have the following general form: ~i(O) -

1 ~Ki (I]/) K~($) ~9

lnK0i($ ) = lnKi($) + ~i($)$

(17) (18)

where Ki(~b) is given by eqn (14). The resulting population of 450 intercept and slope curves can be used to compute the means and standard deviations at any given tension. It is convenient to fit the means by functions having the same general form as eqns (17) and (18). Each mean has its own set of fitting parameters which have roles similar to the parameters ~vo~, 0ri, 0s~, Ks~ and n~ which appear in eqn (14). The standard deviations fiFO(S) and/~(~O) can be fitted by decaying exponentials. Other statistical information such as the cross-correlations

A stochastic approach to model validation among F;~, a~, and O~ may be obtained from by identifying the normalized fluctuations associated with each soil sample and then computing the appropriate sample statistics. Luis m provides an extensive discussion of the derivation of unsaturated soil statistics from soils data available at Las Cruces.

neglect time-dependence. The resulting simplified steadystate expressions for the tension and moisture content variances at any given vector location x and time t are: 2~

dO(~) + 2E[~'0~]flo0p ) - -

3.4 Derivation of the moisture content and measurement error variances

The description of soil heterogeneity in the above subsection can be used to derive moisture content statistics 2 needed to apply the hypothesis such as the variance aoi test proposed earlier. The connections between the statistics of unsaturated soil properties and moisture content are determined by the small-scale Richard's equation, which relates the random parameters CliO(x), x] and K[~,(x), x] at any location x to the moisture content O(x') at some other point x'. Unfortunately, it is quite difficult to derive moisture content statistics directly from the Richard's equation. There are essentially only two alternatives which apply to field-scale problems: (1) Monte Carlo simulation, and (2) small perturbation techniques. As mentioned earlier, Monte Carlo simulation is not yet feasible for multi-dimensional problems of the size considered here. Small perturbation techniques are feasible but only if a number of simplifying assumptions are introduced. This puts us in the difficult position of being forced to adopt even more assumptions, some of them questionable, before we can continue with our validation analysis. Our view is that it is best to see where the analysis leads so that we can establish whether or not the basic concepts are useful. Consequently, we adopt a pragmatic attitude towards assumptions, making only as many as appear to be necessary to obtain a rough estimate of moisture content variance. Our hope is that questionable assumptions can be reexamined and perhaps relaxed in the future when better analytical methods become available. Given this perspective on assumptions, we can summarize the small perturbation analysis used to obtain the moisture content variance needed to complete our model validation. Our analysis is based on the three-dimensional stochastic theory of unsaturated flow developed by Yeh et al., 2s 3o Mantoglou and Oelhar, ~q3 and Polmann. 2~ This theory makes a 'local stationarity' assumption which stipulates that fluctuations in all random variables occur on a scale much smaller than changes in the corresponding ensemble mean. ~-13 In addition, we assume that the soil is highly stratified and that the horizontal correlation scale is much greater than the vertical correlation scale. We also assume that the random fluctuations F;~, c~, and 0~ share a common vertical spatial correlation scale but are uncorrelated with one another. We neglect dependence upon mean tension spatial gradients. In effect, this assumes that moisture movement takes place only due to the effect of gravity. We also

25

o (x, t)

=

~1

A*(I + 2xA*)

(19)

[flF0(,l~) 2 + fl~(~)21~2] (20)

where: (21)

=

A*(~)

=

~(~) + ~

d~p

dV0(~) dO

(22)

Here ~(x, t) is the mean tension [L], a~(x, t) is the variance of moisture content [1], a~(x, t) is the variance of tension [L2], and 2, is the vertical correlation scale [L]. The term E[~b'0N,] [L] appearing in eqn (19) is the covariance between small fluctuations in tension and small fluctuations in the moisture retention function at the specified time and location. This term is a complicated function of the spatial covariances (or spectra) of FoN, ~ , and 0~. The complete expression is provided with a derivation in Luis. m Note that small perturbation theory provides a way to derive moisture content and tension covariance functions in addition to the variances given in eqns (19) and (20). Covariances are needed to carry out certain hypothesis tests such as the spatial structure test given in eqn (12), which is not used in our Las Cruces example. Our assumption that the Las Cruces soil is highly stratified is supported by our analysis of several hundred saturated hydraulic conductivity measurements, which suggests that the vertical correlation scale is on the order of 25 cm. This is nearly an order if magnitude less than the estimated horizontal correlation scale of 2.0 m. Our other assumptions are more difficult to justify. It is clear, for example, that the flow system under consideration is not at steady state and that its spatial gradients are not zero. It is also likely that the soil properties are correlated with one another. Nevertheless, we feel that the simplified expressions which result after our assumptions are introduced capture the overall effect of soil heterogeneity on moisture content uncertainty. This is supported by the results discussed in the next subsection. If the hypothesis H0 defined in eqn (4) is true, then the mean moisture content is equal to the corresponding model prediction at the time and location in question: O(x, t) =

O(x, t, O)

(23)

where 0 represents the vector of estimated parameters used to compute the prediction 0. Since 0 and ~ are related by the model's effective moisture retention curve this is equivalent to specifying that the mean tension is

S.J. Luis, D. McLaughlin

26

equal to the corresponding tension prediction: ~(x, t) =

~(x, t, ~)

(24)

In this case, the ~'s appearing in eqns (19)-(22) may be replaced by the model's tension predictions ~(x, t). It is important to note that the variance expressions given in eqns (19)-(22) presume that the random soil properties vary in three dimensions even though the mean moisture content and tension may only vary in one or two dimensions. That is, we account for the fact that a two-dimensional model will be unable to capture small-scale spatial variations in the unmodeled third dimension. However, if our two-dimensional model passes a validation test at some particular time we would expect that systematic trends in the corresponding experimental data would be confined to the two modeled dimensions. The moisture content variance o f e q n (19) is one of two components included in the measurement residual variance defined in eqn (10). The other component is the measurement error variance a~,. In the Las Cruces example, we adopted a value of 0.01 (1 per cent) for the measurement error standard deviation (P.J. Wierenga, personal communication). This value was obtained during the Las Cruces experiment from replicated neutron probe measurements of the moisture content in a soil column. Water contents in the column were determined gravimetrically, in accordance with standard neutron probe calibration procedures. Measurement error makes a relatively small contribution to the total residual variance (0-0001 compared to a total of from 0.0025 to about 0.02). The residual variance, which generally varies over space, is used to compute the test statistics and test thresholds defined in eqns (9)-(12). 3.5 Validation results

It is useful to begin our discussion of validation results by comparing contour plots of measured and predicted moisture content values for Las Cruces Experiment 1. These plots are shown for four different measurement times in Figs. 6 and 7. Simulation results are truncated at 6 m below the surface in order to coincide with the lowest depth at which moisture content data were taken. Tickmarks around the borders of the plots indicate the node spacing used in the numerical simulation. A kriging algorithm is used to interpolate experimental data onto the same grid. It should be kept in mind that the outermost neutron tube boreholes are located at x = - 8.11 m and + 5-75m. Data values from these boreholes are extrapolated out to the - 10 m and + l 0 m boundaries of the plots. As might be expected, the smooth model prediction contours in Figs 6 and 7 differ substantially from the irregular experimental results. This effect can be seen to increase with time as the experimental moisture distribution shows a tendency to move laterally along

preferred paths. By contrast, the predicted distribution quickly assumes a bulb-like shape which persists throughout the simulation. Differences such as these appear to be due to soil heterogeneities which are reflected both in variable initial conditions and in variable hydraulic conductivities. It is worth mentioning that contour plots of the change in measured and predicted moisture contents from their initial values can be instructive when initial conditions are highly variable. In the example considered here the overall match between measured and predicted results is about the same as in Figs 6 and 7. It is possible to get a more spatially aggregated view of model performance if we compare the spatial moments of the measured and predicted moisture contents. 9 In order to do this we use the model's linear basis functions to obtain interpolated predictions at the neutron probe sampling locations. All moments are computed from changes in moisture content relative to the initial value. The discrete second moment of order (k,1) is: N

Mkl(t) =

k I E AO(xi' zi' t)xiziA,

(25)

i-I

where AO(xz, zg, t) [1] is the change in moisture content from the initial value at (x~, z~) and A, [L2] is the area associated with measurement i. The central moments of interest here are computed from:

AO = Moo/A T

(26a)

zc = Mol/Moo

(26b)

a2x, = 2

fizz

=

M2o/Moo - x~

(26c)

Mo2/Moo- -

(26d)

2

Zc

where AT [L 2] is the total area considered. The zero-order central moment A0 measures total mass change in the two-dimensional simulation domain (assuming homogeneity in the direction normal to this domain). The first order moment zc indicates the change in location of the moisture plume's vertical center of mass. The secondorder non-central moments a2xx and az,2 measure spreading in the horizontal and vertical directions, respectively. The measured and predicted moments for Las Cruces Experiment 1 are compared in Table 2. The A0 values show that the observed moisture content increases throughout the infiltration period (the first 82 days of the simulation). The increase observed after 82 days probably reflects redistribution of moisture in the direction normal to the simulation domain. The zc values indicate that the simulated center of mass lags behind its 2 (vertical spreading) values measured counterpart. The G~ seem to be in agreement at 82 days but the simulated values are noticeably less than the measured values at 30, 371 and 737 days. The most interesting moment result is the azx (horizontal spreading) comparison, which indicates that the simulation moisture distribution does not spread as much as the measured distribution. This

A stochastic approach to model validation

27

-1 -3.0 -4.0 -2" I -5.0 -6.

1o

',i i ',',iiiiiiiiii ', iiiiii

-2.0

1 ::::::::: :::::::::::::::::::::

n1 -6.5'0 -10.0

-8.0

-6.0

-2.0 0.0 2.0 4.0 Distance from Centerline(m)

-4.0

6.0

8.0

6.0

8.0 10.0

10.0

t=30 days

r--]below 0.05 ~ o.05to 0.08 0.08 to 0.10 ~ O.lOto 0.12 0.12 to 0.15

B 0.15 to 0.17 B 0.17to 0.20 ~ 0.20 to 0.22 ~ 0.22to 0.25 ~ a b o v e 0.25

¢:: -,' o

UJ

v" "1

~-2

uJ

-5 -E

-10.0 -8.0 -6.0 -4.0 -2.0 0.0 2.0 4.0 Distancefrom Centerline(m)

t=82 days Fig. 6. Comparison of Las Cruces Experiment 1 measured and simulated moisture content distributions at 30 and 82 days. conclusion is supported by a qualitative inspection of the contour plots presented in Figs 6 and 7. Contour plots and spatial moment evaluations provide some indication of model capabilities but they do not compare the model prediction to the variable it is supposed to match - - the mean moisture content. In order to investigate this aspect of model performance we need to account for the effects of spatial heterogeneity. This can be done if we apply the error decomposition and hypothesis testing concepts described earlier. In particular, we can use eqn (19) to evaluate the residual error variance expression provided in eqn (9). This can, in turn, be used to construct approximate confidence regions (based on a normality assumption) and to compute the test statistics and thresholds needed to carry out hypothesis tests. In order to illustrate this process, we examine model predictions, 95% confidence regions and experimental data along selected vertical and horizontal transects. Figure 8 shows four vertical transects at 82 and 737 days. Each one of these transects corresponds to observations made along a single neutron probe access tube. Figure 9

shows horizontal transects at the same times. Each of these transects corresponds to neutron probe samples taken from a particular depth. It is apparent from these figures that nearly all of the observed moisture content values fall within the 82-day confidence intervals. The model captures the wetting front at this time as well as might be expected given the level of moisture content variability exhibited in the measurements. Although a large fraction of the observed moisture content values fall within the 737-day confidence intervals there is a significant number of outliers in the two more remote vertical transects (at - 8 . 1 1 m and + 5.75m). This is further evidence that the model fails to predict the horizontal spreading revealed by the experimental moisture content measurements. Table 3 shows the percentage of observations falling within the 95% confidence region at each sampling time, including two times (30 and 371 days) not plotted in Figs 8 and 9. These results are consistent with the moment comparisons presented earlier. They clearly indicate that the model performs better at intermediate times than at early or late times. When all sampling times and

S.J. Luis, D. McLaughlin

28

Ev_. -1 o

LU

-E

o (I~ -"

~-~ W

-10.0

-8.0

-6.0

-4.0

-2.0

0.0

2.0

4.0

6.0

8.0

t0.0

Distance from Centerline (m)

t=371 days

LU

~

below 0.05 0.05 to 0.08 0.08 to 0.10 0.10 to 0.12 0.12 to 0.15

ii

0.15 to 0.17 0.17 to 0.20 0.20 to 0.22

-!

E-1 o

-LI Ill

-E -10.0

-8.0

-6.0

-4.0

-2.0

0.0

2.0

4.0

6.0

8.0

10.0

Distance from Centerline (m)

t=737 days Fig. 7. Comparison of Las Cruces Experiment 1 measured and simulated moisture content distributions at 371 and 737 days. locations are considered 91.6% of the measurements fall within the 95% confidence region. This suggests that the model is marginal, at best. The hypothesis tests of eqns (10) and (11) provide additional insight about the model's behavior. Tables 4 and 5 show the test results for the mean residual and mean-squared residual tests for model error. In each case, a 'pass' means that the test accepts the Hypothesis H0 (no model error) while a 'fail'

Table 2. Comparison of measured and simulated moisture content spatial moments for Las Cruces Experiment number 1

Time (days)

A0

zc (m)

a 2 (m2)

02xx(m2)

Meas. Simul. Meas. Simul. Meas. Simul. Meas. Simul. 30 82 371 737

0-04 0.06 0.06 0.07

0.03 0.07 0.05 0.04

-2.2 -3.1 -3.2 -3.2

- 1,3 -2-8 -3.1 -3.1

1.1 2.6 2.2 2.1

0"7 2.8 3.0 3.0

3.5 4.9 11.0 13-0

2.3 3.6 6.8 8.0

means that the test rejects this hypothesis. Results are reported for significance levels of l, 5, and 10 per cent. Note that the test outcomes are not very sensitive to the significance level selected. This reflects the large sample size used to compute the test statistics. As before, the model's deficiencies are most apparent at early and late times. It is reasonable to ask whether or not the model passed the validation test. Although it is certainly possible to obtain a yes or no answer to this question (e.g. by basing the answer on a strict application of a 5% significance level criterion to one or more of the lines in Table 4 or 5), it is not clear that such an answer is meaningful. In practice our assessment of validity should consider the model's ultimate use. Will it provide velocities which will, in turn, be used to estimate contaminant exposures or travel times? Will it provide moisture predictions which will be used to optimize an irrigation strategy? It seems apparent that the concept of validity must be defined within a broader context than the validation experiment per se. This important topic is beyond the scope of our

29

A stochastic approach to model validation

-2

i j/[ +i :: L +

-.

i

8 2 d a y vertical transect at -8.11 m

+ + .

?

737 day vertical transect at -8.11m

+

"+ 4: +

i ......

g ........

> @

-2

...............

,....

82 day vertical t r a n s e c t at -O.02m

w

~ + -

++ -I4+ +

+~

+

+4 + +++

-4

.............

+.~ , t

++

.........

..........

1

i

' / ~+ +'{ 82 daY ; e r t i c a l i I . + + i t r a n s e c t at .: /'~ ..: 3"89m .," + + + J " .."



++

,

.

,

ii

737 day

+"

") }.--"

-4

+!

¢

i

-2

737 day vertical transect at -0.02 m

~. +

4

-6

! ! : :

+ + +

++ +4" 21:

+

vertical transect at 389m

++t

....

J +

+

i

i ........ i .............

-6

c

.o >

0

w

.......

.

i++

-.,:.

+i :t . . . . . . . . .82 . . . .day . . . . .vertical ........ t r a n s e c t at 5.75rn

% -4-

:

!+

• i

~4- +

. i -i i

i~: + ~ +.~

transect at 5.75 m

+ -6

0

......

......

0.1

i .........

, .......

0'2 0'3 M o i s t u r e content

0.4

0

0.1

0.2 0.3 M o i s t u r e content

0.4

Fig. 8. Vertical transects showing simulated and measured moisture contents for Las Cruces Experiment 1. paper but it is one which, nevertheless, deserves serious discussion. In our opinion, the most useful aspect of our validation experiment is the information it provides about the model's deficiencies. It is apparent that the model underestimates horizontal spreading at later times. This is the primary source of the validation problems it experiences at 737 days. There is reason to suspect that this deficiency will persist and perhaps even become more pronounced as time increases. It would be very useful to look at model performance over a longer period (at least another year or two) to see whether the spreading issue is important. Also, it would be useful to examine the impact of the many assumptions we have made in our

analysis. We hope to address this important research topic in the future.

4 CONCLUSIONS

In this paper we have presented a systematic method for assessing the validity of environmental simulation models. This method is illustrated with an application to unsaturated flow modeling. Although we assume that the modeling objective is to predict the mean moisture content over time and space, the basic concepts we have described are sufficiently general to be applied to other objectives. It should, for example, be possible to validate

S.J. Luis, D. McLaughlin

30 0.4

,

.

.

.

,

.. . . . . . . . . ,, ..' ..'

8 2 day horizontal ,, t r a n s e c t .. a t 0 . 5 m

737 d a y h o r i z o n t a l t r a n s e c t a t O,Sm

.

..

..................

...

0.2

.... : .... . _ ~ ' - . + o F---T--: .... 7-i"

.........

......

"7.7-;-7;:

.

,

,

,

i

0.4

,

,

,

,

.

.

,

,



.

-

.

.

.

.

.

.

,

737 day horizontal transect at 2.0m

82 day horizontal " transect at

.. . . . . . . . .

I

,2-Om

.' C 0

v

.

..

. . . . . . . . . . . . . . . .

. .

0"2

3 .... i ....

I[

.............

.............

0

....'

....

. . . . . . . . .

.

' . . . . . . . . . . . . . .

,

,

. . . . - ' "

,

,

I

. . . . . . . . . . . . . . . . . .

.

.

.

.

I

,

- . . . . . . . .

,

. . . . . . . .

'

-

0-4

:"

"'.

:

737 d a y horizontal transect at 6,0m

82 d a y horizontal transect at 6.0rn

'

....

...............

....

0-2 $'

; Z

. . . .

Z:;

-~. . . . . . . . . . . . . . . . . . 0

.

-10

.

.

.

J

-5

,.. .

.

-... .

.

.

.

.

.

..........

. . . . . . . . . . . . . . . . . . .

.

0

.

4-

'"

.........

.

.

5

10

-10

Distance from centreline (m)

...

..............

.

.

...."

" . . , ,

,

,

.

-5

j

. . . .

i

0

Distance from

.......... ,

,

5 centreline

,

10 (m)

Fig. 9. Horizontal transects showing simulated and measured moisture contents for Las Cruces Experiment 1.

models which are designed to predict bulk vertical infiltration rates or the locations of moving moisture fronts. In all of these cases our approach requires that we devise stochastic descriptions of soil heterogeneity and measurement error. Once this is done, an error decomposition analogous to the one given in eqn (3) can be used to analyze differences between model predictions and field measurements (the measurement residuals). This decomposition enables us to separate model-independent errors caused by spatial heterogeneity or measurement error from model-dependent errors caused by conceptual deficiencies or erroneous inputs. If we can derive the statistical properties of the measurement residuals we can

develop statistical tests which check the hypothesis that the model error is negligible. We have applied our model validation approach to a well-instrumented infiltration experiment carried out at a site located near Las Cruces, New Mexico. The twodimensional numerical model investigated in this application assumes that soil properties at the site can be described by a set of spatially uniform 'effective' moisture retention and log hydraulic conductivity parameters. Model errors can be expected to result both from the two-dimensional assumption and from the use of a rather ad hoc method for estimating effective parameters. The effective parameter estimates used in the validation were inferred from a large set of soil samples collected before

Table 3. Confidence interval statistics for Las Cruces Experiment 1 Table 4. Mean residual statistics for Las Cruces Experiment 1

Time (days)

Number of observations

Percent of observations within Confidence Intervals

30 82 371 737 Total

157 189 184 181 711

87-9 98'4 90"8 86-2 91-6

Time (days)

30 82 371 737

m~

5.654 0-067 4-824 11.230

Significance level 10%

5%

1%

Fail Pass Fail Fail

Fail Pass Fail Fail

Fail Pass Fail Fail

A stochastic approach to model validation

31

Table 5. Mean-~uared ~sidual stati~ics ~ r Las Cruces Ex~r~entl

dation, and model application should be viewed as different aspects of a single process.

Time (days)

ACKNOWLEDGEMENTS

30 82 371 737

Degrees of freedom

157 189 184 181

~2

379.8 92.1 181.2 323.0

Significance level 10%

5%

1%

Fail Pass Pass Fail

Fail Pass Pass Fail

Fail Pass Pass Fail

the experiment was started. Our validation indicates that the two-dimensional model underestimates horizontal spreading of the infiltrating moisture plume. Its performance is better at intermediate sampling times than at the earliest (30 days) and latest (737 days) times examined. The model appears to be able to predict the overall behavior of the moisture plume at time scales of two years and space scales of about 20 metres. It is not clear that the model will be able to work equally well over longer time and space scales. Our stochastic representation of soil heterogeneity implicitly assumes perfect knowledge of the means, variances, and correlation scales of the random parameters used to describe moisture retention and hydraulic conductivity. In view of the large amount of data collected at the site, this is probably not a bad assumption for the moisture retention parameters. The hydraulic conductivity parameters are much more uncertain since they are inferred from a capillary bundle theory which assumes that moisture retention and hydraulic conductivity are causally related. Although it is likely that these two soil properties are correlated with one another, it is not clear that the capillary bundle theory or, for that matter, any other theory can replace measurements of unsaturated hydraulic conductivity. If we had such measurements our validation results would be more reliable. In any case, it is unreasonable to expect modelers always to have soil property data sets as large as those available for the Las Cruces experiment. In order to make our analysis more realistic we should account for uncertainties in the soil property statistics used to compute the moisture content covariance and other quantities required to test validation hypotheses. This can be done if expressions such as eqn (19) are assumed to be conditioned on soil property sample statistics which are themselves random variables. When the uncertainty of these sample statistics is taken into account the validation confidence intervals will widen and the hypothesis tests will become less stringent. In the limit as the sample statistics become very uncertain our ability to reach a meaningful validation decision will be compromised. This implies that the success of a validation effort will be strongly dependent on the quantity and quality of data used to describe soil heterogeneity and to check model predictions. Ultimately, data collection, model vali-

The research described in this paper was supported, in part, by the US Nuclear Regulatory Commission under Contract No. NRC-04-88-074. The authors wish to acknowledge the helpful comments and suggestions provided by Thomas J. Nicholson and Lynn W. Gelhar. REFERENCES 1. Benjamin, J.R. & Cornell, C.A., Probability, Statistics and Decision Jot Civil Engineers. McGraw-Hill, New York, 1970. 2. Bouloutas, E.T., Improved numerical methods for modeling flow and transport processes in partially saturated porous media. PhD Thesis, Dept of Civil Engineering, M.I.T., Cambridge, MA, 1989. 3. Dagan, G. & Bresler, E., Unsaturated flow in spatially variable fields 1. Derivation of models of infiltration and redistribution. Water Resour. Res., 19(2) (1983) 413-20. 4. Draper, N.R. & Smith, M., Applied Regression Analysis. John Wiley, New York, 1967. 5. Hills, R.G., Porro, I., Hudson, D.B. & Wierenga, P.J., Modeling one-dimensional infiltration into very dry soils, 1. Model development and evaluation. Water Resour. Res., 25(6) (1989) 1259-70. 6. Hills, R.G., Porto, I., Hudson, D.B. & Wierenga, P.J., Modeling one-dimensional infiltration into very dry soils, 2. Estimation of the soil water parameters and model predictions. Water Resour. Res., 25(6) (1989) 1271-82. 7. Jazwinski, A.H., Stochastic Processes and Filtering Theory. Academic Press, New York, 1970, 8. Kendall, M. & Stuart, A., The Advanced Theo O, of Statistics, Vol. 2 Injerence and Relationship. Macmillan, New York, 1977. 9. Kool, J.B., Huyakorn, P.S. & Wierenga, P.J., Model calibration and simulation of flow in a heterogeneous soil. Proceedings Modelcare 90 In ternational Conference on Calibration and Reliability in Groundwater Modelling, The Hague, The Netherlands, 3-6 September 1990. 10. Luis, S.J., A stochastic approach to validation of unsaturated flow models for site characterization. Engineer's thesis, Dept. of Civil Engng., M.I.T., Cambridge, MA, 1991. 11. Mantoglou, A. & Gelhar, L.W., Large scale models of transient unsaturated flow systems. Water Resour. Res., 23(1) (1987) 37-46. 12. Mantoglou, A. & Gelhar, L.W., Capillary tension head variance, mean soil moisture content, and effective specific soil moisture capacity of transient unsaturated flow in stratified soils. Water Resour. Res., 23(1) (1987) 47-56. 13. Montoglou, A. & Gelhar, L.W., Effective hydraulic conductivities of transient unsaturated flow in stratified soils. Water Resour. Res., 23(1) (1987) 57-68. 14. McLaughlin, D. & Luis, S., A stochastic approach for validating unsaturated flow models. Proc. Geoval 90 Symposium on Validation of Geosphere Performance Assessment Models, Stockholm, Sweden, 14-17 May 1990. 15. McLaughlin, D. & Wood, E.F., A distributed parameter approach for evaluating the accuracy of groundwater predictions, 1. Theory. Water Resourc. Res., 24(7) (1988) 1037-47.

32

S.J. Luis, D. McLaughlin

16. McLaughlin, D. & Wood, E.F., A distributed parameter approach for evaluating the accuracy of groundwater predictions, 2. Application to groundwater flow. Water Resour. Res., 24(7) (1988) 1048-60. 17. Mualem, Y., A new model for predicting the hydraulic conductivity of unsaturated porous media. Water Resour. Res., 12(3) (1976) 513-22. 18. Nicholson, T.J., Recent accomplishments in the Intraval Project - - A status report on validation efforts. Proc. Geoval 90 Symposium on Validation of Geosphere Flow and Transport Models, Stockholm, Sweden, 14-17 May 1990. 19. Nicholson, T.J., Wierenga, P.J., Gee, G., Jacobson, E.A., Polmann, D.J. McLaughlin, D. & Gelhar, L.W., Validation of stochastic flow and transport models for unsaturated soils: Field study and preliminary results. Proc. DOE/ AECL Conference on Geostatistical, Sensitivity, and Uneertainty Methods' Jor Groundwater Flow and Radionuclide Transport Modeling, San Francisco, CA, 15-17 September 1988 (CONF-870971), ed. B.E. Buxton. Battelle Press, Columbus, Ohio, pp. 275-96, 1989. 20. Polmann, D.J., Vomvoris, E.G., McLaughlin, D., Hammick, E.M. & Gelhar, L.W., Application of Stochastic Methods to the Simulation of Large-Scale Unsaturated Flow and Transport. Report NUREG/CR-5094, US Nuclear Regulatory Commission, Washington, DC, 1988. 21. Polmann, D.J., Application of Stochastic Methods to Transient Flow and Transport in Heterogeneous Unsaturated Soils. PhD thesis, Dept of Civil Engineering, M.I.T., Cambridge, MA, 1990. 22. Schweppe, F.C., Uncertain Dynamic Systems. Prentice Hall, New Jersey, 1973.

23. Shea, D. Spectral analysis and performance evaluation for unsaturated flow modeling. SM thesis, Dept of Civil Engineering, M.I.T., Cambridge, MA, 1989. 24. Tompson, A.F.B., Ababou, R. & Gelhar, L.W., Implementation of the three-dimensional turning bands random field generator. Water Resour. Res., 25(10) (1989) 2227-43. 25. van Genuchten, M. Th., A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci. Soc. Am. J., 44(5) (1980) 892-8. 26. Wierenga, P.J., Gelhar, L.W., Simmons, C.S., Gee, G.W. & Nicholson, T.J., Validation of Stochastic Flow and Transport Models',for Unsaturated Soils: A Comprehensive Field Study. Report NUREG/CR-4622, US Nuclear Regulatory Commission, Washington, DC, 1986. 27. Wierenga, P.J., Hudson, D., Vinson, J., Nash, M. & Toorman, A., Soil Physical Properties at the Las Cruces Trench Site. Soil and Water Science Rept No. 89-002, Univ. of Arizona, Tucson, 1989. 28. Yeh, T.C., Gelhar, L.W. & Gutjahr, A.L., Stochastic analysis of unsaturated flow in heterogeneous soils 1. Statistically isotropic media. Water Resour. Res., 21(4) (1985) 447-56. 29. Yeh, T.C., Gelhar, L.W. & Gutjahr, A.L., Stochastic analysis of unsaturated flow in heterogeneous soils 2. Statistically anisotropic media with variable c~. Water Resour. Res., 21(4) (1985)457-64. 30. Yeh, T.C., Gelhar, L.W. & Gutjahr, A.L., Stochastic analysis of unsaturated flow in heterogeneous soils 3. Observations and applications. Water Resour. Res., 21(4) (1985) 465-71.