Chapter eight Geostatistics

Chapter eight Geostatistics

CHAPTER EIGHT GEOSTATISTICS 8.1 INTRODUCTION Statistics is generally concerned with the analysis and interpretation of uncertainty resulting from...

1MB Sizes 62 Downloads 93 Views

CHAPTER

EIGHT

GEOSTATISTICS

8.1

INTRODUCTION

Statistics is generally concerned with the analysis and interpretation of uncertainty resulting from limited sampling of the variable of interest. It typically employs a population distribution model that assumes that samples are normally distributed and uncorrelated. Experimental data in environmental, geotechnical, or earth sciences tend to be specially correlated. Clearly, it is to be expected that pollutant concentrations in a subsurface, for example, will be more far apart. This observation is true of most earth science data since closely spaced locations, hence samples would have subjected to similar physical, chemical and biological processes. Such correlated data cannot, therefore, be effectively manipulated with the classical statistical technique. Geostatistics is a statistical method that is particularly useful in situations where a sample value is affected by its location and its relationship with its neighbours, i.e., spatially correlated data. Historically, geostatistical application originated in the mining and petroleum exploration industries, through the pioneering work of D.C. Krige in South Africa in the 1950s, and its mathematical formalization by G. Matheron in the early 1960s (Matheron, 1971). It is now routinely employed in the environmental field, for example, to estimate chemical and/or physical parameters at locations where data is unavailable. Geostatistics are potentially powerful tool for the analysis of pollutant concentrations in a subsurface. Many restrictions have been introduced, such as the need of a large number of samples, but the method remains very efficient. The geostatistical process is a two step procedure. The first is the calculation of the experimental variogram and fitting a model to it. The variogram merely describes the special relationship between the data points. This is perhaps the most important step in kriging process. Kriging is a moving average technique which uses the variogram parameters to obtain the relationship between the data points. Estimation techniques based on the theory of regionalised variables can be used to obtain the best (in the sense that the estimation error is minimized), linear (i.e., estimated concentrations are given by linear combinations of measured concentrations), unbiased estimates of pollutant concentrations at unmeasured points in the plume based on a limited number of groundwater samples. Then, these estimates may be used to map pollutant concentrations within the plume. Furthermore, because the estimation errors at each point are also computed, it is potentially possible to evaluate the "information content" of a groundwater sampling scheme, i.e., to determine the value of individual measurements in defining the areal extent of groundwater pollution. The estimation errors may also be useful in developing procedures for optimizing the design and cost of future groundwater sampling and monitoring programs.

185

186

GEOSTATISTICS

8.2

DATA ANALYSIS C O N C E P T S

8.2.1

Histogram

The histogram is a common way to represent the distribution of experimental data. Consider a set of n measurements that are sorted in increasing order, z~ < z2 < " < z,. The interval between the largest and the smallest value is divided into m intervals by the points a o, a~, ..., am.~, am. The intervals are usually of equal length, and are selected in such away that the histogram is relatively free from abrupt ups and downs. A measurement z belongs to the k-th interval if ak.t -< z < ak. Let nk be the number of measurements that belong to the k-th interval. The ratio n J n represents the frequency of occurrences in the k-th interval. Plotting the number nk or the frequency of occurrences nk/n as a bar diagram results in a histogram, such as depicted in Figure 8.1.

Figure 8.1. Histogram of pollutant concentration data.

8.2.2

Ogive

The ogive is the cumulative frequency distribution of the data set. For the sorted data, z 1 < z: < ... < z,, we compute p~ -

i - 0.5

,

for

i = 1, . . . , n

[8.1]

n

where p, is a number that increases from 0 to 1, and represents the ratio of number of data values smaller than or equal to z~. A plot of p, against z, yields an ogive, such as shown in Figure 8.2.

DATA ANALYSIS CONCEPTS

187

1.0

0.8

er 0.6

.~

0.4

E (~

0.2

I

I

I

I

I

I

I

I

0

100

200

300

400

500

600

700

800

Concentration (ppm)

Figure 8.2. Ogive of pollutant concentration data.

8.2.3

Summary Statistics Arithmetic Mean The arithmetic mean represents the average value of the data set, and is given by:

m =

Z l + Z2+

"'" + Zn

n

-

l

2.., zt n i=2

[8.2] Ik.

_1

Median The median, Zm, is defined as the data value that is larger than half of the data set and smaller than the other half. It is defined as follows:

Zm

-

zt, (zt+zt+l)/2,

l = (n + 1)/2, l =n/2,

/f n = odd if n = even

Mean Square Difference The mean square difference is a measure of the spread of the data set. It is given by:

[8.3]

188

GEOSTATISTICS

112 = (Zl- m)2+ "'" + (an- m)2

1 s (z,- m) 2 n i=l

[8.4]

where 112is known as the variance, and its square root, 11, is the standard deviation.

Skewness Coefficient The skewness coefficient, ks, is a measure of the degree of symmetry about the central value. It is expressed as:

1 s (Z~- m)3 / :

n

~:l

[8.S]

1i3 It is a dimensionless number. A symmetric distribution has a k,. value of zero. If the data contains many values slightly smaller than the mean and a few values much larger than the mean, the coefficient of skewness is positive. If there are many values slightly larger and a few values much smaller than the mean, the coefficient of skewness is negative.

8.2.4

Normal Distribution The normal probability density function (PDF) is given by:

f (z) -

1 exp V/2rt:o2

)

(Z - m)2

[8.6]

2o 2

For data set which is normally distributed (bell-shaped), the mean and standard deviation provide enough information to reconstruct the histogram with acceptable accuracy. Figure 8.3 shows a histogram with mean m and variance o 2 and the normal probability density function distribution with the same mean and variance.

8.3

SEMIVARIOGRAM

The semivariogram is a graph that is most commonly used in applied geostatistics to explore spatial interdependence. It contains information about the scale of fluctuation of a variable. Consider the case of n measurements, Z(Xl), z(x2) .... , z(x,), where the argument x denotes the array of coordinates of the point where these measurements were taken. A plot of the square difference 1/2 [z(x,) - z ( x ' ,)] 2 against the separation distance I x, - x ~1 for n(n- 1)/2 measurement pairs, yields a scatter plot, as shown in Figure 8.4. A semivariogram is the smooth line through this scatter plot.

DATA ANALYSIS CONCEPTS

189

Figure 8.3. Histogram and theoretical distribution of normal data.

m,,

v

r>-

2

--

1

-

0 0

o"

',,

i",

I

I

1

2

.,'

~/

",

I 3

4

Figure 8.4. Semivariogram.

In the standard method of plotting the semivariogram, the axis of separation distance is divided into consecutive intervals. The k-th interval is [hlk, hUk] and contains Ark pairs of measurements [z(x,), z(x ~)]. The semivariogram function, Y(hk) is given by:

v(h k) = ~

1

Nk

~,=, [z(x~) - z(x'~)] ~

[8.7]

190

GEOSTATISTICS

The index i refers to each pair of measurements z(x3 and z(x ~) for which i

[8.8]

h tk -< IIxi - x ill < h uk

This interval is represented by a single point hk equal to the average value: Nk

k

=

1 ~ N k i:l

[[xi _ x,i[

[8.9]

A connected plot of Y(hk) against hk results in a semivariogram, as shown in Figure 8.4.

Nonstationary -..

Range ~>-

Sill

2 Stationary

0

1

2

3

h

Figure 8.5. Semivariogram illustrating stationary and nonstationary behaviour.

The behaviour of the semivariogram at distances comparable to the size of the domain determines whether the function is stationary or not. A function is considered stationary if it consists of small-scale fluctuations (compared to the size of the domain) about some well-defined mean value. For such a function, the semivariogram should stabilize around a value, called the sill, as shown in Figure 8.5. The sill is approximately equal to the variance of the data. For a stationary function, the length scale at which the sill is obtained describes the scale at which two measurements of the variable become practically uncorrelated. This length scale is known as range, correlation length or radius o f neighbourhood.

INTRINSIC MODELLING 8.4

191

INTRINSIC MODELLING

Consider that pollutant concentration, or any other spatially variable quantity, is a function of the spatial coordinates and may be represented as z(x~), z(x~, x2), or Z(Xl, x2, x3), depending on whether the quantity varies in one, two, or three dimensions. The notation z(x) will be used to include all three cases, where x is the location index (a vector with one, two, or three components). Thus,

Xl/

X1 X

=

" X1 ,

X

=

X2

;

X

=

X2 X3

[8.10]

The function z(x), known as a regionalised orfield variable, is not known everywhere, and needs to be estimated, at unknown locations, from available observations. The actual unknown z(x) is one out of a collection (or ensemble) of possibilities of z(x; 1), z(x; 2) .... This ensemble defines all possible solutions to the problem at hand. The members of the ensemble are known as realization or sample functions. The ensemble of realizations defines what is known as a random function. Since we are interested in calculating averages over all possible realizations, we use statistical moments. In linear estimation, we use the first two statistical moments of the random field, namely: (1) The mean function (m) (first moment), which gives the expected value (E) at any point x:

m(x) - E[z(x)] (2)

[8.11]

The covariance function (second moment), which is the covariance for any pair x and x':

R(h) = E[(z(x)- m(x)) (z(x')- m(x'))]

[8.12]

where h = II x - x' II = x/(x,- X ' l ) 2 -at- ( x 2 - x 2) 2 + ( x 3 - x ; ) 2 . Eqs. [8.11] and [8.12] constitute a stationary model. This model is also isotropic because it uses only the length and not the orientation of the linear segment that connects the two points. The covariance at h = 0 is known as the variance or sill of the stationary function. The sill should be finite. R(h) vanishes or tends to vanish when h exceeds a certain value, known as the range. In order to apply this model in interpolation, we first need to find the mean m, and select an expression for the covariance function and determine the optimum values of its parameters. In most cases, the mean is not known beforehand, and needs to be inferred from the data. To avoid this, it is often convenient to work with a semivariogram: 1 y(h) = -~ E[(z(x)- z(x')) 2]

[8.13]

192

GEOSTATISTICS For a stationary function, the relation between the semivariogram and the covariance function

is: 1 y(h) : ~1 E[(z(x)- z(x')) 2] : -~ E[((z(x)- m(x))- (z(x')- m(x'))) 21 1

: - E[(z(x)- m(x)) (z(x')- m(x'))] + -~ E[(z(x)- m(x)) 2]

[8.14]

+ 1 E[(z(x')- m(x')) 2] : - R(h) + R(O) 2

8.5

STRUCTURAL ANALYSIS

Structural analysis is the selection and fitting of mathematical expressions for the required first two moments of the regionalised variable. The form of these expressions comprises the model. A list of such mathematical models is given below. 8.5.1

Stationary Models

Gaussian Model

The Gaussian model is defined by the covariance and semivariogram functions: R(h) = o2 exp(- Lh--~~ } [8.15]

where o is the variance, and L is the length parameter, with values greater than zero. The covariance function decays asymptotically, as shown in Figure 8.6. Exponential Model

For the exponential model, the covariance and semivariogram functions are given by: R(h) = 02exp( - h ) [8.16] Y(h) = 0 2 ( 1 - e x p ( -

hi)

where o is the variance, and L is the length parameter. It should be noted that both O 2 and L are greater than zero. This model is quite popular, particularly for hydrogeological application. A schematic of the semivariogram and covariance function is shown in Figure 8.7.

STRUCTURAL ANALYSIS

193

1.0 0.8

v(h)

0.6 0.4

(h) ,

----~11~ .........................

I

0.2

I

0 0.1 ~_ range (a) ~

0.2 h

0.3

0.4

0.3

0.4

Figure 8.6. Gaussian semivariogram and covariance functions.

1.0 0.8 "

y(h)

I

0.6 0.4

R(h) 9

0.2

0

0.1

I

0.2 h

range (a )

}~

Figure 8.7. Exponential semivariogram and covariance functions.

194

GEOSTATISTICS

Spherical Model For the spherical model, the covariance and semivariogram functions are given by: \

1

R(h) =

--

3h 2a

-t-

1 h3]o 2 2a )

O<_h<_a

3

h>a

0,

[8.17] l( 3 h V(h) =

2 a

1 h31 02 2a 3

O<_h<_a h>a

122 ,

where the parameters are the variance 02 > 0 and the range a > 0. A schematic of the semivariogram and covariance function is shown in Figure 8.8.

1.0

i

i

y(h)

0.8 0.6 0.4

R(h)

0.2

0

0.2

0.1

0.3

0.4

h Range (a)

-1~

Figure 8.8. Spherical semivariogram and covariance functions.

Nugget Effect Model The covariance and semivariogram functions for the nugget effect model are given by: 0, R(h) = Co6(h) = Co ,

h>O h =0

y(h) = C o(1 - 6 ( h ) ) = {Co0~

h>0 h=0

[8.18]

STRUCTURAL ANALYSIS

195

where the parameters are the nugget variance Co > 0 and the Kronecker delta, 6(h), which equals one if h = 0 and 0 for all other cases. A schematic of the semivariogram and covariance functions is shown in Figure 8.9. The semivariogram jumps from 0 (at h = 0) to o 2 (at h > 0). The covariance function drops off from o 2 (at h -- 0) to 0 ( at h > 0).

1.0 I

, .

0.8

.

.

.

.

.

.

.

.

.

.

.

.

.

Z

I .

.

.

.

.

.

_L

I

.

.

.

.

.

I

1

0.6

.

.

.

.

.

.

.

.

.

.

q- .

.

.

.

.

.

T-

.

.

.

.

.

I- .

.

.

.

.

.

.

.

.

.

.

, 0.4

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

T

i - -

i . . . . . . . . . . . . . . . . . . . .

1 . . . . . . . . . . . . . . . . . . . . .

i

.....................

0.2

~. . . . . . .

[ IP 0

0.1

R(h)

} ......

l ,]

i

0.2

0.3

_

'," 0.4

Figure 8.9. Nugget effect semivariogram and covariance functions.

8.5.2 Nonstationary Models Power Model The semivariogram of the power model is given by: y(h) - ~h ~

[8.19]

where ~ and 13 are the model parameters, with tt > 0 and 0 < 13 < 2. A schematic of the semivariogram is shown in Figure 8.10.

Linear Model The semivariogram of the linear model is given by: y(h) - t~h

[8.20]

where the only parameter is the slope tx > 0 of the semivariogram. A schematic of the semivariograrn is shown in Figure 8.11.

196

GEOSTATISTICS

1.0 0.8 0.6 I

0.4 0.2

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

J.

I

. . . . . . . . . . . . . . . . . . .

~t

. . . . . . . . . . . . . . .

I

9

I

0

0.25

0.5

0.75

1.0

0.5

0.75

1.0

Figure 8.10. Power function semivariogram.

1.0 0.8 0.6 ..C:

0.4 0.2

0

0.25

Figure 8.11. Linear function semivariogram.

8.5.3

Model Superposition

Linear with nugget effect The semivariogram for the linear with nugget effect model is given by:

STRUCTURAL ANALYSIS

?(h)

= ICo+ ah, t 'Ymean'

197

h <_ a

[8.21]

h > a

where ? is the sample variance, a is the linear coefficient, a is the range, Co is the variability due to nugget effect. A schematic of the semivariogram is shown in Figure 8.12. ....

1.0 h

0.8

L .........................

I

0.6 0.4

I

I

,

I

'

I

I

0.2

Co (VariabOity due to nugget effect) 0

0.25

0.5

0.75

1.0

Figure 8.12. Linear with nugget effect semivariogram.

1.0

I

P

I

"

r

o.e Y I ~

I

0.4 ~ir

.

/

. . . . . . .

.

.

.

.

1 t

.

(~,artance of _C~ t~e transition model)

, . . . . . . .

~.......

!

-

I I

0.2

Co (Variab ity due to nu~rget effect) 0

0.25

'

, I

0.5

0.75

Figure 8.13. Spherical with nugget effect semivariogram.

1.0

198

GEOSTATISTICS

Spherical with nugget effect The semivariogram for the spherical with nugget effect model is given by:

l ( -23h_ 21ah 33) '

h<_a

[Co+ C 1,

h>a

y(h) = C~

C1

a

[8.22]

where C~ is variance of the transition model. A schematic of the semivariogram is shown in Figure 8.13.

8.6

KRIGING

Kriging is merely a weighted moving average technique. It can be used to assign an estimated value to a particular location (point kriging) or to a block (block kriging). Like any other estimate, a Kriged estimate is a weighted combination of the sample values around the point to be estimated. Other linear unbiased estimators exist, such as polygons, triangles, and inverse distance methods. However, Kriging is the best linear unbiased estimator. Kriging is the process of estimating the value of specially distributed variable from adjacent values while considering the interdependence expressed in the semivariogram. The Kriging process involves the construction of a weighted moving average equation which is used to estimate the true value of a regionalised variable at specific locations. This equation is designed to minimize the effect of the relatively high variance of the sample values by including knowledge of the variance between the estimated point and other sample points within the range. The main aspects to consider in understanding the theory of Kriging are the estimation error and the weighting coefficients. Given n measurements of z at locations with spatial coordinates xl, x2,..., x n, the value of z at an arbitrary point Xo is estimated by assuming that it is a linear combination of the measurements, i.e.,

Zo* - ~

~'i z(x i)

[8.23]

i=l

Thus, the problem is reduced to selecting a set of coefficients ~1, ..., ~.,. These coefficients are the weights which are assigned to the n sample values. The difference between the estimate zo~and actual value Z(Xo) is the estimation error:

Zo* - z(x o) = ~

)~i z(xi) - z(x o)

i=l

The estimator should meet the following specifications:

[8.24]

KRIGING (1)

199

U n b i a s e d n e s s : On the average, i.e., over all possible solutions or realizations, the estimation error must be zero. That is:

ZoZ Xo , mm( l)mO1 l

[8.25]

The numerical value of the mean, m, is not specified. Therefore, for the estimator to be unbiased for any value of the mean, it is required that:

~

.~ = 1

[8.26]

i=l

Imposing the unbiasedness constraint eliminates the unknown parameter m. (2)

M i n i m u m Variance: The mean square estimation error must be minimum. The mean square error in terms of the semivariogram is given by (Kitanidis, 1997)"

.E~Zo*-Z~.o~l:

-

:s s ~,~,~,-,--, ~+~ :s ~,,~-,--o /=1 j=l

[8.27]

i:1

Thus, the problem of best (minimum mean square error) unbiased estimation of the )~ coefficients may be reduced to a constrained optimization problem. Values of)~l, ..., )~nthat minimize the objective function given by Eq. [8.27] while satisfying the constraint given by Eq. [8.26] are sought. This problem can be solved easily by using Lagrange multipliers, a standard optimization method. The necessary conditions for the minimization are given by the linear kriging system of n+ 1 equations with n+ 1 unknowns:

- ~2 ~ (

x ~ - xjll~ + ~ - - v ( i x , -

Xoll~,

i = 1, 2 ..... , n

[8.28]

j=l

Xj = 1 j=l

where g is the Lagrange multiplier.

[8.29]

200

GEOSTATISTICS It is common practice to write the kriging system in matrix notation, in the form:

[8.30]

Ax=b where x is the vector of the unknowns

X

[8.31]

=

kt

b is the vector -Y(llx I - Xoll) -Y(llx 2 - Xoll) b

[8.32]

__.

-Y(llx. 1

xoll)

and A is the matrix of coefficients 0 -y(Ix= - xlll)

-y(llx~ - x= I) 0

-Y(llx I - x , ll) -Y(llx 2 - x , ll) 9

a

-y(Ix.

- x~l)

1

-y(Ix.

- x 2 I)

1

0 1

~

[8.33]

1 0

Therefore, the problem is reduced to the solution of a linear system9 Solving this system, we obtain ~,~, ..., Xn, and ~t. In this manner, the linear estimator of Eq. [8.23] is fully specified. In addition, we can quantify the accuracy of the estimate through the mean square estimation error (MSE), which can be obtained by substituting into Eq. [ 8.27] the values of ~.~, ..., Xn obtained from the solution of the kriging system.

SOLUTION METHODOLOGY 8.7

(1)

(2)

(3) (4)

201

SOLUTION M E T H O D O L O G Y A geostatistical analysis of pollutant concentrations, for example, consists of four steps: Determine if the measured pollutant concentrations are additive and normally distributed. If they are not normally distributed, appropriate transformations are required (e.g., a natural log transformation often improves the fit of the data to a normal distribution); Estimate the spatial correlation of pairs of measured concentrations as a function of the distance and the direction of their separation, i.e., determine the experimental semivariogram; Perform a structural analysis, i.e., fit a theoretical model to the experimental semivariogram; and Perform kriging. The fitted model is used in an interpolation procedure, known as kriging, to estimate expected values of pollutant concentrations at points within the plume where measurements were not taken.

Step 1: Additivity and normal distribution The use of linear geostatistics requires that the representative volume (ReV) be additive (Journel and Huijbregts, 1978). Ion concentrations expressed in units of mass (or weight or volume) of solute per unit volume of groundwater do not represent true accumulations and are not additive. Thus, the pollutant concentrations obtained from groundwater should be multiplied by the porosity of the soil at the point where the sample was taken. The need to convert pollutant concentrations to additive variables means that the porosity of the soil also must be measured (or estimated) for each sample location. A second requirement of geostatistical analysis is that the ReV be normally distributed. Although geostatistical techniques have been proposed for cases where the probability distribution of the ReV deviates from normality, they are not usually necessary since deviations from normality can often be corrected by an appropriate transformation or by the elimination of erroneous measurements in the sample data. Often, a log-transformation will improve the fit of the ReV to a normal distribution. This transformation is given by:

y(x) = log [z(x) + A]

[8.34]

where y(x) is the natural log-transformed pollutant concentrations, z(x) is the pollutant concentrations, and A is a constant added to z(x) to improve the fit. The mean and variance of z(x) are given by (Rendu, 1978):

m =exp(, + o 2 = m 2 [exp (132) - 1]

[8.35]

[8.36]

202

GEOSTATISTICS

where rn is the mean of z(x), o 2 the variance of z(x), a is the mean ofy(x), and

B2

is the variance of

y(x). Step 2: Calculation of experimental semivariogram Given a normally distributed set of data z(x) (it may be the natural concentration or the logtransformed concentration), an estimation of the two first moments is required for a linear geostatistical analysis. The first order moment is simply the expectation or mean. There are three second-order moments in geostatistical analysis, namely, variance, covariance, and variogram. These are, usually, functions of the position, (x). However, under the assumption of stationarity, the covariance and semivariogram do not depend on the positions of the pairs of measurement points employed, but only on the vector h separating them (i.e., h = xi - xm). Most of the time, it is not possible to know, a priori, the variance and covariance of z(x) as required for second-order stationarity. In this case, a less strict hypothesis of stationarity can be used. Since only one realization of pollutant concentrations is available in the sample data z(x), only an estimate of the semivariogram, given by Eq. [8.7], is possible. There are two sources of errors associated with the estimate of the semivariogram. Since a realization of the pollutant concentration is not known for every point in the plume, the first source of error is due to the limited number of measurements that are available. This error is called the variance of estimation, and it depends on the number of samples N(h) used to estimate ),(h), for each vector h. The error decreases as N(h) increases. The second source of error is from the fluctuations of the local mean (defined from point to point in the plume) about the assumed mean for the entire plume. These fluctuations cause estimates of the semivariogram to also vary from point to point in the plume. A theoretical analysis of these two sources of estimation errors can be used to determine the minimum number of sample pairs and the maximum magnitude of the vector h for which the semivariogram can be approximated. The analysis requires that the fourth-order moments of z(x) be known. For few samples, these moments cannot be computed accurately. However, practical rules exist for estimating the semivariogram from a set of sample values (Journel and Huijbregts, 1978): Rule #1:

N(h) > 30 - 50

[8.38]

Rule #2:

Ithll< L/2

[8.39]

where IIhllis the magnitude of the separation vector h, and L is the longest dimension (in direction of h) of the pollutant plume. The restriction on Ithll given by Eq. [8.39] has important implications for the estimation procedure (kriging) to be described in step 4. Since the experimental semivariogram cannot be computed for I[hllgreater than L/2, any calculation made with the fitted semivariogram models are restricted to I[hllless than L/2. One important consequence of this restriction is that kriging should not be used to estimate pollutant concentrations for points that are at a distance greater than L/2 away from the nearest sample. The restriction on the number of sample pairs given by Eq. [8.38] also affects the estimation of the true semivariogram. In the general case, an adequate number of sample pairs can be obtained by grouping sample pairs into distance classes for each specified vector h o. Sample pairs can be

APPLICATION

203

grouped by defining tolerance intervals for Itholl,i.e., A Itholl.If a sample pair falls within the tolerance intervals for the specified vector h o, that sample pair is retained and used in the calculation of T(ho). Sample pairs that do not meet these criteria are discarded.

Step 3: Structural analysis Structural analysis is the fitting of a mathematical model to the experimental semivariogram. Any mathematical function may be used to model an experimental semivariogram as long as it is positive-definite, i.e., increasing or constant with increasing h and non-negative. In practice, only a few types of functions are typically used. Models are of two types depending on whether or not the experimental semivariogram exhibits a sill. The most commonly used models are discussed in section 8.5. The procedure for fitting a mathematical model to the experimental semivariogram consists of several steps. Preliminary estimates of the model parameters (e.g., a and Co) can be obtained by visual inspection or by the method of weighted least squares (weighted because the number of sample pairs used to compute each point of the semivariogram is, usually, different). At this stage, the fitting procedure can be guided by knowledge of the physical properties of the pollutant, plume, subsurface, and local groundwater flow pattern.

Step 4: Estimation of the unknown contaminant concentrations (Kriging) When we know only the two first moments of the distribution, the only way to estimate the unknown contaminant concentration is to assume that it is a linear combination of the known values, i.e.,

z *(x) : ~ I., z(xi)

[8.40]

i=1

where ~'i are the weights assigned to the n sample values and z*(x) is estimated value of the contaminant concentration at location x. The best linear unbiased estimate may be obtained by using the Lagrangian method, as discussed previously.

8.8

APPLICATION

The data used in this example, reproduced in Table 8.1, came from Chem-Dyne waste site, located in Hamilton, Ohio, near the confluence of Great Miami River and the Fort Hydraulic Canal (Cooper and Istok, 1988). The Chem-Dyne Corporation began operation in 1975 as a chemical waste transfer, disposal, and storage facility. Wastes that have been present at the site include pesticides, PCBs, polychloronated biphenyls, lab packs, acids, resins, solvents, heavy metals, and cyanides (CH2M Hill, 1984). As of 1979, the waste disposal and storage activities at the site were terminated, and a cleanup program was implemented under the management of the US Army Corps of Engineers (US EPA, 1984). The site has the following geologic characteristics: the aquifer underlying the site consists of unconsolidated alluvial deposits extending to depth greater than 45 m. The deposits consist

204

GEOSTATISTICS

primarily of sandy gravel or gravelly sand. Overlying these deposits, but sometimes extending below the water table, are finer deposits of silts, silty sands, and sandy silty clays. On top of these finer materials, a cap of fill and rubble has been placed. This fill is extremely variable in composition but does not appear to extend below the water table (CH2M Hill, 1984).

Table 8.1" Well locations (x, y, z) and barium concentrations Well x y z Conc. Well x Num. ( m ) ( m ) (m) (mg/m 2) Num. (m)

y (m)

z (m)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

547 541 752 759 350 310 451 470 538 538 412 688 562 534 496 421 245 207

9.89 2.69 3.08 11.95 11.95 1.37 1.05 11.20 2.35 10.88 2.16 19.14 2.65 19.14 10.03 1.95 9.88 2.4

795 802 769 678 623 573 573 536 517 519 451 472 477 551 496 370 115 335

320 331 543 348 412 288 260 194 292 374 310 442 455 656 611 83 237 489

10.31 2.49 2.23 0.59 1.50 0.36 0.99 1.45 0.81 1.88 1.07 8.35 0.51 0.00 2.35 2.19 1.63 1.63

1400 1280 901 4330 3460 7360 0 1700 2680 4630 707 13500 1950 0 1410 0 1100 0

19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

151 162 309 327 566 519 534 534 536 547 586 376 453 175 346 406 397 395

Conc. (mg/m 2) 832 790 0 4940 2680 1680 3080 9220 4520 755 887 0 7560 1530 0 4440 700 1690

With the geostatistical procedures described in the previous sections, the mean and the variance of the concentration in the soil were calculated. Then, histograms and the cumulative distributions were obtained, as shown in Figures 8.14 to 8.16 for barium. From the cumulative distribution plot, it is possible to determine whether the concentrations are normally or In-normally distributed. To get the In-transformed data, we apply the transformation C' = In (C). Most of the time, these operations give a smooth curve that better approximates a normal distribution. The test of normality (or In-normality) is usually done by visual inspection. Quantitatively the probability that the ReV is normally distributed may be determined from any of several statistics, e.g., the Chisquared statistic, the Kolmogorov and Smirnov statistic, or the Shapiro-Wilk Statistic. From Figures 8.14 to 8.16, it is possible to see that this distribution is most likely a Innormal distribution.

APPLICATION

205

Figure 8.15. Cumulative distribution of barium concentration.

To analyse barium distribution in the polluted soil, the semivariogram shown in Figure 8.17 is constructed. A theoretical semivariogram that approximates the experimental semivariogram, shown in Figure 8.17, is then thought. Various theoretical semivariogram models, some of which are detailed in section 8.5, are evaluated, and that with the minimum mean squared error (MSE), given by Eq. [8.27], is selected.

206

GEOSTATISTICS

Figure 8.17. Experimental semivariogram and the best fit model of semivariogram (linear with nugget effect).

In the present example, a linear model with a nugget effect provided the best fit, of the models attempted, to the experimental data. The main coefficients are: (1) range = 300 (m), (2) slope of the linear part - 0.002, and (3) nugget effect = 0.95 (In (semivariogram)). The calculated MSE

SUMMARY AND CONCLUDING REMARKS

207

is 0.1. From Figure 8.17 we see, however, the linear with a nugget effect model is not in good accord with the experimental semivariogram. Table 8.2 shows results of calculations of barium concentrations and variance at different locations, specifically at z = 2 m, x = 400 m, and y varying from 200 m to 600 m at 25 m intervals.

Table 8.2: Calculated barium concentrations and variance Result x y z Kriging estimates No. m m m * 103 (m$/m 2) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

8.9

400 400 400 400 400 400 400 400 400 400 400 400 400 400 400 400 400

200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 575 600

2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

4 3.7 0.7 5.4 1.9 4.2 4.4 2.7 4.5 3.8 2.2 5.3 3.8 1.8 2.8 1.2 3.8

Kriging variance * 106 (mg 2/m 4) 2.8 2.6 2.8 2.6 2.8 2.6 2.8 3.2 2.8 2.8 2.8 2.8 3.0 2.8 2.8 2.8 3.0

SUMMARY AND CONCLUDING REMARKS

In summary, the geostatistical analysis gives a good estimation of the pollution trends for subsurface soil. Then, from the calculated results, we can estimate the area of high pollutant concentrations and determine the locations of the next sampling points and groundwater monitoring well locations.