Spatial Statistics (
)
–
Contents lists available at ScienceDirect
Spatial Statistics journal homepage: www.elsevier.com/locate/spasta
Universal kriging with training images Lewis Li a,∗ , Thomas Romary b , Jef Caers a a
Department of Energy Resources Engineering, Stanford University, Stanford, CA 94305, United States
b
Equipe Géostatistique, Centre de Géosciences, Mines ParisTech, Fontainebleau, France
article
info
Article history: Received 27 August 2014 Accepted 15 April 2015 Available online xxxx Keywords: Geostatistics Variogram Training image Universal kriging Trend modeling
∗
abstract In the past decade, the training image (TI) has received considerable attention as a source for modeling spatial continuity in geostatistics. In this paper, the use of TIs in the context of kriging is investigated, specifically universal kriging (UK). Traditionally, kriging relies on a random function model formulation whereby the target variable is decomposed into a trend and residual. While the theory is firm and elegant, the actual practice of UK remains challenging; in particular when data is sparse, and the modeler has to decide what to model as the trend and as the residual. This paper juxtaposes this variogram-based universal kriging (UK-v) with a TI-based approach (UK-TI). It is found that the latter need not rely on random function theory, but rather on the specification of a TI on which ‘‘universal’’ conditions are verified. Through illustrations with examples, it is seen that the modeling challenge in UK-TI is on the training image. Using a Monte Carlo study, the statistical performance of both methods is found to be comparable. Recommendations on which method to choose, based on practical criteria, are also formulated. Additionally, the study provides more insight into the use of the TI in general, including in multiple-point geostatistics. © 2015 Elsevier B.V. All rights reserved.
Corresponding author. Tel.: +1 650 725 0734. E-mail address:
[email protected] (L. Li).
http://dx.doi.org/10.1016/j.spasta.2015.04.004 2211-6753/© 2015 Elsevier B.V. All rights reserved.
2
L. Li et al. / Spatial Statistics (
)
–
1. Introduction The use of training images and the development of methodologies and algorithms for modeling spatial continuity has received considerable attention since the publication of Strebelle’s seminal 2002 paper in Mathematical Geology (Strebelle, 2002). Many application areas beyond subsurface geology have now been reached, including medical imaging (Pham, 2012), soil sciences (Meerschman et al., 2013), remote sensing (Atkinson et al., 2008; Jha et al., 2013), climate science (Mariethoz et al., 2012), ecology (Relethford, 2008; Ver Hoef, 2008), and even finance (Kanevski et al., 2008). Several algorithms have been developed that aim to extract spatial statistics from these training images, some were based on rigorous random function theory and Markov Random field formulations (Tjelmeland and Besag, 1998; Winkler, 2003; Cressie, 1993; Daly, 2005; Stien and Kolbjørnsen, 2011), others were inspired by random function theory (Strebelle, 2002; Dimitrakopoulos et al., 2010; Peredo and Ortiz, 2011), in addition to stochastic computer-aided design methods that do not rely on random function theory (Zhang et al., 2006; Arpat and Caers, 2007; Mariethoz et al., 2010; Honarkhah and Caers, 2010; Tahmasebi et al., 2012; Mariethoz and Lefebvre, 2014; Mahmud et al., 2014). The reader is advised to check Hu and Chugunova (2008), Daly and Caers (2010) for reviews as well. While these algorithms represent a diversity of purposes and aims, they share a common trait: borrowing some statistics from training images instead of relying on variograms modeled from data. Due to the growing interest and understanding of training images, there has been a revival in investigating the link between it and random fields (Ortiz, 2008). Recently, Emery and Lantuéjoul (2013) examined using the training image as a substitute for a random field model in the context of geostatistical simulation. In this paper, traditional variogram based methods, which have been at the core of geostatistics since its inception (Matheron, 1969), and the topic of many books since (e.g. Goovaerts, 1997, Chilès and Delfiner, 2009, Diggle and Ribeiro, 2007), are examined. The particular method of interest is universal kriging; an approach formulated by Matheron based on Random Function (RF) theory, which uses variograms to verify a universality unbiased condition (herein referred to as UK-v). Specifically, an alternative version is proposed that is not dependent on RF theory, but rather relies on a training image(s) to verify the universality conditions (UK-TI). Journel (1997) propounded a similar idea, based on Matheron’s schéma glissant or repetition process. However, at the time the level of maturity of training image methods prevented any practical case studies. The objective of this paper is to link recent advancements in MPS with a RF-less formulation to perform universal kriging. From a conceptual viewpoint, UK-TI relies on a non-parametric approach of interpreting spatial continuity that is more intuitive to non-experts of geostatistics than parametric models such as the variograms used by UK-v. In contrast to UK-v, which models the target variable as the sum of a trend and a residual component, UK-TI shifts the modeling emphasis from estimating the residual covariance to the generation of a representative training image, an area that has seen much development in recent years. This paper is primarily concerned with the conceptual differences between these two approaches, as well as examining their practical applications to real cases. A brief review of UK-v is given followed by a discussion on the challenges in applying the theory to practice. Next, the UK-TI formulation is developed, and the differences from UK-v are explored. In the example section, both methods are applied to the same case studies, and considerations such as: the size and nature of the training image, the decomposition of trend versus residual values, and the problem of sparse datasets, are explored. 2. A review of universal kriging 2.1. Optimal prediction with known covariance The Universal Kriging (UK) model was introduced by Matheron in Matheron (1969) as a way to optimally estimate a natural variable in the presence of a systematic behavior or trend. UK is a random function model where the expectation m(x) of the random function model Z (x), x ∈ Rd (assumed to model the regionalized variable z at stake), is unknown. UK differs from simple kriging (SK) in that the expectation is allowed to vary spatially; an important distinction especially when data is sparse, or extrapolation beyond the data is required (Journel and Rossi, 1989). The expectation is assumed to have
L. Li et al. / Spatial Statistics (
)
–
3
d the form m(x) = l=0 βl fl (x), where the fl are known functions of R and the βl are unknown constant coefficients. In his seminal work (Matheron, 1969), Matheron introduces UK as the following model:
L
Z (x) =
L
βl fl (x) + Y (x)
(1)
l =0
where Y (x) is an independent zero mean second order stationary random function1 with covariance C(x, y) = E(Y (x)Y (y)), often called the residual. As quoted by Matheron, the UK estimates deserve this qualifier in the sense that they can be used whatever the value of β , hence the so-called ‘‘universality conditions’’ designating the non-bias conditions. Indeed, the kriging predictor Z ∗ of Z at location x0 is a linear combination of the sampled values: Z ∗ (x0 ) =
n
λi Z (xi ).
(2)
i=1
As it is also unbiased, which gives: E(Z ∗ (x0 ) − Z (x0 )) =
=
n
L
λi
i=1
l =0
L
βl
βl fl (xi ) −
βl fl (x0 )
l =0
n
l =0
L
λi fl (xi ) − fl (x0 ) = 0.
(3)
i =1
(3) must be true whatever the value of the βl . Hence the ‘universality conditions’ that the kriging estimate must verify are: n
λi fl (xi ) = fl (x0 ).
(4)
i=1
The next step is to minimize the estimation mean squared error under the condition stipulated in (4). This gives us: E(Z ∗ (x0 ) − Z (x0 ))2 = V(Z ∗ (x0 ) − Z (x0 )) =
n n
λi λj Cij − 2
i =1 j =1
n
λi Ci0 + C00 ,
(5)
i =1
where Cij = C(xi , xj ). The condition (4) is enforced by using the method of Lagrange multipliers. This leads to the UK system (for detailed calculations: Chilès and Delfiner, 2009):
n L λ µl fl (xi ) = Ci0 j Cij +
∀ i ∈ {1, . . . , n}
n λi fl (xi ) = fl (x0 )
∀ l ∈ {0, . . . , L}.
j =1
l =0
(6)
i=1
This system is invertible if and only if the matrix F, whose columns are the functions fl values at (x1 , . . . , xn ) is of full rank L + 1. By denoting β = (F′ C −1 F)−1 F′ C −1 Z, the generalized least squares (GLS) estimator of the drift parameters vector, the UK estimate at x0 takes the following form: Z (x0 ) = ∗
l l =0
fl (x0 ) βl +
n i=1
λ
K i
Z (xi ) −
l
fl (xi ) βl
,
(7)
l=0
where the λKi are the simple kriging weights computed as if m(x) were known and subtracted from the data. This illustrates the additivity theorem (Matheron, 1969): the kriging estimate is the sum of the optimal estimate of m(x0 ), while the second term corresponds to the simple kriging estimate of the
1 In Matheron (1969), Matheron allows Y to be an intrinsic random function but shows that the possible constant term in (1) is indeterminate under this assumption, which is in accordance with the strict intrinsic hypothesis.
4
L. Li et al. / Spatial Statistics (
)
–
residual term, where the true mean has been replaced by its optimal estimate. The additivity property also holds for the UK prediction error variance (e.g. Chilès and Delfiner, 2009). Usually the first basis function (case i = 0) is the function identically equal to 1, which guarantees that the residual Y (x) is centered. The others can be any functions known over the domain of interest. In particular, explanatory variables, when available, can be taken into account in the model (1). This is frequently the case in environmental applications, e.g. De Fouquet et al. (2007) and Romary et al. (2011). UK is sometimes referred to as in (1) where the (fi (x))i=0,...,p are monomials, while the term external drift kriging is used when the fi are any other functions, including auxiliary variables. 2.2. Practical considerations of UK-v Application of UK-v using the aforementioned procedure requires knowledge of the covariance matrix C , which in practice is almost never known. The estimation of the covariance parameters is a difficult exercise, and a wide variety of approaches such as moments method or maximum-likelihood method under a Gaussian hypothesis have been proposed. Refer to Appendix A for a discussion of covariance estimation techniques. Furthermore, UK-v is generally performed using moving neighborhoods, which can result in discontinuities in the resulting kriging map. In practical applications, this can be addressed using an approach known as the Continuous Kriging (CK) correction, which is further expounded upon in Appendix B. 2.3. Discussion While UK has been widely used for the past forty years, it is still controversial since its formulation (1), of combining a deterministic drift and a correlated residual, hides technical difficulties in its implementation. The numerous different existing estimation methods all have their drawbacks: variogram-based methods suffer from bias while likelihood-based methods may not be appropriate when the distribution of the residuals departs from the Gaussian assumption. In particular, the number of drift functions should not be so large so as to allow the estimation of the covariance parameters. Perhaps the most important point concerns the separation of scales that should exist between the structures of the drift functions and of the residual. Indeed, the more separated the two components are, the lower will be the bias. It is in the philosophy of the UK model that the drift functions capture the large scale fluctuations while the residual structure govern the smaller ones. The difficulty lies in the separation between these two components, which is left to the practitioner and should be guided by the data. There is still a fair amount of choice in the practice of UK (Matheron, 1978) but still remains an active domain of research (Paciorek, 2010). 3. Universal kriging with training images 3.1. Formulation In this section, a radically different approach that does not rely on the notion of random functions is reviewed and further developed. The ideas presented here were first proposed by Matheron (1978) as schéma glissant and later further suggested as deterministic geostatistics in Journel (1997), although neither author provided any practical case studies. The main contribution in this paper will be on providing links with recent developments in geostatistics that rely on training images instead of variogram models to specify spatial continuity. Assuming at least one training image to be available on a gridded domain denoted as: zTI = {zTI (x), x ∈ DTI } .
(8)
In the same way as done for universal kriging, an ‘‘estimator’’ for Z at location x0 is required. This is denoted as z # , a notation used to distinguish it from the random-function based estimator Z ⋆ in (2): z # ( x0 ) =
n i =1
λi z (xi ).
(9)
L. Li et al. / Spatial Statistics (
)
–
5
Note the choice of lower case notation, as random variables are not used to derive equations for the weights. The data event in (9) has a data configuration denoted as: cf = {x0 + h1 , x0 + h2 , . . . , x0 + hn } .
(10)
Equivalent to unbiasedness in (3), a condition is introduced as follows: Zero error sum condition: If estimation was to be performed over the training image, using the same data configuration as the data in Section (10), then the sum of the differences between the estimators and the actual values equals zero. Mathematically this can be written as:
# zTI (x) − zTI (x) = 0.
(11)
x∈TIcf
The summing cannot be specified over the entire training image, instead goes over an area eroded with the data configuration cf. The erosion is one of two fundamental operations in mathematical morphology (a field co-invented by Matheron and Serra (Serra, 1983)). TIcf denotes this eroded area. TIcf = {x0 |x0 + hi ∈ TI , ∀i = 1 . . . n} .
(12)
That is to say, for a given data configuration, the eroded image represents the set of locations on the training image that the given data configuration would be valid at. The zero error-sum condition is further developed as follows:
# zTI
n
λi
i=1
⇒
i=1
n
x∈TIcf
i =1
(x) − zTI (x) =
x∈TIcf
⇒
z (x + hi ) =
λi z (xi ) − zTI (x) = 0
zTI (x)
x∈TIcf
x∈TIcf
1 1 λi (zTI (x + hi )) = z (x). nTIcf x∈TI nTI cf x∈TI TI cf
(13)
cf
nTIcf is the number of grid cells or locations x in the eroded TI. It appears now two averages are present. The first is the average of the z-values in the TI eroded by cf. avTIcf =
1
nTIcf x∈TI cf
zTI (x).
(14)
The second is the average of the z-values in the TI, first eroded by the cf, then translated by −h. avTIcf (hi ) =
1
nTIcf x∈TI cf
zTI (x + hi ).
(15)
The zero error-sum condition, which is the equivalent of the unbiasedness constraint, is then written as: n
λi avTIcf (hi ) = avTIcf .
(16)
i=1
Equivalent to the second condition in kriging, the following loss condition is specified: Minimum sum of square error condition: If estimation was to be performed over the training image, using the same data configuration, then the sum of the squared differences between the (linear) estimators and the actual values are smallest possible.
6
L. Li et al. / Spatial Statistics (
)
–
Mathematically this translates to: min
(zTI# (x) − zTI (x))2 .
(17)
x∈TIcf
The sum of squares can be further developed as:
sse =
(zTI# (x) − zTI (x))2
x∈TIcf
n
=
x∈TIcf
λi zTI (x + hi ) − zTI (x)
i=1
n n
=
x∈TIcf
=
2
n n
λi λj
n
λi zTI (x + hi )zTI (x) +
2 zTI
( x)
i =1
zTI (x + hi )zTI (x + hj )
x∈TIcf
i=1 j=1
−2
λi λj zTI (x + hi )zTI (x + hj ) − 2
i=1 j=1
n
λi
zTI (x + hi )zTI (x) +
2 zTI (x).
(18)
x∈TIcf
x∈TIcf
i=1
The notation for the various sums of products is denoted sop, which is the analog of the experimental non-centered covariance estimate: sopTI (hi , hj ) =
1
nTIcf x∈TI cf
zTI (x + hi )zTI (x + hj ).
(19)
Hence, the sse in (18) can be expressed as: sse nTIcf
=
n n
λi λj sopTI (hi , hj ) − 2
i=1 j=1
n
λi sopTI (hi , 0) + sopTI (0, 0).
(20)
i=1
The notation sop is used to emphasize that the notation of covariance (requiring expectation) is not used here. Minimization of the sse under the linear constraint proceeds in the exact same way as for kriging and results in the following normal system of equations: n
λj sopTI (hi , hj ) + µavTIcf (hi ) = sopTI (hi , 0) i = 1..n
j=1 n
λi avTIcf (hi ) = avTIcf .
(21)
i =1
Note that the sum of products must be computed over the eroded training image for the system to be consistent. Refer to Appendix B for a detailed treatment. The minimum achieved by solving this problem is then the error variance under this procedure and is expressed as: ssemin = sopTI (0, 0) −
n
sopTI (hi , 0) − µavTIcf .
(22)
i =1
3.2. Practical considerations of UK-TI Training image Obviously, the largest hurdle for practical application of UK-TI is the availability of a training image that is representative of the primary domain. Specifically, the training image should be reflective
L. Li et al. / Spatial Statistics (
)
–
7
of the spatial continuity, and presence of any patterns or features of the domain in question. Quite often, this is not available, but rather a training image that is on a different domain, and potentially a different trend that is not indicative of the primary domain, is available. In TI literature, this trend is commonly referred to as the auxiliary variable (Hu and Chugunova, 2008). The auxiliary variable for the primary domain can be obtained from a variety of sources (for instance remote sensing, or subsurface seismic). A variety of approaches in MPS have been developed to generate non-stationary models such as DISPAT (Honarkhah and Caers, 2010), SIMPAT (Arpat and Caers, 2007), or Direct Sampling (Mariethoz et al., 2010) that still have the characteristics of the original training image, but with this local trend. This procedure is also not limited to a single auxiliary variable. Even if auxiliary data on the primary domain is not available, one can still attempt to generate a local training image through conditional simulation by using the data on the primary domain as the conditioning data. See Mariethoz and Caers (2014) for a comprehensive overview of the MPS methodologies that can also account for scale/affinity and orientation. In Section 4, Direct Sampling will be used to provide a demonstration of the capabilities of such MPS techniques. Finite domain artifacts Another practical consideration that may arise when applying UK-TI is that due to the finite physical size of the training image, only local search neighborhoods can be used. An implicit assumption of kriging in general is that the study area is infinite, which can cause large kriging weights to be assigned to the boundary data points in a given data configuration (Babak and Deutsch, 2008). This is analogous to the string effect described in Deutsch (1993), and is a direct consequence of the dimension change of the eroded image (Eq. (12)), at local neighborhood boundaries. This causes the cross SOP terms in Eq. (21) to fluctuate, causing artifacts. A remedy for this issue is suggested in Deutsch (1994), by progressively increasing the search neighborhood, solving the kriging system, and averaging the kriging weights. This is referred to as the finite domain kriging correction. A second tactic that can be used in conjunction is to set a minimum number of data points that each search neighborhood must encompass; this is especially important when dealing with sparse data sets. 3.3. Discussion Before presenting case examples in which both approaches are compared practically, the main methodological differences are emphasized. Modeling effort: One of the most important differences lies on where the effort in modeling takes place. UK-v often calls for a parametric approach of fitting parametric functions to the data of the domain. Training image methods rely on the availability of images that are reflective of spatial variability of the domain but are not calculated from data as is the case for variograms. As a result, UK-TI separates conditioning from spatial model specification. Because training images are borrowed from elsewhere, a potentially important issue, in cases of more dense data, is the consistency between the training image and the actual data. This issue is currently an important area of research in multiplepoint geostatistics as well (Pérez et al., 2014; Boisvert et al., 2007, 2010). Scale separation: UK-v relies on a decomposition into trend and residual in the modeling domain (see above discussion) while UK-TI relies on a decomposition in the training image domain. In UK-TI, if a trend is present, then the training image needs to contain this trend and modeling of any trend takes place in that domain. The topic of constructing non-stationary training image is therefore particularly relevant to make the UK-TI method applicable in real cases. The single ‘universality condition’ (4) applies regardless of the nature of the training image. If any trend is present in the training image then this trend will be reflected through the eroded average (14) and (15). Neighborhood: UK-v can employ both a local and global neighborhood. For practical reasons, due to the often limited size of the TI, UK-TI can be used with local neighborhoods only. This is the direct consequence of using a random function model vs using a finite training image. The random function concept, in terms of macro-ergodicity (Matheron, 1978), relies on the notion that the model domain itself is part of an infinitely large area; as a result, the global neighborhood is in fact a local neighborhood. UK-TI does not rely on the property of ergodicity of random function theory, hence
8
L. Li et al. / Spatial Statistics (
(a) Reference.
)
–
(b) Training image.
Fig. 1. Reference map and training image for stationary case, generated using a multivariate Gaussian distribution with zero mean and isotropic spherical variogram of range 25.
one is required to specify models using a local neighborhood. The latter is true also for any of the MPS methods. 4. Case studies In this section, a comparison of the two approaches is presented by their application to different examples. Three cases are considered: a zero drift case, a simple drift case, and a complex drift case. Using these examples, the effects of training image size, decomposition of trend versus residual, and sparse data sets are discussed. 4.1. Zero drift case The zero drift case is first considered, where both the reference image (Fig. 1(a)), and the training image (Fig. 1(b)) are generated using a multi-variate Gaussian distribution. The modeling domain is selected to be 150 × 150 while the training image has dimensions 250 × 250. From this reference map, 100 points are randomly sampled to serve as hard conditioning points for performing kriging estimation. 4.1.1. UK-v Initially, the true variogram is assumed to be known; this reflects a reasonable scenario since an experimental variogram can be calculated from the training image (Fig. 1(b)), and used to fit a variogram model. Since a fairly sparse data set of 100 points is used for the estimation, it is possible to use a global neighborhood for computing kriging; this is referred to as Global kriging (GK) (Fig. 2). For sake of comparison, kriging with a local search window with radius 30 is also computed. In the latter case, discontinuities due to this local neighborhood arise, and thus the Continuous kriging (CK) correction described in Section 2.2 and Appendix B is applied. CK yields the estimate and variances shown in Fig. 3. Upon visual inspection, GK and CK yielded similar estimates, and variances, with only a few differing regions. A more detailed comparison will be presented in Section 4.1.3. 4.1.2. UK-TI To illustrate UK-TI, the training image (Fig. 1(a)) is used directly in the estimation. A flexible search radius is used; that is, a minimum set of 12 data points is used at each location for estimation. This still produces a few discontinuities shown in Fig. 4, and is expected for the reasons outlined in Section 3.2. The previously mentioned finite kriging approach is applied, generating the results shown in Fig. 5.
L. Li et al. / Spatial Statistics (
(a) Kriging estimate.
)
–
9
(b) Kriging variance.
Fig. 2. UK-v performed with a global search window. This is typically only an option with sparse data sets due to computational issues.
(a) Kriging estimate.
(b) Kriging variance.
Fig. 3. UK-v performed with a local search window of radius 30. To mitigate any potential discontinuities due to the moving local window, the Continuous kriging correction described in Section 2.2 and Appendix B was applied.
4.1.3. Performance To assess the performance of UK-TI versus UK-v, a Monte Carlo study is performed, in which hard data sample sizes are varied to include 25, 50, 100, 200, 400 and 600 points randomly sampled from the reference image. This is repeated 100 times for each sample size, and each approach is applied for each data set. For assessing the error, the relative mean square error (ReMSE) between the resulting kriging estimates and the reference is used. ReMSE =
1
N (Z (xi ) − Z ∗ (xi ))2 .
N σ 2 i =1
(23)
The average over all 100 drawings is tabulated in Table 1. For this stationary case, σ 2 = 1. As expected, the ReMSE decreases as the number of sampled data points increases for all methods. As the number of sampled points increases, the ReMSE for all four methods seem to converge towards the same
10
L. Li et al. / Spatial Statistics (
)
–
(a) Kriging estimate.
(b) Kriging variance. Fig. 4. UK-TI applied using a flexible search radius; the estimate at each point used at least 12 data points. Note that some artifacts occur at the boundaries of neighborhood regions.
value. This portends that for denser data sets the two methods may give increasingly similar results. At data sets with 600 points, the ReMSE of all four methods is the same to the third significant digit. Conversely, at sparse data sets such as 25 or 50 points, UK-TI with finite kriging appears to outperform UK-v with continuous kriging. 4.1.4. Discussion Training image dimension An important issue to consider is the size of the training image. Previously, a single training image size had been used (250 × 250). Varying sizes of TIs (150 × 150,200 × 200, and 250 × 250) are generated using the same Gaussian distribution and variogram as Fig. 1(a), then used for UK-TI. For each of the three training image sizes, the resulting kriging variances are plotted against the kriging variance computed from Global Kriging in Fig. 6. It can be observed that the kriging variances from UK-TI tend towards that of the global kriging variance with increasing TI size, but remains below the first bisectrix in average. This indicates that UK-TI may tend to underestimate the estimation variance, and is most likely due to the finite size of the training image.
L. Li et al. / Spatial Statistics (
)
–
11
(a) Kriging estimate.
(b) Kriging variance. Fig. 5. UK-TI applied with the finite kriging correction described in Section 3.2 to the scenario in Fig. 4. Table 1 ReMSE with varying number of sampled points for the stationary case. 100 random samples were drawn for each sample size and the ReMSE shown is computed as the mean over all runs. # pts
25
50
100
200
400
600
CK GK UK-TI UK-TI Finite
1.13825 1.00937 1.01676 0.98281
0.85412 0.80926 0.85591 0.80961
0.6186 0.60231 0.62601 0.60181
0.42602 0.42256 0.40543 0.41161
0.29165 0.29035 0.29391 0.29411
0.23634 0.23589 0.23241 0.23197
As is the case with MPS algorithms, the size of the training image is clearly an important factor in performing UK-TI. Generally, the relative size of the TI compared to the largest features in the domain must be considered. This size consideration is required under the principle of ergodicity (Caers and Zhang, 2004). Too small of a training image will lead to undesirably large fluctuations of long range correlations, and hence can cause artifacts in the resulting kriging map. This was experienced in the stationary case when the training image was less than 150 × 150 in size. Consequently, the authors recommend at least the same size of the modeling domain was required to minimize artifacts.
12
L. Li et al. / Spatial Statistics (
)
–
Fig. 6. Kriging variance of estimate computed using UK-TI with varying training image size versus kriging variance from global kriging.
Training image consistency One potential risk of any training image based approach is selecting a training image that may not be consistent with the modeling domain. To demonstrate such a case, consider the reference image given in Fig. 7(a), and 100 points uniformed sampled from it (Fig. 7(b)). This synthetic example is generated using a sequential Gaussian simulation, and ideally, a training image should also share common characteristics such as Fig. 8(a). Conversely, a second training image is generated using a cookie cutter approach, such that it contain distinct disks of higher value than the rest of the map (Fig. 8(b)). This alternate TI is visually different from that of the reference image, but still maintains certain similar characteristics. For instance, their histograms (not shown) are both standard normal, while their variograms (Fig. 9) show a close resemblance. When UK-TI is performed using these two training images, the kriging maps in Fig. 10 are obtained. From a visual inspection, the two maps look very similar, and a correlation analysis of the two estimates, indicates a Pearson coefficient of 0.9634. This suggests that if simple checks such as the histogram and variogram are indeed verified, that UK-TI with even visually different training images may still provide appropriate kriging estimates. 4.2. Simple drift case To examine the case of a simple drift case, the following function is used to generate a reference image: Z (x) =
y
+ σ 2 R(x). (24) 200 R is a stationary Gaussian field with zero mean, unit variance, an exponential variogram with range equal to 25 and σ 2 = 0.05. A 200 × 100 modeling domain was chosen, which results in a trend that varies linearly from 0 to 1 from bottom to top. A single realization was generated from Eq. (24), and used as the reference (Fig. 11(b)). This unknown truth is then used to extract 100 hard conditioning points. 4.2.1. UK-v To apply UK-v, a decomposition of the modeling variable into a trend and a residual is required, using either ordinary least squares, generalized least squares, or some combination of the two such the
L. Li et al. / Spatial Statistics (
)
–
13
(a) Reference map.
(b) Sampled points. Fig. 7. A stationary reference map generated using a multivariate Gaussian distribution with zero mean and isotropic spherical variogram of range 35. 100 data points were randomly sampled and used as the hard conditioning for performing kriging estimation.
method described in Section 2.1 (Cressie, 1993). Three commonly used techniques for estimating the trend in UK-v are applied. The first uses OLS to compute the trend directly using the conditioning data. The second assumes knowledge of the true covariance, and GLS is used to estimate the trend, while the third method uses the iterative two step method from Section 2.1 to estimate the covariance before estimating the trend. This is denoted as GLSest. The variogram of the residuals for all three cases is then fitted automatically (Fig. 12(b)), where only the nugget and exponential modeling with the automatic model fitting function of the R package RGeostats (Renard, 2014) and (Desassis and Renard, 2013). Note that due to the regularity of the trend function with respect to that of the covariance, the bias of the estimated residual variogram, can be considered negligible. The raw variogram (Fig. 12(a)) shows the parabolic behavior of the empirical variogram, which is indicative of the presence of a trend. This reinforces the necessity of the decomposition. UK-v is applied using the three methods to produce the kriging maps shown in Fig. 13. In this particular simple case, the trend is actually linear, thus OLS/GLS can be used to provide an apt description. However, due to the sparseness of the data, the estimated drift from all three methods have some variation in the x component, while the true drift (Fig. 11(a)),
14
L. Li et al. / Spatial Statistics (
)
–
(a) Training image 1.
(b) Training image 2. Fig. 8. Two visually dissimilar training images for applying UK-TI in Fig. 7(a). Training image 1 (Fig. 8(a)) was generated using a single Gaussian simulation while training image 2 (Fig. 8(b)) was generated two Gaussian simulations combined using a cookie cutter approach.
only depends on y. Furthermore, all three approaches appear to give very similar estimates. This will be quantitatively explored further in Section 4.2.3. 4.2.2. UK-TI To illustrate UK-TI, the availability of a training image indicative of the spatial variation of the modeling domain, but is itself on a different domain, is assumed. The training image is shown in Fig. 14, which shows a realization that is similar to the reference Fig. 11(b), but has different dimensions, and has a trend that is oriented in the EW direction instead of NS. This mimics a situation where the training image has features characteristic of the domain but rotated in a different direction, which may occur in many practical situations. As discussed earlier, UK-TI requires a training image on a domain that is comparable to the modeling domain. Consequently, advances in MPS are used to generate such a training image, specifically the Direct Sampling method that was briefly outlined in Section 3.2. An auxiliary variable on the modeling domain is needed to proceed. In most applications, this can come from some other source of information (e.g. seismic or stress fields in subsurface). However, for the sake of illustration, the same OLS estimated trend from UK-v has used as the auxiliary variable
L. Li et al. / Spatial Statistics (
)
–
15
(a) Training image 1 variogram.
(b) Training image 2 variogram. Fig. 9. Omnidirectional empirical variogram for the two training images along with fitted variogram using RGeostats. Both have nugget 0, range of approximately 35, and sill of 0.95.
(a) Training image 1 estimate.
(b) Training image 2 estimate.
Fig. 10. Kriging estimate using UK-TI and the two visually dissimilar training images.
(Fig. 15(a)). Based off this auxiliary variable, a non-stationary training image with the same size as the modeling domain is generated using Direct Sampling (Mariethoz et al., 2010) (Fig. 15(b)). This new
16
L. Li et al. / Spatial Statistics (
(a) True trend.
)
–
(b) Reference map.
Fig. 11. Reference map for non-stationary case, generated using trend in (a) and expression (24).
(a) Data variogram.
(b) Residual variogram.
Fig. 12. The empirical raw and residual variograms computed for the non-stationary case. The automatic variogram fitting function in RGeostats was applied.
TI does not need to be conditioned to the hard conditioning data. Note that this new training image retains the spatial variability of the original training image (Fig. 14), but is on the same domain as the conditioning domain. When UK-TI is performed with the this image, the resulting kriging map (Fig. 15(c)) is produced. 4.2.3. Performance The Monte Carlo study is repeated for the trend case by once again computing the ReMSE using (23), and σ 2 = 0.05. 100 sets of data points were randomly sampled for each sample size, and UK-TI was applied to each of them. The averaged ReMSE values for all 5 methods are shown in Table 2. OLS appears to produce the lowest ReMSE, which is somewhat surprising, given that OLS uses a non efficient estimate of the trend. GLSest yields the worse results for the UK-v methods due to the variability
L. Li et al. / Spatial Statistics (
(a) OLS.
)
–
(b) GLS.
(c) GLSest. Fig. 13. Kriging estimates for UK-v, where the trend has been estimated using three different methods.
(a) Training image.
(b) Training image trend.
Fig. 14. Training image and trend for non-stationary case where both trend and TI are similar to domain but rotated.
17
18
L. Li et al. / Spatial Statistics (
(a) OLS trend.
)
–
(b) Training image.
(c) UK-TI estimate.
Fig. 15. When auxiliary data is not given on the modeling domain, OLS training images generated using Direct Sampling (DS) for non-stationary case.
Table 2 ReMSE with varying number of sampled points for stationary case. 100 random samples were drawn for each sample size and the ReMSE shown is computed as the mean over all runs. # sampled pts
25
50
100
200
400
600
OLS GLS GLSest UK-TI UK-TI Finite
0.84555 0.85298 0.88029 0.99149 0.95243
0.74708 0.75116 0.78553 0.89308 0.82801
0.64971 0.65204 0.70701 0.77975 0.72258
0.54624 0.54735 0.59606 0.62394 0.5932
0.41702 0.41777 0.44424 0.45659 0.44249
0.36126 0.36161 0.40556 0.42833 0.38391
induced by the variogram fitting. The UK-TI results remain very close to those obtained by the UK-v methods, and as with the zero drift case, all methods produce similar ReMSE values as the data set becomes denser. 4.2.4. Discussion The simple drift case illustrates the implementation of the two methods when there is a non-zero drift. In this particular case, the drift is linear in nature, and thus can be decently delineated using OLS/GLS, thus allowing both UK-v, and UK-TI to provide comparable kriging maps. In the UK-TI case, the common situation where the given training image is on a different domain than the modeling domain is studied, and thus MPS algorithms such as DS can be used to produce appropriate training images. While in this particular example, the auxiliary variable was obtained in a similar fashion as UK-v, any variable that describes the drift can be used. 4.3. Complex drift case In real applications, drift expressions may fall short in capturing complex trend variations. To illustrate some of the common challenges faced in such scenarios, we investigate the celebrated Walker Lake data-set (Isaaks and Srivastava, 1989). The data is given as a 260 × 300 digital elevation model (DEM) as shown in Fig. 16(a). Walker Lake is particularly challenging case due to the complex nature of the trend (the alternation of highly variable mountain topography with flat lakebeds). Furthermore, the distribution of the elevations is skewed, necessitating us to perform a histogram
L. Li et al. / Spatial Statistics (
(a) Reference map.
)
–
19
(b) Sampled points.
Fig. 16. Walker Lake DEM used as reference, and 100 points sampled from the DEM to serve as hard data points when performing UK-TI and UK-v.
transformation of the raw elevations into a uniform distribution from 0 to 1. To mimic the scenario of having only sparse data, we randomly sampled only 100 data points (Fig. 16(b)) from the truth. 4.3.1. UK-v Walker Lake presents a complex trend that cannot accurately be described using ordinary least squares. Instead, interpolation was performed on the sampled data points (Fig. 16(b)) to estimate the trend. A variety of different interpolation techniques (e.g. bicubic Keys, 1981, thin plate splines Duchon, 1977, and biharmonic Sandwell, 1987) can be performed. For this case, we opted to use Thin Plate Splines implementation from the software package fields (Furrer et al., 2006). The smoothing parameter has been set such that a residual correlation structure could be fitted. Note that this value was significantly lower than the one selected by cross-validation. The resulting residuals can then be fitted using the automatic variogram fitting function in RGeostats to yield Fig. 17(a). The resulting variogram along with the interpolated trend is used by UK-v to generate the kriging map shown in Fig. 17(b). 4.3.2. UK-TI To demonstrate UK-TI, consider the situation where a DEM of an analog region is available. Ideally, this analog DEM would contain similar characteristics such as the mountains and lake-beds found in Walker Lake. Furthermore, the spatial variation of these major features found in the training image should be similar to those within the same structures found in Walker Lake. To simulate the ideal scenario where the DEM of an neighboring region of Walker Lake is available, a synthetic training image is generated (Fig. 18(a)). This neighboring training image also contains a set of mountain ranges and lakes, but is of a different size (400 × 400), and contains a different trend. To perform UK-TI, a ‘‘local’’ training image is required. This local TI should contain the same features as Fig. 18(a), but have a similar trend to Walker Lake. This construction is performed stochastically using the Direct Sampling method (Mariethoz et al., 2010). The DS algorithm allows for the incorporation of auxiliary data, which in general will be some coarser scale data (e.g.seismic), that is indicative of trend. To mimic such data, we took a moving average of the training image (Fig. 18(b)). Likewise, we used a biharmonic interpolation of the hard data points to simulate the auxiliary data on the modeling domain. Applying Direct Sampling with this data generates an set of local training images such as the one indicated in Fig. 19(a). This generated TI now contains features similar to the original TI, but has a trend that is comparable
20
(a) Residual variogram: Walker Lake.
L. Li et al. / Spatial Statistics (
)
–
(b) Kriging map generated by UK-v.
Fig. 17. An experimental variogram can be fitted on the residuals, and used along with the estimated trend to perform UK-v.
(a) Training image.
(b) Auxiliary variable.
Fig. 18. Synthetic training image that would be representative of a neighboring region around Walker Lake. For illustration purposes, an auxiliary variable was constructed by taking a moving average.
to Walker Lake’s, and thus UK-TI is then performed, yielding the kriging map in Fig. 19(b). We will discuss in a later section the scenario when there is no auxiliary data on the modeling domain. 4.3.3. Performance For the complex drift case, both the Mean Square Error (MSE) (Table 3) and an qualitative assessment of how well the large scale features (mountains and lakes) can be discerned in the kriged estimate, are used to evaluate the performance of the two approaches. The MSE values indicate that the two methods tend to approach the same values as the number of conditioning points increases. This suggests that when data is dense, the two approaches yield similar results. This is confirmed by the example shown in Fig. 20(a) and (b), where 1000 points were sampled. However, it can be noted that when data is sparse, UK-TI is able to produce estimates with lower MSEs. By examining the UK-v kriging map (Fig. 17(a)), large deviations in kriging values around hard data points cause a bulls-eye effect. This is expected because the residual variogram does contain a significant nugget. Furthermore,
L. Li et al. / Spatial Statistics (
(a) Local training image.
)
–
21
(b) Kriging map.
Fig. 19. Local training image for Walker Lake simulated using Direct Sampling, and used with UK-TI to produce the kriging estimate given in the right image.
Table 3 MSE with varying number of sampled points for Walker Lake case. 100 random samples were drawn for each sample size and the MSE shown is computed as the mean. # sampled pts
25
50
100
200
400
600
UK-v UK-TI
0.15033 0.12496
0.12242 0.11321
0.10432 0.09305
0.08230 0.07851
0.07274 0.06966
0.06971 0.06914
it is hard to differentiate large scale features in the kriging map such as mountains and lakes. Again, this can be seen as a direct result of the variogram fitting procedure, as large range correlations which are important in discerning these large scale features, lie beyond the range, and are smoothed out. On the other hand, the kriging map generated by UK-TI (Fig. 19(b)) does not contain bulls-eyes, and describes the lakes and mountains more clearly than the UK-v map. As seen in Table 3, in this ideal situation, where both a representative training image and an auxiliary variable on the modeling is available, UK-TI is able to produce estimates with lower MSEs than UK-v.
4.3.4. Discussion As seen in the previous section, UK-TI is able to produce lower MSEs than UK-v when a representative training image, and an auxiliary variable are available. This ideal situation may not always be the case in practical application. This section will consider such scenarios where the training image is non-representative, and the auxiliary variable is missing. Scenario A: no auxiliary data available: When auxiliary data is not available on either or both the training image and modeling domain, the only option is to use the hard data points as conditioning points for performing Direct Sampling. In contrast to the previous cases, the resulting local training images may not have a trend that is representative of Walker Lake (Fig. 21(a)). However, as long as the original training image contains the correct large scale features, the resulting local training image will still also contain the required features (albeit positioned incorrectly). When Fig. 21(a) is used as the training image for UK-TI, the estimate in Fig. 21(b) is produced, resulting in a MSE of 0.09368. It is noted that this estimate still closely resembles that of Fig. 19(b) (generated with auxiliary data), and the MSEs for the two approaches are also within the same third significant digit.
22
(a) UK-v.
L. Li et al. / Spatial Statistics (
)
–
(b) UK-TI.
Fig. 20. Local training image for Walker Lake simulated using Direct Sampling, and used with UK-TI to produce kriging map.
(a) Training image.
(b) Kriging map.
Fig. 21. A TI can be produced by using the hard data points as conditioning data and jointly simulating both the target and auxiliary data. This TI is then used as an input for UK-TI.
Scenario B: non-representative TI: Another possible scenario, is when a non-representative training image is used. This pertains to the situation when there is limited understanding of the spatial variability of the primary variable, and thus multiple TIs could be selected. This specific case considers the effects of applying UK-TI with a training image that is of a different scale from Walker Lake (Fig. 22). That is, features in the TI are much smaller than the ones in Walker Lake (the major lakes or mountains have been replaced with small hills and ponds). Using the same primary domain auxiliary data from the ideal scenario described in 4.3.2, and the non-representative TI, the local training image shown in Fig. 23(a) is generated by Direct Sampling. It should be noted that despite the original TI being of a different scale, this local TI still contains features with sizes that are comparable to that of Walker Lake and the ideal case local TI (Fig. 19(a)).
L. Li et al. / Spatial Statistics (
)
–
23
Fig. 22. A synthetic training image for Walker Lake that is non-representative. In this image, the scale is incorrect, as the expectation of the target variable changes too rapidly with its coordinate.
(a) Training image.
(b) Kriging map.
Fig. 23. A local TI generated using Direct Sampling with the image in Fig. 22 and primary domain auxiliary data. The corresponding UK-TI result is shown on the right.
The resulting estimate from UK-TI is shown in Fig. 23(b), which has a corresponding MSE of 0.09368. While this estimate is still able to discern the main mountain, the main lake feature is smaller. However, overall the estimate still compares well with the estimate from the ideal case and Scenario A (depicted in Figs. 19(b) and 21(b) respectively), both visually and in terms of MSE. Scenario C: non-representative TI and no auxiliary data available: The final scenario considered is the ‘‘worst case’’ case, where neither a representative training image nor auxiliary data is available. The same methodology as Scenario A is applied to generate the local training image shown in Fig. 24(a). This local TI is closer in resemblance to the original non-representative TI (Fig. 22) than Walker Lake,
24
L. Li et al. / Spatial Statistics (
(a) Training image.
)
–
(b) Kriging map.
Fig. 24. A local training image produced from a non-representative training image without auxiliary data. The corresponding UK-TI is markedly deteriorated when compared to the previous examples. The main lake itself in Walker Lake region can no longer be discerned. Table 4 MSE of the three scenario described in this section in comparison with those of the ideal UK-TI case and UK-v described in Sections 4.3.2 and 4.3.1 respectively. The estimates were computed using 100 data points sampled from Walker Lake. Scenario
UK-v
Ideal
A
B
C
MSE
0.10432
0.09305
0.09368
0.09401
0.09631
or any of the other local TIs. The resulting UK-TI estimate is given in Fig. 24(b), and yields a MSE of 0.09641, which is comparably larger than that of both Scenarios A and B. Summary: This section examined three scenarios in which the training image and auxiliary data are either inaccurate or missing; the results of which are shown in Table 4. As expected, the MSEs appear to correspond with the degree to which the local training images are representative of Walker Lake. In Scenarios A and C, the lack of auxiliary data causes the local TIs to closely resemble the original TI. In Scenario A, the local TI still adequately describes Walker Lake because the original TI was representative, while in Scenario C the use of an non-representative TI, results in a local TI that is non-representative both in terms of scale and feature orientation. Accordingly, Scenario A yields an UK-TI estimate with similar performance to that of the ideal case. Meanwhile, the estimate in Scenario C produces a larger MSE, and exhibits structures that are oriented in the EW direction rather than the predominate NW-SE trend found in Walker Lake. Despite using the same non-representative TI as Scenario C, Scenario B is still able to produce a local TI that resembles Walker Lake, and an estimate with a similar MSE to Scenario A and the ideal case. This can be attributed to the auxiliary variable containing both scale and feature orientation information. Overall, this indicates that for Walker Lake, while the application of UK-TI with both a representative TI and auxiliary data will produce the most optimal estimate, reasonably close estimates can also be generated for scenarios where only one of the two is available. Difficulties will arise when a non-representative TI is used without auxiliary data, and thus is something that should be kept in mind for practical applications. Nonetheless, in the case of Walker Lake with 100 hard data points, despite the use of a TI with different scale and no auxiliary variables, UK-TI still produces an estimate with a lower MSE than UK-v. Additional research is required to ascertain if this outcome would hold if training images that are even more radically non-representative are used, and reflects an area of future work.
L. Li et al. / Spatial Statistics (
)
–
25
5. Conclusion The main difference between UK-v and UK-TI lies in the focus of modeling. In UK-v, the goal is to obtain a variogram of a residual through parametric modeling. This requires the separation of scales; that is the identification of what is trend and what is residual. This requires a decision on the part of the modeler, since it cannot be calculated from data, because only z-data are available. Ad-hoc approaches of separating trend from residual may lead to biases, which can be resolved via iteration, but still remain challenging for complex trend examples such as Walker Lake. Conversely, UK-TI shifts the modeling effort to obtaining a training image reflective of the domain’s spatial continuity. Consequently, any decisions related to trend and residual need to be made in the training image domain, and as a consequence are made largely independent of data. Developments in training image algorithms such as the use of auxiliary data have helped increase the flexibility of this approach. However, this does come at a risk of inconsistency as the modeled local image may not be reflective of the actual data. This inconsistency risk will increase with an increasing amount of data. Simple checks could consist of comparing the experimental univariate distribution (histogram) and variogram of the sample data and training image. The verification of training image consistency remains an area of ongoing research within the MPS field (Boisvert et al., 2007, 2010; Pérez et al., 2014). The question remains of what happens when radically different training images (perhaps with different connectivities and anisotropies) are used. This mimics the situations where there is limited understanding of the domain, and multiple scenarios of spatial variability could be proposed. Studies regarding the uncertainty of such estimates remain an area of future research. It should also be noted that there is a considerable difference between (1) using the UK-TI approach and (2) calculating an experimental variogram from the training image and then modeling it in the UK-v approach. The latter still relies on a decomposition of trend and residual (as required by the UK-v system of equations), while the former does not; the UK-TI systems of equations deal only with the z-variable. The UK-TI shifts the modeling focus entirely to the generation of a domain specific training image. This modeling will need to be performed prior to solving the UK-TI system in the same sense as any variogram modeling in UK-v will require filtering experimental variograms through parametric modeling. The difference is that the image provides a direct visual appreciation of the model, while the variogram provides a parametric one. UK-v allows for a global neighborhood, while UK-TI does not. The result may be that UK-v has greater accuracy because more data is used in the estimates, although from the example studies, it is not clear if accuracy gain is significant, especially in dense data sets. The local neighborhood approaches allow for more flexibility, which is needed in practical case studies, but comes at a cost of consistency when such local estimates are presented in a global kriging map. In the end, the results concur with Matheron’s perspective of estimating and choosing (Matheron, 1978), namely that the decision between using these two approaches comes down to a choice. This choice should be motivated from the practical field data and situation primordially. At least theoretically, these two approaches compare well and the example studies show they also compare well in terms of the resulting maps. The results of the case studies in this paper suggest that UK-v should be used when sufficient data is available to infer variograms and trend information and to use UK-TI when there is a lack of such data and one needs to rely on auxiliary in formation. When neither dense data nor auxiliary variables are available, UK-TI is still recommended but the practitioner should keep in mind the consistency of the training image being used. Appendix A. UK-v covariance estimation Traditional geostatistics infers the covariance model parameters through variogram fitting, to avoid any distribution assumption and to allow fast computation. In the UK settings (1), the empirical variogram of the residual is required. However, the true residuals cannot be observed, as the drift is unknown. Therefore, estimated residuals have to be computed from an estimated drift. It has also been shown (Matheron, 1969; Armstrong, 1984) that the empirical variogram of the residuals always present a negative bias. (Beckers and Bogaert, 1998) contains a detailed study of the bias. Furthermore, even if the true covariance is known, and the drift is estimated (commonly using Ordinary Least
26
L. Li et al. / Spatial Statistics (
)
–
Squares(OLS) or Generalized Least Squares (GLS)), the empirical variogram of the residuals would still be biased, and is unfortunately unavoidable. See Appendix C for a treatment of the residual bias. In practice, OLS is generally used to estimate the residuals as a first guess. If F is the matrix of predictors, and Z is the matrix of target variables, then:
βOLS = (F ′ F )−1 F ′ Z.
(A.1)
COLS can then be constructed by fitting a covariance to ui where: ui = (Z − F βOLS )i .
(A.2)
The OLS estimate of the drift parameters is suboptimal, though unbiased. Consequently, the empirical variogram computed from the OLS-based estimated residuals will also be rather imprecise and with a large bias. Several guidelines can be found in the literature. Diggle and Ribeiro (2007) argues that ‘the discrepancy between observed and true residuals would be less marked in a larger data-set, and the negative bias in the sample variogram consequently smaller’, and proposes to estimate the variogram from OLS based residuals. Their statement is backed up by the asymptotic study conducted in Lahiri et al. (2002). The variogram estimation from the residuals can also be conducted in an iterative manner as proposed in Gelfand et al. (2010). The estimated GLS is computed as: −1 −1 ′−1 βGLS = (F′ COLS F) F COLS Z.
(A.3)
The covariance CGLS can be constructed in the same manner as Eqs. (A.2) and (A.3). The authors note that ‘One may even iterate between mean estimation and semivariogram estimation several times, but, in practice, this procedure usually stops with the first Generalized Least Squares fit’. There is no evidence for such an iterative scheme to converge towards an acceptable solution as the issue of the bias is not tackled out. Finally, one can adopt the bias correction technique proposed in Beckers and Bogaert (1998). Maximum likelihood methods have also been developed for this problem as they allow, at least theoretically, to fit both drift and covariance parameters of the model (1) simultaneously, Mardia and Marshall (1984). These methods are not exempt from issues however that can be addressed by different variants, see Pardo-Iguzquiza and Dowd (1998), Stein (1999) and Gelfand et al. (2010) among others for reviews. The most important issue with likelihood-based methods is of course the Gaussian assumption. As quoted by Stein (1999), ‘‘this is a highly restrictive assumption’’. To ensure that this assumption is tenable, normality tests can be conducted on the estimated residuals. It is however, seldom applicable to natural variables. Appendix B. Continuous kriging correction In practice, UK-v is performed with local neighborhoods, and hence as the target point moves, the data within the local neighborhood are suddenly removed from the neighborhood. This causes the weights of such data to immediately diminish, and causes discontinuities within the kriging map. Rivoirard and Romary (2011) proposed a method to ameliorate such issues, based on the penalization of boundary data points. This can be implemented by modifying Eq. (6) as:
n L λ V + λ C + µl fl (xi ) = Ci0 i i j ij
∀i ∈ {1, . . . , n}
n λi fl (xi ) = fl (x0 )
∀l ∈ {0, . . . , L}.
j=1
l =0
(B.1)
i =1
The penalty term Vi can be chosen to be equal to 0 for data close to data points, and increasing towards infinity as the data point approaches the boundary of the neighborhood. Refer to (Rivoirard and Romary, 2011) for a detailed discussion on the selection.
L. Li et al. / Spatial Statistics (
)
–
27
Appendix C. UK-v variogram bias It can be shown that even if true covariance is known, and the drift estimated by GLS, the empirical variogram of the residuals would still be biased. This bias is unfortunately unavoidable. This arises due to the fact that the covariance of Y and the f have virtually no chance to be algebraically orthogonal, though Y is supposed to be independent from F in (1), e.g. Matheron (1970). Indeed, if Y ∗ is the vector of the estimated residuals, i, j ∈ {1, . . . , n}, the expectation of the variogram at two points xi and xj of the (GLS) estimated residuals is:
E
1 2
1 (Y ∗ (xi ) − Y ∗ (xj ))2 = E[(Z (xi ) − m∗ (xi ) − (Z (xj ) − m∗ (xj )))2 ] 2
1
=
2
E[((Z (xi ) − m(xi ) − (Z (xj ) − m(xj )))
+ (m(xi ) − m∗ (xi )) − (m(xj ) − m∗ (xj )))2 ] 1
=
2
E[(Z (xi ) − m(xi ) − (Z (xj ) − m(xj )))2 ]
1
+ E[(m(xi ) − m∗ (xi )) − (m(xj ) − m∗ (xj ))2 ] 2
− E[(Z (xi ) − m(xi ) − (Z (xj ) − m(xj ))) · ((m(xi ) − m∗ (xi )) − (m(xj ) − m∗ (xj )))] = γ ( h) −
L L 1
2 l=0 m=0
(fl (xi ) − fl (xj ))alm (fm (xi ) − fm (xj ))
(C.1)
where m∗ (x) is the drift estimate at location x, and alm is the general term of the matrix (F′ C −1 F)−1 . Note that these matrices are positive definite. The bias is therefore negative. For the OLS estimate, the expectation of the variogram takes the following form:
E
1 2
(Y (xi ) − Y (xj )) ∗
∗
2
= γ ( h) +
×
L L 1
2 l=0 m=0
(fl (xi ) − fl (xj ))blm (fm (xi ) − fm (xj ))
L L n (fl (xi ) − fl (xj ))clm fm (xk )(C (xi , xk ) − C (xj , xk )) (C.2) l=0 m=0 k=1
where blm is the general term of (F F) F C F(F′ F)−1 and clm is the general term of (F′ F)−1 . The bias can be interpreted as a measure of the colinearity between the drift functions and the covariance used: if F and C were orthogonal, this bias would be null. This emphasizes the importance of the scale separation between the drift and the residuals. The drift is meant to capture the most continuous and large scale fluctuations of the regionalized variable while the residual accounts for smaller scale fluctuations. Indeed, as can be seen in (C.1) that the bias is stronger for distances where the variations of the fl are more important. Consequently, in general, the bias is low at short distances and increases with the distance. ′
−1 ′
Appendix D. Proof of positive definiteness of UK-TI system Solution of the system in Eq. (21) requires that the matrix of sum of products values is positive definite. It is logical and intuitive that any experimental statistics properly eroded over an existing (real) image will provide such system. In similar sense, the experimental covariance carries the same property. A formal mathematical proof can also be given, noting that by definition: 1
nTIcf x∈TI cf
(ZTI# (x))2 ≥ 0.
(D.1)
28
L. Li et al. / Spatial Statistics (
)
–
As a consequence, this leads to: 1
n n
nTIcf x∈TI α=1 β=1 cf
zTI (x + hα )zTI (x + hβ ) ≥ 0.
(D.2)
Equivalently: n n
sopTI (hα , hβ ) ≥ 0,
∀λα , λβ .
(D.3)
α=1 β=1
Hence the matrix of sum of products is positive definite and hence only one solution. References Armstrong, M., 1984. Problems with universal kriging. Math. Geol. 16 (1), 101–108. Arpat, G.B., Caers, J., 2007. Conditional simulation with patterns. Math. Geol. 39 (2), 177–203. Atkinson, P.M., Pardo-Iguzquiza, E., Chica-Olmo, M., 2008. Downscaling cokriging for super-resolution mapping of continua in remotely sensed images. IEEE Trans. Geosci. Remote Sens. 46 (2), 573–580. Babak, O., Deutsch, C.V., 2008. Two new approaches to avoid large kriging weights to end samples in strings of data. In: 8th International Geostatistics Congress. Santiago, Chile. Beckers, F., Bogaert, P., 1998. Nonstationarity of the mean and unbiased variogram estimation: extension of the weighted leastsquares method. Math. Geol. 30 (2), 223–240. Boisvert, J.B., Pyrcz, M.J., Deutsch, C.V., 2007. Multiple-point statistics for training image selection. Nat. Resour. Res. 16 (4), 313–321. Boisvert, J.B., Pyrcz, M.J., Deutsch, C.V., 2010. Multiple point metrics to assess categorical variable models. Nat. Resour. Res. 19 (3), 165–175. Caers, J., Zhang, T., 2004. Multiple-point geostatistics: a quantitative vehicle for integrating geologic analogs into multiple reservoir models. Technical Report. Stanford University. Chilès, J., Delfiner, P., 2009. Geostatistics: Modeling Spatial Uncertainty. In: Wiley Series in Probability and Statistics, Wiley. Cressie, N., 1993. Statistics for Spatial Data. Wiley-Interscience New York, New York, USA. Daly, C., 2005. Higher order models using entropy, Markov random fields and sequential simulation. In: Geostatistics Banff 2004. Springer, pp. 215–224. Daly, C., Caers, J., 2010. Multi-point geostatistics—an introductory overview. First Break 28 (9). De Fouquet, C., Gallois, D., Perron, G., 2007. Geostatistical characterization of the nitrogen dioxide concentration in an urban area: Part I: Spatial variability and cartography of the annual concentration. Atmos. Environ. 41 (32), 6701–6714. Desassis, N., Renard, D., 2013. Automatic variogram modeling by iterative least squares: univariate and multivariate cases. Math. Geosci. 45 (4), 453–470. Deutsch, C.V., 1993. Kriging in a finite domain. Math. Geol. 25 (1), 41–52. Deutsch, C.V., 1994. Kriging with strings of data. Math. Geol. 26 (5), 623–638. Diggle, P., Ribeiro, P., 2007. Model-Based Geostatistics. In: Springer Series in Statistics, Springer. Dimitrakopoulos, R., Mustapha, H., Gloaguen, E., 2010. High-order statistics of spatial random fields: exploring spatial cumulants for modeling complex non-Gaussian and non-linear phenomena. Math. Geosci. 42 (1), 65–99. Duchon, J., 1977. Splines minimizing rotation-invariant semi-norms in sobolev spaces. In: Constructive Theory of Functions of Several Variables. Springer, pp. 85–100. Emery, X., Lantuéjoul, C., 2013. Can a training image be a substitute for a random field model? Math. Geosci. 1–15. Furrer, F., Nychka, D., Sain, S., 2006. Fields Package. National Center for Atmospheric Research; Boulder, CO. URL: http://www. cgd.ucar.edu/Software/Fields. Gelfand, A.E., Diggle, P., Guttorp, P., Fuentes, M., 2010. Handbook of Spatial Statistics. CRC Press. Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. In: Applied Geostatistics Series, Oxford University Press. Honarkhah, M., Caers, J., 2010. Stochastic simulation of patterns using distance-based pattern modeling. Math. Geosci. 42 (5), 487–517. Hu, L., Chugunova, T., 2008. Multiple-point geostatistics for modeling subsurface heterogeneity: A comprehensive review. Water Resour. Res. 44 (11). Isaaks, E., Srivastava, R., 1989. Applied Geostatistics. Oxford University Press. Jha, S.K., Mariethoz, G., Evans, J.P., McCabe, M.F., 2013. Demonstration of a geostatistical approach to physically consistent downscaling of climate modeling simulations. Water Resour. Res. 49 (1), 245–259. Journel, A.G., 1997. Deterministic geostatistics: A new visit. In: Baafi, E.Y., Schofield, N.A. (Eds.), Geostatistics Wollongong 96. Vol. 1. Kluwer Academic Publisher, pp. 292–301. Journel, A.G., Rossi, M., 1989. When do we need a trend model in kriging? Math. Geol. 21 (7), 715–739. Kanevski, M., Maignan, M., Pozdnoukhov, A., Timonin, V., 2008. Interest rates mapping. Physica A 387 (15), 3897–3903. Keys, R., 1981. Cubic convolution interpolation for digital image processing. IEEE Trans. Speech Signal Process. 29 (I), 1153–1160. Lahiri, S.N., Lee, Y., Cressie, N., 2002. On asymptotic distribution and asymptotic efficiency of least squares estimators of spatial variogram parameters. J. Statist. Plann. Inference 103 (1), 65–85. Mahmud, K., Mariethoz, G., Caers, J., Tahmasebi, P., Baker, A., 2014. Simulation of earth textures by conditional image quilting. Water Resour. Res. 50 (4), 3088–3107.
L. Li et al. / Spatial Statistics (
)
–
29
Mardia, K.V., Marshall, R., 1984. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71 (1), 135–146. Mariethoz, G., Caers, J., 2014. Multiple-Point Geostatistics: Stochastic Modeling with Training Images. Wiley. Mariethoz, G., Kelly, B.F., Baker, A., 2012. Quantifying the value of laminated stalagmites for paleoclimate reconstructions. Geophys. Res. Lett. 39 (5). Mariethoz, G., Lefebvre, S., 2014. Bridges between multiple-point geostatistics and texture synthesis: Review and guidelines for future research. Comput. Geosci.. Mariethoz, G., Renard, P., Straubhaar, J., 2010. The direct sampling method to perform multiple-point geostatistical simulations. Water Resour. Res. 46 (11). Matheron, G., 1969. Le Krigeage Universel. In: Fontainebleau: Cahiers du Centre de Morphologie Mathématique, vol. 1. École des Mines de Paris. Matheron, G., 1970. La Théorie des Variables Régionalisées, et ses Applications. In: Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, École Nationale Supérieure des Mines. Matheron, G., 1978. Estimer et Choisir: Essai sur la Pratique des Probabilités. In: Cahiers du Centre de morphologie mathématique de Fontainebleau, Ecole Nationale Superieure des Mines de Paris. Meerschman, E., Van Meirvenne, M., Van De Vijver, E., De Smedt, P., Islam, M., Saey, T., 2013. Mapping complex soil patterns with multiple-point geostatistics. Eur. J. Soil Sci. 64 (2), 183–191. Ortiz, J.M., 2008. An overview of the challenges of multiple-point geostatistics. In: Geostats 2008—Proceedings of the Eighth International Geostatistics Congress. Vol. 1, pp. 11–20. Paciorek, C.J., 2010. The importance of scale for spatial-confounding bias and precision of spatial regression estimators. Statist. Sci. 25 (1), 107. Pardo-Iguzquiza, E., Dowd, P., 1998. The second-order stationary universal kriging model revisited. Math. Geol. 30 (4), 347–378. Peredo, O., Ortiz, J.M., 2011. Parallel implementation of simulated annealing to reproduce multiple-point statistics. Comput. Geosci. 37 (8), 1110–1121. Pérez, C., Mariethoz, G., Ortiz, J.M., 2014. Verifying the high-order consistency of training images with data for multiple-point geostatistics. Comput. Geosci. 70, 190–205. Pham, T.D., 2012. Supervised restoration of degraded medical images using multiple-point geostatistics. Comput. Methods Programs Biomed. 106 (3), 201–209. Relethford, J.H., 2008. Geostatistics and spatial analysis in biological anthropology. Am. J. Phys. Anthropol. 136 (1), 1–10. Renard, D., 2014. B.N.D.N.B.H.O.F.L.F. RGeostats: Geostatistical R Package. MinesTech Paris; Paris, France. URL: http://rgeostats. free.fr/. Rivoirard, J., Romary, T., 2011. Continuity for kriging with moving neighborhood. Math. Geosci. 43 (4), 469–481. Romary, T., de Fouquet, C., Malherbe, L., 2011. Sampling design for air quality measurement surveys: An optimization approach. Atmos. Environ. 45 (21), 3613–3620. Sandwell, D.T., 1987. Biharmonic Spline Interpolation of GEOS-3 and seat sat altimeter data. Geophys. Res. Lett. 14 (2), 139–142. Serra, J., 1983. Image Analysis and Mathematical Morphology. Academic Press, Inc.. Stein, M.L., 1999. Interpolation of Spatial Data: Some Theory for Kriging. Springer. Stien, M., Kolbjørnsen, O., 2011. Facies modeling using a Markov mesh model specification. Math. Geosci. 43 (6), 611–624. Strebelle, S., 2002. Conditional simulation of complex geological structures using multiple-point statistics. Math. Geol. 34 (1), 1–21. Tahmasebi, P., Hezarkhani, A., Sahimi, M., 2012. Multiple-point geostatistical modeling based on the cross-correlation functions. Comput. Geosci. 16 (3), 779–797. Tjelmeland, H., Besag, J., 1998. Markov random fields with higher-order interactions. Scand. J. Statist. 25 (3), 415–433. Ver Hoef, J.M., 2008. Spatial methods for plot-based sampling of wildlife populations. Environ. Ecol. Stat. 15 (1), 3–13. Winkler, G., 2003. Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction, Vol. 27. Springer. Zhang, T., Switzer, P., Journel, A., 2006. Filter-based classification of training image patterns for spatial simulation. Math. Geol. 38 (1), 63–80.