Computers & Geosciences 103 (2017) 173–182

Research paper

A machine learning approach to the potential-field method for implicit modeling of geological structures

Ítalo Gomes Gonçalves ⁎,1, Sissa Kumaira, Felipe Guadagnin

Campus Caçapava do Sul, Universidade Federal do Pampa, Brazil

Keywords: Implicit modeling; Machine learning; Potential field; Kriging; 3D geological modeling; Compositional data analysis

ABSTRACT

Implicit modeling has experienced a rise in popularity over the last decade due to its advantages in terms of speed and reproducibility in comparison with manual digitization of geological structures. The potential-field method consists of interpolating a scalar function that indicates which side of a geological boundary a given point belongs to, based on cokriging of point data and structural orientations. This work proposes a vector potential-field solution from a machine learning perspective, recasting the problem as multi-class classification, which alleviates some of the original method's assumptions. The potentials related to each geological class are interpreted in a compositional data framework. Variogram modeling is avoided through the use of maximum likelihood to train the model, and an uncertainty measure is introduced. The methodology was applied to the modeling of a sample dataset provided with the software Move™. The calculations were implemented in the R language and the 3D visualizations were prepared with the rgl package.

1. Introduction

Implicit methods for 3D geological modeling have risen in popularity over the last decade (Calcagno et al., 2008; Maxelon et al., 2009; Caumon et al., 2013; Hillier et al., 2013; Jessell et al., 2014; Vollgger et al., 2015; Wu et al., 2015). Their advantages over explicit models include reproducibility, automation, easy model updates with newly acquired information, minimal user-induced bias, and the straightforward incorporation of multi-source information (Vollgger et al., 2015). As stated by McLennan and Deutsch (2006), a good methodology for implicit modeling should be simple, realistic, and provide some measure of uncertainty in its results.

Geostatistics is the technique of choice for modeling spatial variations of properties in geoscience-related problems (Chilès and Delfiner, 1999; Goovaerts, 1997; Isaaks and Srivastava, 1989). In geological and/or structural modeling, different forms of kriging have been used to model geological surfaces (Carr et al., 2001; Cowan et al., 2003, 2004; Vollgger et al., 2015), orientation data (Gumiaux et al., 2003), and geological surfaces constrained by orientation data (Aug, 2004; Calcagno et al., 2008; Chilès et al., 2004; Lajaunie et al., 1997). The latter was termed the "potential-field method". It is possible to find implementations of these methods in commercial software.

Machine learning comprises a set of statistical techniques with a wide range of applications, such as spam detection, handwritten text



⁎ Correspondence to: Av. Pedro Anunciação, 111, Caçapava do Sul, RS, Brazil. E-mail address: [email protected] (Í.G. Gonçalves).
1 Source code is available at https://github.com/italo-goncalves/geomod3D.git

http://dx.doi.org/10.1016/j.cageo.2017.03.015 — Received 3 October 2016; Accepted 21 March 2017

and speech recognition, and recommender systems. As defined by Flach (2012), machine learning is "the study of algorithms and systems that improve their knowledge or performance with experience", with the "experience" coming from the available data. In other words, a machine learning algorithm effectively "learns" how to perform a particular task. Recently there have been works applying machine learning methods in the geosciences, such as decision trees to aid in mineral prospection (Rodriguez-Galiano et al., 2015) and geological mapping (Cracknell and Reading, 2014), support vector machines for geological modeling (Smirnoff et al., 2008; Wang et al., 2014), clustering methods to aid standard geostatistics (Kapageridis, 2014) and to identify homogeneous domains in wide areas (Romary et al., 2014), and attempts to bridge the gap between machine learning and geostatistics (Hristopulos, 2015). Furthermore, kriging itself is a machine learning technique widely used for classification and regression tasks and function optimization (Chan, 2010) in any number of dimensions, although it is referred to as a Gaussian Process in the machine learning literature (Rasmussen and Williams, 2006).

In order to further promote the application of machine learning in the geosciences, the present work approaches the original potential-field method by Lajaunie et al. (1997) from another perspective, recasting it as a multi-class classification problem. Classification is done within the compositional data framework from Tolosana-Delgado


et al. (2008) with the probabilistic treatment by Rasmussen and Williams (2006). The covariance function parameters are inferred through maximization of the log-likelihood (Mardia and Marshall, 1984), eliminating the need for complicated manual variography of orientation data (Chilès et al., 2004; Calcagno et al., 2008). The model also does not depend on structural data to work and makes no assumptions about the structures' polarity (i.e. the younging direction). The algorithms were implemented in the R language (R Core Team, 2017) and tested on a dataset contained within the software Move™ (Midland Valley Exploration, 2014).

The article is structured in the following manner: Sections 2 and 3 lay out the theory and methodology; Section 4 presents a case study; in Section 5 the results, strengths, and limitations of the method are discussed; and Section 6 presents the conclusions and suggestions for future work.

2. Problem statement

Consider a point in space with coordinates x = (x, y, z)^T and geological class label given by L(x) = c, c ∈ {1, 2, …, C}, where C is the number of geological classes involved in the problem (geological formations, domains, lithologies, etc.). Associated with each class there is an unknown probability πc(x) = Pr(L(x) = c), as well as an extra probability πC+1(x) related to an "unknown" class (its role in the assessment of model uncertainty shall be explained later). Compositional data theory (Pawlowsky-Glahn and Buccianti, 2011; Buccianti et al., 2006) defines the simplex, a C-dimensional subspace embedded in a (C + 1)-dimensional space in which the probabilities are contained. This is so due to the natural constraint that the πc(x) sum to 1 over the C + 1 classes, with πc > 0 ∀c. The theory states that the best way to work with data constrained in this way is through log-odds. One way of doing so is the centered log-ratio transformation (clr):

\phi_c(x) = \ln \pi_c(x) - \frac{1}{C+1} \sum_{i=1}^{C+1} \ln \pi_i(x), \qquad c \in \{1, 2, \ldots, C\} \qquad (1)

where ϕc(x) is the log-transformed value and the second term on the right side corresponds to the logarithm of the geometric mean of the C + 1 probabilities. This transformation has the property that the ϕc(x) sum to zero over the C + 1 classes. The probabilities can be recovered with a back-transform, also known in the machine learning community as the softmax function (Bishop, 2006; Flach, 2012):

\pi_c(x) = \frac{\exp \phi_c(x)}{\sum_{c=1}^{C+1} \exp \phi_c(x)} \qquad (2)

The log-transformed probabilities are usually called coordinates, due to the vector space structure of the simplex. Here they are denominated potential components in order to establish the link with the potential field from Lajaunie et al. (1997). In that work the authors define the potential as a scalar field, with the geological surfaces of interest being modeled as different isovalues in that field. Here the potential field is defined as a vector field ϕ(x) = (ϕ1(x), ϕ2(x), …, ϕC(x))^T, and the potential components ϕc(x) are all linked due to their compositional origin. Implicit modeling amounts to estimating ϕ over the region of interest, the details of which are given in the next section.

2.1. Incorporating structural data

The modeled geological bodies are expected to conform to the structural measurements obtained in the field. In order for the model to exhibit this behavior, the iso-potential surfaces must pass tangentially to the measured orientation lines and planes. Given a point s at a position x_s, s ∈ {1, 2, …, S}, with an associated structural direction d̂_s (a unit vector), this tangent constraint implies that

\frac{\partial \phi_c(x_s)}{\partial \hat{d}_s} = \langle \nabla \phi_c(x_s), \hat{d}_s \rangle = 0 \qquad (3)

where ∂ϕc(x_s)/∂d̂_s is the directional derivative of ϕc at x_s along d̂_s, ∇ϕc(x_s) is its gradient, and ⟨·,·⟩ is the scalar product. In other words, the gradient of every potential component must be orthogonal to the measured directions. If a point belongs to a structural plane there are two structural directions associated with it, one along dip and the other along strike. Note that, unlike the original formulation from Lajaunie et al. (1997), for this method the structural measurements are completely optional, although they represent an important contribution to the final model. Furthermore, due to the way that the potential field is defined here, there is no need to assume an arbitrary modulus for the gradient or to state its structural polarity.

3. Problem formulation

3.1. Kriging estimation of the potential field

Rasmussen and Williams (2006) define the Gaussian Process (GP) as "a collection of random variables, any finite number of which have a joint Gaussian distribution". It can be seen as a distribution over functions, specified by a mean function m(x) and a covariance function k(x, y), with x and y being points in space. It is assumed that the potential components defined above are distributed as GPs:

\phi_c(x) \sim \mathcal{GP}(m_c(x), k_c(x, y)), \qquad c \in \{1, 2, \ldots, C\} \qquad (4)

In order to apply this model to the geological data, one must deal with the matter of how to assign potential vectors to the data points, as no meaningful probability can be derived for them. All one knows is that a point is either inside a geological class or on a boundary between two classes, which only indicates the dominant class (i.e. the one with the highest potential component and probability) but gives no hint as to its absolute value. Note that simply assigning 0/1 values for the probabilities is not possible, as this would result in an infinite potential. Tolosana-Delgado et al. (2008) deal with this by assigning an arbitrary probability 1 − b to a point's true class and b/C to the others, with 0 < b < 0.5. It is shown that this results in a constant scale factor β in log-transformed space. In order to stay aligned with the definition of the GP given above, the approach proposed here is to assume the potential components are independent, normally distributed random variables with variance σ0². The mean potential component ϕc(x_p) at a data point p, p ∈ {1, 2, …, P}, is then given by

\phi_c(x_p) = \begin{cases} 1, & \text{if } x_p \text{ belongs to class } c \\ -\frac{1}{C}, & \text{if } x_p \text{ does not belong to class } c \\ \frac{1 - 1/C}{2} = \frac{C-1}{2C}, & \text{if } x_p \text{ lies at the boundary between } c \text{ and another class} \end{cases} \qquad (5)

The potential component of the unknown class is calculated as ϕC+1(x) = −Σ_{c=1}^{C} ϕc(x) in order to respect the zero-sum property of clr-transformed variables. Due to the uncertainty over the potential components, back-transforming through (2) is no longer correct. Instead, one should estimate the proportion of the multivariate normal probability density that favors each class, which can be done through simulation (Rasmussen and Williams, 2006). However, the quantities of interest here are the positions of the geological boundaries, so the calculation of probabilities is a secondary issue.
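To make the transformations above concrete, the sketch below implements (1), (2) and (5) for a single data point. It is written for this text only — it is not code from the geomod3D package, and the function names are our own.

```r
# Sketch of Eqs. (1), (2) and (5); illustrative only, not taken from the
# geomod3D package (function names are our own).

# Centered log-ratio transform (Eq. 1): probabilities -> potential components
clr <- function(p) log(p) - mean(log(p))

# Softmax back-transform (Eq. 2): potential components -> probabilities
softmax <- function(phi) exp(phi) / sum(exp(phi))

# Mean potential components assigned to a data point (Eq. 5). 'class_id' is
# the point's label, or a vector of two labels for a boundary point.
data_potential <- function(n_classes, class_id) {
  phi <- rep(-1 / n_classes, n_classes)
  if (length(class_id) == 2) {
    phi[class_id] <- (n_classes - 1) / (2 * n_classes)  # boundary point
  } else {
    phi[class_id] <- 1                                  # interior point
  }
  phi
}

# Example with C = 3: an interior point of class 2
phi <- data_potential(3, 2)    # -1/3  1  -1/3
phi <- c(phi, -sum(phi))       # unknown class component closes the zero sum
round(softmax(phi), 3)         # class 2 receives the highest probability
```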

Once P data points and S structural points (hereafter called training points) are observed and Q test points (where the potential is to be estimated) are defined, conditioning the joint distribution of the potential components ϕcP and ϕcQ and the directional derivatives of a given class on the known data yields (Rasmussen and Williams, 2006; Lajaunie et al., 1997; Chilès and Delfiner, 1999):

\phi_{cQ} = \begin{bmatrix} K_{QP} & K_{QS} \end{bmatrix} \begin{bmatrix} K_{PP} + \Sigma & K_{PS} \\ K_{SP} & K_{SS} \end{bmatrix}^{-1} \begin{bmatrix} \phi_{cP} - m_c(x) \\ 0_S \end{bmatrix} + m_c(x) \qquad (6)

V_Q = K_{QQ} - \begin{bmatrix} K_{QP} & K_{QS} \end{bmatrix} \begin{bmatrix} K_{PP} + \Sigma & K_{PS} \\ K_{SP} & K_{SS} \end{bmatrix}^{-1} \begin{bmatrix} K_{PQ} \\ K_{SQ} \end{bmatrix} + \sigma_0^2 I \qquad (7)

where K_PP, K_SS, and K_PS are the point-to-point, structure-to-structure, and point-to-structure covariance matrices, respectively, Σ is a diagonal noise matrix, and V_Q is the posterior covariance matrix of the potential components at the test points. Switching the subscript P by Q yields the covariances involving the test points. 0_S is the vector containing the directional derivatives of a component, all of which have a value of 0 due to (3). Note that the structures still influence the result due to the cross-covariances. The entries of the matrices in (6) and (7) are given by

K_{PP}^{ij} = \mathrm{Cov}(\phi_c(x_i), \phi_c(x_j)) \qquad (8)

K_{PS}^{ij} = \mathrm{Cov}\left(\phi_c(x_i), \frac{\partial \phi_c(x_j)}{\partial \hat{d}_j}\right) \qquad (9)

K_{SS}^{ij} = \mathrm{Cov}\left(\frac{\partial \phi_c(x_i)}{\partial \hat{d}_i}, \frac{\partial \phi_c(x_j)}{\partial \hat{d}_j}\right) \qquad (10)

\Sigma_{ii} = \begin{cases} 0, & \text{if point } i \text{ lies in a boundary involving class } c \\ \sigma_0^2, & \text{otherwise} \end{cases} \qquad (11)

Eqs. (6) and (7) are the simple cokriging solution for the predictive mean and (co)variance of the potential components at the test points (in order to ensure invertibility and numerical stability of the covariance matrix in (6) and (7), it may be necessary to add a small value to its main diagonal, such as 10⁻⁶ or less). As the potential components are assumed independent, the calculations can be done separately for each value of c. If one is not interested in doing simulations to estimate probabilities, equation (6) can be used alone to give a conditional expectation estimate for the potentials. Inspection of (6) reveals that its form is equivalent to the technique known as dual kriging (Chilès and Delfiner, 1999).

The matrix Σ represents the noise variance or nugget effect associated with the training potential components. This assumption of uncertainty means that the kriging solution does not have to be an exact interpolator, which results in a smoothly varying potential vector over the study region. The potential components at boundary points are an exception (see (11)), as the boundary surfaces must be honored by the model. The assumption of noisy potential components also makes the underlying probabilities invariant to a rescaling of the potential vector. If the values in (5) are multiplied by a constant (such as the factor β from Tolosana-Delgado et al. (2008)), the associated noise variance and all covariance matrices (as well as the kriging variance) are proportionately scaled, but the proportion of the multivariate normal probability density favoring each class remains constant, which is an interesting property.

Considering points x and y, directions û and v̂, and a covariance function k(x, y), the values in (8), (9) and (10) can be calculated by (Chilès and Delfiner, 1999; Rasmussen and Williams, 2006):

\mathrm{Cov}(\phi_c(x), \phi_c(y)) = k(x, y) \qquad (12)

\mathrm{Cov}\left(\phi_c(x), \frac{\partial \phi_c(y)}{\partial \hat{v}}\right) = \frac{\partial_y k(x, y)}{\partial \hat{v}} = \hat{v}^T \nabla_y k(x, y) \qquad (13)

\mathrm{Cov}\left(\frac{\partial \phi_c(x)}{\partial \hat{u}}, \frac{\partial \phi_c(y)}{\partial \hat{v}}\right) = \frac{\partial_x \partial_y k(x, y)}{\partial \hat{u}\, \partial \hat{v}} = \hat{u}^T \nabla_x \nabla_y k(x, y)\, \hat{v} \qquad (14)

where T is the transpose operation. As k(x, y) is a function of two vectors, a special notation is employed: ∂_y k(x, y)/∂v̂ represents the derivative of k with respect to vector y along direction v̂, ∂_x ∂_y k(x, y)/∂û ∂v̂ is the second-order cross derivative along directions û and v̂, ∇_y is the gradient vector with respect to y, and ∇_x ∇_y is the Hessian matrix containing the second-order partial and cross derivatives. As the potential field must be differentiable, the covariance function to be used must be at least parabolic at the origin. A popular choice in the machine learning field is the Gaussian covariance:

k(x, y) = \sigma_1^2 \exp(-3 d^2) \qquad (15)

\nabla_y k(x, y) = 6\, k(x, y)\, A^T A (x - y) \qquad (16)

\nabla_x \nabla_y k(x, y) = k(x, y) \left\{ 6 A^T A - 36 \left[A^T A (x - y)\right] \left[A^T A (x - y)\right]^T \right\} \qquad (17)

d = \sqrt{(x - y)^T A^T A (x - y)} \qquad (18)

where σ1² is the model's amplitude or sill, A is an anisotropy matrix, and d is the normalized distance between x and y after adjusting for anisotropy. If one chooses to work with an isotropic function with range r, the matrix A reduces to (1/r) I. Eq. (15) is scaled so that when d = 1 the covariance reaches approximately 5% of its maximum value. The Gaussian covariance is known to result in very smooth surfaces. Calcagno et al. (2008) recommend the use of the less smooth cubic covariance (which vanishes for d > 1):

k(x, y) = \sigma_1^2 \left(1 - 7 d^2 + \frac{35}{4} d^3 - \frac{7}{2} d^5 + \frac{3}{4} d^7\right) \qquad (19)

\nabla_y k(x, y) = \sigma_1^2 \left[14 - \frac{105}{4} d + \frac{35}{2} d^3 - \frac{21}{4} d^5\right] A^T A (x - y) \qquad (20)

\nabla_x \nabla_y k(x, y) = \sigma_1^2 \left[14 - \frac{105}{4} d + \frac{35}{2} d^3 - \frac{21}{4} d^5\right] A^T A + \frac{\sigma_1^2}{d} \left[-\frac{105}{4} + \frac{105}{2} d^2 - \frac{105}{4} d^4\right] \left[A^T A (x - y)\right] \left[A^T A (x - y)\right]^T \qquad (21)
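To make the covariance machinery concrete, the following sketch evaluates (15)–(17) and the covariances (12)–(14) between potentials and directional derivatives, assuming the isotropic case A = (1/r) I in three dimensions. It is an illustration written for this text, not the geomod3D implementation, and the function names are our own.

```r
# Sketch of the Gaussian covariance (15)-(18) and of the point/derivative
# covariances (12)-(14), assuming the isotropic case A = (1/r) * I in 3D.
# Illustrative only; not the geomod3D code.

gauss_cov <- function(x, y, sigma1_sq, r) {
  diff <- x - y
  d2 <- sum(diff^2) / r^2                    # d^2 from Eq. (18) with A = I/r
  k <- sigma1_sq * exp(-3 * d2)              # Eq. (15)
  grad_y <- 6 * k * diff / r^2               # Eq. (16): 6 k A'A (x - y)
  hess <- k * (6 * diag(3) / r^2 -           # Eq. (17)
                 36 * tcrossprod(diff / r^2))
  list(k = k, grad_y = grad_y, hess = hess)
}

cov_point_point <- function(x, y, ...) gauss_cov(x, y, ...)$k             # (12)
cov_point_deriv <- function(x, y, v, ...)
  sum(v * gauss_cov(x, y, ...)$grad_y)                                    # (13)
cov_deriv_deriv <- function(x, y, u, v, ...)
  as.numeric(t(u) %*% gauss_cov(x, y, ...)$hess %*% v)                    # (14)

# Example: a point datum and a structural direction 10 m apart
x <- c(0, 0, 0); y <- c(10, 0, 0); v <- c(1, 0, 0)
cov_point_deriv(x, y, v, sigma1_sq = 0.14, r = 75)
```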

Figs. 1 and 2 illustrate the methodology for 1 and 2 dimensions and the two covariance functions, showing the potential field of a single class. Fig. 1 shows error bars related to the noise variance at the data points and a confidence interval for the kriging estimate. The proportion of the gray band on each side of the dashed line is related to a point's probability of belonging to each class. The structural data help compensate for the lack of a data point at x = 0, giving a more accurate estimate for the boundary at that position. Fig. 2 displays contour lines for different potential values in order to show the shape of the potential field away from the boundary between the classes. It can be seen that the structural data influence the result as much as the point data. Note that, in order to obtain a continuous model, all data must be included in the calculations in (6) and (7), as in Lajaunie et al. (1997). It seems that the Gaussian covariance allows the structures' effect to propagate through a greater distance.

Fig. 1. Example of potential field interpolation in one dimension. (a) Gaussian covariance; (b) Gaussian covariance with structural data; (c) cubic covariance; (d) cubic covariance with structural data. Black – assumed true potential; red – 95% confidence interval for the data potential; blue – kriging interpolation; light green – structural data (zero-valued derivatives); gray ribbon – 95% confidence interval for the kriging result. The dashed line marks the potential level that separates the two classes, according to (5). Parameters: σ0² = 0.0625, σ1² = 0.14, r = 3.

Fig. 2. Example of potential field interpolation in two dimensions. (a) Gaussian covariance; (b) Gaussian covariance with structural data; (c) cubic covariance; (d) cubic covariance with structural data. Point data: blue – positive; red – negative; black – boundary points. Double arrows – structural data. Thin lines – isopotential lines at the quantiles of the interpolated potential field at 5% increments. Thick line – boundary potential. Parameters: σ0² = 0.0625, σ1² = 0.14, r = 75.

3.2. Model parameters and training

With the methodology laid out, there comes the question of which parameters are best for a given application. Simple kriging requires that the mean function m(x) be provided by the user. In the simplest case, with a single, isotropic covariance function, the parameters σ0², σ1², and range r must also be given. With combinations of multiple covariance functions, anisotropy, and the use of different parameters for each class, the number of parameters increases considerably. With many free parameters, care must be taken not to cause model overfitting (Bishop, 2006; Flach, 2012), which would give spurious results when generalizing to other locations.

Some of these values can be set taking into account the definition of the problem. Given the values set for the data potential components in (5), it is expected that the estimated values do not rise or fall much beyond those limits. It is therefore reasonable to assume the limits correspond to two standard deviations above and below the mean at a given position. The signal variance σ1² associated with this amplitude is then given by

\sigma_1^2 = \left(\frac{1 + \frac{1}{C}}{2 \cdot 2}\right)^2 = \frac{(C+1)^2}{16 C^2} \qquad (22)

The mean function m(x) is given a constant value of −1/C. This implies that away from the data the most likely class is the unknown class introduced before. In other words, the default class anywhere is undefined, unless there is data nearby to "prove" otherwise. Thus, the use of simple kriging in this context is justified as the more conservative approach, as ordinary kriging would tend to favor the class with greater spatial representation in the data set in undersampled regions. Kriging with a trend is automatically ruled out, as the potential field is by definition drift-free (the potential components at the data points are all contained in the interval [−1, 1]).

The signal noise σ0² and range r remain undetermined (σ0² could be set using a similar logic as for σ1², but one must take care not to overconstrain the model). These are traditionally found through a spatial continuity analysis using the experimental variogram (Isaaks and Srivastava, 1989; Goovaerts, 1997; Chilès and Delfiner, 1999; Chilès et al., 2004; Aug, 2004). However, variography is usually time-consuming and the modeling procedure carries some subjectivity. The workload is increased further when there is structural data involved (Calcagno et al., 2008). In the present case the abstract nature of the potential field and the assumption of noisy data values make the usefulness of the experimental variogram questionable. This issue can be solved by applying a common practice in the machine learning field, which is the use of the log-likelihood. Its use in geostatistics is less common, but studies do exist (Chu et al., 2011; Mardia and Marshall, 1984). For the present case the log-likelihood is given by (Rasmussen and Williams, 2006):

\mathcal{L}_c = -\frac{1}{2} f_c^T K^{-1} f_c - \frac{1}{2} \log |K| - \frac{P+S}{2} \log 2\pi \qquad (23)

f_c = \begin{bmatrix} \phi_{cP} - m(x) \\ 0_S \end{bmatrix} \qquad (24)

K = \begin{bmatrix} K_{PP} + \Sigma & K_{PS} \\ K_{SP} & K_{SS} \end{bmatrix} \qquad (25)

where |·| stands for the determinant. The likelihood is the probability of "seeing" the data for a given set of parameters, so model training amounts to finding the parameters that maximize it. This is usually done using a gradient-based method, taking the partial derivatives of K with respect to the parameters. In the present case, with only two free parameters, it suffices to perform an exhaustive grid search, with the advantage that the values of ℒc can be plotted and visualized. By keeping the same σ0² and r for all classes, it is possible to optimize the sum of the C log-likelihoods. If one wishes to have separate parameters for each class and work with anisotropy, one must resort to more complex optimization methods, such as genetic algorithms (Scrucca, 2016, 2013; Michalewicz, 1996).
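As a sketch of this training procedure, the code below evaluates (23) through a Cholesky factorization and runs the grid search over r and σ0². Here build_K(), f_list, and n_classes are hypothetical helpers and objects assumed to assemble (24) and (25) for each class; they are not functions of any package.

```r
# Sketch of training by exhaustive grid search over (r, sigma0_sq),
# maximizing the sum of per-class log-likelihoods (23). build_K(), f_list
# and n_classes are hypothetical (assumed to assemble Eqs. 24 and 25).

gp_loglik <- function(K, f) {
  U <- chol(K)                                  # upper triangular, K = U'U
  alpha <- backsolve(U, forwardsolve(t(U), f))  # alpha = K^{-1} f
  -0.5 * sum(f * alpha) -                       # -0.5 f' K^{-1} f
    sum(log(diag(U))) -                         # -0.5 log|K|
    0.5 * length(f) * log(2 * pi)               # -(P+S)/2 log(2 pi)
}

grid <- expand.grid(r = seq(100, 1000, by = 100),
                    sigma0_sq = seq(0.02, 0.30, by = 0.02))
grid$loglik <- apply(grid, 1, function(par) {
  sum(vapply(seq_len(n_classes), function(cl) {
    K <- build_K(cl, r = par["r"], sigma0_sq = par["sigma0_sq"])
    gp_loglik(K, f_list[[cl]])
  }, numeric(1)))
})
grid[which.max(grid$loglik), ]  # best-fit parameters
```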

3.3. Drawing the geological boundaries

A geological boundary is a surface where two potential components have the same value, though this value does not need to be constant throughout the surface. In order to locate this surface, a skewed potential is calculated from the estimated potential vector ϕ* = (ϕ1, ϕ2, …, ϕC, ϕC+1)^T:

\omega = \phi^* - \frac{\phi_{(1)} + \phi_{(2)}}{2} \mathbf{1} \qquad (26)

where ϕ(1) and ϕ(2) are the highest and second highest potential components in ϕ*, respectively, and 1 is a vector of ones. If the computations are done over a regular grid, the boundary of class c can be drawn by using an algorithm such as marching cubes (Feng and Tierney, 2008) to interpolate the ωc = 0 isosurface, which marks the transition between the region in which c is the most likely class and the regions dominated by other classes. Eq. (26) ensures that at any point only a single potential component is positive, eliminating the possibility of ambiguities such as, for example, the boundary between formations 1 and 2 passing through the middle of formation 3. As long as each geological class is well represented in the data, the imposition of onlap and/or erosion relationships (Calcagno et al., 2008) may not be necessary.

The unknown class plays a special role here, as its boundary can be seen as the limit of a region of confidence for the model. This way, even without computing the kriging variance it is possible to obtain a measure of model uncertainty.
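Eq. (26) reduces to a few vector operations per grid node. The sketch below assumes the estimated potentials are stored in a Q × (C+1) matrix, a layout chosen here purely for illustration.

```r
# Sketch of Eq. (26). 'phi_star' is assumed to be a Q x (C+1) matrix holding
# the estimated potential vector at each of the Q grid nodes.
skew_potential <- function(phi_star) {
  half_top2 <- apply(phi_star, 1, function(p) {
    top <- sort(p, decreasing = TRUE)[1:2]
    (top[1] + top[2]) / 2           # mean of the two highest components
  })
  # subtracting a length-Q vector recycles down the columns, so row i loses
  # half_top2[i] in every component; at most one component stays positive
  phi_star - half_top2
}

omega <- skew_potential(phi_star)
# The boundary of class c is the omega_c = 0 isosurface, which can be drawn
# with a marching cubes implementation such as misc3d::contour3d (Feng and
# Tierney, 2008) after reshaping omega[, c] into a 3D array over the grid.
```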

4. Case study

The model presented here was tested on a dataset contained within the SCAT tutorial of the software Move. It consists of two wells crossing both flanks of a set of folded strata. The wells cross 31 geological horizons (which gives the number of classes C = 32), and 66 dip measurements are provided along the wells' length (Fig. 3). In total there are 458 data points (62 of which lie on geological boundaries) and 132 structural directions (each orientation plane contributes 2 orthogonal vectors). The data is contained in a 400 × 800 × 1100 m box.

Fig. 3. Data used to build the model (left) and the reference surfaces (right). The colored spheres represent different geological layers and the yellow discs represent orientation measurements.

Given that there are 32 classes, the signal variance calculated with (22) is σ1² = 0.06647. In order to keep the model simple and facilitate the display of results, an isotropic covariance function with the same range r for all classes is employed. The noise variance σ0² is also kept fixed. The model was trained by performing a grid search over different combinations of r and σ0² and adding the log-likelihoods of all classes calculated with (23). Fig. 4 displays the values obtained for the Gaussian and cubic covariances.

Fig. 4. Grid search for the determination of best-fit model parameters. Range is in meters.

The results are somewhat counter-intuitive. Instead of a sharp peak, the log-likelihood presents a wide plateau with very similar values and a tendency to favor small ranges, which means that in practical terms the solution is not unique. This is likely caused by the small spatial coverage of the data. In this situation there are multiple, approximately equally plausible hypotheses for the parameters given the available data.

In order to verify the effect of choosing one set of parameters over another, different models were generated for ranges of 300, 600, and 900 m, choosing the best noise variance for each range and covariance function (Table 1). The potential field was estimated over a grid of 10 × 10 × 10 m and the boundary surfaces were drawn with the marching cubes algorithm. The visualizations were made with the rgl package (Adler and Murdoch, 2016).

Table 1. Log-likelihood and results for different combinations of model parameters.

Model  Covariance  Range (m)  σ0²   Log-likelihood  Coverage  Accuracy
A      Gaussian    300        0.14  −22106.0        0.5222    0.4318
B      Gaussian    600        0.20  −23881.4        0.8279    0.4406
C      Gaussian    900        0.22  −25054.9        0.9447    0.5284
D      cubic       300        0.10  −21122.1        0.2536    0.4541
E      cubic       600        0.14  −22282.3        0.7696    0.3361
F      cubic       900        0.16  −23162.6        0.8116    0.3364

The resulting surfaces are displayed in Figs. 5, 6 and 7. In the upper portion of the region all models show good agreement overall. In the lower part the models generated with the Gaussian covariance show a more visually convincing result, despite having a slightly lower log-likelihood than the models generated with the cubic covariance. For the range of 300 m the effect of the unknown class assumption is clearly visible, as it dominates all the blank spaces in Fig. 5. The envelope of the unknown class serves as a confidence region for the model. The fact that the log-likelihood favors smaller ranges indicates that the model could benefit from more data, as it is reluctant to extrapolate too far from the available data points. For the higher ranges the unknown class is visible only in the lower portion of the region between the two flanks of the fold, with the added effect of giving an estimate for the bottom layer's lower boundary (note that model E places the unknown class between the lower layers, which is an odd result).

Fig. 5. Implicit model obtained for a range of 300 m. Left: Gaussian covariance (model A). Right: cubic covariance (model D).

Fig. 6. Implicit model obtained for a range of 600 m. Left: Gaussian covariance (model B). Right: cubic covariance (model E).

Fig. 7. Implicit model obtained for a range of 900 m. Left: Gaussian covariance (model C). Right: cubic covariance (model F).

Table 1 displays two measures that summarize the models: coverage, the fraction of grid nodes not classified as unknown, and accuracy, the fraction of nodes correctly classified (excluding unknowns) given the reference in Fig. 3. Of course, for real datasets a reference is not available, but the accuracy can be estimated by cross-validation. The accuracy values might seem low at first glance, but they should be benchmarked against random guessing (Bishop, 2006; Flach, 2012), which for 32 classes gives an expected accuracy of 1/32 = 0.03125. Accuracy for each class individually is displayed in Fig. 8. It can be seen that some classes are more difficult to model than others, especially in the lower part of the region. The Gaussian covariance has shown better performance than the cubic function overall, but the situation might be reversed for a different dataset.

Fig. 8. Accuracy for each class and model. Class numbers are counted from the bottom up in Figs. 5, 6, and 7. The letters correspond to model labels in Table 1.

The computations took approximately 27 min for the whole grid search and 25 min to build each model, using a computer with an Intel Core i5-2300 CPU at 2.8 GHz and 8 GB RAM, running Windows 10.

5. Discussion

The method presented here was conceived over the foundations laid out by Tolosana-Delgado et al. (2008) and Lajaunie et al. (1997), with a probabilistic view borrowed from Rasmussen and Williams (2006). The original simplicial indicator kriging method (Tolosana-Delgado et al., 2008) consists of interpolating a log-transformed version of the variables of interest, according to compositional data principles (Pawlowsky-Glahn and Buccianti, 2011; Buccianti et al., 2006). In order to avoid assigning arbitrary probabilities to the data points, here it is assumed that the log-transformed variables are normally distributed, which results in a range of possible values for the back-transformed probabilities. Tolosana-Delgado et al. (2008) work with the isometric log-ratio transformation (ilr) instead of clr, but the choice is a matter of how one wishes to interpret the transformed variables and does not affect the end result upon back-transformation (Pawlowsky-Glahn and Buccianti, 2011). The log-transformed variables are assumed independent in order to simplify the calculations, but one could also devise a system where, for example, geological units known to be neighbors are given a high correlation. A disadvantage of this approach would be that, instead of solving C systems of equations of size (P + S) × (P + S), one would need to handle a single matrix of size (CP + S) × (CP + S), which is more computationally intensive.

The idea of relating structural measurements to the derivatives of the function of interest was originally conceived by Lajaunie et al. (1997). In that work the surfaces of interest are modeled as isovalues of an arbitrary scalar field somewhat related to the age of deposition of each geological unit, which was termed the potential field. A vector with arbitrary modulus is placed normal to each structural plane, representing this scalar field's gradient. In this work, the term "potential" is not related to the age of the geological units, instead representing the statistical potential of a point belonging to a given geological class. An important difference between that approach and the present one is that here the potential is a vector field, with each coordinate giving the inside/outside status of a point in space with regard to each class. In the original method some surfaces must be modeled separately and combined with the others using geological rules (Aug, 2004; Calcagno et al., 2008). The present vector field model infers such rules from the available data, following the machine learning mindset. Another difference is that this method makes use of data located between the geological boundaries and avoids the assumption of arbitrary gradients by explicitly setting the derivatives in the orthogonal directions to 0 (Lajaunie et al. (1997) also worked with data in the form of tangents, but found them to be less informative than gradient data). That said, it is always beneficial to test different models and hypotheses in a modeling project. An interesting avenue for future work is the use of ensemble models (Bishop, 2006; Flach, 2012), which can combine the strengths of many different single models.

In regions with complex geology the use of a covariance function with fixed range and/or anisotropy (even if different between classes) might not suffice to provide a good implicit model. The ideal would be for the covariance ellipsoid to adapt to the local configuration of the data. Possibilities include the use of mixtures of GPs (Tresp, 2001; Stachniss et al., 2008) and non-stationary covariance functions (Fouedjio, 2016), which are left for future research.

5.1. Stability of the solution and computational cost

The practice of modeling with orientation data requires some care due to the product A^T A in equations (15)–(21). As the entries of A are inversely proportional to the covariance function's range, the entries of K_PS and K_SS can be orders of magnitude smaller than those of K_PP, resulting in numerical instability. This can be avoided by adding a small regularization term to the diagonal of K_SS, but this will result in surfaces that are no longer tangential to the orientation data, and instead intersect the orientations at a small angle. Too much regularization will filter out the orientation data altogether, so a compromise must be made between the solution's stability and fidelity to the orientation data.

In order to avoid discontinuities in the model, all data must be used to make predictions at new points. The Cholesky factorization of K can be done in O((P + S)³) time and stored in memory, and predictions take O(P) time for the potential (as the derivatives have a value of 0 and can be omitted from the calculation) and O((P + S)²) for the kriging variance per grid node. This is done separately for each class, so if the number of classes and data points is very high the CPU and memory costs can be prohibitive. The machine learning and geostatistics communities have developed sparse methods to overcome this limitation (Quiñonero-Candela et al., 2005; Snelson and Ghahramani, 2006; Romary, 2013). Implementation of these methods for the present model is an interesting topic for future investigations.
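The cost figures above correspond to standard Cholesky-based GP prediction. A minimal sketch of the predictive mean (6) in that style is given below; the assembly of K, f_c, and the test-point covariances k_q is assumed to happen elsewhere, with names following (6) and (24)–(25).

```r
# Sketch of the predictive mean in Eq. (6) via Cholesky factorization,
# matching the costs discussed above. K follows Eq. (25), f_c follows
# Eq. (24), and k_q holds the covariances [K_QP K_QS] between one test
# point and the training data (assembly assumed to be done elsewhere).

U <- chol(K)                                    # O((P+S)^3), once per class
alpha <- backsolve(U, forwardsolve(t(U), f_c))  # dual kriging weights K^{-1} f_c

predict_potential <- function(k_q, m) {
  sum(k_q * alpha) + m                          # Eq. (6): linear cost per node
}

# The kriging variance in Eq. (7) additionally needs one triangular solve
# per test point, which is where the O((P+S)^2) per-node cost comes from.
```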


6. Conclusions and future work

This work presented a new machine learning method for implicit modeling of geological bodies. Training the model by maximizing the log-likelihood avoids laborious and subjective variogram modeling, but does not guarantee a unique solution in practical terms. The method is capable of working even without any orientation measurements, and it can model the whole region of interest at once, eliminating the need for a separate model for each surface. We believe that current practices in geostatistics can be much improved by the adoption of machine learning techniques. The efficiency of this method can be increased by employing mixture or non-stationary models in order to accommodate local structures, sparse Gaussian processes to reduce computation time, and ensemble models to reduce ambiguity and increase accuracy. These topics shall be researched in future work.

Acknowledgements

The authors would like to thank the reviewers, whose contributions have greatly improved the paper. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Adler, D., Murdoch, D., 2016. rgl: 3D Visualization Using OpenGL. https://cran.r-project.org/package=rgl
Aug, C., 2004. Modélisation géologique 3D et caractérisation des incertitudes par la méthode du champ de potentiel. Ph.D. thesis.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning. Springer, Singapore.
Buccianti, A., Mateu-Figueras, G., Pawlowsky-Glahn, V., 2006. Compositional Data Analysis in the Geosciences. The Geological Society, London.
Calcagno, P., Chilès, J.P., Courrioux, G., Guillen, A., 2008. Geological modelling from field data and geological knowledge. Part I. Modelling method coupling 3D potential-field interpolation and geological rules. Phys. Earth Planet. Inter. 171 (1–4), 147–157.
Carr, J.C., Beatson, R.K., Cherrie, J.B., Mitchell, T.J., Fright, W.R., McCallum, B.C., Evans, T.R., 2001. Reconstruction and representation of 3D objects with radial basis functions. In: Proceedings of SIGGRAPH '01, 67–76. http://portal.acm.org/citation.cfm?doid=383259.383266
Caumon, G., Gray, G., Antoine, C., Titeux, M.O., 2013. Three-dimensional implicit stratigraphic model building from remote sensing data on tetrahedral meshes: theory and application to a regional model of La Popa Basin, NE Mexico. IEEE Trans. Geosci. Remote Sens. 51 (3), 1613–1621.
Chan, K.-S., 2010. DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. J. Stat. Softw. 35, 11.
Chilès, J.-P., Delfiner, P., 1999. Geostatistics: Modeling Spatial Uncertainty. Wiley, New York.
Chilès, J.P., Aug, C., Guillen, A., Lees, T., 2004. Modelling the geometry of geological units and its uncertainty in 3D from structural data: the potential-field method. Orebody Modelling and Strategic Mine Planning – Spectrum 14, 22–24.
Chu, T., Zhu, J., Wang, H., 2011. Penalized maximum likelihood estimation and variable selection in geostatistics. Ann. Stat. 39 (5), 2607–2625.
Cowan, E.J., Lane, R.G., Ross, H.J., 2004. Leapfrog's implicit drawing tool: a new way of drawing geological objects of any shape rapidly in 3D. In: Proceedings of the Australian Institute of Geoscientists Mining Geology 2004 Workshop, Bulletin 4, 23–25.
Cowan, E., Beatson, R., Ross, H., Fright, W., McLennan, T., Evans, T., Carr, J., Lane, R., Bright, D., Gillman, A., Oshust, P., Titley, M., 2003. Practical implicit geological modelling. In: Proceedings of the 5th International Mining Geology Conference, 89–99. http://hyperfun.org/FHF_Log/Cowan_PIGM03.pdf
Cracknell, M.J., Reading, A.M., 2014. Geological mapping using remote sensing data: a comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput. Geosci. 63, 22–33. http://dx.doi.org/10.1016/j.cageo.2013.10.008
Feng, D., Tierney, L., 2008. Computing and displaying isosurfaces in R. J. Stat. Softw. 28, 1.
Flach, P., 2012. Machine Learning: The Art and Science of Algorithms that Make Sense of Data, 1st ed. Cambridge University Press, Cambridge.
Fouedjio, F., 2016. Second-order non-stationary modeling approaches for univariate geostatistical data. Stoch. Environ. Res. Risk Assess., 1–20.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York.
Gumiaux, C., Gapais, D., Brun, J.P., 2003. Geostatistics applied to best-fit interpolation of orientation data. Tectonophysics 376 (3–4), 241–259.
Hillier, M., de Kemp, E., Schetselaar, E., 2013. 3D form line construction by structural field interpolation (SFI) of geologic strike and dip observations. J. Struct. Geol. 51, 167–179. http://dx.doi.org/10.1016/j.jsg.2013.01.012
Hristopulos, D.T., 2015. Stochastic Local Interaction (SLI) model: bridging machine learning and geostatistics. Comput. Geosci. 85, 26–37. http://dx.doi.org/10.1016/j.cageo.2015.05.018
Isaaks, E.H., Srivastava, R.M., 1989. Applied Geostatistics. Oxford University Press, New York.
Jessell, M., Aillères, L., de Kemp, E., Lindsay, M., Wellmann, F., Hillier, M., Laurent, G., Carmichael, T., Martin, R., 2014. Next generation three-dimensional geologic modeling and inversion. SEG Spec. Publ. 18, 261–272.
Kapageridis, I.K., 2014. Variable lag variography using k-means clustering. Comput. Geosci. 85, 49–63. http://dx.doi.org/10.1016/j.cageo.2015.04.004
Lajaunie, C., Courrioux, G., Manuel, L., 1997. Foliation fields and 3D cartography in geology: principles of a method based on potential interpolation. Math. Geol. 29 (4), 571–584.
Mardia, K.V., Marshall, R.J., 1984. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71 (1), 135–146.
Maxelon, M., Renard, P., Courrioux, G., Brändli, M., Mancktelow, N., 2009. A workflow to facilitate three-dimensional geometrical modelling of complex poly-deformed geological units. Comput. Geosci. 35 (3), 644–658.
McLennan, J.A., Deutsch, C.V., 2006. Implicit Boundary Modeling (BOUNDSIM). Tech. rep., Centre for Computational Geostatistics, Edmonton.
Michalewicz, Z., 1996. Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed. Springer, Berlin.
Midland Valley Exploration, 2014. Move. http://www.mve.com/software/move
Pawlowsky-Glahn, V., Buccianti, A., 2011. Compositional Data Analysis: Theory and Applications. Wiley, Chichester.
Quiñonero-Candela, J., Rasmussen, C.E., Herbrich, R., 2005. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1935–1959.
R Core Team, 2017. R: A Language and Environment for Statistical Computing. https://www.r-project.org/
Rasmussen, C.E., Williams, C.K.I., 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, Massachusetts. http://www.GaussianProcess.org/gpml
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., Chica-Rivas, M., 2015. Machine learning predictive models for mineral prospectivity: an evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 71, 804–818.
Romary, T., 2013. Incomplete Cholesky decomposition for the kriging of large datasets. Spat. Stat. 5, 85–99. http://dx.doi.org/10.1016/j.spasta.2013.04.008
Romary, T., Ors, F., Rivoirard, J., Deraisme, J., 2014. Unsupervised classification of multivariate geostatistical data: two algorithms. Comput. Geosci. 85, 96–103.
Scrucca, L., 2013. GA: a package for genetic algorithms in R. J. Stat. Softw. 53 (4), 1–37. http://www.jstatsoft.org/v53/i04/
Scrucca, L., 2016. On some extensions to GA package: hybrid optimisation, parallelisation and islands evolution. http://arxiv.org/abs/1605.01931
Smirnoff, A., Boisvert, E., Paradis, S.J., 2008. Support vector machine for 3D modelling from sparse geological information of various origins. Comput. Geosci. 34, 127–143.
Snelson, E., Ghahramani, Z., 2006. Sparse Gaussian processes using pseudo-inputs. Adv. Neural Inf. Process. Syst. 18, 1257–1264.
Stachniss, C., Plagemann, C., Lilienthal, A., Burgard, W., 2008. Gas distribution modeling using sparse Gaussian process mixture models. Adv. Robot., 1–8.
Tolosana-Delgado, R., Pawlowsky-Glahn, V., Egozcue, J.-J., 2008. Indicator kriging without order relation violations. Math. Geosci. 40 (3), 327–347.
Tresp, V., 2001. Mixtures of Gaussian processes. Adv. Neural Inf. Process. Syst., 654–660.
Vollgger, S.A., Cruden, A.R., Ailleres, L., Cowan, E.J., 2015. Regional dome evolution and its control on ore-grade distribution: insights from 3D implicit modelling of the Navachab gold deposit, Namibia. Ore Geol. Rev. 69, 268–284.
Wang, G., Carr, T.R., Ju, Y., Li, C., 2014. Identifying organic-rich Marcellus Shale lithofacies by support vector machine classifier in the Appalachian basin. Comput. Geosci. 64, 52–60. http://dx.doi.org/10.1016/j.cageo.2013.12.002
Wu, Q., Xu, H., Zou, X., Lei, H., 2015. A 3D modeling approach to complex faults with multi-source data. Comput. Geosci. 77, 126–137.

