Valid approximation of spatially distributed grain size distributions – A priori information encoded to a feedforward network


Computers and Geosciences 113 (2018) 23–32


Research paper

T. Berthold a,*, P. Milbradt b, V. Berkhahn a

a Institute for Risk and Reliability, Leibniz University of Hanover, Callinstr. 34, 30167 Hanover, Germany
b Smile Consult GmbH, Vahrenwalder Str. 4, 30165 Hanover, Germany

* Corresponding author. E-mail addresses: [email protected] (T. Berthold), [email protected] (P. Milbradt), [email protected] (V. Berkhahn).

https://doi.org/10.1016/j.cageo.2018.01.007
Received 6 May 2016; Received in revised form 1 December 2017; Accepted 15 January 2018

Keywords: A priori knowledge; Distribution fitting; Feedforward neural network; Grain size distribution; Spatial interpolation; Weight constraints

Abstract

This paper presents a model for the approximation of multiple, spatially distributed grain size distributions based on a feedforward neural network. Since a classical feedforward network is not guaranteed to produce valid cumulative distribution functions, a priori information is incorporated into the model by applying weight and architecture constraints. The model is derived in two steps. First, a model is presented that is able to produce a valid distribution function for a single sediment sample. Although initially developed for sediment samples, the model is not limited to this application; it can also be used to approximate any other multimodal continuous distribution function. In the second part, the network is extended in order to capture the spatial variation of the sediment samples, which were obtained from 48 locations in the investigation area. Results show that the model provides an adequate approximation of grain size distributions, satisfying the requirements of a cumulative distribution function.

1. Introduction

Distribution models are a standard tool in statistics and are used throughout many fields. In sedimentology, the constitution of the sediment at the seafloor can be described in terms of a grain size distribution. To this end, the sediment sample is analyzed, e.g. by sieving, in order to determine the relative amount of grains in discrete size ranges. The size of the grains is usually represented in terms of the $\phi$-scale, which was introduced by Krumbein (1938). The cumulative grain size distribution specifies, for each observed grain size $\phi_i$, the relative frequency $f_{rc}(\phi_i)$ of grains in the sample with diameters larger than $\phi_i$. The discrete grain size data of a single sediment sample can be approximated by a continuous distribution model, which can be expressed in terms of a cumulative distribution function (CDF) $F(\phi)$ or its derivative, the probability density function (PDF) $f(\phi)$. The properties of a continuous CDF are:

$$F(\phi_1) \le F(\phi_2) \quad \text{for all } \phi_1 < \phi_2 \tag{1}$$

$$\lim_{\phi \to -\infty} F(\phi) = 0 \tag{2}$$

$$\lim_{\phi \to +\infty} F(\phi) = 1 \tag{3}$$

The resultant properties of a PDF are:

$$f(\phi) \ge 0 \quad \text{for all } \phi \in \mathbb{R} \tag{4}$$

$$\lim_{\phi \to \pm\infty} f(\phi) = 0 \tag{5}$$

$$\int_{\mathbb{R}} f(\phi)\, d\phi = 1 \tag{6}$$

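To make these conventions concrete, the following minimal sketch (Python, with hypothetical sieve data that is not taken from the paper) converts grain diameters to Krumbein's $\phi$-scale, $\phi = -\log_2(d / 1\,\text{mm})$, builds the discrete "coarser than" cumulative frequencies $f_{rc}$, and checks the discrete analogue of equation (1):

```python
import numpy as np

def to_phi(d_mm):
    """Krumbein phi scale: phi = -log2(d / d0) with d0 = 1 mm."""
    return -np.log2(d_mm)

def empirical_f_rc(phi, fractions):
    """Cumulative 'coarser than' frequencies f_rc from discrete sieve
    fractions, sorted by increasing phi (i.e. decreasing diameter)."""
    order = np.argsort(phi)
    f_rc = np.cumsum(np.asarray(fractions, dtype=float)[order])
    return np.asarray(phi)[order], f_rc / f_rc[-1]

def is_monotonic(f_rc, tol=1e-12):
    """Discrete analogue of equation (1): non-decreasing values."""
    return bool(np.all(np.diff(f_rc) >= -tol))

# Hypothetical sieve analysis: diameters in mm, relative fractions.
d = np.array([2.0, 1.0, 0.5, 0.25, 0.125, 0.063])
w = np.array([0.05, 0.10, 0.30, 0.35, 0.15, 0.05])
phi, f_rc = empirical_f_rc(to_phi(d), w)
print(phi, f_rc, is_monotonic(f_rc))
```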

In addition to general distribution models, such as the log-normal or the Weibull distribution, some grain-size-specific distribution models have been developed, for example, the log-hyperbolic function proposed by Barndorff-Nielsen (1977) or the models proposed by Fredlund et al. (2000) for uni- and bimodal distributions. For a given set of sediment samples from different locations, one might be interested in describing the grain size distributions at all the sample locations and in between, a task that can be termed spatial interpolation of grain size distributions. There are several interpolation and


approximation¹ methods to tackle this problem, like compositional kriging, kriging of transformed compositional data using the additive log-ratio transform (Walvoort and Gruijter, 2001; Tolosana-Delgado, 2006; Tolosana-Delgado et al., 2011), or interpolating the grain size distributions by constrained cubic splines (Kruger, 2003; Weltje and Roberson, 2012) and then applying any spatial interpolation method (e.g. linear interpolation on a Delaunay triangulation, nearest neighbour interpolation, kriging, etc.) to them. Besides these well-established methods and models, soft-computing approaches, including neural networks, have also been successfully applied in various studies in the field of geoscience. These include, for example, modeling the bulk density (Shiri et al., 2017b) and the soil water capacity (Shiri et al., 2017a) on the basis of other soil properties, or estimating the mud content on the basis of bathymetric parameters and absolute positions of the samples (Li et al., 2011). Other researchers used neural networks for the spatial approximation of different soil properties as a function of their absolute position $(x, y)$ (Christopher et al., 2001; Oda et al., 2012, 2013).

The term neural networks (or, more precisely, artificial neural networks (ANN)) is related to the field of machine learning and comprises models and methods that were inspired by biological neural networks and the processes of learning. ANN is a data-based approach, where the parameters of the model are learned from the data instead of being selected manually. This can be an advantage when little information about the data is available.

In this paper we propose an ANN-based model for the spatial approximation of grain size data. While in the examples stated above the models are applied to unconstrained variables, the given grain size data are subject to the constraints already stated. A classical feedforward network is able to approximate the data well, but in general it is not guaranteed to produce valid approximations of the distribution functions. The challenge is to develop an appropriate ANN model that tackles this problem. This will be done in two steps: in the first part of the paper (section 3) a feedforward model is derived based on Berthold (2014), which incorporates the given constraints and therefore produces valid approximations for any multimodal grain size distribution of a single sediment sample. In other words, this is a function $f(\phi)$ that approximates the relative cumulative frequency $f_{rc}$ depending on the grain size $\phi$. In the second part of the paper (section 4), this model is extended by the two-dimensional position $(x, y)$ of the samples in order to give an approximation of spatially distributed grain size data. This model represents the function:

$$\tilde{f}_{rc} : \mathbb{R}^3 \to \mathbb{R}, \quad (\phi, x, y) \mapsto f_{rc} \tag{7}$$

For both subgoals an extensive inspection of the network structure (also called the network topology) is required. The proposed model is derived by incorporating the given constraints from a network topology point of view. We also consider the important aspect of the representability of the ANN, i.e. the ability of the network to represent a specific function. Within sections 3 and 4, we evaluate the presented approach by applying both models to real-world data. Here, we focus on the correctness of the models, assessed in terms of validity and representability. The data as well as a brief summary of ANN, their structure and notation, are introduced in the following section.

2. Model requirements

The goal of this paper is to find a suitable model that fits the available grain size data well. Hence, the data, which are described in section 2.1, define the requirements of the model. On the one hand, the model has to satisfy the properties of a distribution model (model validity) given by equations (1)–(3). On the other hand, the model has to capture the data in all its complexity (model accuracy). The basic network architecture and the notation are given in section 2.2.

2.1. Data

The available grain size data, which were provided by the Federal Maritime and Hydrographic Agency (BSH, Germany), were obtained from 48 sediment samples, collected north of the German island of Norderney in 1963, each sample at a different location $(x, y)$ within this domain. The sediment was collected by a grab sampler. A grain size analysis was performed for the topmost part (approximately 2 cm) of the samples, resulting in a set of 48 discrete grain size distributions. Fig. 1 gives an overview of the data. The spatial distribution of the samples covers an area of approximately 10.3 km from east to west and 5.5 km from north to south. The number of grain size fractions varies between 21 and 24 per sample. The sample locations are distributed almost regularly (Fig. 1a), whereas the characteristics (mean, sorting, skewness, uni- or multimodality) of the grain size distributions differ significantly. Fig. 1d illustrates this fact by plotting the grain size data of 5 adjacent samples 1, …, 5, which are marked in Fig. 1a. None of the grain size distributions is similar to an adjacent one. Samples 1 and 4 consist of finer material, whereas sample 3 is coarser. Samples 2 and 5 are somewhere in between, with sample 5 having a multimodal grain size distribution. The mean² (Fig. 1e) and the sorting coefficient³ (Fig. 1f) indicate that this spatial variation of the sediment characteristics is also present at the other locations: samples with similar grain size distributions are not always located in the same region, but are scattered across almost the whole domain. Fig. 2 reveals the complexity of the data in more detail by illustrating the cumulative frequencies $f_{rc}$ at each measurement location $(x, y)$ for different fixed values of $\phi$. Obviously, this "landscape" changes with varying $\phi$. The landscape is a plane for small and large values of $\phi$ (Fig. 2a and d), but it is uneven for values in between (Fig. 2b and c). These characteristics have to be reproduced by the model.

2.2. Feedforward network and notation

Basically, the idea of ANN is to learn a mapping $\mathbb{R}^n \to \mathbb{R}^m$ from an n-dimensional input space to an m-dimensional output space. Usually, the mapping is only known at discrete points represented by the samples that have been observed, and one is interested in an approximation of the underlying mapping in a generalized way. The observed data are used in this context as training patterns. In order to use an ANN as a prediction model, one has to perform three phases: first, the network structure has to be defined and the parameters of the network have to be initialized; second, the network parameters are adjusted in the training process; and third, the network can be applied to unknown data in the prediction phase. The network is assembled from neurons, single units that compute an output based on a given input. The neurons are connected with each other as nodes in a graph. The connections are sometimes called synapses (inspired by the biological background) and are weighted. The input value $net_N$ of a neuron $N$ is determined by the propagation function, which is the weighted sum of the output values (activations) $out_{N_i}$ of its predecessors $N_i$:

$$net_N = \sum_i out_{N_i} \cdot w_{N_i,N} \tag{8}$$

¹ The terminology for interpolation and approximation is not always consistent. Some authors distinguish between exact and non-exact interpolation, whereas others refer to exact interpolation simply as interpolation and call the more general case (non-exact interpolation, meaning that the values do not have to be hit exactly) approximation. We will use the term approximation in the following.

² The mean was determined by the formula $M_9 = \frac{1}{9}(\phi_{10} + \phi_{20} + \dots + \phi_{90})$ from McManus (1988), where $\phi_i$ is the $i$-th percentile, determined from the linearly interpolated grain size distribution.

³ The sorting coefficient was determined by the formula $\sigma_1 = \frac{\phi_{84} - \phi_{16}}{4} + \frac{\phi_{95} - \phi_5}{6.6}$ from McManus (1988).
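The two footnote formulas translate directly into code. Below is a minimal sketch (hypothetical helper names, percentiles taken from the linearly interpolated empirical distribution as described in footnote 2) computing the McManus (1988) mean $M_9$ and sorting coefficient $\sigma_1$:

```python
import numpy as np

def percentile_phi(phi, f_rc, p):
    """phi value at cumulative frequency p (in percent), obtained by
    linear interpolation of the discrete grain size distribution."""
    return float(np.interp(p / 100.0, f_rc, phi))

def mcmanus_mean(phi, f_rc):
    """M9 = (phi_10 + phi_20 + ... + phi_90) / 9 (McManus, 1988)."""
    return sum(percentile_phi(phi, f_rc, p) for p in range(10, 100, 10)) / 9.0

def sorting_coefficient(phi, f_rc):
    """sigma_1 = (phi_84 - phi_16)/4 + (phi_95 - phi_5)/6.6 (McManus, 1988)."""
    p = lambda q: percentile_phi(phi, f_rc, q)
    return (p(84) - p(16)) / 4.0 + (p(95) - p(5)) / 6.6
```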


Fig. 1. Location and shape of the available data observed north of the German island of Norderney. The data consist of grain size distributions of 48 sediment samples.

Fig. 2. Varying "landscape" of $f_{rc}$ depending on the grain size $\phi$. The figure quantifies the percentage of grains with diameters larger than $\phi$ for each sample.

Here, $w_{N_i,N}$ is the weight of the synapse that connects the neuron $N_i$ with $N$. The weights significantly influence the output values of the neurons and usually are the parameters that are adjusted during the training phase. The activation value $a_N = f_a^N(net_N)$ of $N$ is then calculated by applying a (usually nonlinear) function known as the activation function $f_a^N$. In general, the resulting activation value corresponds directly to the output value $out_N$ of the neuron ($a_N = out_N$).⁴ We use a feedforward network as depicted in Fig. 3 in order to approximate $f_{rc}$ depending on the grain size $\phi$.

⁴ In section 3 an additional output transformation function will be introduced, so that the output value differs from the activation value.


Fig. 3. Schematic diagram of the feedforward network.

Fig. 4. Proposed 1-n-1-c model with constrained weights (red) that satisfies the given requirements for a valid approximation of a single grain size distribution. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

A feedforward network corresponds to a graph without cycles, and the neurons are typically arranged in layers. In such a network the information is propagated from the input layer to the output layer in a directed way. The structure of the network will be denoted as an n-k-l-m-topology according to the number of neurons in each layer. The flexibility of the ANN is the ability of the network to represent a particular mapping and is significantly influenced by its topology. The additional bias neuron $N_b$ improves this flexibility. The bias neuron is a special neuron that has no predecessors and constantly sends the output value one ($out_{N_b} \equiv 1$). This enables the activation function $f_a^{N_i}$ of each connected neuron $N_i$ to be shifted from its origin. The amount of the offset depends on the weight $w_{N_b,N_i}$ of the connecting synapse. The bias neuron is connected to all neurons except the input neurons. For the sake of clarity, multiple bias neurons are plotted in Fig. 3. Since this has no effect on the network topology, the bias neuron is not counted in the notation for the network topology. The input neurons use the identity function as activation function ($f_a^N(net) = net$), while all remaining neurons use the logistic function as their activation function, $f_a^N(net_N) = 1/(1 + e^{-net_N})$.
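As a concrete illustration of equation (8) and the activation functions just described, the following minimal sketch (illustrative names, plain NumPy) propagates predecessor activations through one neuron:

```python
import numpy as np

def logistic(net):
    """Activation function of hidden and output neurons: 1 / (1 + e^-net)."""
    return 1.0 / (1.0 + np.exp(-net))

def neuron_output(out_pred, weights, w_bias):
    """Equation (8): weighted sum of predecessor activations plus the bias
    synapse (the bias neuron constantly outputs 1), then the activation."""
    net = np.dot(out_pred, weights) + w_bias * 1.0
    return logistic(net)

# A neuron with two predecessors:
print(neuron_output(np.array([0.2, 0.7]), np.array([1.5, -0.8]), 0.1))
```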

3. A valid model for a single grain size distribution

In this section, which is based on Berthold (2014), an adequate feedforward network is derived that serves as an approximation model for the grain size data of a single sediment sample.

3.1. Literature review for constrained ANN

Xiongfeng et al. (1999) used a classical feedforward network to approximate an empirical CDF. Since the empirical CDF tends to the original CDF with an increasing number of observations, the authors stated that the ANN model gives an adequate approximation of the original CDF as well. This approach, however, does not produce a valid CDF in general, especially not for a small number of observations. A solution to this problem is to consider a priori knowledge, i.e. the properties of the distribution model, in the ANN model. Some approaches that incorporate a priori knowledge into feedforward networks have been presented previously. Joerding and Li (1994) stated that in some cases it is sufficient to choose an adequate network architecture (architecture constraint), while in other cases constraints for the interconnecting weights can be defined to include the a priori knowledge (weight constraint). Both methods can also be combined. They showed that a monotonically increasing and concave function can be represented by a three-layer feedforward network with positive weights. Chen and Chen (2001) applied the weight constraint method and two new methods, the exponential weight and the adaptive method, to data from a true boiling point curve of crude oil in order to enforce a monotonic function. Wang (1994) developed a model based on an ANN that approximates unimodal distribution functions. The model approximates the related CDF and is based on monotonicity and concavity constraints in order to guarantee valid outputs.

3.2. Proposed model

The proposed approach incorporates the properties of a CDF by a combination of architecture and weight constraints. We choose to approximate the CDF instead of the PDF, because the activation functions typically used in a neural network are sigmoidal. The shape of the sigmoidal activation functions is naturally very similar to the general shape of a CDF. In particular, the logistic function, which is often used, already meets the requirements of a CDF. Furthermore, from the authors' point of view, it is more difficult to incorporate the unit sum constraint of a PDF than the constraints of a CDF. Since the grain size distribution is a mapping $F : \mathbb{R} \to \mathbb{R}$, we have to choose a 1-…-1-topology. When all activation functions are continuous, the whole network produces a continuous function. As will be shown in this section, a feedforward network with 1-n-1-topology as depicted in Fig. 4a satisfies equations (1)–(3) if the following holds:

- a continuous and strictly monotonically increasing transformation function $T$ is used to extend the range of the activation value of the output neuron $N_o$, such that $T^{N_o}(0) < 0$ and $T^{N_o}(1) > 1$;
- the weight of any synapse that is a direct or indirect successor of the input neuron must be strictly positive, i.e. $w_{N_\phi,N_1^i} > 0$ and $w_{N_1^i,N_o} > 0$;
- the weights in the second tier are additionally constrained more strictly:

$$w_{N_b,N_o} = \left(f_a^{N_o}\right)^{-1}\!\left(\left(T^{N_o}\right)^{-1}(0)\right) = c_1 \tag{9}$$

$$\sum_{i=1}^{n} w_{N_1^i,N_o} = \left(f_a^{N_o}\right)^{-1}\!\left(\left(T^{N_o}\right)^{-1}(1)\right) - w_{N_b,N_o} = c_2 \tag{10}$$

The constrained model, including these assumptions, is summarized in Fig. 4a and will be referred to as the 1-n-1-c model (c as abbreviation for constrained). The minimal network topology we can choose is the 1-1-topology. Such a network exactly reproduces the activation function of the output neuron, which is the logistic function in our case. Hence, the 1-1-model produces a valid CDF; the quality of approximation, however, is only acceptable for an appropriately simple set of data. The flexibility of the network⁵ increases by introducing a hidden layer. The number of neurons in the hidden layer influences the flexibility of the mapping: roughly speaking, each neuron in the hidden layer represents an inflection point in the overall mapping. By introducing the hidden layer, the problem occurs that the ANN is not able to produce the output values 0 and 1 respectively, since the weights in the last layer would have to be infinite, which is not possible. The problem can be fixed by introducing a transformation function $T^{N_o}$ that is applied to the activation $a^{N_o}$ of the output neuron $N_o$, so that $out^{N_o} = T^{N_o}(a^{N_o})$. The transformation function has to be monotonic and has to scale the output values such that $T^{N_o}(0) < 0$ and $T^{N_o}(1) > 1$. Such a network is flexible enough to represent the data; however, it no longer guarantees to produce a valid distribution function, and appropriate constraints have to be defined.

⁵ That is, the flexibility of the mapping represented by the network.

Fig. 5. Illustrative results produced by the proposed ANN (1-n-1-c model).
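To make the construction of section 3.2 tangible, here is a minimal sketch of the 1-n-1-c forward pass under an assumed affine transformation function $T(a) = a(1 + 2\delta) - \delta$; the paper does not prescribe a specific $T$, so this choice and all names are illustrative. The constants $c_1$ and $c_2$ from equations (9) and (10) then follow from the logistic inverse (the logit):

```python
import numpy as np

DELTA = 0.05                       # assumed margin so that T(0) < 0 and T(1) > 1

def T(a):                          # assumed affine transformation function
    return a * (1.0 + 2.0 * DELTA) - DELTA

def T_inv(y):
    return (y + DELTA) / (1.0 + 2.0 * DELTA)

def logit(p):                      # inverse of the logistic activation
    return np.log(p / (1.0 - p))

c1 = logit(T_inv(0.0))             # equation (9): bias weight of the output neuron
c2 = logit(T_inv(1.0)) - c1        # equation (10): sum of the second-tier weights

def forward(phi, w_in, b_hidden, w_out):
    """1-n-1-c forward pass: w_in > 0 elementwise, sum(w_out) == c2,
    output bias fixed to c1; the hidden biases b_hidden are unconstrained."""
    hidden = 1.0 / (1.0 + np.exp(-(w_in * phi + b_hidden)))       # logistic hidden layer
    return T(1.0 / (1.0 + np.exp(-(np.dot(w_out, hidden) + c1))))  # constrained output

# Example with n = 3 hidden neurons; w_out scaled to satisfy equation (10).
w_out = np.array([0.2, 0.5, 0.3]) * c2
print(forward(2.0, np.array([1.0, 2.0, 0.5]), np.array([-1.0, 0.0, 1.5]), w_out))
```

In the limits $\phi \to \pm\infty$ the hidden activations tend to 0 and 1, so the output reaches exactly 0 and 1, which is precisely what the two weight constraints enforce.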

3.3. Introducing weight constraints

In order to meet the properties of a CDF, the weight constraints are derived in two steps: the weight constraints regarding monotonicity (equation (1)) and those regarding the extreme values (equations (2) and (3)).

Monotonicity. Monotonicity is established if all weights are positive ($w_{N_\phi,N_1^i}, w_{N_1^i,N_o} \ge 0$), except those originating from the bias neurons, which can have arbitrary values ($w_{N_b,N_1^i}, w_{N_b,N_o} \in \mathbb{R}$). Similar results have already been stated and discussed by other authors, such as Joerding and Li (1994) and Daniels and Velikova (2010).

Extreme values. The output value of the model with respect to the input value $\phi$ is given as

$$out^{N_o}(\phi) = T^{N_o}\!\left(f_a^{N_o}\!\left(net^{N_o}(\phi)\right)\right) = T^{N_o}\!\left(f_a^{N_o}\!\left(\sum_i w_{N_1^i,N_o} \cdot out^{N_1^i}(\phi) + w_{N_b,N_o}\right)\right) \tag{11}$$

Consider the case $\phi \to -\infty$ given in equation (2). Assume that the weights are constrained as stated above (monotonicity) and, even more strictly, let $w_{N_\phi,N_1^i} > 0$. Applying these assumptions, the output values $out^{N_1^i}$ of the neurons in the hidden layer vanish and equation (11) leads to

$$0 = \lim_{\phi \to -\infty} out^{N_o}(\phi) = T^{N_o}\!\left(f_a^{N_o}\!\left(0 + w_{N_b,N_o}\right)\right) \tag{12}$$

$$\Rightarrow \quad w_{N_b,N_o} = \left(f_a^{N_o}\right)^{-1}\!\left(\left(T^{N_o}\right)^{-1}(0)\right) = c_1 \tag{13}$$

where $\left(T^{N_o}\right)^{-1}$, the inverse of the transformation function, and $\left(f_a^{N_o}\right)^{-1}$, the inverse of the activation function, both exist, since $T^{N_o}$ and $f_a^{N_o}$ are strictly monotonically increasing. Considering the upper limit ($\phi \to +\infty$) given in equation (3), using the same assumptions as stated directly above, equation (11) leads to

$$1 = \lim_{\phi \to +\infty} out^{N_o}(\phi) = T^{N_o}\!\left(f_a^{N_o}\!\left(\sum_i w_{N_1^i,N_o} \cdot 1 + w_{N_b,N_o}\right)\right) \tag{14}$$

$$\Rightarrow \quad \sum_i w_{N_1^i,N_o} = \left(f_a^{N_o}\right)^{-1}\!\left(\left(T^{N_o}\right)^{-1}(1)\right) - w_{N_b,N_o} = c_2 \tag{15}$$

3.4. Enforcing the constraints during the training process

During the training process, the weights of the model are adapted iteratively by the learning rule. In order to enforce the constraints that have been derived in the previous sections, the backpropagation rule has to be modified⁶ slightly, as follows:

1. Initialize the weights randomly so that the constraints are satisfied.
2. While the stop criterion is not fulfilled:
   (a) choose a training pattern randomly,
   (b) perform the propagation,
   (c) adapt the weights conforming to the learning rule,
   (d) if any weight violates its constraints, adapt the weight again⁷:
      - $w^{new}_{N_\phi,N_1^i} = \max(w_{N_\phi,N_1^i}, \varepsilon)$, where $\varepsilon > 0$ is a small constant,
      - $w^{new}_{N_b,N_o} = c_1$,
      - $w^{new}_{N_1^i,N_o}$ is updated by the projection algorithm proposed by Chen and Ye (2011), which maps the weights onto the n-dimensional simplex with vertices $(c_2, 0, \dots, 0), (0, c_2, \dots, 0), \dots, (0, 0, \dots, c_2)$.

⁶ The pseudo-code presents the standard algorithm; modifications to it are printed in boldface in the original.

⁷ $\eta$ denotes the learning rate, a parameter that is typically chosen from the interval $(0, 1]$.
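Step 2(d) is the only non-trivial part of the modified loop. A compact sketch of the whole constraint-enforcement step could look as follows (function and variable names are illustrative, not from the paper); note that the simplex projection may return exact zeros, while the derivation asks for strictly positive second-tier weights, so a production version would need an extra guard:

```python
import numpy as np

def project_to_simplex(y, t):
    """Euclidean projection of y onto {w : w_i >= 0, sum(w_i) = t},
    following the sort-based scheme of Chen and Ye (2011)."""
    u = np.sort(y)[::-1]                        # sort descending
    css = np.cumsum(u)
    j = np.arange(1, len(y) + 1)
    rho = np.max(j[u + (t - css) / j > 0])      # last index keeping positivity
    lam = (t - css[rho - 1]) / rho
    return np.maximum(y + lam, 0.0)

def enforce_constraints(w_in, w_out_bias, w_out, c1, c2, eps=1e-6):
    """Step 2(d): re-impose the constraints after an unconstrained update."""
    w_in = np.maximum(w_in, eps)                # keep first-tier weights strictly positive
    w_out_bias = c1                             # fix the output bias, equation (9)
    w_out = project_to_simplex(w_out, c2)       # equation (10): sum == c2, weights >= 0
    return w_in, w_out_bias, w_out
```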

3.5. Validation and example

Various training processes were executed on the sample data, using the modified learning rule and different network topologies. We used all the data for training, since the focus of this research is on the feasibility of producing valid approximations using an ANN rather than investigating its generalization ability. Illustrative results are presented in Fig. 5a. The quality of approximation differs according to the topology. A network using the 1-1-1-topology gives a rough approximation of the data, whereas networks using more than one neuron in the hidden layer are increasingly accurate (see Fig. 5a). Irrespective of the accuracy, all the models produce valid approximations of the data, satisfying equations (1)–(3). This is confirmed by the following observations (also compare the supplementary material of this paper given in Berthold (2017)):

- The predicted output values of the models in the case of the lower and upper limits (equations (2) and (3)) are in the interval $[0, 1]$, and the predicted values are monotonically increasing.
- The defined weight constraints are enforced constantly during the training process (Fig. 5b):
  - $w_{N_b,N_o} = c_1$
  - $\sum_i w_{N_1^i,N_o} = c_2$
  - $w_{N_1^i,N_o} > 0$

Fig. 6. 3-n-1-c model for spatial approximation with additional input neurons $N_x$ and $N_y$, resulting in a constrained 3-n-1-topology. Constrained weights are marked red. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

4. Extending the model to spatial approximation

The 1-n-1-c model will be extended in order to include the spatial variation of the sediment samples. By introducing two additional input neurons $N_x$ and $N_y$, the x- and y-coordinates of the sample location are incorporated. This leads to the 3-n-1-c model as depicted in Fig. 6. The additional neurons are connected to each neuron of the hidden layer. These connections allow a variation of the CDF depending on the location. Since they are not connected to $N_\phi$, the requirements (equations (1)–(3)) are not violated, and the weights of these synapses do not have to be constrained. Illustrative results produced by this model are depicted in Fig. 7. Again, we used all the available data for training, as we did in section 3 and will do for the other training processes described in this section. A valid CDF is produced at each location, but the accuracy of the approximation is not satisfactory. This is due to the weight constraints in the second tier, as the following example reveals:

4.1. Example

In this example, the data are reduced in order to examine the model's ability to capture the spatial variation of the data. Consider the case that the training data provide only one grain size fraction (in this example $\phi = 2.75$) for each sample, so that the input value of the input neuron $N_\phi$ is constant over all training patterns. This way, the mapping is reduced to $\mathbb{R}^2 \to \mathbb{R}$ and the model only has to capture the spatial variation. Multiple training processes were carried out in order to compare the 3-n-1-c model to an equivalent unconstrained model (the 3-n-1-u model). For both models, the number of hidden neurons was varied (3, 7, 10, 15, 30). For each network topology, 10 instances with different initial weights were trained, and each training process was stopped after $10^8$ iterations. The approximation accuracy of both models is compared in Fig. 8. The overall accuracy of the models was evaluated by the root-mean-square error (RMSE) as a global performance measure, which is defined in (16):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(\tilde{f}_{rc}^{\,i} - f_{rc}^{\,i}\right)^2}{n}} \tag{16}$$

Here, $\tilde{f}_{rc}$ denotes the estimated value, $f_{rc}$ the observed value, and $n$ is the number of observations. The RMSE was determined at 50 evenly distributed iteration steps on the logarithmic scale during the whole training process for each instance. At each of those iteration steps, the mean RMSE (of 10 instances) was calculated for each network topology. The lowest mean RMSE is depicted in Fig. 8a for the different network topologies. The plot illustrates two aspects: the error of both models decreases with an increasing number of neurons in the hidden layer, whereas the accuracy of the constrained model, in contrast to the unconstrained model, does not improve significantly, irrespective of the number of hidden neurons. The scatterplots of two instances using the 3-15-1-topology are depicted in Fig. 8b. Obviously, the 3-n-1-c model is not able to capture the spatial variation of the data due to the constraints of the second tier.
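Equation (16) and the checkpoint protocol translate directly into a short sketch (illustrative names; the $10^8$-iteration run length is taken from the text above):

```python
import numpy as np

def rmse(f_pred, f_obs):
    """Equation (16): root-mean-square error over all observations."""
    f_pred, f_obs = np.asarray(f_pred), np.asarray(f_obs)
    return float(np.sqrt(np.mean((f_pred - f_obs) ** 2)))

# 50 evaluation checkpoints, evenly distributed on a logarithmic scale
# over a training run assumed to span 10^8 iterations.
checkpoints = np.unique(np.logspace(0, 8, 50).astype(int))
```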

Fig. 7. Illustrative results of the 3-n-1-c model approximating the grain size distributions of the marked samples (Fig. 1a and d). Here, a 3-10-1-topology was used. The approximation quality suffers from the defined constraints.

Fig. 8. Comparison of the constrained (red) and the unconstrained (blue) model for the simplified training data case as described in the example in section 4.1. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

4.2. Interpretation of the second-tier constraints

A neuron $N$ and its predecessor weights act as a hyperplane that linearly separates its input space into an active ($out_N > 0.5$) and an inactive ($out_N \le 0.5$) region. The input space is set up by the output values of the predecessor neurons $N_i^{pre}$. The weights $w_{N_i^{pre},N}$ directly influence the input value of $N$ as defined in equation (8) and therefore determine the position and orientation of the hyperplane. The orientation is determined solely by the weights $w_{N_i^{pre},N}$. For the Heaviside function, the transition from the inactive to the active region is abrupt. Since we use the logistic function, the transition is a smooth one, with the weights $w_{N_i^{pre},N}$ also affecting the sharpness of this transition. The weight $w_{N_b,N}$, in relation to the magnitude of the weights $w_{N_i^{pre},N}$, defines an offset by which the hyperplane is shifted from the origin. In the case of $w_{N_b,N} > 0$, the hyperplane is moved in the direction of the inactive region, whereas it is moved in the direction of the active region if $w_{N_b,N} < 0$.

Transferred to the case of the 3-n-1-c model above, the constraints in the second tier require the hyperplane (of the output neuron) to go through the point $p_m = (0.5, \dots, 0.5) \in \mathbb{R}^n$ of the n-dimensional input space. Furthermore, the orientation of the hyperplane is constrained; the transition from the inactive to the active region must be in the positive coordinate direction (Fig. 9). Thus, the variety of possible combinations that the output neuron can accomplish is reduced. It is not able, for example, to perform the combinations $out_{N_1} \vee out_{N_2}$ or $out_{N_1} \wedge out_{N_2}$.
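A quick numerical check of this geometric reading, reusing the affine $T$ and the constants $c_1$, $c_2$ assumed in the sketch of section 3.2 (an illustrative construction, not code from the paper): for that symmetric choice of $T$, $c_1 = -c_2/2$, so the decision boundary of the output neuron indeed passes through $p_m = (0.5, \dots, 0.5)$.

```python
import numpy as np

DELTA = 0.05                            # same assumed affine T as before
logit = lambda p: np.log(p / (1.0 - p))
T_inv = lambda y: (y + DELTA) / (1.0 + 2.0 * DELTA)

c1 = logit(T_inv(0.0))                  # output bias, equation (9)
c2 = logit(T_inv(1.0)) - c1             # sum of second-tier weights, equation (10)

# Any admissible second-tier weight vector sums to c2:
w = np.array([0.1, 0.6, 0.3]) * c2
p_m = np.full(3, 0.5)

# net at p_m is 0, i.e. the boundary sigma(net) = 0.5 contains p_m:
print(np.dot(w, p_m) + c1)              # ~0.0 for the symmetric T used here
```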

4.3. Deriving an adequate network topology

The observations from the previous sections can be summarized as follows (the sketch after Fig. 10 shows the resulting architecture):

- The model has to be extended to 3 input neurons in order to represent the spatial variability $(x, y)$.
- The 1-n-1-topology, including the defined constraints, is essential in order to produce valid distribution functions.
- However, these constraints generally prohibit a good approximation of the spatial variation of the data.
- The approximation of the spatial variation can be achieved by an unconstrained model with 2 tiers of synapses.

In order to take the advantages of both models and combine them (a 3-layer topology with constraints for proper distribution functions as well as an unconstrained 3-layer topology for spatial flexibility), the 3-n-1-c model is extended to the 3-m-n-1-c model by introducing an additional layer, called the spatial layer, as depicted in Fig. 10.

Fig. 9. Possible positions and orientations of the hyperplane associated with the output neuron due to the constraints in the second tier (for n = 2 neurons in the hidden layer).

Fig. 10. Structure of the proposed 3-m-n-1-c model for the approximation of spatially distributed grain size distributions. The model extends the 3-n-1-c model by adding the spatial layer in order to capture the spatial variation. Constrained weights are marked red. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
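The following sketch shows how such a 3-m-n-1-c forward pass could be organized (illustrative NumPy code under the same assumed affine $T$ as in the earlier sketches; the spatial tier is unconstrained, while the $\phi$-weights into the CDF layer and the second-tier output weights keep the constraints from section 3):

```python
import numpy as np

DELTA = 0.05                                     # assumed affine T, as in section 3.2 sketch
T = lambda a: a * (1 + 2 * DELTA) - DELTA
logit = lambda p: np.log(p / (1 - p))
c1 = logit(DELTA / (1 + 2 * DELTA))              # equation (9)
c2 = logit((1 + DELTA) / (1 + 2 * DELTA)) - c1   # equation (10)
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward_3mn1c(phi, x, y, p):
    """3-m-n-1-c forward pass. Constraints: p['w_phi'] > 0 elementwise,
    p['w_out'] >= 0 with sum == c2; all spatial-tier weights are free."""
    s = sigma(p["W_s"] @ np.array([x, y]) + p["b_s"])        # spatial layer (m neurons)
    h = sigma(p["w_phi"] * phi + p["W_sh"] @ s + p["b_h"])   # CDF layer (n neurons)
    return T(sigma(p["w_out"] @ h + c1))                     # constrained output neuron

# Example with m = 4 spatial and n = 3 CDF neurons (random demo weights):
rng = np.random.default_rng(0)
p = {"W_s": rng.normal(size=(4, 2)), "b_s": rng.normal(size=4),
     "W_sh": rng.normal(size=(3, 4)), "b_h": rng.normal(size=3),
     "w_phi": np.abs(rng.normal(size=3)) + 0.1,
     "w_out": np.array([0.2, 0.3, 0.5]) * c2}
print(forward_3mn1c(2.75, 0.4, 0.6, p))
```

Because the position only enters through the unconstrained spatial layer, the output is still monotonically increasing in $\phi$ with limits 0 and 1 at every location, which is the point of the construction.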


Fig. 11. Comparison of the constrained and the unconstrained model for the approximation of spatially distributed grain size distributions.

The additional input neurons $N_x$ and $N_y$ are connected to each neuron of the spatial layer. The spatial layer is then connected to the existing hidden layer, further referred to as the CDF layer. None of the new connections are constrained, in order to retain the flexibility that is essential for the representation of the spatial variation. By this means, we ensure the valid approximation of a distribution function and at the same time enable the model to capture the spatial variation. Some results are depicted in Fig. 11a, which clearly confirm the improvement of the model.

4.4. Evaluation of the model

Until now, only qualitative results of the model have been considered. It is to be expected that the approximation accuracy of the model suffers from the constraints. In order to investigate this assumption, the 3-m-n-1-c model is compared to an appropriate unconstrained model (the 3-m-n-1-u model), which is a standard feedforward model as depicted in Fig. 3. For that purpose, multiple training processes were carried out in the same manner as in section 4.1: for each model, the topology was varied (3-5-5-1, 3-10-5-1, 3-15-5-1, 3-5-10-1, 3-10-10-1, 3-15-10-1) and for each topology 10 instances were trained using different initial weights. The mean RMSE was determined at 50 iteration steps, and the minimum is plotted in Fig. 12 for each network topology.

The most accurate topologies of the constrained and the unconstrained model are compared in Fig. 13. The development of the mean RMSE during the training of both models is very similar, with the unconstrained model being slightly more accurate at the end of the training process (Fig. 13a). The accuracy measured by $R^2$ is 0.9992 for the unconstrained model and 0.9976 for the constrained one, which means that the performance of the unconstrained model is only slightly better. This is also confirmed by the scatterplots given in Fig. 13b and c as well as by the exemplary approximation results (Fig. 11). The constrained model thus produces good results as well and has the advantage that it is guaranteed to produce valid distribution functions at each location. The supplementary material of this paper given in Berthold (2017) includes the discretized approximations of the marked samples produced by the exemplary 3-15-5-1-c model.

Fig. 12. Comparison of the performance of the 3-m-n-1-c and the 3-m-n-1-u model for different network topologies (left: 3-m-5-1-topology, right: 3-m-10-1-topology).

Fig. 13. Comparison of the proposed constrained 3-m-n-1-c model (red) and the unconstrained straightforward 3-m-n-1-u model (blue). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

5. Conclusions

In this paper we proposed a new model for the valid representation of grain size distributions. The model is based on a feedforward network using a special network topology (architecture constraint) and restrictions on several weights (weight constraints). Thus we can guarantee that the model satisfies the requirements of a distribution function. The constraints are enforced during the training process by a modified backpropagation learning rule. The 1-n-1-c model from section 3 can be used for distribution fitting purposes whenever a single distribution function is of interest. As other researchers have already noted, the advantage of a neural network approach is that no knowledge of the underlying data must be available in order to choose an adequate distribution function model. The 1-n-1-c model was then extended to the 3-m-n-1-c model in section 4.3, which is able to approximate spatially distributed grain size data. Due to the structure of the network, it reproduces valid distribution functions for any location $(x, y)$. Within this paper, we concentrated on the ability of the model to produce valid results. The ability to generalize has not been investigated and has to be considered in future work.

Acknowledgements

The authors thank Jennifer Valerius and Manfred Zeiler from the Federal Maritime and Hydrographic Agency (BSH) for providing the grain size data, and Christoph Wischmeier and Kelvin Tuck from the Institute for Risk and Reliability for very fruitful discussions and for providing language help, respectively. Furthermore, the authors would like to thank three anonymous reviewers for helping to improve the quality of the paper with their comments.

References

Barndorff-Nielsen, O., 1977. Exponentially decreasing distributions for the logarithm of particle size. Proceedings of the Royal Society of London A. Mathematical and Physical Sciences 353, 401–419. https://doi.org/10.1098/rspa.1977.0041.

Berthold, T., 2014. Trainieren von Feedforward-Netzen mit Nebenbedingungen am Beispiel einer Korngrößenverteilung. In: Kreger, M., Irmler, R. (Eds.), Proceedings of 26. Forum Bauinformatik, pp. 25–34. Shaker Verlag, Darmstadt.

Berthold, T., 2017. Exemplary results of producing valid approximations of grain-size data using constrained ANN. Mendeley Data, v1. https://doi.org/10.17632/cwpjkxcwcs.2.

Chen, C.-w., Chen, D.-z., 2001. Prior-knowledge-based feedforward network simulation of true boiling point curve of crude oil. Comput. Chem. 25, 541–550. https://doi.org/10.1016/S0097-8485(00)00116-9.

Chen, Y., Ye, X., 2011. Projection onto a simplex, pp. 1–7. arXiv:1101.6081. http://arxiv.org/abs/1101.6081.

Christopher, R.A., Jiang, T., Juang, C.H., 2001. Three-dimensional site characterisation: neural network approach. Geotechnique 51, 799–809. https://doi.org/10.1680/geot.2001.51.9.799.

Daniels, H., Velikova, M., 2010. Monotone and partially monotone neural networks. IEEE Trans. Neural Network. 21, 906–917. https://doi.org/10.1109/TNN.2010.2044803.

Fredlund, M.D., Fredlund, D.G., Wilson, G.W., 2000. An equation to represent grain-size distribution. Can. Geotech. J. 37, 817–827.

Joerding, W., Li, Y., 1994. Global estimation of feedforward networks with a priori constraints. Comput. Econ. 7, 73–87. https://doi.org/10.1007/BF01299568.

Kruger, C.J.C., 2003. Constrained cubic spline interpolation for chemical engineering applications. http://www.korf.co.uk/spline.pdf.

Krumbein, W.C., 1938. Size frequency distributions of sediments and the normal phi curve. J. Sediment. Petrol. 8, 84–90.

Li, J., Heap, A.D., Potter, A., Daniell, J.J., 2011. Application of machine learning methods to spatial interpolation of environmental variables. Environ. Model. Software 26, 1647–1659. https://doi.org/10.1016/j.envsoft.2011.07.004.

McManus, J., 1988. Grain size determination and interpretation. In: Tucker, M. (Ed.), Techniques in Sedimentology, Chapter 3, pp. 63–85. Wiley-Blackwell.

Oda, K., Kitamura, S., Lee, M., 2012. Applicability of artificial neural network to spatial interpolation of soil properties in Kansai International Airport. In: Proceedings of the Twenty-second (2012) International Offshore and Polar Engineering Conference, pp. 583–586.

Oda, K., Lee, M., Kitamura, S., 2013. Spatial interpolation of consolidation properties of Holocene clays at Kobe Airport using an artificial neural network. Int. J. Geom. 4, 423–428.

Shiri, J., Keshavarzi, A., Kisi, O., Karimi, S., 2017a. Using soil easily measured parameters for estimating soil water capacity: soft computing approaches. Comput. Electron. Agric. 141, 327–339. https://doi.org/10.1016/j.compag.2017.08.012.

Shiri, J., Keshavarzi, A., Kisi, O., Karimi, S., Iturraran-Viveros, U., 2017b. Modeling soil bulk density through a complete data scanning procedure: heuristic alternatives. J. Hydrol. 549, 592–602. https://doi.org/10.1016/j.jhydrol.2017.04.035.


Tolosana-Delgado, R., 2006. Geostatistics for Constrained Variables: Positive Data, Compositions and Probabilities. Applications to Environmental Hazard Monitoring. Ph.D. thesis, Universitat de Girona. http://www.tdx.cat/handle/10803/7903.

Tolosana-Delgado, R., van den Boogaart, K.G., Pawlowsky-Glahn, V., 2011. Geostatistics for compositions. In: Pawlowsky-Glahn, V., Buccianti, A. (Eds.), Compositional Data Analysis: Theory and Applications, Chapter 6, pp. 73–86. https://doi.org/10.1002/9781119976462.ch6.

Walvoort, D., Gruijter, J.D., 2001. Compositional kriging: a spatial interpolation method for compositional data. Math. Geol. 33, 951–966. https://doi.org/10.1023/A:1012250107121.

Wang, S., 1994. A neural network method of density estimation for univariate unimodal data. Neural Comput. Appl. 2, 160–167.

Weltje, G.J., Roberson, S., 2012. Numerical methods for integrating particle-size frequency distributions. Comput. Geosci. 44, 156–167. https://doi.org/10.1016/j.cageo.2011.09.020.

Xiongfeng, F., Xianhui, Y., Yongmao, X., 1999. A new method for density estimation by using forward neural network. In: IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), vol. 2, pp. 1461–1464. https://doi.org/10.1109/IJCNN.1999.831181.
