Catena 174 (2019) 425–435
Contents lists available at ScienceDirect
Catena journal homepage: www.elsevier.com/locate/catena
Combining spatial autocorrelation with machine learning increases prediction accuracy of soil heavy metals
T
A.P. Sergeeva, A.G. Buevicha,b, , E.M. Baglaevaa, A.V. Shichkina ⁎
a b
Institute of Industrial Ecology UB RAS, Kovalevskaya str., 20, Ekaterinburg 620990, Russia Ural Federal University, Mira str., 32, Ekaterinburg 620002, Russia
ARTICLE INFO
ABSTRACT
Keywords: Topsoil Artificial neural networks Hybrid modelling Residual Kriging GRNNRK MLPRK
A hybrid approach was proposed to simulate the spatial distribution of a number of heavy metals in the surface layer of the soil. The idea of the method is to simulate a nonlinear large-scale trend using an artificial neural network (ANN) and the subsequent modelling of the residuals by geostatistical methods. A comparison was made with the basic modelling methods based on ANN: generalised regression neural network (GRNN) and multilayer perceptron (MLP). The raw data for the surface layer modelling of Cuprum (Cu), Manganese (Mn) and Niccolum (Ni) were obtained as a result of the soil screening in the subarctic city Novy Urengoy, Russia. The ANN structures were selected by the computer simulation based on the root mean square error (RMSE) minimization. The predictive accuracy of each selected approach was verified by the correlation coefficient, the coefficient of determination, RMSE, Willmott's index of agreement (d), a ratio of performance to interquartile distance (RPIQ) between the prediction and raw data from the test data set. It was confirmed that the use of the hybrid approach provides an increase in prediction accuracy in comparison with the basic ANN models. The proposed hybrid approach for each element showed the best predictive accuracy in comparison with other models based on ANN.
1. Introduction Urban soils are a part of the landscape that humans have altered. Industrial activity in recent decades has affected the types of soil background (Saet et al., 1990) and soil-forming factors, such as climate, topography, vegetation and geologic material. The soil as a component of the environment is contaminated from multiple sources and, therefore, it can be used for studying the nature and features of environmental pollution. Soil surveys are the main source of the scientific/raw data about different kinds of urban soil, wherein each survey has a unique set of interrelated properties. Elements in the soil are presented at different levels, wherein the natural differences are reflected between the soil types and consequences of soil use and pollution. These processes and factors may cause the spatial heterogeneity and sometimes anomalies of the pollution and contaminants distributions (Zhang et al., 2008; Sergeev et al., 2010; Guo et al., 2012). A significant heterogeneity of the spatial distributions of geochemical spectra has been detected in the preliminary analysis of the empirical data for the various functional and geographic areas. The analysis of the conducted scientific research has shown that the problem of contaminated soils with heavy metals in urban areas has remained unchanged for several reasons: 1) pollution depends of the ⁎
accumulation and transformation of the heavy metals in the soil of a different genesis; 2) the composition of contaminated soils depends on the technological cycle of the nearest specific production; 3) for the effective control of soil contamination with heavy metals, adaptive assessment methods taking into account the regional features of soil formation are necessary; 4) technogenic geochemical anomalies can be imposed on natural anomalies, which makes it difficult to assess the soil state. Nowadays, the well-known technique for soil contamination prediction is a machine learning method of artificial neural networks (ANNs) (Zeissler and Hertwig, 2011; Dai et al., 2014). One of the principal advantages of ANNs is their ability to detect patterns in data that demonstrate a significant unpredictable nonlinearity. Being a datadriven approach, ANNs only depend on the quality of data and the architecture of the model. ANNs can capture the spatial features of the pattern at different scales describing both the linear and nonlinear effects (Bishop, 1995; Schloeder et al., 2001; Liu et al., 2008; Worsham et al., 2010; Heung et al., 2016; Forsythe et al., 2016). Initially, the artificial neural network was created by McCulloch and Pitts (1943) similar to a biological nervous system and as a virtual “central nervous system” for computer modelling. The structure of an ANN consists of a set of interconnected units (‘neurons’) that estimate
Corresponding author at: Institute of Industrial Ecology UB RAS, Kovalevskaya str., 20, Ekaterinburg 620990, Russia E-mail addresses:
[email protected] (A.P. Sergeev),
[email protected] (A.G. Buevich),
[email protected] (E.M. Baglaeva),
[email protected] (A.V. Shichkin).
https://doi.org/10.1016/j.catena.2018.11.037 Received 14 February 2018; Received in revised form 20 November 2018; Accepted 24 November 2018 0341-8162/ © 2018 Elsevier B.V. All rights reserved.
Catena 174 (2019) 425–435
A.P. Sergeev et al.
the non-linear correlations between each input variable. The input neurons represent predicting variables. They are connected to a single or multiple layer(s) of hidden neurons, which are then linked to the output neurons representing the target variable. The construction of the ANN model is the selection of parameters: the number of hidden layers and the number of neurons within each hidden layer. During the ANN training process, the connections between the neurons are established by assigning weights based on an intrinsic learning process where the weights are iteratively adjusted to match the outputs of the training dataset (Graupe, 2007; Behrens et al., 2005). Different ANN models have been frequently used to predict continuous soil variables, such as particle size fractions (Chang and Islam, 2000; McBratney et al., 2003; Priori et al., 2014) and soil erosion rates (Licznar and Nearing, 2003); however, the use of ANN models for predicting discrete soil data still remains limited with some exceptions (Behrens et al., 2005; Silveira et al., 2013). Machine learning methodologies can help to forecast the chemical elements content in complicated non-linear contexts. The predictive accuracy obtained by ANNs is often higher than that of other methods or prediction of experts (Ziaii et al., 2012; Guo et al., 2012). The ANN model might be applied to the measured data obtained in the monitoring, and then can be used to predict the pollution at the unmonitored locations (Kanevski, 1999; Liu et al., 2009; Shaker et al., 2010; Shaker and Ehlinger, 2014). The most frequently used ANN in environmental studies is multilayer perceptron (MLP) (Rosenblatt, 1958) with the Levenberg-Marquardt training algorithm (Shepherd, 1997; Sarle, 1995). Due to its wide propagation, this type of networks is well-developed and shown high performance (Werbos, 1974). Perceptrons are widely used among the research conducted on chemical elements distribution in soil (Dai et al., 2014; Falamaki, 2013; Li et al., 2011), in particular, heavy metals, such as Cr (Sirven et al., 2006; Anagu et al., 2009; Tarasov et al., 2017). Many researchers have explored MLPs for resource estimation (Koike et al., 2002; Samanta et al., 2005) and proved the superiority of the MLP models over the Geostatistics. Generalised regression neural networks (GRNN) – as another major machine learning technique – are also widely used as interpolators and are known as universal function approximators, which can learn to approximate any continuous nonlinear function between sets of inputs and outputs (Mohanty and Majumdar, 1999). GRNN does not require the learning process to use iterative procedures as back propagation networks (MLP, etc.). It approximates a function obtaining estimates from the learning data set directly minimising the estimation error by the enlargement of the set. Training of a GRNN is performed in one pass and, therefore, it is fast. Generally, this type of networks has four layers: input, pattern, summation and output. Despite the obvious advantages of neural networks, they also have significant shortcomings. First, most ANN construction approaches are heuristic and often do not lead to unambiguous solutions. In addition, the problems that arise in the preparation of a training sample are related to the difficulty of finding a sufficient number of learning examples. Neural networks are good at recognising trends; however, they are often unable to identify small fluctuations in the trait (Kanevski et al., 2009; May et al., 2008). A combination of different techniques is capable of neutralising their weaknesses and multiplying their dignities. In particular, integrating geostatistical kriging of ANN residuals may incorporate the spatial autocorrelation of measured values, which could lead to better predictions and lower errors (Kanevski et al., 1995; Demyanov et al., 1998; Demyanov et al., 2001; Kanevski et al., 2004; Kanevski et al., 2009; Lakes et al., 2009; Seo et al., 2015; Song et al., 2017). The aim of the present research was to use hybrid models by combining the techniques of ANN and geostatistics for the spatial forecasting of the element content in topsoil. We examined the results obtained by applying the models to predict the concentration of the
chemical elements (Cu, Ni, Mn) in topsoil at a particular location of the Subarctic Novy Urengoy, Russia using previously obtained data on the chemical element content as input. We compared four models based on ANN and universal kriging. The MLP and GRNN networks were used for the approximation of element content in topsoil. To evaluate the element content and to predict its spatial distribution, we utilise two competing techniques based on the hybridisation of neural network and geostatistical approaches: generalised regression neural networks residual kriging (GRNNRK) and multi-layer perceptron residual kriging (MLPRK). We also compared the results of hybrid models with ones obtained earlier (Tarasov et al., 2017). 2. Materials and methods 2.1. Study area Topsoil samples were collected in Novy Urengoy, in the southern part of the residual area, Yamalo-Nenets Autonomous Okrug, Russia (N66.08°, E76.67°). This region is a subarctic climatic area (Köppen climate classification Dfc) (McKnight and Hess, 2000). The mean winter temperature (October–May) is below zero and the mean annual air temperature varies from −4 to −11 °C. The terrain is flat and covered urban soil with a low content of humus component. The soil has sand as a basis; the grain size is < 1 mm. The area of sampling was approximately 8.5 km2 (see Fig. 1). 2.2. Soil sampling The soil sampling was carried out at the points located at the nodes of a square grid with a step of 250 m. Their actual location was determined when sampling directly on the terrain, based on the need to sampling the soil in undisturbed, natural sites on the study area. The set of 150 samples from upper soil horizons (0.05 m depth) were collected from different parts of the city. Fig. 1 shows the detailed spatial location of sampling points. Each soil sample is a composite of seven cores taken by the stainless steel grabber (with a 0.05 m internal diameter) within a square area of 1 m2 (in vertexes, inside and centre of the marked square). These seven cores were combined and packed in double polyethylene bags. The inside bag was marked with the sample identifier. The mass of each dried sample was approximately 1 kg. 2.3. Chemical analysis Preparation of the soil specimens and subsequent chemical analysis were conducted in compliance with the standard requirements. The chemical laboratory involved with soil sample preparation and analysis passed through the Russian Federal Certification System. The laboratory meets the general requirements for the competence of testing and calibration laboratories ISO/IEC 17025:2005. The soil preparation steps included the specimens drying, sieving using a sieve with 1 mm diameter, homogenising and milling of 20 g of sub-specimens to 0.074 mm diameter grains. The total contents of Cu, Co, Ni and the early mentioned Cr in soil samples were determined by chemical analysis. During the analysis, soil specimens were digested by treatment with concentrated nitric acid and hydrofluoric acid. After mixing and heating up to 95 °C, the solution was digested with concentrated perchloric acid and, after cooling, it was treated with hydrochloric acid with low-level heating for 30 min. After cooling, the solution was diluted up to 50 ml with deionised water, mixed thoroughly and placed in a polyethylene bottle. Furthermore, the solution was analysed with inductively coupled plasma-mass spectrometry (ICPMS); the device was Perkin Elmer ELAN 9000. The detection limit for each element was 0.1 mg/kg. 426
Catena 174 (2019) 425–435
A.P. Sergeev et al.
Fig. 1. The sampling area - Southern Part, Novy Urengoy city, Yamalo-Nenets Autonomous Okrug, Russia.
2.4. Raw data variogram analysis
2.5. Raw data anisotropy analysis
The environmental raw data are the element content u in i-location with coordinate si = {xi, yi}. These measured contents are realisations of the spatial field u(s). The descriptive statistics were calculated by STATISTICA. In geostatistics, contents are estimated in unsampled locations by taking into account the spatial correlation structure of sample values based on the assumption of both mean and covariance stationarity. Kriging is the most famous method used in spatial prediction since it estimates values for any coordinate with no bias and minimum variance (Yfantis et al., 1987; Goovaerts, 1999). Kriging weights are obtained from the kriging equation using a variogram (Matheron, 1963). The unbiased estimation of a semivariogram function Eq. (1) is a half of the mean square difference between the values of the data pairs.
Experimental directional variograms were built to identify data anisotropy. In the geometric anisotropy case, the values of the experimental variogram are calculated depending on the distances between the points in the pair (lag). The lag was selected in accordance with spatial correlation. The variograms constructed in this way depend on all the pairs of points and are independent from the spatial orientation and pairs (omnidirectional variogram) (Oliver and Webster, 2014). The transformation of the space at which the anisotropy in the data is taken into account is determined by Eq. (2).
(h) =
N (h ) i=1
(u (s i)
u (si + h)) 2
2N (h)
,
s´ =
1
0 0 amin amax
cos sin
sin cos
s
(2)
where amin is the length of the minor axis of the ellipse; amax - length of the main axis of the ellipse covering the sampling area; θ is the angle defining the direction of the principal axis of the ellipse. The angle is measured in the West-East direction counter-clockwise. A tool that gives an idea of the behaviour of the spatial structure is the variogram surface. To construct a variogram surface, the vector h is represented as the projection of the lag Δx and Δy on the coordinate axes. The variogram surface is the surface of values calculated on a regular grid in the lag space according to the semivariogram formula (1). The variogram surface allows us to see the anisotropy and determine the priority directions for constructing variograms. The variogram surface has point symmetry with respect to the point (0, 0). In this case, there is a dependence on the spatial orientation of point's pairs on the variogram surface. This may indicate the presence of anisotropy in the data.
(1)
where γ(h) is the estimation value of a semivariogram at the distance interval h; and N(h) is the number of samples pairs at the distance interval h; u(si) and u(si + h) are the values for two points si and si + h separated by the distance h = |h|. Considering only interpolators, which are built on the basis of weighted averages, the kriging is the best interpolator with an unbiased estimate (Kanevski, 1999; Kanevski et al., 2009). In this work, the exponential model was used to construct the experimental semivariogram (Oliver and Webster, 2014). Also for comparison with the described approach, Universal Kriging (UK) was chosen. This is mathematically equivalent to interpolation methods, which is called “Kriging with External Drift” (KED), wherein auxiliary predictors are used directly to produce kriging weights and “Regression Kriging” (RK) which integrates regression and kriging with varying local means which are mainly explained by auxiliary data (Hengl et al., 2004, 2007). In the present study, we only use the spatial coordinates. However, since we do not have information about other auxiliary variables (only the spatial coordinates), UK was used for comparison as the most suitable geostatistical method.
2.6. Data preparation for modelling All the samples were randomly split into training and test data sets. Raw data (150 samples) were divided into two groups randomly for each element. The first one was a training set of 105 samples. The second group was a test set of 45 samples. The training set was applied 427
Catena 174 (2019) 425–435
A.P. Sergeev et al.
for building kriging, training the networks and for interpolating the surface element distribution. The test set was applied only as an independent data set for models validation.
simulation, the SPREAD parameter varied from 0.01 to 0.3 with step 0.01; in total, 300 simulations were performed. The dependence of the SPREAD parameter and the RMSE in the transformed space on the direction was constructed. Minimum values SPREAD and RMSE determinate the optimal direction. The final network used the parameter SPREAD with the lowest error.
2.7. Creation of MLP The structure of the feed-forward MLP with the LevenbergMarquardt training method like in the paper (Sergeev et al., 2013) was determined during computer simulation. The input layer of MLP was the spatial coordinates of sampling points of the training set; the hidden layer consisting of several neurons, and the output layer represented the element content. The selection of the number of neurons in the hidden layer was carried out by the lower total RMSE of prediction of the element content. The number of neurons varied from 2 to 25. Each network was trained 500 times and the best of them was selected. Network learning quality was checked by the correlation coefficient and RMSE between the results of the network predictions and the training data set.
2.9. ANN-based hybrid algorithms Both hybrid methods applied were three-step algorithms combining two different interpolation techniques in one ensemble. The first step implies estimating large-scale nonlinear trends using neural networks (MLP and GRNN). The second step is the analysis of the stationary residuals by ordinary kriging (exponential model), which is able to provide local estimates. Trained ANNs (MLP or GRNN) predicted content in the training data set points. Then the residuals at the same points were computed. Residuals in the neural network were defined as follows:
2.8. Creation of GRNN
r (si) = uANN (si)
u (si),
(3)
where r(si) are the residuals of the data set si, u(si) are the measured values, uANN(si) are the values estimated by the neural network. The resulting residuals are estimated using kriging. Evaluation in ordinary kriging (OK) is constructed as a linear combination of input data:
GRNN based on non-linear regression theory comprised four layers: input layer, pattern layer, summation layer and output layer. The training data set formed 105 input neurons according to 105 sampling points of the training data set. Every training sample represented a mean to a radial basis neuron. There are two important parameters to construct basic GRNN. The first one is a number of input neurons. The second parameter is the smoothing parameter (SPREAD). The neurons of the pattern layer perform a nonlinear transformation of the input vectors. Choosing the SPREAD parameter determines the width of the input area, to which each basis function responds. During the
rOK (s) =
i r (si),
(4)
where rOK(s) is the estimated value at the point x using OK, λi(s) are the optimal weights with the condition Σλi = 1, and r(si) is the residual of the neural network for the point si. The OK was applied in order to predict the research fields residuals. These residuals were input data for
Fig. 2. The hybrid approach framework.
428
Catena 174 (2019) 425–435
A.P. Sergeev et al.
Table 1 Descriptive statistics of the modelled elements. Element
Mn (training 105) Mn (test 45) Mn (all 150) Ni (training 105) Ni (test 45) Ni (all 150) Cu (training 105) Cu (test 45) Cu (all 150) a b
Content, mg/kg
CV
Min
Max
Mean
SD
46.6 40.2 40.2 4.61 6.87 4.60 8.45 8.00 8.00
248 206 248 46.4 70.3 70.3 27.9 26.3 27.9
115 124 118 17.4 16.6 17.2 15.9 15.7 15.5
33.8 41.1 36.2 8.90 11.6 9.7 3.77 4.00 3.80
Skewness
Kurtosis
0.57 0.39 0.50 1.05 3.07 2.0 0.31 0.58 0.40
1.44 0.62 0.60 0.59 11.3 6.2 0.04 0.35 0.08
Word Clarkea, mg/kg
Average content (podzols)b, mg/kg
850
270
40
27
20
12
Median 112 119 112 14.7 13.7 14.5 15.7 15.1 15.5
0.29 0.33 0.31 0.51 0.69 0.57 0.24 0.25 0.24
World Clarke - average content in soils of the world (Vojtkevich et al., 1977). Element content is in podzols (Kabata-Pendias, 2011).
Fig. 3. Variograms in six directions for element content (left); variogram surface for element content (right). 429
Catena 174 (2019) 425–435
A.P. Sergeev et al.
ordinary kriging (OK) and the same OK simulated residuals in the test data set points. The same learned ANNs calculated element content values in the test data set points. The final step was estimation produced as a sum of ANN predictions and ordinary kriging estimates of the residuals. The element content ug(si) was obtained as the sum of ANN evaluation and residuals evaluation by OK:
3. Results 3.1. Descriptive statistics The descriptive statistics of investigated elements are shown in Table 1. From the basic statistics table, it is observed that the element attributes are erratic and positively skewed. The mean contents of the total elements are under the contents in the Ural Region (Ural Clarke) and in the world soils (World Clarke). The total contents in sandy soil are known to fall into the range of 2 to 123 mg/kg for Cu, from 90 to 1790 mg/kg for Mn and from 1 to 34 mg/kg for Ni in Canada (Frank et al., 1976). The podzols contain Mn from 50 to 2000 mg/kg, Cu from 7 to 70 mg/kg and Ni from 11 to 25 mg/kg (Kabata-Pendias, 2011). Unlike anomaly distributed Cr (Tarasov et al., 2017) (mean value was 245 mg/kg, maximum value was 1265 mg/kg) the specimens with anomaly high Ni content (mean value was 17.2 mg/kg, maximum value was 70.3 mg/kg) did not form spots at the study site. As can be seen from the table, the main statistical parameters for the samples are differ insignificantly. This gives grounds to consider the received samples as representative.
(5)
ug (si) = uANN (si) + rOK (si),
In the work, the ANNs were carried out in MATLAB; to predict the values by kriging the ArcGIS application was used. The hybrid approach framework is shown in Fig. 2. 2.10. Evaluation of the accuracy of interpolation methods The performance of prediction models is based on the model error statistics. The predictive accuracy of each selected approach was verified by the Spearman's rank correlation coefficient, the coefficient of determination, RMSE Eq. (6), d (Willmott, 1981) Eq. (7), RPIQ (BellonMaurel et al., 2010) Eq. (8) between the prediction and raw data from the test data set.
RMSE = d=1
n i=1
(uANN (si) n
u (si ))2
,
3.2. Spatial prediction of element content
(6)
|uANN (si) u (si )| (|uANN (si) u (s)| + |u (si )
u (s)|)
Table 1 shows the positively skewed and leptokurtic distribution of the element contents for the training sites. K-S test indicates that Ni content closes to abnormal distribution (p < 0.01) and content distributions of Mn and Cu do not differ from normal (p > 0.2). Unlike Cr (Tarasov et al., 2017), the samples with high Ni contents do not form spots. The Cr content at the training sites shows a bimodal distribution. The log-transformed data of Cr content minimises the effects of extreme emissions in UK. The significant correlations in pairs Ni–Cr and Mn–Cu (0.77 and 0.69 at p > 0.05, respectively) indicate that these metals have a common origin in the urban soils. Fig. 3a shows the spatial structure depending on the direction (0°,
(7)
where uANN(si) is a predicted content (ANNs, kriging), u(si) is a measured content, n is a number of points, and u (s) is measured mean.
RPIQ =
IQ , RMSE
(8)
where IQ is interquartile distance (IQ = Q3 − Q1). Q1 is the value below which we can find 25% of the samples; Q3 is the value below which we find 75% of the samples.
Fig. 4. RMSE for the element content vs number of neurons in the hidden layer.
430
Catena 174 (2019) 425–435
A.P. Sergeev et al.
Fig. 5. Dependence of the correlation coefficient (left) and RMSE (right) on the SPREAD parameter for various directions (from 0° to 180° in 5° increments).
30°, 60°, 90°, 120° and 150°). For all the elements, the anisotropy of the raw data in the direction of the North-West–South-East is visible on the variogram surface (Fig. 3b). The final configurations of the MLP networks selected were 2-8-1 for Mn and Cu, 2-9-1 for Ni. Thus, the hidden layer contains 8, 8 and 9 neurons for Mn, Ni and Cu, respectively (Fig. 4). The apparent dependence of the correlation coefficient and the mean square error on the parameter SPREAD for various directions also shows the presence and direction of the anisotropy in the data. The diagram is similar to the variogram surface. It also has point symmetry with respect to the centre of the diagram (Fig. 5). Then, we determined the optimal directions for each of the elements for which SPREAD and RMSE have a minimum value. RMSE pointed a minimum values 3.78 mg/kg at 55° for Cu, 36.5 mg/kg at 85° for Mn and 10.9 mg/kg at 110° for Ni. The SPREAD parameter with a minimum error was 0.001 at
55° for Cu, 0.003 at 85° for Mn and 3.9 at 105° for Ni. These were utilised in the final network. 3.3. Assessment of the accuracy of interpolation methods Table 2 shows the statistical parameters used to assess the accuracy of the different methods (the best values are in bold). Comparison of the methods had shown the superiority of MLPRK and GRNNRK in modelling accuracy for the test set. As Table 2 reveals GRNNRK approach had smaller RMSE than GRNN model (for Ni 1.9% and for Cu 2.8% improvement) and kriging (for Mn 3.8%, for Ni 2.8% and for Cu 8.3% improvement). The neural networks also were better than kriging: GRNN and MLP had smaller RMSE for Mn, Ni, Cu (4.7%/11.1%/11.4% and 9.5%/1%/5.4%, respectively). The Index of Agreement (d), (a standardised measure of the degree 431
Catena 174 (2019) 425–435
A.P. Sergeev et al.
Table 2 Accuracy assessment indices of the element content for the test set. Method
Spearman correlation coefficient
Coefficient of determination
d
RPIQ
RMSE, mg/kg
Cu UK MLP MLPRK GRNN (anisotropy) GRNNRK (anisotropy)
0.19 0.46 0.53 0.36 0.47
0.03 0.20 0.26 0.12 0.16
0.23 0.35 0.40 0.24 0.26
0.37 0.51 0.61 0.27 0.36
3.9 3.5 3.4 3.7 3.6
Ni UK MLP MLPRK GRNN (anisotropy) GRNNRK (anisotropy)
0.31 0.50 0.54 0.33 0.39
0.08 0.25 0.28 0.10 0.12
0.33 0.40 0.41 0.33 0.33
0.60 0.98 1.01 0.58 0.63
11.0 9.9 9.7 10.9 10.7
Mn UK MLP MLPRK GRNN (anisotropy) GRNNRK (anisotropy)
0.41 0.53 0.55 0.46 0.49
0.13 0.27 0.29 0.20 0.19
0.34 0.49 0.50 0.43 0.45
0.41 0.81 0.75 0.62 0.61
38.0 34.7 34.3 36.3 36.6
It was expected that modelling the content of Mn and Cu without spatial distortions (without spots, no trends) would require less to predict (e.g. fewer neurons in the hidden layer and/or fewer prediction errors) compared to the anomalously distributed Cr (Tarasov et al., 2017) and Ni. However, to forecast the content of Mn and Cu, more neurons in the hidden layer were required than for Cr (8 and 9, respectively). Probably, the lack of spatial features could lead to the excessive retraining of ANN. The residuals that remained after applying the ANN for the spatial data modelling were evaluated by ordinary kriging. The final estimate was obtained as the sum of the ANN estimate and kriging of the residuals. The advantage of using ANN for modelling the trend in front of other trend models (polynomials, splines, etc.) is that ANN is a universal estimator and models non-linear structures rather well. ANN is able to obtain analytical dependence based on the available data in the learning process. This can be called gradual boosting when another model is applied to the remainders from the previous model. The meaning of this procedure is the decomposition of the spatially placed information, each component of which is modelled by its own model. Decomposition allows a researcher to build a set of relatively simple models and thereby their training can be carried out on a small amount of training sample. In other words, in the presence of a sufficiently large training sample and the availability of computational resources, we can train a rather complex ANN can model all the features of the spatial distribution rather well. However, as a rule, the researcher has at his disposal a very poor set of full-scale data that, in addition, requires a lot of time and material costs. Therefore, the proposed gradual two steps approach (ANN + Residual Kriging) seems promising. Theoretically, adding one more step can also improve the forecast, but the amount by which the accuracy of the forecast will improve will not justify the computational and other costs of this improvement. In the case of a separate application of the ANN, an analysis of the residuals is also important. It helps to interpret the results and evaluate their quality. If the residuals do not have spatial correlation (but are randomly distributed), this may mean that the ANN has completely modelled the data structure. In this case, the ANN evaluation can be taken as a result. If the residuals have the spatial structure, and correlate with the initial data, further modelling of the residuals is necessary. If the residuals have the spatial correlation at shorter distances than the data, the ANN has already simulated the correlation on a larger scale. Such residuals property often leads to the assumption that they are stationary throughout the entire area of research, which cannot be expected for the input data with the trend. Thus, the short radius and stabilised plateau characterise the variogram model for such residuals.
of model prediction error and varies between 0 and 1, where a value of 1 indicates a perfect match, and 0 indicates no agreement at all (Willmott, 1981).) was also better for the MLPRK model for all elements. The advantages of the hybrid models over the models based on ANN were significant only for Cu (8%–12%), for Ni and Mn it is < 3%. Kriging showed the least accurate results, except for the models for Ni where the index was the same for UK, GRNN and GRNNRK. MLP-based models for all elements proved to be better than kriging by > 8%. The RPIQ index, which used interquartile distance IQ (range 50% around the median) as the numerator, was greatest for the MLPRK model for Cu and Ni, and for MLP model for Mn. The hybrid approach was more accurate than basic ANN for Cu (> 18%) and Ni (> 3%). For Mn, the basic models based on ANN were more accurate (from 2% to 8%). Considering all the indicators, the observed methods can be written in descending order of accuracy as follows: MLPRK, MLP, GRNNRK, GRNN, UK. The study of the residuals confirms the importance of variography for the analysis and modelling of spatial data using neural network learning algorithms (MLP and GRNN). Well-trained neural networks model all the structured information. Thus, the residuals should not be spatially correlated. Directed variograms for the residuals demonstrate a pure nugget-effect that is the absence of a spatial correlation (Fig. 6). It was confirmed that the use of the hybrid approach, which applied residual kriging gives an increase in prediction accuracy (by RMSE indicator), compared to the base ANN models, which corresponds to the results (Dai et al., 2014; Tarasov et al., 2017). Fig. 7 shows forecasting maps taken by different methods for Cu content in topsoil. As a rule, strongly spatially dependent properties are determined by internal factors (as soil material and soil texture) in soil characteristics, while external variations (as anthropogenic impact, climate, etc.) control the variability of these weakly spatially dependent parameters (Cambardella et al., 1994). 4. Discussion We compared the approaches to modelling of the spatial distribution of the chemical elements (Mn, Ni, Cu) content in the surface layer of soil (geostatistical techniques (UK), ANNs, and hybrid models) using MLP, GRNN and residual kriging. The spatial distributions of Mn and Cu contents are normal. Ni content has an abnormal distribution. Unlike bimodal distributed, Cr content, which forms anomalies (Sergeev et al., 2013; Sergeev et al., 2015), Ni content do not form spatial spots with anomaly high contents. 432
Catena 174 (2019) 425–435
A.P. Sergeev et al.
Fig. 6. Semivariograms for residuals in directions for MLPRK (left) and GRNNRK (right).
The use of such model in kriging gives correct and accurate results. Residual kriging, like UK, suggests that an unknown mean varies throughout the study area so that the local average cannot be assumed as the constant. In this case, the trend component is modelled separately, using other mathematical or physical approaches. A component of a trend can be modelled using machine learning techniques by using a set of measurements as information to customise its parameters. After selecting the trend, simple or ordinary kriging is used on the models residuals to the measured values of the geochemical field. Residual kriging can also be interpreted as a hybrid model combining different methods on a mathematical or physical basis (Kanevski et al., 1995; Demyanov et al., 1998; Demyanov et al., 2001; Kanevski et al., 2004).
spatial forecasting of element content in topsoil. This study continues to use the ANN models to evaluate the distribution of heavy metals content in the topsoil. Previously, the advantages of these approaches were shown for abnormally distributed Cr (Tarasov et al., 2017) in Subarctic Novy Urengoy, Yamalo-Nenets Autonomous Okrug, Russia. The efficiency of these methods was verified through their comparative evaluation with the application of different models (kriging, MLP, GRNN) for Cu, Ni and Mn distributions predictions at the same particular location. The correlation coefficient, the coefficient of determination, Willmott's index of agreement, RPIQ, and the RMSE were applied as the predictive accuracy indicators of the models for the test data set. Unlike Cr, the content of Cu, Ni and Mn have no pronounced spatial features. The models were built up using the spatial coordinates as the input parameters and the element contents as the output parameters. The residuals of the ANN (MLP and GRNN) models application are then analysed for an estimation of the small-scale variability of the data. The ordinary kriging was performed on the residuals and the outputs were
5. Conclusion In the present work, a modelling approach combining the techniques based on ANN (GRNNRK and MLPRK) was presented for the 433
Catena 174 (2019) 425–435
A.P. Sergeev et al.
GRNN
MLP
UK
a MLPRK
b
c
GRNNRK
d
real
e
f
Fig. 7. Forecasting maps taken by different methods for Cu content test set: (a) UK, (b) MLP, (c) GRNN, (d) MLPRK, (e) GRNNRK, (f) real.
combined with the ANN models to produce the MLPRK and GRNNRK models predictions. As ANN, the most popular multilayer perceptron (MLP) can be used, as well as more complex neural networks (generalised regression, radial basis, etc.). The key point is the analysis and modelling of the correlation structure of the residuals that remain after deduction from the ANN estimation data. This study sought to compare the capabilities of hybrid ANN-kriging methods for modelling the spatial distribution of the heavy metal contents and to demonstrate a more efficient ANN model for forecasting the spatial distribution of the chemical element contents. The research and explanation of the real picture of the spatial distribution of the heavy metals in the surface layer of the soil is an important and interesting task. In our work, we set out to compare several approaches to modelling the spatial distribution of the chemical elements in the surface layer of the soil. On the area of the survey, a certain pattern of the distribution of the elements in the soil was formed. The factors that influenced this picture (geology, climate, global and local mass transfer, anthropogenic impact over the past almost 50 years) are very diverse. Deterministic methods, which are take into account the influence of all these factors, is impossible in principle. If we had a pattern of the distribution that existed before the development of the subarctic territories, we would be able to identify the patterns associated with anthropogenic influence, which could be the real goal of such studies. In this case, (a single screening) it is not possible. Nevertheless, the use of a hybrid approach based on ANNgeostatistics allows, with reasonable accuracy, to reproduce the current pattern of distribution and to identify the sources that affect its changes.
Acknowledgements The authors want to thank the anonymous reviewers for their constructive comments that contributed to improving the paper. References Anagu, I., Ingwersen, J., Utermann, J., Streck, T., 2009. Estimation of heavy metal sorption in German soils using artificial neural networks. Geoderma 152, 104–112. Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.-D., Goldschmitt, M., 2005. Digital soil mapping using artificial neural networks. J. Plant Nutr. Soil Sci. 168, 21–33. Bellon-Maurel, V., Fernandez-Ahumada, E., Palagos, B., Roger, J.-M., McBratney, A., 2010. Critical review of chemometric indicators commonly used for assessing the quality of the prediction of soil attributes by NIR spectroscopy. Trends Anal. Chem. 29 (9), 1073–1081. Bishop, C., 1995. Neural Networks for Pattern Recognition. Clarendon, Oxford (504pp). Cambardella, C., Moorman, T., Novak, J., Parkin, T., Karlen, D., Turco, R., Konopka, A., 1994. Field-scale variability of soil properties in central Iowa soils. Soil Sci. Soc. Am. J. 58, 1501–1510. Chang, D.-H., Islam, S., 2000. Estimation of soil physical properties using remote sensing and artificial neural networks. Remote Sens. Environ. 74, 534–544. Dai, F., Zhoua, O., Lva, Z., Wang, X., Liu, G., 2014. Spatial prediction of soil organic matter content integrating artificial neural network and ordinary kriging in Tibetan Plateau. Ecol. Indic. 45, 184–194. Demyanov, V., Kanevsky, M., Chernov, S., Savelieva, E., Timonin, V., 1998. Neural Network Residual Kriging application for climatic data. J. Geogr. Inf. Decis. Anal. 2, 215–232. Demyanov, V., Soltani, S., Kanevski, M., Canu, S., Maignan, M., Savelieva, S., Timonon, V., Pisarenko, V., 2001. Waywlet analysis residual kriging vs. neural network residual kriging. In: Stochastic Environmental Research and Risk Assessment. 15. pp. 18–32. Falamaki, A., 2013. Artificial neural network application for predicting soil distribution coefficient of nickel. J. Environ. Radioact. 115, 6–12. Forsythe, K.W., Marvin, C.H., Valancius, C.J., Watt, J.P., Aversa, J.M., Swales, S.J.,
434
Catena 174 (2019) 425–435
A.P. Sergeev et al.
Moscow, Russia, pp. 84–108 (in Russian). Samanta, B., Ganguli, R., Bandopadhyay, S., 2005. Trans. Inst. Min. Metall. 114, 129–139. Sarle, W.S., 1995. Stopped training and other remedies for overfitting. In: 27th Symposium on the Interface of Computing Science and Statistics, pp. 352–360. Schloeder, C.A., Zimmerman, N.E., Jacobs, M.J., 2001. Comparison of methods for interpolating soil properties using limited data. Soil Sci. Soc. Am. J. 65, 470–479. Seo, Y., Kim, S., Singh, V.P., 2015. Estimating spatial precipitation using regression Kriging and Artificial Neural Network Residual Kriging (RKNNRK) hybrid approach. Water Resour. Manag. 28, 2189–2204. Sergeev, A.P., Baglaeva, E.M., Shichkin, A.V., 2010. Case of soil surface chromium anomaly of a northern urban territory – preliminary results. Atmos. Pollut. Res. 1, 44–49. Sergeev, A.P., Baglaeva, E.M., Medvedev, A.N., 2013. The analysis of the spatial inhomogeneity of the distribution of chromium and nickel by the results of an environmental screening of a surface soil layer at residential areas of municipal formation of Novy Urengoy. Geoecol. Eng. Geol. Hydrogeol. Geocryol.(Issue 3) (232-242 pp. (in Russian)). Sergeev, A.P., Buevich, A.G., Medvedev, A.N., Subbotina, I.E., Sergeeva, M.V., 2015. Artificial neural network and kriging interpolation for the chemical elements contents in the surface layer of soil on a background area. 15th International Multidisciplinary Scientific GeoConference SGEM. Conference Proceedings Book 3 Vol. 2. pp. 49–56. Shaker, R.R., Ehlinger, T.J., 2014. Exploring non-linear relationships between landscape and aquatic ecological condition in southern Wisconsin: a GWR and ANN approach. Int. J. Appl. Geospatial Res. 5 (4), 1–20. Shaker, R., Tofan, L., Bucur, M., Costache, S., Sava, D., Ehlinger, T., 2010. Land cover and landscape as predictors of groundwater contamination: a neural-network modelling approach applied to Dobrogea, Romania. J. Environ. Prot. Ecol. 11 (1), 337–348. Shepherd, A.J., 1997. Second-Order Methods for Neural Networks: Fast and Reliable Training Methods for Multi-Layer Perceptrons. Springer-Verlag (145pp). Silveira, C.T., Oka-Fiori, C., Santos, L.J.C., Sirtoli, A.E., Silva, C.R., Botelho, M.F., 2013. Soil prediction using artificial neural networks and topographic attributes. Geoderma 195–196, 165–172. Sirven, J.-B., Bousquet, B., Canioni, L., Sarger, L., Tellier, S., Potin-Gautier, M., Le Hecho, I., 2006. Qualitative and quantitative investigation of chromium-polluted soils by laser-induced breakdown spectroscopy combined with neural networks analysis. Anal. Bioanal. Chem. 385, 256–262. Song, Y.-Q., Yang, L.-A., Li, B., Hu, Y.-M., Wang, A.-L., Zhou, W., Cui, X.-S., Liu, Y.-L., 2017. Spatial prediction of soil organic matter using a hybrid geostatistical model of an extreme learning machine and ordinary kriging. Sustainability 9, 754. https://doi. org/10.3390/su9050754. Tarasov, D.A., Buevich, A.G., Sergeev, A.P., Shichkin, A.V., 2017. High variation topsoil pollution forecasting in the Russian subarctic: using artificial neural networks combined with residual kriging. Appl. Geochem. https://doi.org/10.1016/j.apgeochem. 2017.07.0070. (In Press). Vojtkevich, V., Miroshnikov, G., Boil, A., Prohorov, V., 1977. The Short Manual on Geochemistry (in Russian). Nedra Publishing, Moscow, Russia (180pp). Werbos, P.J., 1974. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences (PhD thesis). Harvard University. Willmott, C.J., 1981. On the validation of models. Phys. Geogr. 2, 184–194. Worsham, L., Markewitz, D., Nibbelink, N., 2010. Incorporating spatial dependence into estimates of soil carbon contents under different land covers. Soil Sci. Am. J. 74, 635–646. Yfantis, E.A., Flatman, G.T., Behar, J.V., 1987. Efficiency of kriging estimation for square, triangular, and hexagonal grids. Math. Geol. 19, 183–205. Zeissler, K.-O., Hertwig, T., 2011. Artificial Neural Network Instead of Kriging? A Case Study With Soil Contamination of Complex Sources. Landwirtschaft und Geologie, Dresden Access 10.03.2016. http://www.beak.de/beak/sites/default/files/content/ 7_News/111_10_Oct_2011/Pribram2011_3.pdf. Zhang, C., Fay, D., McGrath, D., Grennan, E., Carton, O.T., 2008. Use of trans-Gaussian kriging for national soil geochemical mapping in Ireland. Geochem. Explor. Environ. Anal. 8, 255–265. Ziaii, M., Ardejani, F.D., Ziaei, M., Soleymani, A.A., 2012. Neuro-fuzzy modeling based genetic algorithms for identification of geochemical anomalies in mining geochemistry. Appl. Geochem. 27 (3), 663–676.
Jakubek, D.J., Shaker, R.R., 2016. Geovisualization of mercury contamination in Lake St. Clair sediments. J. Mar. Sci. Eng. 4 (1), 19. Frank, R., Ishida, K., Suda, P., 1976. Metals in agricultural soils of Ontario. Can. J. Soil Sci. 56, 181–196. Goovaerts, P., 1999. Geostatistics in soil science: state of the art and perspectives. Geoderma 89, 1–45. Graupe, D., 2007. Principles of artificial neural networks. In: Advanced Series on Circuits and Systems, 2nd edition. vol. 6 World Scientific Publishing Co, Singapore. Guo, G.H., Wu, F., Xie, F., Zhang, R., 2012. Spatial distribution and pollution assessment of heavy metals in urban soils from southwest China. J. Environ. Sci. 24 (3), 410–418. Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression kriging. Geoderma 120, 75–93. Hengl, T., Heuvelink, G.B.M., Rossite, D.G., 2007. About regression-kriging: from equations to case studies. Comput. Geosci. 33, 1301–1315. Heung, B., Ho, H.C., Zhang, J., Knudby, A., Bulmer, C.E., Schmidt, M.G., 2016. An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping. Geoderma 265, 62–77. Kabata-Pendias, A., 2011. Trace Elements in Soils and Plants. Taylor and Francis Group CRC Press, pp. 201–260. Kanevski, M.F., 1999. Spatial predictions of soil contamination using general regression neural networks. Int. J. Syst. Res. Inf. Syst. 8 (4), 241–256. Kanevski, M., Arutyunyan, R., Bolshov, L., Demyanov, V., Maignan, M., 1995. Artificial neural networks and spatial estimations of Chernobyl fallout. Geoinformatics 7 (1–2), 5–11. Kanevski, M., Parkin, R., Pozdnоukhov, A., Timonin, V., Maignan, M., Demyanov, V., Canu, S., 2004. Environmental data mining and modeling based on machine learning algorithms and geostatistics. Environ. Model. Softw. 19, 845–855. Kanevski, M., Pozdnoukhov, A., Timonin, V., 2009. Machine Learning for Spatial Environmental Data: Theory, Applications and Software. 2009. EPFL Press, pp. 377. Koike, K., Matsuda, S., Suzuki, T., Ohmi, M., 2002. Nat. Resour. Res. 11 (2), 135–156. Lakes, T., Müller, D., Krüger, C., 2009. Cropland change in southern Romania: a comparison of logistic regressions and artificial neural networks. Landsc. Ecol. 24 (9), 1195–1206. Li, Y., Li, C., Tao, J.-J., Wang, L.-D., 2011. Study on spatial distribution of soil heavy metals in Huizhou City based on BP-ANN modeling and GIS. Procedia Environ. Sci. 10, 1953–1960. Licznar, P., Nearing, M.A., 2003. Artificial neural networks of soil erosion and runoff prediction at the plot scale. Catena 51, 89–114. Liu, Z.H., Chang, Y., Chen, H.W., 2008. Estimation of forest volume in Huzhong forest area based on RS, GIS and ANN (in Chinese). Chin. J. Appl. Ecol. 19, 1891–1896. Liu, F., He, X., Zhou, L., 2009. Application of generalized regression neural network residual kriging for terrain surface interpolation. In: Proc. SPIE 7492, International Symposium on Spatial Analysis, Spatial-Temporal Data Modeling, and Data Mining, (74925F). Matheron, G., 1963. Principles of geostatistics. Econ. Geol. 58, 1246–1266. May, R.J., Maier, H.R., Dandy, G.D., Fernando, T.M., 2008. Nonlinear variable selection for artificial neural networks using particle mutual information. Environ. Model. Softw. 23 (10–11), 1312–1326. McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. McCulloch, W.S., Pitts, W.H., 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133. McKnight, T.L., Hess, D., 2000. Climate Zones and Types: The Köppen System. Physical Geography: A Landscape Appreciation. Prentice Hall, Upper Saddle River, NJ, 0-13020263-0pp. 200–201. Mohanty, K., Majumdar, T.J., 1999. Using artificial neural networks for synthetic surface fitting and the classification of remotely sensed data. Int. J. Appl. Earth Obs. Geoinf. 1 (1), 78–84. Oliver, M.A., Webster, R., 2014. A tutorial guide to geostatistics: computing and modelling variograms and kriging. Catena 113, 56–69. Priori, S., Bianconi, N., Constantini, E.A.C., 2014. Can γ-radiometrics predict soil textural data and stoniness in different parent materials? A comparison of two machine learning methods. Geoderma 226–227, 354–364. Rosenblatt, F., 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65 (6), 386–408. Saet, J.E., Revich, B.A., Yanin, E.P., 1990. Environment Geochemistry. Nedra Publishing,
435