GEODER-12361; No of Pages 10 Geoderma xxx (2016) xxx–xxx
Geoderma journal homepage: www.elsevier.com/locate/geoderma
Mapping potential acid sulfate soils in Denmark using legacy data and LiDAR-based derivatives

A. Beucher a,⁎, K. Adhikari b, H. Breuning-Madsen c, M.B. Greve a, P. Österholm d, S. Fröjdö d, N.H. Jensen e, M.H. Greve a

a Aarhus University, Department of Agroecology, 8830 Tjele, Denmark
b University of Wisconsin-Madison, Department of Soil Science, FD Hole Soils Lab, 53706 Madison, USA
c University of Copenhagen, Department of Geosciences and Natural Resource Management, 1350 Copenhagen, Denmark
d Åbo Akademi University, Department of Geology and Mineralogy, 20500 Åbo, Finland
e Roskilde University, Department of Science and Environment, 4000 Roskilde, Denmark
Article info
Article history: Received 5 February 2016; received in revised form 1 April 2016; accepted 1 June 2016; available online xxxx
Keywords: Acid sulfate soils; Digital soil mapping; Artificial neural networks; LiDAR-based derivatives

Abstract
Leaching large amounts of acidity and metals into recipient watercourses and estuaries, acid sulfate (a.s.) soils constitute a substantial environmental issue worldwide. Mapping these soils enables measures to be taken to prevent pollution in high-risk areas. In Denmark, legislation prohibits drainage of areas classified as potential a.s. soils without prior permission from environmental authorities. These soils were first mapped in the 1980s: wetlands, in which Danish potential a.s. soils mostly occur, were targeted and the soils were surveyed through conventional mapping. In this study, a probability map of potential a.s. soil occurrence was constructed for the wetlands of Jutland, Denmark (c. 6500 km2), using the digital soil mapping (DSM) approach. Among the variety of available DSM techniques, artificial neural networks (ANNs) were selected. More than 8000 existing soil observations and 16 environmental variables, including geology, landscape type, land use and terrain parameters, were available as input data for the modeling. Prediction models based on various network topologies were assessed for different selections of soil observations and combinations of environmental variables. The overall prediction accuracy, based on a 30% hold-back validation dataset, reached 70%. Furthermore, the conventional map indicated 32% of the study area (c. 2100 km2) as having a high frequency of potential a.s. soils, whereas the digital map displayed about 46% (c. 3000 km2) as high-probability areas for potential a.s. soil occurrence. ANNs thus demonstrated promising predictive classification abilities for mapping potential a.s. soils over a large extent.
© 2016 Elsevier B.V. All rights reserved.
⁎ Corresponding author at: Aarhus University, Department of Agroecology, 8830 Tjele, Denmark. E-mail addresses: peter.osterholm@abo.fi (P. Österholm), soren.frojdo@abo.fi (S. Fröjdö).

1. Introduction
Acid sulfate (a.s.) soils constitute a major environmental issue, presumably affecting more than 200,000 km² of coastal areas worldwide (Andriesse and van Meensvoort, 2006). In many cases, these soils occur in heavily populated areas with a consequently high demand for clean water. They also cause severe ecological damage (e.g. killing fish and other aquatic organisms), as well as degradation of underground concrete and steel structures to the point of failure. In northern Europe, a.s. soils are mostly located along the coasts of the Baltic Sea and North Sea; they have been studied in Finland (Yli-Halla, 1997; Österholm and Åström, 2004; Roos and Åström, 2005; Toivonen et al., 2013), Sweden (Sohlenius and Öborn, 2004), Denmark (Madsen et al., 1985; Madsen and Jensen, 1988) and Poland (Urbańska et al., 2012). In the western part of Denmark (i.e. Jutland; Fig. 1), drainage of wetlands, mainly for farming, can lead to the formation of a.s. soils: iron sulfides (mostly pyrite; Madsen et al., 1985) oxidize and sulfuric acid is produced, causing metals to leach and the soil pH to drop below 3.0. Subsequently, a.s. soils release a toxic combination of acidity and metals (mainly iron, but also, to a lesser degree, aluminium, arsenic, cadmium, cobalt, nickel, zinc and rare earth elements) to recipient waters such as streams and estuaries (Nystrand and Österholm, 2013). In particular, the large amounts of leached iron can result in heavy ochre pollution in watercourses (Madsen et al., 1985), easily recognizable by its yellow/orange color. Notably, even small hotspots of a.s. soils may impact large water bodies. Therefore, mapping these soils constitutes a critical step in planning and carrying out effective mitigation. In Denmark, legislation prohibits drainage of areas classified as potential a.s. soils without prior permission from
http://dx.doi.org/10.1016/j.geoderma.2016.06.001 0016-7061/© 2016 Elsevier B.V. All rights reserved.
Please cite this article as: Beucher, A., et al., Mapping potential acid sulfate soils in Denmark using legacy data and LiDAR-based derivatives, Geoderma (2016), http://dx.doi.org/10.1016/j.geoderma.2016.06.001
A. Beucher et al. / Geoderma xxx (2016) xxx–xxx
Fig. 1. Distribution of the wetlands in Jutland, Denmark.
environmental authorities. These soils were mapped in the 1980s. Wetlands were targeted and soils were surveyed through conventional mapping; the procedure included soil sampling, the determination of pH at the time of sampling and after incubation, and the calculation of pyrite content and acid-neutralizing capacity (Madsen et al., 1985). Because traditional soil mapping is time- and resource-consuming, alternative spatial modeling techniques may be useful for predicting the occurrence of a.s. soils at various scales and extents. Within the Digital Soil Mapping (DSM) approach (McBratney et al., 2003), several techniques have recently been evaluated for mapping a.s. soils. A fuzzy k-means clustering algorithm was applied to a relatively small coastal a.s. soil area in Australia (Huang et al., 2014a, b). Fuzzy logic and Artificial Neural Network (ANN) techniques were assessed on a.s. soils in Finland at regional and catchment scale, respectively (Beucher et al., 2013, 2014, 2015). Empirical, data-driven ANN techniques are efficient pattern recognition and classification tools (Bonham-Carter, 1994), able to generalize from imprecise input data (Porwal et al., 2003) and to handle large datasets (Gershenfeld, 1999). An ANN method was selected for this study because large input datasets (i.e. soil observations and environmental variables) were available. Furthermore, ANNs are frequently applied in DSM to predict soil attributes (Chang and Islam, 2000; Minasny and McBratney, 2002; Lentzsch et al., 2005; Viscarra Rossel and Behrens, 2010) or soil classes (Zhu, 2000; Behrens et al., 2005; Boruvka and Penizek, 2007; Cavazzi et al., 2013; Chagas et al., 2013; Silveira et al., 2013). The main objective of this study was to assess the predictive classification abilities of an ANN technique for mapping potential a.s. soils in the wetlands of Jutland, Denmark (Fig. 1).

2. Study area
The wetlands located in Jutland constitute the study area (Fig. 1). They cover about 6500 km2 and represent approximately 20% of Jutland (Madsen et al., 1985). Wetlands correspond to saturated soils, such as Histosols, Fluvisols and Gleysols (IUSS Working Group WRB, 2006). Their natural vegetation consists of plant communities with species such as Juncus effusus. Wetlands were mainly used for hay production until the second
half of the 19th century, when tile drainage was introduced (Greve et al., 2014). Thus, most of the wetlands (c. 5100 km2) have been artificially drained and intensively farmed using fertilizer and lime, the main crops being cereals and grass (Bou Kheir et al., 2010). The study area comprises various landforms (Madsen et al., 1992). The western part consists of low-relief sandy glaciofluvial outwash plains from the Weichselian glaciation (i.e. Last Glacial Maximum; c. 1200 km2 of the wetland areas), which surround slightly protruding islands of older, strongly eroded moraine landforms from the Saalian glaciation (c. 700 km2 of the wetland areas; Madsen et al., Madsen and Jensen, 1988). The eastern part is composed of Weichselian moraine landforms (c. 900 km2 of the wetland areas), while the northern part consists of late- and post-glacial marine sediments (c. 2400 km2 of the wetland areas; Madsen and Jensen, 1988). Sub-surface geology varies from north to south: Cretaceous limestone dominates in northern Jutland and Djursland, while Tertiary mica-rich sand and clay prevail in the rest of Jutland (Madsen and Jensen, 1988). The study area has a temperate climate, with a mean winter temperature of 0 °C and a mean summer temperature of 16 °C. Average annual precipitation is about 800 mm in central Jutland (Danmarks Meteorologiske Institut, 1998).

3. Material and methods

3.1. Soil observations
Soil observations used in this study were extracted from the Ochre Classification database, which resulted from the potential a.s. soil mapping conducted in the 1980s (Madsen et al., 1985). Soils in wetland areas were targeted and surveyed through conventional mapping. Field work was carried out from May to October over a three-year period (1981–83).
The 8007 sampling sites were selected on the basis of historical topographic maps (at scale 1:20,000), geological maps, soil maps and maps from previous moorland studies, aiming at an even distribution across wetlands and soil types (Madsen et al., 1985). Each profile was sampled using a portable auger down to 2.5 m, and samples were taken from major horizons below or near the groundwater table
(in all, c. 14,100 samples). For each profile, the soil type, texture and color were recorded. The presence of calcium carbonate (checked with dilute acid) and the smell of hydrogen sulfide were noted. For each sample, pH was also measured at the time of sampling. In the laboratory, carbonate-free samples were incubated at room temperature and their pH was measured; if pH dropped below 3.0 within 16 weeks, the samples were classified as potential a.s. soils (Madsen et al., 1985). For samples containing calcium carbonate, elemental concentrations and mineralogy were determined by atomic absorption spectrometry and X-ray diffraction. In particular, calcium (Ca), magnesium (Mg) and pyrite contents were analyzed, and the acid-neutralizing capacity (ANC) was calculated (Madsen et al., 1985). Samples fulfilling the following inequality were classified as potential a.s. soils:

$$\mathrm{ANC}\ (\text{meq. Ca} + \text{Mg}) < 34 \times \%\,\text{pyrite}$$
where the factor 34 corresponds to the amount of sulfuric acid (in meq.) produced by the oxidation of 1% of pyrite. This classification was used in the modeling, which requires two sets of points: positive points representing potential a.s. soil occurrences, and negative points representing non-potential a.s. soil sites (i.e. soils that are not, and will not become, a.s. soils; Fig. 2). From each of the positive and negative sets, 30% of the points were randomly selected for validation and thus excluded from network training.
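The classification rules above, together with the per-class 30% hold-back, can be sketched in Python. This is a minimal sketch, not the original survey procedure: the function names are hypothetical, the 16-week incubation is reduced to its minimum pH, and the check of the factor 34 assumes that both sides of the inequality are expressed in meq per 100 g of soil and that complete oxidation releases 4 mol H+ per mol FeS2 (FeS2 + 15/4 O2 + 7/2 H2O → Fe(OH)3 + 2 H2SO4). Under those assumptions, 1% pyrite yields about 33 meq per 100 g, close to the factor 34 used by Madsen et al. (1985).

```python
import random

FES2_MOLAR_MASS = 119.98  # g/mol
H_PER_FES2 = 4  # FeS2 + 15/4 O2 + 7/2 H2O -> Fe(OH)3 + 2 H2SO4, i.e. 4 H+ per FeS2


def acidity_meq_per_100g(pyrite_percent):
    """Potential acidity (meq per 100 g soil, assumed unit) from full pyrite oxidation."""
    grams_pyrite = pyrite_percent  # p % of 100 g of soil = p g of pyrite
    return grams_pyrite / FES2_MOLAR_MASS * H_PER_FES2 * 1000


def is_potential_as(calcareous, min_incubation_ph=None, anc_meq=None, pyrite_percent=None):
    """Hypothetical helper encoding the two criteria described in Section 3.1."""
    if not calcareous:
        # carbonate-free samples: pH below 3.0 during incubation
        return min_incubation_ph is not None and min_incubation_ph < 3.0
    # calcareous samples: neutralizing capacity smaller than potential acidity
    return anc_meq < 34 * pyrite_percent


def holdback_split(points, is_positive, fraction=0.3, seed=0):
    """Randomly hold back `fraction` of the positive and of the negative points."""
    rng = random.Random(seed)
    train, validation = [], []
    for label in (True, False):
        group = [p for p in points if is_positive(p) == label]
        rng.shuffle(group)
        n_val = round(fraction * len(group))
        validation.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, validation
```

Splitting each class separately keeps the positive/negative ratio of the validation set close to that of the full selection, which is what the roughly 30% validation shares per class in Table 3 reflect.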
3.2. Environmental data
The DSM approach is based on the 'scorpan' model (McBratney et al., 2003):

$$S = f(s, c, o, r, p, a, n) + e$$

where S is the soil attribute or class to predict (in our case, two classes: potential a.s. or non-potential a.s. soil), and the 'scorpan' factors stand for soil (s), climate (c), organisms (o), relief (r), parent material (p), age (a) and spatial position (n). The term e is a spatially correlated error, which is not modeled in this study. This equation is an extension of Jenny's (1941) state factor model ('clorpt', standing for climate, organisms, relief, parent material and time), which was designed for modeling soil genesis. The empirical function f used in this study is an ANN model. In the present study, 16 environmental variables were used as input data for the modeling. Among these, 10 land surface parameters were derived from the airborne LiDAR (Light Detection and Ranging)-based Digital Elevation Model (DEM) produced by the National Survey and Cadastre of the Danish Ministry of Environment in 2011. The LiDAR point clouds were converted into a DEM raster with a 1.6 m grid size, which was further aggregated to 30.4 m. TerraStream software (Danner et al., 2007) was used to create and process the DEM (e.g. all depressions ≤50 cm deep were filled so that they would not cause problems when extracting surface water flow directions and the drainage network). Multiple-flow-direction (MFD), or FD8, algorithms (Freeman, 1991) were applied for all flow-related calculations. From the pre-processed DEM, the 10 land surface parameters were extracted in ArcGIS (ESRI, 2014) and SAGA GIS (SAGA GIS): elevation, slope gradient, slope aspect, direct sunlight insolation, mid-slope position, flow accumulation, MRVBF (Multi-Resolution index of Valley Bottom Flatness), SAGAWI (System for Automated Geoscientific Analyses Wetness Index), valley depth and distance to channel network.
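Two of the raster operations mentioned above can be illustrated with a short Python/NumPy sketch. It is not the TerraStream, ArcGIS or SAGA implementation. The text does not say how the 1.6 m grid was aggregated, so block averaging over 19 × 19 fine cells (30.4 / 1.6 = 19) is an assumption. The flow routing is a simplified Freeman-type multiple-flow-direction scheme in which each cell passes its accumulated area to all lower neighbours in proportion to (tan β)^p, with p = 1.1 taken as an assumed default. Function names are hypothetical.

```python
import numpy as np


def block_mean(dem, factor):
    """Aggregate a fine DEM by averaging non-overlapping factor x factor blocks."""
    rows, cols = dem.shape
    r, c = rows // factor, cols // factor
    trimmed = dem[: r * factor, : c * factor]  # drop incomplete edge blocks
    return trimmed.reshape(r, factor, c, factor).mean(axis=(1, 3))


def mfd_flow_accumulation(dem, cellsize, p=1.1):
    """Freeman-style multiple-flow-direction accumulation, in number of cells.

    Each cell starts with 1 (itself) and passes its total to every lower
    neighbour, weighted by (tan slope) ** p; p = 1.1 is an assumed exponent.
    """
    rows, cols = dem.shape
    acc = np.ones(dem.shape, dtype=float)
    order = np.argsort(dem, axis=None)[::-1]  # process the highest cells first
    for flat_idx in order:
        i, j = divmod(int(flat_idx), cols)
        targets, weights = [], []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                    drop = dem[i, j] - dem[ni, nj]
                    if drop > 0:  # flow only to strictly lower neighbours
                        tan_slope = drop / (cellsize * np.hypot(di, dj))
                        targets.append((ni, nj))
                        weights.append(tan_slope ** p)
        total = sum(weights)
        for (ni, nj), w in zip(targets, weights):
            acc[ni, nj] += acc[i, j] * w / total
    return acc


factor = round(30.4 / 1.6)  # 19 fine cells per side of one 30.4 m cell
```

Processing cells from highest to lowest ensures that every upslope contribution has arrived before a cell passes its total on; counting each cell as 1 is consistent with the minimum flow accumulation of 1 listed in Table 1.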
Categorical variables, namely geology, geo-region, soil, landscape, land use and wetland soil type maps, were also used as predictors. The geology map represents the parent materials and was extracted from the national geological map (Danmarks Geologiske Undersøgelse, 1978). The geo-region map shows distinct regions of Denmark based on climate and geographical setting. The soil map represents soil types based on soil texture and was compiled by Madsen et al. (1992). The landscape type map shows Danish landforms, mostly related to Quaternary geological development (Madsen et al., 1992). The land use map corresponds to land cover types derived from Corine Land Cover data for Denmark (Stjernholm and Kjeldgaard, 2004). For the modeling, a common projection (ETRS1989 UTM32N) and the same cell size (30.4 m) were applied to all predictors. Table 1 presents an overview of the environmental variables used as predictors, while Figs. 3 and 4 display examples of categorical and LiDAR-based variables, respectively, for the whole of Jutland. The categorical input variables were coded numerically using ArcGIS (ESRI, 2014; Table 2).

3.3. Artificial neural networks for potential acid sulfate soil occurrence prediction
Fig. 2. Location of the soil observations extracted from the Ochre Classification database (Madsen et al., 1985).
Artificial neural networks (ANNs) are standard machine-learning techniques, basically designed to learn how to classify new, unknown numeric information using known data for training; they are thus considered supervised learning techniques (Zell et al., 1998). Moreover, ANNs can handle large datasets with many predictors and approximate non-linear relationships between these predictors; they are also reported to be robust to noise, outliers and overfitting (Gershenfeld, 1999; Chang and Islam, 2000; Viscarra Rossel and Behrens, 2010; Chagas et al., 2013).
Table 1 Environmental variables used as predictors in this study. Modified from Adhikari et al. (2014).

| Environmental variable | Original scale/resolution | Brief description | Range |
|---|---|---|---|
| Geology | 1:100,000 | Scanned and registered geological map (86 original classes simplified into 11) | 11 classes |
| Soil | 1:50,000 | Map of soil types based on soil texture | 9 classes |
| Geo-regions | 1:100,000 | Scanned geographical regions map | 10 classes |
| Landscape | 1:100,000 | Landform types | 11 classes |
| Land use | 1:100,000 | CORINE land cover data adopted in Denmark (34 original classes simplified into 5) | 5 classes |
| Wetland type | 1:100,000 | Wetland soil types: peat or mineral | 2 classes |
| Elevation | 30.4 m | Elevation of the land surface derived from LiDAR | 0 to 170 m |
| Slope gradient | 30.4 m | Maximum rate of change between a cell and its neighbors | 0 to 90° |
| Slope aspect | 30.4 m | Direction of the steepest slope, measured from north | 0 to 360° |
| Direct sunlight insolation | 30.4 m | Potential incoming solar radiation calculated for a single year; Böhner and Antonić (2009) | 254 to 698 |
| Mid-slope position | 30.4 m | Covers the warmer zones of slopes; Bendix (2004) | 0 to 1 |
| Flow accumulation | 30.4 m | Number of upslope cells | 1 to 73645 |
| MRVBF | 30.4 m | Multi-resolution valley bottom flatness: identifies depositional areas; Gallant and Dowling (2003) | 0 to 11 |
| SAGAWI | 30.4 m | Wetness index, WI = ln(As / tan β), where As is the modified catchment area and β is the slope gradient; Böhner et al. (2002) | 7.2 to 19 |
| Valley depth | 30.4 m | Extent of the valley depth | 0 to 90 m |
| Altitude above channel network | 30.4 m | Vertical distance to channel network base level | 0 to 56 m |
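The wetness index formula in Table 1 can be evaluated directly. In this minimal sketch the catchment area is whatever the caller supplies (SAGA derives a modified catchment area, Böhner et al., 2002, which is not reproduced here), and near-flat cells are clamped to a minimum slope of 0.1° to avoid division by zero; that clamp value and the function name are hypothetical, not SAGA's.

```python
import math


def wetness_index(catchment_area, slope_rad, min_slope_rad=math.radians(0.1)):
    """WI = ln(As / tan(beta)); near-flat cells use an assumed minimum slope."""
    beta = max(slope_rad, min_slope_rad)
    return math.log(catchment_area / math.tan(beta))


# A large, gently sloping catchment versus a small, steep one (illustrative values)
print(round(wetness_index(5000.0, math.radians(0.5)), 1))
print(round(wetness_index(50.0, math.radians(10.0)), 1))
```

The first cell scores about 13.3 and the second about 5.6: large upslope areas combined with gentle slopes give high values, which is why the index helps pick out the wet, low-relief positions discussed in Section 4.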
In the present study, we used an ANN method based on Radial Basis Functions (RBF), comprising three fully connected layers: (1) an input layer with several nodes, each transmitting one input variable; (2) a hidden layer with numerous artificial neurons, each representing an RBF (a Gaussian activation function in our case); and (3) an output layer, also with artificial neurons, transmitting the predicted output values. Applying an ANN involves two stages: training and classification. Within the study area, each 30.4 m × 30.4 m cell can be defined as a feature vector x in an i-dimensional space, x = (x1, x2, x3, …, xi), where x1, x2, x3, …, xi each represent a class or continuous value from one of the i input variables. During training, the network receives, through the input layer, the input feature vectors corresponding to known examples (in our case, training points representing potential a.s. soil occurrences and non-potential a.s. soil sites). The input variables are then transmitted to the hidden layer, where the neurons compute and extract significant information from them to predict output values. Before training, link weights are assigned to each connection through an initialization function (Zell et al., 1998). During training, the network automatically and iteratively adjusts these link weights so that, for each training input feature vector, the predicted output value is as close as possible to the known target output value (Gershenfeld, 1999). The link weights are adjusted according to a learning function (mainly specifying different learning rates; Bergmeir and Benítez, 2012). During classification, the network receives every input feature vector for the whole study area and classifies it using the learned, calibrated weights. The network thus predicts an output value for each cell within the study area, producing a probability, or predictive, map.
The output values represent probabilities ranging between 0 and 1 (i.e. the lowest and highest probability of potential a.s. soil occurrence, respectively). For this study, the RBF-based ANN was implemented in the R environment using the RSNNS package (Bergmeir and Benítez, 2012) for network creation, training and classification. This package provides an efficient interface to the Stuttgart Neural Network Simulator (SNNS; Zell et al., 1998). We developed our own R routines to automatically and systematically test a large number of ANN models, run validation, and select and post-process the models with the most accurate classifications. The models were based on various network topologies, as several parameters could be adjusted: the network architecture (in our case, the number of nodes in the input layer and of neurons in the hidden layer), the number of iterations, and the parameters of the initialization and learning functions. Python routines were also developed and used for point data conversion.
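The three-layer RBF structure and the search over hidden-layer sizes described above can be sketched in Python with NumPy. This is not RSNNS or SNNS: the sketch swaps in a common two-stage training scheme (centres sampled from the training feature vectors, one shared Gaussian width set by the heuristic d_max / sqrt(2 · n_hidden), then output weights solved by linear least squares) instead of the iterative initialization and learning functions used in the study. Inputs are assumed to be numeric and already scaled; the class name, the synthetic two-cluster data and the hidden-layer sizes are hypothetical.

```python
import numpy as np


class GaussianRBFNet:
    """Inputs -> Gaussian hidden neurons -> one linear output neuron (sketch)."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden  # must not exceed the number of training points
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Gaussian activation of every hidden neuron for every feature vector
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def _design(self, X):
        return np.column_stack([self._hidden(X), np.ones(len(X))])  # + bias term

    def fit(self, X, y):
        idx = self.rng.choice(len(X), size=self.n_hidden, replace=False)
        self.centres = X[idx]
        diff = self.centres[:, None, :] - self.centres[None, :, :]
        d_max = np.sqrt((diff ** 2).sum(axis=2)).max()
        self.width = d_max / np.sqrt(2 * self.n_hidden) if d_max > 0 else 1.0
        self.weights, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict_proba(self, X):
        # clip the linear output to [0, 1] so it can be read as a probability
        return np.clip(self._design(X) @ self.weights, 0.0, 1.0)


def sse(y_true, y_pred):
    return float(((y_true - y_pred) ** 2).sum())


# Toy topology search on synthetic data (0 = negative, 1 = positive points)
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y_train = np.r_[np.zeros(100), np.ones(100)]
X_val = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
y_val = np.r_[np.zeros(40), np.ones(40)]

results = {}
for n_hidden in (10, 20, 40):
    net = GaussianRBFNet(n_hidden).fit(X_train, y_train)
    results[n_hidden] = (sse(y_train, net.predict_proba(X_train)),
                         sse(y_val, net.predict_proba(X_val)))
best = min(results, key=lambda k: results[k][0])  # lowest SSEtrain
```

Picking the topology with the lowest SSEtrain mirrors the selection rule in Section 3.4; keeping SSEval alongside it helps flag overfitting, since adding hidden neurons tends to lower SSEtrain. For a map-scale prediction, predict_proba would be applied to the cell feature vectors in chunks to limit memory use.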
3.4. Validation
The performance of our ANN models in predicting potential a.s. soils was evaluated using the 30% hold-back validation data. The relationship between observed and predicted points in the validation datasets was examined by constructing confusion matrices (Bou Kheir et al., 2010). Three criteria were used to evaluate prediction accuracy: Overall Accuracy (OA), Producer's Accuracy (PA) and User's Accuracy (UA). OA is the number of correctly classified points divided by the total number of validation points. PA shows how well the observed points within a class were predicted by the model (i.e. the percentage of correctly classified points among the observed points of a class); it measures how well the analyst did when producing the map. UA gives the map user a measure of the probability that points predicted as a class on the map were correctly classified during the modeling process (i.e. the percentage of correctly classified points among the predicted points of a class; Bou Kheir et al., 2010). The sums of squared errors for training and validation (SSEtrain and SSEval) were also calculated; the lowest SSEtrain may indicate the optimal number of hidden neurons and the optimal class range of 0–1 (Nykänen, 2008). Within the R routines, the validation results of each model were systematically compared in order to select the models yielding the most accurate classification (i.e. the highest accuracy values) and the lowest SSEtrain. The training, classification and validation results, as well as the network parameters, were also automatically recorded for the selected models.
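The three accuracy criteria and the sum of squared errors can be written out explicitly. In this Python sketch, the 0.5 threshold used to turn the ANN probability into a predicted class is an assumption (the text does not state it), and the function names are hypothetical.

```python
def _ratio(num, den):
    return num / den if den else float("nan")


def confusion_metrics(observed, predicted_prob, threshold=0.5):
    """OA, PA and UA for a two-class (positive/negative) validation set."""
    tp = fn = fp = tn = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = prob >= threshold  # assumed decision threshold
        if obs and pred:
            tp += 1
        elif obs:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return {
        "OA": _ratio(tp + tn, tp + fn + fp + tn),
        "PA_pos": _ratio(tp, tp + fn),  # observed positives predicted as positive
        "PA_neg": _ratio(tn, tn + fp),
        "UA_pos": _ratio(tp, tp + fp),  # predicted positives that are correct
        "UA_neg": _ratio(tn, tn + fn),
    }


def sum_squared_errors(observed, predicted_prob):
    return sum((float(o) - p) ** 2 for o, p in zip(observed, predicted_prob))


observed = [1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.4]
print(confusion_metrics(observed, probs))
print(sum_squared_errors(observed, probs))
```

In this toy example, two of three observed positives and three of four observed negatives are recovered, so PA is 2/3 and 3/4 and OA is 5/7. PA is computed along the observed classes, whereas UA is computed along the predicted classes.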
4. Results and discussion
In this study, the predictive classification abilities of an RBF-based ANN were evaluated for potential a.s. soil occurrence in the wetlands of the Jutland peninsula (c. 6500 km2). A large number of ANN models were generated using various selections of soil observations (Table 3) and combinations of environmental variables. Several ANN models yielding the most accurate classification and lowest SSEtrain were selected; their characteristics and validation results are displayed in Tables 4 and 5, respectively. Models A and B both used all soil observations (8007; selection 1, displayed in Fig. 2 and Table 3), but different combinations of environmental variables (corresponding to the number of input nodes in Table 4). Models C and D both used 7441 soil observations (i.e. selection 2 in Table 3), but different combinations of environmental variables (Table 4). Model E used 5885 soil observations (i.e. selection 3 in Table 3) and 10 environmental
Fig. 3. Examples of categorical variables used within this study.
variables (Table 4). The number of hidden neurons in these models ranged between 30 and 90, and the number of iterations between 100 and 1000 (Table 4). Because the performance of ANNs is positively correlated with the number of soil observations (Viscarra Rossel and Behrens, 2010), the first selection (selection 1, displayed in Fig. 2 and Table 3) comprised all profiles extracted from the Ochre Classification database. Nevertheless, the most accurate model using all soil observations and 16 input variables (model A in Tables 4 and 5) yielded an OA of only about 60%. Different selections of soil observations were therefore tested (Table 3). Since PA for negative points in model A reached only 44%, we assumed that the corresponding network had difficulty properly classifying non-potential a.s. soil points. We thus examined the classification of the soil observations to find possible discrepancies. The main discrepancy was that the soil observations classified as potential a.s. soil occurrences comprised not only profiles with a minimum incubation pH below 3.0 (referred to as strongly acidic profiles),
but also profiles with a minimum incubation pH between 3.1 and 4.0 (referred to as acidic profiles; n = 1556). Among the soil observations classified as non-potential a.s. soils, a number of occurrences ambiguously displayed a minimum incubation pH between 3.1 and 4.0 (n = 566). For selection 2, these 566 negatives, which were considered wrongly classified as non-potential a.s. soils, were excluded (Table 3). For selection 3, the 1556 possibly incorrectly classified positive points were excluded as well (Table 3). Modeling based on selection 3 used the same criteria for potential a.s. soil occurrence as the conventional map of Danish potential a.s. soils created by Madsen et al. (1985). Two combinations of environmental variables (i.e. numbers of input nodes in Table 4) were also tested. First, all 16 input variables were used. Then, numerous models were tested, each excluding one of the input variables. Several variables (soil type, slope aspect, direct sunlight insolation, flow accumulation, mid-slope position and distance to channel network) appeared not to contribute, as the modeling results did not vary when they were left out. Consequently, 10 input
Fig. 4. Examples of LiDAR-based derivatives used within this study.
variables were used for the modeling (elevation, slope gradient, MRVBF, SAGAWI, valley depth, geology, geo-region, landscape, land use and wetland soil type). The categorical variables made it possible to target areas where potential a.s. soils typically occur. Regarding the landscape predictor, potential a.s. soils occur in various landforms. Marsh areas, as well as late- and post-glacial marine sediment areas, represent marine and brackish environments where sulfide-bearing sediments were deposited under anoxic conditions. Glaciofluvial outwash plains (from the Weichselian glaciation) and moraine landforms (from the Saalian and Weichselian glaciations) represent non-marine environments where a high influx of iron sulfates came with the groundwater from pyrite-bearing Tertiary and glacial sediments (Jakobsen, 1988; Madsen and Jensen, 1988). Regarding the geology predictor, potential a.s. soils mostly occur in saltwater clay and sand (in marsh areas, as well as late- and post-glacial marine areas) and in freshwater peat and sand (in Saalian and Weichselian moraine landforms and outwash plains; Table 2; Madsen and Jensen, 1988). Regarding the land use predictor, potential a.s. soils mainly occur in agricultural areas. Among the LiDAR-based derivatives, elevation and slope gradient made it possible to target low-lying, low-relief areas where potential a.s. soils generally occur. Moreover, the models used SAGAWI and MRVBF to detect high
wetness index values and depositional areas, respectively, which are associated with potential a.s. soil occurrences. The OA values in Table 5 show that, for the same selections of points, models B and D performed better than models A and C, respectively, supporting the use of 10 input variables. Excluding ambiguous points, as explained above, slightly increased the performance of our models (Table 5). Model E achieved the best performance (i.e. most accurate classification and lowest SSEtrain; Table 5), and its predicted results (i.e. probability or predictive maps) were visualized in ArcGIS (Fig. 5). Model E displayed the highest OA (70%) and PA for positive points (76%), but a relatively low PA for negative points (66%): the observed positive points were mostly classified correctly, whereas the observed negative points were classified correctly less often. Conversely, UA results showed that 81% of predicted negative points were correctly classified, but only 59% of predicted positive points. This suggests that the model could properly classify most of the positive points, but still had difficulty correctly assigning negative points. A more thorough update of the soil observation classification might further improve the performance of our models. This study constitutes a first assessment of an ANN for predicting potential a.s. soil occurrences in Denmark and, in particular, over a large
extent. The RBF-based ANN technique was previously evaluated for a.s. soils in Finland, yielding favorable results at catchment scale (Beucher et al., 2015). The much larger extent of the present study constituted a challenge because of the long computation time associated with ANNs and the computational limits inherent to RSNNS. In particular, at the time of writing, RSNNS cannot use parallel computing (i.e. it cannot take advantage of a multi-core computer), which is a severe limitation for accelerating computation. Moreover, RSNNS cannot evaluate prediction uncertainty, which would make it possible to further assess the reliability of our models. Another challenge in this study was the unavailability of geophysical data for Denmark. Geophysical variables such as the low-frequency (3 kHz) electromagnetic imaginary component, derived from high-resolution, low-altitude airborne geophysics, enable the detection of shallow anomalies that are mainly related to variations in soil electrical conductivity. Sulfide-bearing sediments are expected to yield strong electromagnetic anomalies due to their high contents of soluble salts (Vanhala et al., 2004; Suppala et al., 2005). The imaginary component was previously identified as the most efficient predictor of a.s. soil occurrences in Finland (Beucher et al., 2013, 2014, 2015).
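The Model E accuracy values discussed earlier in this section are mutually consistent, and the arithmetic also shows why UA for positive points is low. The Python sketch below is an approximate back-calculation, not the authors' confusion matrix: it assumes that Model E was validated on the full selection-3 hold-back set (693 positive and 1073 negative points, Table 3) and recovers counts by rounding the reported PA values. The function name is hypothetical.

```python
# Model E PA values reported in the text; validation sizes for selection 3 (Table 3)
N_POS_VAL, N_NEG_VAL = 693, 1073
PA_POS, PA_NEG = 0.76, 0.66


def back_calculate(n_pos, n_neg, pa_pos, pa_neg):
    """Approximate confusion-matrix counts and the implied OA and UA values."""
    tp = round(pa_pos * n_pos)  # reported PA values are rounded, so counts are approximate
    tn = round(pa_neg * n_neg)
    fn, fp = n_pos - tp, n_neg - tn
    return {
        "counts": {"TP": tp, "FN": fn, "FP": fp, "TN": tn},
        "OA": (tp + tn) / (n_pos + n_neg),
        "UA_pos": tp / (tp + fp),
        "UA_neg": tn / (tn + fn),
    }


check = back_calculate(N_POS_VAL, N_NEG_VAL, PA_POS, PA_NEG)
print({k: round(v, 2) for k, v in check.items() if k != "counts"})
```

The script reports OA ≈ 0.70, UA for positives ≈ 0.59 and UA for negatives ≈ 0.81, matching the Model E values to the nearest percent. Because negatives outnumber positives in the validation set, misclassifying 34% of them (about 365 points) weighs heavily against about 527 true positives, which pulls UA for positives down even though PA for positives is high.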
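Table 2 Categorical input variables: classification and distribution in the study area.

| Categorical data | Class description | Modeling class | Distribution (%) | Area (km2) |
|---|---|---|---|---|
| Geology | Not classified | 1 | 1 | 98 |
| Geology | Freshwater clay, lg | 2 | 0 | 7 |
| Geology | Saltwater clay, pg | 3 | 4 | 261 |
| Geology | Aeolian sand, pg | 4 | 6 | 398 |
| Geology | Clayey till, lg | 5 | 6 | 403 |
| Geology | Sandy till, lg | 6 | 7 | 487 |
| Geology | Freshwater sand, lg | 7 | 9 | 599 |
| Geology | Saltwater sand, lg | 8 | 28 | 1812 |
| Geology | Freshwater clay, pg | 9 | 8 | 499 |
| Geology | Freshwater sand, pg | 10 | 9 | 605 |
| Geology | Freshwater peat, pg | 11 | 21 | 1378 |
| Soil | Not mapped | 1 | 11 | 720 |
| Soil | Calcareous | 2 | 1 | 34 |
| Soil | Clay | 3 | 2 | 110 |
| Soil | Sandy clay | 4 | 4 | 275 |
| Soil | Heavy clay | 5 | 3 | 184 |
| Soil | Fine sand | 6 | 11 | 720 |
| Soil | Clayey sand | 7 | 15 | 982 |
| Soil | Coarse sand | 8 | 24 | 1571 |
| Soil | Organic | 9 | 30 | 1951 |
| Geo-regions | Southeast Denmark | 1 | 0 | 0 |
| Geo-regions | Eastern Denmark | 2 | 10 | 628 |
| Geo-regions | Mid Jutland | 3 | 9 | 568 |
| Geo-regions | Northern Jutland | 4 | 24 | 1549 |
| Geo-regions | Himmerland | 5 | 16 | 1029 |
| Geo-regions | Thy | 6 | 9 | 619 |
| Geo-regions | Western Jutland | 7 | 33 | 2154 |
| Landscape | Bedrock | 1 | 0 | 0 |
| Landscape | Terminal moraine | 2 | 1 | 80 |
| Landscape | Reclaimed land | 3 | 2 | 163 |
| Landscape | Kettled moraine | 4 | 3 | 179 |
| Landscape | Aeolian | 5 | 8 | 525 |
| Landscape | Sub-glacial tunnel valley | 6 | 3 | 173 |
| Landscape | Moraine | 7 | 13 | 865 |
| Landscape | Post-glacial marine | 8 | 34 | 2196 |
| Landscape | Late-glacial marine | 9 | 4 | 230 |
| Landscape | Saalian moraine | 10 | 10 | 654 |
| Landscape | Glaciofluvial (outwash plains) | 11 | 18 | 1151 |
| Landscape | Marsh | 12 | 5 | 333 |
| Land use | Other | 1 | 1 | 37 |
| Land use | Artificial surface | 2 | 4 | 236 |
| Land use | Forest and semi-natural | 3 | 10 | 650 |
| Land use | Wetlands | 4 | 7 | 526 |
| Land use | Agriculture | 5 | 78 | 5098 |
| Wetland type | Mineral | 1 | 64 | 4167 |
| Wetland type | Organic | 2 | 36 | 2380 |

pg: post-glacial; lg: late-glacial.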
7
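Table 4 lists 16 input nodes for the models using all 16 environmental variables. This suggests that each categorical variable in Table 2 entered the network as a single node carrying its modeling class. The paper does not state how these codes were scaled, so the min-max scaling below is an assumption. The dictionaries reproduce only two of the Table 2 variables, as an illustration. One-hot encoding is shown as the usual alternative. It avoids imposing an artificial order on nominal classes such as geo-regions, at the cost of more input nodes.

```python
# Modeling classes copied from Table 2 for two of the categorical variables.
WETLAND_TYPE = {"Mineral": 1, "Organic": 2}
GEO_REGIONS = {
    "Southeast Denmark": 1,
    "Eastern Denmark": 2,
    "Mid Jutland": 3,
    "Northern Jutland": 4,
    "Himmerland": 5,
    "Thy": 6,
    "Western Jutland": 7,
}


def scaled_code(description, classes):
    """Single input node: modeling class min-max scaled to [0, 1] (assumed scaling)."""
    codes = classes.values()
    low, high = min(codes), max(codes)
    return (classes[description] - low) / (high - low)


def one_hot(description, classes):
    """Alternative encoding: one input node per class, so no order is implied."""
    return [1.0 if classes[description] == code else 0.0
            for code in sorted(classes.values())]


print(scaled_code("Organic", WETLAND_TYPE))  # 1.0
print(scaled_code("Thy", GEO_REGIONS))       # 0.8333333333333334
print(one_hot("Thy", GEO_REGIONS))           # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```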
Table 3 Different selections of soil observations used within the modeling (positive and negative points representing potential a.s. and non-potential a.s. soil occurrences, respectively).

| Selection | Soil observations (total) | Positive: training | Positive: validation | Positive: total | Negative: training | Negative: validation | Negative: total |
|---|---|---|---|---|---|---|---|
| 1 | 8007 | 2706 | 1159 | 3865 | 2899 | 1243 | 4142 |
| 2 | 7441 | 2706 | 1159 | 3865 | 2503 | 1073 | 3576 |
| 3 | 5885 | 1616 | 693 | 2309 | 2503 | 1073 | 3576 |
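The Table 3 counts are consistent with a per-class 70/30 hold-back. Within each class, taking 70% for training and rounding half up reproduces the published training and validation sizes. That rounding rule is inferred from the numbers, not stated by the authors.

The same validation counts are the denominators behind the accuracies in Table 5. The sketch below therefore also recomputes overall, producer's and user's accuracies from a two-class confusion matrix. The true-positive and true-negative counts for model E are back-calculated from the rounded producer's accuracies in Table 5, so they are approximations, not published values.

```python
def holdback_split(n_points):
    """Training/validation sizes for a 70/30 hold-back, training size rounded half up."""
    n_train = (7 * n_points + 5) // 10  # integer arithmetic avoids float rounding issues
    return n_train, n_points - n_train


def accuracies(tp, fn, tn, fp):
    """OA, producer's (PA) and user's (UA) accuracies, in %, for a 2x2 confusion matrix."""
    oa = 100 * (tp + tn) / (tp + fn + tn + fp)
    pa_pos = 100 * tp / (tp + fn)  # observed positives predicted as positive
    pa_neg = 100 * tn / (tn + fp)  # observed negatives predicted as negative
    ua_pos = 100 * tp / (tp + fp)  # positive predictions that are correct
    ua_neg = 100 * tn / (tn + fn)  # negative predictions that are correct
    return oa, pa_pos, pa_neg, ua_pos, ua_neg


# Class totals from Table 3 -> (training, validation)
for total in (3865, 4142, 3576, 2309):
    print(total, holdback_split(total))

# Model E (selection 3): 693 positive and 1073 negative validation points.
# tp and tn are approximate back-calculations from the rounded PA values.
tp, tn = 527, 708
fn, fp = 693 - tp, 1073 - tn
print([round(v) for v in accuracies(tp, fn, tn, fp)])  # [70, 76, 66, 59, 81]
```

The recomputed values match the model E row of Table 5. This also shows why a high PA for positive points can coexist with a modest UA for positive points when many negative sites are misclassified.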
Nevertheless, because ANN techniques can handle large datasets (Gershenfeld, 1999) and non-linear interactions in the data (Viscarra Rossel and Behrens, 2010), the RBF-based ANN models demonstrated promising predictive classification results for a large study area. The present study used the same criteria for a.s. soil identification as Madsen et al. (1985), i.e. minimum incubation pH and ANC compared to pyrite content. Even so, comparing the two maps is not straightforward. For the conventional map, Madsen et al. (1985) used a specific classification to represent potential a.s. soil occurrences. The surveyed wetlands were divided into four classes:

- class 1: areas comprising more than 50% of potential a.s. soil profiles;
- class 2: areas comprising between 20 and 50%;
- class 3: areas comprising less than 20%;
- class 4: areas without any potential a.s. soil profile.

This classification is mainly based on expert knowledge and is thus highly dependent on the map producer. In contrast, our predictive map displays the probability of a potential a.s. soil occurrence in the wetlands, an objective value ranging from 0 to 1. Nevertheless, two small areas could be compared visually (Figs. 5 and 6). The first is centered on the Skjern River catchment in western Jutland (maps on the left in Fig. 6). The second is located in northern Jutland (maps on the right in Fig. 6). The delta region of the Skjern River is a well-known potential a.s. soil area (Madsen and Jensen, 1988) and appears as such on both the ANN probability map and the classical map (Fig. 6). In the northern Jutland area, however, our ANN model could not accurately predict potential a.s. soil occurrences (Fig. 6), although the map remains coherent. Moreover, northern Jutland areas were mostly assigned low probability values (Fig. 5).
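The two legends compared above differ in kind. The conventional map assigns each area a class from the share of potential a.s. profiles, whereas the ANN map assigns a probability to each cell. The minimal Python sketch below makes both explicit. It assumes that shares of exactly 20% and 50% fall into class 2, which the source does not specify. Following the extent comparison in this section, it treats cells with a probability of 0.5 or more as high probability. The cell size is a hypothetical parameter. Summing cell areas above the threshold is one way to turn a probability raster into an extent figure.

```python
def conventional_class(share):
    """Class of Madsen et al. (1985) from the share (0-1) of potential a.s. profiles.

    Boundary handling (exactly 0.5 or 0.2 -> class 2) is an assumption.
    """
    if share > 0.5:
        return 1
    if share >= 0.2:
        return 2
    if share > 0.0:
        return 3
    return 4


def high_probability_extent(probabilities, cell_area_km2, threshold=0.5):
    """Area (km2) and fraction of cells whose predicted probability is >= threshold."""
    n_high = sum(1 for p in probabilities if p >= threshold)
    return n_high * cell_area_km2, n_high / len(probabilities)


print([conventional_class(s) for s in (0.6, 0.35, 0.1, 0.0)])  # [1, 2, 3, 4]
# Hypothetical probabilities for four cells of 0.25 km2 each
print(high_probability_extent([0.2, 0.5, 0.7, 0.9], 0.25))     # (0.75, 0.75)
```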
To improve ANN predictions in northern Jutland, additional environmental variables would be needed. In particular, a predictor representing the distance to the former Litorina coastline could be used, since potential a.s. soils formed within post-glacial marine deposits would presumably occur close to this limit. The depth to pre-Quaternary deposits could also be used as a predictor for central and southern Jutland. There, potential a.s. soils can form in non-marine environments (i.e. glaciofluvial outwash plains and moraine landforms) overlying pyrite-containing Tertiary sediments. Furthermore, we compared the extents calculated from the two maps. The conventional map indicated that 32% of the study area (c. 2100 km²) consisted of potential a.s. soils (i.e. classes 1 and 2 combined: areas comprising more than 20% of potential a.s. soil profiles). The areas of class 1 (i.e. areas comprising more than 50% of potential a.s. soil profiles) covered 23% of the study area (c. 1500 km²). The predictive map, by contrast, displayed about 46% (c. 3000 km²) as high probability areas for potential a.s. soil occurrence (i.e. probability values between 0.5 and 1). In terms of extent, our predictive map therefore appears to over-estimate potential a.s. soil occurrences in comparison with the classical map. The difference between these extent values can partly be explained by the rather different approaches used to create the two maps.

Table 4 Characteristics for a selection of ANN models: input data and network parameters.

| Model | Soil observations: selection Id | Soil observations: total number | Number of input nodes | Hidden neurons | Iterations |
|---|---|---|---|---|---|
| A | 1 | 8007 | 16 | 30 | 400 |
| B | 1 | 8007 | 10 | 50 | 1000 |
| C | 2 | 7441 | 16 | 90 | 500 |
| D | 2 | 7441 | 10 | 50 | 900 |
| E | 3 | 5885 | 10 | 40 | 100 |

Please cite this article as: Beucher, A., et al., Mapping potential acid sulfate soils in Denmark using legacy data and LiDAR-based derivatives, Geoderma (2016), http://dx.doi.org/10.1016/j.geoderma.2016.06.001

Table 5 Validation results for a selection of ANN models (best-performing model: E).

| Model | OA (%) | PA Pos (%) | PA Neg (%) | UA Pos (%) | UA Neg (%) | SSE training | SSE validation |
|---|---|---|---|---|---|---|---|
| A | 60 | 76 | 44 | 56 | 66 | 0.2354 | 0.2330 |
| B | 64 | 78 | 50 | 60 | 71 | 0.2355 | 0.2336 |
| C | 62 | 78 | 46 | 61 | 66 | 0.2374 | 0.2341 |
| D | 67 | 80 | 52 | 64 | 71 | 0.2375 | 0.2331 |
| E | 70 | 76 | 66 | 59 | 81 | 0.2231 | 0.2196 |

OA: Overall Accuracy; PA: Producer Accuracy; UA: User Accuracy; SSE: Sum of Squared Errors; Pos/Neg: potential a.s./non-potential a.s. soil occurrences.

5. Conclusion
The present study constitutes a first evaluation of an Artificial Neural Network (ANN) technique for potential acid sulfate (a.s.) soil mapping in Denmark. In particular, a Radial Basis Function (RBF)-based ANN method was assessed for predicting potential a.s. soil occurrence in the wetlands of the Jutland peninsula (c. 6500 km²). This technique demonstrated promising predictive classification abilities over a relatively large area. The highest overall accuracy of 70% was reached by an RBF-based ANN model built on 5885 soil observations divided into two categories: potential a.s. and non-potential a.s. soil occurrences. The model correctly classified most of the validation points corresponding to potential a.s. soil occurrences, but it had difficulty correctly assigning non-potential a.s. soil sites. These results suggest that updating the soil observation classification might improve the performance of our models. Moreover, the RBF-based ANN technique strongly benefited from the use of different environmental variables: legacy categorical datasets (e.g. geology and landscape maps) as well as LiDAR-based derivatives (e.g. elevation and slope gradient). For the whole study area, the extent of potential a.s. soil areas was estimated at about 3000 km². This is larger than the previous estimate based on conventional mapping, which suggests that our model might have over-estimated potential a.s. soil occurrences in the study area. In the future, the probability map for potential a.s. soil occurrence could be compared with maps generated using other spatial modeling techniques. Assessing the prediction uncertainty would also be a significant improvement for further evaluating the reliability of our models.

Fig. 5. Probability map for potential a.s. soil occurrence generated by ANN modeling.

Fig. 6. Comparison between ANN probability map and classical map for potential a.s. soil occurrence in two small areas located in western (left) and northern Jutland (right).

Acknowledgments

The authors thankfully acknowledge the financial support from K.H. Renlunds Stiftelsen and Ingrid, Margit och Henrik Höijers Donationsfond II.

References

Adhikari, K., Minasny, B., Greve, M.B., Greve, M.H., 2014. Constructing a soil class map of Denmark based on the FAO legend using digital techniques. Geoderma 214–215, 101–113. http://dx.doi.org/10.1016/j.geoderma.2013.09.023.
Andriesse, W., van Mensvoort, M.E.F., 2006. Acid sulfate soils: distribution and extent. In: Lal, R. (Ed.), Encyclopedia of Soil Science, 2nd ed., Vol. 1. CRC Press, Boca Raton, Florida, pp. 14–19.
Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.-D., Goldshmitt, M., 2005. Digital soil mapping using artificial neural networks. J. Plant Nutr. Soil Sci. 168, 1–13. http://dx.doi.org/10.1002/jpln.200421414.
Bendix, J., 2004. Geländeklimatologie. Gebrüder Borntraeger, Berlin.
Bergmeir, C., Benítez, J.M., 2012. Neural networks in R using the Stuttgart neural network simulator: RSNNS. J. Stat. Softw. 46 (7), 1–26.
Beucher, A., Österholm, P., Martinkauppi, A., Edén, P., Fröjdö, S., 2013. Artificial neural network for acid sulfate soil mapping: application to the Sirppujoki River catchment area, southwestern Finland. J. Geochem. Explor. 125, 46–55. http://dx.doi.org/10.1016/j.gexplo.2012.11.002.
Beucher, A., Fröjdö, S., Österholm, P., Martinkauppi, A., Edén, P., 2014. Fuzzy logic for acid sulfate soil mapping: application to the southern part of the Finnish coastal areas. Geoderma 226–227, 21–30. http://dx.doi.org/10.1016/j.geoderma.2014.03.004.
Beucher, A., Siemssen, R., Fröjdö, S., Österholm, P., Martinkauppi, A., Edén, P., 2015. Artificial neural network for mapping and characterization of acid sulfate soil: application to Sirppujoki River catchment, southwestern Finland. Geoderma 247–248, 38–50. http://dx.doi.org/10.1016/j.geoderma.2014.11.031.
Bonham-Carter, G.F., 1994. Geographic Information Systems for Geoscientists: Modeling with GIS. Computer Methods in the Geosciences Vol. 13. Pergamon, Oxford (398 pp.).
Boruvka, L., Penizek, V., 2007. A test of an artificial neural network allocation procedure using the Czech soil survey of agricultural land data. In: Lagacherie, P., McBratney, A.B., Voltz, M. (Eds.), Digital Soil Mapping: An Introductory Perspective. Developments in Soil Science Vol. 31. Elsevier, Amsterdam, pp. 415–424.
Bou Kheir, R., Greve, M.H., Bøcher, P.K., Greve, M.B., Larsen, R., McCloy, K., 2010. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models: the case study of Denmark. J. Environ. Manag. 91 (5), 1150–1160. http://dx.doi.org/10.1016/j.jenvman.2010.01.001.
Böhner, J., Antonić, O., 2009. Land surface parameters specific to topo-climatology. In: Hengl, T., Reuter, H.I. (Eds.), Geomorphometry: Concepts, Software, Applications. Elsevier, New York, pp. 195–226.
Böhner, J., Köthe, R., Conrad, O., Gross, J., Ringeler, A., Selige, T., 2002. Soil regionalization by means of terrain analysis and process parameterization. In: Micheli, E., Nachtergaele, F., Montanarella, L. (Eds.), Soil Classification 2001. Eur. Soil Bur., Res. Rep. No. 7, EUR 20398 EN, Luxembourg, pp. 213–222.
Cavazzi, S., Corstanje, R., Mayr, T., Hannam, J., Fealy, R., 2013. Are fine resolution digital elevation models always the best choice in digital soil mapping? Geoderma 195–196, 111–121. http://dx.doi.org/10.1016/j.geoderma.2012.11.020.
Chagas, C.d.S., Vieira, C.A.O., Filho, E.I.F., 2013. Comparison between artificial neural networks and maximum likelihood classification in digital soil mapping. Rev. Bras. Ciênc. Solo 37 (2), 339–351. http://dx.doi.org/10.1590/S0100-06832013000200005.
Chang, D.H., Islam, S., 2000. Estimation of soil physical properties using remote sensing and artificial neural network. Remote Sens. Environ. 74, 534–544. http://dx.doi.org/10.1109/tgrs.2003.809935.
Danmarks Geologiske Undersøgelse, 1978. Foreløbige geologiske kort (1:25,000) over Danmark. DGU Serie A(3). Danmarks Geologiske Undersøgelse, Denmark.
Danmarks Meteorologiske Institut, 1998. Danmarks Klima 1997. Danmarks Meteorologiske Institut, Copenhagen.
Danner, A., Mølhave, T., Yi, K., Agarwal, P.K., Arge, L., Mitasova, H., 2007. TerraStream: from elevation data to watershed hierarchies. GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, pp. 212–219. http://dx.doi.org/10.1145/1341012.1341049.
ESRI, 2014. ArcGIS Desktop: Release 10.3. Environmental Systems Research Institute, Redlands, CA.
Freeman, T.G., 1991. Calculating catchment-area with divergent flow based on a regular grid. Comput. Geosci. 17 (3), 413–422. http://dx.doi.org/10.1016/0098-3004(91)90048-I.
Gallant, J.C., Dowling, T.I., 2003. A multi-resolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 39 (12), 1347–1359. http://dx.doi.org/10.1029/2002WR001426.
Gershenfeld, N., 1999. The Nature of Mathematical Modelling. Cambridge University Press, Cambridge, p. 356.
Greve, M.H., Christensen, O.F., Greve, M.B., Bou Kheir, R., 2014. Change in peat coverage in Danish cultivated soils during the past 35 years. Soil Sci. 179 (5), 250–257. http://dx.doi.org/10.1097/SS.0000000000000066.
Huang, J., Nhan, T., Wong, V.N.L., Johnston, S.G., Murray, L.R., Triantafilis, J., 2014a. Digital soil mapping of a coastal acid sulfate soil landscape. Soil Res. 52, 327–339. http://dx.doi.org/10.1071/SR13314.
Huang, J., Wong, V.N.L., Triantafilis, J., 2014b. Mapping soil salinity and pH across an estuarine and alluvial plain using electromagnetic and digital elevation model data. Soil Use Manag. 30 (3), 349–402. http://dx.doi.org/10.1111/sum.12122.
IUSS Working Group WRB, 2006. World reference base for soil resources 2006. World Soil Resources Reports No. 103. FAO, Rome.
Jakobsen, B.H., 1988. Accumulation of pyrite and Fe-rich carbonate and phosphate minerals in a lowland moor area. J. Soil Sci. 39, 447–455.
Jenny, H., 1941. Factors of Soil Formation: A System of Quantitative Pedology. McGraw-Hill, New York.
Lentzsch, P., Wieland, R., Wirth, S., 2005. Application of multiple regression and neural network approaches for landscape-scale assessment of soil microbial biomass. Soil Biol. Biochem. 37, 1577–1580. http://dx.doi.org/10.1016/j.soilbio.2005.01.017.
McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. http://dx.doi.org/10.1016/S0016-7061(03)00223-4.
Madsen, H.B., Jensen, N.H., Jakobsen, B.H., Platou, S.W., 1985. A method for identification and mapping potentially acid sulfate soils in Jutland, Denmark. Catena 12, 363–371.
Madsen, H.B., Jensen, N.H., 1988. Potentially acid sulfate soils in relation to landforms and geology. Catena 15, 137–145.
Madsen, H.B., Nørr, A.H., Holst, K.A., 1992. The Danish soil classification. Atlas Over Denmark I Vol. 3. The Royal Danish Geographical Society, Copenhagen.
Minasny, B., McBratney, A.B., 2002. The neuro-m method for fitting neural network parametric pedotransfer functions. Soil Sci. Soc. Am. J. 66, 352–361. http://dx.doi.org/10.2136/sssaj2002.0352.
Nykänen, V., 2008. Radial basis functional link nets used as a prospectivity mapping tool for orogenic gold deposits within the Central Lapland Greenstone Belt, Northern Fennoscandian shield. Nat. Resour. Res. 17 (1), 29–48. http://dx.doi.org/10.1007/s11053-008-9062-0.
Nystrand, M., Österholm, P., 2013. Metal species in a boreal river system affected by acid sulfate soils. J. Appl. Geochem. 31, 133–141. http://dx.doi.org/10.1016/j.apgeochem.2012.12.015.
Österholm, P., Åström, M., 2004. Quantification of current and future leaching of sulfur and metals from boreal acid sulfate soils, western Finland. Aust. J. Soil Res. 42, 547–551.
Porwal, A., Carranza, E.J.M., Hale, M., 2003. Artificial neural networks for mineral potential mapping: a case study from Aravalli Province, Western India. Nat. Resour. Res. 12 (3), 155–171. http://dx.doi.org/10.1023/A:1025171803637.
Roos, M., Åström, M., 2005. Hydrochemistry of rivers in an acid sulphate soil hotspot area in western Finland. Agric. Food Sci. 14, 24–33. http://dx.doi.org/10.2137/1459606054224075.
SAGA GIS. System for Automated Geoscientific Analyses. http://www.saga-gis.org.
Stjernholm, M., Kjeldgaard, A., 2004. CORINE Landcover Update in Denmark: Final Report. National Environment Research Institute (NERI), Denmark.
Silveira, C.T., Oka-Fiori, C., Santos, L.J.S., Sirtoli, A.E., Silva, C.R., Botelho, M.F., 2013. Soil prediction using artificial neural networks and topographic attributes. Geoderma 195–196, 165–172. http://dx.doi.org/10.1016/j.geoderma.2012.11.016.
Sohlenius, G., Öborn, I., 2004. Geochemistry and partitioning of trace metals in acid sulphate soils in Sweden and Finland before and after sulphide oxidation. Geoderma 122, 167–175. http://dx.doi.org/10.1016/j.geoderma.2004.01.006.
Suppala, I., Lintinen, P., Vanhala, H., 2005. Geophysical characterising of sulphide rich fine-grained sediments in Seinäjoki area, western Finland. Geol. Surv. Finland Spec. Pap. 38, 61–71.
Toivonen, J., Österholm, P., Fröjdö, S., 2013. Hydrological processes behind annual and decadal-scale variations in the water quality of runoff in Finnish catchments with acid sulfate soils. J. Hydrol. 487, 60–69. http://dx.doi.org/10.1016/j.jhydrol.2013.02.034.
Urbańska, E., Hulisz, P., Bednarek, R., 2012. Effects of sulphide oxidation on selected soil properties. J. Elem. 17 (3), 505–515.
Vanhala, H., Suppala, I., Lintinen, P., 2004. Integrated geophysical study of acid sulphate soil area near Seinäjoki, Southern Finland. Sharing the Earth: EAGE 66th Conference & Exhibition, Paris, France, 7–10 June 2004: Extended Abstracts. EAGE, Houten (4 pp. Optical disc (CD-ROM)).
Viscarra Rossel, R.A., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54. http://dx.doi.org/10.1016/j.geoderma.2009.12.025.
Yli-Halla, M., 1997. Classification of acid sulphate soils of Finland according to soil taxonomy and the FAO/UNESCO legend. Agric. Food Sci. 6, 247–258. http://dx.doi.org/10.1111/j.1475-2743.1999.tb00065.x.
Zell, A., Mamier, G., Vogt, M., Mache, N., Hübner, R., Döring, S., Hermann, K.-U., Soyez, T., Schmalzl, M., Sommer, T., Hatzigeorgiou, A., Posselt, D., Schreiner, T., Kett, B., Clemente, G., Wieland, J., 1998. SNNS Stuttgart Neural Network Simulator User Manual, Version 4.2. IPVR, University of Stuttgart and WSI, University of Tübingen. http://www.ra.cs.uni-tuebingen.de/SNNS/.
Zhu, A.X., 2000. Mapping soil landscape as spatial continua: the neural network approach. Water Resour. Res. 36, 663–677. http://dx.doi.org/10.1029/1999WR900315.