Computers and Electronics in Agriculture 169 (2020) 105172
Extended model prediction of high-resolution soil organic matter over a large area using limited number of field samples
Zhengyong Zhao a, Qi Yang a,⁎, Dongxiao Sun a, Xiaogang Ding b, Fan-Rui Meng c
a Guangxi Key Laboratory of Forest Ecology and Conservation, College of Forestry, Guangxi University, Nanning 530004, China
b Guangdong Academy of Forestry, Guangzhou, Guangdong 510520, China
c Faculty of Forestry and Environmental Management, University of New Brunswick, Fredericton E3B 5A3, Canada
ARTICLE INFO
ABSTRACT
Keywords: Soil organic matter; Artificial neural network; Limited field sample; Extended model; Slope; Linear equation
Detailed soil organic matter (SOM) spatial distribution maps are essential for soil management and forestry operations. However, mapping the spatial distribution of SOM over a large area is challenging, especially in regions where field samples are difficult to obtain. The objective of this research was to develop a two-stage approach to map SOM content at 10 m resolution in Yunfu, South China, which covers an area of 7785 km². In the first stage, 511 artificial neural network (ANN) models were built with 10-fold cross-validation to map SOM content based on 318 field samples from three of the five sub-areas of Yunfu (the ANN model area). Results indicated that the optimal ANN model, which used six DEM-derived variables as inputs (ANN6), performed well in the ANN model area, with a root mean squared error (RMSE) of 5.6 g/kg, an R² of 0.81, and a relative overall accuracy (ROA) ± 10% of 84.1%. It also had the best generalization capability in the remaining two sub-areas of Yunfu (the extended model area), with an RMSE of 7.7 g/kg, an R² of 0.58, and an ROA ± 10% of 60.7%. In the second stage, extended models were developed with reverse k-fold cross-validation to adapt the ANN6-produced SOM content to the field samples in the extended model area. Results indicated that the optimal extended model required only 20% of the 386 field samples (5-fold) to build a stable and significant linear relationship between ANN6-produced and measured SOM content in the extended model area, and that it improved model accuracy by 9–21% in RMSE, 28–29% in R², and 6–21% in ROA ± 10%. Thus, the two-stage method is a viable way to map SOM content over a large area with a limited number of field samples.
⁎ Corresponding author. E-mail address: [email protected] (Q. Yang).
https://doi.org/10.1016/j.compag.2019.105172 Received 15 October 2019; Received in revised form 11 December 2019; Accepted 20 December 2019 0168-1699/ © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).
1. Introduction
Forest soil organic matter (SOM) is a complex mixture of soil components that has major impacts on many soil properties, such as nutrients, pH, cation and anion exchange capacity, and soil structure (Zech et al., 1997; Mao et al., 2014). The amount of SOM is not only a significant indicator of soil quality that sustains forest ecosystem services such as soil fertility, food production and biogeochemical cycles (Wang and Wang, 2007; Bobrovsky et al., 2010; Barré et al., 2017; Pellitier and Zak, 2018), but also plays a key role in the carbon cycling of terrestrial ecosystems and the mitigation of global climate change (Li et al., 2018b; Mazza et al., 2019). Detailed knowledge of the spatial distribution and storage of SOM is important for studying trends in soil degradation, assessing the impacts of climate change, and making forest management plans (Smith, 2004). However, obtaining high-resolution SOM distribution maps over a large area is a global challenge, especially for regions where access to soil data is restricted (Wiesmeier et al., 2011).
The traditional methods of acquiring the spatial distribution of SOM need a large number of systematic or random samples from field surveys, and the SOM data at the sampling points are often interpolated to obtain SOM maps. Various interpolation methods have been used to produce SOM maps, such as kriging, co-kriging and regression kriging, with gradually improved accuracy (Schloeder et al., 2001; Liu et al., 2008; Wu et al., 2009). However, the accuracy of SOM maps based on interpolation depends on the density and size of sampling sites (Heuvelink and Bierkens, 1992; McBratney et al., 2003; Zhao et al., 2009; Dai et al., 2014), the distribution of original data points (McBratney et al., 2000), and the quality of the data, which is affected by the field experience of soil surveyors (Bie and Beckett, 1971; Zhao et al., 2013). As an alternative, various empirical models derived with different statistical methods have been developed to correlate SOM content with predictors such as digital elevation model (DEM) data and remote sensing data (Bogunovic et al., 2018; Olga et al., 2018). These empirical models were based on the assumption that SOM content is closely related to the predictor variables involved in SOM accumulation, and they have proven effective (Bogunovic et al., 2018; Olga et al., 2018). However, the relationships between SOM content and the predictor variables are complicated, strongly non-linear, and sometimes unknown (Lark, 1999). In addition, many predictor variables are correlated with each other, which creates statistical challenges of non-linearity and multicollinearity (Alvarez et al., 2011; Gautam et al., 2011; Poggio et al., 2016). For example, multiple regression assumes linearity and neglects the spatial correlation of soil properties (Guo et al., 2013). Geostatistical methods require the assumption of second-order stationarity that needs all moments to be stable (Webster and Oliver, 2007; Guo et al., 2013), and these assumptions are often not satisfied in areas with complex terrain (Webster, 1985; Qiu et al., 2010; Zhu et al., 2010). In recent years, artificial intelligence models have been increasingly used to represent complex mapping relations between multi-source input variables and SOM content without such assumptions, such as the artificial neural network (ANN) model (Dai et al., 2014; Aghajani et al., 2015), random forest model (Grimm et al., 2008; Qi et al., 2017), and support vector regression model (Ballabio, 2009).
Despite all this progress in soil mapping, it is still difficult to produce high-resolution SOM maps over a large area using the existing methods (Thompson et al., 2006). This is because existing models, even artificial intelligence models, are often constructed and calibrated with field samples collected from a small area, which only reflect the relationship between the local environment and SOM conditions in that area. Thus, these models may have high accuracy in areas with similar environmental conditions, but perform poorly in areas whose biophysical conditions deviate notably from those where the calibration data were collected (McBratney et al., 2003). Meanwhile, collecting detailed field data over a large area is difficult, expensive and time-consuming because soil properties vary spatially across different landscapes (Lagacherie, 2008). As a result, building and calibrating new models to produce high-resolution SOM maps for each sub-area of a large area is also difficult.
Because bedrock and parent material deposits are the main sources of soil particles and nutrient elements, it has long been recognized that soil properties, including SOM, can be greatly affected by bedrock and surficial geology at a landscape or regional level (Brady and Weil, 2008). At a local level, soil properties are often modified by hydrological processes associated with local topography (Guo et al., 2013; Aghajani et al., 2015; Bogunovic et al., 2018). This is because water movements along topographical gradients preferentially transfer soil components such as SOM, nutrient elements, and fine soil particles, which in turn affect soil properties (Arp, 2005). Research has shown that average soil properties are related to geological formations and soil parent materials, which can be captured by existing coarse-resolution soil maps (Zhao et al., 2009), and that the detailed spatial distribution of soil properties can be modeled with a high-resolution DEM at a local level (Beven and Kirkby, 1979; Martz and Garbrecht, 1992; Ren and Liu, 2000; Zhao et al., 2013). Thus, it is possible to extend high-resolution DEM-based SOM models to a large area using existing coarse-resolution soil maps and a limited number of field samples.
The objective of this study was to explore a method for producing high-resolution SOM maps over a large area with limited field samples using a two-stage approach. Specific objectives were: (1) to build and optimize a local SOM model at the first stage, i.e. a DEM-based ANN model, with enough field samples to capture the impacts of local topography on the spatial distribution of SOM at high resolution; (2) to build an extended model at the second stage, based on existing coarse-resolution soil maps and limited field samples and integrated with the local SOM model, to produce high-resolution SOM maps over a large area; and (3) to evaluate the accuracy of the SOM maps predicted by the local ANN model and the extended models.
Fig. 1. Digital elevation model (DEM) data for Yunfu, field soil samples, and the ANN model and extended model areas.
2. Materials and methods
2.1. Study area
The study area is Yunfu city, Guangdong province, China (lat. 22°22′–23°19′N, long. 111°03′–112°31′E). It has a total area of 7,785 km², of which 4,915 km² is forest (Fig. 1). It lies in the subtropical monsoon zone, where high temperatures and abundant precipitation coincide in the same months (May to October); the annual average temperature, rainfall and duration of sunlight are 22.4 °C, 1670 mm and 1684.6 h, respectively (Li et al., 2018; Tang et al., 2019). The elevation ranges from 1.0 to 1320.0 m. Mountains and hilly areas constitute the majority of the landscape (Guangdong Institute of Geography, 1962). Forest types include natural-secondary evergreen broad-leaved forest, coniferous forest, and mixed forest, in which the main tree species are Cunninghamia lanceolata, Acacia spp., Eucalyptus spp. and Pinus massoniana (Guangdong Forestry Survey and Planning Institute, 2014). The major soil type is Udults, which covers 86% of the forest area in Yunfu (Guangdong Soil Survey Team, 1993). As a common soil type in the tropics and subtropics of China, Udults is rich in iron and aluminum as a result of desilication (National Soil Survey Office, 1998; Brady and Weil, 2008). The contents of nitrogen, phosphorus, potassium, and SOM are very low as a result of strong physical and chemical weathering of soil minerals, quick decomposition of organic matter and substantial leaching of nutrients in the field (Guangdong Soil Survey Team, 1993).
2.2. Field sampling and the coarse-resolution soil map
The coarse-resolution soil map used in this study was obtained from the Atlas of Guangdong soil at a 1:2,800,000 scale, compiled by the Guangdong soil survey team (Guangdong Soil Survey Team, 1993), and then transferred to a digital map within a geographic information system. The content of coarse-resolution soil organic matter (CSOM) in the map was delineated at six levels (Level 1: > 40 g/kg; Level 2: 30–40 g/kg; Level 3: 20–30 g/kg; Level 4: 10–20 g/kg; Level 5: 6–10 g/kg; Level 6: < 6 g/kg; National Soil Survey Office, 1998). In the study area, the three most extensive levels are Level 3, Level 4 and Level 2, which cover 41%, 25% and 20% of the total area, respectively. We used the middle value of each level as the specific content of SOM, i.e., 45.0, 35.0, 25.0, 15.0, 8.0, and 3.0 g/kg for Level 1 to Level 6, respectively, which gives an average SOM content of 22.6 g/kg for the study area. Coarse-resolution soil texture (CST) in the map was delineated at nine classes (loose sand, tight sand, sandy loam, light loam, medium loam, heavy loam, light clay, medium clay and heavy clay) based on Katschinski's texture scheme (Katschinski, 1956; Rousseva, 1997); the three most extensive classes are medium loam, light loam and heavy loam, which cover 59%, 18% and 9% of the total study area, respectively. Coarse soil classes (CSC) in the map were delineated at eight classes based on the China Soil Classification System (The National Standards of China, 2009); the three most extensive classes are Lateritic red earths (Typic Kanhapludults), Red earths (Typic Hapludults), and Purplish soils (Typic Eutrochrepts), which cover 64%, 22% and 8% of the total study area, respectively.
Field data for this study were obtained from 704 field samples of forest plots in Yunfu established by Guangdong Academy of Forestry Sciences as part of its forest soil nutrient survey project, which started in 2015 (Li et al., 2018a). A stratified random sampling method was used to allocate 613 plots over the whole area of Yunfu; the stratification was based on the coarse-resolution SOM map so that there were enough samples within the area of each SOM level (Fig. 1). An additional 91 plots were allocated in one sub-area of Yunfu based on topographic characteristics and forest cover types that may affect SOM accumulation. These special samples were derived from 11 groups that represented a gradual change of SOM conditions along a slope profile within various forest types. Soil samples were collected along a soil profile in each forest plot, but only the top layer of soil (0–20 cm) was considered in this study, because SOM is mainly distributed in the top layer, which is most directly influenced by climate, vegetation and land use and in which biological activity and the processes of organic matter accumulation and decomposition are most intense (Maarten et al., 2011). Measured by the potassium dichromate-titration method (SAC, 1988: GB 9834-88), the SOM content of the top-layer soil ranged from 0.4 g/kg to 114.2 g/kg, with an average value of 24.7 g/kg. Soil texture, i.e. the percentages of clay (< 0.002 mm), silt (0.002–0.02 mm), and sand (0.02–2 mm) as defined by the international system of soil particle size grades (ISSS, 1929), was measured by the densimeter method (FIS: LY/T 1225-1999). Measured clay content ranged from 2% to 72%, sand from 10% to 95%, and silt from 3% to 65%.
2.3. Predictor variables for modelling
Besides the existing coarse-resolution soil maps, a total of 15 predictor variables were used for modelling, including nine DEM-derived variables and six kinds of field data (Table 1). The DEM data, obtained from Guangdong Academy of Forestry Sciences, were derived from stereo images of Cartosat-1 (IRS-P5) at 12.5 m resolution (Zhang and Zhang, 2010) and resampled to 10 m resolution to meet the needs of map production. The Spatial Analyst extension and the Forest Hydrology Tools developed for ArcGIS (ESRI 1999-2013; Meng et al., 2006) were used to derive four topographic variables, namely slope, aspect, topographic position index (TPI), and potential solar radiation (PSR), and five hydrological variables, namely soil terrain factor (STF), sediment delivery ratio (SDR), depth to water (DTW), flow length, and flow direction. The TPI is the relative topographic position of a central point, calculated as the difference between the elevation at this point and the mean elevation within a predetermined neighbourhood (Gallant and Wilson, 2000; De Reu et al., 2013). The PSR is the total solar radiation reaching the earth's surface based on solar angle, slope, slope aspect and three atmospheric factors that affect the diffusion and direction of solar radiation (Meng et al., 2006). The STF is a modified version of the hydrological similarity index (Ambroise et al., 1996). The SDR is the ratio of the sediment transported to the outlet to the total erosion in the watershed area (Ferro and Minacapilli, 1995). The DTW is defined as the elevation difference between the land surface and the nearest water surface (Meng et al., 1997; Murphy et al., 2007). Flow length is the maximum ground distance along the flow direction projected onto the horizontal plane (Akhtar et al., 2009), and flow direction is the direction of flow, one of the keys to obtaining surface hydrological features (Greenlee, 1987; Jenson and Domingue, 1988).
Besides field clay and sand contents, forest cover data collected in the field were also used in this study. Four forest type classification schemes, from simple to complex, were used: the first scheme (FT1) contained two forest types, plantation and natural forest; the second scheme (FT2) contained three forest types, broad-leaved, coniferous and mixed forest; and the third and fourth schemes (FT3 and FT4) contained 8 and 27 forest types, respectively, based on more detailed criteria (Table 1).
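The topographic position index described in Section 2.3 (the elevation of a cell minus the mean elevation of its neighbourhood) can be illustrated with a minimal sketch. The study derived TPI with ArcGIS tools; the function name, the square window, the `radius` parameter, the edge handling and the exclusion of the centre cell from the mean are all assumptions made here for illustration, and the small grid is invented.

```python
def topographic_position_index(dem, radius=1):
    """TPI of each cell: elevation minus mean elevation of its neighbourhood.

    `dem` is a list of rows of elevations (m) with at least two cells.
    `radius` is a hypothetical window half-width in cells. Edge cells use
    the part of the window that falls inside the grid, and the centre cell
    is excluded from the neighbourhood mean (an assumption of this sketch).
    """
    n_rows, n_cols = len(dem), len(dem[0])
    tpi = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = [
                dem[i][j]
                for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
                for j in range(max(0, c - radius), min(n_cols, c + radius + 1))
                if (i, j) != (r, c)
            ]
            tpi[r][c] = dem[r][c] - sum(neighbours) / len(neighbours)
    return tpi


# Invented 3 x 3 elevation grid (m): a single high cell surrounded by flat ground.
dem = [
    [10.0, 10.0, 10.0],
    [10.0, 18.0, 10.0],
    [10.0, 10.0, 10.0],
]
tpi = topographic_position_index(dem)
```

In this toy grid the centre cell gets a positive TPI (it sits above its surroundings) and the corner cells get negative values, which is the sign convention implied by the definition in the text.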
2.4. The local SOM model
The objective of the first modelling stage was to build a local SOM model that captures the impacts of topography and vegetation on SOM content. The optimal model should not only perform well but also have a relatively strong generalization capability.
2.4.1. Artificial neural network model
Back-propagation ANNs were used in this stage because they can accommodate nonlinearity when limited discontinuous points are available between the input and output data (Li, 1998). The ANN model has a three-layer structure: an input layer, a hidden layer, and an output layer. The input layer contains the independent variables (input layer nodes) for model predictions, and the output layer consists of the predicted dependent variables (output layer nodes). The hidden layer, which connects the input layer and the output layer, determines the complexity of the model and realizes the nonlinear mapping. The ANN model was trained with a back-propagation technique, which adjusted the weight and bias values along the negative gradient to minimize the mean squared error (MSE) between the network outputs (predicted values) and the targeted values (reference values) (Sigillito and Hutton, 1990). An early stopping method could be used to quickly confirm the fittest model
coefficients by avoiding over-fitting; over-fitting decreases prediction accuracy outside the training data, so avoiding it improves the generalizability of the ANN model (The MathWorks Inc., 1984-2012). In applying this method, a training dataset was used to calculate the gradient and update the network weights and biases. Another dataset, the testing dataset, was used to monitor the training process to prevent over-fitting. If the training MSE decreased while the testing MSE increased, the training of the ANN model was stopped under the assumption that the fittest model coefficients had been obtained. In this study, the Levenberg-Marquardt algorithm (Fun, 1996) was used to train the ANN models. The output layer contained one node, SOM content. The number of hidden layer nodes varied from 5 to 40 based on previous studies (Zhao et al., 2009). The input layer was made up of a number of predictors (candidate inputs), as follows.
Table 1 Predictor variables used in the study.
| Variable | Abbr. | Description | Range/classes | Reference |
| --- | --- | --- | --- | --- |
| Existing coarse soil map | | | | Guangdong Soil Survey Team, 1993 |
| Coarse SOM | CSOM | The content of SOM in a 1:2,800,000 scale soil map | 6 levels | |
| Coarse soil classes | CSC | Soil type based on China Soil Classification System in a 1:2,800,000 map | 9 classes | |
| Coarse soil texture | CST | Soil texture based on Katschinski's texture schemes in a 1:2,800,000 map | 9 classes | |
| DEM-derived variables | | | | |
| Topographic position index | TPI | The relative topographic position | 6 classes | De Reu et al. 2013 |
| Slope | slope | Slope gradient (degrees) | 0.0–68.0 | ESRI 1999-2013 |
| Aspect | aspect | Direction of the steepest slope from the north | 10 classes | ESRI 1999-2013 |
| Potential solar radiation | PSR | Potential incoming solar radiation calculated for a single year (kWh m⁻²) | 484–1888 | Meng et al. 2006 |
| Soil terrain factor | STF | Modified hydrological similarity index | 3.6–34.1 | Ambroise et al. 1996 |
| Sediment delivery ratio | SDR | Efficiency of sediment transport in the watershed (%) | 0–100 | Ferro and Minacapilli (1995) |
| Vertical slope position | DTW | Elevation difference between the upland and water surfaces (m) | 0.0–357.5 | Meng et al. 1997 |
| Flow direction | FD | The steepest descent direction of each pixel | 8 classes | Jenson and Domingue (1988) |
| Flow length | FL | The distance from any point in the river basin to the basin outlet (m) | 0.0–2674.1 | Akhtar et al. 2009 |
| Field data | | | | Li et al. 2018 |
| Clay content | clay | Measured clay content for 0–20 cm soil depth (%) | | |
| Sand content | sand | Measured sand content for 0–20 cm soil depth (%) | | |
| Forest type 1 | FT1 | Forest cover data divided into plantation and natural forest | 2 classes | |
| Forest type 2 | FT2 | Forest cover data divided into broad-leaved, coniferous and mixed forest | 3 classes | |
| Forest type 3 | FT3 | Forest cover data divided into evergreen coniferous/broad-leaved, and mixed forest, shrub, bamboo, and other forest | 6 classes | |
| Forest type 4 | FT4 | Forest cover data divided based on dominant species (group) | 27 classes | |
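The early-stopping rule described in Section 2.4.1 (stop when the training MSE keeps decreasing but the testing MSE starts to increase) can be sketched as a check on two error curves. This is only an illustration of the rule as worded in the text: the study trained its networks in MATLAB with the Levenberg-Marquardt algorithm, whereas the function name and the two MSE curves below are invented, and practical implementations often add a patience setting that this sketch leaves out.

```python
def early_stopping_epoch(train_mse, test_mse):
    """Return the index of the last epoch before the stopping rule fires.

    The rule follows the text: stop when the training MSE still decreases
    while the testing MSE increases. If the rule never fires, the index of
    the final epoch is returned.
    """
    for epoch in range(1, len(train_mse)):
        train_improved = train_mse[epoch] < train_mse[epoch - 1]
        test_worsened = test_mse[epoch] > test_mse[epoch - 1]
        if train_improved and test_worsened:
            return epoch - 1  # keep the coefficients from the previous epoch
    return len(train_mse) - 1


# Invented error curves (not results from the study).
train_curve = [40.0, 31.0, 25.0, 21.0, 18.0, 16.0]
test_curve = [42.0, 35.0, 30.0, 29.0, 31.0, 34.0]
best_epoch = early_stopping_epoch(train_curve, test_curve)
```

With these curves the testing MSE first rises at epoch 4 while the training MSE is still falling, so the coefficients from epoch 3 would be kept.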
2.4.2. Selection of model inputs
The model inputs were composed of the CSOM data and DEM-generated topo-hydrologic variables, because the average differences in SOM content at large scales could be captured by the CSOM map, while SOM content at local scales could be modified by the topographic and hydrological processes represented by DEM-derived variables. Each candidate combination of model inputs corresponded to one ANN model. The CSOM data were present in every combination, while the number of DEM-derived topo-hydrologic variables increased gradually from one to nine across combinations. After that, each of the six kinds of field data (Table 1), i.e. clay content, sand content, and FT1-4, was added in turn as an extra model input, mainly to investigate the impacts of soil texture and forest cover on the spatial distribution of SOM and to test their potential to improve the prediction accuracy of the SOM models.
Table 2 Selection of DEM-derived variables for predicting SOM by ANN models and model accuracy assessed using 10-fold cross-validation.
| Model | Number of combinations | RMSE (g/kg) | R²* | ROA ± 10% (%) | The best combined variables** |
| --- | --- | --- | --- | --- | --- |
| ANN1 | C(9,1) = 9 | 10.2 | 0.32 | 54.1 | slope |
| ANN2 | C(9,2) = 36 | 7.7 | 0.61 | 60.0 | slope, DTW |
| ANN3 | C(9,3) = 84 | 7.4 | 0.67 | 69.2 | slope, DTW, SDR |
| ANN4 | C(9,4) = 126 | 7.0 | 0.71 | 73.9 | slope, TPI, STF, FL |
| ANN5 | C(9,5) = 126 | 6.2 | 0.69 | 81.1 | slope, DTW, TPI, STF, FL |
| ANN6 | C(9,6) = 84 | 5.7 | 0.81 | 84.3 | slope, DTW, TPI, STF, FL, PSR |
| ANN7 | C(9,7) = 36 | 6.0 | 0.79 | 85.3 | slope, DTW, TPI, STF, FD, SDR, PSR |
| ANN8 | C(9,8) = 9 | 5.0 | 0.85 | 86.2 | slope, DTW, TPI, STF, FL, FD, PSR, SDR |
| ANN9 | C(9,9) = 1 | 5.9 | 0.69 | 86.1 | slope, DTW, TPI, STF, FL, FD, PSR, SDR, aspect |
| ANN6-S | C(9,6) + sand = 84 | 5.3 | 0.83 | 91.5 | slope, DTW, TPI, SDR, FD, aspect + sand |
| ANN6-C | C(9,6) + clay = 84 | 5.9 | 0.72 | 91.2 | slope, DTW, STF, SDR, FD, PSR + clay |
| ANN6-F1 | C(9,6) + FT1 = 84 | 7.1 | 0.74 | 90.0 | slope, DTW, TPI, STF, FL, PSR + FT1 |
| ANN6-F2 | C(9,6) + FT2 = 84 | 8.0 | 0.67 | 92.1 | slope, DTW, TPI, STF, FD, PSR + FT2 |
| ANN6-F3 | C(9,6) + FT3 = 84 | 5.9 | 0.79 | 92.1 | slope, DTW, TPI, STF, FD, SDR + FT3 |
| ANN6-F4 | C(9,6) + FT4 = 84 | 6.9 | 0.78 | 88.7 | slope, DTW, TPI, SDR, FD, PSR + FT4 |
* All coefficients of determination (R²) were significant at P < 0.01 based on an F-test.
** The best combined variables were selected among all combinations of the same size from the DEM-generated variables based on the ROA ± 10% values of the models. For example, C(9,6) = 84 is the number of combinations of nine DEM-derived variables taken six at a time. C(9,6) + sand/clay/FT1/FT2/FT3/FT4 means that sand, clay, FT1, FT2, FT3, or FT4 was present in each of the C(9,6) combinations. TPI: topographic position index; PSR: potential solar radiation; STF: soil terrain factor; SDR: sediment delivery ratio; DTW: depth to water; FL: flow length; FD: flow direction.
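The counts in the "Number of combinations" column of Table 2 follow from choosing i of the nine DEM-derived variables at a time, with CSOM present in every input set. The sketch below enumerates those candidate input sets with Python's standard library. The list `DEM_VARIABLES` and the function name are illustrative names, not identifiers from the study.

```python
from itertools import combinations
from math import comb

# The nine DEM-derived variables listed in Table 1 (abbreviations as in the text).
DEM_VARIABLES = ["slope", "aspect", "TPI", "PSR", "STF", "SDR", "DTW", "FL", "FD"]


def candidate_input_sets(variables):
    """Yield every non-empty subset of `variables`, each prefixed with CSOM."""
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            yield ("CSOM",) + subset


# Number of combinations for each subset size, C(9, i) for i = 1..9.
counts_by_size = {i: comb(len(DEM_VARIABLES), i) for i in range(1, 10)}
all_sets = list(candidate_input_sets(DEM_VARIABLES))
total_models = len(all_sets)
```

Summing the nine counts gives 2⁹ − 1 = 511 candidate input sets, which matches the 511 ANN models reported in the text.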
2.4.3. First-stage model calibration and validation
A total of 318 field samples (227 random samples + 91 special samples) from three of the five sub-areas of Yunfu (the ANN model area in Fig. 1; an area of 3,929 km²) were used to build the ANN models and evaluate model performance by 10-fold cross-validation. In 10-fold cross-validation, the entire dataset (318 field samples) was divided into 10 equal subsets; a model was built with the data in 9 subsets as calibration data and validated with the remaining subset as validation data, and this process was repeated 10 times until every subset had been used as the validation set. The 318 soil profiles were divided by stratified random selection after the dataset was stratified according to CSOM level, which ensured that the number of samples from each CSOM level in a subset was proportional to the number in the whole dataset. Following the requirement of building an ANN model with the early stopping method, the calibration data (9 subsets) were subdivided into a training dataset (80% of the calibration dataset) and a testing dataset (20% of the calibration dataset) within each model-building process (each "fold"), which was repeated 1000 times to obtain the optimal prediction model for that fold. A total of 386 field samples from the remaining two sub-areas of Yunfu (the extended model area in Fig. 1; an area of 3,857 km²) were used as extra independent validation data to examine the performance of the ANN model outside the area where it was built.
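The data partitioning described in Section 2.4.3 (ten folds stratified by CSOM level, with each calibration set further split 80/20 into training and testing data) can be sketched as follows. The study's own implementation was written in MATLAB and is not given in the text, so the function names, the round-robin assignment, the seeds, and the per-level sample counts below are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_folds(strata, k=10, seed=0):
    """Assign sample indices to k folds so every stratum is spread evenly."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for index, level in enumerate(strata):
        by_stratum[level].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for level in sorted(by_stratum):
        members = by_stratum[level]
        rng.shuffle(members)
        for index in members:  # deal the shuffled samples round-robin
            folds[position % k].append(index)
            position += 1
    return folds


def split_calibration(calibration, train_fraction=0.8, seed=0):
    """Randomly split calibration indices into training and testing parts."""
    rng = random.Random(seed)
    shuffled = calibration[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical CSOM levels for 318 samples (the per-level counts are invented).
levels = [2] * 64 + [3] * 130 + [4] * 80 + [5] * 44
folds = stratified_folds(levels, k=10)
splits = []
for held_out in range(10):
    validation = folds[held_out]
    calibration = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    training, testing = split_calibration(calibration, seed=held_out)
    splits.append((training, testing, validation))
```

Each entry of `splits` holds the training, testing and validation indices for one fold; in the study an ANN would be trained on the first, monitored on the second for early stopping, and scored on the third.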
2.5. The extended model
The objective of the second modelling stage was to build an extended model that adapts the SOM content produced by the local ANN model to capture the geological differences over a large area (the extended model area in Fig. 1). Besides performing well, the optimal extended model should use the fewest field samples, because this is what makes it possible to extend the local model-produced SOM over a large area with a limited number of field samples.
2.5.1. The extended model
Linear models were developed at this stage because a linear model requires the fewest field samples. Each extended model is composed of a set of linear equations, and each linear equation corresponds to a specific soil property condition. Each linear equation has the form of Eq. (1):
$$\mathrm{SOM}'_{\mathrm{extended}} = a' + b'\,\mathrm{SOM}_{\mathrm{ANN}} \tag{1}$$
where $\mathrm{SOM}_{\mathrm{ANN}}$ is the initial SOM content produced by the newly built ANN model, and $a'$, $b'$ and $\mathrm{SOM}'_{\mathrm{extended}}$ correspond to a specific soil property condition (sub-area) of the extended model area. $a'$ is the shifting parameter, which describes the average difference in SOM content between the ANN model area and the extended model area. $b'$ is the stretching parameter, which describes the rate of change of SOM content between the ANN model area and the extended model area. $\mathrm{SOM}'_{\mathrm{extended}}$ is the adapted SOM content. In this study, various attributes of the coarse-resolution soil map were used as criteria to divide the entire extended model area into sub-areas. The dividing criteria included CSOM, CST, and CSC. In addition, five kinds of field data, namely FT1-4 and measured soil texture classes (MST) determined from measured clay and sand contents with the texture classification of the International Society of Soil Science (ISSS, 1929), were also used as dividing criteria to test their potential for building the extended model. Each dividing criterion, with its set of soil property conditions, corresponds to an extended model with a set of linear equations.
2.5.2. Second-stage model calibration and validation
The 386 random field samples from the extended model area, with a density of 0.1 samples per km², were used to build the extended model and evaluate model performance by reverse k-fold cross-validation. In reverse k-fold cross-validation, the entire dataset (386 field samples) was divided into k equal subsets; a model was built with the data in one subset as calibration data and validated with the remaining k − 1 subsets as validation data, and this process was repeated k times until every subset had been used as the calibration set. The 386 field samples were also divided by stratified random selection after the dataset was stratified according to each dividing criterion. Within each model-building process (each "fold"), the calibration data were used to estimate parameters $a'$ and $b'$ with the regression analysis tool (The MathWorks Inc., 1984-2012). Only linear equations that passed P < 0.05 in the F and t tests for the significance of the Pearson product-moment correlation coefficient (Moore, 2006) in all folds were kept.
2.6. Accuracy assessment and model selection
Three criteria of model accuracy derived from 10-fold cross-validation were used to screen ANN models. Root mean squared error (RMSE) is a frequently used measure of accuracy that serves to compare the prediction errors of different models for a particular dataset (Hyndman and Koehler, 2006). The coefficient of determination (R²) is the proportion of variation explained by each ANN model. Relative overall accuracy (ROA) assesses the relative accuracy of model predictions: a prediction was considered relatively accurate if the predicted SOM content was within a certain percentage of the measured SOM content (Zhao et al., 2009). For example, ROA ± 10% was calculated by counting all predictions within a ± 10% range of the measured SOM content. Successful models have higher values of ROA and R², and lower values of RMSE.
Adding more model inputs (variables) can increase the uncertainty of model predictions, because each DEM-generated variable carries its own precision and accuracy limits, and this uncertainty can be aggravated by the variation of model inputs when the ANN models are applied outside the area where they were built. Therefore, the optimal ANN model in this study should include as many DEM-generated variables as possible, up to the point where an additional variable no longer significantly improves model performance. When screening an extended model, the number of linear equations that compose the extended model, the cover area (%), calculated by dividing the sum of the areas affected by all linear equations by the entire extended model area, and the value of k were also considered in addition to the above three criteria of prediction accuracy. The theoretical number of linear equations was determined by the number of levels/classes of the dividing criterion, such as the six CSOM levels, before excluding the classes that were present in less than 5% of the total field samples. The practical number of linear equations was the number of equations that passed P < 0.05 in the F and t tests for significance. If the practical number equalled the theoretical number, a significant linear relationship between ANN-produced SOM content and measured SOM content existed in all field samples; if not, the relationship existed in only part of the field samples. A smaller number of linear equations reduced the extended model's complexity and enhanced the stability of the linear equations. A high cover area indicated high efficiency of the extended model. A larger k, which means that only one of k subsets was used to build the extended model, indicated that fewer soil samples were required to build it.
Because the optimal extended model should use a limited number of field samples, an extended model with relatively high accuracy, fewer linear equations, a high cover area and a larger k was considered to be a better model in this study. Model building, accuracy assessment and model screening at both stages were programmed in MATLAB.
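The second-stage procedure, fitting the linear adaptation of Eq. (1) on one fold and scoring it on the remaining k − 1 folds with RMSE, R² and ROA ± 10%, can be sketched for a single sub-area as follows. This is a simplified illustration rather than the study's MATLAB implementation: it omits the stratification by dividing criteria and the F and t significance tests, the function names are invented, and the data are synthetic and noise-free.

```python
import random
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rmse(measured, predicted):
    """Root mean squared error, in the unit of the inputs (here g/kg)."""
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def r_squared(measured, predicted):
    """Proportion of the variation in `measured` explained by `predicted`."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def roa(measured, predicted, tolerance=0.10):
    """Percentage of predictions within ±tolerance of the measured value."""
    hits = sum(abs(p - m) <= tolerance * abs(m) for m, p in zip(measured, predicted))
    return 100.0 * hits / len(measured)


def reverse_k_fold(som_ann, som_measured, k=5, seed=0):
    """Calibrate Eq. (1) on one fold and validate on the other k - 1 folds."""
    indices = list(range(len(som_ann)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    results = []
    for fold in folds:
        calibration = set(fold)
        a, b = fit_line([som_ann[i] for i in fold], [som_measured[i] for i in fold])
        validation = [i for i in indices if i not in calibration]
        measured = [som_measured[i] for i in validation]
        adapted = [a + b * som_ann[i] for i in validation]  # Eq. (1)
        results.append({"a": a, "b": b, "rmse": rmse(measured, adapted),
                        "r2": r_squared(measured, adapted),
                        "roa10": roa(measured, adapted)})
    return results


# Synthetic, noise-free data (not from the study): measured = 3 + 1.2 * ANN output.
som_ann = [10.0 + 0.5 * i for i in range(50)]
som_measured = [3.0 + 1.2 * x for x in som_ann]
results = reverse_k_fold(som_ann, som_measured, k=5)
```

With k = 5, each equation is calibrated on 20% of the samples and checked against the other 80%, mirroring the 5-fold setting highlighted in the abstract. On real data the fitted a' and b' would vary between folds, and the spread of those values is one way to judge how stable the linear relationship is.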
Fig. 2. Prediction accuracy of 511 ANN models with various combinations of DEM-derived variables (a) and after re-ranking from high to low within combinations with the same number of variables taken (b), of DEM-derived variables + clay/sand (c), and of DEM-derived variables + FT1/FT2/FT3/FT4 (d). $C_9^i$ denotes the ANNs using the combinations of nine DEM-derived variables taken i at a time as inputs. $C_9^1 + C_9^2 + C_9^3 + C_9^4 + C_9^5 + C_9^6 + C_9^7 + C_9^8 + C_9^9 = 511$.
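The count of 511 models follows from enumerating every non-empty subset of the nine DEM-derived variables. The short Python sketch below checks that arithmetic; the variable names are illustrative only and are not taken from the study.

```python
from math import comb

# Nine candidate DEM-derived variables; each ANN uses one non-empty subset.
N_VARIABLES = 9

# Number of input combinations when i variables are taken at a time.
counts = {i: comb(N_VARIABLES, i) for i in range(1, N_VARIABLES + 1)}

# Sum over i = 1..9, which equals 2**9 - 1.
total_models = sum(counts.values())

print(counts)
print(total_models)
```

The same total can be obtained as 2^9 − 1, since only the empty subset is excluded.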
3. Results and discussion
variation, followed by DTW, which increased the explained variation by 29%. Among the optimal models in Table 2, except for the model with four DEM-derived variables, slope and DTW were always included in the model inputs, followed by TPI and STF. These results were expected, since topography and the associated water movement strongly affect the spatial distribution of SOM (Brady and Weil, 2008), and they are consistent with similar studies (Guo et al., 2009; Zhao et al., 2009). As shown in Fig. 2c, adding field sand and clay content improved model prediction accuracy: average ROA ± 10% values increased by 5.84% for clay and 5.71% for sand across the 511 ANNs. The sand-based and clay-based optimal models in Table 2 generated almost identical ROA ± 10% (91.5% vs. 91.2%) but somewhat different RMSE (5.3 g/kg vs. 5.9 g/kg) and R2 (0.83 vs. 0.72). The sand-based optimal model produced slightly better results than the clay-based model, although the differences were insignificant. This is contrary to our expectation that clay content should play a more important role in affecting SOM, because clay content strongly controls water and nutrient holding capacity and drainage conditions (Brady and Weil, 2008) and is correlated with SOM content (Six et al., 2002; Hong et al., 2019). This result could be explained by the fact that the topsoil (0–20 cm) in Yunfu is undergoing desertification due to a loss of clay from the topsoil, with sandy soils covering 61.6% of the total area (Zhong et al., 1998). As shown in Fig. 2d, adding field forest cover data could also improve model performance: average ROA ± 10% values increased by 5.05%, 5.41%, 6.54%, and 3.95% for FT1-4, respectively, within all ANNs that used six DEM-derived variables together with field forest cover data as model inputs. We expected that the more detailed the forest cover information, the greater the improvement in model accuracy. However, the optimal model based on field forest cover data
3.1. The local SOM model

Using 10-fold cross-validation, the prediction accuracies of 511 ANNs with various combinations of DEM-derived variables are shown in Fig. 2a. The figure indicates that model performance generally improved significantly as the number of DEM-derived variables increased. After re-ranking the accuracies from high to low within the combinations with the same number of variables (Fig. 2b), the best variable combinations were identified and their accuracies are listed in Table 2. The results indicated that the improvement in model performance became limited once the number of predictor variables exceeded a certain limit. For example, gradually adding DEM-derived variables from one to six significantly improved the performance of the ANN model, and ROA ± 10% values increased from 54.1% to 60.0%, 69.2%, 73.9%, 81.1%, and 84.3%, respectively. However, adding the seventh to ninth DEM-derived variables caused only a small increase in ROA ± 10% (less than +1.9%). RMSE and R2 improvements reflected the same trend. This was not a surprise because additional variables were introduced in order of their importance to SOM accumulation. Under the selection scheme, the optimal ANN model should include as many DEM-derived variables as possible until adding a further variable no longer significantly improves model performance (less than +2% of ROA ± 10% in this study); on this basis, the model in Table 2 with six DEM-derived variables was the optimal model for predicting SOM content. The DEM-derived variables selected for the optimal models in Table 2 also indicated that slope was the most important variable for predicting SOM content, directly explaining 32% of the total
Fig. 3. Measured SOM (Y axis) vs. SOM predicted by ANN models (X axis) using the best combined variables from the $C_9^1$ (a), $C_9^6$ (b), $C_9^8$ (c), $C_9^6$ + sand (d), $C_9^6$ + clay (e), $C_9^6$ + FT1 (f), $C_9^6$ + FT2 (g), $C_9^6$ + FT3 (h), and $C_9^6$ + FT4 (i) combinations. $C_9^i$ denotes the ANNs using the combinations of nine DEM-derived variables taken i at a time as inputs. $C_9^6$ + sand/clay/FT1/FT2/FT3/FT4 means that sand, clay, FT1, FT2, FT3, or FT4 was added to the best $C_9^6$ combination.
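The input-selection rule described in Section 3.1 (keep adding DEM-derived variables until the gain in ROA ± 10% falls below 2%) can be sketched in Python as follows. This is only an illustration of the stopping rule, not the authors' code. The first six ROA values are those reported in the text; the values for seven to nine inputs are hypothetical placeholders, because the text states only that each further gain was below 1.9%. The function name `select_n_variables` is likewise an assumption.

```python
def select_n_variables(roa_by_n, min_gain=2.0):
    """Return how many inputs to keep: stop at the first gain below min_gain.

    roa_by_n[i] is the ROA +/- 10% (%) of the best model with i + 1 inputs.
    """
    selected = 1
    for n in range(1, len(roa_by_n)):
        gain = roa_by_n[n] - roa_by_n[n - 1]
        if gain < min_gain:
            break  # adding this variable no longer helps enough
        selected = n + 1
    return selected


# ROA +/- 10% (%) for the best models with 1..6 inputs, as reported in the text.
reported = [54.1, 60.0, 69.2, 73.9, 81.1, 84.3]
# HYPOTHETICAL values for 7..9 inputs (illustration only, not from the study).
hypothetical_tail = [85.0, 85.9, 86.2]

best_n = select_n_variables(reported + hypothetical_tail)
print(best_n)
```

With these inputs the rule stops at six variables, matching the choice of ANN6; any tail whose first increment is below 2% would give the same answer.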
in Table 2 was the model using FT3 data, not the model using FT4 data. This result is not unexpected given that the distribution of FT4 classes in the field samples was uneven, whereas it was relatively even for FT3. For FT4, there were a total of 27 classes, but only 3 classes (Cunninghamia lanceolata, Acacia spp., Eucalyptus spp. and Pinus massoniana) were each present in more than 30 field samples, and 14 classes were each present in < 8 field samples. For FT3, there were a total of 6 classes, and only 2 classes were each present in < 30 soil samples. These observations indicated that uneven distributions of forest classes, especially classes with few data, made model prediction unreliable and unstable. Measured SOM vs. SOM predicted by the ANNs using the best variable combinations as model inputs are shown in Fig. 3. The observations were consistent with the analysis of model performance discussed above (Fig. 2 and Table 2). The ANN model using the best combination of six DEM-derived variables was demonstrably better than that using one DEM-derived variable, was roughly equal to that using eight DEM-derived variables, and was slightly weaker than those with additional field soil texture and forest cover data. However, when the ANN1-9 models were used to produce SOM maps in the remaining two of the five sub-areas of Yunfu (extended model area in Fig. 1) and accuracies were assessed with 386 field samples, there was a significant decrease in model performance: compared with the building accuracy, the RMSE of the external validation increased by 1.5–24.0 g/kg, R2 declined by 0.23–0.42, and ROA ± 10% declined by 23.5–35.9%. ANN6 had the highest accuracies, with 7.7 g/kg of RMSE, 0.58 of R2, and 60.7% of ROA ± 10%, which showed that the optimal model in the first modelling stage was not the model with the highest building accuracies (ANN8) but the model that engaged the
maximum DEM-generated variables until an additional variable could not significantly improve model performance (ANN6). Even so, ANN6 showed a weak capability for generalization. This is also not surprising: because ANN6 was built from local field samples (from the ANN model area in Fig. 1), it could reflect relationships between the local environment (variables) and SOM content but could not capture relationships between different environments (variables) and SOM content (McKay et al., 2010).

3.2. The extended model

Using the reverse k-fold cross-validation, the prediction accuracies of the extended models that took CSOM as the dividing criterion to adapt ANN6-produced SOM content are shown in Fig. 4. When k = 1, which means all field samples were used as calibration data to build an extended model, the theoretical number of linear equations equaled the practical number of linear equations, which indicated that a significant linear relationship existed between ANN6-produced SOM content and measured SOM content in all field samples. Moreover, the prediction accuracies of the extended model were clearly better than those of ANN6 (k = 0), with RMSE decreasing by 2.1 g/kg, R2 increasing by 0.16, and ROA ± 10% increasing by 14.5%. When k = 2–5, the significant linear relationship still existed in all field samples, but the gap between calibration accuracy and validation accuracy became wider and wider. Calibration accuracies generally increased substantially and validation accuracies decreased substantially. When k = 6–9, the trends in calibration and validation accuracies remained, but the significant linear relationship
Fig. 5. The reverse 5-fold cross-validation-based accuracies of the extended models that took various data as dividing criteria to adapt SOM content produced by the optimal ANN model with six DEM-derived variables. CSOM: existing coarse-resolution soil organic matter, CSC: existing coarse-resolution soil classes, CST: existing coarse-resolution soil texture classes, FT1-3: surveyed forest type 1–3, MST: measured soil texture classes.
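As a rough illustration of how an extended model is assembled from a dividing criterion, the Python sketch below fits one least-squares line per class (for example, per CSOM level or per MST class) between ANN-produced and measured SOM. It is a minimal sketch under stated assumptions: the function names and the tiny data set are hypothetical, the fit is plain ordinary least squares, and the significance testing (F and t tests) that determines the practical number of linear equations in the study is not included.

```python
from collections import defaultdict


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b, r2).

    Assumes at least two distinct x values.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r2


def fit_extended_model(som_ann, som_measured, classes):
    """Fit one linear equation per class of the dividing criterion."""
    groups = defaultdict(lambda: ([], []))
    for x, y, c in zip(som_ann, som_measured, classes):
        groups[c][0].append(x)
        groups[c][1].append(y)
    return {c: fit_line(xs, ys) for c, (xs, ys) in groups.items()}


# HYPOTHETICAL calibration data (g/kg) for two classes "A" and "B".
model = fit_extended_model(
    som_ann=[10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    som_measured=[12.0, 22.0, 32.0, 5.0, 10.0, 15.0],
    classes=["A", "A", "A", "B", "B", "B"],
)
print(model)
```

Each dictionary entry holds the shifting parameter, the stretching parameter, and the R2 of one linear equation, so the number of entries corresponds to the theoretical number of linear equations for that dividing criterion.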
validation stood out as the best model among all the extended models that took CSOM as the dividing criterion. When the number of folds k was more than 5, the calibration samples were too few to build a stable model. Conversely, when k was smaller, too many soil samples were required, which would increase the cost of field sampling. Using various data as dividing criteria, the prediction accuracies of the extended models built with the reverse 5-fold cross-validation are shown in Fig. 5. Compared with the extended model based on CSOM, the extended models based on CSC and CST data showed slightly lower accuracies in terms of RMSE and R2, and a smaller practical number of linear equations and cover area. This result demonstrated that the average SOM differences at large scales were better captured by CSOM than by CSC or CST data. As also shown in Fig. 5, field FT1-3 data were successfully used as dividing criteria to build the extended model. The results indicated that the more detailed the field forest cover information, the better the extended model's accuracy, but the worse the model's reliability and stability. Indeed, field FT4 data failed to build the extended model because of the dispersed and uneven distribution of dominant species (group) classes in the field samples, which was consistent with the analysis of the newly built ANN models. Compared with the extended model based on CST data, the extended model based on MST data showed much better accuracy in Fig. 5. This is likely the result of the more accurate soil texture information in MST than in CST. Considering model accuracy alone, field forest cover and soil texture information showed great potential for building the extended models, among which the model based on MST data had the best performance. But considering the combination of model accuracy and model reliability and stability (as indicated by the theoretical
Fig. 4. The reverse k-fold cross-validation-based accuracies of the extended models that took existing coarse-resolution soil organic matter (CSOM) data as the dividing criterion to adapt SOM content produced by the optimal ANN model with six DEM-derived variables. 0-fold means the accuracy of the optimal ANN model checked with 386 soil samples from the extended model area. The reverse k-fold means that the entire dataset (386 soil samples) was divided into k equal subsets; an extended model was built with the data in one subset as calibration data and validated with the remaining k−1 subsets as validation data.
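The reverse k-fold scheme defined in the caption of Fig. 4 differs from ordinary k-fold cross-validation only in which part is used for calibration. A minimal Python sketch of the splitting step is given below. It assumes simple random splitting, whereas the study describes stratified random division of the field samples; the function name `reverse_kfold` and the `seed` parameter are illustrative.

```python
import random


def reverse_kfold(n_samples, k, seed=0):
    """Yield (calibration, validation) index lists.

    One fold calibrates the model and the other k - 1 folds validate it,
    which is the reverse of ordinary k-fold cross-validation.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i, calib in enumerate(folds):
        valid = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield calib, valid


# 386 field samples, reverse 5-fold: about 20% calibrate, about 80% validate.
splits = list(reverse_kfold(386, 5))
for calib, valid in splits:
    print(len(calib), len(valid))
```

With k = 1 the single fold contains every sample and the validation list is empty, which matches the description of k = 1 as using all field samples for calibration.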
only existed in part of the field samples, because the theoretical number of linear equations was greater than the practical number of linear equations. The results indicated that the larger the number of folds k, the more unstable the extended model. This was reasonable because the total number of calibration data decreased from 386 (k = 1) to 192 (k = 2), 128 (k = 3), 96 (k = 4), 77 (k = 5), 64 (k = 6), 55 (k = 7), 48 (k = 8), and 43 (k = 9), which gradually made model prediction unreliable and unstable. Furthermore, the calibration data with CSOM = level 4, the level with the fewest field samples among the four CSOM levels, decreased from 72 (k = 1) to 36 (k = 2), 24 (k = 3), 18 (k = 4), 14 (k = 5), 12 (k = 6), 10 (k = 7), 9 (k = 8), and 8 (k = 9), which gradually made it hard, even impossible, to build the linear relationship between ANN6-produced SOM content and the measured SOM content of field samples. Even so, the calibration and validation accuracies of all extended models were still better than those of ANN6 in Fig. 4, which illustrated that the developed extended model can improve the generalization capability of ANN6. Considering the combination of model accuracy, the theoretical and practical numbers of linear equations, cover area, and especially the required field samples (k folds), the extended model with 5-fold cross-
Table 3
The 5-fold results of the extended model that took existing coarse-resolution soil organic matter (CSOM) as the dividing criterion to adapt SOM content produced by the optimal ANN model with six DEM-derived variables, and the associated model accuracies from the reverse 5-fold cross-validation. n: number of samples; a, b: parameters of the linear equation.

Fold  CSOM    Calibration                                      Validation
      level   n    a       b      RMSE    R2*    ROA ± 10%     n     RMSE    R2     ROA ± 10%
                                  (g/kg)         (%)                 (g/kg)         (%)
1st   5       18   5.84    0.79   4.7     0.77   66.7          72    8.2     0.70   61.1
      4       14   14.04   0.60   9.3     0.47   71.4          58    7.3     0.79   56.9
      3       28   −0.09   0.91   4.0     0.83   85.7          115   5.6     0.78   69.6
      2       16   3.28    0.66   4.7     0.75   56.3          65    9.5     0.74   56.9
      Total   76   –       –      5.3     0.72   72.4          310   7.5     0.75   62.6
2nd   5       18   3.21    0.96   7.0     0.73   72.2          72    7.9     0.70   62.5
      4       14   3.44    0.94   4.5     0.89   78.6          58    6.9     0.61   72.4
      3       28   1.00    0.98   5.8     0.81   71.4          115   5.0     0.76   76.5
      2       16   3.93    1.01   5.6     0.77   68.8          65    7.7     0.70   60.0
      Total   76   –       –      5.6     0.81   72.4          310   6.7     0.71   69.0
3rd   5       18   −0.63   0.98   5.6     0.83   72.2          72    6.4     0.76   65.3
      4       14   −1.40   1.10   5.4     0.79   92.9          58    7.8     0.66   75.9
      3       28   0.73    0.92   6.2     0.60   71.4          115   4.9     0.81   76.5
      2       16   0.036   1.03   5.5     0.81   75.0          65    6.6     0.66   70.8
      Total   76   –       –      5.5     0.77   76.3          310   6.2     0.74   72.6
4th   5       18   −0.36   1.07   6.6     0.88   66.7          72    8.0     0.62   75.0
      4       14   10.52   0.78   6.6     0.71   71.4          58    7.5     0.67   53.4
      3       28   −0.80   0.99   4.4     0.75   60.7          115   5.2     0.75   77.4
      2       16   5.81    0.77   5.3     0.54   87.5          65    6.9     0.71   63.1
      Total   76   –       –      5.3     0.81   69.7          310   6.8     0.69   69.4
5th   5       18   −3.26   1.21   5.6     0.88   77.8          72    9.0     0.67   69.4
      4       16   0.66    1.04   6.2     0.75   81.3          56    7.0     0.67   85.7
      3       31   2.70    0.87   4.6     0.74   80.6          112   5.2     0.79   74.1
      2       17   −5.13   1.28   6.5     0.80   70.6          64    7.3     0.67   70.3
      Total   82   –       –      5.3     0.82   78.0          304   7.0     0.71   74.3

* All underlined coefficients of determination (R2) were significant at 0.01 < P < 0.05 and the other values were significant at P < 0.01 based on an F-test.
and practical numbers of linear equations), the extended model based on CSOM was still the optimal model among all the extended models with various data as dividing criteria. The 5-fold process of building the extended model based on CSOM data is listed in Table 3. Within each fold, approximately 20% of the field samples (one subset) from the extended area were used as calibration data and the rest (4 subsets) as validation data, and four linear equations were built to correspond to the four CSOM levels present in the extended area. The table shows that the theoretical number of linear equations was four, based on the CSOM levels, and the practical number of linear equations was also four because, within the 5 folds, all four linear equations passed the significance test at P < 0.05 based on F and t tests, which confirmed the significant linear relationship between ANN6-produced SOM content and measured SOM content in all field samples. As also shown in Table 3, as a result of the stratified random division of the field samples, the parameters of the four linear equations varied between folds, and this variation led to differences in model accuracy between folds. Even so, all extended models from the five folds still improved model performance compared with the ANN6 model: for validation accuracy, the RMSE declined from 7.7 to 6.2–7.5 g/kg, R2 increased from 0.58 to 0.69–0.75, and ROA ± 10% increased from 60.7 to 62.9–74.3%, which indicated that the extended models improved the generalization capability of the ANN6 model.
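To make the use of the Table 3 parameters concrete, the Python sketch below applies Eq. (1) with the first-fold parameters to an ANN6-produced SOM value, selecting the equation by CSOM level. It assumes that the two parameter columns of Table 3 are read as the shifting parameter a' (intercept) and the stretching parameter b' (slope); the function name `adapt_som` and the example input of 20 g/kg are hypothetical.

```python
# (a', b') per CSOM level for the 1st fold, as read from Table 3
# (assumption: first parameter column = intercept, second = slope).
FOLD1_PARAMS = {
    5: (5.84, 0.79),
    4: (14.04, 0.60),
    3: (-0.09, 0.91),
    2: (3.28, 0.66),
}


def adapt_som(som_ann, csom_level, params=FOLD1_PARAMS):
    """Eq. (1): SOM'_extended = a' + b' * SOM_ANN for the given CSOM level."""
    a, b = params[csom_level]
    return a + b * som_ann


# HYPOTHETICAL ANN6 prediction of 20 g/kg placed in each CSOM level in turn.
adapted = {level: adapt_som(20.0, level) for level in (5, 4, 3, 2)}
for level, value in adapted.items():
    print(level, round(value, 2))
```

The same ANN6 value is shifted and stretched differently in each CSOM level, which is how the coarse-resolution map contributes large-scale differences while the ANN supplies the local detail.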
studies (Shu et al., 2005; Chen et al., 2010; Liu et al., 2019) in this study area and adjacent areas, SOM content is very low on low-gradient slopes, where soil erosion is light to moderate, as a result of severe anthropogenic activities. On high-gradient slopes, SOM accumulates readily because of infrequent human exploitation, abundant residues, and frequent activity of soil microorganisms. Furthermore, forest soil and water conservation practices on steep slopes significantly reduce the loss of SOM through soil erosion at the watershed scale. In short, slope steepness played an important role in the distribution of SOM content, which agreed with the analysis of model input selection discussed above. The range of predicted SOM values in Yunfu was 0.0–218.1 g/kg. The mean value of the predicted SOM map (23.9 ± 19.8 g/kg) was slightly higher than that of the CSOM map (22.6 g/kg), a little lower than that of the field samples from Yunfu (24.7 g/kg), and within the range of 20–40 g/kg reported for SOM content in the Udults zone, South China (Guangdong Soil Census Office, 1993; China Soil Survey Office, 1998). Compared with the CSOM map (part of which is shown in Fig. 6e), the ANN6-produced SOM map has more detailed SOM information, but the original boundaries of the CSOM map were still visible in the produced map. This indicated that the CSOM data had a considerable influence on the distribution of predicted SOM. Based on the CSOM map, there were four SOM levels: Level 2, 30–40 g/kg; Level 3, 20–30 g/kg; Level 4, 10–20 g/kg; Level 5, 6–10 g/kg. The range of predicted SOM values in the Level 2 part was 0.0–188.7 g/kg with a mean of 29.5 g/kg, in the Level 3 part 0.0–171.6 g/kg with a mean of 22.5 g/kg, in the Level 4 part 0.0–188.6 g/kg with a mean of 21.7 g/kg, and in the Level 5 part 0.0–172.3 g/kg with a mean of 26.1 g/kg.
These results indicated that the values of SOM in the CSOM map were not accurate, but that the boundaries of the CSOM map, which divided the different SOM levels, affected the distribution of predicted SOM values. Nevertheless, this result supported the hypothesis that the average SOM differences at large scales had been captured by the CSOM map and that SOM at local scales was modified by topography and hydrological processes, which could be
3.3. Predicted SOM maps

The optimal ANN model, ANN6, was used to produce a 10 m-resolution SOM distribution map of Yunfu, China. Because ANN6 was composed of ten ANN models generated during 10-fold cross-validation, the mean of the ten SOM maps was used as the ANN6-produced SOM map (Fig. 6a). As shown, the SOM map was somewhat similar to the slope map (part of which is shown in Fig. 6d), which indicated that SOM was in general positively correlated with slope. Based on other
Fig. 6. Predicted SOM content maps using the optimal ANN model with six DEM-derived variables (a), and the CSOM-based extended models with linear equations from the 1st fold (b) and 3rd fold (c), compared with the slope map (d) and CSOM map (e) in Yunfu, China. SOM: soil organic matter, CSOM: existing coarse-resolution soil organic matter. Others: waterbody and residential area. Values above 50 g/kg are shown as 50 g/kg in these SOM content maps.
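Two simple raster operations underlie the maps in Fig. 6: the ANN6 map is the cell-wise mean of the ten maps produced during 10-fold cross-validation, and values above 50 g/kg are capped for display. The Python sketch below illustrates both steps on tiny hypothetical grids stored as lists of rows; the function names and the example values are assumptions, and a real workflow would operate on full-size rasters with a GIS or array library.

```python
def ensemble_mean(maps):
    """Cell-wise mean of equally shaped 2-D grids (each a list of rows)."""
    n = len(maps)
    return [
        [sum(cells) / n for cells in zip(*rows)]  # same column across maps
        for rows in zip(*maps)                    # same row across maps
    ]


def cap_for_display(grid, cap=50.0):
    """Show every value above cap as cap, as done for the maps in Fig. 6."""
    return [[min(value, cap) for value in row] for row in grid]


# HYPOTHETICAL 2 x 2 SOM grids (g/kg) from two cross-validation models.
maps = [
    [[10.0, 60.0], [20.0, 30.0]],
    [[20.0, 80.0], [20.0, 50.0]],
]
mean_map = ensemble_mean(maps)
display_map = cap_for_display(mean_map)
print(mean_map)
print(display_map)
```

Capping affects only the rendering; accuracy statistics would still be computed from the uncapped mean map.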
modeled with DEM-derived variables. Although the produced SOM map had greater variation than the CSOM map, the relatively low prediction accuracy (7.7 g/kg of RMSE, 0.58 of R2, and 60.7% of ROA ± 10%) indicated that the ANN6 model needed to be improved in the extended area. The optimal extended model, the CSOM-based extended model built with the reverse 5-fold cross-validation, was used to improve the ANN6-produced SOM map based on the model parameters in Table 3. Considering the limited field samples available in actual situations where the extended model would be applied, each fold of the CSOM-based extended model was used on its own to produce an improved SOM map, because only one fifth of the field samples was needed for model building. As previously stated, the variation of the linear equations that composed the extended model led to differences in model prediction between folds. Compared with the ANN6-produced map (Fig. 6a), the SOM content of the 1st fold-produced map (Fig. 6b) changed in the range of −14.1–61.1 g/kg with a mean change of 6.1 ± 5.5 g/kg, and that of the 3rd fold-produced
map (Fig. 6c) changed in the range of −4.6–16.5 g/kg with a mean change of 2.3 ± 1.5 g/kg. These observations showed that the two folds of the extended model only fine-tuned the ANN6-produced SOM map. Even so, the two fine-tunings improved RMSE by 9–21%, R2 by 28–29%, and ROA ± 10% by 6–21%, which showed that the extended models improved the generalizability of the optimal ANN model.

4. Conclusions

A two-stage method was developed to map SOM content at 10 m resolution in Yunfu, China, with an area of 7,785 km2, using a relatively small number of field samples. In the first stage, ANN models were built to predict SOM content based on nine DEM-derived variables using 10-fold cross-validation of 318 field samples (from the ANN model area of Yunfu). Different ANN model structures were tested and we found that the optimal ANN model, ANN6, had the best performance
with RMSE = 5.6 g/kg, R2 = 0.81, and ROA ± 10% = 84.1%. When the local ANN model was directly used to predict SOM content outside of the building area, model prediction accuracy dropped dramatically, with R2 and ROA ± 10% decreasing by 28% and RMSE increasing by 37%. In the second stage, using the reverse k-fold cross-validation, extended models were developed based on the local ANN model and an additional 386 field samples (from the extended model area). Compared with the optimal local ANN model, the optimal extended model, which required only 20% of the field samples (5-fold) to build a stable linear model, improved prediction accuracy by 9–21% of RMSE, 28–29% of R2, and 6–21% of ROA ± 10%. The ANN6-produced SOM map supported the hypothesis that the average SOM differences at large scales had been captured by the CSOM map and that SOM content at local scales could be modeled with DEM-derived variables. The optimal extended model-produced SOM maps showed the variation in model improvement between folds. The results indicated that the two-stage method provides an efficient way to produce SOM content maps over a large area with a limited number of field samples. The method could be used in other areas as well as for mapping other soil properties, such as soil texture and soil nutrient distributions.
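The relative drops quoted for ANN6 outside its building area can be checked from the reported accuracies (RMSE 5.6 to 7.7 g/kg, R2 0.81 to 0.58, ROA ± 10% 84.1% to 60.7%). The short Python sketch below does this arithmetic; the helper name `relative_change` is illustrative.

```python
def relative_change(before, after):
    """Percentage change from before to after, relative to before."""
    return 100.0 * (after - before) / before


# Building accuracy (ANN model area) vs. accuracy in the extended model area.
rmse_change = relative_change(5.6, 7.7)   # positive: RMSE got worse
r2_change = relative_change(0.81, 0.58)   # negative: R2 dropped
roa_change = relative_change(84.1, 60.7)  # negative: ROA +/- 10% dropped

print(rmse_change, r2_change, roa_change)
```

The results are roughly +37.5% for RMSE and about −28% for both R2 and ROA ± 10%, in line with the rounded figures given in the conclusions.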
Bie, S.W., Beckett, P.H.T., 1971. Quality control in soil survey: II. The cost of soil survey. J. Soil Sci. 22, 453–465. Bobrovsky, M., Komarov, A., Mikhailov, A., Khanina, L., 2010. Modelling dynamics of soil organic matter under different historical land-use management techniques in European Russia. Ecol. Model. 221, 953–959. Bogunovic, I., Trevisani, S., Pereira, P., Vukadinovic, V., 2018. Mapping soil organic matter in the Baranja region (Croatia): Geological and anthropic forcing parameters. Sci. Total Environ. 643, 335–345. Brady, N.C., Weil, R.R., 2008. The nature and properties of soils, 14th ed. Pearson Education Inc, Upper Saddle River, NJ. Chen, H., Chen, Z., Chen, Z., 2010. Impact of topography on spatial distribution of organic matters in red eroded soil in south China-A case study at hetian in Changting county. Fujian J. Agric. Sci. 25, 369–373 in Chinese. China Soil Survey Office, 1998. Chinese soil. China Agriculture Press, Beijing, China. Dai, F., Zhou, Q., Lv, Z., Wang, X., Liu, G., 2014. Spatial prediction of soil organic matter content integrating artificial neural network and ordinary kriging in Tibetan Plateau. Ecol. Ind. 45, 184–194. De Reu, Jeroen., Bourgeois, J., Bats, M., Zwertvaegher, A., Gelorini, V., et al., 2013. Application of the topographic position index to heterogeneous landscapes. Geomorphology, vol. 186, pp. 39–49. ESRI Inc. 1999-2013. The Help document, ESRI Inc., Copyright 1999-2013. Ferro, V., Minacapilli, M., 1995. Sediment delivery processes at basin scale. Hydrol. Sci. J. 40, 703–717. FIS (Forestry Industry Standard): LY/T 1225-1999. 1999. Determination of forest soil particle-size composition (mechanical composition), The State Forestry Administration of China, Beijing, China. Fun, M.H., 1996. Training modular networks with the Marquardt-Levenberg algorithm. Master’s Thesis, Oklahoma State University, Stillwater, OK. Gallant, J.C., Wilson, J.P., 2000. Primary topographic attributes. In: Wilson, J.P., Gallant, J.C. 
(Eds.), Terrain Analysis: Principles and Applications. Wiley, New York, pp. 51–85. Gautam, R., Panigrahi, S., Franzen, D., Sims, A., 2011. Residual soil nitrate prediction from imagery and non-imagery information using neural network technique. Biosyst. Eng. 110, 20–28. Greenlee, D., 1987. Raster and vector processing for scanned linework. Photogramm. Eng. Remote Sens. 53, 1383–1387. Grimm, R., Behrens, T., Märker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island - Digital soil mapping using Random Forests analysis. Geoderma 146, 102–113. Guangdong Forestry Survey and Planning Institute, 2014. Report of 2014 forest resources inventory in Guangdong province. Guangdong Department of Forestry, Guangzhou, China. Guangdong Institute of Geography (CAS), 1962. Geomorphologic division of Guangdong. Guangdong Institute of Geography (CAS), Guangzhou, China. Guangdong Soil Census Office, 1993. Guangdong soil. Science Press, Beijing China. Guangdong Soil Survey Team, 1993. Atlas of Guangdong soil. China Science Press, Beijing, china. Guo, P., Liu, H., Wu, W., 2009. Spatial prediction of soil organic matter using terrain attributes in a hilly area. International Conference on Environmental Science and Information Application Technology (ESIAT), July, 2009, Wuhan, China. DOI: 10. 1109/ESIAT.2009.330. Guo, P., Wu, W., Sheng, Q., Li, M., Liu, H., Wang, Z., 2013. Prediction of soil organic matter using artificial neural network and topographic indicators in hilly areas. Nutr. Cycl. Agroecosyst 95, 333–344. Heuvelink, G.B.M., Bierkens, M.F.P., 1992. Combining soil maps with interpolations from point observations to predict quantitative soil properties. Geoderma 55, 1–15. Hong, H.L., Chen, S.L., Fang, Q., Algeo, T.J., Zhao, L.L., 2019. Adsorption of organic matter on clay minerals in the Dajiuhu peat soil chronosequence, South China. Appl. Clay Sci. 178. https://doi.org/10.1016/j.clay.2019.105125. Hyndman, R.J., Koehler, A.B., 2006. 
Another look at measures of forecast accuracy. Int. J. Forecast. 22, 679–688. ISSS (International Society of Soil Science), 1929. Minutes of the first commission meetings, International Congress of Soil Science, pp. 215–220. International Society of Soil Science, Washington, D. C. Jenson, S., Domingue, J.O., 1988. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogramm. Eng. Remote Sens. 54, 1593–1600. Katschinski, N.A., 1956. Die mechanische Bodenanalyse und die Klassifikation der Boden nach ihrer mechanischen Zusam-mensetzung. Rapports au Sixitme Congrts International de la Science du Sol, Paris, B 321–327. Lagacherie, P., 2008. Digital soil mapping: a state of the art. In: Hartemink, A.E., A.B. Digital Soil Mapping with Limited Data. Springer Netherlands, Dordrecht, Netherlands. Lark, R.M., 1999. Soil-landform relationships at within-field scales: an investigation using continuous classification. Geoderma 92, 141–165. Li, X., Ding, X., Ceng, S., Zhang, C., Yang, H, et al., 2018a. Forest soil survey of Yunfu, Guangdong Province. China Forestry Publishing House, Beijing, China. Li, Y., Hu, S., Chen, J., Müller, K., et al., 2018b. Effects of biochar application in forest ecosystems on soil properties and greenhouse gas emissions: a review. Soils Sedim. 18, 546–563. Li, Z.Y., 1998. Supervised classification of multispectral remote sensing image using B-P Neural Network. J. Infrared Milli. Wave. 17, 153–156. Liu, D., Ding, M., Wen, C., Zhang, H., 2019. Effects of soil erosion on soil nutrient elements based on ~(137)Cs tracer in the red soil silly region of southern Jiangxi province. J. Soil Water Conserv. 33, 62–67. Liu, X.M., Zhao, K.L., Xu, J.M., Zhan, M.H., Si, B., Wang, F., 2008. Spatial variability of
CRediT authorship contribution statement

Zhengyong Zhao: Conceptualization, Methodology, Software, Funding acquisition. Qi Yang: Writing - original draft, Software. Dongxiao Sun: Data curation. Xiaogang Ding: Investigation, Validation. Fan-Rui Meng: Writing - review & editing.

Declaration of Competing Interest

All the authors listed have approved the manuscript that is enclosed. No conflict of interest exists in the submission of this manuscript.

Acknowledgements

This work was supported by funding from the National Natural Science Foundation of China (Grant No. 31500385), Guangxi Natural Science Foundation of China (Grant No. 2016GXNSFCA380029 and 2018GXNSFBA138035), and Guangdong Forestry Science and Technology of China (Grant No. 2014KJCX022). The authors are also grateful for the special funding from the Guangxi Hundred-Talent Program to Zhengyong Zhao and Qi Yang.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compag.2019.105172.

References

Aghajani, M., Jalalian, A., Besalatpour, A.A., 2015. Soil particulate organic matter (POM) prediction in a mountainous watershed using artificial neural networks. Commun. Soil Sci. Plant Anal. 46, 1–14. Akhtar, M., van Corzo, G., Andel, S., Jonoski, A., 2009. River flow forecasting with artificial neural networks using satellite observed precipitation pre-processed with flow length and travel. Hydrol. Earth Syst. Sci. 13, 1607–1618. Alvarez, R., Steinbach, H.S., Bono, A., 2011. An artificial neural network approach for predicting soil carbon budget in agroecosystems. Soil Sci. Soc. Am. J. 75, 965–975. Ambroise, B., Beven, K., Freer, J., 1996. Toward a generalization of the TOPMODEL concepts: topographic indices of hydrological similarity. Water Resour. Res. 32, 2135–2146. Arp, P.A., 2005. Soils for plant growth: field and laboratory manual. University of New Brunswick, Fredericton, NB. Ballabio, C., 2009. 
Spatial prediction of soil properties in temperate mountain regionsusing support vector regression. Geoderma 151, 338–350. Barré, P., Durand, H., Chenu, C., Meunier, P., Montagne, D., Castel, G., Billiou, D., Soucémarianadin, L., Cécillon, L., 2017. Geological control of soil organic carbon and nitrogen stocks at the landscape scale. Geoderma 285, 50–56. Beven, K.J., Kirkby, M.J., 1979. A physically based variable contributing area model of basin hydrology. Hydrol. Sci. Bull. 24, 43–69.
soil organic matter and nutrients in paddy fields at various scales in southeast China. Environ. Geol. 53, 1139–1147. Maarten, C.B., Christian, B., Marcel, R.H., Markus, R., Bart, K., Marion, S., Pavel, K., 2011. SOMPROF: a vertically explicit soil organic matter model. Ecol. Model. 222, 1712–1730. Mao, Y., Sang, S., Liu, S., Jia, J., 2014. Spatial distribution of pH and organic matter in urban soils and its implications on site-specific land uses in Xuzhou, China. Biology 337, 332–337. Martz, L.W., Garbrecht, J., 1992. Numerical definition of drainage network and subcatchment areas from digital elevation models. Comput. Geosci. 18, 747–761. Mazza, G., Agnelli, A.E., Cantiani, P., Chiavetta, U., et al., 2019. Short-term effects of thinning on soil CO2, N2O and CH4 fluxes in Mediterranean forest ecosystems. Sci. Total Environ. 651, 713–724. McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293–327. McBratney, A.B., Santos, M.L.M., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. McKay, J., Grunwald, S., Shi, X., Long, R.F., 2010. Evaluation of the transferability of a knowledge-based soil-landscape model. In: Boettinger, J.L., Howell, D.W., Moore, A.C., Hartemink, A.E., Kienast-Brown, S. (Eds.), Digital soil mapping. Springer, Netherlands, Dordrecht, Netherlands. Meng, F.-R., Arp, P.A., Zelazny, V.F., Colpitts, M.C., Schivatcheva, T., Fahmy, S.H., 1997. Spatial and temporal variation of soil moisture. Progress report for Fundy Model Forest. pp. 4. Meng, F.-R., Castonguay, M., Ogilvie, J., Murphy, P.N.C., Arp, P.A., 2006. Developing a GIS-Based flow-channel and wet areas mapping framework for precision forestry planning. Proceeding for IUFRO Precision Forestry Symposium 2006, 5-10 March, 2006, Stellenbosch, South Africa. pp. 43-55. Moore, D.S., 2006. The Basic Practice of Statistics, 4th. ed. W.H. Freeman Co., New York. 
Murphy, P.N.C., Ogilvie, J., Connor, K., Arp, P.A., 2007. Mapping wetlands: A comparison of two different approaches for New Brunswick, Canada. Wetlands 27, 846–854.
National Soil Survey Office, 1998. Chinese soil. China Agriculture Press, Beijing, China.
Olga A., R.V., Lidia, V., Fernando, P.C., 2018. Modeling soil organic matter and texture from satellite data in areas affected by wildfires and cropland abandonment in Aragon, Northern Spain. Journal of Applied Remote Sensing 12. DOI: 10.1117/1.JRS.12.042803.
Pellitier, P.T., Zak, D.R., 2018. Ectomycorrhizal fungi and the enzymatic liberation of nitrogen from soil organic matter: why evolutionary history matters. New Phytol. 217, 68–73.
Poggio, L., Gimona, A., Spezia, L., Spezia, M.J., 2016. Bayesian spatial modelling of soil properties and their uncertainty: the example of soil organic matter in Scotland using R-INLA. Geoderma 277, 69–82.
Qi, Y., Wang, Y., Chen, Y., 2017. Soil organic matter prediction based on remote sensing data and random forest model in Shanxi province. J. Nat. Resour. 32, 1074–1086.
Qiu, Y., Fu, B., Wang, J., Chen, L., Meng, Q., Zhang, Y., 2010. Spatial prediction of soil moisture content using multiple-linear regressions in a gully catchment of the Loess Plateau, China. J. Arid Environ. 74, 208–220.
Ren, L.L., Liu, X.R., 2000. Hydrological processes modeling based on digital elevation model. Geogr. Res. 19, 369–376.
Rousseva, S.S., 1997. Data transformations between soil texture schemes. Eur. J. Soil Sci. 48, 748–758.
SAC (Standardization Administration of the People's Republic of China), 1988. Method for determination of soil organic matter: GB 9834–88. Standards Press of China, Beijing, China.
Schloeder, C.A., Zimmerman, N.E., Jacobs, M.J., 2001. Comparison of methods for interpolating soil properties using limited data. Soil Sci. Soc. Am. J. 65, 470–479.
Shu, J., Zhang, S., Sun, B., Zhao, Q., Liu, Y., 2005. Dynamic analysis of soil organic matter contents in soil and water conservation region of Xingguo county, Jiangxi province. Acta Ecologica Sinica 25, 1240–1246 (in Chinese).
Sigillito, V.G., Hutton, L.V., 1990. Case study II: radar signal processing. In: Eberhart, R.C., Dobbins, R.W., Eds., Neural network PC tools. Academic Press Professional, Inc., San Diego, CA.
Six, J., Conant, R.T., Paul, E.A., Paustian, K., 2002. Stabilization mechanisms of soil organic matter: implications for C-saturation of soils. Plant Soil 241 (2), 155–176.
Smith, P., 2004. How long before a change in soil organic carbon can be detected? Global Change Biol. 10, 1878–1883.
Tang, Z.H., Ouyang, T.P., Li, M.K., Huang, N.S., Kuang, Y.Q., Hu, Q., Zhu, Z.Y., 2019. Potential effects of exploiting the Yunfu pyrite mine (southern China) on soil: evidence from analyzing trace elements in surface soil. Environ. Monit. Assess. 191 (395), 1–18.
The MathWorks Inc. 1984-2012. The Help document. The MathWorks, Inc., Natick, MA. Copyright 1984-2012.
The National Standards of China 2009. China Soil Classification and Code (GB/T 17296-2009). (in Chinese).
Thompson, J.A., Pena-Yewtukhiw, E.M., Grove, J.H., 2006. Soil-landscape modeling across a physiographic region: topographic patterns and model transportability. Geoderma 133, 57–70.
Wang, Q.K., Wang, S.L., 2007. Soil organic matter under different forest types in Southern China. Geoderma 142, 349–356.
Webster, R., 1985. Quantitative spatial analysis of soil in the field. Adv. Soil Sci. 3, 1–70.
Webster, R., Oliver, M.A., 2007. Geostatistics for environmental scientists, second ed. Wiley, Chichester.
Wiesmeier, M., Barthold, F., Blank, B., Kögel-Knabner, I., 2011. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant Soil 340, 7–24.
Wu, C.F., Wu, J.P., Luo, Y.M., Zhang, L.M., DeGloria, S.D., 2009. Spatial prediction of soil organic matter content using cokriging with remotely sensed data. Soil Sci. Soc. Am. J. 73, 1202–1208.
Zech, W., Senesi, N., Guggenberger, G., Lehmann, J., et al., 1997. Factors controlling humification and mineralization of soil organic matter in the tropics. Geoderma 79, 117–161.
Zhang, L., Zhang, J., 2010. Precise processing of SPOT-5 HRS and IRS-P5 stereo imagery for the project of west China topographic mapping at 1:50,000 scale. In: Wagner, W., Székely, B. (Eds.), ISPRS TC VII Symposium - 100 Years ISPRS, Vienna, Austria, July 5–7, 2010, IAPRS, Vol. XXXVIII, Part 7A.
Zhao, Z., Ashraf, M.I., Meng, F., 2013. Model prediction of soil drainage classes over a large area using a limited number of field samples: a case study in the province of Nova Scotia, Canada. Can. J. Soil Sci. 93, 73–83.
Zhao, Z., Yang, Q., Benoy, G., Meng, F., Chow, T.L., Xing, Z., Rees, H.W., 2009. Using artificial neural network models to produce soil organic carbon content distribution maps across landscapes. Can. J. Soil Sci. 90, 75–87.
Zhong, J., Zhang, B., Lin, M., Luo, B., Tang, J., 1998. Research on the characteristics of particle composition of red soils in Guangdong-II: The spatial variation of soil particle composition. Trop. Subtrop. Soil Sci. 7, 98–101 (in Chinese).
Zhu, A.X., Qi, F., Moore, A., Burt, J.E., 2010. Prediction of soil properties using fuzzy membership values. Geoderma 158, 199–206.