Geoderma 340 (2019) 234–248
Contents lists available at ScienceDirect
Geoderma journal homepage: www.elsevier.com/locate/geoderma
Land parcel-based digital soil mapping of soil nutrient properties in an alluvial-diluvia plain agricultural area in China
T
⁎
Wen Donga, Tianjun Wub, Jiancheng Luoa, , Yingwei Suna, Liegang Xiac a
State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing and Digital Earth, CAS, Beijing, 100101, China Department of Mathematics and Information Science, College of Science, Chang'an University, Xi'an 710064, China c College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, China b
A R T I C LE I N FO
A B S T R A C T
Handling Editor: Alex McBratney
The ability to accurately and precisely perform soil nutrient mapping over large areas is essential in the decisionmaking processes for precision agriculture. However, existing grid-based or non-grid-based digital soil mapping (DSM) can lead to the problem of mixed units of input information, which causes the mapping results to be unsuitable for direct use in guiding the implementation of precision agriculture. Instead, the goal of this study was to achieve DSM based on land parcels, which are the basic units of agricultural management and have practical geographical significance for precision mapping in agricultural areas. This study established a convolutional neural network-based automatic extraction model to extract land parcels from high resolution remote sensing images. Thirty environmental covariates were chosen and calibrated at land parcels to establish the relationships between soils and landscapes. Four prediction algorithms, namely, ordinary kriging, cokriging, random forest and artificial neural network, were combined with the land-parcel-based DSM framework to develop and evaluate their effectiveness in predicting four topsoil nutrient properties in an alluvial-diluvia plain agricultural region located in Ningxia province, China. The results of comparisons show that, overall, the landparcel-based RF model achieved the best prediction accuracy; its relative improvement (RMSE%) values over the competing models were 1.27, 4.23, 3.19 and 9.01 for soil organic matter, soil total nitrogen, available phosphorus and available potassium, respectively. In addition, land-parcel-based mapping can improve algorithmic efficiency by approximately 4 times by effectively reducing the mapping units for complex agricultural areas compared with the grid-based mapping results when using the same algorithm, and it also achieves a better performance at the detail level. Overall, the land-parcel-based DSM approach achieved good results in plain agricultural areas, but the model still needs improvement for land-parcel-based DSM in mountainous and hilly agricultural areas, and a challenge remains in selecting the most appropriate environmental covariates.
Keywords: Digital soil mapping Soil nutrient properties Land parcel Agricultural landscape Deep learning
1. Introduction Soil nutrients are the most important soil component because they directly affect plant growth and the supply of raw material for human life. These nutrients mainly include carbon (C), nitrogen (N), phosphorus (P) and potassium (K) (McGrath et al., 2001). Some studies have suggested that world food production need to increase by 70% to 100% to be able to feed the world population of 9 billion people expected by 2050 (Godfray et al., 2010). To increase food production and production efficiency, a large number of pesticides and chemical fertilizers have been developed an applied to agricultural production. However, pesticides are accompanied by environmental pollution problems and food safety concerns, and they make agricultural resources and production unsustainable (Sharma, 2001; ⁎
Ciccarese and Silli, 2016). To change this situation, precision agriculture (PA) has become an increasingly attractive concept because it holds the promise of significantly improving agricultural sustainability by applying developing information technologies (Zhang et al., 2016). More precisely, PA is a farm management system intended to help its implementers reduce production costs and soil pollution through accurate management of the planting environment across spatial-temporal changes (Khosla et al., 2005) and providing effective guidance for applying chemicals and fertilizers (Far and Rezaei-Moghaddam, 2017). Hence, accurate and precise soil information—especially soil nutrient information—over large areas is essential in PA and its decision-making processes (Brungard et al., 2015). Digital soil mapping (DSM) is a relatively effective way to get the fine-scale soil information needed by PA when soil characteristics cannot otherwise be obtained rapidly and
Corresponding author at: No. 20 Datun Road, Chaoyang District, Beijing, Institute of Remote Sensing and Digital Earth, CAS, 100101, China. E-mail address:
[email protected] (J. Luo).
https://doi.org/10.1016/j.geoderma.2019.01.018 Received 17 June 2018; Received in revised form 5 December 2018; Accepted 7 January 2019 0016-7061/ © 2019 Elsevier B.V. All rights reserved.
Geoderma 340 (2019) 234–248
W. Dong et al.
the field by intensive sampling and grid-based mapping in a field or parcel. The other (Hofer et al., 2013; Stevens et al., 2015) is to study spatial variations and impacts at the field scale by calculating and analyzing the values of validated field units based on regional gridbased mapping. Although the second type of research has performed parcel-based results analyses, it still uses grid-based DSMs to predict soil properties and synthesizes gridded mapping results based on parcels. Therefore, this type of mapping is not land-parcel-based DSM either. To summarize, existing grid-based and non-grid-based DSMs are unsuitable for directly guiding the implementation of PA and parcelbased mapping at the regional scale has not been realized. Therefore, mapping soil nutrient properties based on units more closely aligned with practical geographical meaning (geo-parcels, termed land parcels in agricultural applications) over large areas is not only needed to improve the mapping accuracy but also to meet the demand of practical application. One of the reasons grid mapping was prevalent in the past is that fast access to geo-parcels was lacking. However, with the development of earth observation and information technology, high-resolution remote sensing images acquired by advanced satellite sensors provide sufficient data support for geo-parcel identification over large areas (Bruzzone and Carlin, 2006; Zhao and Du, 2016). Information extraction technologies such as the convolutional neural network (CNN)based deep learning methods and their applications in the remote sensing field make automatic extraction of fine geo-parcels from high resolution remote sensing imageries possible (Bengio, 2009; Shen et al., 2015; Xie and Tu, 2015). Therefore, this study attempted to predict soil nutrient properties based on land parcels by applying various prediction approaches, including ordinary kriging (OK), cokriging (Co-K), random forest (RF) and artificial neural network (ANN) models. In this study, a CNN-based edge extraction method was used to automatically extract the land parcels; then, the DSM process was adjusted to adapt to digital mapping based on land parcels in an alluvial-diluvia plain agricultural region located in Ningxia province, China. The results of the selected prediction algorithms and of different mapping units were compared to evaluate the land-parcel-based mapping performance for fine-scale soil nutrient prediction in regions that possess similar topographical and climatic environments.
inexpensively. In traditional soil surveys, soil mapping was conducted in the form of large-scale polygons and was based primarily on expert knowledge (Tomlinson, 1978; Hudson, 1992). Due to the efficiency and accuracy limitations of this traditional mapping approach, DSM developed rapidly, effectively improving along with the development of information technology (McBratney et al., 2003; Minasny and McBratney, 2016). During this process, numerous prediction models have been developed and applied to soil nutrient mapping at different scales under the DSM framework (McBratney et al., 2003; Grunwald, 2009; Were et al., 2015; Taghizadeh-Mehrjardi et al., 2016; Negasa et al., 2017). In general, the DSM framework is implemented on raster (Saunders and Boettinger, 2006); therefore, all the data need to be rasterized before they can be input into the model. Although studies have used non-grid units in parts of the DSM process, such as environmental covariates acquisition (Vincent et al., 2018) or sample collection (Heung et al., 2017), the process of modeling and mapping is unified into grid units to form a grid-based soil product map, which is essentially a grid-based DSM. Although using grids as computing units can help in the automatically implementing prediction models, it also leads to the problem of mixed units of input information, which affect the prediction results. First, most of the indicators representing the s and o factors in the scorpan model, which were the common ancillary data need for soil property prediction, were extracted from remote sensing data or from thematic maps such as land use, soil type, or vegetation (McBratney et al., 2003; Grunwald, 2009; Mulder et al., 2011; Minasny et al., 2013). These characteristics exhibit a spatial zoning phenomenon, and relatively obvious boundaries exist between regions with different values. Moreover, these zones usually do not form a grid. Chaney et al. (2016) suggested that an important challenge for DSM is that the difficulty increases as the precision of soil mapping improves, especially in areas that share similar relief, climate and soil types. The difficulty in these areas arises because the soil properties in such areas are more affected by human activities such as tillage methods, crop selection and irrigation and fertilization methods (Hobbs et al., 2008; Thiele-Bruhn et al., 2012; Chu et al., 2017), and because the units of human activities often do not form grids, especially in China, with its small-scale household agricultural management practices. Furthermore, grids are also inconsistent with the management units in practical PA applications. For example, site specific management (SSM) is a typical precision agriculture practice. It is based on crop management at a limited spatial scale and accurately guides agricultural management based on planting units (Franzen et al., 2002; Gebbers and Adamchuk, 2010). In addition to grid-based mapping, studies have also focused on non-grid-based mapping in DSM. We can divide these non-grid map units into two categories according to their generation. One category includes regular or irregular polygon units constructed according to algorithm requirements that do not conform to the actual boundary of geographical objects and have no actual geographical significance. For example, Haring et al. (2012) carried out soil type predictions in forest areas based on specific map units obtained by spatially disaggregating complex soil map units. To run geostatistical spatial models with INLA + SPDE, Poggio et al. (2016) created a specific mesh that satisfies specific constraints for continuous data. The second category includes regional polygon units constructed based on the application requirements, such as soil pollution control planning, which represents a synthesis of multiple adjacent actual geographical objects and is often affected by subjective factors. The meaning of these units is different from the concept of land parcels in this paper. Similar to grid-based mapping, problems associated with mixed units of input information and inconsistencies with the management units in practical PA applications are observed when using these non-grid units for mapping. Previous research has been performed at the field or parcel scale, and it mainly focuses on two aspects. One (Banton et al., 1997; Liu et al., 2008) is to study the spatial variability of soil properties within
2. Materials 2.1. Study area The study area shown in Fig. 1 is located in Zhongning County (37°9′–37°50′N, 105°26′–106°7′E) in northwestern Ningxia Province, the birthplace of wolfberry and one of the major grain producing areas in China. This area has a diversity of landscapes because it falls into the transition zone between the Inner Mongolian Plateau and the Loess Plateau. The local climate is north temperate monsoon. According to historical meteorological data, the average annual temperature of this area is 7.2 °C. The annual precipitation is only 204.7 mm, 61% of which is concentrated from June to August, while the annual evaporation is 1947.1 mm—9.5 times more than the annual precipitation. Our attention focuses on the alluvial-diluvia plain along the Yellow River (509.17 km2). This plain is the main agricultural region in this area; we disregard other areas in our study. The major soil types in this area are cumulated irrigated soils and sierozems; however, aeolian soils, alluvial soils and fluvo-aquic soils are also present locally. The region's land uses include not only cultivated land but also pastureland and orchard. Wolfberry is the dominant crop, but corn and rice are also planted in this region. 2.2. Soil sampling and laboratory analysis The study area is located in the alluvial-diluvia plain along the 235
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 1. Location of the study area and sample site distribution. (a) location of Zhongning County; (b) the study area in Zhongning County; (c) diagram of the sampling point distribution in different land parcels; (d) the spatial distribution of soil samples in the study area. Table 1 Summary of environmental covariates considered in this study. Scorpan factora
Environmental covariates
s
Soil type Bulk density Clay content Silt content Sand content Coarse fragments volumetric Average annual temperature Annual accumulated temperature (≥10°) Average annual temperature (2015) Annual precipitation Aridity index Moisture index Annual precipitation (2015) Annual frost-free days Land use Irrigation index Elevation Slope Aspect Curvature Convergence index LS-factor TPI TWI TRI MRVBF Geomorphic type Parent material Distance to river Distance to downtown
c
o r
p n
a b c
Value typeb C Q Q Q Q Q Q Q Q Q Q Q Q Q C C Q Q Q Q Q Q Q Q Q Q C C Q Q
s: soil; c: climate; o: organisms, r: relief; p: parent materials; n: spatial position. C: categorical; Q: quantitative. G: grid; V: vector.
236
Value range
Source
Formatc & scale
Five categories 1256–1639 (kg/m3) 10–29 (%) 35–56 (%) 22–52 (%) 1–29 (%) 7.2–7.5 (°C) 2718.9–2770.5 (°C) 9.61–10.48 (°C) 204.7–224.0 (mm) 2.98–3.33 −41.16 to −39.47 125.96–165.24 (mm) 180–200 (days) Six categories Three categories 1143–1325 (m) 0.025–1.57 (°) 0–6.28() −4.40e+8–3.25e+8 −77.21–78.17 0–13.99 −4.62–2.88 −18.89–4.57 0–6.90 0–1.88 Four categories Six categories 0–14,461.12 (m) 0–24,950.96 (m)
Prior map SoilGrid SoilGrid SoilGrid SoilGrid SoilGrid RESDC RESDC RESDC RESDC RESDC RESDC RESDC Prior map GF-1,2 image Prior map SRTM DEM SRTM DEM SRTM DEM SRTM DEM SRTM DEM SRTM DEM SRTM DEM SRTM DEM SRTM DEM SRTM DEM Prior map Prior map GF-2 image GF-2 image
V, G, G, G, G, G, G, G, G, G, G, G, G, V, V, V, G, G, G, G, G, G, G, G, G, G, V, V, V, V,
1: 1,000,000 250 m 250 m 250 m 250 m 250 m 500 m 500 m 500 m 500 m 500 m 500 m 500 m 1:400,000 1:5000 1:1,000,000 30 m 30 m 30 m 30 m 30 m 30 m 30 m 30 m 30 m 30 m 1:1.000,000 1:1,000,000 1:5000 1:5000
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 2. Spatial distribution patterns of different covariates. (a) Spatial distribution patterns of the moisture index, which presents a change rate of < 5%; (b) spatial distribution patterns of average annual temperature in 2015, which presents a change rate of approximately 10%; (c) spatial distribution patterns of the bulk density, which presents a change rate of approximately 30%; and (d) spatial distribution patterns of sand content, which presents a change rate of > 100%;
CSOM = CSOC × 1.1 × α
Yellow River and sandwiched between mountains, which results in a similar topography and climate throughout the study area. According to the DEM and historical meteorological data, the area with elevation difference of > 60% is < 30 m (1165 m–1195 m), and the slope is < 4°. The average annual temperature difference is only 0.3°, and the difference between annual precipitation and annual frost-free days is < 10% (Table 1). However, this area has diverse soil type because it falls into the transition zone of multiple geomorphic regions. Five main soil types are found in the study area. The development of agricultural industrialization in this area has promoted the gradual transition from the traditional household farming mode to farm farming modes, which has resulted in the coexistence of diversified land management practices at present. Therefore, soil sampling (Fig. 1) was performed by a stratified random technology (Brus, 1994). According to the modern analysis method for soil elements (China National Environmental Monitoring Center, 1992), sampling was carried out in land parcels, which were extracted from GF-2 fusion data using a CNN-based approach (see Section 3.1), and a soil sample was formed using a multipoint average mixed sampling method applied to each sampled land parcel to ensure the representativeness of the samples. An “S-shaped” sampling was used in the larger land parcels, while “plum-shaped” sampling was used in the smaller land parcels. In total, 1288 topsoil samples (cultivated land: 0–20 cm, pastureland: 0–30 cm, orchard: 0–30 cm) were collected for laboratory analyses. In addition, soil samples were air-dried at room temperature and then crushed and sieved at 2 mm and 0.25 mm. The soil pH value was measured in a 1:2.5 (soil: water ratio) soil solution with a pH meter. Soil organic matter (SOM) was calculated by Eq. (1):
(1)
where CSOC denotes the soil organic carbon (SOC) content, which was determined by the Walkley-Black method (Sahrawat, 1982). The 1.1 is a correction factor because organic carbon can be oxidized by only 90%, and α is a constant set to 1.724 because the average carbon content in organic matter is 58%. Soil total nitrogen (TN), available phosphorus (P) and available potassium (K) were determined by the Kjeldahl method (Sahrawat, 1982), the Olsen method (Olsen et al., 1954) and ammonium acetate leaching method (Cornfield and Pollard, 1952), respectively. 2.3. Environmental covariates We chose 30 factors as environmental covariates (see Table 1) that could affect soil nutrient properties according to the scorpan model (McBratney et al., 2003) and the availability of these factors in our study area. Environmental covariates were calibrated at the land parcel level before inputting prediction models. Twenty terrain attributes were generated from the SRTM (Shuttle Radar Terrain Mission) 1 Arc-Second Global (30 m) data (download from: http://earthexplorer.usgs.gov/) System for Automated Geoscientific Analyses (SAGA) software (Conrad et al., 2015), and ten terrain attributes were selected as terrain-related environmental covariates by referencing relevant literature (Moore et al., 1993; Takata et al., 2007; Umali et al., 2012) and analyzing the distribution of attribute values in the study area, i.e., the elevation, slope, aspect, curvature, convergence index, slope length and steepness factor (LS-factor) (Boehner and Selige, 2006), topographic position index (TPI) (Guisan 237
Geoderma 340 (2019) 234–248
W. Dong et al.
et al., 1999), topographic wetness index (TWI) (Beven and Kirkby, 1979), terrain ruggedness index (TRI) (Riley et al., 1999), and Multiresolution Index of Valley Bottom Flatness (MRVBF) (Gallant and Dowling, 2003). Seven climate attributes were calculated from historical meteorological data provided by the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC) (http://www.resdc.cn): average annual temperature, annual accumulated temperature (≥10°), average annual temperature in 2015, average annual precipitation, aridity index, moisture index and average annual precipitation in 2015. Because the scale of the weather change process and historical meteorological data is relatively large, the variation degree of these attributes is relatively small (ranging from 1.9% to 31.2%). Five soil physical properties were obtained from the SoilGrid 250 m dataset (Hengl et al., 2017) via the web-mapping interface (www.soilgrids.org): bulk density, clay content, silt content, sand content and coarse fragments volumetric. Although the resolution of the SoilGrid is not high, the variation degree of the five soil physical attributes is relatively significant (ranging from 30.5% to 280%) because the study area is located in the transition zone of multiple geomorphological regions and the formation of soil parent material is diversified. Considering that the attributes with a small change rate may have a fine-tuning effect on the model and certain attributes may play a restrictive role, we retained these attributes and discussed the role of different covariates via a factor importance analysis in Section 4.2. As shown in Fig. 2, we selected four covariates with different ranges of change rates as representatives to show their spatial distribution patterns in the study area. Geomorphic types and annual frost-free days were obtained from regional thematic maps. Soil parent material and soil type maps were revised on the basis of prior maps via soil surveys and the expert knowledge of local soil scientists. The land use map was produced using a supervised classification method applied to GF-1 and GF-2 imagery (Yang et al., 2017). The irrigation index corresponds to the ability of a location to support irrigation. Three irrigation classes were distinguished: well-irrigated, moderately well-irrigated and poorly irrigated based on local expert opinions. The distances to the river and downtown were calculated from the extraction results of the river, built-up areas and land parcels.
one hand, edge extraction of multiple small regions can be conducted in parallel to improve the calculation efficiency. On the other hand, the edge extraction precision can be improved by removing the influences from surface objects in the edge zones, such as shade trees. Second, an edge probability map for each subregion was extracted by a modified VGG16 network (Simonyan and Zisserman, 2014; Dai et al., 2016). The convolution layers in the network were divided into five stages, and the outputs of each stage were up-sampled to the original image size. A cross-entropy loss layer was connected to each upsampled result. Finally, the up-sampled layers of all stages were fused and a fusion loss layer was followed (Liu et al., 2016). The formula for the improved loss function is shown in Eq. (2): N
L (W ) =
5
∑ ⎛⎜ ∑ l (Xi(k) ; W ) + l (Xi fuse ; W ) ⎞⎟ i=1
⎝ k=1
Xi(k)
(2)
⎠ Xifuse
is the activation value from stage k (from 1 to 5), is the where fusion layer value, N is the number of pixels in the input image and W denotes all the parameters that will be learned in our architecture. Third, a vectorization process guided by canny edge detection was applied to the edge probability map. When the edge probability strength reaches a certain threshold, canny edges with a certain length are preserved. The boundary of each land parcel was extracted after extending suspension lines supported by the edge probability strength. The subregion extraction results were merged to obtain the land parcel distribution map for the study area. 3.2. Spatial modeling 3.2.1. Prediction model selection Numerous soil property mapping experiments have been conducted using a variety of modeling approaches based on grids. Goovaerts (1999) and Göl et al. (2017) confirmed that Co-K has a better prediction effect than OK and Inverse Distance Weighted (IDW) models. Some experimental results have shown that RF is superior to boosted regression tree (BRT), multiple linear regression (MLR) and logistic regression (LR) for soil attribute mapping (Razakamanarivo and Grinand, 2011; Yang et al., 2016). Among six data mining techniques to map SOC tested by Taghizadeh-Mehrjardi et al. (2016), ANN showed the best performance, and Were et al. (2015) suggested that support vector regression (SVR) has superior or similar results compared with ANN. Therefore, we selected four modeling approaches to map soil nutrient properties based on land parcels: OK, Co-K, RF and ANN. These methods are briefly described below. All methods were implemented using the R language (R Core Team, 2017).
3. Methods Land parcels are the basic operational units in agricultural production activities and have practical geographical meaning for agricultural applications. Therefore, we proposed a land-parcel-based soil nutrient mapping method. An overview of this method is shown in Fig. 3. First, land parcel boundaries were extracted from GF-2 fusion data with a 0.8 m spatial resolution using a CNN-based algorithm. To build the relationship between soil nutrient content and landscape based on land parcels, the second step was to allocate environmental covariates and soil sample data to the extracted land parcels. Then, a prediction model was built by using part of the soil sample data and implemented in the study area. Finally, its prediction accuracy was evaluated by using the remaining soil sampling data.
3.2.1.1. Ordinary kriging and cokriging. Kriging is a generic name for generalized least-squares regression algorithms, which include simple kriging, OK and a series of improved algorithms developed based on the OK method. It can be divided into univariate kriging and multivariate kriging methods. Univariate kriging methods are variants of the basic linear regression estimator (Goovaerts, 1999), and OK, which considers local fluctuations of the mean according to the local neighborhood values, is a basic and widely used method in geostatistics. When observation values are spatially sparse, other related information is usually employed to assist estimation. Co-K uses a multivariate extension of the kriging estimator when the secondary information is not exhaustively sampled (Isaaks and Srivastava, 1989). Numerous examples (Goovaerts, 1997) of Co-K applications can be found in DSM. The results of these studies show that employing auxiliary information effectively improves DSM accuracy when point observations are scarce.
3.1. Land parcels extraction Extracting accurate and complete land parcel boundaries was the basis for soil nutrient mapping in our method. The boundaries of the individual land parcels in the study area were automatically extracted using a CNN-based approach (see Fig. 4) consisting of the following three steps. First, a regional division network map was built by superimposing a roads and rivers map over the study area. The resulting map was used to divide the GF2 fusion image of our study area into multiple subregion images. These subregion images formed the input of the next step. On
3.2.1.2. Random forest. RF is an ensemble learning algorithm based on bagging that works through aggregating an ensemble of randomized regression trees (Cutler et al., 2012). In RF modeling, users must specify two training parameters: ntree and mtry. The ntree training sub datasets 238
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 3. Overview of the methodology used for land-parcel-based soil nutrient properties mapping in this study.
number of covariates (24 in this study). Based on the results from testing different parameter combinations, the respective parameters were set to predict SOM, TN, available P and available K to compare the efficiency of different methods.
were generated from the original training dataset using the bootstrap method, and the ntree decision tree models were trained on these sub datasets, respectively. The final prediction result was the mean of the predicted values of multiple trees. The ntree value was set by empirically between 100 and 15,000 by testing in increments of 100 (Hengl et al., 2015), while the mtry value ranged from 1 to the total
3.2.1.3. Artificial neural networks. ANN is a type of pattern-matching
Fig. 4. Flowchart of automatic CNN-based land parcel extraction. 239
Geoderma 340 (2019) 234–248
W. Dong et al.
Table 2 Descriptive statistics of the area of land parcels.
Area (m2) a
Max
Min
Average
Skewness
Kurtosis
Q25a
Q50a
Q75a
191,165.13
137.41
2145.16
10.14
193.91
613.87
1122.58
2196.27
Q25: the 25% quartile; Q50: the 50% quartile; Q75: the 75% quartile.
a land parcel whose longest side was below the threshold. Finally, land parcels whose longest sides were above the threshold were divided into several sub polygons by spatial decomposition (McBratney, 1998), and the central points of the sub polygons were used as the typical points. Fig. 5 shows the typical point generation process using spatial decomposition. Haering et al. (2012) suggested that soil parent material, topography and soil water dynamics have important effects on soil distribution. Therefore, we select soil parent material, geomorphic type and irrigation index as the main auxiliary factors of spatial decomposition from 24 environmental covariates, considering that the study area has similar climatic factors and that the land parcel itself represents the influence of human activity. The distribution maps of land parcels and auxiliary factors were overlaid; a land parcel was segmented into multiple sub polygons that could be consider as units that contained the same values of the s, c, o, r, p and n factors. To improve computation efficiency, sub polygons smaller than a certain area were not used to generate typical points.
algorithm that simulates a biological neural network. Hundreds of different ANN algorithms exist, including Hopfield neural networks (Duan et al., 2016), self-organizing maps (Saadatdoost et al., 2012) and learning vector quantization (Schneider et al., 2009); however, the most popular type of ANN is a multilayer perceptron (MLP) using the back-propagation algorithm (Conforti et al., 2014; TaghizadehMehrjardi et al., 2015). We selected the MLP for this study. The number of input layer neurons was set to equal the environmental covariates, and the output layer was set to a single layer (SOM, TN, available P and available K, respectively) instead of four layers to enable the subsequent comparisons. The number of hidden neurons (1 to 24) and the number of hidden layers (1 to 2) were varied to build different models. The weights of each model were randomly initialized and corrected during the training phase. We chose the model parameters through a trial-and-error strategy based on root mean square error (RMSE). 3.2.2. Spatial disaggregation and typical point generation Because small-scale household management mode and large-scale farm management mode coexist in the study area, the area range of land parcels is extensive (see Table 2). Therefore, to obtain the environmental covariates and predict the nutrient properties of the different land parcels accurately, we calculated the predicted property value or environmental covariate value of a land parcel by one or more typical points in this land parcel instead of using the value of the central point to replace the value of the land parcel as grid interpolation. Considering the spatial variability of soil properties and soil-landscape relationships, we designed the typical point generation method as follows: First, a threshold was derived from Eq. (3), where the spatial semivariogram of soil nutrients is f(x):
f −1 (x ) = lim ∆x → 0
∆y ∆x
3.3. Environmental covariates calibration The available environmental covariates were expressed in grid or polygon vector formats (see Table 1). However, in this study, the prediction unit was land parcel. Therefore, the environmental covariates were recalibrated at the land parcel level (see Fig. 6) before input to the prediction models. Considering the implementation of prediction models, the calibration of environmental covariates at the land parcel level was implemented based on the disintegrated polygons of land parcels. For the grid-format variables, pixels located inside the boundary of a disintegrated polygon were counted. For categorical variables, the category with the most pixels was defined as the category for this polygon; for quantitative variables, the average value of all pixels was calculated as the value of this polygon. For the vector format variables, the polygons that intersected a disintegrated polygon boundary were counted. For categorical variables, the category of the polygon with the largest intersecting area was defined as the category for this disintegrated polygon; for quantitative variables, the weighted average value was calculated based on the size of the intersecting area as the value of this disintegrated polygon.
(3)
When the spatial distance is small, it can be assumed that the soil nutrient value is linearly related to the spatial distance. The threshold value was formulated as shown in Eq. (4):
L max = α × (Pmax − Pmin )/ k
(4)
where Pmax is the maximum value of a soil nutrient, Pmin is the minimum value of the soil nutrient, α is an acceptable maximum deviation ratio and k is a constant that indicates the relation between the property value and spatial distance. To simplify the calculation, k can be set to an approximate slope chosen from the variogram simulation. Second, land parcels were divided into two categories by comparing the longest side of a land parcel with the threshold. Third, the central point of a polygon was used as the typical point of
3.4. Model evaluation To compare the performance of different spatial models, we evaluated the models by an independent validation method and then selected the model with the best overall performance for soil nutrient property predictions and uncertainty calculations. Fig. 5. Typical point generation of different land parcels. The blue polygon represents the land parcel whose longest side is below the threshold, and the red polygon represents the land parcel whose longest side is larger than the threshold. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
240
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 6. Calibration results of environmental covariates in different formats: (a) DEM data with a grid format; (b) elevation calibration result at the land parcel level; (c) local magnification of (b); (d) geomorphic types thematic map in vector format; (e) geomorphic types calibration result at the land parcel level; (f) local magnification of (e).
4. Results and discussion
First, 1000 effective sample data were obtained from 1288 topsoil samples after removing null values and measuring error values. Second, we randomly split these data into training (n = 700, 70% of sample points) and testing (n = 300, 30% of sample points) data sets. Third, the training data sets were used to analyze spatial variability or establish the relationship between soil nutrient properties and environmental covariates for models using cross-validation, and the testing data sets was used to validate models independently (Arlot and Celisse, 2009). The training and validation data sets were the same for all models. Finally, three commonly used metrics were selected to compare the models' performances. The RMSE and the coefficient of determination (R2) (shown in Eqs. (5) and (6)) were calculated from the predicted and observed values to indicate the precision and bias of the predictions: n
RMSE =
(∑
R2 = 1 −
∑1
1
n
(yi − yi′ )2 / n
4.1. Comparison of mapping units Before land parcel extraction, river, road and built-up area maps were used as masks and regional division network maps to segment and process the high-resolution image of the study area. A total of 130,923 land parcels were extracted from the study area for land-parcel-based DSM. Through a visual comparison of the high-resolution image and the land parcels, the polygon boundaries were basically the same as those in the image (see Fig. 8). Land parcel descriptive statistics are listed in Table 2. The total land parcel area was 280.85 km2, which accounted for 55.15% of the study area and consisted of 73.08% of the extracted area (after removing the masked areas from the study area). The results showed that agriculture is dominant in the study area. The area of the largest land parcel (an orchard) was 191,165.13 m2; the smallest was approximately only 0.072% (137.41 m2) of the largest one. This large difference is due to the coexistence of household farming and farm farming modes. The 72.03% (94,300) of land parcels occupying < 2000 m2, and 54.23% (70,999) of the land parcels were between 200 m2 and 1200 m2 in size. The largest number of land parcels had areas ≥300 m2 and < 500 m2 (see Fig. 7). From these results, we can infer that the study area is still dominated by the traditional household farming mode. When regular grids are used for DSM, the approximate grid size (Sg) can be calculated simply according to the land parcels areas as shown in Eq. (8) if we assume that the shapes of land parcels are approximately square:
1/2
)
(5) n
(yi − yi′ )2 / ∑ (yi − yi )2 1
(6)
where yi′ is the predicted value, yi is the observed value, yi is the mean of the observed values and n is the number of test data. A high-performing model should have a small RMSE and an R2 closer to 1. To compare the prediction accuracy among different models more intuitively, we report the relative improvements based on RMSE values using the formula shown in Eq. (7):
RMSE(%) = (RMSEOK − RMSEMi )/ RMSEOK × 100
(7)
Sg = [Ap1/2 ÷ 10] round × 10
where RMSEOK is the RMSE of the OK model, which was selected as the benchmark model for comparison, and RMSEMi is the RMSE of model i.
(8)
where Ap is the area of the land parcel used for comparison, and the 241
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 7. Statistics of land parcels in different area ranges: (a) statistics of land parcels after rounding down to the nearest thousand; (b) statistics of land parcels with an area of < 2000 m2 after rounding down to the nearest hundred.
The maps indicated that grid-based mapping has higher variability in the southwestern and central regions of the study area at fine scale. This high variability is not only found in large land parcels (see Fig. 9-b) but also in medium land parcels (see Fig. 9-c). Based on the statistics of the total number of land parcels with different standard deviation values, 16% of the land parcels had high variability (standard deviation > 0.2). However, in the land parcels with low variability (standard deviation ≤ 0.2), 74% were zero because they contained only one grid cell. From this result, we can speculate that when a smaller grid size is selected, a larger number land parcels will show high variability. In contrast, the statistics of the total area of land parcels with different standard deviation values show that 43% of study area has high variability (standard deviation > 0.2).
Table 3 Comparison of land parcels and grids.
2
Area of land parcel (m ) Grid size (m) Number of grids in study area Rpa
Max
Min
Mean
Q50
191,165.13 440 2634 0.00076%
137.41 10 5,091,743 100%
2145.16 50 203,660 27.59%
1122.58 30 565,739 74.37%
a Rp is the ratio of the number of land parcels whose grid size is larger than or equal to the specified value to the total number of land parcels in the study area; it represents the ratio of land parcels that could be estimated by one or multiple grids if we assume that the shapes of land parcels are approximately square.
symbol “[]round” represents a rounding algorithm. Based on the land parcel area statistics, the largest and the smallest grid sizes were approximately 440 m and 10 m. If a grid size of 30 m was selected, 74.37% (97,361) of the land parcels could be estimated by one or multiple grids, but using this approach, the number of grids for DSM in the study area reached 565,739—more than 4 times the number of land parcels (Table 3). For complex prediction models and large region, this reduction represents a significant increase in computational efficiency. For example, the time cost of grid-based DSM increased from 11.32 s (the time cost of land-parcel-based DSM) to 47.84 s using the same OK model parameters and point prediction method. Furthermore, land parcels with areas similar to those of regular grids were also difficult to estimate using a grid because of the variety of land parcel shapes caused by the local complex agricultural production mode. Fig. 8 shows a visual comparison between land parcels with different shapes and areas and 30 m regular grids. In addition to visual and quantitative statistical comparisons, we also performed quantitative comparisons of the mapping effects between grids and land parcels. In our approach, a land parcel is the smallest separable unit extracted from high resolution remote sensing images. These land parcels can be considered as a pure unit from both environmental influence and agricultural production management aspects. The land-parcel-based mapping reflects this homogeneity, and the variability of the land-parcel-based prediction results is zero. Therefore, the standard deviation of grid-based prediction values in each land parcel was calculated to reflect the variability of the gridbased prediction results in a relatively homogeneous geographic unit. Fig. 9 shows the spatial distribution of this variability.
4.2. Performance of the spatial models Table 4 shows the results of the independent verification of mapping four soil nutrient properties using different spatial modeling techniques. For the SOM content in topsoil, the RMSE and R2 values vary from 3.418to 3.609 and from 0.25 to 0.33, respectively. Land-parcel-based RF had the lowest RMSE and the highest R2 values; hence, it was the best predictive model for SOM in topsoil across our study area. Gridbased RF prediction followed, with an RMSE value of 3.424 and the same R2 value (0.32). The RF model performs better than OK and Co-K models both for grid-based SOM prediction and for land-parcel-based SOM prediction. However, the ANN model performs the worst in four models, although land-parcel-based ANN performs slightly better than grid-based ANN. This result is different from those obtained by Taghizadeh-Mehrjardi et al. (2016) who indicated that ANN performed better than RF for soil organic carbon (SOC) prediction. This discrepancy might be due to the fact that the ANN model is more prone to overfitting than the RF model when sample data are limited. The RF algorithm can handle high dimensional data and does not require feature selection and standardization. The generalization ability of the RF model is strong. In our study, 700 samples were acquired from a 500 km2 area, while their study area yielded 188 samples from 30 km2. Based on this speculation, we conducted two other tests for the ANN model. One test involved allocating more samples (n = 900, 90% of the sample points) for training, and the other test was to select smaller regions (277 sample points in this area) for prediction. The results showed that the additional training samples improved the test accuracy, making it more consistent with the training accuracy, which 242
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 8. The result of land parcel extraction and detail comparison with 30 m regular grids: (a) standard false color image of study area and the location of (b) study area; (b) land parcel extraction results of local area; (c), (d) and (e) local magnifications of (b) and visual comparison between land parcels with different shapes and areas and 30 m regular grids.
different soil nutrients. The RMSE values of the prediction of different nutrient properties vary widely, thus, the relative improvement of RMSE (denoted as RMSE% in Table 4) better reflects the model performances when predicting different nutrient properties. The landparcel-based RF RMSE% values are 1.27, 4.23, 3.19 and 9.01, respectively, for SOM, TN, available P and available K. These results indicate that the land-parcel-based RF model improves on the other models for predicting available K. The sensitivity analysis results (see Fig. 10) of environmental covariates in the land-parcel-based RF model indicate that distance to the Yellow River is the most important factor influencing the prediction of SOM, TN and available K, and it is one of the important features influencing the prediction of available P. This result can be explained by the fact that the Yellow River is the most important factor in soil
verified our speculations to a certain extent. Similar performances were found for TN, available P and available K prediction using the combinations of the two mapping units and the four algorithms. The land-parcel-based RF model achieved the best performances both for RMSE and R2. These results suggest that the landparcel-based RF model is more capable than the other models used in our study for making fine-scale predictions of topsoil nutrients at regional scales, and it is more feasible than the ANN model under similar environmental conditions and with limited sample data. Therefore, we consider the land-parcel-based RF model as one of the most suitable models for making fine-scale predictions of topsoil nutrients in alluvialdiluvia plain agricultural areas. Table 4 shows that the RMSE and R2 values of land-parcel-based RF model range from 0.1863 to 46.25 and 0.11 to 0.42, respectively, for
Fig. 9. Spatial distribution of the variability of grid-based SOM prediction results in each land parcel using the RF method, which has the best overall accuracy (Table 4): (a) the standard deviation of grid-based prediction values in study area; (b) and (c) are local magnifications of (a). 243
Geoderma 340 (2019) 234–248
W. Dong et al.
Table 4 Results of model evaluation for SOM prediction (g/kg), TN (g/kg), available P (mg/kg) and available K (mg/kg) using different modeling technologies and different mapping units. RMSE
OK Co-K PF ANN
a
Ga Pa Ga Pa Ga Pa Ga Pa
RMSE (%)
R2
SOM
TN
P
K
SOM
TN
P
K
SOM
TN
P
K
3.462 3.432 3.461 3.429 3.424 3.418 3.609 3.553
0.1945 0.1944 0.1882 0.1882 0.1870 0.1863 0.2077 0.2051
20.31 20.25 20.44 20.34 19.75 19.66 20.66 20.43
50.83 50.83 51.43 50.93 46.63 46.25 53.39 52.83
0.00 0.87 0.04 0.96 1.10 1.27 −4.24 −2.62
0.00 0.05 3.22 3.23 3.83 4.23 −6.80 −5.44
0.00 0.30 −0.61 −0.13 2.78 3.19 −1.72 −0.58
0.00 0.01 −1.18 −0.19 8.28 9.01 −5.02 −3.93
0.31 0.32 0.31 0.33 0.32 0.32 0.25 0.28
0.38 0.39 0.42 0.43 0.43 0.42 0.30 0.32
0.06 0.07 0.06 0.09 0.11 0.11 0.04 0.04
0.03 0.00 0.01 0.00 0.14 0.18 −0.07 −0.07
G: Mapping with 30 m grids; P: mapping with land parcels. Fig. 10. The relative importance of the environmental variables used in the landparcel-based RF model for prediction of topsoil SOM, TN, available P and available K. RIVD: Distance to river; MI: Moisture index; PE2015: Annual precipitation (2015); ARI: Aridity index; DOWD: Distance to downtown; PA: Average annual precipitation; TE2015: Average annual temperature (2015); EVA: Elevation; Sand: Sand content; CLY: Clay content; AAT10: Annual accumulated temperature (≥10°); GEOR: Geomorphic type; LANDUSE: Land use; ASPECT: Aspect.
same prediction algorithm. This difference ranges from 0.01 to 1.62 for the different soil nutrients and indicates that land-parcel-based mapping generally performs better than grid-based mapping, although the relative improvement varies widely for different soil nutrients. In addition, land-parcel-based mapping has better performance at the detail level even when the overall accuracies of land-parcel-based and grid-based mapping are similar. Fig. 11 shows a detail comparison of the SOM mapping results using the land-parcel-based and grid-based RF models. The middle column in Fig. 11 shows the details of the central area of the study area. The grid-based mapping result (see Fig. 11-d) exhibits the phenomenon of the salt and pepper problem, which indicates that the grid-based prediction results are noisier. The right column in Fig. 11 shows the details of the eastern part of the study area. The jagged edges are unavoidable in the grid-based mapping results (see Fig. 11-f), while the land-parcel-based mapping result results in more realistic boundaries (see Fig. 11-e).
formation in this area. The three factors of moisture index, aridity index and annual precipitation (2015), which are related to regional soil water dynamics, are also highly important factors for predicting the four studied nutrient properties. This result indicates that soil water dynamics play an important role in the spatial distribution of soil nutrients in this area. In addition, the average annual temperature (2015) also has a more obvious influence on soil nutrients. This result might be related to the crop species selected for regional agriculture, which are restricted by regional climates. The elevation is the only one factor that has a significant effect on the spatial distribution of soil nutrients in ten topographic factors. This result might be related to the fact that land has been leveled in agriculture area and the role of other factors has been weakened. These results also help to explain the overall consistency but local differences in the spatial distribution of the four nutrient property prediction results (see Fig. 12). Using Table 4, we can calculate the differences in RMSE% between the land-parcel-based results and the grid-based results when using the
244
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 11. Comparison of mapping results of SOM using land-parcel-based RF model and grid-based RF model: (a) and (b) are land parcel-based RF and grid-based RF model results, respectively; (c) and (d) are local magnifications of the rectangle in the northwestern portion of the study area, and (e) and (f) are local magnifications of the rectangle area in the northeastern portion of the study area.
area, the soil content of SOM, available P and available K differed greatly from that of the central area; however, the TN content did not change significantly. Among the 30 environmental factors, climate factors such as the aridity index have undergone great changes in this region, which indicates that the influence of climate factors on SOM, available P and available K soil content is greater than their influence on TN soil content. In the study area, the spatial distribution of four nutrient properties is similar at large scale, but different nutrients show different spatial characteristics at fine scale. This indicates that soil nutrient content is mainly influenced by soil parent material, soil type, terrain and other factors related to soil development at the regional scale, while land use type, irrigation, other human activities and climate factors may have large influences on the differences in nutrient content in local areas. In the areas with similar climate, topography and soil parent material, the mean contents of TN, available P and available K were higher in vegetable gardens than in cultivated fields. This could be explained by the more frequent use of chemical fertilizers in vegetable gardens because such gardens undergo multiple crop rotations in one year.
4.3. Spatial distributions of soil nutrients and their uncertainty Based on the prediction accuracy analysis of the different models, land-parcel-based RF model was applied to map each studied soil nutrient property across the study area, and a bootstrap approach (Liess et al., 2012) was adopted to estimate the uncertainty in these properties. The predictions and uncertainties of SOM, TN, available P and available K are shown in Fig. 12. The descriptive statistics show that SOM, TN, available P and available K range from 4.75 to 21.05 g/kg, 0.33 to 1.43 g/kg, 13.77 to 75.01 mg/kg and 74.40 to 340.40 mg/kg, respectively. The mean values of SOM, TN, available P and available K are 12.88 g/kg, 0.81 g/kg, 38.35 mg/kg and 142.43 mg/kg, respectively. These results indicated that our study area has relatively low SOM and TN but relatively high available P and available K, which is caused by interactions between soil parent material distribution and the fertilization methods used in this area. The soil parent material determines that the soil nutrient content in our study area is generally low. However, the continuous use of chemical fertilizers increases the available P and available K, while increases in SOM are limited. The mapping results indicated that the spatial distributions of the four studied nutrient properties is generally similar across the study area; the mean in the southwest is lower than that in the area along the Yellow River, which is consistent with the distribution trend of soil parent material in this area. However, the four nutrient property values differ in some areas. Around the central urban area, means of TN, available P and available K are significantly higher than those means in other parts of the study area. TN and available P show this trend on both sides of the central urban area, while available K is more obvious on the west side. However, the SOM content does not show the same trend. This indicates that the soil content of TN, available P and available K is influenced by factors related to urban distribution such as land use types and tillage methods. In the northeastern part of the study
4.4. Future work To our knowledge, this study was the first attempt to develop a fine soil nutrient map at regional scale based on land parcels. Land parcels are basic geographical units and are also the basic management units to which precision management of agriculture and the environment are applied. We achieved automatic land parcel extraction based on a modified VGG16 network and achieved good results in a plain area. However, there are large mountain ranges and hilly areas in China, and the land parcels in these areas are more complex and fractured. Thus, the land parcel extraction model must be improved before its application in these areas. In addition, soil property data at several depths are required for many environmental or agricultural studies and 245
Geoderma 340 (2019) 234–248
W. Dong et al.
(caption on next page) 246
Geoderma 340 (2019) 234–248
W. Dong et al.
Fig. 12. Mapping results of soil nutrient properties and their uncertainty based on land parcels across the study area using RF: (a) SOM content (g/kg); (b) TN content (g/kg); (c) Available P content (mg/kg); (d) Available K content (mg/kg); (e) Uncertainty map of SOM (g/kg); (f) Uncertainty map of TN (g/kg); (g) Uncertainty map of available P (mg/kg); (h) Uncertainty map of available K (mg/kg).
References
applications (Hempel et al., 2008); therefore, we plan to expand this study to consider three-dimensional soil property prediction. Predicting soil nutrient properties based on land parcels instead of grids can achieve highly consistent results between soil property predictions and actual distributions. However, this approach makes the prediction process more challenging. In addition to extracting land parcels accurately and automatically, processing and selection of the fine-scale environmental factors are also a challenge. In the future, we plan to conduct additional experiments to further improve the accuracy of land-parcel-based DSM.
Arlot, S., Celisse, A., 2009. A survey of cross-validation procedures for model selection. Stat. Surv. 4 (2010), 40–79. Banton, O., Cimon, M.A., Seguin, M.K., 1997. Mapping field-scale physical properties of soil with electrical resistivity. Soil Sci. Soc. Am. J. 61, 1010–1017. Bengio, Y., 2009. Learning deep architectures for AI. Found. Trends Mach. Learn. 2 (1), 1–127. Beven, K.J., Kirkby, M.J., 1979. A physically-based variable contributing area model of basin hydrology. Hydrol. Sci. Bull. 24 (1), 43–69. Boehner, J., Selige, T., 2006. Spatial prediction of soil attributes using terrain analysis and climate regionalisation. Goettinger Geogr. Abh. 115, 13–27. Brungard, C.W., Boettinger, J.L., Duniway, M.C., Wills, S.A., Edwards Jr., T.C., 2015. Machine learning for predicting soil classes in three semi-arid landscapes. Geoderma 68-83, s239–s240. Brus, D.J., 1994. Improving design-based estimation of spatial means by soil map stratification. A case study of phosphate saturation. Geoderma 62, 233–246. Bruzzone, L., Carlin, L., 2006. A multilevel context-based system for classification of very high spatial resolution images. IEEE Trans. Geosci. Remote Sens. 44 (9), 2587–2600. Chaney, N.W., Wood, E.F., McBratney, A.B., Hempel, J.W., Nauman, T.W., Brungard, C.W., et al., 2016. Polaris: a 30-meter probabilistic soil series map of the contiguous United States. Geoderma 274, 54–67. China National Environmental Monitoring Centre, 1992. The Modern Analysis Method of Soil Elements. China Environmental Science Press, Beijing. Chu, M., Jagadamma, S., Walker, F.R., Eash, N.S., Buschermohle, M.J., Duncan, L.A., 2017. Effect of multispecies cover crop mixture on soil properties and crop yield. Agric. Environ. Lett. 2 (1). https://doi.org/10.2134/ael2017.09.0030. Ciccarese, L., Silli, V., 2016. The role of organic farming for food security: local nexus with a global view. Futur. Food: J. Food Agric. Soc. 4 (1), 56–67. Conforti, M., Pascale, S., Robustelli, G., Sdao, F., 2014. Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo River catchment (northern Calabria, Italy). Catena 113, 236–250. Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., Wehberg, J., Wichmann, V., 2015. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. Model Dev. 8, 1991–2007. Cornfield, A.H., Pollard, A.G., 1952. Extraction of potassium from soil by electrodialysis and by ammonium acetate leaching. I.-soils from long-term manurial trials. J. Sci. Food Agric. 3, 151–152. Cutler, A., Cutler, D.R., Stevens, J.R., 2012. Ensemble Machine Learning: Methods and Applications. Springer Science + Business Media, LLC. Dai, J., He, K., Sun, J., 2016. Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3150–3158. Duan, S., Dong, Z., Hu, X., et al., 2016. Small-world Hopfield neural networks with weight salience priority and memristor synapses for digit recognition. Neural Comput. Appl. 27 (4), 837–844. Far, S.T., Rezaei-Moghaddam, K., 2017. Impacts of the precision agricultural technologies in Iran: an analysis experts' perception & their determinants. Inf. Process. Agric. 5 (1), 173–184. Franzen, D.W., Hopkins, D.H., Sweeney, M.D., Ulmer, M.K., Halvorson, A.D., 2002. Evaluation of soil survey scale for zone development of site-specific nitrogen management. Agronomy Journal 94 (2), 381–389. Gallant, J.C., Dowling, T.I., 2003. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 39 (12), 1347–1359. Gebbers, R., Adamchuk, V.I., 2010. Precision agriculture and food security. Science 327 (5967), 828. Godfray, H.C.J., Beddington, J.R., Crute, I.R., Haddad, L., Lawrence, D., Muir, J.F., et al., 2010. Food security: the challenge of feeding 9 billion people. Science 327 (5967), 812–818. Göl, C., Bulut, S., Bolat, F., 2017. Comparison of different interpolation methods for spatial distribution of soil organic carbon and some soil properties in the Black Sea backward region of Turkey. J. Afr. Earth Sci. 134, 85–91. Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford Univ. Press, New York. Goovaerts, P., 1999. Geostatistics in soil science: state-of-the-art and perspectives. Geoderma 89 (1–2), 1–45. Grunwald, S., 2009. Multi-criteria characterization of recent digital soil mapping and modeling approaches. Geoderma 152 (3–4), 195–207. Guisan, A., Weiss, S.B., Weiss, A.D., 1999. GLM versus CCA spatial modeling of plant species distribution. Plant Ecol. 143, 107–122. Haering, T., Dietz, E., Osenstetter, S., Koschitzki, T., Schroeder, B., 2012. Spatial disaggregation of complex soil map units: a decision-tree based approach in Bavarian forest soils. Geoderma 185, 37–47. Haring, T., Dietz, E., Osenstetter, S., Koschitzki, T., et al., 2012. Spatial disaggregation of complex soil map units: a decision-tree based approach in Bavarian forest soils. Geoderma 185-186, 37–47. Hempel, J.W., Hammer, R.D., Moore, A.C., et al., 2008. Challenges to Digital Soil Mapping. Springer Netherlands, Berlin. Hengl, T., Heuvelink, G.B., Kempen, B., Leenaars, J.G., Walsh, M.G., Shepherd, K.D.,
5. Conclusions Topsoil nutrients are important soil components that support crop growth; consequently, the fine-scale prediction of soil nutrients plays an important role in precision agricultural and environmental management applications. This paper reports the results of a study using the land-parcel-based DSM method and provides a fine-scale prediction of topsoil SOM, TN, available P and available K conducted in an alluvialdiluvia plain agricultural area in China. A CNN-based land parcel automatic extraction method provided the basic foundation for our DSM method and achieved good results in the studied plain agricultural area. The comparisons of different algorithms and mapping units showed that, overall, the land-parcel-based RF model is best method for predicting topsoil nutrient content in the study area. In addition, landparcel-based mapping performed better than grid-based mapping at the detail level, and it can improve the efficiency of prediction algorithms by approximately 4 times because it effectively reduces the number of mapping units in complex agricultural areas compared with the gridbased mapping results when using the same algorithm. However, landparcel-based models do not improve the precision significantly compared to grid-based models, which indicates that the land-parcel-based models require further improvements for fine-scale prediction in areas with similar elevation and meteorological environments using other environmental covariates and data mining algorithms. Overall, landparcel-based DSM has a good prediction accuracy performance, good computational efficiency and good results at mapping the details of fine-resolution topsoil nutrient, and it is useful for PA management and environmental applications in China. Therefore, we can conclude that the land-parcel-based SOM method applied in this study area could be used in other similar areas in China, but further studies are needed to improve the model for land-parcel-based DSM in mountainous and hilly agricultural areas. Acknowledgement The soil sampling data, soil type data and historical meteorological data of the study area were provided by the Ningxia Academy of Agriculture and Forestry Sciences, and the GF2 image data was provided by the ImageSky company. The authors are grateful for their support. We wish to thank the anonymous reviewers and editor for their valuable contribution. Dr. Wen Dong also thank her baby, who will be born to accompany her to complete this revision work. Funding This work was supported by the National Key Research and Development Program, China [grant number 2017YFB0503600], Beijing Municipal Natural Science Foundation, China [grant number 4172064] and National Natural Science Foundation of China, China [grant number 41631179]. 247
Geoderma 340 (2019) 234–248
W. Dong et al.
Saadatdoost, R., Sim, A.T.H., Jafarkarimi, H., 2012. Application of self-organizing map for knowledge discovery based in higher education data. In: International Conference on Research and Innovation in Information Systems, pp. 1–6. Sahrawat, K.L., 1982. Simple modification of the Walkley-Black method for simultaneous determination of organic carbon and potentially mineralizable nitrogen in tropical rice soils. Plant Soil 69 (1), 73–77. Saunders, A.M., Boettinger, J.L., 2006. Digital soil mapping - an introductory perspective. Dev. Soil Sci. 31, 389–620. Schneider, P., Hammer, B., Biehl, M., 2009. Adaptive relevance matrices in learning vector quantization. Neural Comput. 21, 3532–3561. Sharma, A.K., 2001. A Handbook of Organic Farming. Agrobios India, Jodhpur. Shen, W., Wang, X., Wang, Y., Bai, X., Zhang, Z., 2015. Deep contour: a deep convolutional feature learned by positive-sharing loss for contour detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 3982–3991. Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. [online] Available: https://www.arxiv.org/abs/1409.1556. Stevens, Francois, Bogaert, P., Wesemael, B.V., 2015. Detecting and quantifying fieldrelated spatial variation of soil organic carbon using mixed-effect models and airborne imagery. Geoderma 259-260, 93–103. Taghizadeh-Mehrjardi, R., Nabiollahi, K., Minasny, B., Triantafilis, J., 2015. Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran. Geoderma 253-254, 67–77. Taghizadeh-Mehrjardi, R., Nabiollahi, K., Kerry, R., 2016. Digital mapping of soil organic carbon at multiple depths using different data mining techniques in Baneh region, Iran. Geoderma 266, 98–110. Takata, Y., Funakawa, S., Akshalov, K., Ishida, N., Kosaki, T., 2007. Spatial prediction of soil organic matter in northern Kazakhstan based on topographic and vegetation information. Soil Sci. Plant Nutr. 53 (3), 289–299. Thiele-Bruhn, S., Bloem, J., Vries, F.T.D., Kalbitz, K., Wagg, C., 2012. Linking soil biodiversity and agricultural soil management. Curr. Opin. Environ. Sustain. 4 (5), 523–528. Tomlinson, R., 1978. Design considerations for digital soil map systems. In: 11th Congress of Soil Science. ISSS, Edmonton, Canada. Umali, B.P., Oliver, D.P., Forrester, S., Chittleborough, D.J., et al., 2012. The effect of terrain and management on the spatial variability of soil properties in an apple orchard. Catena 93, 38–48. Vincent, Sebastien, Lemercier, B., Berthier, L., et al., 2018. Spatial disaggregation of complex soil map units at the regional scale based on soil-landscape relationships. Geoderma 311 (1), 130–142. Were, K., Bui, D.T., Dick, Øystein B., Singh, B.R., 2015. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic. 52, 394–403. Xie, S., Tu, Z., 2015. Holistically-nested edge detection. Int. J. Comput. Vis. 125 (1–3), 3–18. Yang, R.M., Zhang, G.L., Liu, F., et al., 2016. Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecol. Indic. 60, 870–878. Yang, Y., Huang, Q., Wu, W., Luo, J., et al., 2017. Geo-parcel based crop identification by integrating high spatial-temporal resolution imagery from multi-source satellite data. Remote Sens. 9 (12), 1298. Zhang, Y., Wang, L., Duan, Y., 2016. Agricultural information dissemination using ICTS: a review and analysis of information dissemination models in China. Inf. Process. Agric. 3 (1), 17–29. Zhao, W., Du, S., 2016. Spectral-spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach. IEEE Trans. Geosci. Remote Sens. 54 (8), 4544–4554.
et al., 2015. Mapping soil properties of Africa at 250 m resolution: random forests significantly improve current predictions. PLoS One 10, 1–26. Hengl, T., Jesus, J.M.D., Heuvelink, G.B.M., Gonzalez, M.R., Kilibarda, M., et al., 2017. SoilGrids250 m: global gridded soil information based on machine learning. PLoS One 12 (2). https://doi.org/10.1371/journal.pone.0169748. Heung, B., Hodul, M., Schmidt, M.G., 2017. Comparing the use of training data derived from legacy soil pits and soil survey polygons for mapping soil classes. Geoderma 290, 51–68. Hobbs, P.R., Sayre, K., Gupta, R., 2008. The role of conservation agriculture in sustainable agriculture. Philos. Trans. R. Soc. Lond. 363 (1491), 543–555. Hofer, C., Borer, F., Bono, R., et al., 2013. Predicting topsoil heavy metal content of parcels of land: an empirical validation of customary and constrained lognormal block kriging and conditional simulations. Geoderma 193-194, 200–212. Hudson, B.D., 1992. The soil survey as paradigm-based science. Soil Sci. Soc. Am. J. 56, 836–841. Isaaks, E.H., Srivastava, R.M., 1989. Applied Geostatistics. Oxford Univ. Press, New York. Khosla, R., Inman, D., Smith, F., Macdonald, L., Mzuku, M., Reich, R., 2005. Spatial variability of measured soil properties across site-specific management zones. Soil Sci. Soc. Am. J. 69 (5), 1572–1579. Liess, M., Glaser, B., Huwe, B., 2012. Uncertainty in the spatial prediction of soil texture. Comparison of regression tree and random forest models. Geoderma 170, 70–79. Liu, J., Pattey, E., Nolin, M.C., et al., 2008. Mapping within-field soil drainage using remote sensing, dem and apparent soil electrical conductivity. Geoderma 143 (3–4), 261–272. Liu, Y., Cheng, M.M., Hu, X., Wang, K., Bai, X., 2016. Richer convolutional features for edge detection. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 5872–5881. McBratney, A.B., 1998. Some considerations on methods for spatially aggregating and disaggregating soil information. Nutr. Cycl. Agroecosyst. 50 (1–3), 51–62. McBratney, A.B., Mendonça Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. McGrath, D.A., Smith, C.K., Gholz, H.L., 2001. Effects of land-use change on soil nutrient dynamics in Amazônia. Ecosystems 4 (7), 625–645. Minasny, B., McBratney, A.B., 2016. Digital soil mapping: a brief history and some lessons. Geoderma 264, 301–311. Minasny, B., McBratney, A.B., Malone, B.P., Wheeler, I., 2013. Digital mapping of soil carbon. Adv. Agron. 118, 1–47. Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Sci. Soc. Am. J. 57 (2), 443–452. Mulder, V.L., Bruin, S.D., Schaepman, M.E., Mayr, T.R., 2011. The use of remote sensing in soil and terrain mapping-a review. Geoderma 162, 1–19. Negasa, T., Ketema, H., Legesse, A., et al., 2017. Variation in soil properties under different land use types managed by smallholder farmers along the topo sequence in southern Ethiopia. Geoderma 290, 40–50. Olsen, S.R., Cole, C.V., Watanabe, F.S., Dean, L.A., 1954. Estimation of Available Phosphorus in Soils by Extraction With Sodium Bicarbonate. U.S. Department of Agriculture Circular, Washington, DC, pp. 939. Poggio, L., Gimona, A., Spezia, L., et al., 2016. Bayesian spatial modelling of soil properties and their uncertainty: the example of soil organic matter in Scotland using RINLA. Geoderma 277, 69–82. R Core Team, 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. Razakamanarivo, R.H., Grinand, C., 2011. Mapping organic carbon stocks in eucalyptus plantations of the central highlands of Madagascar: a multiple regression approach. Geoderma 162 (3), 335–346. Riley, S.J., De Gloria, S.D., Elliot, R., 1999. A terrain ruggedness that quantifies topographic heterogeneity. Intermountain J. Sci. 5 (1–4), 23–27.
248