Geoderma 263 (2016) 254–263
Contents lists available at ScienceDirect
Geoderma journal homepage: www.elsevier.com/locate/geoderma
A similarity-based method for three-dimensional prediction of soil organic matter concentration Feng Liu a, David G. Rossiter a,b, Xiao-Dong Song a, Gan-Lin Zhang a,⁎, Ren-Min Yang a, Yu-Guo Zhao a, De-Cheng Li a, Bing Ju a a b
State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China Section of Soil & Crop Sciences, School of Integrated Plant Science, Cornell University, Ithaca, NY 14850, USA
a r t i c l e
i n f o
Article history: Received 15 December 2014 Received in revised form 24 May 2015 Accepted 26 May 2015 Available online 17 June 2015 Keywords: 3D soil mapping Similarity-based prediction Depth function model Soil organic matter concentration
a b s t r a c t This paper presents an approach to predicting three-dimensional (3D) variation of soil organic matter (SOM) concentration by integrating a similarity-based method with depth functions. It was tested in a small hilly landscape. A depth function model was constructed to fit SOM profile distribution using a linear relation in the topsoil and a power function in the subsoil. Then, under the assumption that similar environmental conditions at two sites would lead to the development of similar profile morphologies and thus similar depth function parameters, the similarity-based method was used to spatially interpolate the depth function parameters based on their relationships with environmental variables. With the values of the parameters for every location, a 3D map of SOM distribution was generated. The predicted SOM pattern well reproduced the statistical distribution of the pedon dataset used in this study. The overall mean error (ME) was 0.06 g kg−1 and ratio of performance to deviation (RPD) was 2.34. We conclude that the proposed approach is effective and accurate for 3D SOM prediction. It overcomes two drawbacks of the frequently used pseudo 3D soil mapping approach: (1) the neglect of vertical soil pattern when performing horizontal soil predictions, and (2) the repeated applications of depth function fittings in the mapping process, both of which may lead to prediction errors. Moreover, the similarity-based method is a transparent and traceable prediction process, allowing for easy interpretation of its results. This is useful for understanding soil–environmental relationships and processes. The method thus is an attractive alternative to the commonly used non-linear “black-box” techniques such as artificial neural networks. © 2015 Elsevier B.V. All rights reserved.
1. Introduction Detailed and accurate three-dimensional (3D) soil information is required for estimating soil carbon stocks and modeling hydrological processes (Sanchez et al., 2009). Conventional polygon-based soil mapping techniques usually cannot provide such information. To meet this requirement, the GlobalSoilMap.net project, officially launched by a global consortium of scientists in 2009, intends to develop digital soil mapping methods to produce a fine-resolution, 3D soil data products across the globe (GlobalSoilMap Science Committee, 2013). Many attempts have been made on 3D soil mapping (Minasny et al., 2013; Arrouays et al., 2014). Most considered it as multiple 2D soil mapping operations at a set of predefined depth intervals (Tekin et al., 2008; Grimm et al., 2008; Vasques et al., 2010a,b; Malone et al., 2009; Adhikari et al., 2013; F. Liu et al., 2013; Odgers et al., 2015). These 2D mapping results are represented as depth averages (for concentrations) or sums (for stocks). These averages can be reconstructed into a full 3D soil property map (Malone et al., 2011; Lacoste et al., 2014). Although ⁎ Corresponding author. E-mail address:
[email protected] (G.-L. Zhang).
http://dx.doi.org/10.1016/j.geoderma.2015.05.013 0016-7061/© 2015 Elsevier B.V. All rights reserved.
multiple 2D mapping is simple to implement, F. Liu et al. (2013) argued that it is a pseudo 3D mapping approach and has two drawbacks. One is that soil variation pattern in the vertical dimension is neglected when performing separate horizontal soil predictions for each depth interval. The other is that depth function fitting is often applied twice in the mapping process. The first is to standardize genetic or fieldrecorded horizon-based soil observations to pseudo-observations with a set of consistent depth intervals as input to the horizontal prediction models. The second is to fit soil predictions of the depth intervals to produce vertically continuous prediction for each location in the area of interest. Any errors in the fitting are thus repeated and may be magnified. To overcome these drawbacks, some attempts have been made to develop true 3D soil mapping approaches (Minasny et al., 2006; Meersmans et al., 2009; Mishra et al., 2009; Kempen et al., 2011; Veronesi et al., 2012, 2014). These methods perform horizontal spatial prediction directly on vertical soil variation pattern, represented by the parameters of depth functions. The 1D vertical and 2D horizontal predictions are thus tightly integrated in the mapping process. Moreover, depth function fitting is applied only once in the mapping process. However, in these attempts the depth functions were combined mainly with statistical and geostatistical techniques (i.e., multiple regression
F. Liu et al. / Geoderma 263 (2016) 254–263
and ordinary kriging) or data mining techniques (i.e., artificial neural networks and random forests). The former generally requires fairly dense soil observations to ensure reliable model calibration, especially for variogram model estimation (Park and Vlek, 2002; Lacoste et al., 2014). The latter uses “black-box” models with a non-transparent prediction process, making it difficult to interpret their results. This non-transparency is a major concern to soil scientists because the interpretation of the prediction model is desirable for gaining knowledge of soil–landscape processes, e.g., by which soil organic C is accumulated, transformed and distributed in 3D. An alternative to these techniques is a similarity-based prediction method. This class of methods predicts soil property values at an unsampled location based on its environmental similarity with observed sites. It has no requirements regarding the number and distribution of soil observations (Zhu et al., 2010). Moreover, its prediction process is transparent and traceable, allowing for easy interpretation of its prediction results. It also has one of the merits of the “black-box” models, i.e., the ability to deal with complex and nonlinear relationships between soil and environment. Such relations are likely for soil organic C, due to the complexity of processes by which it is accumulated, transformed and distributed in 3D. Therefore, the objective of this study was to examine the effectiveness of integrating a similarity-based prediction method with depth functions to predict 3D distribution of soil organic matter (SOM) concentration.
255
2.2. The approach of 3D SOM prediction The approach contains three steps. First, a depth function model was constructed to fit SOM profile distribution for each pedon. Second, a similarity-based method was used to interpolate the parameters of the depth function across this area. Third, with the values of the parameters for every location, 3D SOM distribution was generated. 2.2.1. Construction of depth function of SOM concentration We assumed that SOM concentration varies continuously with depth. To find the most accurate depth function with which to describe SOM variation with depth, we evaluated the power, exponential, and logarithmic functions for their ability to match the observed SOM profile variation. These functions were considered due to their mathematical simplicity (only two parameters). The bulk horizon SOM value represents the average value over the horizon (McBratney et al., 2000). We fitted the three functions through the mid-depth of horizon data for each of the 79 profiles. The power function best described the observed SOM variation with depth. But its fittings tended to overestimate SOM concentration near the soil surface. We thus introduced a linear function to describe SOM vertical variation near the soil surface and used a power function to describe that below. The resultant depth function model is continuous but with discontinuous derivatives at the crossover depth: ( Y ¼
k0 ðux0 Þk1 þ aðX−x0 Þ; X ≤x0 k0 X k1 ; XNx0
ð1Þ
2. Material and methods 2.1. Study area and data sets The description of the study area can be found in F. Liu et al. (2013). The major soil types in this area include the Haplic Cambisols (Alumic, Dystric), Haplic Cambisols (Dystric, Ferric, Rhodic), Haplic Cambisols (Dystric, Ferric), Haplic Cambisols (Dystric), Haplic Cambisols, Hyperskeletic Leptosols, and Hydragric Anthrosols (Oxyaquic) according to the World Reference Base for soil resources (WRB) (IUSS Working Group WRB, 2014). Fig. 1 shows the general catenary sequence of the soil types in this area. Table 1 lists their typical environmental conditions. We used the same set of 79 pedon sites as F. Liu et al. (2013). Fig. 2a shows the imbalanced spatial distribution of these sites. The soil sample collection and SOM measurement of soil horizons were described in the previous paper. We also used the same digital elevation model and Landsat Thematic Mapper data. The environmental variables used in this study include elevation, slope gradient, mean curvature, convergence index, topographic wetness index (TWI), and TM bands 3, 4 and 5. Fig. 2b shows the interpreted map of land uses including forests, shrubs, tea plantations, cultivated uplands, and paddy fields, with an overall accuracy of 80%.
where Y is the SOM concentration, X is the soil depth, k0 and k1 are the parameters of the power function, a is the slope of the linear function, x0 is the crossover depth of the node between the linear function and the power function, and u is an adjustment factor to compensate for the different uses of the crossover depth (x0) when considering the effects of different land uses. For cultivated lands, the first soil horizon is usually the tillage layer, in which SOM concentration was almost constant to the tillage depth due to frequent mixing of the topsoil. For other lands, the first horizon had a near-linear SOM decrease with depth. Thus, the value of a was defined to be the trivial slope 0 for paddy fields, cultivated uplands and tea plantations, and − 1 (i.e., linear decrease) for forests and shrub lands. The value of u was set to be 1/2 for cultivated lands and 1 for other lands. From the land use map (Fig. 2b), the values of a and u were derived for every location in this area. The value of x0 was set to the lower depth limit (dh1) of the first horizon for the cultivated lands and the mid-depth (dh1/2) of the first horizon for other lands. Note that dh1 is known from the profile description, and is not fit. We fitted this depth function model for each profile and estimated the values of the parameters k0 and k1 at all 79 sites using curve fitting by the ‘nls’ function of the R environment (R Development Core Team, 2012).
Fig. 1. General catenary sequence of the soil types in this landscape.
256
F. Liu et al. / Geoderma 263 (2016) 254–263
Table 1 Typical environmental conditions of the soil types in the landscape. Soil types a
Landscape positions
Slope gradient
Mean curvature
Wetness conditions
Land uses
Haplic Cambisols (Alumic, Dystric) Haplic Cambisols (Dystric, Ferric, Rhodic) Haplic Cambisols Haplic Cambisols (Dystric, Ferric) Hyperskeletic Leptosols Haplic Cambisols (Dystric) Hydragric Anthrosols (Oxyaquic)
Summits Shoulderslope Shoulderslope Backslope Backslope Footslope and toeslope River valley
Very small Big Moderate Small Very big Small Almost zero
Flat or slightly convex Flat or slightly concave Slight convex or convex Slightly concave Convex Flat or slightly concave Flat
Moderate Fair Fair Moderate to good Poor Good Very good
Tea plantations and shrubs Forests and shrubs Forests and shrubs Forests and shrubs Forests and shrubs Forests and cultivated uplands Paddy fields
a
The soil classification system used here is the World Reference Base for soil resources 2014 (WRB).
Thus, only the depth function parameters k0, k1 and dh1 need to be spatially interpolated in the following steps. These can be interpreted as follows: k0 represents the magnitude of SOM in the topsoil, k1 represents the rate of SOM decrease with depth in the subsoil, and dh1 represents the thickness of the first soil horizon. 2.2.2. Spatial interpolation of the depth function parameters Hudson (1992) noted that the same landscape unit develops the same kind of soil, and generally the more similar two landscape units are the more similar their associated soils tend to be. From this we assumed that similar environmental conditions at two locations will lead to similar soil profile morphologies and thus similar depth function parameters. The more similar the environmental conditions, the more similar their parameters. With this assumption, a similarity-based method was used to spatially interpolate the parameters (k0, k1 and dh1) based on their relationships with the environmental variables. For the convenience of interpolation, the whole study area was divided into a grid of 30 m pixels. Each parameter was interpolated separately using the same procedure originally proposed by Zhu et al. (2010). We improved it in this study by developing a new environmental similarity algorithm and a method of determining the weights assigned to the environmental variables. The procedure consisted of two steps. The first step was to
calculate environmental similarity of a pixel to each pedon site using a weighted Gaussian function: Si j;m ¼
N X p¼1
" wp exp −
2 #! xp;i j − xp;m 2σ p 2
ð2Þ
where Sij,m is the environmental similarity between a pixel (i,j) and a pedon site (m), xp,ij and xp,m are the values of the p-th variable, respectively, at the pixel and the site, σp is the SD of the p-th variable over the entire spatial field, and wp is the weight of the p-th variable. The N is the total number of variables, which is determined subjectively. The similarity value ranges from 0 for extremely different environmental values between a pixel and a pedon site to 1 for identical values. The SD in the denominator of the negative exponential normalizes the differences between the pixel and the site by the overall variation of each predictor in the spatial field. For a given difference, a high SD results in a larger contribution of that variable to the overall similarity. The similarity index also depends on the weights of the variables. The second step was to estimate the parameter value at this pixel through aggregating the parameter values of all pedon sites. The aggregation was based on the environmental similarities of the pixel to the sites. Here, we assumed that the parameters can be linearly averaged, and each parameter can be averaged independently. The aggregation thus was implemented using a linear weighted equation (J. Liu et al., 2013): X X Si j;other V Si j;other = V i j ¼ Si j;max V Si j;max þ 1−Si j;max Si j;other ð3Þ
Fig. 2. (a) Distribution of pedon sites with the background of a digital elevation model; (b) Map of land uses interpreted from Landsat TM images acquired on 20 May 2006.
where Vij is the estimated value of the parameter at a pixel (i,j), Sij,max is the maximum similarity value between this pixel and the pedon sites, Vsij,max is the observed parameter value at the site with the maximum similarity, Sij,other is the similarity value between this pixel and any other pedon site, and Vsij,other is the observed parameter value at that site. In this equation, the environmental similarity was taken as the weights of the pedon sites in determining the parameter value at a pixel. The equation weights the most similar site according to the uniqueness of its similarity: if a pixel is only similar to one site (i.e., Sij,max = 1), that one will provide all the information; if a pixel is only marginally more similar to one site, the other sites will receive more weight. This admittedly empirical ad hoc formula is heuristically reasonable: if a pixel is mostly similar to one pedon site, there is no ambiguity; if a pixel has partial similarities to many sites one may expect that those sites are themselves similar, thus a linear combination of their values is reasonable. In this procedure, the weights (wp, Eq. (2)) of environmental variables were determined using trial-and-error based on the 79 pedon sites (Halab-Kessira and Ricard, 1999). There are two constraints for the determination: the weights must range between 0 and 1 and their combination must sum to 1. For a given combination, leave-one-out cross-validation (LOOCV) (Refaeilzadeh et al., 2009) was used to predict the parameter value for each pedon site. The prediction was implemented
F. Liu et al. / Geoderma 263 (2016) 254–263
through removing a site and then re-estimating it with the remaining sites using the two steps above. The root mean square error (RMSE) between the predicted and the observed parameter value was calculated. A large step (0.05) was used to determine a small searching range for each variable according to the change of RMSE, and then a small step (0.01) to find the optimal weight value for each variable. The combination with the smallest RMSE was chosen as optimal for the interpolation of the parameter. This procedure was implemented in the R environment (R Development Core Team, 2012). 2.2.3. Generation of 3D SOM distribution The obtained maps of the depth function parameters (k0, k1, dh1, a and u) were used in Eq. (1) to produce continuous SOM concentration profile for each pixel of the study area from the soil surface to a depth of 150 cm, as an average over the 30 m resolution grid cell. Continuous 3D distribution of SOM concentration was thus generated. Then, lateral distribution of SOM concentration for any depth interval can be computed as an average through dividing the integral of the depth function from its upper depth limit to lower limit by its thickness at each pixel. According to the specification of the GlobalSoilMap.net on gridded soil data products (GlobalSoilMap Science Committee, 2013), we generated lateral distributions of SOM concentration over the 0–5, 5–15, 15–30, 30–60, 60–100 and 100–150 cm depth layers. 2.2.4. Evaluation criteria The descriptive statistics of the pedon dataset were compared with those of the predicted values from the present study and those from the previous study F. Liu et al. (2013) for two purposes. One was to evaluate the reproducibility of the predicted 3D SOM pattern of the present study. The other was to evaluate the consistency of this pattern with that of the previous study since the two studies used the same study area and data but different approaches. The comparisons were performed at the 0–5, 5–15, 15–30, 30–60 and 60–100 cm depth intervals. In addition, LOOCV at the 79 pedon sites was used to evaluate the 3D SOM prediction. For a site, all other sites were used to predict SOM concentration for the site at every 1 cm depth from surface to its maximum depth. The SOM concentration for each soil horizon was generated by averaging the 1 cm SOM concentrations between the upper and lower limits of that horizon as reported in the original profile description. The lower limits of the first five horizons were 17 ± 3.4 (mean and standard deviation (SD), respectively), 40 ± 9.2, 69 ± 12.5, 100 ± 17.7, and 122 ± 8.1 cm, respectively. With the predicted and the observed values, we calculated two indices. One is mean error (ME, Eq. (4)), which measures the prediction bias. The other is a ratio of performance to deviation (RPD, Eq. (5)), i.e., the ratio of SD of the observed values to the RMSE between the predicted and the observed values (Meersmans et al., 2009).
ME ¼
n 1X ðP −Oi Þ n i¼1 i
vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi uX 2 u n u Oi −O u u RPD ¼ u i¼1 n uX t ðP i −Oi Þ2
ð4Þ
½5
i¼1
where Pi and Oi are, respectively, the predicted and the observed values of SOM concentration at soil horizon i, Ō is the mean of the observed values. A positive ME value indicates overestimation on average, while a negative ME value indicates underestimation. For an ideal prediction, the ME should be 0 and RPD as high as possible. The RPD indicates how much improvement a prediction might achieve relative to that which could have been derived by using the mean of the observed values alone. A RPD value higher than 1 indicates that there is improvement,
257
while that lower than 1 indicates a worse prediction than just using the mean. 3. Results 3.1. Preliminary results 3.1.1. Profile distribution of SOM concentration We used the mid-depth of a horizon to represent its depth, and then graphed SOM profile distributions with depth based on all soil samples (Fig. 3a). SOM concentration decreased with increasing depth. The 0– 20 cm layer had a much higher SOM concentration than the deeper layers and also much more variability; SOM values varied from 8 to 33 g kg−1, with a mean of 18.6 g kg−1 and a SD of 4.9 g kg−1. The 20– 40 cm layer had much lower SOM values ranging from 3 to 16 g kg−1, with a mean of 7.7 g kg−1 and a lower SD of 2.7 g kg− 1. The layer below 40 cm had somewhat lower SOM values than the 20–40 cm layer, between 1 and 7 g kg−1, with a mean of 3.6 g kg−1 and a much lower SD of 1.1 g kg−1. This pattern of SOM profile distribution was expected. Vegetation is a major influencing factor for this pattern. The inputs of plant litter and roots to soil profile and downward organic matter transport processes determine the vertical SOM distribution, i.e., relatively high concentration in the topsoil and low concentration in the subsoil. Also, environmental factors (atmospheric composition, climate, terrain, vegetation and anthropogenic disturbance etc.) and their interactions generally have stronger influence on the topsoil than the subsoil. This is probably the main reason for the relatively high variability of SOM concentration in the topsoil. This pattern is consistent with many other studies (Bernoux et al., 1998; Minasny et al., 2006; Mishra et al., 2009). For example, Jobbagy and Jackson (2000) found that the percentage of SOM in the top 20 cm (relative to the first meter) averaged 33–50% for several vegetation types. 3.1.2. Depth function fittings of SOM profile distribution Fig. 2(b) displays an example of the fittings at Profile 71 using the power, exponential and logarithmic functions. The power function fittings gave the highest mean value (0.94) and a small SD (0.06) of determination coefficients (R2) based on all profiles. Thus the power function best describes the SOM profile distribution in this area. However, its fits tended to overestimate near the soil surface, and could not account for the tillage effect on SOM profile distribution in the cultivated lands. A linear function was thus combined with the power function to fit the profile distribution. Fig. 2(c) and (d) shows the fits at representative cultivated (67) and non-cultivated (19) profiles. Depth function fittings were different for the two kinds of soils: the former needs to consider tillage effect while the latter does not. Kempen et al. (2011) also considered the tillage effect when constructing depth functions. 3.1.3. Effect of environmental factors on SOM profile distribution The effects of terrain and land uses on measured SOM concentration were analyzed in F. Liu et al. (2013) using multiple regression with dummy variables based on the 79 pedon sites. The influence of terrain conditions on SOM was strong in the upper 60 cm and weak below 60 cm, while that of land use was strong in the upper 30 cm and weak below 30 cm. Both influences were the strongest in the upper 15 cm soil layer. To provide a basis for spatial interpolation of the depth function parameters, their relation with the environmental variables was estimated using Pearson correlation analysis. Table 2 shows that k0 (magnitude of SOM in the topsoil) and k1 (rate of SOM decrease with depth) mainly correlated with terrain variables (mean curvature, convergence index, slope gradient and TWI) while dh1 (the first horizon thickness) mainly corrected with land use variables (TM bands 4 and 5). Low linear correlation coefficients were partly due to the existence of complex and nonlinear relationships. This can be seen from the scatter plots of the parameters against the variables in Fig. 4. For example, k0 exhibited a
258
F. Liu et al. / Geoderma 263 (2016) 254–263
Fig. 3. (a) Scatter plot of SOM concentration against soil depth based on the soil samples collected from the 79 pedon sites; (b) Curve fittings of SOM profile distributions using power (Red), exponential (green) and logarithmic functions (blue) at a selected profile 71; (c) At profile 67 (cultivated) and (d) 19 (forested) actual fittings with the depth function model Eq. (1).
negative correlation with slope gradient less than 5% (mainly at summit, toeslope and river valley), and then appeared to be an overlap of multiple positive correlation patterns with slope gradient greater than 5% (mainly at shoulder, backslope and footslope). It exhibited a negative
Table 2 Pearson correlation coefficients between the environmental variables and the depth function parameters based on the 79 pedon sites. Depth function parameters
Elevation (m) Slope gradient (%) Mean curvature (1/m) TWId Convergence index TM band 3 TM band 4 TM band 5 a
k0a
k1b
dh1c
0.13 (0.26)e −0.10 (0.38) 0.22 (0.05) 0.13 (0.24) 0.25⁎ (0.02) −0.10 (0.37) −0.02 (0.89) −0.11 (0.32)
−0.06 (0.61) 0.21 (0.06) −0.17 (0.14) −0.25⁎ (0.03) −0.18 (0.11) −0.04 (0.76) 0.07 (0.54) 0.14 (0.24)
0.10 (0.36) 0.13 (0.24) 0.00 (0.91) −0.16 (0.15) 0.08 (0.50) 0.06 (0.60) 0.35⁎⁎ (0.00) 0.32⁎⁎ (0.00)
The k0 is related to the magnitude of SOM in topsoil. The k1 is related to the rate of decrease of SOM with depth in the subsoil. The dh1 is related to the thickness of the first horizon. d SAGA wetness index. e The value in the parentheses is the p-value of Pearson correlation. ⁎⁎ Correlation is significant at the 0.01 level. ⁎ Correlation is significant at the 0.05 level (2-tailed). b c
correlation with TWI less than 4.3 (from summit to footslope), and then a positive correlation with TWI greater than 4.3 (from footslope to river valley). This justifies the use of the similarity-based method which can deal with such complex relationships, by relying on local approximation to similar pedon sites without considering where they may be in any overall relationship. In addition, the effects of land uses on the depth function parameters were analyzed. A one-way ANOVA was conducted for each parameter under forests, shrubs, tea plantations, cultivated uplands and paddy fields. Levene's tests (Field, 2009) indicated that the variances of the parameters did not significantly differ among the land uses. There was a significant effect of land uses on the parameter dh1 (F (4, 74) = 5.55, p = 0.001). Further post hoc comparisons using the least significant difference test (Field, 2009) indicated that most land uses had significantly different dh1 values. There was no significant effect of land uses on the parameter k0 (F (4, 74) = 0.99, p = 0.418). This supports our choice of simple linear functions and crossover points in the topsoil to represent the effect of land uses on this parameter. There was a marginally significant effect of land uses on the parameter k1 (F (4, 74) = 2.13, p = 0.085), suggesting that even after accounting for differences in topsoil distribution of SOM, there is some effect of land use on subsoil distribution, or else that land uses are preferentially distributed on soils with pre-existing differences in distribution. Further comparisons indicated that paddy fields had significantly higher absolute value of k1 than other land uses. These indicate that land uses can differentiate the
F. Liu et al. / Geoderma 263 (2016) 254–263
259
Fig. 4. Scatter plots of the depth function parameters (k0, k1, dh1) against environmental variables (part) at different landscape positions.
thickness of the first horizon over this area, but can only identify the differences in the rate of SOM vertical decrease between the river valley and the hills. 3.2. Prediction results and evaluation For the convenience of visualization and analysis, lateral distributions of SOM concentration at the six depth intervals were displayed (Fig. 5) to express the predicted 3D pattern. Overall, areas with high elevation were predicted to have high SOM concentration, especially for
the upper layers. Moreover, areas with high surface SOM concentration tended to have a rapid and abrupt SOM decline with depth while areas with low surface SOM concentration tended to have a gradual decline with depth. There are two reasons for this pattern. One is that intensive human disturbances lead to low vegetation coverage in lowelevation areas (Ding et al., 1987). The other is that plant litter and roots are the main source of SOM, which concentrate in the topsoil. The overall pattern was the same as that predicted by F. Liu et al. (2013). In addition, the highest SOM was found at toeslopes of long gentle slopes (colluvial positions) and relatively flat summits, and the
Fig. 5. Predicted lateral distributions of SOM concentration at the soil layers: a) 0–5, b) 5–15, c) 15–30, d) 30–60, e) 60–100, and f) 100–150 cm.
260
F. Liu et al. / Geoderma 263 (2016) 254–263
lowest SOM at gully margins. At the same elevation, concave slopes had higher SOM concentration than convex slopes. In the river valley, low flat areas with slope gradient close to 0 and TWI greater than 8.5 had higher SOM than other areas. These indicate that potentially wet or depositional positions tended to have higher SOM than other landscape positions. Table 3 shows the comparison between the descriptive statistics of the pedon dataset and those of the predicted values from the present study and those from the previous study F. Liu et al. (2013). Both studies reproduced the general distribution of the pedon dataset well. This study appears to be better than the previous study with regard to the mean of the predicted values, while it appears to be worse with regard to the range of the predicted values. The better reproduction of the mean in this study may be partly because the effect of land uses was taken into account in the construction of depth function. The data range and SD of the predicted values of this study were smaller than those of both the pedon dataset and the previous study, especially at the upper layers (0–5, 5–15 and 15–30 cm). This indicates the existence of a smoothing effect in the prediction of this study. The effect may be due to the use of the weighted averaging equation (i.e., Eq. (3)) for aggregating the depth function parameter values of the pedon sites. The evaluation by LOOCV over the first five soil horizons resulted in an overall RPD of 2.34 and ME of 0.06 g kg−1. The RPD value was much higher than 1, indicating that the prediction was much better than just using the mean value of the observed as the predicted values. This can also be seen from the scatter plot of the predicted against the observed values, in which data points are scattered closely along the 1:1 line (Fig. 6). The ME value was close to zero, indicating an overall unbiased prediction. In addition, Table 4 shows that the most effective prediction of SOM concentration was obtained for the second horizon, with the highest RPD value. The least effective prediction was obtained for the third horizon, with the lowest RPD value. There was an underestimation for the first horizon and an overestimation for others. This is likely due to localized management effects such as fertilization and weeding, which could not be included in our model. This may be also due to the effect of increasing support from pedon (observation) to grid cell (prediction) scale; the average SOM at these supports is generally more similar in the subsoil than topsoil.
4. Discussion The present study used the same study area and datasets (pedon data and environmental variables) as the previous study F. Liu et al. (2013). The two studies used different ideas and techniques for predicting 3D continuous SOM distribution.
4.1. Different ideas The idea of the previous study was to separately perform multiple 2D horizontal SOM mapping operations at a set of predefined depth intervals and 1D vertical SOM fittings at each location. The horizontal and vertical predictions were loosely integrated in the 3D soil mapping. Because soil samples were collected based on horizons, vertical SOM fittings were also performed to prepare SOM values at the consistent depth intervals. This pseudo 3D soil mapping approach had two drawbacks. The first is that prediction errors may be resulted from the neglect of vertical soil pattern when performing horizontal prediction for a particular depth interval. The second is that errors in the depth function fittings may be magnified when repeatedly performing depth function fittings in the mapping process. The present study intended to develop a true 3D soil mapping approach. Its idea was to perform 2D horizontal prediction of 1D vertical distribution which was characterized by the parameters of a power function model. The horizontal and vertical predictions were tightly integrated. Because the parameters represented global characteristics of SOM concentration profiles (i.e., k0 for the magnitude and k1 for the shape), there were vertical constraints for the horizontal prediction. This to a large extent overcame the first drawback of the pseudo 3D approach. Moreover, vertical SOM fittings were performed only once in the mapping process in order to obtain the parameters of the power function model. This overcame the second drawback of the pseudo 3D approach. In addition, an important benefit of this approach is that it is not necessary to store SOM values of all pixels at finely resolved layers over the mapping area, but only a depth function equation and several data layers of its parameters. This provides a support for efficient data storage and user-defined retrieval of 3D soil information, especially for large areas.
4.2. Different depth function models To fit SOM variation with depth, the present study constructed a depth function model based on a power function, while the previous study used an equal-area quadratic spline function as developed by Bishop et al. (1999) and described by McBratney et al. (2000). It was chosen for the previous study because of its flexibility in adapting easily to local SOM variation with depth, and satisfactory performance for depth standardization of horizon-based soil observations (F. Liu et al., 2013). The spline function was not considered in the present study because splines fit to profiles with different horizon boundaries are not commensurate: the piecewise polynomials depend on the knot placement (horizon boundaries) and so do not provide a consistent set of parameters for interpolation across the landscape.
Table 3 Descriptive statistics of the pedon dataset and the predicted SOM concentration from the present study and from the previous study F. Liu et al. (2013) based on the 79 pedon sites. SOM concentration (g kg−1)
The pedon dataset
The prediction from the present study
The prediction from the previous study
0–5 cm 5–15 cm 15–30 cm 30–60 cm 60–100 cm 0–5 cm 5–15 cm 15–30 cm 30–60 cm 60–100 cm 0–5 cm 5–15 cm 15–30 cm 30–60 cm 60–100 cm
Min
Max
Mean
Standard deviation
9.3 8.1 5.9 3.4 1.7 11.9 9.3 6.4 3.4 2.0 8.9 7.7 4.7 2.5 1.2
33.6 26.1 13.5 7.4 5.1 32.0 24.4 13.0 6.9 5.1 34.1 27.2 16.8 7.0 6.7
22.5 14.4 9.0 5.0 3.5 22.2 13.9 8.9 6.0 3.5 24.4 12.7 9.6 4.3 3.9
6.3 4.2 1.9 0.9 0.7 5.2 3.6 1.4 1.0 0.8 6.7 4.9 2.3 1.2 0.9
F. Liu et al. / Geoderma 263 (2016) 254–263
261
Fig. 6. Scatter plots of the LOOCV predicted against the observed SOM concentration. The lower limits of the first five horizons were 17 ± 3.4 (mean and standard deviation, respectively), 40 ± 9.2, 69 ± 12.5, 100 ± 17.7, and 122 ± 8.1 cm, respectively.
SOM concentration generally decreases continuously and rapidly with depth in a soil profile. The decrease has apparent similarity to the shape of an exponential, power or logarithmic decay function. Compared to the spline function, the three functions have the merits of mathematical simplicity and consistent parameters for any soil profile. The primary difference between the three functions is the rate of decay down a profile. As shown in Fig. 3b, compared to the exponential and logarithmic functions, the power function has a relatively fast decay in the upper 30 cm and then a relatively slow and steady decay below 30 cm. It best reflects the reality of the observed SOM variation with depth (Fig. 3a). It should be acknowledged that the exponential function is the most widely used depth function (Minasny et al., 2013) and in many situations it works better than other functions (e.g., Bernoux et al., 1998; Zinn et al., 2005). But, as described in the Preliminary results section, the evaluation based on the pedons in this study area showed that the power function best fitted the observed SOM profile pattern among the three functions. This may be due to the relatively large accumulation of organic matter in the topsoil. Bennema (1974) also suggested a power function for describing soil carbon profile. 4.3. Different spatial prediction models The previous study used a radial basis function neural network model for SOM spatial prediction at each depth interval, while the present study used a similarity-based model for spatial prediction of the depth function parameters. The artificial neural network has been labeled a “black-box” model with non-transparent prediction process (Young and Weckman, 2010; Dragoi et al., 2014). It can provide little insight into the landscape processes. Comparatively, the similarity-based
Table 4 Performance of three-dimensional SOM prediction at different soil horizons.
ME (g kg−1) RPDa
1st horizon
2nd horizon
3rd horizon
4th horizon
5th horizon
−1.11 1
0.13 1.35
0.79 0.91
0.27 1
0.32 1.01
a RPD: a ratio of performance to deviation, i.e., the ratio of standard deviation of the observed values to the RMSE between the predicted and the observed values.
model has the advantage of transparency, allowing soil scientists to check how a certain prediction is made at any unsampled location. To illustrate this transparency, we take the prediction of parameter k0 as an example (Table 5). Two pixel locations in the study area were randomly selected: query pixel 01 (image row 125, column 336) and pixel 02 (image row 207, column 208). Among all pedon sites, the most similar one to query pixel 01 was the pedon site 68. It can be seen from the table that this pixel and this pedon site had very similar environmental conditions. This site has the maximum similarity of 0.96 and the observed property of 154.5. The observed property value 149.75 of “other pedon sites” was computed as a weighted average of these sites with their similarity to query pixel 01 as weights. The predicted property of the pixel 01 was finally derived based on Eq. (3). Because this equation weights the most similar pedon site, it can be seen that the prediction for this pixel was based primarily on the single most similar pedon site. Similarly, for the query pixel 02, prediction was based primarily on the pedon site 36. For this pixel the similarity to the most similar site was somewhat lower, 0.89, so the “other pedon sites” value, which was quite a bit higher than that of the most similar site, somewhat influences the prediction, which is higher than the value of the most similar site. The soil scientist can find the “other pedon sites” and examine why they have some similarity to this pixel, and why on the contrary query pixel 01 is so similar to only one pedon site. Most likely query pixel 02 represents an intergrade from the soil geographer's perspective. In this way the similarity-based prediction is relatively transparent, traceable and explainable. In addition, compared to the neural network, the similarity-based model requires providing the weights of individual environmental variables in the prediction process. How to define appropriate weights is a critical issue, as emphasized by Mallavan et al. (2010). This is closely related to the accuracy of the calculation of environmental similarity index and further the success of the similarity-based prediction. The definition of the weights depends on how well we understand the soil–environmental relationships. Currently, there are two basic ways to define the weights: one is to use expert knowledge and the other is to perform data analysis of the pedon sites. The former includes the direct definition method by a local expert and the comprehensive methods (e.g., analytical hierarchy process method) by a group of
262
F. Liu et al. / Geoderma 263 (2016) 254–263
Table 5 Examples of making predictions of the depth function parameter k0 at two queried pixel locations. Location/ID
Query pixel 01 Most similar pedon site Other pedon sites Query pixel 02 Most similar pedon site Other pedon sites
(125, 336) 68 – (207, 208) 36 –
Environment conditions Elevation (m)
Slope gradient (%)
Mean curvature (1/m)
TWI
Convergence index
Maximum similarity
41.3 39.2 – 60 51.8 –
0 0 – 5.12 5.27 –
−0.00003 −0.000048 – −0.001199 −0.00096 –
9.72 8.93 – 5.13 5.74 –
−2.39 −5.71 – −6.57 −14.81 –
– 0.96 –
local experts (Chow and Sadler, 2010; Ahmed et al., 2013; Patil and Kant, 2014). The latter generally uses multivariate regression analysis (Gleich et al., 2013), principle component analysis (Waring et al., 2010), machine learning (Shin and Han, 1999) or trial-and-error (Halab-Kessira and Ricard, 1999). The former is applicable for areas with local soil experts, while the latter is for areas with a number of soil samples representing almost all soil–environmental relationships. In many situations, the number of available soil samples is limited, and they do not represent all soil–environmental relationships. Thus, the data analysis weighting methods, which are based solely on goodness-of-fit, are prone to over-fitting. In this sense, it should be acknowledged that the trial-and-error search for the optimal weights used in the present study can also overfit the data and produce unrealistic predictions. The effect of over-fitting can be offset by integrating the data analysis with expert knowledge. Such integration techniques remain to be developed. Another limitation is that the similarity-based model interpolated the power function parameters independently without considering possible correlation among the parameters. Any correlation should also be considered as a vertical constraint for horizontal prediction. The neglect of this constraint may lead to incoherent depth functions. However, in the present study, the same set of environmental variables (ancillary information) was used to build relationships with the depth function parameters for the interpolation processes. This may, to some extent, implicitly take care of the parameter correlation and thus reduce this incoherence. Similarly, Meersmans et al. (2009) interpolated the depth function parameters independently based on their relationships with the same set of maps of sand, silt, clay content and ground water table position. Mishra et al. (2009) and Veronesi et al. (2012) interpolated the depth function parameters independently using ordinary kriging. Unlike these studies, Minasny et al. (2006) considered the correlation of the exponential function parameters using a neural network model. Minasny et al. (2013) argued that an alternative solution is to interpolate the parameters simultaneously using cokriging. However, cokriging requires a large number of soil samples at various inter-observation distances to reliably estimate the direct and cross variograms, and a reasonable spread of observations across the area to be mapped for low uncertainties of the resulting cokriging prediction. This makes it unsuitable for our study area where the soil samples are spatially imbalanced and sparse in most of the area. Thus, how to adequately incorporate parameter correlation into the interpolation process is still an issue to be resolved. 5. Conclusions We presented an approach to 3D SOM prediction by tightly integrating a similarity-based method with a power function based model. This approach was compared with that of the previous work F. Liu et al. (2013) who loosely integrated an artificial neural network with an equal-area quadratic spline function. From this study we conclude that the proposed approach can be effective and accurate for the 3D SOM prediction. It can overcome two drawbacks of the frequently used pseudo 3D soil mapping approach. One is the neglect of vertical soil pattern when performing horizontal soil predictions, and the other is the
0.89 –
Observed property (k0)
Predicted property (k0)
– 154.5 149.75 – 34.67 48.12
154.31 – – 36.15 – –
repeated applications of depth function fittings in the mapping process. Both may lead to prediction errors. Another advantage is that the similarity-based prediction process is transparent and traceable, allowing for easy interpretation of its prediction results. This is useful for investigating soil–environmental relationships and processes. The similarity-based method thus could be an attractive alternative to the commonly used “black-box” techniques such as the artificial neural network. In addition, the proposed approach provides a support for efficient data storage and user-defined retrieval of 3D soil information. This is important for large mapping areas. It should be noticed that the weights of environmental variables have to be explicitly provided in this approach. The determination of the weights is still an issue to be resolved. The weights that best fit a particular set of soil samples are prone to over-fitting when the samples are limited and cannot represent the full range of soil–environmental relationships. Further work will explore the method of integrating data analysis with expert knowledge for determining the weights. In addition, the present study only examined this approach in a small single landscape. It could be interesting to test it in large areas with multiple landscape components. Acknowledgments This research was supported by the National Natural Science Foundation of China (41130530, 91325301, 41201207, 41401237, and 41371224), and the National Key Basic Research Special Foundation of China (2012FY112100). The authors would like to thank Prof. Alex McBratney and anonymous reviewers for their constructive comments that greatly improved the paper. References Adhikari, K., Kheir, R.B., Greve, M.B., Bocher, P.K., Malone, B.P., Minasny, B., McBratney, A.B., Greve, M.H., 2013. High-resolution 3-D mapping of soil texture in Denmark. Soil Sci. Soc. Am. J. 77, 860–876. Ahmed, M.U., Banaee, H., Loutfi, A., 2013. Health monitoring for elderly: an application using case-based reasoning and cluster analysis. ISRN Artif. Intell. 2013. http://dx. doi.org/10.1155/2013/380239 (Article ID 380239, 11 pp.). Arrouays, D., Grundy, M.G., Hartemink, A.E., Hempel, J.W., Heuvelink, G.B.M., et al., 2014. GlobalSoilMap: toward a fine-resolution global grid of soil properties. In: Sparks, D.L. (Ed.), Advances in Agronomy 125. Academic Press, Burlington, pp. 93–134 (2014). Bennema, J., 1974. Organic carbon profiles in Oxisols. Pédologie 24, 119–146. Bernoux, M., Arrouays, D., Cern, C.C., Bourennane, H., 1998. Modeling vertical distribution of carbon in oxisols of the western Brazilian Amazon (Rondonia). Soil Sci. 163, 941–951. Bishop, T.F.A., McBratney, A.B., Laslett, G.M., 1999. Modelling soil attribute depth functions with equal-area quadratic smoothing splines. Geoderma 91, 27–45. Chow, T.E., Sadler, R., 2010. The consensus of local stakeholders and outside experts in suitability modeling for future camp development. Landsc. Urban Plan. 94, 9–19. Ding, Y.W., Zhang, S.B., Wu, L.H., Hu, B.Q., 1987. Soils in Xuanzhou City. Xuanzhou Soil Survey Office and Soil Fertilizer Station, Xuanzhou, China. Dragoi, E.N., Horoba, C.A., Mamaliga, I., Curteanu, S., 2014. Grey and black-box mdoelling based on neural networks and artificial immune systems applied to solid dissolution by rotating disc method. Chem. Eng. Process. Process Intensif. 82, 173–184. Field, A.P., 2009. Discovering Statistics Using SPSS: and Sex and Drugs and Rock ‘n’ Roll. 3rd ed. Sage, London. Gleich, B., Achzet, B., Mayer, H., Rathgeber, A., 2013. An empirical approach to determine specific weights of driving factors for the price of commodities—a contribution to the measurement of the economic scarcity of minerals and metals. Resour. Policy 38, 350–362.
F. Liu et al. / Geoderma 263 (2016) 254–263 Grimm, R., Behrens, T., Marker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island-digital soil mapping using random forests analysis. Geoderma 146, 102–113. Halab-Kessira, L., Ricard, A., 1999. Use of the trial and error method for the optimization of the graft copolymerization of a cationic monomer onto cellulose. Eur. Polym. J. 35, 1065–1071. Hudson, B.D., 1992. The soil survey as paradigm-based science. Soil Sci. Soc. Am. J. 56, 836–841. IUSS Working Group WRB, 2014. World reference base for soil resources 2014. International soil classification system for naming soils and creating legends for soil maps. World Soil Resources Reports No. 106. FAO, Rome. Jobbagy, E.G., Jackson, R.B., 2000. The vertical distribution of soil organic carbon and its relation to climate and vegetation. Ecol. Appl. 10 (2), 423–436. Kempen, B., Brus, D.J., Stoorvogel, J.J., 2011. Three-dimensional mapping of soil organic matter content using soil type-specific depth functions. Geoderma 162, 107–123. Lacoste, M., Minasny, B., McBratney, A.B., Michot, D., Viaud, V., Walter, C., 2014. High resolution 3D mapping of soil organic carbon in a heterogeneous agricultural landscape. Geoderma 213, 296–311. Liu, F., Zhang, G.L., Sun, Y.J., Zhao, Y.G., Li, D.C., 2013. Mapping the three-dimensional distribution of soil organic matter across a subtropical hilly landscape. Soil Sci. Soc. Am. J. 77, 1241–1253. Liu, J., Zhu, A.X., Zhang, S.J., Qin, C.Z., 2013. Large-scaled soil attribute mapping method based on individual representativeness of sample sites (in Chinese). Acta Pedol. Sin. 50, 12–20. Mallavan, B.P., Minasny, B., McBratney, A.B., 2010. Homosoil: a methodology for quantitative extrapolation of soil information across the globe. In: Boettinger, J.L., Howell, D.W., Moore, A.C., Hartemink, A.E., Kienast-Brown, S. (Eds.), Digital soil mapping: bridging research, environmental application, and operation. Progress in Soil Science 2. Springer, Dordrecht, pp. 137–149. Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154, 138–152. Malone, B.P., McBratney, A.B., Minasny, B., 2011. Empirical estimates of uncertainty for mapping continuous depth functions of soil attributes. Geoderma 160, 614–626. McBratney, A.B., Bishop, T.F.A., Teliatnikov, I.S., 2000. Two soil profile reconstruction techniques. Geoderma 97, 209–221. Meersmans, J., van Wesemael, B., De Ridder, F., Van Molle, M., 2009. Modelling the threedimensional spatial distribution of soil organic carbon at the regional scale (Flanders, Belgium). Geoderma 152, 43–52. Minasny, B., McBratney, A.B., Mendonca-Santos, M.L., Odeh, I.O.A., Guyon, B., 2006. Prediction and digital mapping of soil carbon storage in the Lower Namoi Valley. Aust. J. Soil Res. 44, 233–244. Minasny, B., McBratney, A.B., Malone, B.P., Wheeler, I., 2013. Digital mapping of soil carbon. Adv. Agron. 118, 1–47. Mishra, U., Lal, R., Slater, B., Calhoun, F., Liu, D., Meirvenne, M.V., 2009. Predicting soil organic carbon stock using profile depth distribution functions and ordinary kriging. Soil Sci. Soc. Am. J. 73, 614–621.
263
Odgers, N.P., McBratney, A.B., Minasny, B., 2015. Digital soil property mapping and uncertainty estimation using soil class probability rasters. Geoderma 237–238, 190–198. Park, S.J., Vlek, P.L.G., 2002. Environmental correlation of three-dimensional soil spatial variability: a comparison of three adaptive techniques. Geoderma 109, 117–140. Patil, S.K., Kant, R., 2014. A fuzzy AHP-TOPSIS framework for ranking the solutions of knowledge management adoption in supply chain to overcome its barriers. Expert Syst. Appl. 41, 679–693. R Development Core Team, 2012. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (http://www.R-project. org). Refaeilzadeh, P., Tang, L., Liu, H., 2009. Cross-validation. In: Liu, L., Ozsu, M.T. (Eds.), Encyclopedia of Database Systems. Springer, pp. 532–538 http://dx.doi.org/10.1007/9780- 387-39940-9_565. Sanchez, P.A., Ahamed, S., Carre, F., Hartemink, A.E., Hempel, J., Huising, J., Lagacherie, P., McBratney, A.B., McKenzie, N.J., et al., 2009. Digital soil map of the world. Science 325, 680–681. Science Committee, GlobalSoilMap, 2013. GlobalSoilMap Specifications: Tiered GlobalSoilMap Products. (Release 2.3. http://www.globalsoilmap.net/). Shin, K., Han, I., 1999. Case-based reasoning supported by genetic algorithms for corporate bond rating. Expert Syst. Appl. 16, 85–95. Tekin, Y., Kul, B., Okursoy, R., 2008. Sensing and 3D mapping of soil compaction. Sensors 8, 3447–3459. Vasques, G.M., Grunwald, S., Comerford, N.B., Sickman, J.O., 2010a. Regional modelling of soil carbon at multiple depths within a subtropical watershed. Geoderma 156, 326–336. Vasques, G.M., Grunwald, S., Sickman, J.O., Comerford, N.B., 2010b. Upscaling of dynamic soil organic carbon pools in a north-central Florida watershed. Soil Sci. Soc. Am. J. 74, 870–879. Veronesi, F., Corstanje, R., Mayr, T., 2012. Mapping soil compaction in 3D with depth functions. Soil Tillage Res. 124, 111–118. Veronesi, F., Corstanje, R., Mayr, T., 2014. Landscape scale estimation of soil carbon stock using 3D modelling. Sci. Total Environ. 487, 578–586. Waring, M.E., Eaton, C.B., Lasater, T.M., Lapane, K.L., 2010. Correlates of weight patterns during Middle Age characterized by functional principle components analysis. Ann. Epidemiol. 20 (3), 201–209. Young, W.A., Weckman, G.R., 2010. Using a heuristic approach to derive a grey-box model through an artificial neural network knowledge extraction techniques. Neural Comput. Appl. 19, 353–366. Zhu, A.X., Liu, J., Qin, C.Z., Zhang, S.J., Chen, Y.N., Ma, X.W., 2010. Soil property mapping over large areas using sparse ad-hoc samples. 19th World Congress of Soil Science, Soil Solutions for a Changing World. the International Union of Soil Science, Brisbane, Australia (1–6 August 2010). Zinn, Y.L., Lal, R., Resck, D.V.S., 2005. Texture and organic carbon relations described by a profile pedotransfer function for Brazilian Cerrado soils. Geoderma 127, 168–173.