Geoderma Regional 10 (2017) 154–162
Contents lists available at ScienceDirect
Geoderma Regional journal homepage: www.elsevier.com/locate/geodrs
Spatial prediction of major soil properties using Random Forest techniques - A case study in semi-arid tropics of South India
S. Dharumarajan a,⁎, Rajendra Hegde a, S.K. Singh b
a ICAR-National Bureau of Soil Survey and Land Use Planning, Regional Centre, Hebbal, Bangalore 560024, India
b ICAR-National Bureau of Soil Survey and Land Use Planning, Amaravati Road, Nagpur-440033, India
ARTICLE INFO
Keywords: Soil properties; Digital soil mapping; Random Forest model; Prediction; Validation
ABSTRACT
The purpose of the study is to map the spatial variation of major soil properties in Bukkarayasamudrum mandal of Anantapur district, India, using a Random Forest model. The study area is divided into Physiographic Land Units (PLU) based on landform, land use and slope. The Random Forest model (RFM) was developed from field survey data of 116 surface samples (0–30 cm) representing all major PLUs of the study area. RFM is sensitive neither to overfitting nor to noisy features and can handle large datasets. High-resolution satellite imagery (IRS LISS IV data, 3 bands); terrain attributes such as elevation, slope, aspect, topographic wetness index, topographic position index, plan and profile curvature, multi-resolution index of valley bottom flatness and multi-resolution ridge top flatness; vegetation factors such as NDVI and EVI; and land use land cover (LULC) were used as covariates along with legacy soil data at 1:50,000 scale. The predicted organic carbon, pH and EC ranged over 0.24–1.03%, 6.9–9.0 and 0.11–0.97 dS m−1, respectively. Model performance was evaluated using the coefficient of determination (R²) and Lin's concordance correlation coefficient (CCC). The model yielded R² and CCC values of 0.23 and 0.38 for SOC, 0.30 and 0.37 for pH, and 0.62 and 0.70 for EC, respectively. The variable importance ranking of the RFM showed that EVI and NDVI are the most important predictors for organic carbon, whereas drainage and NDVI are the most important for EC and pH, respectively. This technique can be applied to similar landscapes with more observations to refine the spatial resolution of soil properties.
1. Introduction
Most environmental modeling work requires spatially continuous and quantitative soil information, especially at larger scales (Gessler et al., 1996; Minasny et al., 2008; Hartemink and McBratney, 2008). Such information is not always available at the required scale (Lagacherie et al., 1995; McBratney et al., 2003; Greve et al., 2012), and mapping at high accuracy is challenging, time consuming and costly (Vågen et al., 2016). In the recent past, digital soil mapping (DSM) techniques have become more popular among natural resource mappers because they offer a quicker route to the spatial prediction of soil attributes (Lagacherie and McBratney, 2007). DSM is based on soil forming factor models such as CLORPT (Jenny, 1941) and SCORPAN, which describes a soil attribute as a function of soil (s), climate (c), organisms (o), relief (r), parent material (p), age (a) and spatial position (n) (McBratney et al., 2003). DSM techniques can be applied to predict both quantitative outputs such as sand, silt, clay, pH and electrical conductivity and qualitative outputs such as soil taxonomic units, using
⁎ Corresponding author. E-mail address: [email protected] (S. Dharumarajan).
http://dx.doi.org/10.1016/j.geodrs.2017.07.005 Received 12 April 2017; Received in revised form 22 July 2017; Accepted 24 July 2017 Available online 26 July 2017 2352-0094/ © 2017 Elsevier B.V. All rights reserved.
regression algorithms (Wiesmeier et al., 2011; Akpa et al., 2014) and classification algorithms (McBratney et al., 2003; Hastie et al., 2009; Kidd et al., 2014), respectively. In DSM, the relationships between soil and environmental covariates are used to predict soil information by means of different statistical and geostatistical models. For instance, organic carbon has been predicted by multiple linear regression (Powers and Schlesinger, 2002; Thompson et al., 2006), neural networks (Mansuy et al., 2014; Malone et al., 2009), tree models (Henderson et al., 2005), generalized linear models (McKenzie and Ryan, 1999) and regression kriging (Hengl et al., 2004; Simbahan et al., 2006). The use of machine learning algorithms in digital soil mapping is another approach to modeling soil classes and soil properties (Huang et al., 2002; Rogan et al., 2003). These algorithms are faster and more efficient in classification (Foody, 2002; Friedl and Brodley, 1997). Machine learning techniques predict the soil property at unvisited locations using its relationship with environmental covariates such as digital elevation models (DEMs) (McBratney et al., 2000), climatic parameters (Akpa et al., 2014), remote sensing imagery (Odeh and
McBratney, 2000), and legacy soil information (Akpa et al., 2014). Some of the major machine learning techniques used in DSM are classification and regression trees (Breiman et al., 1984), k-nearest neighbour (Mansuy et al., 2014), multinomial logistic regression (Kempen et al., 2009), logistic model trees (Giasson et al., 2006), support vector machines (Kovačevic et al., 2010; Priori et al., 2014) and the Random Forest model (Vågen et al., 2016). Of the four machine learning techniques studied by Rodriguez-Galiano and Chica-Rivas (2014), the Random Forest model was found to be the most accurate and the most robust to noise and data reduction. It can handle both quantitative and qualitative datasets. The potential of RFM in digital soil mapping has been demonstrated by several authors (Grimm et al., 2008; Hastie et al., 2009; Wiesmeier et al., 2011; Vågen et al., 2016; Sreenivas et al., 2016). Despite the widespread use of machine learning algorithms in DSM, limited work is available for the Indian subcontinent (Sreenivas et al., 2016). In this context, the present study was carried out to produce fine-resolution maps of major soil properties (organic carbon, pH and EC) in Bukkarayasamudrum mandal of Anantapur district, representing the semi-arid tropics of south India, using Random Forest model techniques.
2. Material and methods
2.1. Description of the study area
Bukkarayasamudrum mandal is located between 13°37′51″ and 14°48′09″ N latitudes and 77°33′47″ and 77°47′45″ E longitudes in Anantapur district of Andhra Pradesh, India (Fig. 1). The total area of the district is 24,808 ha. The climate of Bukkarayasamudrum mandal is warm and classified as hot and arid. The mean minimum and maximum temperatures are 22.9 °C and 34 °C, with an average rainfall of 556 mm. The major geology is granite-gneiss. Quartz, feldspar and mica are the major minerals of the granite and gneissic rock types. The elevation ranges from 295 to 595 m MSL. The major part of the mandal is nearly level to very gently sloping, with slopes ranging from 1 to 3%. The major soils of Bukkarayasamudrum mandal are moderately deep (75–100 cm) gravelly red clayey soils (Clayey-skeletal, mixed, isohyperthermic Typic Haplargids), followed by shallow (25–50 cm) gravelly red clayey soils (Clayey-skeletal, mixed, isohyperthermic Lithic Haplargids) and deep (100–150 cm) black clayey soils (Fine, mixed, isohyperthermic (cal) Ustic Haplargids). The study area is chronically prone to drought owing to its short and unpredictable length of growing period (LGP). The LGP of Bukkarayasamudrum mandal is 11 weeks, starting from the third week of August and ending in the last week of October. The net cultivated area is 47.4% of the total area. The major rainfed crops are groundnut, pearl millet, sorghum and minor millets, while rice, cotton and vegetables are the important irrigated crops.
2.2. Soil sampling and analysis
We used the physiographic land unit (PLU) map as a base for soil sampling. A physiographic land unit is an assemblage of landform, slope and land use. Landform is a testimony of climatic events, whereas slope and land use represent the influence of present climatic conditions on soil formation. Survey of India toposheets (1:50,000 scale) and Resourcesat-2 LISS IV (Linear Imaging Self Scanner IV) imagery (5.8 m resolution) were used to prepare the landform and land use maps.
Fig. 1. Location map.
Fig. 2. Physiographic land unit of Bukkarayasamudrum mandal.
The slope map was developed from SRTM DEM data. The three layers, i.e. landform, slope and land use, were integrated, taking into consideration the area and morphology of the landform units and their relation with neighbouring objects, to develop the physiographic land unit (PLU) map (Fig. 2). The PLU legend comprises 3 to 4 letters: the first letter describes the landform, the second the slope, and the third and fourth the type of land use. A total of 116 surface (0–30 cm) samples were collected randomly, covering all major PLUs of the study area, to capture the existing soil variability. The soil samples were air dried and processed for laboratory analysis. Organic carbon was estimated by the Walkley and Black (1934) method. The soil reaction (1:2.5 soil-water suspension) and electrical conductivity were estimated following Jackson (1973).
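The legend convention described above (landform letter, slope letter, one or two land use letters) can be illustrated with a small decoding sketch. This is only an illustration under stated assumptions: the function name `decode_plu` and the example code "PBsc" are hypothetical, the slope classes are the four mentioned in Section 3.1 (A, B, E, F), and the landform and land use letters are returned undecoded because the article does not list them.

```python
# Slope classes named in Section 3.1; other classes are not listed in the article.
SLOPE_CLASSES = {"A": "0-1%", "B": "1-3%", "E": "10-15%", "F": "15-25%"}


def decode_plu(code):
    """Split a 3- or 4-letter PLU code into its three components.

    Letter 1 = landform, letter 2 = slope class, letters 3-4 = land use.
    Landform and land use letters are returned as-is (their meanings are
    not given in the article, so no lookup is attempted).
    """
    if not 3 <= len(code) <= 4:
        raise ValueError("a PLU code is expected to have 3 or 4 letters")
    slope_letter = code[1]
    return {
        "landform": code[0],
        "slope_class": slope_letter,
        "slope_range": SLOPE_CLASSES.get(slope_letter, "unknown"),
        "land_use": code[2:],
    }


# "PBsc" is a made-up code used only to show the layout of the legend.
example = decode_plu("PBsc")
print(example)
```

Keeping the three components separate makes it straightforward to group samples by landform, slope class or land use when checking that every major PLU is represented in the sample set.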
2.3. Environmental variables
We used different sets of environmental covariates to predict the soil properties. A digital elevation model (DEM) was obtained from Shuttle Radar Topography Mission (SRTM) data and processed using the ArcGIS 10 data management toolbox. DEM derivatives such as slope, aspect, curvatures (plan and profile), topographic wetness index (TWI) and topographic position index (TPI) were derived using ArcGIS 10 (Reuter and Nelson, 2009). The Multi-resolution Index of Valley Bottom Flatness (MrVBF) and Multi-resolution Ridge Top Flatness (MrRTF) were derived using SAGA GIS version 2.3.1 (Kidd et al., 2014). Soil type, soil depth, drainage and soil texture were derived from the soil map of Anantapur district (1:50,000 scale) (NBSS & LUP, 2009) and used as variables along with the landform map and land use land cover map of Bukkarayasamudrum mandal. The Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) were extracted from 16-day MODIS data at 250 m resolution (MOD13Q1) for 5 years (2011–2015), and the average NDVI and EVI were used for modeling. In addition, Band 1 (Green), Band 2 (Red) and Band 3 (NIR) of Resourcesat-2 LISS IV data were used as covariates for prediction of soil properties. All datasets were brought into the LCC (Lambert Conformal Conic) projection using the ArcGIS 10 toolbox. The environmental variables were intersected with the 116 sampling points. The final database was composed of 20 environmental variables (14 quantitative and 6 categorical) and three target variables (OC, pH and EC) for modeling (Table 1).
2.4. Spatial prediction of soil properties using Random Forest model
The Random Forest package (version 4.6) in the R environment was used to predict the soil properties of Bukkarayasamudrum mandal. The Random Forest model (RFM) is an extension of the regression tree model: it assembles a large number of classification and regression trees and applies two levels of randomization to each tree in the forest (Breiman, 2001). RFM improves prediction accuracy and reduces model overfitting (Breiman, 2001; Liaw and Wiener, 2002). It is insensitive to missing data and can handle a large number of both quantitative and categorical variables (Grinand et al., 2008). Three parameters control the fitting of the Random Forest model, viz. the number of trees (ntree), the minimum number of samples at a terminal node (nmin) and the number of predictors used for fitting the tree (mtry). The relative importance of predictors is identified from the number of times each predictor is used in the nodes. The internal out-of-bag (OOB) prediction generated through bootstrapping provides an estimate of accuracy across the decision trees, which was used for the initial assessment of model performance. For the present study, 75% of the observations were used for RFM modeling and 25% for validation. The number of trees (ntree) for the final model was selected based on error estimates from the OOB sample. The initial model was built with 500 trees and the OOB overall error stabilized at 200 trees; therefore, ntree = 200 was used in the final model.

Table 1 Description of environmental covariates.
| Predictor | Source | Resolution | Type | Range |
|---|---|---|---|---|
| Remote sensing imagery | | | | |
| Band 1 (Green) | Resourcesat2 LISS IV | 5.8 m | Q | 73–398 |
| Band 2 (Red) | Resourcesat2 LISS IV | 5.8 m | Q | 82–314 |
| Band 3 (NIR) | Resourcesat2 LISS IV | 5.8 m | Q | 103–264 |
| Terrain attributes | | | | |
| Elevation (m) | SRTM DEM | 30 m | Q | 295–595 |
| Slope (%) | SRTM DEM | 30 m | Q | 0–42% |
| Aspect | SRTM DEM | 30 m | Q | 1–359 |
| TPI | SRTM DEM | 30 m | Q | 0–0.80 |
| TWI | SRTM DEM | 30 m | Q | −1069 to 2620 |
| Plan curvature | SRTM DEM | 30 m | Q | −2.08 to 3.02 |
| Profile curvature | SRTM DEM | 30 m | Q | −2.97 to 2.57 |
| MrVBF | SRTM DEM | 30 m | Q | 0–7.61 |
| MrRTF | SRTM DEM | 30 m | Q | 0–5.88 |
| Landform | SRTM DEM | 30 m | C | – |
| Vegetation attributes | | | | |
| NDVI | MOD13Q1 (2011–2015) | 250 m, 16 days | Q | 932–6502 |
| EVI | MOD13Q1 (2011–2015) | 250 m, 16 days | Q | 608–4915 |
| Landuse landcover | Resourcesat2 LISS IV | 5.8 m | C | – |
| Soil attributes | | | | |
| Soil type | Legacy data Anantapur | 1:50,000 | C | – |
| Soil depth (cm) | Legacy data Anantapur | 1:50,000 | C | – |
| Texture | Legacy data Anantapur | 1:50,000 | C | – |
| Drainage | Legacy data Anantapur | 1:50,000 | C | – |
Q - Quantitative and C - Categorical.

2.5. Model accuracy assessment
The prediction performance of the Random Forest models was evaluated using three measures: the coefficient of determination (R²), defined as the percentage of variation explained by the model; the root mean square error (RMSE), a measure of model accuracy; and Lin's concordance correlation coefficient (CCC), a measure of agreement between predicted and observed values. Good models have a coefficient of determination and a concordance correlation coefficient equal or close to 1 and a root mean square error close to 0.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (p_i - o_i)^2}{\sum_{i=1}^{n} (o_i - \bar{o})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (o_i - p_i)^2}$$

$$\rho_c = \frac{2 \rho \sigma_o \sigma_p}{\sigma_o^2 + \sigma_p^2 + (\mu_o - \mu_p)^2}$$

where $p_i$ and $o_i$ are the predicted and observed values, $\bar{p}$ and $\bar{o}$ are the means of the predicted and observed values, $\mu_o$ and $\mu_p$ are the means of the observed and predicted values, $\sigma_o^2$ and $\sigma_p^2$ are the corresponding variances, and $\rho$ is the Pearson correlation coefficient between observed and predicted values.

3. Results and discussion
3.1. Physiographic land units
Five landform units, viz. denudational hills, dissected pediments, pediplains, alluvial deposits and valleys, were delineated based on online visual interpretation of remote sensing data and integrated with the land use map and slope map of Bukkarayasamudrum mandal to derive the physiographic land units. The land use land cover classification showed that, of the five major land uses (single crop, double crop, scrub, forest and fallow land), single crops occupy 62.9% of the study area, followed by double crops (12.1%) and scrub lands (6.3%). The slope map derived from the SRTM DEM showed that nearly level (0–1%) and very gently sloping lands (1–3%) cover 77.5% of the total area. The integration of these three layers yielded 33 physiographic land units. In the denudational hills, three PLUs were identified based on variation in slope (E: 10–15% and F: 15–25%) and land use/land cover classes (forest and scrub). Three PLUs were delineated on the dissected pediments and 15 PLUs in the pediplains. Alluvial deposits and valleys were divided into 12 PLUs on the basis of slope (A: 0–1% and B: 1–3%) and land use (single crop, double crop and fallow). The PLUs were homogeneous with respect to geology, land use and slope.

Table 2 Summary statistics of soil properties of Bukkarayasamudrum mandal of Anantapur district (N = 116).
| Parameters | OC (%) | pH | EC (dS m−1) |
|---|---|---|---|
| Mean | 0.56 | 8.2 | 0.28 |
| Max | 1.35 | 9.7 | 2.21 |
| Min | 0.07 | 5.6 | 0.05 |
| Standard deviation | 0.27 | 0.88 | 0.28 |
| Skewness | 0.54 | −1.02 | 3.89 |
| Kurtosis | −0.12 | 0.51 | 20.94 |

Table 3 Correlation analysis.
| Variable | EC | OC | pH |
|---|---|---|---|
| NDVI | 0.110 | 0.205(a) | 0.018 |
| EVI | 0.079 | 0.247(b) | 0.008 |
| Elevation | −0.118 | −0.087 | −0.068 |
| Slope | −0.076 | 0.136 | 0.044 |
| Aspect | 0.131 | −0.019 | −0.144 |
| Profile curvature | −0.026 | 0.013 | −0.100 |
| Plan curvature | 0.028 | 0.068 | −0.026 |
| TWI | 0.052 | −0.011 | 0.103 |
| TPI | 0.029 | −0.031 | 0.067 |
| MrVBF | 0.271(b) | 0.093 | 0.060 |
| MrRTF | 0.213(a) | −0.084 | 0.154 |
| Band1 | 0.001 | 0.210(a) | 0.120 |
| Band2 | −0.103 | 0.034 | 0.039 |
| Band3 | 0.007 | 0.112 | 0.162 |
b Correlation is significant at the 0.01 level (2-tailed).
a Correlation is significant at the 0.05 level (2-tailed).

Table 4 Performance of RFM in modeling of soil properties.
| Parameters | OC (%) | pH | EC (dS m−1) |
|---|---|---|---|
| R² | 0.23 | 0.30 | 0.62 |
| CCC | 0.38 | 0.37 | 0.70 |
| RMSE | 0.20 | 0.75 | 0.14 |
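The three accuracy measures defined in Section 2.5 and reported in Table 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the study used R): the function names `r_squared`, `rmse` and `lin_ccc` and the four observed/predicted pairs are hypothetical, and the variances and covariance are computed with a 1/n divisor, which is an assumption about the convention used.

```python
import math


def mean(values):
    return sum(values) / len(values)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST (Section 2.5)."""
    o_bar = mean(observed)
    sse = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient.

    Uses 2 * cov(o, p) in the numerator, which equals
    2 * rho * sigma_o * sigma_p in the formula of Section 2.5.
    """
    n = len(observed)
    mu_o, mu_p = mean(observed), mean(predicted)
    var_o = sum((o - mu_o) ** 2 for o in observed) / n
    var_p = sum((p - mu_p) ** 2 for p in predicted) / n
    cov = sum((o - mu_o) * (p - mu_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mu_o - mu_p) ** 2)


# Hypothetical organic carbon values (%), for illustration only.
observed = [0.30, 0.50, 0.70, 0.90]
predicted = [0.40, 0.50, 0.60, 0.80]

print(r_squared(observed, predicted))
print(rmse(observed, predicted))
print(lin_ccc(observed, predicted))
```

Unlike the Pearson correlation, the CCC is penalized both by a shift in the mean (the term in the denominator involving the two means) and by a difference in spread, so predictions that are well correlated with the observations but compressed toward the mean still score below 1.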
3.2. Summary statistics of soil properties
A summary of the soil properties is presented in Table 2. The soil pH ranged from 5.6 to 9.7 with a mean and standard deviation of 8.2 and 0.88, respectively. Out of 116 samples analysed, 56 samples had pH > 8.5. The electrical conductivity ranged from 0.05 to 2.21 dS m−1 and organic carbon ranged from 0.07 to 1.35%. The frequency distributions of the soil properties showed that pH is negatively skewed whereas OC and EC are positively skewed. The higher variability in pH and OC is mainly attributed to land management. The correlation analysis (Table 3) between soil properties and environmental covariates showed that electrical conductivity correlated with MrVBF and MrRTF, whereas organic carbon correlated with EVI, NDVI and Band 1.

3.3. Performance of Random Forest model in predicting soil properties
The performance of the Random Forest model was evaluated for each property by calculating three uncertainty indicators, viz. the coefficient of determination (R²), root mean square error (RMSE) and concordance correlation coefficient (CCC). The results (Table 4) showed that the combination of categorical and quantitative covariates explained 23, 30 and 62% of the variation in organic carbon, pH and EC, respectively. Lin's concordance correlation coefficient showed that EC had the highest CCC value (70%), whereas pH and OC had the lowest, with CCC values of 37 and 38%, respectively (Fig. 3). The observed RMSE values of the validation samples were 0.20% for OC, 0.75 for pH and 0.14 dS m−1 for EC. The performance of the model for prediction of organic carbon is quite low (R² = 23%) due to its dynamic nature, contrasting land use and low terrain variation. The poor performance may also be related to the low levels of soil organic carbon. Vancampenhout et al. (2006) reported that the performance of DSM models is poorer in soils with low organic carbon content than in soils with high organic carbon. The prediction of organic carbon was poor relative to most previous studies (Wiesmeier et al., 2011, R² = 76%; Sreenivas et al., 2016, R² = 82%; Vågen et al., 2016, R² = 69–77%), although the performance was close to that of several others (de Carvalho et al., 2014, R² = 20%; Gastaldi et al., 2012, R² = 18%). The performance of the model also depends on sampling density. The sampling density for this study is 0.46 samples km−2. Though the sampling intensity is higher than in many previously reported studies (Aksoy et al., 2012; Gray et al., 2009; Cao et al., 2012; Ciampalini et al., 2012), the model did not perform well due to the high variability of soil properties and low terrain variation. A higher sampling density is required for better results in tropical countries, where the soil pattern is more complex than in other regions due to geological uplift (de Carvalho et al., 2014). As for organic carbon, the model explained only 30% of the pH variation, which is also poor compared to previous studies (Tekin et al., 2013; Vågen et al., 2016). Hengl et al. (2015) reported that RFM improved the mapping accuracy of pH by 20% compared to linear regression models. Vågen et al. (2016) reported that the Random Forest model performed well, with R² of 0.88 and RMSE of 0.32–0.36. In the case of electrical conductivity, the model explained 62% of the variation with Lin's CCC of 70%, indicating moderate predictive performance. Overall, the complex regional soil pattern and changing land use are the causes of the weak prediction of these dynamic soil properties. Grimm et al. (2008) explained that the prediction accuracy of RFM is lower in surface soil than in subsurface soil. The prediction performance of the model could be improved by additional environmental covariates such as geology, a groundwater depth map and high-resolution DEM attributes.

Fig. 3. Concordance correlation coefficient of OC, pH and EC.

3.4. Importance of predictor variables for predicting soil properties
RFM estimates the importance of each covariate from how much better or worse the prediction becomes when that variable is removed (Prasad et al., 2006), which also guards against eliminating good predictor variables that are important for the model. Fig. 4a–c shows the variable importance rankings of the Random Forest model for OC, pH and EC. EVI emerged as the top predictor for organic carbon, followed by NDVI and Band 1 of the satellite imagery. The results confirmed that vegetation cover influences the surface organic carbon distribution (Wiesmeier et al., 2011). Sreenivas et al. (2016) reported that land cover is the most important predictor of organic carbon density in India, followed by maximum temperature. Adhikari et al. (2014) reported that rainfall, land use and soil type were the most important variables controlling OC distribution. For electrical conductivity, drainage emerged as the most influential factor, followed by MrRTF and NDVI. Kühn et al. (2009) reported that terrain attributes had a weak and non-significant effect on the prediction of EC, whereas the geological unit and groundwater table map had a significant effect in the regression models. NDVI emerged as the most important predictor for pH, followed by the enhanced vegetation index. Mosleh et al. (2016) reported that plan curvature was a main important variable for prediction of pH in the regression model. Due to low terrain variation (> 70% of the area has < 3% slope) and high variation in soil properties, the terrain factors did not emerge as important predictors of the soil properties in this study.

Fig. 4. a. Variable importance rankings of RFM model for prediction of organic carbon (% IncMSE = percent increase in Mean Square Error). b. Variable importance rankings of RFM model for prediction of soil reaction (% IncMSE = percent increase in Mean Square Error). c. Variable importance rankings of RFM model for prediction of electrical conductivity (% IncMSE = percent increase in Mean Square Error).

Fig. 5. Predicted Organic Carbon map.

Fig. 6. Predicted pH map.

3.5. Spatial prediction of soil properties
Mapping soil properties is a preliminary step towards decision making, such as the delineation of suitable crop growing areas or the identification of polluted or affected areas. We spatially predicted OC, pH and EC in the surface soil (0–30 cm) using Random Forest (Figs. 5–7). The predicted organic carbon varied from 0.24 to 1.03%. The soil reaction and electrical conductivity ranged from 6.9 to 9.0 and from 0.11 to 0.97 dS m−1, respectively (Table 5). The spatial prediction suggested that the distribution of soil properties at the surface is highly variable due to variations in land management and land use. The spatial resolution of the maps helps to assess and monitor soil health at watershed or village level (Vågen et al., 2013; Grimm et al., 2008). The predicted maps of pH and EC can be used to identify important soil fertility constraints such as salinity or alkalinity (Vågen et al., 2016), to determine the soil degradation risk and to suggest suitable management options for better reclamation.
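The %IncMSE ranking discussed in Section 3.4 rests on a simple idea: break the link between one covariate and the target, then measure how much the prediction error grows. The sketch below is a simplified, model-agnostic version of that idea and not the R implementation used in the study, which, as we understand it, permutes each predictor within the out-of-bag samples of every tree and scales the result. The names `permutation_importance` and `toy_model` and all data values are hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE when each covariate column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other covariates intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(model, permuted, targets) - baseline
        increases.append(total / n_repeats)
    return baseline, increases


def toy_model(row):
    """Stand-in for a fitted model: uses covariate 0 and ignores covariate 1."""
    return 0.2 + 0.5 * row[0]


# Hypothetical covariates: column 0 plays the role of a vegetation index,
# column 1 an unrelated terrain attribute.
rows = [[0.1, 5.0], [0.3, 1.0], [0.5, 4.0], [0.7, 2.0], [0.9, 3.0]]
targets = [toy_model(r) for r in rows]

baseline, increases = permutation_importance(toy_model, rows, targets)
print(baseline, increases)
```

A covariate the model never uses produces no increase in error when shuffled, which is consistent with the low ranking of the terrain attributes reported above.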
Fig. 7. Predicted electrical conductivity map.
Table 5 Summary statistics of predicted soil properties.
| Parameters | OC (%) | pH | EC (dS m−1) |
|---|---|---|---|
| Mean | 0.60 | 8.1 | 0.32 |
| Max | 1.03 | 9.0 | 0.97 |
| Min | 0.24 | 6.9 | 0.11 |
| Standard deviation | 0.12 | 0.33 | 0.17 |
| Skewness | 0.76 | −0.34 | 1.96 |
| Kurtosis | 0.07 | −0.33 | 2.80 |

4. Conclusion
In the present study, digital maps of three major soil properties (OC, pH and EC) were prepared using Random Forest model techniques. General spatial patterns of these soil properties can be predicted easily using the digital soil mapping approach. Different covariates, including land use, terrain, legacy soil information and satellite imagery, were used for prediction. The RFM output showed that the combination of covariates explains 23, 30 and 62% of the variation in OC, pH and EC, respectively, in the study area, and that the uncertainty can be reduced by a higher sampling density and by incorporating additional datasets such as high-resolution digital elevation models. Overall, RFM-based prediction can help to map soil properties faster and more efficiently.

Acknowledgement
The authors would like to thank the anonymous reviewers for their helpful and constructive comments, which greatly contributed to improving the quality of the manuscript.

Appendix A. Supplementary data
Supplementary data associated with this article can be found in the online version, at doi: http://dx.doi.org/10.1016/j.geodrs.2017.07.005. These data include the Google maps of the most important areas described in this article.
References Adhikari, K., Hartemink, A.E., Minasny, B., Bou Kheir, R., Greve, M.B., 2014. Digital mapping of soil organic carbon contents and stocks in Denmark. PLoS One 9 (8), 1–13. Akpa, S.I.C., Odeh, I.O.A., Bishop, T.F.A., Hartemink, A.E., 2014. Digital mapping of soil particle-size fractions for Nigeria. Soil Sci. Soc. Am. J. 78, 1953–1966. Aksoy, E., Panagos, P., Montanarella, L., 2012. Spatial prediction of soil organic carbon of crete by using geostatistics. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 149–154. Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. http://dx.doi.org/10.1023/ A:1010933404324. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees- Wadsworth & Brooks Wadsworth Statistics/Probability Series. Cao, B., Grunwald, S., Xiong, X., 2012. Cross-regional digital soil carbon modeling in two contrasting soil-ecological regions in the U.S. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 103–106. de Carvalho Junior, W., Lagacherie, Philippe, da Silva Chagas, Cesar, Filho, Braz
161
Geoderma Regional 10 (2017) 154–162
S. Dharumarajan et al.
18–21. Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154, 138–152. http://dx.doi.org/10.1016/j.geoderma.2009.10.007. Mansuy, N., Thiffault, E., Paré, D., Bernier, P., Guindon, L., Villemaire, P., Poirier, V., Beaudoin, A., 2014. Digital mapping of soil properties in Canadian managed forests at 250 m of resolution using the k-nearest neighbor method. Geoderma 235-236, 59–73. McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293–327. http://dx.doi.org/10.1016/S0016-7061(00)00043-4. McBratney, A.B., Mendonça, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89, 67–94. Minasny, B., McBratney, A.B., Lark, R.M., Mendonça Santos, M.D.L., 2008. Digital soil mapping technologies for countries with sparse data infrastructures. In: Hartemink, A.E., McBratney, A.B. (Eds.), Digital Soil Mapping With Limited Data. Springer, London, pp. 15–30. Mosleh, Z., Salehi, M.H., Jafari, A., Borujeni, I.E., Mehnatkesh, A., 2016. The effectiveness of digital soil mapping to predict soil properties over low-relief areas. Environ. Monit. Assess. 188 (195). http://dx.doi.org/10.1007/s10661-016-5204-8. NBSS & LUP, 2009. Soil Resources of Anantapur District, Andhra Pradesh. NBSS & LUP publication No. 1017. NBSS & LUP, Nagpur, India. Odeh, I.O.A., McBratney, A.B., 2000. Using AVHRR images for spatial prediction of clay content in the lower Namoi valley of eastern Australia. Geoderma 97, 237–254. http://dx.doi.org/10.1016/S0016-7061(00)00041-0. Powers, J.S., Schlesinger, W.H., 2002. Relationships among soil carbon distributions and biophysical factors at nested spatial scales in rain forests of northeastern Costa Rica. Geoderma 109, 165–190. 
Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199. http://dx.doi.org/10.1007/s10021-005-0054-1. Priori, S., Bianconi, N., Constantini, E.A.C., 2014. Can γ-radiometrics predict soil textural data and stoniness in different parent materials? A comparison of two machine learning methods. Geoderma 226-227, 354–364. Reuter, H.I., Nelson, A., 2009. Geomorphometry in ESRI packages. In: Hengl, T., Reuter, H.I. (Eds.), Geomorphometry: Concepts, Software, Applications, Dev. Soil Sci. 33. Elsevier, New York, pp. 269–291. Rodriguez-Galiano, V.F., Chica-Rivas, M., 2014. Evaluation of different machine learning methods for land cover mapping of a Mediterranean area using multi-seasonal Landsat images and digital terrain models. Int. J. Digit Earth 7, 492–509. http://dx. doi.org/10.1080/17538947.2012.748848. Rogan, J., Miller, J., Stow, D., Franklin, J., Levien, L., Fischer, C., 2003. Land-cover change monitoring with classification trees using landsat TM and ancillary data. Photogramm. Eng. Remote. Sens. 69, 784–793. Simbahan, G.C., Dobermann, A., Goovaerts, P., Ping, J.L., Haddix, M.L., 2006. Fine resolution mapping of soil organic carbon based on multivariate secondary data. Geoderma 132, 471–489. Sreenivas, K., Dadhwal, V.K., Kumar, Suresh, Sri Harsha, G., Mitran, Tarik, Sujatha, G., Janaki Rama Suresh, G., Fyzee, M.A., Ravisankar, T., 2016. Digital organic and inorganic carbon mapping of India. Geoderma 269, 160–173. Tekin, Y., Kuang, B., Mouazen, A.M., 2013. Potential of on-line visible and near infrared spectroscopy for measurement of pH for deriving variable rate lime recommendations. Remote Sens. 13, 10177–10190. http://dx.doi.org/10.3390/s130810177. Thompson, J.A., Pena-Yewtukhiw, E.M., Grove, J.H., 2006. Soil-landscape modelling across a physiographic region: topographic patterns and model transportability. Geoderma 133, 57–70. 
Vågen, T.G., Winowiecki, L.A., Abegaz, A., Hadgu, K.M., 2013. Landsat-based approaches for mapping of land degradation prevalence and soil functional properties in Ethiopia. Remote Sens. Environ. 134, 266–275. Vågen, T.G., Winowiecki, L.A., Tondoh, J.E., Desta, L.T., Gumbricht, T., 2016. Mapping of soil properties and land degradation risk in Africa using MODIS reflectance. Geoderma 263, 216–225. Vancampenhout, K., Nyssen, J., Gebremichael, D., Deckers, J., Poesen, J., Haile, M., Moeyersons, J., 2006. Stone bunds for soil conservation in the northern Ethiopian highlands: impacts on soil fertility and crop yield. Soil Tillage Res. 90, 1–5. Walkley, A., Black, I.A., 1934. An estimation of the method for determining soil organic matter and a proposed modification of the chromic acid titration method. Soil Sci. 37, 29–38. Wiesmeier, M., Barthold, F., Blank, B., Kögel-Knabner, I., 2011. Digital mapping of soil organic matter stocks using random Forest modeling in a semi-arid steppe ecosystem. Plant Soil 340, 7–24. http://dx.doi.org/10.1007/s11104-010-0425-z.
Calderano, Bhering, Silvio Barge, 2014. A regional-scale assessment of digital mapping of soil attributes in a tropical hillslope environment. Geoderma 232–234, 479–486.
Ciampalini, R., Lagacherie, P., Hamrouni, H., 2012. Documenting GlobalSoilMap.net grid cells from legacy measured soil profile and global available covariates in Northern Tunisia. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 439–444.
Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sens. Environ. 80, 185–201.
Friedl, M.A., Brodley, C.E., 1997. Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61, 399–409.
Gastaldi, G., Minasny, B., McBratney, A.B., 2012. Mapping the occurrence and thickness of soil horizons within soil profiles. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 145–148.
Gessler, P.E., McKenzie, N.J., Hutchinson, M.F., 1996. Progress in soil-landscape modelling and spatial prediction of soil attributes for environmental model. In: Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Barbara, CA. January. National Center for Geographic Information and Analysis, Santa Fe, NM, pp. 21–26.
Giasson, E., Clarke, R.T., Inda Júnior, A.V., Merten, G.H., Tornquist, C.G., 2006. Digital soil mapping using logistic regression on terrain parameters in Southern Brazil. Sci. Agric. 63, 262–268.
Gray, J.M., Humphreys, G.S., Deckers, J.A., 2009. Relationships in soil distribution as revealed by a global soil database. Geoderma 150, 309–323.
Greve, M.H., Bou Kheir, R., Greve, M.B., Bøcher, P.K., 2012. Using digital elevation models as an environmental predictor for soil clay contents. Soil Sci. Soc. Am. J. 76, 2116–2127. http://dx.doi.org/10.2136/sssaj2010.0354.
Grimm, R., Behrens, T., Märker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island: digital soil mapping using random forests analysis. Geoderma 146, 102–113. http://dx.doi.org/10.1016/j.geoderma.2008.05.008.
Grinand, C., Arrouays, D., Laroche, B., Martin, M.P., 2008. Extrapolating regional soil landscapes from an existing soil map: sampling intensity, validation procedures, and integration of spatial context. Geoderma 143, 180–190.
Hartemink, A.E., McBratney, A.B., 2008. A soil science renaissance. Geoderma 148, 123–129. http://dx.doi.org/10.1016/j.geoderma.2008.10.006.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
Henderson, B.L., Bui, E.N., Moran, C.J., Simon, D.A.P., 2005. Australia-wide predictions of soil properties using decision trees. Geoderma 124, 383–398. http://dx.doi.org/10.1016/j.geoderma.2004.06.007.
Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120, 75–93.
Hengl, T., Heuvelink, G.B.M., Kempen, B., Leenaars, J.G.B., Walsh, M.G., et al., 2015. Mapping soil properties of Africa at 250 m resolution: random forests significantly improve current predictions. PLoS One 10 (6), e0125814. http://dx.doi.org/10.1371/journal.pone.0125814.
Huang, C., Davis, L.S., Townshend, J.R.G., 2002. An assessment of support vector machines for land cover classification. Int. J. Remote Sens. 23, 725–749.
Jackson, M.L., 1973. Soil Chemical Analysis. Prentice Hall of India Pvt. Ltd., New Delhi.
Jenny, H., 1941. Factors of Soil Formation, a System of Quantitative Pedology. McGraw-Hill, New York.
Kempen, B., Brus, D.J., Heuvelink, G.B.M., Stoorvogel, J.J., 2009. Updating the 1:50,000 Dutch soil map using legacy soil data: a multinomial logistic regression approach. Geoderma 151, 311–326.
Kidd, D.B., Malone, B.P., McBratney, A.B., Minasny, B., Webb, M.A., 2014. Digital mapping of a soil drainage index for irrigated enterprise suitability in Tasmania, Australia. Soil Res. 52, 107–119.
Kovačević, M., Bajat, B., Gajić, B., 2010. Soil type classification and estimation of soil properties using support vector machines. Geoderma 154, 340–347.
Kühn, J., Brenning, A., Wehrhan, M., Koszinski, S., Sommer, M., 2009. Interpretation of electrical conductivity patterns by soil properties and geological maps for precision agriculture. Precis. Agric. 10, 490–507.
Lagacherie, P., McBratney, A.B., 2007. Spatial soil information systems and spatial soil inference systems: perspectives for digital soil mapping. In: Lagacherie, P., McBratney, A.B., Voltz, M. (Eds.), Digital Soil Mapping: An Introductory Perspective. Elsevier B.V., The Netherlands, pp. 3–22.
Lagacherie, P., Legros, J., Burrough, P., 1995. A soil survey procedure using the knowledge of soil pattern established on a previously mapped reference area. Geoderma 65, 283–301.
Liaw, A., Wiener, M., 2002. Classification and regression by Random Forest. R News 2,