Spatial prediction of major soil properties using Random Forest techniques - A case study in semi-arid tropics of South India

Geoderma Regional 10 (2017) 154–162. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/geodrs


S. Dharumarajan a,⁎, Rajendra Hegde a, S.K. Singh b

a ICAR-National Bureau of Soil Survey and Land Use Planning, Regional Centre, Hebbal, Bangalore 560024, India
b ICAR-National Bureau of Soil Survey and Land Use Planning, Amaravati Road, Nagpur-440033, India

ARTICLE INFO

Keywords: Soil properties; Digital soil mapping; Random Forest model; Prediction; Validation

ABSTRACT

The purpose of the study is to map the spatial variation of major soil properties in Bukkarayasamudrum mandal of Anantapur district, India, using the Random Forest model. The study area is divided into different Physiographic Land Units (PLU) based on landform, land use and slope. The Random Forest model (RFM) was developed from field survey data of 116 surface samples (0–30 cm) representing all major PLU units of the study area. RFM is sensitive neither to overfitting nor to noisy features and can handle large datasets. High-resolution satellite imagery (IRS LISS IV data, 3 bands), terrain attributes (elevation, slope, aspect, topographic wetness index, topographic position index, plan and profile curvature, multi-resolution index of valley bottom flatness and multi-resolution ridge top flatness), vegetation factors (NDVI and EVI) and land use land cover (LULC) were used as covariates along with legacy soil data at 1:50,000 scale. The predicted organic carbon, pH and EC ranged from 0.24–1.03%, 6.9–9.0 and 0.11–0.97 dS m−1, respectively. Model performance was evaluated using the coefficient of determination (R²) and Lin's concordance correlation coefficient (CCC). The model yielded R² and CCC values of 0.23 and 0.38 for SOC, 0.30 and 0.37 for pH, and 0.62 and 0.70 for EC, respectively. The variable importance ranking of the RFM showed that EVI and NDVI are the most important predictors of organic carbon, whereas drainage and NDVI are the most important for EC and pH, respectively. This technique can be applied to similar landscapes with more observations to refine the spatial resolution of soil properties.

⁎ Corresponding author. E-mail address: [email protected] (S. Dharumarajan).
http://dx.doi.org/10.1016/j.geodrs.2017.07.005
Received 12 April 2017; Received in revised form 22 July 2017; Accepted 24 July 2017; Available online 26 July 2017
2352-0094/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Most environmental modeling work requires spatially continuous and quantitative soil information, especially at larger scales (Gessler et al., 1996; Minasny et al., 2008; Hartemink and McBratney, 2008). Such information is not always available at the required scale (Lagacherie et al., 1995; McBratney et al., 2003; Greve et al., 2012), and mapping at high accuracy is challenging, time consuming and costly (Vågen et al., 2016). In the recent past, digital soil mapping (DSM) techniques have become more popular among natural resource mappers because they offer a quicker route to the spatial prediction of soil attributes (Lagacherie and McBratney, 2007). DSM is based on soil-forming factor models such as CLORPT (Jenny, 1941) and SCORPAN, which describes soil formation as a function of soil (s), climate (c), organisms (o), topography (r), parent material (p), age (a) and spatial location (n) (McBratney et al., 2003). DSM techniques can be applied both to the prediction of quantitative outputs such as sand, silt, clay, pH and electrical conductivity, using regression algorithms (Wiesmeier et al., 2011; Akpa et al., 2014), and to qualitative outputs such as soil taxonomic units, using classification algorithms (McBratney et al., 2003; Hastie et al., 2009; Kidd et al., 2014). In DSM, the interrelationships between soil and environmental covariates are used to predict soil information by means of different statistical and geostatistical models. For instance, organic carbon has been predicted by multiple linear regression (Powers and Schlesinger, 2002; Thompson et al., 2006), neural networks (Mansuy et al., 2014; Malone et al., 2009), tree models (Henderson et al., 2005), generalized linear models (McKenzie and Ryan, 1999) and regression kriging (Hengl et al., 2004; Simbahan et al., 2006).

The use of machine learning algorithms in digital soil mapping is another approach to modeling soil classes and soil properties (Huang et al., 2002; Rogan et al., 2003). These algorithms are faster and more efficient in classification (Foody, 2002; Friedl and Brodley, 1997). Machine learning techniques predict a soil property at unvisited locations using its interrelationship with environmental covariates such as digital elevation models (DEMs) (McBratney et al., 2000), climatic parameters (Akpa et al., 2014), remote sensing imagery (Odeh and McBratney, 2000) and legacy soil information (Akpa et al., 2014). Some of the major machine learning techniques used in DSM are classification and regression trees (Breiman et al., 1984), k-nearest neighbour (Mansuy et al., 2014), multinomial logistic regression (Kempen et al., 2009), logistic model trees (Giasson et al., 2006), support vector machines (Kovačevic et al., 2010; Priori et al., 2014) and the Random Forest model (Vågen et al., 2016). Of the four machine learning techniques studied by Rodriguez-Galiano and Chica-Rivas (2014), the Random Forest model was found to be the most accurate and the most robust to noise and data reduction. It can handle both quantitative and qualitative datasets. The potential of RFM in digital soil mapping has been demonstrated by several authors (Grimm et al., 2008; Hastie et al., 2009; Wiesmeier et al., 2011; Vågen et al., 2016; Sreenivas et al., 2016).

Despite the widespread use of various machine learning algorithms in DSM, limited work is available from the Indian subcontinent (Sreenivas et al., 2016). In this context, the present study was carried out to produce fine-resolution maps of major soil properties (organic carbon, pH and EC) in Bukkarayasamudrum mandal of Anantapur district, representing the semi-arid tropics of South India, using the Random Forest model.

2. Material and methods

2.1. Description of the study area

Bukkarayasamudrum mandal is located between 13°37′51″ and 14°48′09″ N latitudes and 77°33′47″ and 77°47′45″ E longitudes in Anantapur district of Andhra Pradesh, India (Fig. 1). The total area of the mandal is 24,808 ha. The climate of Bukkarayasamudrum mandal is warm and classified as hot and arid. The mean minimum and maximum temperatures are 22.9 °C and 34 °C, with an average rainfall of 556 mm. The major geology is granite-gneiss; quartz, feldspar and mica are the major minerals of the granite and gneissic rock types. The elevation ranges from 295 to 595 m above MSL. The major part of the mandal is nearly level to very gently sloping, with slopes ranging from 1 to 3%. The major soils of Bukkarayasamudrum mandal are moderately deep (75–100 cm) gravelly red clayey soils (Clayey-skeletal, mixed, isohyperthermic Typic Haplargids), followed by shallow (25–50 cm) gravelly red clayey soils (Clayey-skeletal, mixed, isohyperthermic Lithic Haplargids) and deep (100–150 cm) black clayey soils (Fine, mixed, isohyperthermic (cal) Ustic Haplargids). The study area is chronically prone to drought because of its short and unpredictable length of growing period (LGP). The LGP of Bukkarayasamudrum mandal is 11 weeks, starting from the third week of August and ending in the last week of October. The net cultivated area is 47.4% of the total area. The major rainfed crops are groundnut, pearl millet, sorghum and minor millets, while rice, cotton and vegetables are important irrigated crops.

2.2. Soil sampling and analysis

We used a physiographic land unit (PLU) map as the base for soil sampling. A physiographic land unit is an assemblage of landform, slope and land use. Landform represents the testimony of climatic events, whereas slope and land use represent the influence of present climatic conditions on soil formation. Survey of India toposheets (1:50,000 scale) and Resourcesat-2 LISS IV (Linear Imaging Self Scanner IV) imagery (5.8 m resolution) were used to prepare the landform and land use maps.

Fig. 1. Location map.


Fig. 2. Physiographic land unit of Bukkarayasamudrum mandal.

The slope map was developed from SRTM DEM data. The three layers, i.e. landform, slope and land use, were integrated, taking into consideration the area and morphology of the landform units and their relation with neighbouring objects, to develop the physiographic land unit (PLU) map (Fig. 2). The PLU legend comprises 3 to 4 letters: the first letter describes the landform, the second the slope, and the third and fourth the type of land use. A total of 116 surface (0–30 cm) samples were collected randomly, covering all the major PLU units of the study area, to capture the existing soil variability. The soil samples were air dried and processed for laboratory analysis. Organic carbon was estimated by the Walkley and Black (1934) method. Soil reaction (1:2.5 soil:water suspension) and electrical conductivity were estimated following Jackson (1973).
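The legend convention described above (first letter for landform, second letter for slope class, remaining one or two letters for land use) can be sketched as a small decoder. This is only an illustration: the slope classes A, B, E and F are those quoted in Section 3.1, whereas the landform and land-use letter codes below are hypothetical placeholders, because the paper does not list them.

```python
from typing import NamedTuple

# Slope classes quoted in Section 3.1 of the paper.
SLOPE_CLASSES = {"A": "0-1%", "B": "1-3%", "E": "10-15%", "F": "15-25%"}

# Hypothetical letter codes, for illustration only (not given in the paper).
LANDFORMS = {
    "H": "denudational hills",
    "D": "dissected pediments",
    "P": "pediplains",
    "V": "alluvial deposits and valleys",
}
LAND_USES = {
    "S": "single crop",
    "D": "double crop",
    "Sc": "scrub",
    "F": "forest",
    "Fa": "fallow",
}


class PLU(NamedTuple):
    landform: str
    slope: str
    land_use: str


def decode_plu(code: str) -> PLU:
    """Split a 3- or 4-letter PLU code into landform, slope and land use."""
    if len(code) not in (3, 4):
        raise ValueError(f"PLU code must have 3 or 4 letters, got {code!r}")
    # Letter 1: landform; letter 2: slope class; letters 3-4: land use.
    return PLU(LANDFORMS[code[0]], SLOPE_CLASSES[code[1]], LAND_USES[code[2:]])


if __name__ == "__main__":
    for code in ("PBS", "HESc"):
        print(code, decode_plu(code))
```

A lookup table per legend position keeps the decoder easy to extend once the real letter codes of the PLU map are known.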

2.3. Environment variables

We used different sets of environmental covariates to predict the soil properties. A digital elevation model (DEM) was obtained from Shuttle Radar Topography Mission (SRTM) data and processed using the ArcGIS 10 data management toolbox. DEM derivatives such as slope, aspect, curvature (plan and profile), topographic wetness index (TWI) and topographic position index (TPI) were derived using ArcGIS 10 (Reuter and Nelson, 2009). Multi-resolution Index of Valley Bottom Flatness (MrVBF) and Multi-resolution Ridge Top Flatness (MrRTF) were derived using SAGA GIS version 2.3.1 (Kidd et al., 2014). Soil type, soil depth, drainage and soil texture were derived from the soil map of Anantapur district (1:50,000 scale) (NBSS & LUP, 2009) and used as variables along with the landform map and the land use land cover map of Bukkarayasamudrum mandal. Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) were extracted from 16-day MODIS data at 250 m resolution (MOD13Q1) for 5 years (2011–2015), and the average NDVI and EVI were used for modeling. In addition, Band 1 (green), Band 2 (red) and Band 3 (NIR) of Resourcesat-2 LISS IV data were used as covariates for the prediction of soil properties. All datasets were brought into the Lambert Conformal Conic (LCC) projection using the ArcGIS 10 toolbox. The environmental variables were extracted at the 116 sampling points. The final database comprised 20 environmental variables (14 quantitative and 6 categorical) and three target variables (OC, pH and EC) for modeling (Table 1).

2.4. Spatial prediction of soil properties using Random Forest model

The randomForest package (version 4.6) in the R environment was used for prediction of the soil properties of Bukkarayasamudrum mandal. The Random Forest model (RFM) is an extension of the regression tree model that works by assembling a number of classification and regression trees by means of two levels of randomization for each tree in the forest (Breiman, 2001). RFM improves prediction accuracy and reduces model overfitting (Breiman, 2001; Liaw and Wiener, 2002). It is insensitive to missing data and can handle a large number of both quantitative and categorical variables (Grinand et al., 2008). Three parameters govern the fitting of the Random Forest model, viz. the number of trees (ntree), the minimum number of samples at a terminal node (nmin) and the number of predictors used for fitting each tree (mtry). The relative importance of predictors is identified from the number of times a predictor is used in nodes. The internal out-of-bag (OOB) prediction generated through bootstrapping provides an estimate of accuracy across the decision trees, which was used for the initial assessment of model performance. For the present study, 75% of observations were used for RFM modeling and 25% for validation. The number of trees (ntree) for the final model was selected based on error estimates from the OOB sample. The initial forest was grown with 500 trees and the overall OOB error stabilized at 200 trees; therefore, ntree = 200 was used in the final model.

Table 1 Description of environmental covariates.
Predictor | Source | Resolution | Type | Range
Remote sensing imagery
Band 1 (Green) | Resourcesat2 LISS IV | 5.8 m | Q | 73–398
Band 2 (Red) | Resourcesat2 LISS IV | 5.8 m | Q | 82–314
Band 3 (NIR) | Resourcesat2 LISS IV | 5.8 m | Q | 103–264
Terrain attributes
Elevation (m) | SRTM DEM | 30 m | Q | 295–595
Slope (%) | SRTM DEM | 30 m | Q | 0–42
Aspect | SRTM DEM | 30 m | Q | 1–359
TPI | SRTM DEM | 30 m | Q | 0–0.80
TWI | SRTM DEM | 30 m | Q | −1069 to 2620
Plan curvature | SRTM DEM | 30 m | Q | −2.08 to 3.02
Profile curvature | SRTM DEM | 30 m | Q | −2.97 to 2.57
MrVBF | SRTM DEM | 30 m | Q | 0–7.61
MrRTF | SRTM DEM | 30 m | Q | 0–5.88
Landform | SRTM DEM | 30 m | C | –
Vegetation attributes
NDVI | MOD13Q1 (2011–2015) | 250 m, 16 days | Q | 932–6502
EVI | MOD13Q1 (2011–2015) | 250 m, 16 days | Q | 608–4915
Land use land cover | Resourcesat2 LISS IV | 5.8 m | C | –
Soil attributes
Soil type | Legacy data Anantapur | 1:50,000 | C | –
Soil depth (cm) | Legacy data Anantapur | 1:50,000 | C | –
Texture | Legacy data Anantapur | 1:50,000 | C | –
Drainage | Legacy data Anantapur | 1:50,000 | C | –
Q - Quantitative and C - Categorical.

2.5. Model accuracy assessment

The prediction performance of the Random Forest models was evaluated using three parameters: the coefficient of determination (R²), which is the proportion of variation explained by the model; the root mean square error (RMSE), a measure of model accuracy; and Lin's concordance correlation coefficient (CCC), a measure of agreement between predicted and observed values. Good models have a coefficient of determination and a concordance correlation coefficient equal or close to 1 and a root mean square error close to 0.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (p_i - o_i)^2}{\sum_{i=1}^{n} (o_i - \bar{o})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (o_i - p_i)^2}$$

$$\rho_c = \frac{2 \rho \sigma_o \sigma_p}{\sigma_o^2 + \sigma_p^2 + (\mu_o - \mu_p)^2}$$

where $p_i$ and $o_i$ are the predicted and observed values, $\bar{p}$ and $\bar{o}$ are the means of the predicted and observed values, $n$ is the number of observations, $\mu_o$ and $\mu_p$ are the means of the observed and predicted values, $\sigma_o^2$ and $\sigma_p^2$ are the corresponding variances, and $\rho$ is the Pearson correlation coefficient between observed and predicted values.

3. Results and discussion

3.1. Physiographic land units

Five landform units, viz. denudational hills, dissected pediments, pediplains, alluvial deposits and valleys, were delineated based on online visual interpretation of remote sensing data and integrated with the land use map and slope map of Bukkarayasamudrum mandal to derive physiographic land units. The land use land cover classification showed that, of the five major land uses (single crop, double crop, scrub, forest and fallow land), single crops occupy 62.9% of the study area, followed by double crops (12.1%) and scrub lands (6.3%). The slope map derived from the SRTM DEM showed that nearly level (0–1%) and very gently sloping (1–3%) lands cover 77.5% of the total area. The integration of these three layers yielded 33 physiographic land units. In the denudational hills, three PLU units were identified based on variation in slope (E: 10–15% and F: 15–25%) and land use/land cover classes (forest and scrub). Three PLU units were delineated on the dissected pediments and 15 PLU units in the pediplains. Alluvial deposits and valleys were divided into 12 PLU units on the basis of slope (A: 0–1% and B: 1–3%) and land use (single crop, double crop and fallow). The PLU units were homogeneous with respect to geology, land use and slope.

Table 2 Summary statistics of soil properties of Bukkarayasamudrum mandal of Anantapur district (N = 116).
Parameters | OC (%) | pH | EC (dS m−1)
Mean | 0.56 | 8.2 | 0.28
Max | 1.35 | 9.7 | 2.21
Min | 0.07 | 5.6 | 0.05
Standard deviation | 0.27 | 0.88 | 0.28
Skewness | 0.54 | −1.02 | 3.89
Kurtosis | −0.12 | 0.51 | 20.94

Table 3 Correlation analysis.
Variable | EC | OC | pH
NDVI | 0.110 | 0.205(a) | 0.018
EVI | 0.079 | 0.247(b) | 0.008
Elevation | −0.118 | −0.087 | −0.068
Slope | −0.076 | 0.136 | 0.044
Aspect | 0.131 | −0.019 | −0.144
Profile curvature | −0.026 | 0.013 | −0.100
Plan curvature | 0.028 | 0.068 | −0.026
TWI | 0.052 | −0.011 | 0.103
TPI | 0.029 | −0.031 | 0.067
MrVBF | 0.271(b) | 0.093 | 0.060
MrRTF | 0.213(a) | −0.084 | 0.154
Band1 | 0.001 | 0.210(a) | 0.120
Band2 | −0.103 | 0.034 | 0.039
Band3 | 0.007 | 0.112 | 0.162
(b) Correlation is significant at the 0.01 level (2-tailed).
(a) Correlation is significant at the 0.05 level (2-tailed).

Table 4 Performance of RFM in modeling of soil properties.
Parameters | OC (%) | pH | EC (dS m−1)
R² | 0.23 | 0.30 | 0.62
CCC | 0.38 | 0.37 | 0.70
RMSE | 0.20 | 0.75 | 0.14
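The three accuracy measures defined in Section 2.5 and reported in Table 4 can be computed as in the following sketch. It is not the authors' R code: the function names and the small data vectors are illustrative assumptions, and population (divide-by-n) variances are assumed for the concordance correlation coefficient.

```python
import math


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((p - o) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient.

    Uses 2 * cov(o, p), which equals 2 * rho * sigma_o * sigma_p.
    """
    n = len(obs)
    mu_o = sum(obs) / n
    mu_p = sum(pred) / n
    var_o = sum((o - mu_o) ** 2 for o in obs) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    cov = sum((o - mu_o) * (p - mu_p) for o, p in zip(obs, pred)) / n
    return 2.0 * cov / (var_o + var_p + (mu_o - mu_p) ** 2)


if __name__ == "__main__":
    # Made-up organic carbon values (%), for illustration only.
    observed = [0.30, 0.50, 0.70, 0.90]
    predicted = [0.40, 0.50, 0.60, 0.80]
    print("R2  :", round(r_squared(observed, predicted), 3))
    print("RMSE:", round(rmse(observed, predicted), 3))
    print("CCC :", round(ccc(observed, predicted), 3))
```

Unlike the Pearson correlation, the CCC also penalises a shift in mean or a change in spread between predictions and observations, which is why the two can differ for the same validation set.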

3.2. Summary statistics of soil properties

A summary of the soil properties is presented in Table 2. The soil pH ranged from 5.6 to 9.7, with a mean and standard deviation of 8.2 and 0.88, respectively. Of the 116 samples analysed, 56 had a pH > 8.5. Electrical conductivity ranged from 0.05 to 2.21 dS m−1 and organic carbon from 0.07 to 1.35%. The frequency distributions of the soil properties showed that pH is negatively skewed, whereas OC and EC are positively skewed. The higher variability in pH and OC is mainly attributed to land management. The correlation analysis (Table 3) between soil properties and environmental covariates showed that electrical conductivity was correlated with MrVBF and MrRTF, whereas organic carbon was correlated with EVI, NDVI and Band 1.

3.3. Performance of Random Forest model in predicting soil properties

The performance of the Random Forest model was evaluated for each property by calculating uncertainty indicators, viz. the coefficient of determination (R²), root mean square error (RMSE) and concordance correlation coefficient (CCC). The results (Table 4) showed that the combination of different categorical and quantitative covariates explained 23, 30 and 62% of the variation in organic carbon, pH and EC, respectively. Lin's concordance correlation coefficient (CCC) was highest for EC (70%), whereas pH and OC had the lowest values, 37 and 38%, respectively (Fig. 3). The observed RMSE values of the validation samples were 0.20% for OC, 0.75 for pH and 0.14 dS m−1 for EC.

The performance of the model for prediction of organic carbon is quite low (R² = 23%) owing to the dynamic nature of organic carbon, contrasting land use and low terrain variation. The poor performance may also be related to the low levels of soil organic carbon. Vancampenhout et al. (2006) reported that the performance of DSM models is poorer in soils with low organic carbon content than in soils with high organic carbon. The prediction of organic carbon was poor relative to most previous studies (Wiesmeier et al., 2011, R² = 76%; Sreenivas et al., 2016, R² = 82%; Vågen et al., 2016, R² = 69–77%), although the performance was close to that of several others (de Carvalho et al., 2014, R² = 20%; Gastaldi et al., 2012, R² = 18%). The performance of the model also depends on sampling density. The sampling density for this study is 0.46 samples km−2. Though the sampling intensity is higher than in many previously reported studies (Aksoy et al., 2012; Gray et al., 2009; Cao et al., 2012; Ciampalini et al., 2012), the model did not perform well because of the high variability of soil properties and low terrain variation. A higher sample density is required for better results in tropical countries, where the soil pattern is more complex than in other regions owing to geological uplift (de Carvalho et al., 2014).

Like organic carbon, the model explained only 30% of the pH variation, which is also poor compared with previous studies (Tekin et al., 2013; Vågen et al., 2016). Hengl et al. (2015) reported that RFM improved the mapping accuracy of pH by 20% compared with linear regression models. Vågen et al. (2016) reported that the Random Forest model performed well, with an R² of 0.88 and RMSE of 0.32–0.36. In the case of electrical conductivity, the model explained 62% of the variation, with a Lin's CCC of 70% indicating moderate predictive performance. Overall, the complex regional soil pattern and changing land use are the causes of the weak prediction of these dynamic soil properties. Grimm et al. (2008) explained that the prediction accuracy of RFM is lower for surface soil than for subsurface soil. The prediction performance of the model could be improved by additional environmental covariates such as geology, a groundwater depth map and high-resolution DEM attributes.

Fig. 3. Concordance correlation coefficient of OC, pH and EC.

3.4. Importance of predictor variables for predicting soil properties

RFM estimates the importance of covariates based on how much better or worse the prediction would be if one or more variables were removed (Prasad et al., 2006), and it also protects against the elimination of good predictor variables that are important for the model. Fig. 4a–c shows the variable importance rankings of the Random Forest model for OC, pH and EC. EVI emerged as the top predictor of organic carbon, followed by NDVI and Band 1 of the satellite imagery. The results confirmed that vegetation cover influences the surface organic carbon distribution (Wiesmeier et al., 2011). Sreenivas et al. (2016) reported that land cover is the most important predictor of organic carbon density in India, followed by maximum temperature. Adhikari et al. (2014) reported that rainfall, land use and soil type were the most important variables controlling OC distribution. For electrical conductivity, drainage emerged as the most influential factor, followed by MrRTF and NDVI. Kühn et al. (2009) reported that terrain attributes had a weak and non-significant effect on the prediction of EC, whereas geological unit and groundwater table maps had a significant effect in the regression models. NDVI emerged as the most important predictor of pH, followed by the enhanced vegetation index. Mosleh et al. (2016) reported that plan curvature was a major variable for prediction of pH in the regression model. Owing to the limited terrain variation (> 70% of the area has < 3% slope) and the high variation in soil properties, the terrain factors did not emerge as important predictors of the soil properties in this study.

Fig. 4. Variable importance rankings of the RFM model (% IncMSE = percent increase in Mean Square Error) for prediction of (a) organic carbon, (b) soil reaction and (c) electrical conductivity.

3.5. Spatial prediction of soil properties

Mapping soil properties is a preliminary step towards decision making, such as the delineation of suitable crop growing areas or the identification of polluted or affected areas. We spatially predicted OC, pH and EC in the surface soil (0–30 cm) using Random Forest (Figs. 5–7). The predicted organic carbon varied from 0.24 to 1.03%. The soil reaction and electrical conductivity ranged from 6.9 to 9.0 and from 0.11 to 0.97 dS m−1, respectively (Table 5). The spatial prediction suggested that the distribution of soil properties at the surface is highly variable owing to variations in land management and land use. The spatial resolution of the maps helps to assess and monitor soil health at watershed or village level (Vågen et al., 2013; Grimm et al., 2008). The maps of pH and EC can be used to identify areas with high salinity or alkalinity. These predicted maps can be used to identify important soil fertility constraints, salinity or alkalinity (Vågen et al., 2016), to determine the soil degradation risk and to suggest suitable management options for reclamation.

Fig. 5. Predicted Organic Carbon map.

Fig. 6. Predicted pH map.


Fig. 7. Predicted electrical conductivity map.
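Tables 2 and 5 summarise the observed and predicted values by mean, range, standard deviation, skewness and kurtosis. The sketch below shows one way such summaries could be computed. The paper does not state which estimators were used, so the sample standard deviation (n − 1 denominator) and the simple moment-based skewness and excess kurtosis are assumptions, and the input values are made up.

```python
import statistics


def summarize(values):
    """Return the summary statistics used in Tables 2 and 5.

    Assumed estimators: sample standard deviation, moment-based skewness
    (m3 / m2**1.5) and excess kurtosis (m4 / m2**2 - 3).
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with a divide-by-n denominator.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "sd": statistics.stdev(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


if __name__ == "__main__":
    # Made-up EC values (dS m-1) with one large value, giving positive skew.
    ec = [0.11, 0.15, 0.20, 0.22, 0.30, 0.97]
    for name, value in summarize(ec).items():
        print(f"{name}: {value:.2f}")
```

Comparing such summaries for observed and predicted values makes it easy to see how far the predicted range and spread are compressed relative to the field data.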

Table 5 Summary statistics of predicted soil properties.
Parameters | OC (%) | pH | EC (dS m−1)
Mean | 0.60 | 8.1 | 0.32
Max | 1.03 | 9.0 | 0.97
Min | 0.24 | 6.9 | 0.11
Standard deviation | 0.12 | 0.33 | 0.17
Skewness | 0.76 | −0.34 | 1.96
Kurtosis | 0.07 | −0.33 | 2.80

4. Conclusion

In the present study, digital maps of three major soil properties (OC, pH and EC) were prepared using the Random Forest model. General spatial patterns of soil properties (OC, pH and EC) can be predicted readily using the digital soil mapping approach. Different covariates, including land use, terrain, legacy soil information and satellite imagery, were used for prediction. The RFM output showed that the combination of different covariates explains 23, 30 and 62% of the variation in OC, pH and EC, respectively, in the study area; the uncertainty can be reduced by a higher sampling density and by incorporating additional datasets such as high-resolution digital elevation models. Overall, RFM-based prediction can help to map soil properties faster and more efficiently.

Acknowledgement

The authors would like to thank the anonymous reviewers for their helpful and constructive comments, which greatly improved the quality of the manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version, at doi: http://dx.doi.org/10.1016/j.geodrs.2017.07.005. These data include the Google maps of the most important areas described in this article.

References Adhikari, K., Hartemink, A.E., Minasny, B., Bou Kheir, R., Greve, M.B., 2014. Digital mapping of soil organic carbon contents and stocks in Denmark. PLoS One 9 (8), 1–13. Akpa, S.I.C., Odeh, I.O.A., Bishop, T.F.A., Hartemink, A.E., 2014. Digital mapping of soil particle-size fractions for Nigeria. Soil Sci. Soc. Am. J. 78, 1953–1966. Aksoy, E., Panagos, P., Montanarella, L., 2012. Spatial prediction of soil organic carbon of crete by using geostatistics. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 149–154. Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32. http://dx.doi.org/10.1023/ A:1010933404324. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classiﬁcation and Regression Trees- Wadsworth & Brooks Wadsworth Statistics/Probability Series. Cao, B., Grunwald, S., Xiong, X., 2012. Cross-regional digital soil carbon modeling in two contrasting soil-ecological regions in the U.S. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 103–106. de Carvalho Junior, W., Lagacherie, Philippe, da Silva Chagas, Cesar, Filho, Braz

161

Geoderma Regional 10 (2017) 154–162

S. Dharumarajan et al.

18–21. Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154, 138–152. http://dx.doi.org/10.1016/j.geoderma.2009.10.007. Mansuy, N., Thiﬀault, E., Paré, D., Bernier, P., Guindon, L., Villemaire, P., Poirier, V., Beaudoin, A., 2014. Digital mapping of soil properties in Canadian managed forests at 250 m of resolution using the k-nearest neighbor method. Geoderma 235-236, 59–73. McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293–327. http://dx.doi.org/10.1016/S0016-7061(00)00043-4. McBratney, A.B., Mendonça, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89, 67–94. Minasny, B., McBratney, A.B., Lark, R.M., Mendonça Santos, M.D.L., 2008. Digital soil mapping technologies for countries with sparse data infrastructures. In: Hartemink, A.E., McBratney, A.B. (Eds.), Digital Soil Mapping With Limited Data. Springer, London, pp. 15–30. Mosleh, Z., Salehi, M.H., Jafari, A., Borujeni, I.E., Mehnatkesh, A., 2016. The eﬀectiveness of digital soil mapping to predict soil properties over low-relief areas. Environ. Monit. Assess. 188 (195). http://dx.doi.org/10.1007/s10661-016-5204-8. NBSS & LUP, 2009. Soil Resources of Anantapur District, Andhra Pradesh. NBSS & LUP publication No. 1017. NBSS & LUP, Nagpur, India. Odeh, I.O.A., McBratney, A.B., 2000. Using AVHRR images for spatial prediction of clay content in the lower Namoi valley of eastern Australia. Geoderma 97, 237–254. http://dx.doi.org/10.1016/S0016-7061(00)00041-0. Powers, J.S., Schlesinger, W.H., 2002. Relationships among soil carbon distributions and biophysical factors at nested spatial scales in rain forests of northeastern Costa Rica. Geoderma 109, 165–190. 
Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classiﬁcation and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199. http://dx.doi.org/10.1007/s10021-005-0054-1. Priori, S., Bianconi, N., Constantini, E.A.C., 2014. Can γ-radiometrics predict soil textural data and stoniness in diﬀerent parent materials? A comparison of two machine learning methods. Geoderma 226-227, 354–364. Reuter, H.I., Nelson, A., 2009. Geomorphometry in ESRI packages. In: Hengl, T., Reuter, H.I. (Eds.), Geomorphometry: Concepts, Software, Applications, Dev. Soil Sci. 33. Elsevier, New York, pp. 269–291. Rodriguez-Galiano, V.F., Chica-Rivas, M., 2014. Evaluation of diﬀerent machine learning methods for land cover mapping of a Mediterranean area using multi-seasonal Landsat images and digital terrain models. Int. J. Digit Earth 7, 492–509. http://dx. doi.org/10.1080/17538947.2012.748848. Rogan, J., Miller, J., Stow, D., Franklin, J., Levien, L., Fischer, C., 2003. Land-cover change monitoring with classiﬁcation trees using landsat TM and ancillary data. Photogramm. Eng. Remote. Sens. 69, 784–793. Simbahan, G.C., Dobermann, A., Goovaerts, P., Ping, J.L., Haddix, M.L., 2006. Fine resolution mapping of soil organic carbon based on multivariate secondary data. Geoderma 132, 471–489. Sreenivas, K., Dadhwal, V.K., Kumar, Suresh, Sri Harsha, G., Mitran, Tarik, Sujatha, G., Janaki Rama Suresh, G., Fyzee, M.A., Ravisankar, T., 2016. Digital organic and inorganic carbon mapping of India. Geoderma 269, 160–173. Tekin, Y., Kuang, B., Mouazen, A.M., 2013. Potential of on-line visible and near infrared spectroscopy for measurement of pH for deriving variable rate lime recommendations. Remote Sens. 13, 10177–10190. http://dx.doi.org/10.3390/s130810177. Thompson, J.A., Pena-Yewtukhiw, E.M., Grove, J.H., 2006. Soil-landscape modelling across a physiographic region: topographic patterns and model transportability. Geoderma 133, 57–70. 
Vågen, T.G., Winowiecki, L.A., Abegaz, A., Hadgu, K.M., 2013. Landsat-based approaches for mapping of land degradation prevalence and soil functional properties in Ethiopia. Remote Sens. Environ. 134, 266–275. Vågen, T.G., Winowiecki, L.A., Tondoh, J.E., Desta, L.T., Gumbricht, T., 2016. Mapping of soil properties and land degradation risk in Africa using MODIS reﬂectance. Geoderma 263, 216–225. Vancampenhout, K., Nyssen, J., Gebremichael, D., Deckers, J., Poesen, J., Haile, M., Moeyersons, J., 2006. Stone bunds for soil conservation in the northern Ethiopian highlands: impacts on soil fertility and crop yield. Soil Tillage Res. 90, 1–5. Walkley, A., Black, I.A., 1934. An estimation of the method for determining soil organic matter and a proposed modiﬁcation of the chromic acid titration method. Soil Sci. 37, 29–38. Wiesmeier, M., Barthold, F., Blank, B., Kögel-Knabner, I., 2011. Digital mapping of soil organic matter stocks using random Forest modeling in a semi-arid steppe ecosystem. Plant Soil 340, 7–24. http://dx.doi.org/10.1007/s11104-010-0425-z.

Calderano, Bhering, Silvio Barge, 2014. A regional-scale assessment of digital mapping of soil attributes in a tropical hillslope environment. Geoderma 232–234, 479–486.

Ciampalini, R., Lagacherie, P., Hamrouni, H., 2012. Documenting GlobalSoilMap.net grid cells from legacy measured soil profile and global available covariates in Northern Tunisia. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 439–444.

Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sens. Environ. 80, 185–201.

Friedl, M.A., Brodley, C.E., 1997. Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61, 399–409.

Gastaldi, G., Minasny, B., McBratney, A.B., 2012. Mapping the occurrence and thickness of soil horizons within soil profiles. In: Minasny, B., Malone, B.P., McBratney, A.B. (Eds.), Digital Soil Assessments and Beyond. CRC Press/Balkema, London, pp. 145–148.

Gessler, P.E., McKenzie, N.J., Hutchison, M.F., 1996. Progress in soil-landscape modelling and spatial prediction of soil attributes for environmental model. In: Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Barbara, CA. January. National Center for Geographic Information and Analysis, Santa Fe, NM, pp. 21–26.

Giasson, E., Clarke, R.T., Inda Júnior, A.V., Merten, G.H., Tornquist, C.G., 2006. Digital soil mapping using logistic regression on terrain parameters in Southern Brazil. Sci. Agric. 63, 262–268.

Gray, J.M., Humphreys, G.S., Deckers, J.A., 2009. Relationships in soil distribution as revealed by a global soil database. Geoderma 150, 309–323.

Greve, M.H., Bou Kheir, R., Greve, M.B., Bøcher, P.K., 2012. Using digital elevation models as an environmental predictor for soil clay contents. Soil Sci. Soc. Am. J. 76, 2116–2127. http://dx.doi.org/10.2136/sssaj2010.0354.

Grimm, R., Behrens, T., Märker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island - digital soil mapping using random forests analysis. Geoderma 146, 102–113. http://dx.doi.org/10.1016/j.geoderma.2008.05.008.

Grinand, C., Arrouays, D., Laroche, B., Martin, M.P., 2008. Extrapolating regional soil landscapes from an existing soil map: sampling intensity, validation procedures, and integration of spatial context. Geoderma 143, 180–190.

Hartemink, A.E., McBratney, A.B., 2008. A soil science renaissance. Geoderma 148, 123–129. http://dx.doi.org/10.1016/j.geoderma.2008.10.006.

Hastie, T., Tibshirani, R., Friedman, J., 2009. The elements of statistical learning. In: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.

Henderson, B.L., Elisabeth, N.B., Christopher, J.M., Simon, D.A.P., 2005. Australia-wide predictions of soil properties using decision trees. Geoderma 124, 383–398. http://dx.doi.org/10.1016/j.geoderma.2004.06.007.

Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120, 75–93.

Hengl, T., Heuvelink, G.B.M., Kempen, B., Leenaars, J.G.B., Walsh, M.G., et al., 2015. Mapping soil properties of Africa at 250 m resolution: random forests significantly improve current predictions. PLoS One 10 (6), e0125814. http://dx.doi.org/10.1371/journal.pone.0125814.

Huang, C., Davis, L.S., Townshend, J.R.G., 2002. An assessment of support vector machines for land cover classification. Int. J. Remote Sens. 23, 725–749.

Jackson, M.L., 1973. Soil Chemical Analysis. Prentice Hall of India Pvt. Ltd., New Delhi.

Jenny, H., 1941. Factors of Soil Formation, a System of Quantitative Pedology. McGraw-Hill, New York.

Kempen, B., Brus, D.J., Heuvelink, G.B.M., Stoorvogel, J.J., 2009. Updating the 1:50,000 Dutch soil map using legacy soil data: a multinomial logistic regression approach. Geoderma 151, 311–326.

Kidd, D.B., Malone, B.P., McBratney, A.B., Minasny, B., Webb, M.A., 2014. Digital mapping of a soil drainage index for irrigated enterprise suitability in Tasmania, Australia. Soil Res. 52, 107–119.

Kovačevic, M., Bajat, B., Gajić, B., 2010. Soil type classification and estimation of soil properties using support vector machines. Geoderma 154, 340–347.

Kühn, J., Brenning, A., Wehrhan, M., Koszinski, S., Sommer, M., 2009. Interpretation of electrical conductivity patterns by soil properties and geological maps for precision agriculture. Precis. Agric. 10, 490–507.

Lagacherie, P., McBratney, A.B., 2007. Spatial soil information systems and spatial soil inference systems: perspectives for digital soil mapping. In: Lagacherie, P., McBratney, A.B., Voltz, M. (Eds.), Digital Soil Mapping: An Introductory Perspective. Elsevier B.V. hlm, The Netherlands, pp. 3–22.

Lagacherie, P., Legros, J., Burrough, P., 1995. A soil survey procedure using the knowledge of soil pattern established on a previously mapped reference area. Geoderma 65, 283–301.

Liaw, A., Wiener, M., 2002. Classification and regression by Random Forest. R News 2,
