A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape


Ecological Indicators 52 (2015) 394–403 Contents lists available at ScienceDirect Ecological Indicators journal homepage: www.elsevier.com/locate/ec...



Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

Kennedy Were a,b,∗, Dieu Tien Bui c, Øystein B. Dick a, Bal Ram Singh d

a Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway
b Kenya Agricultural Research Institute, Kenya Soil Survey, P.O. Box 14733-00800, Nairobi, Kenya
c Department of Economics and Computer Sciences, Faculty of Arts and Sciences, Telemark University College, NO-3800 Bø, Norway
d Department of Environmental Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway

Article info

Article history: Received 28 June 2014; received in revised form 29 November 2014; accepted 24 December 2014

Keywords: Random forests; Artificial neural networks; Support vector regression; Soil organic carbon; Digital soil mapping; Eastern Mau; Kenya

Abstract

Soil organic carbon (SOC) is a key indicator of ecosystem health, with a great potential to affect climate change. This study aimed to develop, evaluate, and compare the performance of support vector regression (SVR), artificial neural network (ANN), and random forest (RF) models in predicting and mapping SOC stocks in the Eastern Mau Forest Reserve, Kenya. Auxiliary data, including soil sampling, climatic, topographic, and remotely sensed data, were used for model calibration. The calibrated models were applied to create prediction maps of SOC stocks that were validated using independent testing data. The results showed that the models overestimated SOC stocks. The random forest model, with a mean error (ME) of −6.5 Mg C ha⁻¹, had the highest tendency for overestimation, while the SVR model, with an ME of −4.4 Mg C ha⁻¹, had the lowest tendency. The support vector regression model also had the lowest root mean squared error (RMSE) and the highest R² values (14.9 Mg C ha⁻¹ and 0.6, respectively); hence, it was the best method to predict SOC stocks. Artificial neural network predictions followed closely with RMSE, ME, and R² values of 15.5, −4.7, and 0.6, respectively. The three prediction maps broadly depicted similar spatial patterns of SOC stocks, with an increasing gradient of SOC stocks from east to west. The highest stocks were on the forest-dominated western and north-western parts, while the lowest stocks were on the cropland-dominated eastern part. The most important variable for explaining the observed spatial patterns of SOC stocks was total nitrogen concentration. Based on the close performance of SVR and ANN models, we proposed that both models should be calibrated, and then the best result applied for spatial prediction of target soil properties in other contexts. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Soils sustain life on Earth by delivering various ecosystem services. For example, they are essential for producing food, fibre, fuel, and raw materials, as well as for maintaining the climatic and terrestrial systems (Chen et al., 2002). Rapid land use-land cover changes (LULCC), especially conversion of natural ecosystems to agro-ecosystems, are straining the world’s soils. Agricultural land uses modify the soil’s physical, chemical, and biological properties, leading to soil degradation, particularly depletion of soil organic matter (SOM). This in turn has implications for global climate,

∗ Corresponding author at: Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway. Tel.: +47 966 563 62; fax: +47 649 654 01. E-mail address: [email protected] (K. Were). http://dx.doi.org/10.1016/j.ecolind.2014.12.028 1470-160X/© 2014 Elsevier Ltd. All rights reserved.

food security, and sustainable development. Soil organic carbon (SOC), which is the major constituent of SOM, determines the soil’s physical, chemical, and biological properties. It maintains soil quality by supplying nutrients, enhancing cation exchange capacity, supporting biodiversity, and improving aggregation and water-holding capacity (Bationo et al., 2007). Depletion of SOM occurs because of frequent tillage and other disturbances, which disintegrate the aggregates and alter aeration, moisture, and temperature conditions in the soil. This accelerates microbial decomposition and oxidation of SOM to CO2, which increases the atmospheric concentration of CO2 and contributes to global warming (Murty et al., 2002; Batlle-Aguilar et al., 2011; Wiesmeier et al., 2012). The threat of global warming is disturbing because the world’s soils contain about 1500 Pg C (1 Pg = 10¹⁵ g) to 1 m depth, which is twice the amount of C in the atmospheric pool (750 Pg C), and almost three times that in the biotic pool (610 Pg C) (Lal, 2004; Smith, 2004, 2008). Thus, even slight changes in the SOC pool can significantly affect

K. Were et al. / Ecological Indicators 52 (2015) 394–403

the global C cycle, climate, and soil properties (Powlson et al., 2011). In the face of climate change and food insecurity, scientists have focused their attention on LULCC and SOC storage research. There is consensus that sustainable use of soil resources is one of the ways to manage climate change and food insecurity issues. This requires a deeper understanding of the spatial distribution of SOC storage to guide policy formulation. Consequently, many tools have been developed and tested to help scientists analyze soil processes, and derive spatially continuous information on soil properties at different scales. Open access to most geographic information systems (GIS) and remotely sensed data and technologies has boosted these efforts. This forms the basis of digital soil mapping (DSM). In DSM, the variability of a target soil property is explained by its relationships with soil-forming factors, such as topography, climate, land use, vegetation, and soil type. This is underpinned by Jenny’s (1941) seminal work, which considered soil development as a function of climate (c), organisms (o), relief (r), parent material (p), and time (a). This function was later expanded by McBratney et al. (2003) to include soil properties (s) and space (n) under the well-known SCORPAN framework. Numerous statistical techniques have been applied in digital mapping of SOC stocks, including multiple linear regression (Meersmans et al., 2008), partial least squares regression (Amare et al., 2013), generalized linear models (Yang et al., 2008), linear mixed models (Doetterl et al., 2013; Karunaratne et al., 2014), geographically weighted regression (Mishra et al., 2010; Kumar et al., 2013), kriging (Cambule et al., 2014), and regression-kriging (Hengl et al., 2004, 2007; Kumar et al., 2012).
Recently, a few studies also applied new methods from the machine learning field, such as artificial neural networks (Malone et al., 2009; Jaber and Al-Qinna, 2011; Li et al., 2013), support vector machines (Rossel and Behrens, 2010), boosted regression trees (Martin et al., 2011), and random forests (Grimm et al., 2008; Wiesmeier et al., 2011; Vågen and Winowiecki, 2013; Vågen et al., 2013) to map SOC stocks. Machine learning methods overcome the shortcomings of parametric and non-parametric statistical methods, such as spatial autocorrelation, non-linearity, and overfitting (Drake et al., 2006). This improves the prediction accuracy of spatial models. Despite the merits, application of machine learning techniques in DSM is still rare (Vågen et al., 2013). Therefore, in this study, we aimed to develop, evaluate, and compare the performance of random forests (RF), support vector machines for regression (SVR), and artificial neural networks (ANN) models in predicting and mapping the variability of SOC stocks in the Eastern Mau Forest Reserve, Kenya. The distinction between this study and the previous ones is that SVR with the recently proposed sequential minimal optimization (SMO) algorithm (Platt, 1999; Smola and Schölkopf, 2004) was implemented. This relatively new algorithm has been valuable for diverse environmental applications, but seldom used to model and map SOC stocks and other soil properties.

2. Materials and methods

2.1. Study area

The study was conducted in the Eastern Mau Forest Reserve (∼650 km²), which is bounded by latitudes 0°15′–0°40′ S and longitudes 35°40′–36°10′ E (Fig. 1). It is part of the largest closed-canopy Afromontane forest in Eastern Africa, and provides essential ecosystem services. This is despite the deforestation and degradation experienced since the mid-1990s because of illegal logging, charcoal burning, and encroachments, as well as excision of ∼61,023 ha for human settlement (Government of Kenya, 2009; UNEP, 2009). The climate is cool and humid thanks to the high


altitudes ranging between 2210 and 3070 m above sea level. The mean annual rainfall varies between 935 and 1287 mm, while the mean annual temperatures range from 9.8 to 17.5 °C (Jaetzold et al., 2010). The Njoro, Naishi, and Larmudiac Rivers drain the eastern slopes into Lake Nakuru, while the Nessuiet flows northwards into Lake Bogoria, and the Rongai River into Lake Baringo. The physiography and lithology consist of major scarps and uplands covered with pyroclastic rocks (i.e., pumice tuffs) of Tertiary-Quaternary volcanic age. The rocks decompose into deep to very deep, dark reddish brown clayey, friable and smeary soil aggregates with humic topsoils: the resultant soils are classified as Mollic Andosols (McCall, 1967; Jaetzold et al., 2010). The major land uses are forestry, agriculture, and grazing. The red stinkwood (Prunus africana), bamboo (Arundinaria alpina), red cedar (Juniperus procera), African wild olive (Olea europaea ssp. africana), East African olive (Olea capensis ssp. hochstetteri), broad-leaved yellowwood (Podocarpus latifolius), brittlewood (Nuxia congesta), clematis (Clematis hirsuta), schefflera (Schefflera volkensii), and forest dombeya (Dombeya torrida) are dominant in the indigenous forests, and pine (Pinus patula) and cypress (Cupressus lusitanica) in the plantation forests. The major crops grown are maize (Zea mays), beans (Phaseolus vulgaris), wheat (Triticum aestivum), and potatoes (Solanum tuberosum).

2.2. Soil data

2.2.1. Sampling design and soil sampling

We conducted a field campaign from June to August 2012. Before that, sampling points were generated randomly in a GIS with agroecological zones as the stratifying factor, and then a map showing the distribution of the points was produced for field use. In the field, plots measuring 30 × 30 m were laid out at each sampling point, and soil samples were collected at 0–15 cm and 15–30 cm depths from the centres and corners of these plots using an auger.
Samples taken from similar depths in a plot were properly mixed and bulked into one composite sample weighing about 500 g. For bulk density (BD) determination, a core ring sampler (5 cm in diameter, 5 cm in height) was used to collect undisturbed samples from each depth at the centre of each plot. Three hundred and twenty (320) soil samples were collected from 160 sampling plots for chemical and physical analysis, and a similar number for BD determination. Supplementary soil data that had been collected similarly from 60 other sampling plots for LULCC impact assessment (Were et al., 2015) were also used. Thus, soil data from 220 sampling plots were used for spatial modelling.

2.2.2. Determination of soil properties

The collected soil samples were air-dried, ground, and passed through a 2 mm mesh at the National Agricultural Research Laboratories. The Walkley-Black wet oxidation method (Nelson and Sommers, 1982) and Kjeldahl digestion method (Bremner and Mulvaney, 1982) were used to determine SOC concentrations and TN concentrations, respectively. The hydrometer method (Day, 1965) was used to analyze particle size distribution, the core method (Blake, 1965) to determine BD, and the Mehlich method (Okalebo et al., 2002) to estimate phosphorus (P) content. A flame photometer was used to measure potassium (K) content, an atomic absorption spectrophotometer to measure calcium (Ca) and magnesium (Mg) contents, and a pH meter to measure pH (1:2.5 soil-water) (Okalebo et al., 2002).

2.2.3. Estimation of SOC stocks

The SOC stocks, i.e., mass of C per unit area for a given depth, were calculated according to Eq. (1) (Aynekulu et al., 2011):

$$\mathrm{SOC_{st}} = \frac{\mathrm{SOC}}{100} \times \mathrm{BD} \times D \times 100 \qquad (1)$$


Fig. 1. Geographical location of the study area.

where SOCst is the soil organic carbon stock (Mg C ha⁻¹), SOC is the soil organic carbon concentration (%, which is then converted to g C g⁻¹ soil), BD is the bulk density (g cm⁻³), D is the depth (cm), and 100 is the multiplication factor to convert the SOC per unit area from g C cm⁻² to Mg C ha⁻¹. Coarse particles were negligible due to the softness of the volcanic rocks; hence, Eq. (1) does not account for them. The SOC stocks in the surface (0–15 cm) and subsurface soils (15–30 cm) were summed up to obtain the total stocks to 30 cm depth.
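As a worked illustration of Eq. (1), the following Python sketch computes the stock of each depth layer and sums the two layers to 30 cm. The function names and the input values are hypothetical and are not taken from the study's data.

```python
def soc_stock(soc_percent, bulk_density, depth_cm):
    """SOC stock of one soil layer in Mg C per ha, following Eq. (1)."""
    soc_fraction = soc_percent / 100.0            # % -> g C per g soil
    stock_g_per_cm2 = soc_fraction * bulk_density * depth_cm
    return stock_g_per_cm2 * 100.0                # g C per cm^2 -> Mg C per ha


def total_stock(layers):
    """Sum the stocks of several layers given as (SOC %, BD, depth) tuples."""
    return sum(soc_stock(*layer) for layer in layers)


# Hypothetical plot: (SOC %, BD in g per cm^3, layer thickness in cm)
layers = [
    (4.5, 0.80, 15.0),   # surface soil, 0-15 cm
    (3.9, 0.88, 15.0),   # subsurface soil, 15-30 cm
]
total = total_stock(layers)
```

With these illustrative inputs the two layers hold about 54.0 and 51.5 Mg C ha⁻¹, giving a total of about 105.5 Mg C ha⁻¹ to 30 cm depth.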

2.3. Environmental data

Based on the SCORPAN conceptual model of soil development (McBratney et al., 2003) and review of literature (e.g., Liu et al., 2006; Vasques et al., 2010; Kumar and Lal, 2011; Kumar et al., 2012; Li et al., 2013; Shelukindo et al., 2014), we selected a priori nineteen (19) environmental variables (predictors) with the potential to explain the spatial variability of SOC stocks, and retrieved them from existing spatial databases. Climate data (temperature and rainfall) were obtained from www.worldclim.org, land cover data from Were et al. (2013), elevation data (digital elevation model; DEM) from http://srtm.csi.cgiar.org/, and Landsat 8 Operational Land Imager (OLI) data from http://earthexplorer.usgs.gov/. Four terrain parameters, including slope, curvature, aspect, and topographic wetness index (TWI), were extracted from the DEM (Wilson and Gallant, 2000). Normalized Difference Vegetation Index (NDVI) was derived from OLI bands 4 (red) and 5 (near-infrared) after conversion of the digital numbers to top-of-atmosphere reflectance.
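The NDVI step can be sketched as follows, using the standard definition NDVI = (NIR − red) / (NIR + red) applied cell by cell to reflectance grids. The function names, the handling of zero-sum cells, and the reflectance values are illustrative assumptions rather than the study's processing chain.

```python
import math


def ndvi(red, nir):
    """NDVI for one cell from red and near-infrared reflectance."""
    total = nir + red
    if total == 0:
        return math.nan  # assumption: undefined cells are marked as NaN
    return (nir - red) / total


def ndvi_grid(red_band, nir_band):
    """Apply NDVI cell by cell to two equally sized grids (lists of rows)."""
    return [
        [ndvi(r, n) for r, n in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Hypothetical top-of-atmosphere reflectance (OLI band 4 = red, band 5 = NIR)
red_band = [[0.05, 0.10], [0.12, 0.00]]
nir_band = [[0.45, 0.30], [0.18, 0.00]]
index = ndvi_grid(red_band, nir_band)
```

Dense vegetation (low red, high near-infrared reflectance) yields values approaching 1, whereas sparsely vegetated cells yield values near 0.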

The first component (PC1) from principal component analysis of OLI bands 2, 3, 4, 5, 6, and 7 was also included. The data, all of which were in raster format, were transformed to UTM WGS84 Zone 36S, and subsets were made. The climatic grids were resampled from 1 km to 30 m using the nearest neighbour method, to match them with the other raster grids. As in Kumar and Lal (2011), soil data from the laboratory, including sand content, silt content, clay content, Ca, Mg, P, K, TN, and pH, were also integrated into the GIS database both as points in vector format and as raster grids after interpolation by ordinary kriging. Ordinary kriging has been widely used to optimize the prediction of soil properties at unsampled locations in pedological studies (Chaplot et al., 2010; Phachomphon et al., 2010; Tesfahunegn et al., 2011; Marchetti et al., 2012; Elbasiouny et al., 2014). Finally, the attribute values of all the other raster grids (e.g., slope, rainfall, and temperature) were extracted to the points, which were the main input for spatial modelling.

3. Spatial modelling and prediction

3.1. Exploratory data analysis

We first estimated the descriptive statistics of the target variable, followed by pairwise Pearson’s product-moment correlation analysis to detect collinearity between the predictors, as well as their correlation with the target variable. Predictors that were highly correlated (|r| ≥ 0.8) and had high variance inflation factors (VIFs ≥ 10) in regression analysis were excluded from modelling.
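The correlation part of this screening can be illustrated with a short Python sketch that flags predictor pairs whose absolute Pearson coefficient reaches the 0.8 threshold. The function names and data values are hypothetical, and the VIF check is omitted because it requires a regression fit that the text does not detail.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical predictor values at six sampling points
predictors = {
    "elevation": [2300, 2450, 2600, 2750, 2900, 3050],
    "temperature": [16.8, 15.9, 15.1, 14.0, 13.2, 12.1],
    "slope": [5, 12, 3, 9, 4, 11],
}
flagged = collinear_pairs(predictors)
```

In this made-up example only the elevation and temperature pair is flagged, because temperature falls almost linearly with elevation; one member of such a pair would then be a candidate for exclusion.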


3.2. Model training

After exploring the data (n = 220), we randomly split them into training (n = 176) and testing (n = 44) sets. The former was used to model the relationships between the site-specific SOC stocks and the predictors, and the latter to evaluate the predictive performance of the models developed. The RF, SVR, and ANN algorithms were used for modelling.

3.2.1. Random forests

The algorithm is an extension of bagging (i.e., bootstrap aggregation) and a competitor to boosting (Cutler et al., 2012). It uses either categorical (i.e., classification) or continuous (i.e., regression) response variables, and either categorical or continuous predictor variables. As described by Cutler et al. (2012), the algorithm worked by growing an ensemble of regression trees based on binary recursive partitioning, where the predictor space at each tree node was partitioned based on binary splits on a subset of randomly selected predictors. At each binary split, the response data were grouped into two descendant nodes to maximize homogeneity, and the best binary split was selected. The response data for each tree were obtained through bootstrap sampling (with replacement) of original observations in the training set. Each descendant node of the selected split was treated in the same way as the original (root) node, and the process continued recursively until a stopping criterion was met at a terminal node. The trees were grown to their maximum sizes, with the results being combined by unweighted averaging to make predictions. In RF modelling, the training parameters that needed specification were: (i) the number of trees to grow in the forest (ntree), (ii) the number of randomly selected predictor variables at each node (mtry), and (iii) the minimal number of observations at the terminal nodes of the trees (nodesize). These were set to 1000, 12, and 5, respectively. The default of ntree was 500, but it has been observed that more stable results for estimating variable importance are achieved with a higher number (Grimm et al., 2008). The training data that were left out of the bootstrap samples (out-of-bag (OOB) samples) were used to estimate prediction error and variable importance. In error estimation, the OOB samples were predicted by the respective trees and, by aggregating the predictions, the mean square error (MSE_OOB) was calculated using Eq. (2) (Cutler et al., 2012):

$$\mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i^{\mathrm{OOB}}\right)^2 \qquad (2)$$

where $\hat{y}_i^{\mathrm{OOB}}$ is the OOB prediction for observation $y_i$. Regarding variable importance, the values of a specific predictor variable were randomly permuted in the OOB data of a tree while the values of the other predictors remained fixed. The modified OOB data were predicted, and the differences between the MSEs obtained from the permuted and original OOB data gave a measure of variable importance.

3.2.2. Support vector machines for regression

Support vector machines (SVM) use kernel functions to project the data onto a new hyperspace where complex non-linear patterns can be simply represented (Gunn, 1998; Williams, 2011). In the new hyperspace, SVM aims to construct an optimal hyperplane that separates classes and creates the widest margin between their data (i.e., classification), or that fits data and predicts (i.e., SVR) with minimal empirical risk and complexity of the modelling function. In this study, SVR was used. Given the training data {(xi, yi), i = 1, 2, ..., n}, where x is a vector of the input predictors and y is the values of SOC stocks, the SVR developed an optimal function f(x) that minimized the empirical risk (Eq. (3)) (Pozdnoukhov, 2005; Ruß and Kruse, 2010):

$$R_{\mathrm{emp}} = \frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}\left(y_i - f(x_i)\right) \qquad (3)$$

where Lε is the loss function, which penalizes the model in case of differences between the training data and model predictions (i.e., errors). An ε-insensitive loss function was used (Eq. (4)) where smaller errors than the specified non-negative constant ε were not penalized (Gunn, 1998; Pozdnoukhov, 2005).



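$$L_{\varepsilon}\left(y - f(x)\right) = \begin{cases} 0 & \text{for } \left|y - f(x)\right| < \varepsilon \\ \left|y - f(x)\right| - \varepsilon & \text{otherwise} \end{cases} \qquad (4)$$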
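The ε-insensitive loss and the empirical risk of Eqs. (3) and (4) can be written out in a few lines of Python. This is a minimal sketch: the value of ε and the SOC stock values are hypothetical, since the text does not report the ε that was used.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Eq. (4): errors smaller than eps cost nothing, larger ones cost |error| - eps."""
    return max(0.0, abs(y - f_x) - eps)


def empirical_risk(measured, predicted, eps):
    """Eq. (3): mean eps-insensitive loss over the training observations."""
    losses = [eps_insensitive_loss(y, f, eps) for y, f in zip(measured, predicted)]
    return sum(losses) / len(losses)


# Hypothetical SOC stocks (Mg C per ha) and model outputs, with eps = 5
measured = [100.0, 110.0, 90.0]
predicted = [102.0, 100.0, 90.0]
risk = empirical_risk(measured, predicted, eps=5.0)
```

Here the absolute errors are 2, 10, and 0; only the second exceeds ε and contributes 10 − 5 = 5, so the empirical risk is 5/3.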

Prior to developing the SVR function f(x), the quadratic programming optimization problem shown in Eq. (5) was solved using the SMO algorithm (Platt, 1999; Smola and Schölkopf, 2004).

$$\max_{\alpha,\,\alpha^*}\; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)K(x_i, x_j) \;-\; \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^*) \;+\; \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^*) \qquad (5)$$

with the constraints given in Eq. (6):

$$\begin{cases} \sum_{i=1}^{N}(\alpha_i - \alpha_i^*) = 0 \\ 0 \le \alpha_i,\, \alpha_i^* \le C \quad \text{for } i = 1, \ldots, N \end{cases} \qquad (6)$$

where $\alpha_i$ and $\alpha_i^*$ are the weights (Lagrange multipliers), which determined the influence of each data point on the model (support vectors were the data with non-zero weights), $K(x_i, x_j)$ is the kernel function, and C is the regularization parameter, which determined the trade-off between the training errors and model complexity (i.e., flatness of f(x)). The SMO algorithm decomposed the optimization problem into sub-problems, which were solved step by step (Platt, 1999). At each step, the algorithm selected two Lagrange multipliers, found their optimal values analytically, and updated the SVR function (Eq. (7)) to reflect the new values. The process was repeated until the Lagrange multipliers converged.
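To make the role of the multipliers concrete, the sketch below evaluates an SVR function of the kernel-expansion form used in Eqs. (7) and (8), i.e., a weighted sum of Gaussian kernels centred on the support vectors plus a threshold b. It assumes the usual Gaussian kernel exp(−‖xi − xj‖² / (2σ²)); the support vectors, the coefficients (αi − αi*), b, and σ are made-up values, not the fitted model.

```python
from math import exp


def rbf_kernel(xi, xj, sigma):
    """Gaussian radial basis kernel with bandwidth sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, coefs, b, sigma):
    """Kernel expansion: sum of coef_i * K(x_i, x) over support vectors, plus b.

    Each entry of coefs stands for the difference (alpha_i - alpha_i*).
    """
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, coefs)) + b


# Hypothetical model with two support vectors in a 2-predictor space
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
coefs = [2.0, -1.0]
prediction = svr_predict((0.0, 0.0), support_vectors, coefs, b=100.0, sigma=1.0)
```

Training points whose coefficient is zero drop out of the sum entirely, which is why only the support vectors influence the prediction.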

$$f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)K(x_i, x) + b \qquad (7)$$

where b is a constant threshold. The Gaussian radial basis kernel function of the form in Eq. (8) was used (Tien Bui et al., 2012).

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j\rVert^2}{2\sigma^2}\right) \qquad (8)$$

where σ is the bandwidth parameter. The best parameters C and σ obtained using the training data were 5 and 0.1, respectively. Determination of these parameters was carried out using the grid search method (Zhuang and Dai, 2006; Kavzoglu and Colkesen, 2009).

3.2.3. Artificial neural networks

The artificial neural network algorithm simulates human learning processes through establishment and reinforcement of linkages between the input and output data. The linkages then connect input and output data in the absence of training data (Campbell, 2002). Numerous ANN algorithms have been proposed, such as Radial Basis Function (Vojislav, 2001), Elman recurrent (Rakkiyappan and Balasubramaniam, 2008), and Hopfield neural networks (Nguyen et al., 2006); however, multi-layer perceptron neural networks (MLP Neural Nets) with the back-propagation algorithm may be the

most popular, and was selected for this study. The architecture of the MLP Neural Nets consists of input, hidden, and output layers (Fig. 2), each with a set of interconnected nodes (neurons) working in parallel to transform the input data into output values (Lee and Evangelista, 2006; Conforti et al., 2014). In this context, the number of neurons in the input layer was equal to the 16 predictors that were statistically selected from the original list, while the number of neurons in the hidden layer, which carried weights representing the linkages between the predictors and SOC stocks, was determined using both the training and validation data. That is, values ranging from 1 to 20 were used to build different models, the training and prediction errors of which were calculated (Fig. 3). The network having 2 hidden neurons and the lowest error was selected, with the unipolar sigmoid as the transfer function. The output layer comprised a single neuron that represented the output values of SOC stocks. The training phase was initiated by assigning arbitrary connection weights. Then the algorithm fed forward the input layer to the hidden layer. The neurons in the hidden layer multiplied the inputs by their associated weights, summed up the products, and processed the weighted sums using the transfer function, the results of which were propagated to the output layer (Lee and Evangelista, 2006). The output values were compared with the expected values in the training data, and the errors computed. Through iterative propagation of errors back to the network, the connection weights were automatically adjusted until the target minimum error was attained, and the network was able to assign correct values of SOC stocks to the training data, as well as to new input data in the absence of training data. To achieve this, different tests were conducted and the best learning rate, momentum, and training time (iterations) obtained were 0.01, 0.18, and 500, respectively. Finally, the trained network was used as a feed-forward structure to produce predictions for the entire spatially continuous data.

Fig. 2. Architecture of the MLP neural network for SOC stocks modelling.

Fig. 3. Training and validation errors associated with a given number of neurons in the hidden layer.

3.3. Model testing and comparison

The testing data (n = 44) were used to validate the RF, SVR, and ANN models, and derive statistical measures to compare their performance. The root mean squared error (RMSE) and mean error (ME) were computed from the differences between the predicted SOC stock values and measured values (Eqs. (9) and (10)), to determine the precision and bias of the predictions, respectively.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (9)$$

$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right) \qquad (10)$$

where $\hat{y}_i$ is the estimated value, $y_i$ is the measured value, and n is the number of measured values in the testing data. The ME should be close to zero, while RMSE should be as small as possible. The coefficient of determination (R²) was also calculated.
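The two error indices can be computed as in the following Python sketch, which takes the error as measured minus estimated so that a negative ME signals overestimation. The values are hypothetical; R² is left out because the text does not state which formula was used for it.

```python
from math import sqrt


def rmse(measured, predicted):
    """Root mean squared error: precision of the predictions."""
    n = len(measured)
    return sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def mean_error(measured, predicted):
    """Mean error (measured minus predicted): bias of the predictions."""
    n = len(measured)
    return sum(y - p for y, p in zip(measured, predicted)) / n


# Hypothetical testing data (Mg C per ha)
measured = [100.0, 120.0, 80.0, 90.0]
predicted = [105.0, 118.0, 90.0, 95.0]
bias = mean_error(measured, predicted)
precision = rmse(measured, predicted)
```

For these made-up values the ME is −4.5, i.e., the predictions are too high on average, and the RMSE is about 6.2.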

3.4. Model application

We applied the output SVR, ANN, and RF models to create prediction surfaces that showed the spatial distribution of SOC stocks. Data preparations, analyses, and geovisualization were carried out using ArcGIS® 10.1, ERDAS IMAGINE® 2013, Microsoft Excel® 2010, Weka 3.6, and R 3.0.1 (R Core Team, 2013) with its add-on packages: “sp”, “maptools”, “rgdal”, “randomForest”, and “raster”.
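Applying a calibrated model to produce a prediction surface amounts to evaluating it at every grid cell of the stacked predictors. The Python sketch below does this with a small feed-forward network that has one hidden layer of two unipolar sigmoid neurons, as in Section 3.2.3. It uses three predictors instead of 16 for brevity; the weights, the bias terms, and the linear output neuron are assumptions for illustration, not the trained network.

```python
from math import exp


def sigmoid(z):
    """Unipolar (logistic) sigmoid transfer function."""
    return 1.0 / (1.0 + exp(-z))


def mlp_predict(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Feed one predictor vector forward through a single hidden layer."""
    hidden = [
        sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Assumption: linear output neuron, so the result is in SOC stock units.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def predict_surface(predictor_grid, predict_cell):
    """Apply a calibrated model to every cell of a grid of predictor vectors."""
    return [[predict_cell(cell) for cell in row] for row in predictor_grid]


# Hypothetical weights for 3 predictors and 2 hidden neurons (not fitted values)
HIDDEN_WEIGHTS = [[0.8, -0.2, 0.1], [-0.5, 0.3, 0.4]]
HIDDEN_BIASES = [0.0, 0.1]
OUTPUT_WEIGHTS = [60.0, 40.0]
OUTPUT_BIAS = 50.0

# A 2 x 2 grid; each cell holds standardized values of the 3 predictors
grid = [
    [[0.5, 0.1, -0.3], [1.2, -0.4, 0.0]],
    [[-0.7, 0.9, 0.2], [0.0, 0.0, 0.0]],
]

surface = predict_surface(
    grid,
    lambda cell: mlp_predict(cell, HIDDEN_WEIGHTS, HIDDEN_BIASES,
                             OUTPUT_WEIGHTS, OUTPUT_BIAS),
)
```

The same `predict_surface` helper would work with any per-cell prediction function, so an SVR or RF model could be mapped in the same way.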

4. Results and discussion 4.1. Exploratory data analysis Descriptive statistics of SOC stocks along with other soil properties are presented in Table 1. SOC stocks ranged from 42.0 to 193.4 Mg C ha−1 , with a mean and median of 102.7 and 103.2 Mg C ha−1 , respectively. The standard deviation and coefficient of variation were 24.6 Mg C ha−1 and 23.9%, respectively, denoting moderate variability of SOC stocks. This can be attributed to environmental factors, such as climate, land cover, and topography, as well as measurement errors. The BD varied between 0.5 and 1.1 g cm−3 , and the pH between 4.8 and 7.0. The macronutrients, including N, P, and K ranged from 0.2 to 0.9%, 8 to 62.5 me/100 g, and 0.3 to 2.1 me/100 g, respectively, while the soil separates varied from 20 to 53% for sand, 21 to 49% for silt, and 10 to 55% for clay. For all soil properties, the mean and median values were similar indicating normality of data distribution. Some skewness was evident, although quite low and mostly positive. This was probably the influence of extreme data values. Similarly, kurtosis values were low, which implied less peaked values in data distribution. According to Table 2, Pearson’s coefficients (r) ranged between −0.04 (sand content) and 0.74 (TN content) for the relationships between SOC stocks and the predictors. Among the predictors, the correlation between temperature and elevation (r = −0.99), elevation and TN (r = 0.81), temperature and band 11 (r = 0.81), elevation and band 11 (r = −0.83), band 11 and PC1 (r = 0.82), land cover and elevation (r = −0.83), land cover and temperature (r = 0.83), and land cover and band 11 (r = 0.88) exceeded the threshold correlation value of 0.8. Therefore, elevation, temperature, and band 11 were excluded from model building: their VIFs in regression analysis were also greater than 10 (not shown here). This reduced the number of predictors from 19 to 16.


Table 1. Descriptive statistics of SOC stocks and some soil properties in 0–30 cm depth.

| Soil properties (0–30 cm) | SOC (Mg ha⁻¹) | C (%) | TN (%) | P (ppm) | K (me %) | Ca (me %) | Mg (me %) | BD (g cm⁻³) | pH | Clay (%) | Silt (%) | Sand (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 102.65 | 4.22 | 0.42 | 31.86 | 1.21 | 4.04 | 4.97 | 0.84 | 5.84 | 28.13 | 35.56 | 36.25 |
| Median | 103.15 | 4.02 | 0.41 | 30.25 | 1.24 | 4.10 | 5.29 | 0.84 | 5.81 | 27.00 | 36.00 | 36.00 |
| SD | 24.55 | 1.26 | 0.13 | 12.56 | 0.43 | 1.44 | 1.42 | 0.11 | 0.50 | 7.28 | 5.93 | 5.89 |
| CV | 23.91 | 29.84 | 29.64 | 39.43 | 35.23 | 35.58 | 28.59 | 13.18 | 8.64 | 25.87 | 16.68 | 16.24 |
| Kurtosis | 0.39 | 0.79 | 0.51 | −0.75 | −0.64 | 8.27 | −0.40 | 0.05 | −0.70 | 0.85 | −0.40 | 0.31 |
| Skewness | 0.97 | 0.81 | 0.71 | 0.31 | −0.08 | 1.67 | −0.43 | −0.20 | 0.18 | 0.79 | −0.03 | 0.00 |
| Range | 151.43 | 6.86 | 0.67 | 54.50 | 1.86 | 11.00 | 7.16 | 0.60 | 2.19 | 45.00 | 28.00 | 33.00 |
| Minimum | 41.99 | 1.86 | 0.18 | 8.00 | 0.28 | 1.40 | 1.38 | 0.50 | 4.83 | 10.00 | 21.00 | 20.00 |
| Maximum | 193.42 | 8.72 | 0.85 | 62.50 | 2.14 | 12.40 | 8.54 | 1.10 | 7.02 | 55.00 | 49.00 | 53.00 |

4.2. Relative importance of the predictor variables

The increases in RMSEs as the predictors were excluded one by one from the SVR, ANN, and RF models can be seen in Fig. 4. Based on the magnitude of increase in RMSE, all models showed TN concentration as the most important variable for explaining the spatial variations of SOC stocks. This was unsurprising for statistical and theoretical reasons. Statistically, the correlation found between SOC stocks and TN concentration was high (Table 2), which made it a good predictor for SOC stocks. The relationship between the two is well defined and has also been reported by Chaplot et al. (2010), Phachomphon et al. (2010), and Elbasiouny et al. (2014). Theoretically, the high correlation can be ascribed to the tight coupling of carbon and nitrogen cycles. For instance, nitrogen supply increases the net uptake of carbon by stimulating biochemical determinants, including the photosynthetic enzymes (Lorenz, 2013), which in turn leads to higher input of carbon and nitrogen to the SOC pool. In addition, mineralization of organic matter not only leads to the breakdown of carbon substrates and emission of CO2, but also to the release of plant-available inorganic nitrogen (Butterbach-Bahl and Dannenmann, 2012). Other nitrogen transformations (e.g., nitrification and denitrification) also use the energy supplied by carbon (Batlle-Aguilar et al., 2011). This finding differs from previous studies, which for instance reported that land use (Wiesmeier et al., 2011) and topographic attributes (Grimm et al., 2008) were the most important predictors of SOC stocks. Past studies, however, seldom included TN concentration in distributed modelling of SOC stocks, mainly because accurate and spatially exhaustive information on it (and other soil parameters, e.g., pH, CEC, soil moisture, etc.) was lacking.

Fig. 4. Variable importance shown by increase in the RMSEs of the SVR, ANN, and RF models after excluding a predictor. [Bar chart: one group of bars (SVR, ANN, RF) per predictor excluded from the model (PC1, TWI, rainfall, land cover, aspect, slope, curvature, NDVI, sand, silt, Mg, Ca, K, P, pH, TN); horizontal axis: increase in RMSE, with ticks from 13 to 21.]
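The exclusion-based importance measure can be sketched in Python as follows: the model is re-evaluated with each predictor left out in turn, and the rise in testing RMSE over the full model is recorded. A one-nearest-neighbour predictor is used here purely as a simple stand-in for the SVR, ANN, and RF models, and the two predictors and all values are hypothetical.

```python
from math import sqrt


def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in model: return the target of the closest training point."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, x))
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[best]


def rmse(measured, predicted):
    n = len(measured)
    return sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def model_rmse(train_X, train_y, test_X, test_y, columns):
    """Testing RMSE of the stand-in model using only the given columns."""
    def select(rows):
        return [[row[c] for c in columns] for row in rows]
    train_sel, test_sel = select(train_X), select(test_X)
    preds = [nearest_neighbour_predict(train_sel, train_y, x) for x in test_sel]
    return rmse(test_y, preds)


def exclusion_importance(train_X, train_y, test_X, test_y, names):
    """Increase in testing RMSE when each predictor is excluded in turn."""
    all_cols = list(range(len(names)))
    baseline = model_rmse(train_X, train_y, test_X, test_y, all_cols)
    importance = {}
    for j, name in enumerate(names):
        kept = [c for c in all_cols if c != j]
        reduced = model_rmse(train_X, train_y, test_X, test_y, kept)
        importance[name] = reduced - baseline
    return importance


# Hypothetical standardized predictors; the target depends on TN only
names = ["TN", "slope"]
train_X = [[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [3.0, 0.1]]
train_y = [60.0, 90.0, 120.0, 150.0]
test_X = [[0.1, 0.1], [2.9, 0.0]]
test_y = [63.0, 147.0]
importance = exclusion_importance(train_X, train_y, test_X, test_y, names)
```

In this toy case excluding TN raises the RMSE sharply while excluding slope leaves it unchanged, mirroring the kind of contrast that Fig. 4 reports for TN against the remaining predictors.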

The contributions of the remaining predictors to the models were more or less the same; that is, their exclusion marginally increased the RMSEs. The importance of the predictors was judged by the decrease in prediction accuracy after excluding each from the model(s) because SVR, ANN, and RF algorithms did not reveal the functional relationships between the target and predictor variables. This limited their interpretability, and is the reason they are often referred to as “black box” approaches. Therefore, visualization of the prediction surfaces also helped to assess the soil–environment relationships that explained the observed spatial patterns of SOC stocks.

4.3. Spatial prediction and mapping of SOC stocks

The spatial patterns of SOC stocks predicted by SVR, ANN, and RF models are displayed in Fig. 5. Broadly, the three prediction surfaces were similar in terms of the spatial patterns of SOC stocks. There was an increasing gradient of SOC stocks from east to west, with the highest stocks on the western and north-western parts. This clearly reflected the land cover effect because these were areas covered by the Logoman, Nessuiet, Kiptunga, and Baraget forests. Besides land cover, the highly fertile Andosols, high rainfall, low temperatures, and high altitudes, which favour high net primary productivity and low SOC turnover, also explained the high carbon storage in these parts. The lowest stocks, on the other hand, were distributed on the eastern part, including Teret, Nessuiet, Kapkembu, Tuiyotich, and Sururu locations. These were areas where plantation and indigenous forests had been converted to croplands since the mid-1990s. Thus, the low SOC stocks were due to biomass removal after harvesting, erosive processes, and frequent tillage, which breaks up the soil aggregates, alters aeration, and accelerates microbial decomposition and oxidation of SOM to CO2 (Murty et al., 2002; Smith, 2008; Eclesia et al., 2012; Wiesmeier et al., 2012).
The northern and southeastern parts exhibited moderate to high SOC stocks. These were also cropland-dominated areas. Higher estimates of SOC stocks in forests and lower estimates in croplands coincide with the findings of Tesfahunegn et al. (2011) in northern Ethiopia.

Table 3 shows the general statistics of the predictions by the SVR, ANN, and RF models. The predicted minimum and mean values approximated the measured ones, whereas the maximum and standard deviation values differed slightly from the measured ones (cf. Table 1). The training data point with the maximum value was probably treated as an outlier by the algorithms, resulting in the different maximum values.

Targeted climate change mitigation and sustainable land management strategies in the area will require an understanding of the spatial distribution of SOC stocks; hence, the output SOC stock map is important. The map can guide the identification of areas for differential allocation of resources for carbon sequestration and fertility management. For instance, areas with low SOC stocks, but with good soil fertility potential, may be targeted for conservation agriculture and agro-forestry practices.


K. Were et al. / Ecological Indicators 52 (2015) 394–403

Fig. 5. Spatially distributed maps of SOC stocks.
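Section 4.4 validates the maps with the mean error (ME), the root mean squared error (RMSE), and R2, and describes the gain of SVR over ANN as a modest relative improvement. A minimal sketch of these indices follows. It assumes ME is the mean of observed minus predicted values, which matches the statement that negative ME indicates overestimation. It computes R2 as the squared Pearson correlation and relative improvement as the percentage reduction in RMSE; both are assumptions, because this part of the text does not give the formulas. The observed and predicted values are made up; the RMSE values are the validation RMSEs from Table 4.

```python
import math


def mean_error(observed, predicted):
    """ME = mean(observed - predicted); a negative value means overestimation."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def relative_improvement(rmse_reference, rmse_candidate):
    """Percentage reduction in RMSE of the candidate relative to the reference."""
    return 100.0 * (rmse_reference - rmse_candidate) / rmse_reference


# Made-up SOC stocks (Mg C ha-1) at four hypothetical validation sites.
observed = [80.0, 100.0, 120.0, 140.0]
predicted = [90.0, 104.0, 122.0, 138.0]
me = mean_error(observed, predicted)
error = rmse(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"ME = {me:.2f}, RMSE = {error:.2f}, R2 = {r2:.2f}")

# Validation RMSEs reported in Table 4 (Mg C ha-1).
rmse_val = {"SVR": 14.88, "ANN": 15.46, "RF": 17.57}
gain_over_ann = relative_improvement(rmse_val["ANN"], rmse_val["SVR"])
gain_over_rf = relative_improvement(rmse_val["RF"], rmse_val["SVR"])
print(f"SVR vs ANN: {gain_over_ann:.1f}% lower RMSE")
print(f"SVR vs RF: {gain_over_rf:.1f}% lower RMSE")
```

Under this assumed definition, the Table 4 values give SVR an RMSE about 3.8% lower than ANN and about 15.3% lower than RF, which fits the description of the gain over ANN as modest.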

4.4. Performance of the spatial models

Table 4 presents the prediction error indices derived from independent validation of the SOC stock maps using the testing dataset. The negative ME signs indicate that the models overestimated SOC stocks. In particular, the RF model, with an ME of −6.5 Mg C ha−1, had the highest tendency for overestimation, while the SVR model, with an ME of −4.4 Mg C ha−1, showed the lowest. The RMSEs from the testing data varied from 14.9 to 17.6 Mg C ha−1, which were comparable to those of the fitted models (i.e., 14.5, 15.4, and 18.3 for the SVR, ANN, and RF models, respectively). This implies that the models predicted new data as precisely as they fitted the training data. The SVR model had the lowest RMSE (14.9 Mg C ha−1) and ME values, as well as the highest R2 (0.6) value; hence, it was the best method to predict SOC stocks at the unvisited locations in this context. However, the ANN predictions followed closely, with RMSE, ME, and R2 values of 15.5, −4.7, and 0.6, respectively. This indicated a modest relative improvement of prediction accuracy by SVR. Thus, in other contexts, both SVR and ANN models should be calibrated, and the best result applied for spatial prediction of target soil properties.

The RF model results compare with other studies in the region (Vågen and Winowiecki, 2013; Vågen et al., 2013), although the other studies reported slightly higher R2 values than this one. This may be because of the different extents of the study areas, topography, sampling densities, or quantity and quality of the auxiliary data used. In addition, these comparative results are consistent with those of Rossel and Behrens (2010), who found that SVR outperformed RF and boosted trees in estimating SOC, clay content, and pH in Australia. Furthermore, the RMSEs of all models were lower than the standard deviations of the measured values (cf. Table 1), which suggested that application of auxiliary spatial data produced better predictions than what was expected using the measured values alone.

Generally, the RMSE values obtained in this study reflected the measurement, laboratory, statistical, and random errors. For example, the soil properties used as predictors were interpolated by ordinary kriging. Thus, the associated interpolation errors were propagated to the subsequent SOC stock estimations. Retrieval of auxiliary spatial data from different sources also meant different data quality. Poor coverage of samples in the south-easternmost and middle parts, which are dominated by thick impenetrable bamboo forests, also influenced prediction accuracy in these areas. Lastly, some soil-forming factors (e.g., parent material) were not included in the models for lack of suitable data. Thus, incorporating the missing environmental data, as well as the stochastic component by analysing the spatial structure of residuals with geostatistical techniques (i.e., kriging) (Hengl et al., 2004, 2007), are some of the ways to minimize prediction errors in future.

Table 2. Correlation matrix showing the relationships between the variables used in spatial modelling.

Variables             1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
1. SOC stock       1.00
2. TN content      0.72  1.00
3. pH             −0.28 −0.44  1.00
4. Phosphorus      0.06  0.07  0.62  1.00
5. Potassium       0.03  0.07  0.63  0.69  1.00
6. Calcium        −0.15 −0.11  0.56  0.45  0.71  1.00
7. Magnesium       0.20  0.38  0.40  0.57  0.66  0.41  1.00
8. Silt           −0.47 −0.67  0.05 −0.17 −0.40 −0.25 −0.40  1.00
9. Sand           −0.04 −0.09  0.37  0.38  0.28  0.10  0.36  0.00  1.00
10. NDVI           0.27  0.36 −0.30 −0.28 −0.12 −0.22 −0.06 −0.27 −0.26  1.00
11. Curvature      0.06  0.09  0.01  0.02 −0.05 −0.07  0.14  0.05 −0.04 −0.09  1.00
12. Slope          0.04  0.08 −0.12 −0.16 −0.08 −0.05 −0.12 −0.17 −0.04  0.18 −0.14  1.00
13. Aspect         0.15  0.17 −0.14 −0.08 −0.03 −0.12 −0.00 −0.08  0.05  0.12  0.04 −0.02  1.00
14. Land cover    −0.48 −0.68  0.73  0.49  0.33  0.36  0.08  0.43  0.39 −0.52 −0.01 −0.21 −0.16  1.00
15. Rainfall       0.43  0.70 −0.43  0.17  0.11 −0.26  0.41 −0.37  0.17  0.16  0.08 −0.00  0.18 −0.46  1.00
16. TWI           −0.10 −0.17  0.06  0.01  0.10  0.02  0.04 −0.01  0.12 −0.05 −0.12 −0.23 −0.01  0.11 −0.04  1.00
17. PC1           −0.46 −0.64  0.64  0.33  0.32  0.46 −0.06  0.26  0.26 −0.21 −0.12 −0.03 −0.25  0.74 −0.50  0.07  1.00
18. Elevation      0.52  0.81 −0.75 −0.38 −0.32 −0.42  0.01 −0.43 −0.30  0.44  0.12  0.17  0.17 −0.83  0.65 −0.19 −0.72  1.00
19. Temperature   −0.51 −0.78  0.74  0.41  0.34  0.42  0.03  0.41  0.33 −0.44 −0.10 −0.19 −0.17  0.83 −0.62  0.18  0.71 −0.99  1.00
20. Band 11       −0.56 −0.79  0.64  0.27  0.21  0.39 −0.11  0.47  0.30 −0.57 −0.03 −0.12 −0.25  0.88 −0.57  0.11  0.82 −0.83  0.81  1.00

In the original table, highly correlated predictor variables are in bold.

Table 3. Descriptive statistics of the SOC stocks estimated by the SVR, ANN, and RF models.

Model      Min.     Max.     Mean     SD
1. SVR     40.05    169.05   103.75   15.16
2. ANN     44.45    180.99   105.48   16.20
3. RF      58.38    162.94   105.39   14.61

Table 4. Prediction error indices of the SVR, ANN, and RF models.

Model      ME       RMSEcal  RMSEval  R2
1. SVR     −4.42    14.45    14.88    0.64
2. ANN     −4.71    15.39    15.46    0.61
3. RF      −6.51    18.27    17.57    0.53

5. Conclusion

The results have demonstrated that SVR with the SMO algorithm is the best for spatially predicting and mapping the patterns of SOC stocks in the Eastern Mau Forest Reserve, Kenya. However, due to the close performance of SVR and ANN models, we propose that both models should be calibrated, and then the best result applied for spatial prediction of target soil variables in other geographical settings. Data quality cannot be overlooked in the process. The results have also shown that TN is the most important variable explaining the observed variability of SOC stocks in the area, and that contributions of the other environmental factors are only marginal. Overall, the performance of the models in this study will inform the selection of machine learning techniques for spatial prediction of SOC stocks plus other soil functional properties in other environments, while the map generated will be instrumental for formulating spatially-targeted climate change mitigation and sustainable land management strategies. In future, model performance will be improved by incorporating other important environmental data (e.g., parent material), as well as the stochastic component of SOC stocks.

Acknowledgements

We thank the Research Council of Norway for funding this work through the Norwegian University of Life Sciences. We also thank Mr. P. Owenga for technical support, Mr. E. Thairu for driving skilfully in extreme weather and terrain conditions, and the three anonymous reviewers for their constructive comments.

References

Amare, T., Hergarten, C., Hurni, H., Wolfgramm, B., Yitaferu, B., Selassie, Y.G., 2013. Prediction of soil organic carbon for Ethiopian highlands using soil spectroscopy. ISRN Soil Sci. 720589, 11, http://dx.doi.org/10.1155/2013/720589. Aynekulu, E., Vågen, T.-G., Shepherd, K., Winowiecki, L., 2011. A protocol for measurement and monitoring soil carbon stocks in agricultural landscapes. Version 1.1. World Agroforestry Centre, Nairobi. Butterbach-Bahl, L., Dannenmann, M., 2012. Soil carbon and nitrogen interactions and biosphere-atmosphere exchange of nitrous oxide and methane. In: Lal, R., Lorenz, K., Hüttl, R.F., Schneider, B.U., von Braun, J. (Eds.), Recarbonization of the Biosphere: Ecosystems and the Global Carbon Cycle. Springer Science+Business Media, pp. 429–442. Bationo, A., Kihara, J., Vanlauwe, B., Waswa, B., Kimetu, J., 2007. Soil organic carbon dynamics, functions and management in West African agro-ecosystems. Agric. Syst. 94, 13–25. Batlle-Aguilar, J., Brovelli, A., Porporato, A., Barry, D.A., 2011. Modelling soil carbon and nitrogen cycles during land use change – a review. Agron. Sustain Dev. 31, 251–274. Blake, G.R., 1965. Bulk density. In: Black, C.A. (Ed.), Methods of Soil Analysis, Part 1. Physical and Mineralogical Properties, including Statistics of Measurement and Sampling. American Society of Agronomy, Inc, Madison, Wisconsin, USA. Bremner, J.M., Mulvaney, C.S., 1982. Nitrogen – total. In: Page, A.L. (Ed.), Methods of soil analysis, Part 2. Chemical and microbiological properties, 2nd edition.
American Society of Agronomy, Inc, Madison, Wisconsin, USA. Cambule, A.H., Rossiter, D.G., Stoorvogel, J.J., Smaling, E.M.A., 2014. Soil organic carbon stocks in the Limpopo National Park, Mozambique: amount, spatial distribution and uncertainty. Geoderma 213, 46–56. Campbell, J.B., 2002. Introduction to Remote Sensing. Taylor & Francis, London. Chaplot, V., Bouahom, B., Valentin, C., 2010. Soil organic carbon stocks in Laos: spatial variations and controlling factors. Global Change Biol. 16, 1380–1393. Chen, J., Chen, J., Tan, M., Gong, Z., 2002. Soil degradation: a global problem endangering sustainable development. J. Geograph. Sci. 12 (2), 243–252. Conforti, M., Pascale, S., Robustelli, G., Sdao, F., 2014. Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo River catchment (northern Calabria, Italy). Catena 113, 236–250. Cutler, A., Cutler, D.R., Stevens, J.R., 2012. In: Zhang, C., Ma, Y. (Eds.), Ensemble Machine Learning: Methods and Applications. Springer Science+Business Media, LLC. Day, P.R., 1965. Particle fractionation and particle size analysis. In: Black, C.A. (Ed.), Methods of Soil Analysis, Part 1. Physical and Mineralogical Properties, including Statistics of Measurement and Sampling. American Society of Agronomy, Inc, Madison, Wisconsin, USA. Doetterl, S., Stevens, A., van Oost, K., Quine, T.A., van Wesemael, B., 2013. Spatially explicit regional scale prediction of soil organic carbon stocks in cropland using environmental variables and mixed model approaches. Geoderma 204-205, 31–42. Drake, J.M., Randin, C., Guisan, A., 2006. Modelling ecological niches with support vector machines. J. Appl. Ecol. 43, 424–432. Eclesia, R.P., Jobbagy, E.G., Jackson, R.B., Biganzoli, F., Piñeiro, G., 2012. Shifts in soil organic carbon for plantation and pasture establishment in native forests and grasslands of South America. Global Change Biol. 18, 3237–3251.
Elbasiouny, H., Abowaly, M., Abu Alkheir, A., Gad, A., 2014. Spatial variation of soil carbon and nitrogen pools by using ordinary kriging method in an area of north Nile delta, Egypt. Catena 113, 70–78. Government of Kenya, 2009. Report of the prime minister’s task force on the conservation of the Mau forest complex. [Online], Available: http://www.kws. org/export/sites/kws/info/maurestoration/maupublications/Mau Forest Complex Report.pdf [Accessed 19.01.14]. Grimm, R., Behrens, T., Märker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island - Digital soil mapping using Random Forests analysis. Geoderma 146, 102–113. Gunn, S.R., 1998. Support Vector Machines for Classification and Regression. University of Southampton, Technical report. Hengl, T., Heuvelink, G.B.M., Rossiter, D.G., 2007. About regression-kriging: from equations to case studies. Comput. Geosci. 33, 1301–1315. Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120, 75–93. Jaber, S.M., Al-Qinna, M.I., 2011. Soil organic carbon modelling and mapping in a semi-arid environment using thematic mapper data. Photogrammetric Eng. Remote Sens. 77 (7), 709–719.

Jaetzold, R., Schmidt, H., Hornetz, B., Shisanya, C., 2010. Farm management handbook of Kenya, Vol. II. Natural conditions and farm management information, 2nd edition, Part B Central Kenya, Subpart B1a Southern Rift Valley Province. Ministry of Agriculture, Kenya and German Agency for Technical Cooperation (GTZ), Nairobi. Jenny, H., 1941. Factors of Soil Formation: A System of Quantitative Pedology. McGraw-Hill. Karunaratne, S.B., Bishop, T.F.A., Baldock, J.A., Odeh, I.O.A., 2014. Catchment scale mapping of measureable soil organic carbon fractions. Geoderma 219-220, 14–23. Kavzoglu, T., Colkesen, I., 2009. A kernel function analysis for support vector machines for land cover classification. Int. J. Appl. Earth Observ. Geoinfomat. 11 (5), 352–359. Kumar, S., Lal, R., 2011. Mapping the organic carbon stocks of surface soils using local spatial interpolator. J. Environ. Monit. 13, 3128–3135. Kumar, S., Lal, R., Liu, D., 2012. A geographically weighted regression kriging approach for mapping soil organic carbon stock. Geoderma 189-190, 627–634. Kumar, S., Lal, R., Liu, D., 2013. Estimating the spatial distribution of organic carbon density for the soils of Ohio, USA. J. Geograph. Sci. 23 (2), 280–296. Lal, R., 2004. Soil carbon sequestration to mitigate climate change. Geoderma 123, 1–22. Lee, S., Evangelista, D.G., 2006. Earthquake-induced landslide susceptibility mapping using an artificial neural network. Nat. Hazards Earth Syst. Sci. 6, 687–695. Li, Q., Yue, T., Wang, C., Zhang, W., Yu, Y., Li, B., Yang, J., Bai, G., 2013. Spatially distributed modeling of soil organic matter across China: an application of artificial neural network approach. Catena 104, 210–218. Liu, Z.P., Shao, M.A., Wang, Y.Q., 2006. Large-scale spatial variability and distribution of soil organic carbon across the entire Loess Plateau, China. Soil Res. 50 (2), 114–124. Lorenz, K., 2013. Ecosystem carbon sequestration. In: Lal, R., Lorenz, K., Hüttl, R.F., Schneider, B.U., von Braun, J.
(Eds.), Ecosystem Services and Carbon Sequestration in the Biosphere. Springer Science+Business Media Dordrecht, pp. 39–62. Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154, 138–152. Marchetti, A., Piccini, C., Francaviglia, R., Mabit, L., 2012. Spatial distribution of soil organic matter using geostatistics: a key indicator to assess soil degradation status in central Italy. Pedosphere 22 (2), 230–242. Martin, M.P., Wattenbach, M., Smith, P., Meersmans, J., Jolivet, C., Boulonne, L., Arrouays, D., 2011. Spatial distribution of soil organic carbon stocks in France. Biogeosciences 8, 1053–1065. McBratney, A.B., Santos, M.L.M., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52. McCall, G.J.H., 1967. Geology of the Nakuru-Thomson’s falls-Lake Hannington area: degree sheet No. 35, S.W. Quarter and 43 N.W. Quarter. Report No. 78. Government Printer, Nairobi. Meersmans, J., de Ridder, F., Canters, F., de Baets, S., van Molle, M., 2008. A multiple regression approach to assess the spatial distribution of soil organic carbon (SOC) at the regional scale (Flanders, Belgium). Geoderma 143, 1–13. Mishra, U., Lal, R., Liu, D., van Meirvenne, M., 2010. Predicting the spatial variation of the soil organic carbon pool at a regional scale. Soil Sci. Soc. Am. J. 74, 906–914. Murty, D., Kirschbaum, M.F., McMurtrie, R.E., McGilvray, H., 2002. Does conversion of forest to agricultural land change soil carbon and nitrogen? A review of the literature. Global Change Biol. 8, 105–123. Nelson, D.W., Sommers, L.E., 1982. Total carbon, organic carbon and organic matter. In: Page, A.L. (Ed.), Methods of Soil Analysis, Part 2, Chemical and Microbiological Properties. , 2nd edition. American Society of Agronomy, Inc, Madison, Wisconsin, USA. Nguyen, M.Q., Atkinson, P.M., Lewis, H.G., 2006. Super-resolution mapping using Hopfield neural network with fused images. 
IEEE Trans. Geosci. Remote Sens. 44 (3), 736–749. Okalebo, J.R., Gathna, K.W., Woomer, P.L., 2002. Laboratory Methods for Soil and Plant Analysis: A Working Manual, 2nd edition. Tropical Soil Biology and Fertility Programme, Nairobi. Pachomphon, K., Dlamini, P., Chaplot, V., 2010. Estimating carbon stocks at regional level using soil information and easily accessible auxiliary variables. Geoderma 155, 372–380. Platt, J., 1999. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (Eds.), Advances in Kernel Methods – Support Vector Learning. MIT Press, MA, pp. 185–208. Powlson, D.S., Gregory, P.J., Whalley, W.R., Quinton, J.N., Hopkins, D.W., Whitmore, A.P., Hirsch, P.R., Goulding, K.W.T., 2011. Soil management in relation to sustainable agriculture and ecosystem services. Food Policy 36, S72–S87. Pozdnoukhov, A., 2005. Support vector regression for automated robust spatial mapping of natural radioactivity. Appl. GIS 1 (2), 21.1-21.10. R Core Team, 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/. Rakkiyappan, R., Balasubramaniam, P., 2008. Delay-dependent asymptotic stability for stochastic delayed recurrent neural networks with time varying delays. Appl. Math. Comput. 198 (2), 526–533. Rossel, R.A.V., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54. Ruß, G., Kruse, R., 2010. Regression models for spatial data: An example from precision agriculture. In: Perner, P., ICDM 2010, LNAI 6171, pp. 450-463.

Shelukindo, H.B., Semu, E., Msanya, B.M., Singh, B.R., Munishi, P.K.T., 2014. Predictor variables for soil organic carbon contents in the Miombo woodlands ecosystem of Kitonga forest. Int. J. Agric. Sci. 4 (7), 222–231. Smith, P., 2004. Soils as carbon sinks: the global context. Soil Use Manage 20, 212–218. Smith, P., 2008. Land use change and soil organic carbon dynamics. Nutr. Cycl. Agroecosyst. 81, 169–178. Smola, A.J., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14, 199–222. Tesfahunegn, G.B., Tamene, L., Vlek, P.L.G., 2011. Catchment scale spatial variability of soil properties and implications on site-specific soil management in northern Ethiopia. Soil Till. Res. 117, 124–139. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., 2012. Landslide susceptibility assessment in Vietnam using support vector machine, decision tree and Naïve Bayes models. Math. Prob. Eng., 1–26. UNEP, 2009. Kenya: Atlas of our changing environment. Division of Early Warning and Assessment (DEWA), United Nations Environment Programme (UNEP). [Online]. Available: http://www.unep.org/dewa/africa/kenyaatlas/ [Accessed 28.09.13]. Vojislav, K., 2001. Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models (Complex Adaptive Systems). The MIT Press. Vågen, T.G., Winowiecki, L.A., 2013. Mapping of soil organic carbon stocks for spatially explicit assessments of climate change mitigation potential. Environ. Res. Lett. 8, http://dx.doi.org/10.1088/1748-9326/8/1/015011, 015011 (9 pp). Vågen, T.G., Winowiecki, L.A., Abegaz, A., Hagdu, K.M., 2013. Landsat-based approaches for mapping of land degradation prevalence and soil functional properties in Ethiopia. Remote Sens. Environ. 134, 266–275. Vasques, G.M., Grunwald, S., Comerford, N.B., Sickman, J.O., 2010. Regional modelling of soil carbon at multiple depths within a subtropical watershed. Geoderma 156, 326–336.


Were, K.O., Dick, Ø.B., Singh, B.R., 2013. Remotely sensing the spatial and temporal land cover changes in Eastern Mau forest reserve and Lake Nakuru drainage basin, Kenya. Appl. Geogr. 41, 75–86. Were, K.O., Singh, B.R., Dick, Ø.B., 2015. Effects of land cover changes on soil organic carbon and total nitrogen stocks in the Eastern Mau Forest Reserve, Kenya (Chapter 6). In: Lal, R., Singh, B.R., Mwaseba, D.L., Kraybill, D., Hansen, D.O., Eik, L.O. (Eds.), Sustainable Intensification to Advance Food Security and Enhance Climate Resilience in Africa. Springer International Publishing, Switzerland, pp. 113–133, http://dx.doi.org/10.1007/978-3-319-09360-4 6. Wiesmeier, M., Barthold, F., Blank, B., Kögel-Knabner, I., 2011. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant Soil 340, 7–24. Wiesmeier, M., Spörlein, P., Geuß, U., Hangen, E., Haug, S., Reischl, A., Schilling, B., von Lützow, M., Kögel-Knabner, I., 2012. Soil organic carbon stocks in southeast Germany (Bavaria) as affected by land use, soil type and sampling depth. Global Change Biol. 18, 2233–2245. Williams, G., 2011. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, use R. Springer Science+Business Media, LLC, DOI 10.1007/9781441998 2. Wilson, J.P., Gallant, J.C., 2000. Terrain Analysis: Principles and Applications. John Wiley & Sons, Inc. Yang, Y., Fang, J., Tang, Y., Ji, C., Zheng, C., He, J., Zhu, B., 2008. Storage, patterns and controls of soil organic carbon in the Tibetan grasslands. Global Change Biol. 14, 1592–1599. Zhuang, L., Dai, H.H., 2006. Parameter optimization of kernel-based classifier on imbalance text learning. Pricai: 2006. Trends in Artificial Intelligence, Proceedings 4099, 434–443.