Predicting soil chemical composition and other soil parameters from field observations using a neural network



Computers and Electronics in Agriculture 82 (2012) 108–116

Contents lists available at SciVerse ScienceDirect

Computers and Electronics in Agriculture journal homepage: www.elsevier.com/locate/compag

Predicting soil chemical composition and other soil parameters from field observations using a neural network

M.J. Aitkenhead ⇑, M.C. Coull, W. Towers, G. Hudson, H.I.J. Black
The James Hutton Institute, Craigiebuckler, Aberdeen AB15 8QH, Scotland, UK

Article info

Article history: Received 18 July 2011; received in revised form 22 November 2011; accepted 18 December 2011

Keywords: National Soils Inventory of Scotland; Scotland; Neural networks; Ecosystem services

Abstract

Characterisation of soils in relation to a particular use or classification system often depends heavily on their chemical properties, which require detailed, time-consuming and often expensive laboratory analysis. If even partial knowledge of such parameters could instead be obtained from simple field observations, an observer in the field would be able to characterise the soil under investigation more effectively. Using data from the NSIS (National Soil Inventory of Scotland) database, we have produced a neural network model that predicts a wide range of soil chemical and physical properties with varying levels of accuracy. The model is supplied with field observation inputs that require only limited training and limited field equipment to determine. These inputs include colour, texture classification and site information (topography, climate and vegetation). Several model outputs are predicted with a high degree of accuracy, including organic matter content, Mg, Ca, Ni, total base saturation and pH, amongst others. We discuss the outputs that are predicted well and those that are not, in terms of their relationships to the model input parameters and their significance within the soil, and consider possible uses and limitations of this prediction system.

© 2012 Elsevier B.V. All rights reserved.

⇑ Corresponding author: M.J. Aitkenhead. Tel.: +44 (0)1224 395257.
0168-1699/$ - see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.compag.2011.12.013

1. Introduction

In order to know how suitable a soil is for any particular purpose, it is necessary to categorise it according to one or more classification systems. Some of these classification systems are relatively simple and directed towards specific functions, such as the critical load capacity for a particular chemical or potential addition (e.g. sewage sludge, manure or fertilizer; Sheppard et al., 2007). In some cases, an investigator may even be measuring a single parameter of the soil, such as bulk density, in which case the classification of the soil becomes irrelevant. At the other end of the spectrum, some classification systems are highly detailed and require a significant amount of information about the soil (e.g. the World Reference Base for soil resources, IUSS Working Group, 2006; USDA soil taxonomy, USDA, 1999). In most if not all cases where a specific purpose or character is being considered, the classification or evaluation of a soil requires information about its chemical composition and, usually, other parameters whose values must be determined by laboratory analysis. Experienced soil surveyors can often estimate the composition of soils to some degree simply by visual and physical examination. Organic matter content, iron content, saturation, and sand/silt/clay components can be estimated from colour and hand texturing. Other sensory clues, including overall horizon structure, vegetation and topography, smell (and even taste!) can provide information about the likely composition of a soil. The question, therefore, is whether the knowledge and expertise applied by an experienced field soil scientist can be mimicked, or possibly even improved upon, using an automated system.

Much recent work has been carried out to determine whether specific physical, chemical or biological parameters can be measured using detailed spectral analysis of soil samples. This work includes the use of visible wavelength light (Demattê et al., 2003; La et al., 2008; Sellitto et al., 2009), near-infrared and mid-infrared spectroscopy (Linker et al., 2005; Chen et al., 2009; Janik et al., 2009; Viscarra Rossel et al., 2009), and remote sensing using visible and infrared wavelengths and microwaves (Bishop et al., 2008; Chai et al., 2008; Furby et al., 2010). In many cases, specific chemical components have been evaluated, simplifying the task of relating spectral reflectance to the quantity of the chemical of interest, although there has also been work on the detection and quantification of multiple soil chemicals. This work is producing promising results, although it usually still relies on sample preparation, transport and laboratory analysis. Here, we attempt to determine the degree to which more basic field observations can be used to evaluate the chemical composition of soil, looking at a wide range of different elements and biophysical properties. We do not intend to make laboratory-based chemical analysis of samples redundant, as it is implausible to suggest that every chemical and biophysical parameter can be accurately evaluated from field observations. However, we do aim to show that, from a few simple observations and measurements, it is possible to produce an additional set of information about the studied soil system that would be of use not only to the field observer, but also to land managers and policymakers.

It is known that 'gross' soil factors such as texture are strongly related to more specific or subtle parameters within the soil, such as greenhouse gas production and emission (Dobbie and Smith, 2003; Lee et al., 2006; Gu and Riley, 2010), biological activity and productivity (Fogg et al., 2004; Cable et al., 2008; Barthès et al., 2008; Kooijman et al., 2009), organic matter dynamics (Plante et al., 2006; Matus et al., 2008; Grandy et al., 2009) and nutrient status (Ige et al., 2007). In many cases, several of these gross factors are known to have strong relationships with specific characteristics, particularly those relating to critical loads (Towers and Paterson, 1997; Kernan et al., 1998). There are also strong relationships between soil colour and chemical composition, particularly for specific components such as organic matter (Barouchas and Moustakas, 2004), manganese (Haidouti and Massas, 1998; Dowding and Fey, 2007) and iron (Davey et al., 1975; Barron and Torrent, 1986; Galvão and Vitorello, 1998). Colour is also related to less commonly considered elements and properties, including titanium (Krishna Murti and Satyanarayana, 1971; Galvão and Vitorello, 1998), calcium (Demattê et al., 2007), hydrogen (and therefore pH) (Dudley, 1976), cation exchange capacity (Demattê et al., 2007), sodium (Howari, 2003; Sidhu et al., 2004) and phosphorus and potassium (Mouazen et al., 2005). The main objective of this work is to demonstrate the effectiveness of neural network modelling in predicting soil parameters from field observations.
In order to achieve this, the work is split into several smaller components:
• Dataset development and preparation: converting the existing NSIS (National Soil Inventory of Scotland) data into a dataset that can be used to train and test a neural network.
• Neural network design and training: developing a neural network that can be trained with the soils dataset and then used to predict soil parameters for any site in Scotland.
• Testing and statistical evaluation of the trained neural network.

2. Soils data

The data used in this work are contained within the National Soils Inventory of Scotland (NSIS) dataset, a component of the Scottish Soils database. This dataset was developed at the Macaulay Land Use Research Institute (now part of the James Hutton Institute) in Scotland. The NSIS has been used extensively in soil characterisation and classification by researchers both at the Macaulay Institute (Towers, 1994; Towers and Paterson, 1997; Brown et al., 2008; Lilly and Matthews, 1994; Lilly et al., 2009) and elsewhere, both directly (Smith et al., 2007, 2009) and in the development of models (Boorman et al., 1995; Maréchal and Holman, 2005; Schneider et al., 2007; Van Huyssteen, 2008). The database contains a large amount of information from thousands of sites across Scotland, including site descriptions, physical soil parameterisation and chemical analyses. The consistency, scale and accuracy of this information make it ideal for developing a model that relates soil field observations to more detailed analytical parameters.

2.1. Scottish soils database

The Scottish Soil database is one of the most detailed and systematic collections of national soil data in Europe. From its inception in 1948 to the end of systematic survey in 1988, the Soil Survey of Scotland produced a range of digitised and paper maps at a number of scales, from full national coverage at 1:250,000 to more local surveys at scales of 1:10,560 or larger. In addition, a comprehensive database was developed that currently contains information on over 13,000 geo-referenced soil profiles. In recent years, archived soil maps have been digitised and more soil profiles have been added to the database, although the main emphasis of current work is on making the existing data easier to use for a variety of purposes. Further work by Lilly et al. (2004) is directed towards this goal through the following objectives:
• Consult end-users to identify requirements for soil data now and in the future, and establish the feasibility of satisfying these requirements.
• Rationalise and update the many existing datasets, methods and metadata within a common framework.
• Extend the current database using existing data, while checking for and correcting errors.
• Assess the options for new measurements on archived soil samples and limited resampling of key soils.
• Coordinate these activities with those being carried out at the National Soil Resources Institute at Cranfield University, in order to facilitate the provision of UK data.

The Soil database in something close to its current form was initiated at the Macaulay Institute in 1975, with several studies and surveys resulting in a completed database in 1987 (Brown et al., 1987). This database contained soil profile descriptions and chemical and physical analyses for several thousand sample points on a grid across the country, and includes mineralogical and spectrochemical trace element analyses for a number of these sample points. The data are stored digitally in multiple subsets and can be accessed using the Oracle database system.
Since 1987, further datasets have been added to the original, and the database is continuously updated and restructured to improve data content and access.

2.2. National Soils Inventory of Scotland

The National Soils Inventory of Scotland (NSIS) is an objective sample of Scottish soils. Soil and site conditions at 3094 locations throughout Scotland were sampled on a 5 km grid across the entire country, aligned with the National Grid of Great Britain. Samples were taken at multiple depths from soil pits and analysed to determine their physical and chemical properties. The NSIS dataset is one of the most significant contributions to the Scottish Soil database in terms of data quality and quantity. The data provide good estimates of means and regional variations in a range of soil properties and attributes. They inform the soil classification for Scotland and are used with soil map information to estimate the regional variation in soil properties. The soils inventory data are stored in several tables within an Oracle database; those used for this work are described briefly below:
• Basic – site description information, including grid reference (in the UK National Grid coordinate system), topography (slope, elevation and aspect), vegetation type, soil classification according to the Scottish soil classification system, and climate type. There are also additional site description details regarding the presence of stones, type of topography, sample date and the identity of the person carrying out the site description, but these were not used in this study. Any columns that were considered unnecessary for this work, or that were largely missing data entries, were not used. The six parameters included in this work were: grid reference (to allow linking with data from other tables), slope, altitude, vegetation type (five categories – arable, forest, grassland, heath and wetland), dominant soil type (five categories – alluvial, calcareous, brown earth, gley and podzol) and climate type (two categories – low to moderately humid, and very humid to perhumid, using the climatic classification of Thornthwaite (1948)).
• Mineral – a detailed description of mineral soil horizons within the NSIS. These data include grid reference, horizon symbol, and depth to the top of the horizon and of the sample location. Munsell values for the colour of the soil matrix, secondary material and mottles are given. There is also a large amount of information about the presence or absence of diagnostic features such as concretions and stone size fractions, but these were not included in the analysis. Four parameters from the Mineral table were included in this study: horizon symbol and grid reference (to allow linking with other tables), sampling depth and Munsell colour.
• Organic – this table was similar to the Mineral table, but contained some information relevant only to organic soil horizons, such as the structure of the organic material, its condition in the field when moist, and the character of the boundary with a mineral horizon (if present). Other than these parameters relating specifically to organic soils, the same parameters were extracted from this table as from the Mineral table.
• Analytical – as the name suggests, this table mostly contained data acquired by analytical methods, including chemical composition, pH, precise sand/silt/clay fractions and organic matter content. Most of these parameters were kept, with some that were largely missing or unknown being excluded from the dataset. The laboratory sample number and horizon class designation were also included, to allow linking with other tables.
• Trace – this table mostly contains details of trace element composition, with a total of 31 elemental concentration parameters, each defined as either extractable or exchangeable. Eight parameters were not used because there were insufficient measurements of them. The following trace elements were included in this study: H, Na, Mg, P, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, Sr, Mo, Ba and Pb. The elements C and N were already included from the other tables, and Ca, H, K, Mg, Na and P, together with base saturation and total exchangeable bases, were also present in other tables and duplicated here.

3. Methods

3.1. Data preparation

Each of the NSIS tables described above was exported to Microsoft Excel and examined for missing entries. Columns (variables) for which a significant number of entries were missing (e.g. more than half, although this threshold was not fixed and depended on the variable in question) were removed from the dataset. Efforts were made to retain variables that were considered difficult to estimate from the other included data, or that are considered important for soil/plant processes; for example, if more than half of the entries for calcium or lead had been missing, the variable would have been retained anyway (although this was not in fact the case for these elements). An example of a variable that was dropped is stone size fraction: over half of its entries were complete, but including it would have reduced the overall training dataset size by almost 30%. For variables that were missing a smaller proportion of their entries, the affected data points were removed. The mineral and organic horizon datasets were merged after removing variables that occurred in only one or the other. This resulted in four spreadsheets, each with several thousand entries, which shared some common variables. Some of these shared parameters, for example pH or sand/silt/clay content, were removed so that only one instance of each remained, in one of the tables. Parameters required for linking entries from different tables were kept; these included grid reference, soil horizon designation and laboratory sample identification number.

The Basic and Mineral/Organic tables were linked using the grid reference values common to both tables, producing a new table called Combined_1. This table contained more data points than the Basic table, as more than one sample was taken at each grid point when the NSIS was developed; these multiple samples corresponded to different horizons within the profiles. Because the Basic table contained information common to all samples taken from a particular location (i.e. altitude, climate code, soil code, slope and vegetation type), a one-to-many linking approach was possible. The parameters in Combined_1 are given in Table 1. Combined_1 was then linked to the Analytical table using both grid reference and horizon symbol; both were necessary because samples from several horizons had been analysed at each site location. The new table, Combined_2, contained all of the information in Combined_1 for each sample, together with specific information about the horizon in question (sand/silt/clay proportions, top and bottom depths of the horizon, pH, loss on ignition and carbon), the laboratory sample identification number, and some chemical analysis data (Ca, H, K, Mg, Na, N, P, base saturation (Satn) and the sum of exchangeable bases (Sum)).
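The cleaning and linking steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow (which used Excel exports of the Oracle tables): rows are modelled as dictionaries with `None` marking a missing entry, and the function names, field names, grid references and the fixed 0.5 missing-fraction threshold are hypothetical (the text notes that the threshold was not fixed). `link_tables` performs the inner join used for both Combined_1 (key: grid reference) and Combined_2 (keys: grid reference and horizon symbol), assuming the lookup table has unique key values.

```python
"""Sketch of the table cleaning and linking steps (illustrative, hypothetical names)."""


def drop_sparse_columns(rows, max_missing=0.5, keep=()):
    """Drop columns whose fraction of missing (None) entries exceeds `max_missing`,
    except those listed in `keep` (variables judged too important to lose)."""
    columns = {key for row in rows for key in row}
    retained = [
        col for col in sorted(columns)
        if col in keep
        or sum(row.get(col) is None for row in rows) / len(rows) <= max_missing
    ]
    return [{col: row.get(col) for col in retained} for row in rows]


def drop_incomplete_rows(rows):
    """Remove rows that still contain a missing entry after column filtering."""
    return [row for row in rows if all(value is not None for value in row.values())]


def link_tables(lookup, rows, keys):
    """Attach the matching `lookup` record to every row sharing the same key values.
    One lookup record may match many rows (one-to-many, e.g. one site, many horizons);
    rows with no match (e.g. samples never analysed) are dropped."""
    index = {tuple(rec[k] for k in keys): rec for rec in lookup}
    linked = []
    for row in rows:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            linked.append({**match, **row})
    return linked


if __name__ == "__main__":
    # Hypothetical Basic (site) and Mineral (horizon) records.
    basic = [{"grid_ref": "NJ905060", "altitude": 120},
             {"grid_ref": "NO120340", "altitude": 450}]
    horizons = [
        {"grid_ref": "NJ905060", "horizon": "Ap", "depth": 10, "stones": None},
        {"grid_ref": "NJ905060", "horizon": "Bs", "depth": 40, "stones": None},
        {"grid_ref": "NO120340", "horizon": "Bg", "depth": None, "stones": 5},
    ]
    clean = drop_incomplete_rows(drop_sparse_columns(horizons))
    combined_1 = link_tables(basic, clean, keys=("grid_ref",))
    for row in combined_1:
        print(row["grid_ref"], row["horizon"], row["altitude"])
```

In the example, the sparse `stones` column is dropped, the horizon with a missing depth is then removed, and both remaining horizons inherit their site's altitude through the one-to-many link.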
Not all of the entries in Combined_1 were used, as some of the samples had not been analysed; nevertheless, a total of 5982 data points were obtained. The final table, Combined_3, contained all of the parameters in Combined_2 and the Trace table, and was created by linking these tables through the laboratory sample reference numbers in each. The Trace table added another 31 parameters, producing a total of 54 parameters. Only a limited number of entries in Combined_2 shared a laboratory reference number with the Trace table, giving a total of 101 data entries. While this is far fewer data points than would have been desired, it is still sufficient to provide training and testing datasets. Examination of Combined_3 showed that the 101 data points were distributed widely across Scotland and covered a range of topographic, climatic, soil and compositional types.

Table 1 Information given in table Combined_1 (generic environmental and soil information).
Grid reference
Horizon symbol
Altitude
Climate code
Vegetation code
Soil code
Munsell colour code
Slope
Sampling depth

To produce a training dataset that could be used by a neural network expert system, each input must be a continuous variable. Parameters given by type must therefore be split into several inputs, each associated with one of the types and given a value of 1 or 0 depending on whether that type was present or absent. Adding these 1/0 values for vegetation, climate, soil type and textural class expanded the table to 66 parameters. Finally, the Munsell colour codes were converted to RGB (red, green and blue) values, to provide a consistent set of continuous input values. This increased the total number of parameters to 68. The final dataset was split into parameters identified as either 'input' or 'output' types, depending on whether they could be expected to be observed easily in the field or would have to be determined through laboratory-based analysis. Statistical relationships between input parameters (for example between elevation and climate type) are acknowledged to exist, but we have not considered the effects of correlations between parameters in training the neural network. The input parameters are given in Table 2. Output parameters included sand/silt/clay and organic matter, 38 elemental concentrations (including both exchangeable and extractable values for some elements), and some additional characteristics such as ash content, pH, base cation status and soil type. Each input parameter was scaled to the range [0, 1], so that the inputs to the neural network were not biased towards parameters with larger numerical values. Each output value was scaled to the range [0.1, 0.9], to allow the output node activations to extend across the full range of expected values.
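A minimal sketch of the input encoding and scaling described above, assuming min-max scaling (the text gives the target ranges but not the exact formula). The input vector follows Table 2; soil type, although one-hot encoded in the dataset, is an output and so is left out. Vegetation and climate categories follow Section 2.2, while the four textural class labels, the elevation/slope/depth ranges and the pre-converted RGB triple are hypothetical placeholders, since the Munsell-to-RGB conversion method is not specified. `unscale_output` maps an output activation back to the original units; with the pH range used in Section 3.3 it reproduces Eq. (2).

```python
"""Input encoding and scaling sketch (illustrative, not the authors' code)."""

VEGETATION = ("arable", "forest", "grassland", "heath", "wetland")
CLIMATE = ("low_to_moderately_humid", "very_humid_to_perhumid")
# Hypothetical labels: the paper gives only the number of textural types (4).
TEXTURE = ("texture_a", "texture_b", "texture_c", "texture_d")


def one_hot(value, categories):
    """0/1 indicator list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def scale_input(x, lo, hi):
    """Min-max scale a continuous input onto [0, 1]."""
    return (x - lo) / (hi - lo)


def scale_output(y, lo, hi):
    """Map an observed output value onto [0.1, 0.9] for use as a training target."""
    return 0.1 + 0.8 * (y - lo) / (hi - lo)


def unscale_output(x, lo, hi):
    """Inverse of scale_output; with lo=3.63 and hi=7.20 this is Eq. (2) for pH."""
    return (hi - lo) * (1.25 * (x - 0.1)) + lo


def encode_inputs(record, ranges):
    """Build the 17 network inputs listed in Table 2 for one sample."""
    vec = [scale_input(record[name], *ranges[name])
           for name in ("elevation", "slope", "depth")]    # 3 continuous values
    vec += one_hot(record["climate"], CLIMATE)             # 2 values
    vec += one_hot(record["vegetation"], VEGETATION)       # 5 values
    vec += one_hot(record["texture"], TEXTURE)             # 4 values
    # Colour: Munsell code assumed already converted to an 8-bit RGB triple.
    vec += [channel / 255.0 for channel in record["rgb"]]  # 3 values
    return vec


if __name__ == "__main__":
    # Hypothetical scaling ranges and sample record.
    ranges = {"elevation": (0.0, 1000.0), "slope": (0.0, 45.0), "depth": (0.0, 120.0)}
    sample = {"elevation": 250.0, "slope": 9.0, "depth": 30.0,
              "climate": "very_humid_to_perhumid", "vegetation": "heath",
              "texture": "texture_b", "rgb": (102, 76, 51)}
    x = encode_inputs(sample, ranges)
    print(len(x))
    print(round(unscale_output(0.5, 3.63, 7.20), 3))
```

Keeping `scale_output` and `unscale_output` as an explicit pair makes it easy to check that a network activation of 0.1 or 0.9 maps back to the minimum or maximum observed value of each parameter.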

3.2. Neural network architecture and training

The main challenge, in addition to acquiring sufficient field and analytical data to model the relationships between them, is to produce a modelling system that can handle a large number of input and output parameters. Many statistical and expert system approaches can handle complex mathematical transformations of this nature, but one that is relatively easy to implement, reliable and popular for this type of work is the artificial neural network (ANN). ANNs have been used extensively in environmental modelling where large, noisy datasets must be modelled, and their use in modelling soil characteristics is well demonstrated (Abbaspour and Baramakeh, 2006; Aitkenhead et al., 2007; Elshorbagy and Parasuraman, 2008). Here we demonstrate the value of using a neural network model to relate field observations to physical and chemical characteristics for a wide range of soil types distributed across a complex landscape (see Fig. 1), and show how strongly some of the input parameters affect specific model outputs.

The neural network used to create the model was a fully connected feedforward multilayer perceptron with an input layer, an output layer and two hidden layers. A single neural network was used to predict all output parameters, rather than a separate network for each output. The number of input nodes was equal to the number of input parameters in the model (17), and the number of output nodes was equal to the number of output parameters (51). Each hidden layer had 100 nodes, close to the number given by Kolmogorov's theorem for neural network modelling (which states that the number of nodes in the hidden layers should be twice the maximum in the input or output layers in order to guarantee a perfect fit of any continuous function (Bishop, 1995)).
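The architecture above, together with the backpropagation training and checkpoint search described in the rest of this section, can be sketched with the Python standard library. This is an illustrative reimplementation under stated assumptions, not the authors' code: biases are omitted, the error is squared error, and one "training step" is taken to be an update on a single randomly drawn training sample. The defaults α = 0.05 and β = 0.25 and the idea of checking test error at fixed step intervals across repeated random restarts come from the text; the function names are hypothetical. With the paper's layer sizes (17, 100, 100, 51) pure Python is slow, so the usage example shrinks the hidden layers and uses random stand-in data.

```python
"""Two-hidden-layer logistic perceptron with online backpropagation and a
checkpoint search for the best number of training steps (illustrative sketch)."""
import math
import random


def logistic(x, beta):
    """Logistic activation with gain beta, as in Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-beta * x))


def init_network(sizes, rng):
    """One [n_out][n_in] weight matrix per layer pair, weights uniform in [-1, 1].
    Biases are omitted (an assumption)."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(net, x, beta):
    """Activations of every layer, input layer first."""
    acts = [list(x)]
    for w in net:
        prev = acts[-1]
        acts.append([logistic(sum(wi * a for wi, a in zip(row, prev)), beta) for row in w])
    return acts


def train_step(net, x, target, alpha, beta):
    """One gradient-descent update of all weights on the squared error of one sample."""
    acts = forward(net, x, beta)
    # Output error signal; the logistic derivative with gain beta is beta*y*(1-y).
    delta = [(y - t) * beta * y * (1.0 - y) for y, t in zip(acts[-1], target)]
    for layer in range(len(net) - 1, -1, -1):
        w, prev = net[layer], acts[layer]
        # Propagate the error to the layer below before these weights change.
        back = [sum(w[j][i] * delta[j] for j in range(len(w))) for i in range(len(prev))]
        for j, row in enumerate(w):
            for i in range(len(row)):
                row[i] -= alpha * delta[j] * prev[i]
        delta = [b * beta * a * (1.0 - a) for b, a in zip(back, prev)]


def mse(net, data, beta):
    """Mean squared error over all outputs of all samples in `data`."""
    errs = [(y - t) ** 2 for x, ts in data for y, t in zip(forward(net, x, beta)[-1], ts)]
    return sum(errs) / len(errs)


def best_step_count(sizes, train, test, total_steps, check_every, runs,
                    alpha=0.05, beta=0.25, seed=0):
    """Train `runs` networks from random weights, record test error every
    `check_every` steps, and return the step count with the lowest summed
    (equivalently, mean) test error across runs."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in range(check_every, total_steps + 1, check_every)}
    for _ in range(runs):
        net = init_network(sizes, rng)
        for step in range(1, total_steps + 1):
            x, t = rng.choice(train)  # one step = one randomly drawn sample (assumption)
            train_step(net, x, t, alpha, beta)
            if step in totals:
                totals[step] += mse(net, test, beta)
    return min(totals, key=totals.get)


if __name__ == "__main__":
    rng = random.Random(1)
    # Random stand-in data with 17 inputs and 51 outputs; hidden layers shrunk for speed.
    data = [([rng.random() for _ in range(17)], [rng.uniform(0.1, 0.9) for _ in range(51)])
            for _ in range(10)]
    train, test = data[:7], data[7:]
    print(best_step_count((17, 8, 8, 51), train, test,
                          total_steps=400, check_every=100, runs=2))
```

After `best_step_count` returns, a final network would be trained from fresh random weights for that many steps, mirroring the retraining step in the text; in practice a vectorised numerical library would replace the nested loops.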
While there is some debate over whether one or two hidden layers is the better design for a feedforward neural network, on the grounds that two may lead to overfitting of the data, we found that this was not the case here; indeed, using two hidden layers improved the performance of the network. The neural network was trained using the backpropagation gradient descent algorithm, with a log-sigmoid (logistic) node activation function for each layer. The activation function converts the value X presented to each node in the hidden or output layers into an activation value Y; the logistic function used is similar to the function that converts inputs to outputs in the human nervous system. The logistic function is given in Eq. (1):

$$Y = \frac{1}{1 + e^{-\beta X}} \qquad (1)$$

where β controls the gradient of the logistic function (useful when a node integrates a large number of values, as here, where each node in the second hidden layer receives values from 100 nodes in the first hidden layer). The values presented at the input nodes are integrated at each node in the first hidden layer (in proportion to the connection strengths between the inputs and the first hidden layer nodes), which sends the converted activation values on to the second hidden layer, and thence to the output layer. Initially the network's connection weights are randomised within the range [−1, 1], with the connection weight between two nodes acting as a multiplier of the activation being sent between them. The output node activations are compared against the known 'correct' values, and the resulting errors are used to propagate adjustments to the connection weights backwards through the network (hence 'backpropagation'). In addition to the parameter β, a learning rate α is required, which controls how strongly connection weights are adjusted in proportion to the errors that they have communicated. The value of α normally lies between 0.001 and 0.1 (too low and the network learns too slowly; too high and the learning from one training step tends to overwrite that of previous steps), and β is normally less than 1. For further details on the mathematics of the backpropagation neural network algorithm and its implementation, the reader is directed to Rumelhart et al. (1986) or Aitkenhead et al. (2003).

The 101-point dataset described above was split into training and testing components. The training dataset contained 70 entries selected at random from the Combined_3 table, and the testing dataset was composed of the remaining 31 data points. Examination of the distribution of parameter values within the training and testing datasets showed that both spanned the range of values found in the combined dataset, and that neither was biased by geographical location or any other characteristic. The neural network was trained 100 times, each time from random initial weights and for 100,000 training steps, using a learning rate α of 0.05 and a β of 0.25. Every 1000 training steps, the network's performance was evaluated against the testing dataset, to prevent overtraining. The step count at which mean performance was highest was taken as the optimal training step count, and the network was retrained with this number of training steps; this optimum was found at 12,000 steps. The same procedure was carried out with one hidden layer: the optimal number of training steps was similar (14,000 instead of 12,000), but the overall performance on the test dataset was worse (Table 3). This does not imply over-fitting of the data by the two-layer network, as the comparison was made against test data not used for training; over-fitting caused by having two hidden layers instead of one would instead reduce accuracy on the test data.

Table 2 Input parameters for the neural network model.
Elevation (1 value)
Climate (2 values)
Textural type (4 values)
Slope (1 value)
Vegetation (5 values)
Sample depth (1 value)
Colour (3 values)

3.3. Validation

Testing of the trained neural network model was carried out by presenting it with each of the 31 test data entries as an 'unknown', using the input values to activate the network and comparing the resulting output values with the actual values for those data entries.

Fig. 1. Illustration of the variability of soil type across Scotland (seven main categories).

Table 3 Statistical evaluation of neural networks with one and two hidden layers, for training and testing stages. RMSE – Root Mean Squared Error; MAE – Mean Absolute Error; MBE – Mean Bias Error.

                               RMSE     MAE      MBE
Training (1 hidden layer)      0.0724   0.0647   0.0128
Testing (1 hidden layer)       0.0973   0.0896   0.0113
Training (2 hidden layers)     0.0561   0.0504   0.0151
Testing (2 hidden layers)      0.0748   0.0689   0.0095

In order to carry out this comparison, the neural network output values were transformed back to the scale of the parameter in question. For example, for observed pH values between 3.63 and 7.20, the output values from the corresponding node (which lie in the range [0.1, 0.9]) were subjected to the following transformation:

$$Y = \left((7.20 - 3.63) \times \left(1.25 \times (X - 0.1)\right)\right) + 3.63 \qquad (2)$$

where X is the value given by the neural network and Y is the corresponding value on the pH scale. A table of predicted against observed values for each of the 51 parameters was compiled to allow statistical evaluation of the model, with all parameters normalised to the range [0, 1]. For each parameter, the mean proportional error (predicted value minus observed value, divided by the parameter range) was calculated, as was the r² value of the regression between predicted and observed values. Parameters with an r² value greater than 0.5 were considered to have been modelled 'well', while those with r² below 0.5 were considered less so. This threshold of 0.5 is lower than that normally applied when modelling parameters with neural networks (0.8 is a more common value); however, given that we are attempting to predict chemical composition using nothing but field observations, lower expectations are necessary.

3.4. Model analysis

An often-stated complaint about the use of neural networks for modelling parameters in complex systems (such as the one studied here) is that they are effectively 'black boxes'. In other words, they may give good results, but it is difficult to understand how these results have been achieved, and almost impossible to pick apart the network to determine the important relationships that may have been identified and modelled within its structure. In recent years, however, this position has been challenged by new methods of analysing and 'decomposing' trained neural networks to evaluate the relationships that they encapsulate. One particularly effective answer to the 'black box' criticism is a partial derivatives analysis of the neural network's input/output transformations. This can be achieved in many ways, but the most successful is the Connection Weights approach (Olden and Jackson, 2002; Olden et al., 2004). Here we use a simplified version of this approach, in which we calculate the sum of all connection weights along every possible 'route' from each input node to each output node. The resulting sum represents the significance of each input for each output, from which we can determine which inputs are more or less important within the network as a whole and in modelling specific parameters.
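The connection-weights idea can be illustrated with a short sketch. It assumes the form given by Olden and Jackson (2002), in which each route's contribution is the product of the weights along it and the contributions are summed over all routes; the authors describe a simplified variant, so this may differ in detail from what was used here. Ranking inputs by absolute value is also an assumption, and the function names are hypothetical. With weight matrices stored as [n_out][n_in], the sum over routes reduces to a chain of matrix products, yielding one value per input/output pair (17 × 51 for the model in this paper).

```python
"""Sketch of connection-weight input importance (after Olden and Jackson, 2002)."""


def connection_weight_importance(net):
    """Return an [n_inputs][n_outputs] table: for each input/output pair, the sum
    over every route through the hidden layers of the product of its weights.
    `net` is a list of weight matrices, each stored as [n_out][n_in]."""
    n_inputs = len(net[0][0])
    # Start from the identity: each input has one trivial route to itself.
    imp = [[1.0 if i == j else 0.0 for j in range(n_inputs)] for i in range(n_inputs)]
    for w in net:
        # Extend every route by one layer: new[i][o] = sum_h imp[i][h] * w[o][h].
        imp = [[sum(r[h] * w[o][h] for h in range(len(r))) for o in range(len(w))]
               for r in imp]
    return imp


def top_inputs(importance, output_index, k=3):
    """Indices of the k inputs with the largest absolute importance for one output."""
    scores = [abs(row[output_index]) for row in importance]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output.
    net = [[[1.0, 2.0], [3.0, 4.0]],  # input -> hidden, [n_hidden][n_inputs]
           [[5.0, 6.0]]]              # hidden -> output, [n_outputs][n_hidden]
    imp = connection_weight_importance(net)
    print(imp)  # input 0: 1*5 + 3*6 = 23; input 1: 2*5 + 4*6 = 34
    print(top_inputs(imp, 0, k=1))
```

For the trained model, calling `top_inputs(imp, j)` for each well-modelled output `j` would give the "three most significant inputs" per parameter.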
This partial derivatives analysis was carried out on the trained neural network model, resulting in a total of 17 × 51 = 867 partial derivative values, more than could be presented or discussed here. Instead, the three most significant inputs were identified for each parameter that was modelled with an r² value greater than 0.5.

4. Results

4.1. Model accuracy

Table 4 gives the prediction accuracy for each of the model output variables, using the r² values (the square of the Pearson product-moment correlation coefficient) and p-values obtained by comparing the actual and predicted values in the test dataset. The statistical evaluation was carried out using Statistica 8 (Statsoft, 2008). Predictions were made for 31 test samples in each case. Not included in

Table 4 is the prediction accuracy for soil group, for which there were five classes and which was predicted correctly for 27 out of 31 (87%) of the test data points. As expected, the prediction accuracy for sand, silt and clay is high, which is consistent with the model having been given the soil texture class as an input. The r² value for LOI (Loss on Ignition) was also expected to be high, as organic matter content and soil colour are known to be closely related. Magnesium, calcium, pH and manganese are also known to be related to soil colour, and the r² values in each case are high. The high values for nickel, chromium and several other output parameters were not expected, however. It is also important to note that the prediction accuracies for total exchangeable bases and base saturation are relatively high, showing that these important parameters can be predicted with a fair degree of confidence. However, as total exchangeable bases is related to the major bases Mg and Ca (r² > 0.7) and Na (r² = 0.422), the accuracy of predicting this parameter is perhaps not very surprising. Assuming a significance threshold of 0.01, only two of the 30 parameters (Ba and K) had p-values above this threshold. In addition to r² values and p-values, the statistical analysis provided the ratio between the means of the actual and predicted values for each parameter. Table 5 gives these ratios, providing a further indication of how well each parameter is predicted. The parameters are listed in Table 5 in the same order as in Table 4, and it is instructive to note that for the 18 out of 30 parameters for which the ratio between actual and predicted means falls between 0.8 and 1.2 (taken to show good prediction by this measure), the mean r² value was 0.554.
For the 12 parameters with ratios outside this range (all of which were below 0.8), the mean r² was 0.359. A total of 11 parameters were categorised as 'well predicted' under both criteria, having r² values greater than 0.5 and ratios between actual and predicted means of between 0.8 and 1.2.

4.2. Model analysis

Identification of the three most important input parameters for each of the outputs predicted with an r² value greater than 0.5 gave some interesting and unexpected results (see Table 6). We found that the most important inputs varied from one output to another, and that no single input is markedly more or less important than the others. Each input was found to be important for at least one, and often several, outputs, with the sole exception of the low humidity climate class. As there were only two climate classes, the absence of one equates to the presence of the other, and we therefore reason that knowledge of the climate class still matters to the network. It is beyond the scope of this work to

Table 4
Statistical evaluation of the prediction accuracy of the trained neural network model, applied to the test dataset. The statistics given are the r² values from Pearson correlation between actual and predicted values, and the corresponding p-values for the significance of each correlation.

Param   r²     p-value
Clay    0.890  0.000
Mg      0.782  0.000
Sand    0.763  0.000
Ca      0.716  0.000
Ni      0.710  0.000
LOI     0.688  0.000
pH      0.654  0.000
Cr      0.641  0.000
Mn      0.625  0.000
Silt    0.613  0.000
Satn    0.560  0.000
Sum     0.542  0.000
N       0.538  0.000
Ti      0.491  0.000
Co      0.481  0.000
C       0.470  0.000
Na      0.422  0.000
Cu      0.422  0.000
Ga      0.413  0.000
H       0.403  0.000
Ash     0.352  0.000
P       0.317  0.001
Zn      0.271  0.003
Pb      0.243  0.005
Fe      0.243  0.005
V       0.234  0.006
Mo      0.225  0.007
Sr      0.213  0.009
Ba      0.183  0.016
K       0.169  0.022

LOI – Loss on Ignition. Satn – base saturation. Sum – total exchangeable bases.
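Each r² in Table 4 is the squared Pearson correlation between actual and predicted values for the 31 test samples. A minimal pure-Python sketch of that calculation is given below, together with a check of which correlations are significant at the 0.01 level using the standard t-test for a correlation coefficient, t = r·√((n − 2)/(1 − r²)) with n − 2 = 29 degrees of freedom. The critical value 2.756 is the tabulated two-tailed 1% point of Student's t for 29 degrees of freedom, hard-coded rather than computed; this is an independent re-check, and the exact procedure used in Statistica may differ.

```python
import math


def r_squared(actual, predicted):
    """Square of the Pearson product-moment correlation coefficient."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def t_statistic(r2, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for testing a correlation against zero."""
    return math.sqrt(r2) * math.sqrt((n - 2) / (1.0 - r2))


N_TEST = 31          # test samples per parameter
T_CRIT_DF29 = 2.756  # two-tailed 1% critical value of Student's t, 29 df (standard tables)

# r² values reported in Table 4
TABLE4_R2 = {
    "Clay": 0.890, "Mg": 0.782, "Sand": 0.763, "Ca": 0.716, "Ni": 0.710,
    "LOI": 0.688, "pH": 0.654, "Cr": 0.641, "Mn": 0.625, "Silt": 0.613,
    "Satn": 0.560, "Sum": 0.542, "N": 0.538, "Ti": 0.491, "Co": 0.481,
    "C": 0.470, "Na": 0.422, "Cu": 0.422, "Ga": 0.413, "H": 0.403,
    "Ash": 0.352, "P": 0.317, "Zn": 0.271, "Pb": 0.243, "Fe": 0.243,
    "V": 0.234, "Mo": 0.225, "Sr": 0.213, "Ba": 0.183, "K": 0.169,
}

# Parameters whose correlation does not reach significance at the 0.01 level
not_significant = sorted(
    name for name, r2 in TABLE4_R2.items() if t_statistic(r2, N_TEST) < T_CRIT_DF29
)
print(not_significant)  # ['Ba', 'K']
```

This agrees with the statement in Section 4.1 that only Ba and K have p-values above 0.01; Sr (r² = 0.213) lies just inside the threshold, with t ≈ 2.80.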


Table 5
Ratios between actual and predicted mean values for each parameter in the test dataset.

Param   Ratio
Clay    0.972
Mg      0.838
Sand    1.045
Ca      0.724
Ni      0.872
LOI     0.900
pH      1.062
Cr      0.988
Mn      0.730
Silt    0.949
Satn    1.002
Sum     0.905
N       0.925
Ti      1.039
Co      1.033
C       0.697
Na      0.773
Cu      0.972
Ga      1.042
H       0.675
Ash     1.087
P       0.281
Zn      0.527
Pb      0.851
Fe      0.794
V       0.450
Mo      0.575
Sr      0.508
Ba      0.961
K       0.568

LOI – Loss on Ignition. Satn – base saturation. Sum – total exchangeable bases.
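The two screening criteria used in Section 4.1 (r² > 0.5 from Table 4, and a ratio between actual and predicted means of 0.8 to 1.2 from Table 5) are simple enough to re-check directly from the published values. The following minimal Python sketch transcribes both tables and recomputes the reported counts and mean r² values; the names `RESULTS` and `summarise` are illustrative only.

```python
# (r² from Table 4, ratio between actual and predicted means from Table 5)
RESULTS = {
    "Clay": (0.890, 0.972), "Mg": (0.782, 0.838), "Sand": (0.763, 1.045),
    "Ca": (0.716, 0.724), "Ni": (0.710, 0.872), "LOI": (0.688, 0.900),
    "pH": (0.654, 1.062), "Cr": (0.641, 0.988), "Mn": (0.625, 0.730),
    "Silt": (0.613, 0.949), "Satn": (0.560, 1.002), "Sum": (0.542, 0.905),
    "N": (0.538, 0.925), "Ti": (0.491, 1.039), "Co": (0.481, 1.033),
    "C": (0.470, 0.697), "Na": (0.422, 0.773), "Cu": (0.422, 0.972),
    "Ga": (0.413, 1.042), "H": (0.403, 0.675), "Ash": (0.352, 1.087),
    "P": (0.317, 0.281), "Zn": (0.271, 0.527), "Pb": (0.243, 0.851),
    "Fe": (0.243, 0.794), "V": (0.234, 0.450), "Mo": (0.225, 0.575),
    "Sr": (0.213, 0.508), "Ba": (0.183, 0.961), "K": (0.169, 0.568),
}


def summarise(results, r2_min=0.5, ratio_low=0.8, ratio_high=1.2):
    """Split parameters by the ratio criterion, then apply both criteria together."""
    def in_band(ratio):
        return ratio_low <= ratio <= ratio_high

    inside = [r2 for r2, ratio in results.values() if in_band(ratio)]
    outside = [r2 for r2, ratio in results.values() if not in_band(ratio)]
    well_predicted = sorted(
        name for name, (r2, ratio) in results.items() if r2 > r2_min and in_band(ratio)
    )
    return {
        "n_inside": len(inside),
        "mean_r2_inside": round(sum(inside) / len(inside), 3),
        "n_outside": len(outside),
        "mean_r2_outside": round(sum(outside) / len(outside), 3),
        "well_predicted": well_predicted,
    }


summary = summarise(RESULTS)
print(summary["n_inside"], summary["mean_r2_inside"])    # 18 0.554
print(summary["n_outside"], summary["mean_r2_outside"])  # 12 0.359
print(len(summary["well_predicted"]))                    # 11
```

Ca and Mn are the two parameters that pass the r² criterion but fail the ratio criterion, which is why 13 parameters exceed r² = 0.5 but only 11 satisfy both.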

attempt to explain all of the relationships quantified by the 'strong' input–output importance values, and we have not done so, although some specific examples are covered in Section 5. Examining Table 6 in detail, we see that for sand, silt and clay the textural class inputs tend to dominate, as would be expected. For the chemistry-related outputs, texture becomes less important and there is a mix of topography, climate and vegetation in terms of significance. pH and base saturation are both strongly influenced by vegetation type, while for loss on ignition (related to organic matter content), colour and depth are important. This would seem to indicate that organic matter levels can be high or low under a wide range of environmental conditions, and that direct observation is required to determine the OM status of the soil.

5. Discussion

We have demonstrated a novel ability to predict and quantify soil parameters that often require complex and expensive laboratory work to determine. The novelty of the work lies in predicting a number of physical and chemical parameters from simple field observations that require only a small amount of training to carry out. While we do not think that modelling these soil parameters from field observations can (or should) replace laboratory analysis, we do feel that it is a significant and potentially very useful approach that can allow rapid 'field estimation' of soil characteristics by land managers. Many researchers have shown that relationships exist between environmental factors and soil or plant nutrient characteristics (see Zhang et al., 2011; Han et al., 2011; Shahandeh et al., 2011 for recent examples). Awareness of these relationships goes back at least as far as the work of Hans Jenny (Jenny, 1941). However, few have used these relationships to develop predictive models. Examples include Wills et al. (2007), who used soil colour to predict log-normalised values of soil organic carbon and achieved slightly better results than those achieved here for the closely related LOI (Loss on Ignition), and Majumdar et al. (2008), who developed a multivariate predictive model of N, C and PO₄ using environmental and socioeconomic (land use) input factors. That work produced relatively high predictive accuracy, but covered a relatively small area with less spatial variability than is seen across the whole of Scotland. The work carried out here is important not only because it provides an indirect, rapid and relatively cheap method of determining many soil parameters, but also because it has the potential to reveal previously unknown relationships within the soil system. Some of the inputs found to be important for specific outputs were not surprising, such as the influence of soil textural class on sand, silt and clay, but there are many other strong relationships between topographic, climatic and vegetation inputs and specific soil chemistry components that may reveal hidden mechanisms and processes at work. The strong relationships between elevation and pH, calcium and nitrogen, for example, may indicate impacts to be expected from climate change: as elevation and temperature are strongly linked, a change in one may allow us to simulate a change in the other.
Examples of strong relationships discovered here between input and output parameters, and possible explanations for them, include:

• SCL (Sandy Clay Loam), S (Sand), LS (Loamy Sand) and SL (Sandy Loam) inputs and Clay, Sand and Silt outputs: The presence of a particular soil textural class will obviously have a strong impact on the level of each size fraction, as it is these size fractions that are used to determine the textural class allocated to the soil in the first place. It is therefore no surprise to see strong relationships between these input and output variables.

Table 6
Identification of the three most influential input parameters (rows) for each of the output parameters (columns), using the Connection Weights method of Olden and Jackson (2002). Relative influence of the input parameters is normalised so that the values for each output total 1. An 'a' marks that a parameter is one of the three inputs identified. Texture classes are given as codes (SCL – sandy clay loam texture, S – sandy texture, LS – loamy sand texture, SL – sandy loam texture).

Input      Clay   Mg     Sand   Ca     Ni     LOI    pH     Cr     Mn     Silt   Satn   Sum    N
Elevation  0.016  0.084  0.009  0.111a 0.077  0.086  0.145a 0.108a 0.073  0.023  0.068  0.060  0.104a
Slope      0.021  0.104a 0.015  0.086  0.062  0.082  0.088  0.104  0.099a 0.041  0.074  0.119a 0.050
Climate 1  0.048  0.088  0.025  0.068  0.084  0.021  0.059  0.101  0.050  0.020  0.074  0.059  0.043
Climate 2  0.044  0.116a 0.020  0.080  0.104a 0.025  0.061  0.134a 0.067  0.038  0.063  0.064  0.074
Veg 1      0.015  0.046  0.014  0.055  0.052  0.021  0.045  0.018  0.092a 0.042  0.108a 0.048  0.052
Veg 2      0.031  0.052  0.019  0.070  0.048  0.023  0.049  0.013  0.043  0.034  0.102a 0.057  0.062
Veg 3      0.011  0.048  0.006  0.054  0.032  0.016  0.121a 0.016  0.048  0.039  0.041  0.036  0.078
Veg 4      0.034  0.095a 0.009  0.135a 0.034  0.019  0.130a 0.021  0.038  0.035  0.099a 0.062  0.076
Veg 5      0.019  0.063  0.006  0.042  0.088a 0.022  0.018  0.018  0.040  0.072a 0.050  0.042  0.048
Red        0.008  0.018  0.040  0.022  0.051  0.187a 0.011  0.146a 0.052  0.019  0.028  0.128a 0.109a
Green      0.016  0.026  0.063  0.025  0.028  0.039  0.008  0.087  0.021  0.024  0.024  0.023  0.121a
Blue       0.015  0.021  0.108a 0.019  0.033  0.047  0.016  0.032  0.008  0.021  0.017  0.033  0.033
Depth      0.042  0.012  0.038  0.136a 0.066  0.176a 0.036  0.059  0.045  0.008  0.069  0.085a 0.041
SCL        0.215a 0.067  0.165a 0.031  0.066  0.087a 0.036  0.034  0.085a 0.205a 0.041  0.038  0.027
S          0.157a 0.043  0.256a 0.019  0.039  0.078  0.092  0.053  0.082  0.241a 0.049  0.052  0.038
LS         0.176a 0.061  0.102  0.025  0.040  0.042  0.022  0.028  0.082  0.066  0.058  0.045  0.023
SL         0.131  0.058  0.104  0.021  0.097a 0.027  0.065  0.026  0.076  0.070  0.036  0.047  0.019

LOI – Loss on Ignition. Satn – base saturation. Sum – total exchangeable bases (see for example Ragg and Futty, 1967).
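Because the values for each output in Table 6 are normalised to total 1, picking the three influential inputs amounts to ranking that output's values and keeping the top three. The short Python sketch below does this for a few outputs transcribed from Table 6 (the selection of outputs is arbitrary, and `top_three` is a hypothetical helper name). For each of these outputs it returns the inputs discussed in the text, for example elevation, heath and grassland for pH, and red colour and depth for LOI.

```python
INPUTS = [
    "Elevation", "Slope", "Climate 1", "Climate 2", "Veg 1", "Veg 2", "Veg 3",
    "Veg 4", "Veg 5", "Red", "Green", "Blue", "Depth", "SCL", "S", "LS", "SL",
]

# Normalised input influence for selected outputs, from Table 6 (input order as above)
TABLE6 = {
    "Ca": [0.111, 0.086, 0.068, 0.080, 0.055, 0.070, 0.054, 0.135, 0.042,
           0.022, 0.025, 0.019, 0.136, 0.031, 0.019, 0.025, 0.021],
    "LOI": [0.086, 0.082, 0.021, 0.025, 0.021, 0.023, 0.016, 0.019, 0.022,
            0.187, 0.039, 0.047, 0.176, 0.087, 0.078, 0.042, 0.027],
    "pH": [0.145, 0.088, 0.059, 0.061, 0.045, 0.049, 0.121, 0.130, 0.018,
           0.011, 0.008, 0.016, 0.036, 0.036, 0.092, 0.022, 0.065],
    "Cr": [0.108, 0.104, 0.101, 0.134, 0.018, 0.013, 0.016, 0.021, 0.018,
           0.146, 0.087, 0.032, 0.059, 0.034, 0.053, 0.028, 0.026],
    "N": [0.104, 0.050, 0.043, 0.074, 0.052, 0.062, 0.078, 0.076, 0.048,
          0.109, 0.121, 0.033, 0.041, 0.027, 0.038, 0.023, 0.019],
}


def top_three(values, names=INPUTS):
    """Return the names of the three largest influence values, largest first."""
    ranked = sorted(zip(values, names), reverse=True)
    return [name for _, name in ranked[:3]]


for output, values in TABLE6.items():
    print(output, top_three(values))
```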


• Elevation and pH: Within Scotland, elevation and annual rainfall are positively related, and the effects of increased rainfall include increased acid deposition and runoff/leaching, which remove soluble bases.
• Vegetation 3 (Grassland) and Vegetation 4 (Heath) inputs and pH: Both of these land cover types have strong impacts on soil pH, for different reasons. Grassland pH values tend to be higher, as a result of both natural processes (lower organic matter content) and management (e.g. liming). Heathland pH values are normally lower, due to higher organic matter content and the acidic conditions caused by specific vegetation types (e.g. Sphagnum).
• Red (colour component) and LOI (Loss on Ignition): Organic matter, the component of soil that is lost during ignition and is therefore taken to be quantified by LOI, absorbs the shorter blue/green wavelengths of visible light much more than the longer red wavelengths. This results in soils with a high organic matter content having a more reddish colouration.
• Green (colour component) and N (total nitrogen): Nitrogen is a major component of chlorophyll, so upper soil layers that have a high level of undecomposed plant material incorporated into them will have both increased green colouring and increased nitrogen content. In addition, Gleysols (poorly drained soils with anoxic conditions) can contain relatively high proportions of organic matter and, due to the reducing conditions, tend to have grey, blue or green colouration because of the increased ratio of Fe(II) to Fe(III).

Many of the parameters within the NSIS database were not used in this work, either because they were largely missing from the database or because they were not considered important in relation to the outputs that were eventually chosen.
A careful analysis of whether these parameters should have been left out will allow us to determine how useful they may be, either as inputs from field observations that may improve the accuracy of the system, or as outputs that are useful for some purpose. The method used to analyse the trained neural network should be valuable here, as it can identify inputs that are more or less influential and that should therefore be kept in or left out of the model. To take this work further, we need a larger dataset including more soil, vegetation, climate and textural types, so that soils from a wider range of geographic locations can be considered. This will become possible as the NSIS database is expanded over time, as is currently taking place. This will also give us more confidence in the model, and will allow it to be applied to a greater number of soil parameters. The success of this work also has wider implications. If it is possible to predict the soil's physical, biological and chemical status using a model that requires only relatively simple field data, then there is scope for a field-based soil evaluation application. It may be possible to accurately quantify soil characteristics that are significant in terms of soil ecosystem services, and in so doing categorise a soil in terms of some user-relevant classification system. Possible examples include the determination of critical load thresholds for a wide range of possible soil additives, such as sewage sludge, nitrogen or sulphur deposition, or specific heavy metal pollutants. It may also be possible to estimate a soil's designation within Land Suitability classifications, and therefore determine how suitable it is for agriculture, forestry or some other use.
We intend to take this work further in this direction, in the hope of developing a field-applicable soil characterisation method that will provide land users with a rapid and accurate way of characterising soils. The neural network trained during the work described here has been implemented within a software package intended for this purpose.


The inputs used for the neural network model correspond closely to the classically described 'five soil-forming factors' of topography, climate, vegetation, geology (colour and texture) and human influence (through vegetation type). A potential further development of this work could therefore be the prediction of soil properties in different areas, based on appropriate quantification of the five soil-forming factors. Work has begun on implementing this idea within the software package described above.

6. Conclusions

We have demonstrated a neural network approach to the prediction of soil parameter values. The success of the approach varies between parameters, but it consistently provides r² values of >0.5 for parameters relating to texture and broad physico-chemical properties. While the accuracy achieved does not make laboratory analysis redundant, it does indicate that the neural network approach demonstrated here could be an important additional tool in the field surveyor's 'armoury'. As a rapid and cost-effective method of providing indicative information about soil characteristics, this approach can be applied by soil professionals and also by land users with little specialist soil expertise. In addition, the work carried out here provides information about parameter relationships and processes within the soil system, and forms the basis of a software package under development for the prediction of soil physical and chemical characteristics from field observations.

Acknowledgements

This work was carried out under the Scottish Government's Programme 3: 'Environment – Land Use and Rural Stewardship'. The authors would also like to thank Allan Lilly for providing information relating to this work.

References

Abbaspour, A., Baramakeh, L., 2006. Application of principle component analysis-artificial neural network for simultaneous determination of zirconium and hafnium in real samples. Spectrochimica Acta – Part A: Molecular and Biomolecular Spectroscopy 64 (2), 477–482.
Aitkenhead, M.J., McDonald, A.J.S., Dawson, J.J., Couper, G., Smart, R.P., Billett, M., Hope, D., Palmer, S., 2003. A novel method for training neural networks for time-series prediction in environmental systems. Ecological Modelling 162 (1–2), 87–95.
Aitkenhead, M.J., Aitkenhead-Peterson, J.A., McDowell, W.H., Smart, R.P., Cresser, M.S., 2007. Modelling DOC export from watersheds in Scotland using neural networks. Computers and Geosciences 33 (3), 423–436.
Barouchas, P.E., Moustakas, N.K., 2004. Soil colour and spectral analysis employing linear regression models I. Effect of organic matter. International Agrophysics 18 (1), 1–10.
Barron, V., Torrent, J., 1986. Use of the Kubelka-Munk theory to study the influence of iron oxides on soil colour. Journal of Soil Science 37 (4), 499–510.
Barthès, B.G., Kouakoua, E., Larré-Larrouy, M.-C., Razafimbelo, T.M., de Luca, E.F., Azontonde, A., Neves, C.S.V.J., de Freitas, P.L., Feller, C.L., 2008. Texture and sesquioxide effects on water-stable aggregates and organic matter in some tropical soils. Geoderma 143 (1–2), 14–25.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, p. 482.
Bishop, J.L., Lane, M.D., Dyar, M.D., Brown, A.J., 2008. Reflectance and emission spectroscopy study of four groups of phyllosilicates: smectites, kaolinite-serpentines, chlorites and micas. Clay Minerals 43 (1), 35–54.
Boorman, D.B., Hollis, J.M., Lilly, A., 1995. Hydrology of soil types: a hydrologically-based classification of the soils of the United Kingdom. Institute of Hydrology Report No. 126. Institute of Hydrology, Wallingford.
Brown, K.W.M., Gauld, J.H., Smith, B.F.L., Bain, D.C., Burridge, J.C., Inkson, R.H.E., 1987. Design of a database for Scottish soils. Journal of Soil Science 38, 267–277.
Brown, I., Towers, W., Rivington, M., Black, H.I.J., 2008. The influence of climate change on agricultural land-use potential: adapting and updating the land capability system for Scotland. Climate Research 37, 43–57.
Cable, J.M., Ogle, K., Williams, D.G., Weltzin, J.F., Huxman, T.E., 2008. Soil texture drives responses of soil respiration to precipitation pulses in the Sonoran desert: implications for climate change. Ecosystems 11 (6), 961–979.
Chai, S.-S., Veenendaal, B., West, G., Walker, J.P., 2008. Explicit inverse of soil moisture retrieval with an artificial neural network using passive microwave remote sensing data. International Geoscience and Remote Sensing Symposium (IGARSS) 2 (1), art. no. 4779086, II687–II690.
Chen, L.J., Xing, L., Han, L.J., 2009. Quantitative determination of nutrient content in poultry manure by near infrared spectroscopy based on artificial neural networks. Poultry Science 88 (12), 2496–2503.
Davey, B.G., Russell, J.D., Wilson, M.J., 1975. Iron oxide and clay minerals and their relation to colours of red and yellow podzolic soils near Sydney, Australia. Geoderma 14 (2), 125–138.
Demattê, J.A.M., Pereira, H.S., Nanni, M.R., Cooper, M., Fiorio, P.R., 2003. Soil chemical alterations promoted by fertilizer application assessed by spectral reflectance. Soil Science 168 (10), 730–747.
Demattê, J.A.M., Galdos, M.V., Guimarães, R.V., Genú, A.M., Nanni, M.R., Zullo, J., 2007. Quantification of tropical soil attributes from ETM+/LANDSAT-7 data. International Journal of Remote Sensing 28 (17), 3813–3829.
Dobbie, K.E., Smith, K.A., 2003. Nitrous oxide emission factors for agricultural soils in Great Britain: the impact of soil water-filled pore space and other controlling variables. Global Change Biology 9 (2), 204–218.
Dowding, C.E., Fey, M.V., 2007. Morphological, chemical and mineralogical properties of some manganese-rich oxisols derived from dolomite in Mpumalanga province, South Africa. Geoderma 141 (1–2), 23–33.
Dudley, R.J., 1976. A simple method for determining the pH of small soil samples and its use in forensic science. Journal of the Forensic Science Society 16 (1), 21–27.
Elshorbagy, A., Parasuraman, K., 2008. On the relevance of using artificial neural networks for estimating soil moisture content. Journal of Hydrology 362 (1–2), 1–18.
Fogg, P., Boxall, A.B.A., Walker, A., Jukes, A., 2004. Effect of different soil textures on leaching potential and degradation of pesticides in biobeds. Journal of Agricultural and Food Chemistry 52 (18), 5643–5652.
Furby, S., Caccetta, P., Wallace, J., 2010. Salinity monitoring in Western Australia using remotely sensed and other spatial data. Journal of Environmental Quality 39 (1), 16–25.
Galvão, L.S., Vitorello, I., 1998. Role of organic matter in obliterating the effects of iron on spectral reflectance and colour of Brazilian tropical soils. International Journal of Remote Sensing 19 (10), 1969–1979.
Grandy, A.S., Strickland, M.S., Lauber, C.L., Bradford, M.A., Fierer, N., 2009. The influence of microbial communities, management, and soil texture on soil organic matter chemistry. Geoderma 150 (3–4), 278–286.
Gu, C., Riley, W.J., 2010. Combined effects of short term rainfall patterns and soil texture on soil nitrogen cycling – a modeling analysis. Journal of Contaminant Hydrology 112 (1–4), 141–154.
Haidouti, C., Massas, I., 1998. Distribution of iron and manganese oxides in Haploxeralfs and Rhodoxeralfs and their relation to the degree of soil development and soil colour. Journal of Plant Nutrition and Soil Science 161 (2), 141–145.
Han, W.X., Fang, J.Y., Reich, P.B., Woodward, F.I., Wang, Z.H., 2011. Biogeography and variability of eleven mineral elements in plant leaves across gradients of climate, soil and plant functional type in China. Ecology Letters 14 (8), 788–796.
Howari, F.M., 2003. The use of remote sensing data to extract information from agricultural land with emphasis on soil salinity. Australian Journal of Soil Research 41 (7), 1243–1253.
Ige, D.V., Akinremi, O.O., Flaten, D.N., 2007. Direct and indirect effects of soil properties on phosphorus retention capacity. Soil Science Society of America Journal 71 (1), 95–100.
IUSS Working Group WRB, 2006. World reference base for soil resources 2006. 2nd edition. World Soil Resources Reports No. 103. FAO, Rome.
Janik, L.J., Forrester, S.T., Rawson, A., 2009. The prediction of soil chemical and physical properties from mid-infrared spectroscopy and combined partial least-squares regression and neural networks (PLS-NN) analysis. Chemometrics and Intelligent Laboratory Systems 97 (2), 179–188.
Jenny, H., 1941. Factors of Soil Formation. McGraw-Hill, New York, p. 277.
Kernan, M.R., Allott, T.E.H., Battarbee, R.W., 1998. Predicting freshwater critical loads of acidification at the catchment scale: An empirical model. Water, Air and Soil Pollution 105 (1–2), 31–41.
Kooijman, A.M., Van Mourik, J.M., Schilder, M.L.M., 2009. The relationship between N mineralization or microbial biomass N with micromorphological properties in beech forest soils with different texture and pH. Biology and Fertility of Soils 45 (5), 449–459.
Krishna Murti, G.S.R., Satyanarayana, K.V.S., 1971. Influence of chemical characteristics in the development of soil colour. Geoderma 5 (3), 243–248.
La, W.J., Sudduth, K.A., Chung, S.-O., Kim, H.-J., 2008. Spectral reflectance estimates of surface soil physical and chemical properties. American Society of Agricultural and Biological Engineers Annual International Meeting 2008 (7), 4159–4172.
Lee, J., Six, J., King, A.P., Van Kessel, C., Rolston, D.E., 2006. Tillage and field scale controls on greenhouse gas emissions. Journal of Environmental Quality 35 (3), 714–725.
Lilly, A., Matthews, K.B., 1994. A soil wetness class map for Scotland: New assessments of soil and climate data for land evaluation. Geoforum 25 (3), 371–379.
Lilly, A., Towers, W., Malcolm, A., Paterson, E., 2004. Report on a workshop on the development of a Scottish Soils Knowledge and Information Base (SSKIB). Macaulay Land Use Research Institute Report, p. 35.

Lilly, A., Ball, B.C., Mctaggart, I.P., Degroote, J., 2009. Spatial modelling of nitrous oxide emissions at the national scale using soil, climate and land use information. Global Change Biology 15 (9), 2321–2332.
Linker, R., Shmulevich, I., Kenny, A., Shaviv, A., 2005. Soil identification and chemometrics for direct determination of nitrate in soils using FTIR-ATR mid-infrared spectroscopy. Chemosphere 61 (5), 652–658.
Majumdar, A., Kaye, J., Gries, C., Hope, D., Grimm, N., 2008. Hierarchical spatial modeling and prediction of multiple soil nutrients and carbon concentrations. Communications in Statistics – Simulation and Computation 37 (2), 434–453.
Maréchal, D., Holman, I.P., 2005. Development and application of a soil classification-based conceptual catchment-scale hydrological model. Journal of Hydrology 312 (1–4), 277–293.
Matus, F.J., Lusk, C.H., Maire, C.R., 2008. Effects of soil texture, carbon input rates, and litter quality on free organic matter and nitrogen mineralization in Chilean rain forest and agricultural soils. Communications in Soil Science and Plant Analysis 39 (1–2), 187–201.
Mouazen, A.M., Saeys, W., Xing, J., De Baerdemaeker, J., Ramon, H., 2005. Near infrared spectroscopy for agricultural materials: An instrument comparison. Journal of Near Infrared Spectroscopy 13 (2), 87–97.
Olden, J.D., Jackson, D.A., 2002. Illuminating the "black box": understanding variable contributions in artificial neural networks. Ecological Modelling 154 (1–2), 135–150.
Olden, J.D., Joy, M.K., Death, R.G., 2004. An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling 178, 389–397.
Plante, A.F., Conant, R.T., Stewart, C.E., Paustian, K., Six, J., 2006. Impact of soil texture on the distribution of soil organic matter in physical and chemical fractions. Soil Science Society of America Journal 70 (1), 287–296.
Ragg, J.M., Futty, D.W., 1967. Soils of the Country around Haddington and Eyemouth. The Macaulay Institute for Soil Research.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533–536.
Schneider, M.K., Brunner, F., Hollis, J.M., Stamm, C., 2007. Towards a hydrological classification of European soils: preliminary test of its predictive power for the base flow index using river discharge data. Hydrology and Earth System Sciences 11 (4), 1501–1513.
Sellitto, V.M., Fernandes, R.B.A., Barron, V., Colombo, C., 2009. Comparing two different spectroscopic techniques for the characterization of soil iron oxides: Diffuse versus bi-directional reflectance. Geoderma 149 (1–2), 2–9.
Shahandeh, H., Wright, A.L., Hons, F.M., 2011. Use of soil nitrogen parameters and texture for spatially-variable nitrogen fertilization. Precision Agriculture 12 (1), 146–163.
Sheppard, M.I., Sheppard, S.C., Grant, C.A., 2007. Solid/liquid partition coefficients to model trace element critical loads for agricultural soils in Canada. Canadian Journal of Soil Science 87 (2), 189–201.
Sidhu, P.S., Raj-Kumar, Sharma, B.D., 2004. Periodic changes in characteristics of a saline-sodic soil following reclamation. Indian Journal of Agricultural Sciences 74 (1), 14–18.
Smith, P., Smith, J.U., Flynn, H., Killham, K., Rangel-Castro, I., Foereid, B., Aitkenhead, M.J., Chapman, S., Towers, W., Bell, J., Lumsdon, D., Milne, R., Thomson, A., Simmons, I., Skiba, U., Reynolds, B., Evans, C., Frogbrook, Z., Bradley, I., Whitmore, A., Falloon, P., 2007. ECOSSE: Estimating Carbon in Organic Soils Sequestration and Emissions. Final Report. SEERAD Report. ISBN 978 0 7559 1498 2. p. 166.
Smith, J.U., Chapman, S.J., Bell, J.S., Bellarby, J., Gottschalk, P., Hudson, G., Lilly, A., Smith, P., Towers, W., 2009. Developing a methodology to improve soil C stock estimates for Scotland & use of initial results from a resampling of the national soil inventory of Scotland to improve the ECOSSE model. Project funded by the Rural and Environment Research and Analysis Directorate of the Scottish Government, Science Policy and Co-ordination Division.
Thornthwaite, C.W., 1948. An approach toward a rational classification of climate. Geographical Review 38 (1), 55–94.
Towers, W., 1994. Towards a strategic approach to sewage sludge utilization on agricultural land in Scotland. Journal of Environmental Planning & Management 37 (4), 447–460.
Towers, W., Paterson, E., 1997. Sewage sludge application to land – a preliminary assessment of the sensitivity of Scottish soils to heavy metal inputs. Soil Use and Management 13 (3), 149–155.
USDA Soil Survey Staff, 1999. Soil Taxonomy. A Basic System of Soil Classification for Making and Interpreting Soil Surveys, 2nd ed. USDA, p. 871.
Van Huyssteen, C.W., 2008. A review of advances in hydropedology for application in South Africa. South African Journal of Plant and Soil 25 (4), 245–254.
Viscarra Rossel, R.A., Cattle, S.R., Ortega, A., Fouad, Y., 2009. In situ measurements of soil colour, mineral composition and clay content by vis-NIR spectroscopy. Geoderma 150 (3–4), 253–266.
Wills, S.A., Burras, C.L., Sandor, J.A., 2007. Prediction of soil organic carbon content using field and laboratory measurements of soil color. Soil Science Society of America Journal 71 (2), 380–388.
Zhang, S., Zhang, X., Huffman, T., Liu, X., Yang, J., 2011. Influence of topography and land management on soil nutrients variability in Northeast China. Nutrient Cycling in Agroecosystems 89 (3), 427–438.