Updating a national soil classification with spectroscopic predictions and digital soil mapping


Catena 164 (2018) 125–134

Contents lists available at ScienceDirect

Catena journal homepage: www.elsevier.com/locate/catena


Hongfen Teng a,b, Raphael A. Viscarra Rossel a,*, Zhou Shi b, Thorsten Behrens c

a Bruce E. Butler Laboratory, CSIRO Land & Water, PO Box 1700, Canberra, ACT 2601, Australia
b Institute of Applied Remote Sensing and Information Technology, College of Environmental and Resource Sciences, Zhejiang University, 310058 Hangzhou, China
c Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, Rümelinstraße 19-23, 72074 Tübingen, Germany

Keywords: Soil mapping; Digital soil mapping; Soil classification; Random forests; Visible–near infrared spectroscopy

Abstract

Traditional soil maps have helped us to better understand soil, to form our concepts and to teach and transfer our ideas about it, and so they have been used for many purposes. Although soil maps are available in many countries, they need to be updated because their spatial delineations and descriptions are often subjective and lack assessments of uncertainty. Updating them is a priority for federal soil surveys worldwide as well as for research, teaching and communication. New data from sensors and quantitative ‘digital’ methods provide us with the tools to do so. Here, we present an approach to update large-scale, national soil maps with data derived from a combination of traditional soil profile classifications, classifications made with visible–near infrared (vis–NIR) spectroscopy, and digital soil class mapping (DSM). Our results present an update of the Australian Soil Classification (ASC) orders map. The overall error rate of the DSM model, tested on an independent validation set, was 55.6%, and a few of the orders were poorly classified. We discuss the possible reasons for these errors, but argue that, compared to the previous ASC maps, (i) our classification was derived objectively, using the best data sets and methods currently available; (ii) the classification model was interpretable in terms of the factors of soil formation; (iii) the modelling produced a 1 × 1 km resolution soil map with estimates of spatial uncertainty for each soil order; and (iv) our map has no artefacts at state and territory borders.

* Corresponding author. E-mail address: [email protected] (R.A. Viscarra Rossel).

https://doi.org/10.1016/j.catena.2018.01.015
Received 20 October 2017; Received in revised form 9 January 2018; Accepted 15 January 2018; Available online 20 February 2018
0341-8162/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Traditional soil maps are the outcome of many years' experience and investigation by pedologists, who have made meticulous descriptions of easily observable morphological characteristics of soil profiles combined with often sparsely gathered laboratory measurements. They aggregated and simplified the information by dividing the variation into more manageable soil classes, which are assumed to exist in fixed proportions and which can be interpreted for different uses. These soil classes are expected to be similar in terms of their intrinsic chemical and physical characteristics, and representative of other soil types in other similar bio-climatic regions and landscapes.

There are two international systems for soil classification that have produced soil maps for the world: the Soil Taxonomy (Soil Survey Staff, 2014) and the World Reference Base for Soil Resources (WRB) (FAO, 2014). But there are also many national systems and maps that are used more locally for, e.g., land use planning and evaluations of hydrology and agricultural land. Examples include the soil classification systems of France (Baize and Girard, 1995), Russia (Lebedeva and Gerasimova, 2012), Germany (Ad-hoc-AG Boden, 2005), Brazil (EMBRAPA, 2006), and China (Gerasimova, 2010; Shi et al., 2006). All reflect local pedological descriptions.

The most widely used soil map in Australia was derived from a general purpose hierarchical classification system that consists of five levels: order, suborder, great group, sub-group and family (Isbell, 2002). At the top level there are fourteen soil orders that reflect the arid, strongly weathered nature of Australia. Soil maps derived from such classifications have helped us to better understand soil, to form our concepts and to teach and transfer our ideas about it. They are valuable because of the expertise that has been used to create them and which is inherently contained in the maps. Because soil is directly related to climate, vegetation, parent material and relief, soil maps have been useful in many soil and environmental applications, such as land management, ecosystem assessments and modelling (Yang et al., 2011).

However, traditional soil maps are limited in terms of both their spatial delineations and their representations of the soil attributes within the classes (Bui and Moran, 2001; Wilson, 2005). There is a need to update traditional soil maps with modern methods and technologies to provide more objective and accurate classifications of the spatial distribution of soil types. Digital soil mapping (DSM) (McBratney et al., 2003) can help to derive improved versions of soil maps by combining new spatially explicit data collected with new technologies and the traditional soil maps. Several authors have used DSM to update existing maps, either the descriptions of the soil (Nelson and Odeh, 2009; Vaysse and Lagacherie, 2015; Werban et al., 2013; Yang et al., 2011), the delineations (Behrens et al., 2008), or both (Behrens and Scholten, 2006), but regardless of how good the updating models might be, success largely depends on having accurate and sufficient new data. Those data can come primarily from soil survey and measurement, which is often costly and time-consuming.

One technology that can help to improve the efficiency of soil survey is diffuse reflectance spectroscopy in the visible and near infrared range (vis–NIR, 400–2500 nm) (Viscarra Rossel et al., 2016). It enables us to extract soil information on colour, iron oxide, clay and carbonate mineralogy, organic matter content and composition, the amount of water present and particle-size distribution, quickly and cheaply. The integration of vis–NIR spectroscopy, remote sensing and DSM is enabling soil mapping over large and sparsely sampled regions of the world. In Australia, this approach has been used to map soil properties such as clay and iron mineralogy (Viscarra Rossel, 2011; Viscarra Rossel et al., 2010), carbon stocks (Viscarra Rossel et al., 2014), phosphorus stocks (Viscarra Rossel and Bui, 2016), clay, sand, silt contents, pH, cation exchange capacity, bulk density, organic carbon content, total nitrogen (Viscarra Rossel et al., 2015) and soil erosion (Teng et al., 2016).

Although digital mapping has been used for updating soil maps in small sample regions (Cambule et al., 2013; Grimm et al., 2008; Guo et al., 2013; Kempen et al., 2012), few pedologists have investigated digital soil class mapping over large scales, and none in combination with vis–NIR spectroscopy. Thus, we have produced an updated quantitative version of the ASC orders map using DSM with random forests and derived estimates of spatial uncertainty for each class. Here we describe our procedure and the results.

2. Material and methods

2.1. The data set

We used data from two sources originating from 38 756 unique sites across Australia (Fig. 1). The first set was from a historical archive of data contained in the Commonwealth Scientific and Industrial Research Organization (CSIRO) National soil database (NATSOIL) and in databases of soil survey organizations in each of the Australian States and Territories, which were collated during a national soil site data collation (NSSDC) as part of the ‘Soil and Landscape Grid of Australia’ project (Grundy et al., 2015). It consists of 33 784 sites that were classified by soil surveyors, during numerous projects, according to the Australian Soil Classification (ASC) (Isbell, 2002).

The second set of data was also from the NATSOIL database, but in this case profiles from 3847 sites had no ASC classification assigned to them. Nevertheless, in the development of the Australian soil spectroscopic database (Viscarra Rossel and Webster, 2012), we had recorded the visible–near infrared (vis–NIR) spectra of these soil samples and so we could use discriminant models developed by Viscarra Rossel and Webster (2011) to assign ASC orders to them. The models developed by Viscarra Rossel and Webster (2011) used data from the CSIRO's NATSOIL database, which represented all of the ASC orders. The authors showed that vis–NIR spectra could be used to discriminate fairly accurately among horizons and the orders of the ASC. They describe in detail the spectroscopic measurements and the modelling they performed. Readers are directed to that publication for details.

The third set of data, which provides an even cover of points in central and western Australia, is from 1125 sites held in the National Geochemical Survey of Australia (de Caritat et al., 2008). Again, none of the sites had an ASC order assigned, but the soil from all sites had visible–near infrared (vis–NIR) spectra so that the ASC orders could be assigned to them with the models developed by Viscarra Rossel and Webster (2011). The combined data, made up of the NSSDC, NATSOIL, and the vis–NIR estimates, represent all of the Australian states and territories and all orders of the ASC (Table 1).

2.2. Digital soil class mapping

We used the Jenny-like DSM framework (McBratney et al., 2003; Jenny, 1941) to model the ASC orders, o, as a function of various environmental predictors:

\[
o(u) = f\big(s[u],\, c[u],\, v[u],\, r[u],\, p[u]\big) \tag{1}
\]

where the soil–environmental factors represented by the environmental predictors across space (u ≡ {x, y}) are soil (s), climate (c), vegetation (v), terrain (r), and parent material (p). The function used to relate these factors to o was random forest, which we describe below. The proxies for these soil–environmental factors that we used in the modelling are given in Table 2. They were chosen to represent factors that affect the formation and distribution of soil in Australia, and were from several sources, including remote sensing, other soil maps, maps of climatic variables and terrain attributes derived from a digital elevation model (DEM). We used bilinear resampling to harmonise the different resolutions of these data (Table 2) to a common grid with a cell size of 1 × 1 km.

To assess the effect of using the original ASC orders map (Isbell, 2002) in the modelling, we derived two random forest models: one that used all of the predictors shown in Table 2, including the original ASC map, which we refer to as the ASCIsbell map, and the other with all predictors except the ASCIsbell map.

2.3. Data mining

We separated the dataset, containing 38 756 observations and their covariates, into a training set and a validation set by random sampling. Two-thirds of the observations (n = 25 837) were assigned to the training set and the remainder (n = 12 919) to the validation set. We used random forest (Breiman, 2001) to classify and map the Australian soil classification orders, o. Random forest is an ensemble of B trees {t1(p), …, tB(p)}, where p = (p1, …, pp) is a p-dimensional vector of covariates (or predictors) that represent the soil–environmental factors (Table 2). The ensemble produces B outputs {ô1 = t1(p), …, ôB = tB(p)}, where ôb, b = 1, …, B, is the classification of the ASC class by the bth tree. Outputs of all trees are aggregated to produce, by majority vote from all trees, the final classification, ô.

Given a set of training data, {(p1, o1), …, (pn, on)}, where pi, i = 1, …, n, is a vector of predictors and oi is the corresponding soil order, training of the random forest proceeds as follows:

1. From the training data, draw B bootstrap samples. Each bootstrap sample is the basis for one of the ‘trees in the forest’.
2. Grow a classification tree for each bootstrap sample, with no pruning, to derive the ASC classification, ô.
3. At each node, rather than choosing the best split among all predictors, randomly sample m predictors and choose the best split from among them. The value of m is held constant while the forest is grown.
4. Repeat the above steps until B trees are grown.
5. For each tree, predict the data not in the bootstrap sample (i.e. the out-of-bag data; on average, each observation is out-of-bag approximately 36% of the time) using the tree grown with the bootstrap sample (i.e. the data that are in-the-bag).
6. Aggregate the out-of-bag predictions, compare the predicted values, ô, with the observed values, o, of each unit in the out-of-bag (oob) sample, and calculate the classification error rate (ER) given in Eq. (2).
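The bookkeeping in the steps above (bootstrap sampling, out-of-bag membership, majority voting and the error rate) can be illustrated with a minimal Python sketch. It does not grow any trees and is not the authors' implementation, which used R. All function names, the seed, the sizes n and B, and the toy labels are assumptions made for illustration only.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Step 1: draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag_indices(n, in_bag):
    """Step 5: rows that never entered the bootstrap sample."""
    drawn = set(in_bag)
    return [i for i in range(n) if i not in drawn]


def majority_vote(votes):
    """Aggregate the class labels from several trees into one prediction."""
    return Counter(votes).most_common(1)[0][0]


def error_rate(predicted, observed):
    """Step 6: share of units whose predicted class differs from the observed one."""
    mismatches = sum(1 for p, o in zip(predicted, observed) if p != o)
    return mismatches / len(observed)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n, B = 1000, 200        # illustrative sizes, not the values used in the paper

# Average share of rows left out of a bootstrap sample.
# In expectation this is (1 - 1/n)^n, which is close to 0.37 for large n.
oob_fraction = sum(
    len(out_of_bag_indices(n, bootstrap_indices(n, rng))) for _ in range(B)
) / (n * B)

# Toy labels standing in for out-of-bag predictions and observations.
example_er = error_rate(["VE", "CH", "DE", "SO"], ["VE", "CH", "SO", "SO"])
```

In this sketch `oob_fraction` should come out close to the out-of-bag share quoted in step 5, and `example_er` is the fraction of mismatched labels in the toy example.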


Fig. 1. Locations of the sites with soil classified at the Australian Soil Classification order level. The points show the data classified by pedologists and the data classified using visible–near infrared spectra.

\[
\mathrm{ER} = \frac{\sum_{i=1}^{n} I\left(\hat{o}_i^{\mathrm{oob}} \neq o_i^{\mathrm{oob}}\right)}{n}, \tag{2}
\]

in which I(·) is the indicator function, and n is the number of out-of-bag data. We used the package randomForest of Liaw and Wiener (2002) in the software R (R Development Core Team, 2008) for the computation. When training the random forest, we tested different values for the tuning parameters: the number of variables randomly sampled as candidates at each split, m ∈ {3, 4, 5, 6, 7, 8, 9} (in the software, the default value for classification is sqrt(p), which in our case is 6); the number of trees to grow, B ∈ {300, 500, 1000, 2000, 5000}; and the minimum size of terminal nodes, nt ∈ {1, 2, 3, 4, 5} (in the software, the default value for classification is 1).

Table 1
Number of data in each of the fourteen Australian Soil Classification orders.

ASC order      Abbreviation   Site data   Vis–NIR data    Total
Anthroposols   AN                    14             15       29
Calcarosols    CA                   195            300      495
Chromosols     CH                  5713            501     6214
Dermosols      DE                  6623            434     7057
Ferrosols      FE                  1351            170     1521
Hydrosols      HY                  2007            182     2189
Kandosols      KA                  3403            537     3940
Kurosols       KU                  2375             87     2462
Organosols     OR                    98             21      119
Podosols       PO                   271            156      427
Rudosols       RU                  1381            216     1597
Sodosols       SO                  4243            806     5049
Tenosols       TE                  1313            632     1945
Vertosols      VE                  4797            915     5712
Total                            33 784           4972   38 756

2.4. Model validation and assessment

From the grid search described above, we selected the set of tuning parameters that produced the smallest out-of-bag ER: m = 5; B = 1000; nt = 2. Using this model, we predicted the validation set and, for these predictions, we reported the ER, the confusion matrix and the importance of each predictor used in the model. The importance of each predictor in the random forest model is estimated by calculating the increase in the out-of-bag prediction ER (Eq. (2)) after a predictor variable has been removed at random, while all other predictors are held constant. The differences between the ERs for the full set of predictors and that of the full set less each of the predictors are calculated for each tree. The greater the increase, the more important is the predictor. Thus, the importance of each predictor is calculated tree-by-tree as the random forest is constructed.

2.5. Prediction and mapping

For the final predictions of the ASC orders, ô, over Australia, we ran a random forest model using the entire data set and the tuning parameters m = 5; B = 1000; nt = 2. We used this model to predict ô at the nodes of a 1 km grid over the whole of Australia. As well as predictions of ô, the random forest algorithm measures, at each new location, the proportion of votes for each class, which it calls estimates of class probabilities (Breiman, 2001). Although these are not distributional probabilities, they provide a measure of the confidence in the classification. Finally, we compared our maps derived with the two random forest models, (i) with all predictors including the original ASC map (we refer to this map as the ASCRF map) and (ii) with all predictors except the original ASC orders map (we refer to this map as the ASCRFa map), to the original ASC orders map (ASCIsbell) and to the more recent update to the ASC map by the Australian Collaborative Land Evaluation Program (ACLEP) (ASCACLEP). The ASCIsbell map was derived from


reinterpretations of soil profile information, look-up tables and earlier soil classifications (Isbell, 2002), while the ASCACLEP map was updated by reassigning the orders in the ASCIsbell map with the best available data at the finest scale (http://www.asris.csiro.au/themes/NationalGrids.html).

Table 2
List of the auxiliary environmental predictor variables (covariates) used in the random forest modelling and their nominal scale or resolution.

Predictor variable                                   Scale/resolution

Climate. Sources: Prescott (1950); Xu and Hutchinson (2011b)
  Prescott index (PI)                                250 m
  Mean annual rainfall (Rain)                        90 m
  Mean annual evapotranspiration (PET)               90 m
  Mean annual temperature min (TMIN)                 90 m
  Mean annual temperature max (TMAX)                 90 m
  Mean annual solar radiation (SRAD)                 90 m

Vegetation. Source: Donohue et al. (2009)
  Fpar-evergreen (Fpar-e)                            250 m
  Fpar-raingreen (Fpar-r)                            250 m
  NDVI                                               30 m

Terrain. Sources: Xu and Hutchinson (2011a); Gallant and Dowling (2003)
  DEM                                                90 m
  Aspect                                             90 m
  Contributing area                                  90 m
  Relief                                             90 m
  Slope 300 m                                        90 m
  Slope 1000 m                                       90 m
  MrVBF                                              90 m
  Topographic Wetness Index (TWI)                    90 m
  Erosional landscape (Erosional)                    90 m
  Terrain roughness (Roughness)                      90 m

Parent material. Sources: Minty et al. (2009); GA (2009); Milligan et al. (2004)
  Total gamma dose (GammaTD)                         100 m
  Potassium (GammaK)                                 100 m
  Thorium (GammaTh)                                  100 m
  Uranium (GammaU)                                   100 m
  Gravity                                            90 m
  Magnetics                                          90 m

Soil. Sources: Isbell (2002); Viscarra Rossel and Chen (2011); Viscarra Rossel (2011)
  Australian soil classification orders (ASCIsbell)  1:2 000 000
  PC1 vis–NIR (PC1spec)                              90 m
  PC2 vis–NIR (PC2spec)                              90 m
  PC3 vis–NIR (PC3spec)                              90 m
  Kaolinite                                          90 m
  Illite                                             90 m
  Smectite                                           90 m

3. Results

3.1. Correspondence between the point data and the original ASC maps

Before embarking on the modelling, we compared the assigned ASC orders of the point data (Fig. 1) with the current ASCIsbell and ASCACLEP maps (Fig. 3a and b, respectively). Table 3 shows the correspondence separately for the site data, where the assignments were made by pedologists; the vis–NIR data, where the assignments were made using discriminant models derived by Viscarra Rossel and Webster (2011) (see above); and all of the (combined) data. The correspondence between the site data and the ASCACLEP map was generally a little better than that with the ASCIsbell map, suggesting that the reassignments of the orders in the ASCIsbell map made by ACLEP had some effect (Table 3). The correspondence between the vis–NIR data and ASC maps was generally similar to that for the site data, but larger for Calcarosols, Kandosols and Vertosols (Table 3).

Table 3
Correspondence between the assigned ASC orders of the point data used in the modelling and the original ASC orders map derived by Isbell (2002) (ASCIsbell) and the more recent update to the ASC map by the Australian Collaborative Land Evaluation Program (ACLEP) (ASCACLEP). These two maps are shown in Fig. 3a and b, respectively. Correspondence was derived from a confusion matrix computed for each data set and summarized for each order as 1 − error rate.

         Site data (n = 33 784)   Vis–NIR data (n = 4972)   Combined (all) data (n = 38 756)
Order    ASCIsbell   ASCACLEP     ASCIsbell   ASCACLEP      ASCIsbell   ASCACLEP
AN       0.00        0.00         0.00        0.00          0.00        0.00
CA       0.22        0.22         0.49        0.48          0.38        0.38
CH       0.17        0.29         0.21        0.24          0.17        0.28
DE       0.15        0.33         0.13        0.20          0.15        0.32
FE       0.41        0.52         0.41        0.50          0.41        0.52
HY       0.29        0.43         0.11        0.16          0.28        0.41
KA       0.39        0.31         0.43        0.34          0.39        0.31
KU       0.23        0.44         0.15        0.15          0.22        0.43
OR       0.04        0.17         0.10        0.10          0.05        0.16
PO       0.24        0.31         0.14        0.08          0.21        0.23
RU       0.06        0.23         0.07        0.04          0.06        0.20
SO       0.48        0.43         0.42        0.29          0.47        0.41
TE       0.31        0.21         0.16        0.21          0.26        0.21
VE       0.53        0.62         0.72        0.75          0.57        0.64

3.2. The random forest model

The error rate of the random forest ASCRF model, calculated on the validation set, was 55.6%. Vertosols, Hydrosols and Chromosols were the most accurately classified orders, with more than 50% correct classification (Table 4). Dermosols, Kurosols, Ferrosols, Sodosols and Organosols were next, with correct classifications between 40% and 50%. Kandosols, Calcarosols and Tenosols were correctly classified in 38%, 34% and 22% of cases, respectively, while Podosols and Rudosols were poorly classified, with only 19% and 13% of cases correctly classified (Table 4). No Anthroposols were correctly classified, which might be due to the relatively small number of observations of this order in the dataset (Table 1).

Table 4 shows that Calcarosols were mostly misclassified as Sodosols, which are sodic in the upper B horizon and are not strongly acidic, with pH values > 5.5 (Isbell, 2002). Chromosols were mostly misclassified as Dermosols and vice versa. Ferrosols, which contain free iron in the B2 horizon, were misclassified mostly as Dermosols, which have structured B2 horizons. Kandosols were misclassified mostly as Dermosols, and Kurosols, which have pH < 5.5 in the B horizon, were misclassified mostly as Chromosols, which have pH ≥ 5.5. Organosols and Podosols were misclassified mostly as Hydrosols. Rudosols were misclassified mostly as Chromosols, as were Sodosols, the difference between the latter two orders being the absence of a sodic B horizon in Chromosols. Tenosols were similarly misclassified as Dermosols, Chromosols, Kurosols and Kandosols, while Vertosols were misclassified as Dermosols, Chromosols and Sodosols (Table 4). The error rate of the random forest ASCRFa model was only slightly larger at 57.5% and we therefore do not show the more detailed results here.

3.3. Predictor importance

Fig. 2 shows the importance of the predictors in the random forest ASCRF classification. Climatic covariates were the most important predictors of the ASC orders, except for Anthroposols and Tenosols (Fig. 2), which together with Podosols and Rudosols were poorly classified (Table 4). The vegetation covariates did not have much influence on the classification, except for Fpar-e, which affected the classification of primarily Ferrosols, Kurosols and Vertosols. Elevation, slope and relief were the most important terrain predictors. The DEM was the most important predictor of Hydrosols, and it was also important in the classification of Vertosols, Organosols and Ferrosols. Gamma radiometrics and some of the mineralogical predictors were important for the classification of Ferrosols (Fig. 2). The smectite and PC3spec covariates were important in the classification of Vertosols. The ASCIsbell map helped in the classification of most of the ASC orders but was particularly important in the classification of Ferrosols, Hydrosols and Vertosols (Fig. 2).

Fig. 2. Relative importance of the environmental predictor variables in the random forest classification of the Australian Soil Classification orders.

Table 4
Confusion matrix of the random forest model classification on the validation set (n = 12 919), with an error rate of 55.6%. Rows are the observed orders and columns are the predicted orders. The ‘correct classification’ was derived using (1 − error rate). Bold values indicate significance at < 0.001.

Order   AN   CA    CH    DE   FE   HY   KA   KU   OR   PO   RU    SO   TE    VE   Correct class.
AN       0    0     6     1    0    4    1    0    0    0    0     1    0     0   0.00
CA       0   58    23     6    0    5   16    0    0    2    4    36    1    20   0.34
CH       0   19  1032   332    7   27  144   68    0    2    5   205    8   188   0.51
DE       0    7   337  1123   41   71  159   96    1    3    9   224    5   317   0.47
FE       0    0    32   139  222    3   18   19    0    0    2     5    0    61   0.44
HY       0    7    26   113   10  457   42   21    0    1    1    25    3    25   0.63
KA       0   18   187   263   12   46  494   80    0    2   18   110   16    69   0.38
KU       0    4   140    60    8   26   37  370    0    1    7   105   21    12   0.47
OR       0    1     2     4    3    9    2    0   16    1    0     0    0     1   0.41
PO       0    0     2    22    8   59    7   15    0   28    0     6    0     0   0.19
RU       0    9   168    81    0   23   59   57    0    3   70    46    4    38   0.13
SO       0   12   291   250    3   29   63   38    0    2    2   719    7   269   0.43
TE       0    0    80    90    0   27   65   79    0    5    9    72  124    13   0.22
VE       0    8   170   199   10   23   25    7    0    0    2   175    0  1355   0.69

3.4. Digital mapping of the ASC orders

The spatial distribution of the ASC orders, depicted by the original ASCIsbell and the recently updated ASCACLEP maps, is shown in Fig. 3a and b. The ASCACLEP map shows discontinuities at State and Territory boundaries, which are due to reassignments of the orders using the currently best available (although sparse) survey data gathered at the finest scale (http://www.asris.csiro.au/themes/NationalGrids.html). The spatial distribution of the ASC orders, depicted by the ASCRFa and the ASCRF maps, is shown in Fig. 3c and d. Compared to the ASCRF map, the ASCRFa map, derived using all environmental covariates except the original ASC map, shows larger areas of Tenosols over Western Australia, Northern Territory and South Australia (Fig. 3c).

Fig. 3d shows the ASCRF map, which uses all of the environmental covariates, including the original ASC map. Most of the soil around Arnhem Land in the Northern Territory was classified as Kandosols, with a smaller proportion of Tenosols and Vertosols inland and Hydrosols along the coast. Also, compared to the ASCIsbell, ASCACLEP and ASCRFa maps, the ASCRF map shows larger areas of Calcarosols in southwestern and western Western Australia (Fig. 3d). Areas in Western Australia mapped as Sodosols in the ASCIsbell map, or Rudosols and Kandosols in the ASCACLEP map, were mapped as Tenosols, Rudosols, Calcarosols and small areas of Vertosols in the ASCRF map (Fig. 3d). In south eastern Australia, areas mapped as primarily Sodosols in the ASCIsbell map, or Sodosols and Kurosols in the ASCACLEP map, were mapped as predominantly Chromosols in the ASCRF map (Fig. 3).

Fig. 4 shows the spatial distribution of each of the ASC orders as depicted by the ASCRF, ASCIsbell and ASCACLEP maps. The figure shows where the classifications are the same, and where they differ. The ASCRF map has more Calcarosols, Chromosols and Dermosols compared to the ASCIsbell and ASCACLEP maps (Fig. 4). It predicted more Calcarosols in South Australia and southwestern Western Australia and more Chromosols in many of the agricultural regions of southeastern Australia (Fig. 4). The ASCIsbell map shows a greater distribution of Sodosols than the ASCACLEP and ASCRF maps. Both the ASCIsbell and ASCACLEP maps show a greater distribution of Kandosols compared to the ASCRF map, particularly in arid and semi-arid regions of northern and western Australia. The ASCACLEP map shows a much smaller distribution of Tenosols compared to the ASCIsbell and ASCRF maps. The ASCRF map shows a greater distribution of Rudosols in Western Australia compared to the ASCIsbell map, but a smaller distribution compared to the ASCACLEP map (Fig. 4). The distribution of Vertosols was most similar in the ASCRF, ASCIsbell and ASCACLEP maps.

In the Supplementary information accompanying this article we present results that compare the proportion of land in each of the Australian States and Territories, and in Australia as a whole, occupied by the Australian Soil Classification orders calculated using the ASCRF map and those of the original ASCIsbell and ASCACLEP maps (see Table S1 in the Supplementary information). There, we also show results that compare the area occupied by the ASC orders for different land use types in Australia, calculated with the ASCRF, ASCIsbell and ASCACLEP maps (Table S2 in the Supplementary information).

Fig. 3. Maps of the Australian Soil Classification orders derived by (a) reinterpreting soil profile information, look-up tables and earlier soil classifications (ASCIsbell), (b) reassignments of the orders in the ASCIsbell map from the best available data at the finest scale (ASCACLEP), (c) and (d) random forest classification using the point data shown in Fig. 1 and the covariates in Table 2. In (c) we did not use the ASCIsbell map as a covariate (we call this map ASCRFa), while in (d) we did (and we call this map ASCRF).

Fig. 4. Maps of the spatial agreement between the ASCRF map and (a) the ASCIsbell and (b) the ASCACLEP maps.

3.5. Uncertainty

Fig. 5 shows maps of the classification probabilities for each of the ASC orders of the ASCRF model. The probability of Calcarosols occurring is largest in arid and semiarid regions of southern Australia (Fig. 5), mostly on flat to undulating plains. The probability of Chromosols is largest in regions of central New South Wales, north eastern Queensland, south western Western Australia and central Tasmania. Dermosols have a larger probability of correct classification along the wetter coastal and subcoastal areas in eastern Australia from Cape York peninsula to northern Tasmania (Fig. 5). The probability of Ferrosols is also largest in localised regions of the Eastern Uplands along the Great Dividing Range and in northeastern Tasmania, where its distribution is determined by the occurrence of basalt. The largest probability of Hydrosols occurs in Cape York and in northern Australia in low-lying coastal and sub-coastal plains and in drainage depressions (Fig. 5). The probability of Kandosols occurring is largest in eastern Australia, ranging from the coastal high rainfall regions in Cape York to the arid and semi-arid interior in Queensland and New South Wales. The largest probability of Kurosols extends from southern Queensland through to coastal and sub-coastal New South Wales, Victoria and Tasmania, but also in localised regions of southwestern Western Australia (Fig. 5).

The largest probability of Organosols occurs in the mountainous regions of alpine and sub-alpine environments in southeastern Australia and in western Tasmania (Fig. 5). The largest probability of Podosols occurs in southern Australia along the coast at the border between Victoria and South Australia. However, as described above, the distribution of Podosols in Australia was underestimated by the random forest model. There is a larger probability of Rudosols in arid areas of central and north western Australia (Fig. 5), where they occupy the vast areas of desert with red sand sheets and dunes. The largest probability of Sodosols occurring is in southern Western Australia but also in the drier inland regions of eastern Australia and western Victoria. The probability of Tenosols occurring over vast regions of central and western Australia is large (Fig. 5). Interestingly, however, the ASCACLEP map shows a much smaller occurrence of Tenosols over these regions (Fig. 4b). There is a large probability of Vertosols occurring in the lowlands of eastern Australia in Queensland, New South Wales and the Northern Territory (Fig. 5).


Fig. 5. Maps of the probability of each of the Australian Soil Classification orders occurring across Australia.
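Section 2.5 describes the class probabilities mapped in Fig. 5 as the proportion of tree votes for each class at a location. The following Python sketch illustrates that idea only; the function names, the three-class example and the vote counts are hypothetical and do not come from the paper's model.

```python
from collections import Counter


def vote_proportions(tree_votes, classes):
    """Share of trees that voted for each class at one grid location."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {c: counts.get(c, 0) / total for c in classes}


def classify(tree_votes, classes):
    """Return the majority class and the vote proportions behind it.

    Ties are broken by the order of `classes` (an arbitrary choice made here).
    """
    props = vote_proportions(tree_votes, classes)
    winner = max(classes, key=lambda c: props[c])
    return winner, props


# Hypothetical example: ten trees voting among three ASC order abbreviations.
classes = ["CH", "DE", "SO"]
votes = ["CH"] * 5 + ["DE"] * 3 + ["SO"] * 2
winner, props = classify(votes, classes)
```

Here `winner` is the mapped class and `props[winner]` plays the role of the confidence value shown for that class; a winning share close to the runner-up indicates a less certain classification.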

4. Discussion and conclusions

Several investigators have tried to update soil maps using new data and technologies. Yang et al. (2011) developed a method to update conventional soil maps of Wakefield, Canada using digital soil mapping. Adhikari et al. (2014) constructed the soil map of Denmark using digital soil mapping, existing soil profile observations and environmental data. Collard et al. (2014) refined and improved the 1:250 000 reconnaissance soil map of northwest France using digital soil mapping. Nauman and Thompson (2014) used widely available data to disaggregate two existing adjacent soil surveys in West Virginia, USA, into one continuous soil series class map using no new soil field data. Rad et al. (2014) investigated the use of conditioned Latin hypercube sampling and random forests for mapping Soil Taxonomy great groups and subgroups in Northern Iran. Dy and Fung (2016) updated a global soil map based on Soil Taxonomy for predictions of soil moisture. Others have tried to update soil maps in Australia at regional or smaller scales. For example, Holmes et al. (2014) produced detailed soil class maps for Western Australia at approximately 90 m resolution, by disaggregating polygons of traditional soil maps using the DSMART algorithm developed by Odgers et al. (2014). Triantafilis et al. (2013) used remotely sensed gamma-ray spectrometry and fuzzy k-means to identify geological and geomorphological units in the Namoi valley in northwest New South Wales. None that we are aware of has tackled the matter at the continental scale. Here, we presented an approach to update a national soil class map with digital soil mapping and random forests, with data from different sources, soil survey and soil spectroscopy, that were recorded over different time periods.

By deriving two random forest models, one that included the original ASC soil map as a predictor in the modelling, and one that did not, we tested the hypothesis that there is no value in using this map in the updating. We found that the random forest model that included the original ASC map, ASCRF, was slightly more accurate than the model that did not use it. This indicated that the original map, with the pedologists' expertise and understanding inherent in its presentation, although largely subjective, added some value to the updating of the digital soil class mapping with the random forest.

The overall error rate of the random forest was relatively large at 55.6%, and some of the orders were poorly classified (e.g. Anthroposols, Rudosols, Podosols, Tenosols). There are a number of possible reasons for this. First, to train our models, the large majority of the data that we used were historical, from soil profiles that were classified by different people during numerous projects over approximately 50 years. The data were stored in different databases, often with different data models. As is usually done, orders were assigned using morphological descriptions with few laboratory analyses and largely relying on the pedologist's experience and expertise. Note also that for these poorly classified orders, the original point data used in the modelling did not correspond well to the orders in the ASCIsbell and ASCACLEP maps (Table 3). Second, the vis–NIR estimates of the soil orders, which provide a fairly even coverage in the centre and west of Australia, although quantitative, were derived from a spectroscopic model and thus also contain errors. Nevertheless, we are confident that the vis–NIR estimates were useful, because their correspondence to the existing ASCIsbell and ASCACLEP maps was similar to that of the site data, or better, e.g. for Calcarosols, Vertosols, Kandosols and Ferrosols (Table 3). Third, although we had 38 756 sites with a soil order for our modelling, the total land area of Australia is 7 659 861 km², which means that we had only about 5 data points for every 1000 km², and the spatial distribution of these data was somewhat biased towards agricultural areas in eastern Australia (Fig. 1).

Despite these shortcomings, our classification was derived using the best available data set, and it was derived objectively. We used data-driven modelling that relates soil–environmental information to the soil classes, under the hypothesis that their characteristics depend on environmental factors that affect their formation. Conceptually this is the same approach that is taken by pedologists who derive traditional soil classification maps. The advantages of our approach, however, are that (i) rapid and cost-effective estimates of the soil orders made with vis–NIR spectra complemented the soil profile data from survey, which are difficult and expensive to obtain, (ii) random forest produced a model that was interpretable in terms of the soil–environmental covariates that were most useful for the classification of each of the 14 soil orders, (iii) random forest produced a fine spatial resolution (1 km × 1 km) digital soil map of the ASC orders and maps of the probability of each soil class occurring across Australia, and (iv) by using the original ASC map as a predictor in the modelling we were able to integrate the expertise and the large amount of time and resources that went into creating that map into the new ASCRF map. As far as we know, ours is the first attempt to produce a quantitative and objective update to a national soil classification map at the country or continental scale.
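The two quantities that the discussion leans on, the density of the site data and the error rates on a validation set, are simple to compute, and a short sketch may make them concrete. The site count and land area below are the figures quoted in the text; the validation labels are invented toy values, and the function names `sampling_density` and `error_rates` are hypothetical, not the authors' code. The per-class error is taken here as the fraction of validation sites of an observed order that were assigned to a different order, which is one common definition and an assumption on our part.

```python
from collections import Counter


def sampling_density(n_sites, area_km2, per_km2=1000):
    """Return the number of sites per `per_km2` square kilometres."""
    return n_sites / area_km2 * per_km2


def error_rates(observed, predicted):
    """Return the overall error rate and the error rate of each observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    totals = Counter(observed)
    # Count misclassified sites, keyed by the order that was observed.
    wrong = Counter(o for o, p in zip(observed, predicted) if o != p)
    overall = sum(wrong.values()) / len(observed)
    per_class = {order: wrong[order] / n for order, n in totals.items()}
    return overall, per_class


# Figures quoted in the text: 38 756 sites over 7 659 861 km2.
density = sampling_density(38_756, 7_659_861)

# Hypothetical validation labels, invented for illustration only.
observed = ["Vertosol", "Vertosol", "Kandosol", "Kandosol", "Rudosol", "Tenosol"]
pred_with_map = ["Vertosol", "Vertosol", "Kandosol", "Tenosol", "Tenosol", "Tenosol"]
pred_without_map = ["Vertosol", "Kandosol", "Kandosol", "Tenosol", "Tenosol", "Kandosol"]

overall_with, per_class_with = error_rates(observed, pred_with_map)
overall_without, per_class_without = error_rates(observed, pred_without_map)

print(f"{density:.2f} sites per 1000 km2")
print(f"overall error with map: {overall_with:.3f}, without map: {overall_without:.3f}")
print(per_class_with)
```

With the figures from the text the density works out to roughly 5 sites per 1000 km², in line with the statement above. The toy labels only illustrate how an overall error rate can hide orders that are classified very poorly; they say nothing about the actual performance of either model.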

Acknowledgments

We thank the CSIRO and the Terrestrial Ecosystem Research Network's (TERN) Soil and Landscape Grid of Australia project for collating data into the National Soil Data Collation. We are grateful to the custodians of the soil site data in each state and territory for providing access to them. They are the Queensland Department of Science, Information Technology, Innovation and the Arts, Northern Territory Department of Land Resource Management, Western Australia Department of Agriculture and Food, South Australia Department of Environment, Water and Natural Resources, Victoria Department of Environment and Primary Industries, NSW Office of Environment and Heritage, Tasmania Department of Primary Industries, Parks, Water and Environment, and Geoscience Australia. We thank also S. Tuomi, P. Leppert, M. Virueda and G. Navarrette for their help with the spectroscopic measurements, P. de Caritat for the soil samples from the National Geochemical Survey of Australia, D. Jacquier for help with the ASRIS database and R. Webster for editing an earlier version of the manuscript.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.catena.2018.01.015.

References

Boden, Ad-hoc-AG, 2005. Bodenkundliche Kartieranleitung, 5th edition. Hannover (In German.).

Adhikari, K., Minasny, B., Greve, M.B., Greve, M.H., 2014. Constructing a soil class map of Denmark based on the FAO legend using digital techniques. Geoderma 214, 101–113.

Baize, D., Girard, M.C., 1995. A Sound Reference Base for Soils: The ‘Référentiel Pédologique’. Institut National de la Recherche Agronomique, Paris.

Behrens, T., Schmidt, K., Scholten, T., 2008. An Approach to Removing Uncertainties in Nominal Environmental Covariates and Soil Class Maps. Springer, Berlin.

Behrens, T., Scholten, T., 2006. Digital soil mapping in Germany-a review. J. Plant Nutr. Soil Sci. 169 (3), 434–443.

Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32. http://link.springer.com/article/10.1023/A%3A1010933404324.

Bui, E.N., Moran, C.J., 2001. Disaggregation of polygons of surficial geology and soil maps using spatial modelling and legacy data. Geoderma 103 (1–2), 79–94.

Cambule, A.H., Rossiter, D.G., Stoorvogel, J.J., 2013. A methodology for digital soil mapping in poorly-accessible areas. Geoderma 192, 341–353.

Collard, F., Kempen, B., Heuvelink, G., Saby, N., Richer de Forges, A., Lehmann, S., Nehlig, P., Arrouays, D., 2014. Refining a reconnaissance soil map by calibrating regression models with data from the same map (Normandy, France). Geoderma Reg. 1, 21–30.

de Caritat, P., Lech, M.E., McPherson, A.A., 2008. Geochemical mapping ‘down under’: selected results from pilot projects and strategy outline for the National Geochemical Survey of Australia. 8 (3–4). pp. 301–312.

Donohue, R.J., McVicar, T.R., Roderick, M.L., 2009. Climate-related trends in Australian vegetation cover as inferred from satellite observations, 1981–2006. Glob. Chang. Biol. 15 (4), 1025–1039.

Dy, C.Y., Fung, J.C.H., 2016. Updated global soil map for the weather research and forecasting model and soil moisture initialization for the Noah land surface model. J. Geophys. Res.-Atmos. 121 (15), 8777–8800.

EMBRAPA, 2006. Sistema Brasileiro de Classificação de Solos, 2nd edition. Embrapa Produção de Informação, Brasília (In Portuguese.).

FAO, 2014. In: International soil classification system for naming soils and creating legends for soil maps. World Soil Resources Reports no. 106. FAO, Rome.

GA, 2009. Gravity Grid of Australia and Surrounding Areas (National Geoscience 467 Dataset). Geoscience Australia, Symonston.

Gallant, J.C., Dowling, T.I., 2003. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 39 (12), 1347.

Gerasimova, M.I., 2010. Chinese soil taxonomy: between the American and the international classification systems. Eurasian Soil Sci. 43 (8), 945–949.

Grimm, R., Behrens, T., Marker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado island - digital soil mapping using random forests analysis. Geoderma 146 (1–2), 102–113.

Grundy, M.J., Rossel, R.A.V., Searle, R.D., Wilson, P.L., Chen, C., Gregory, L.J., 2015. Soil and landscape grid of Australia. Soil Res. 53 (8), 835–844.

Guo, Y., Shi, Z., Li, H.Y., Triantafilis, J., 2013. Application of digital soil mapping methods for identifying salinity management classes based on a study on coastal central China. Soil Use Manag. 29 (3), 445–456.

Holmes, K.W., Odgers, N.P., Griffin, E., van Gool, D., 2014. Spatial disaggregation of conventional soil mapping across Western Australia using DSMART. GlobalSoilMap.

Isbell, R., 2002. The Australian Soil Classification. CSIRO Publishing, Collingwood, Victoria.

Jenny, H., 1941. Factors of Soil Formation: A System of Quantitative Pedology. Courier Dover Publications.

Kempen, B., Brus, D.J., Stoorvogel, J.J., Heuvelink, G.B.M., de Vries, F., 2012. Efficiency comparison of conventional and digital soil mapping for updating soil maps. Soil Sci. Soc. Am. J. 76 (6), 2097–2115.

Lebedeva, I.I., Gerasimova, M.I., 2012. Diagnostic horizons in the Russian soil classification system. Eurasian Soil Sci. 45 (9), 823–833.

Liaw, A., Wiener, M., 2002. Classification and regression by randomforest. The Newsletter of the R Project. 2. pp. 18–22.

McBratney, A.B., Santos, M.L.M., Minasny, B., 2003. On digital soil mapping. Geoderma 117 (1–2), 3–52.

Milligan, P.R., Franklin, R., Ravat, D., 2004. A New Generation Magnetic Anomaly Grid Database of Australia (MAGDA) - Use of Independent Data Increases the Accuracy of Long Wavelength Components of Continental-scale Merges. Australian Society of Exploration Geophysicists, Perth.

Minty, B., Franklin, R., Milligan, P., Richardson, M., Wilford, J., 2009. The radiometric map of Australia. Explor. Geophys. 40 (4), 325–333.

Nauman, T.W., Thompson, J.A., 2014. Semi-automated disaggregation of conventional soil maps using knowledge driven data mining and classification trees. Geoderma 213, 385–399.

Nelson, M.A., Odeh, I.O.A., 2009. Digital soil class mapping using legacy soil profile data: a comparison of a genetic algorithm and classification tree approach. Aust. J. Soil Res. 47 (6), 632–649.

Odgers, N.P., Sun, W., McBratney, A.B., Minasny, B., Clifford, D., 2014. Disaggregating and harmonising soil map units through resampled classification trees. Geoderma 214, 91–100.

Prescott, J.A., 1950. A climatic index for the leaching factor in soil formation. J. Soil Sci. 1 (1), 9–19.

R Development Core Team, 2008. R: A language and environment for statistical computing. Vienna, Austria.

Rad, M.R.P., Toomanian, N., Khormali, F., Brungard, C.W., Komaki, C.B., Bogaert, P., 2014. Updating soil survey maps using random forest and conditioned latin hypercube sampling in the loess derived soils of Northern Iran. Geoderma 232, 97–106.

Shi, X.Z., Yu, D.S., Yang, G.X., Wang, H.J., Sun, W.X., Du, G.H., Gong, Z.T., 2006. Cross-reference benchmarks for translating the genetic soil classification of China into the Chinese soil taxonomy. Pedosphere 16 (2), 147–153.

Staff, Soil Survey, 2014. Keys to Soil Taxonomy. USDA National Resources Conservation Services, Washington DC.

Teng, H.F., Rossel, R.A.V., Shi, Z., Behrens, T., Chappell, A., Bui, E., 2016. Assimilating satellite imagery and visible-near infrared spectroscopy to model and map soil loss by water erosion in Australia. Environ. Model. Softw. 77, 156–167.

Triantafilis, J., Gibbs, I., Earl, N., 2013. Digital soil pattern recognition in the lower Namoi valley using numerical clustering of gamma-ray spectrometry data. Geoderma 192, 407–421.

Vaysse, K., Lagacherie, P., 2015. Evaluating digital soil mapping approaches for mapping GlobalSoilMap soil properties from legacy data in Languedoc-Roussillon (France). Geoderma Reg. 4, 20–30.

Viscarra Rossel, R.A., 2011. Fine-resolution multiscale mapping of clay minerals in Australian soils measured with near infrared spectra. J. Geophys. Res. Earth Surf. 116, F04023.

Viscarra Rossel, R.A., Behrens, T., Ben-Dor, E., Brown, D.J., Dematte, J.A.M., Shepherd, K.D., Shi, Z., Stenberg, B., Stevens, A., Adamchuk, V., Aichi, H., Barthes, B.G., Bartholomeus, H.M., Bayer, A.D., Bernoux, M., Bottcher, K., Brodsky, L., Du, C.W., Chappell, A., Fouad, Y., Genot, V., Gomez, C., Grunwald, S., Gubler, A., Guerrero, C., Hedley, C.B., Knadel, M., Morras, H.J.M., Nocita, M., Ramirez-Lopez, L., Roudier, P., Campos, E.M.R., Sanborn, P., Sellitto, V.M., Sudduth, K.A., Rawlins, B.G., Walter, C., Winowiecki, L.A., Hong, S.Y., Ji, W., 2016. A global spectral library to characterize the world's soil. Earth-Sci. Rev. 155, 198–230.

Viscarra Rossel, R.A., Bui, E.N., 2016. A new detailed map of total phosphorus stocks in Australian soil. Sci. Total Environ. 542, 1040–1049.

Viscarra Rossel, R.A., Bui, E.N., de Caritat, P., McKenzie, N.J., 2010. Mapping iron oxides and the color of Australian soil using visible-near-infrared reflectance spectra. J. Geophys. Res. Earth Surf. 115, F04031.

Viscarra Rossel, R.A., Chen, C., 2011. Digitally mapping the information content of visible-near infrared spectra of surficial Australian soils. Remote Sens. Environ. 115 (6), 1443–1455.

Viscarra Rossel, R.A., Chen, C., Grundy, M.J., Searle, R., Clifford, D., Campbell, P.H., 2015. The Australian three-dimensional soil grid: Australia's contribution to the GlobalSoilMap project. Soil Res. 53 (8), 845–864.

Viscarra Rossel, R.A., Webster, R., 2011. Discrimination of Australian soil horizons and classes from their visible-near infrared spectra. Eur. J. Soil Sci. 62 (4), 637–647.

Viscarra Rossel, R.A., Webster, R., 2012. Predicting soil properties from the Australian soil visible-near infrared spectroscopic database. Eur. J. Soil Sci. 63 (6), 848–860.

Viscarra Rossel, R.A., Webster, R., Bui, E.N., Baldock, J.A., 2014. Baseline map of organic carbon in Australian soil to support national carbon accounting and monitoring under climate change. Glob. Chang. Biol. 20 (9), 2953–2970.

Werban, U., Bartholomeus, H., Dietrich, P., Grandjean, G., Zacharias, S., 2013. Digital soil mapping: approaches to integrate sensing techniques to the prediction of key soil properties. Vadose Zone J. 12 (4).

Wilson, B.P., 2005. Classification issues for the hydrosol and organosol soil orders to better encompass surface acidity and deep sulfidic horizons in acid sulfate soils. Aust. J. Soil Res. 43 (5), 629–638.

Xu, T., Hutchinson, M.F., 2011a. 3 second SRTM Derived Digital Elevation Models User Guide. Geoscience Australia. http://www.ga.gov.au/topographic-mapping/digitalelevation-data.html.

Xu, T., Hutchinson, M.F., 2011b. ANUCLIM Version 6.1. Fenner School of Environment and Society. The Australian National University, Canberra.

Yang, L., Jiao, Y., Fahmy, S., Zhu, A.X., Hann, S., Burt, J.E., Qi, F., 2011. Updating conventional soil maps through digital soil mapping. Soil Sci. Soc. Am. J. 75 (3), 1044–1053.