Geoderma 89 Ž1999. 67–94
Spatial prediction of soil properties using environmental correlation Neil J. McKenzie
a,)
, Philip J. Ryan
b
a
b
CSIRO Land and Water, GPO Box 1666, Canberra, ACT 2601, Australia CSIRO Forestry and Forest Products, PO Box E4008, Kingston, ACT 2604, Australia Received 15 December 1997; accepted 20 November 1998
Abstract Conventional survey methods have efficiencies in medium to low intensity survey because they use relationships between soil properties and more readily observable environmental features as a basis for mapping. However, the implicit predictive models are qualitative, complex and rarely communicated in a clear manner. The possibility of developing an explicit analogue of conventional survey practice suited to medium to low intensity surveys is considered. A key feature is the use of quantitative environmental variables from digital terrain analysis and airborne gamma radiometric remote sensing to predict the spatial distribution of soil properties. The use of these technologies for quantitative soil survey is illustrated using an example from the Bago and Maragle State Forests in southeastern Australia. A design-based, stratified, two-stage sampling scheme was adopted for the 50,000 ha area using digital geology, landform and climate as stratifying variables. The landform and climate variables were generated using a high resolution digital elevation model with a grid size of 25 m. Site and soil data were obtained from 165 sites. Regression trees and generalised linear models were then used to generate spatial predictions of soil properties using digital terrain and gamma radiometric survey data as explanatory variables. The resulting environmental correlation models generate spatial predictions with a fine grain unmatched by comparable conventional survey methods. Example models and spatial predictions are presented for soil profile depth, total phosphorus and total carbon. The models account for 42%, 78% and 54% of the variance present in the sample respectively. The role of spatial dependence, issues of scale and landscape complexity are discussed along with the capture of expert knowledge. It is suggested that environmental correlation models may form a useful trend
)
Corresponding author. Fax: q61-2-6246-5965; E-mail:
[email protected]
0016-7061r99r$ - see front matter q 1999 Elsevier Science B.V. All rights reserved. PII: S 0 0 1 6 - 7 0 6 1 Ž 9 8 . 0 0 1 3 7 - 2
68
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
model for various forms of kriging if spatial dependence is evident in the residuals of the model. q 1999 Elsevier Science B.V. All rights reserved. Keywords: digital terrain model; digital elevation model; soil sampling; soil survey; soil carbon; soil depth; total phosphorus; classification and regression trees; spatial prediction
1. Introduction Geostatistical methods for quantitative soil survey have proven useful at large scales but their utility at intermediate and smaller scales is less clear Ž Webster, 1997.. Conventional survey methods are apparently more efficient at these smaller scales because they use relationships between soil properties and more readily observed environmental features as a basis for mapping. These relationships are derived from qualitative and complex mental models developed by the pedologist during field survey. Unfortunately, the mental models are rarely communicated and users of surveys find it difficult to separate evidence from interpretation Ž Austin and McKenzie, 1988; Hudson, 1992; Hewitt, 1993; Webb, 1994.. There is a need for quantitative survey methods applicable at intermediate scales Ž i.e., equivalent to cartographic scales from 1:50,000 to 1:100,000.. One possibility is to integrate conventional and quantitative survey methods. For example, mapped polygons can be used for stratification prior to, or during, geostatistical analysis Ž e.g., Voltz and Webster, 1990. . This has the advantage of incorporating the qualitative process knowledge of the pedologist into spatial prediction as well as providing a more realistic portrayal of soil variation that can be both continuous and discontinuous. In this study, another possibility is considered and it involves developing an explicit analogue of conventional survey practice. Several studies have shown that useful predictive relationships exist between quantitative environmental variables and soil properties Že.g., McKenzie and Austin, 1993; Odeh et al., 1994; Gessler et al., 1995. . Some of the most promising environmental variables are those generated by digital terrain analysis Ž Wilson and Gallant, in press. and gamma radiometric remote sensing ŽCook et al., 1996a.. Digital terrain analysis allows the generation of a suite of variables that reflect geomorphic, climatic and hydrologic processes. Gamma radiometric remote sensing captures parent material differences, most notably particle size and mineralogy in the upper portion of the solum. The use of these technologies requires careful registration between field observations and other digital coverages. This is now readily achieved through the use of differential global positioning systems. Our purpose here is to test whether quantitative environmental variables can account for soil variation across a region of 50,000 ha. The resulting model may then form a trend model for use in combination with various forms of kriging and co-kriging Že.g., Odeh et al., 1994, 1995; Knotters et al., 1995.. The
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
69
incorporation of the environmental correlation model into such a geostatistical analysis is not dealt with in this paper. 2. Survey design 2.1. Purpose The Bago–Maragle study area ŽE 1488 15X S 358 45X . is adjacent to the Snowy Mountains in southern New South Wales, Australia Ž Fig. 1. . The centre of the study area is comprised of an elevated rolling plateau with alpine and sub-alpine vegetation. The northern, western and southern margins support tall hardwood forests containing Eucalyptus delegatensis R.T. Baker Ž Alpine Ash., which is a valuable commercial species. These areas are steeper and have greater relief because of actively incising streams. The eastern margin is dominated by the Gilmour Fault Ž Stuart-Smith, 1991. with its more resistant metamorphosed rocks forming a bulwark to stream incision. This study involves the survey of 50,000 ha of forested land where soil information is needed for: predicting soil qualities affecting land management
Fig. 1. Location of the Bago–Maragle study area in southeast Australia.
70
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Že.g., erosion hazard, trafficability, etc..; predicting hardwood forest productivity across the complete area; and establishing a monitoring network to determine the long term impact of forest management. 2.2. ProÕisional pedogenic model Quantitative soil survey using environmental correlation involves the iterative development of predictive models for the study area. The great complexity and range of spatial and temporal scales over which soil forming processes operate makes the development of models for spatial prediction that are quantitative, mechanistic and mathematical an almost impossible task in routine soil survey although there have been some notable attempts Že.g., Dietrich et al., 1995.. Despite the complexity, a simplifying hypothesis is necessary and reliance has to be placed on approximate local models of pedogenesis with varying levels of empiricism Ž McKenzie et al., in press. . At the outset, a provisional pedogenic model was required to allow the development of a sampling plan. There were no previous surveys of the Bago–Maragle area and, as a consequence, experience gained from other landscapes was used to propose a provisional pedogenic model. The model is summarised in Table 1. It draws heavily on the functional factorial approach by Jenny Ž1941, 1980. and is clearly a gross simplification. Some possible factors controlling pedogenesis in the study area do not have predictors that are readily observable and hence suited for spatial prediction Ž e.g., biomechanical impacts, aeolian accession and disturbance caused by fire and tree fall. . Several of these factors can generate large variation over short length scales. Prediction at this fine grain across extensive areas is rarely required but the processes may generate a large nugget variance. It was also suspected that the elevated and undulating lands in the study area were remnant ancient surfaces Ž Tertiary or older. with complex polygenetic soils. 2.3. Sampling plan Sampling has been a weak feature of conventional soil survey and purposive schemes have dominated. In the Bago–Maragle survey, we used a design-based, stratified, two-stage sampling plan Ž Brus and De Gruijter, 1997. with geology, landform and climate stratifying variables. The following description of the sampling plan is from Ryan et al. Ž1996. and McKenzie et al. Ž in press. . 2.3.1. Geology Generalised geology maps at a cartographic scale of 1:100,000 for the Bago–Maragle study area were digitised. The geology of the area was remapped during the course of the study using conventional field methods supported by airborne gamma radiometric remote sensing Ž Bierwirth, 1996; Cook et al.,
Table 1 Key variables considered initially to be controlling pedogenesis and soil distribution across the Bago–Maragle study area
Climate Temperature Wetness Parent material Substrate Aeolian accession Erosion and deposition Time Age of landsurface
Significance and interpretation
Potential environmental correlates
Nature and rate of weathering and biological activity Nature and rate of weathering and biological activity
Climate surfaces Terrain attributes and climate surfaces
Influences weathering products including mineralogy and particle size Modifies the influence of substrate Influences soil depth
Geological maps and geophysical remote sensing Geophysical remote sensing and terrain attributes Terrain attributes and possibly geophysical remote sensing
Older lower relief areas may not reflect contemporary environmental conditions
Empirical correlations with terrain attributes
Biomechanical impact
Local scale disturbance Že.g., earthworms, termites, ants, etc.. may control profile form and function
No effective predictors
Organisms (Õegetation)
Nutrient cycling and local patterning
Multi-spectral remote sensing
Disturbance
Fire and tree fall
No effective predictors
Endogenous Õectors
Inherited profile features may create feedback loops Že.g., pedogenic pans controlling profile hydrology.
Empirical correlations with terrain attributes
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Variable influencing pedogenesis
71
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
72
1996a.. Ninety-five percent of the Bago–Maragle area is covered by four rock types Žgranodiorite, adamellite, basalt and meta-sediments. and these formed an obvious first stratifying variable. 2.3.2. Climate A high resolution DEM with a grid size of 25 m was developed for the Bago–Maragle study area using digital contours, streamlines and spot heights from the eight 1:25,000 topographic map sheets covering the area. The DEM was generated using the ANUDEM computer program Ž Hutchinson, 1989a. . A powerful and under-utilized application of terrain analysis in soil survey is the spatial prediction of climatic averages at local and more regional scales. In the Bago–Maragle study area, broad trends in rainfall and temperature are correlated with elevation Ž Fig. 2. . The ESOCLIM computer program Ž Hutchinson, 1989b. uses long term climate records and relationships with elevation to generate predictive surfaces of monthly climatic variables including rainfall and mean temperature. Climatic variation at a more local scale in the area is caused by aspect differences and these were modelled using the SRAD program ŽWilson and Gallant, 1999. within TAPESG Ž Moore, 1992; Gallant and Wilson, 1996. to provide estimates of net solar radiation Ž Fig. 3. . There are many ways to integrate rainfall, temperature and radiation data to provide indices of local climate. Simple indices of the water balance are amongst the most useful because they integrate the biologically important effects of rainfall and evaporation. They give an indication of the intensity of leaching and provide a relative measure of potential biological productivity. A useful estimate of water balance is provided by the Prescott Index Ž Prescott, 1948. . It is calculated on a monthly basis and an annual average can be used to compare sites. The Prescott Index is: PI s
0.445P E 0.75
Ž1.
where PI s the index, P s mean monthly Žannualr12. rainfall Ž mm. and E s mean monthly Žannualr12. potential evaporation Ž mm. . The original index used empirical units hence the coefficient of 0.445. As noted above, P was derived directly from the ESOCLIM surface and E was estimated using the Priestly–Taylor Model Ž Priestly and Taylor, 1972. where: s LE s a Ž R qS.. Ž2. sqg n L s latent heat of vaporisation of water, E s potential evaporation, a s a constant Ž 1.26. , s s slope of the saturation vapour pressure curve at the mean wet bulb temperature, g s psychrometric constant, R n s net radiation and S s soil heat flux.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Fig. 2. Mean annual temperature Ž8C. and rainfall Žmm. surfaces for the Bago–Maragle study area. 73
74
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Fig. 3. Average net solar radiation ŽWmy2 . for the Bago–Maragle study area.
S can be neglected for periods greater than 24 h and is a minor component of the energy balance when averaged over a year. The term srŽ s q g . is tabulated as a function of temperature by Slatyer and McIlroy Ž 1961. and is well approximated by the following regression equation: s s 0.3943 q 0.01691T y 0.0001349T 2 R 2 s 0.99938 Ž3. sqg with mean annual temperature Ž T . being obtained from the ESOCLIM surface.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
75
R n was generated using the SRAD program with the units of Wmy2 Ž i.e., J sy1 my2 .. A conversion factor is required to estimate evaporation in mm dayy1. Taking L to be 580 cal gy1 Ž 2427 = 10 6 J my3 . and expressing R n on a daily rather than a per second basis, the relationship for estimating evaporation is: E s Ž 0.01768 q 0.0007585T y 0.00000605T 2 . R n .
Ž4.
Surfaces of average monthly evaporation and rainfall calculated on an annual basis were generated for the study area and used to derive the Prescott Index ŽFig. 4.. The Prescott Index is useful because it provides a simple estimate of microclimate across the study area at a fine grain that has not been available for survey purposes to date. It provides a cheap estimate of relative wetness suitable for predicting soil properties. 2.3.3. Landform Local landform has a major impact on soils by controlling water and sediment movement. The Compound Topographic Index ŽCTI., also known as the Topographic Wetness Index Ž Wilson and Gallant, in press., was calculated using the 25 m resolution digital elevation model. The CTI is a useful integrative topographic variable that is a guide to water and sediment movement in particular landscapes. It quantifies the position of a site in the landscape and has proven useful for predicting soil properties Ž e.g., Moore et al., 1993; Gessler et al., 1995. . It is defined as: CTI s ln
Ac
ž / tan b
Ž5.
where A c is the specific contributing area expressed as m2 per unit width orthogonal to the flow direction and b is the slope angle. Slope and specific contributing area were calculated using the TAPESG computer program Ž Moore, 1992; Gallant and Wilson, 1996. . A c is calculated from the contributing area and the latter was limited to a maximum value Ž - 100,000 m2 . to prevent the generation of very large values along major streams. A small value Ž 0.01%. was added to grid cells with zero slopes to avoid a denominator of zero. Slopes above 300% Ž lower limit for cliffs in McDonald et al., 1990. were set to that value. The CTI exhibits different scales of variation to the Prescott Index and the calculated surface for the study area is shown in Fig. 5. 2.3.4. Stratification, exclusion rules, classification and randomization Exclusion rules were applied to avoid sites of limited value; for example, cleared land, roads, lakes and stream beds. The geology, climate ŽPI. and landform Ž CTI. surfaces were then classified and used to generate the digital stratification as follows.
76
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Fig. 4. Prescott Index for the Bago–Maragle study area.
Density functions of CTI and the PI were used to calculate quantiles on an equal area basis. Each of the approximately 4 = 10 6 cells was then classified into one of three quantiles for PI and one of four quantiles for CTI. This produced 12 discrete environments for each geological class. Fig. 6 shows classified CTI, PI and the combined 12-class coverages for a 5 = 5 km area in the adamellite geologic unit.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
77
Fig. 5. Compound Topographic Index for the Bago–Maragle study area.
Discrete environments, or patches, were selected randomly within each of the 12 classes. Patches smaller than one hectare were excluded because they are difficult to locate in the field. Sites Ž individual cells. within each patch were then randomly selected. Replicate sites were selected in each of the 12 environments within each parent material class. The granodiorite class had an extra replicate because of its significance for forest production. As a result, the classes
78
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Fig. 6. Outline of the design-based, stratified, two stage sampling plan.
are not of equal size and the inclusion probabilities for sample sites differ between geologic units but remain known.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
79
2.4. Field measurement Accurate location of sample sites is essential for registration of field observations and environmental variables derived from terrain analysis and other sources. Sample sites were located to within 2 m using a differential global positioning system ŽGPS. with local base stations tied to the NSW Geodetic Grid. Real-time differential GPS is preferable but was not possible due to poor configurations of the satellite constellation at the time of the sampling. Navigation to the predetermined points was therefore undertaken on foot using a compass and hip-chain—some sites were several kilometres from the nearest road access. The standard deviation of the distance between the predetermined point and the actual location of the field observation was 28 m. This is a good result given the rugged terrain, forest vegetation and often long distances involved. Comprehensive site and profile data were collected. Soil morphology was described to a depth of at least three metres wherever possible according to McDonald et al. Ž1990. and Abraham and Abraham Ž 1992. . Soil samples were collected for comprehensive chemical analysis on all horizons. Only soil depth, total phosphorus and total carbon data are presented here. Total phosphorus was determined using a Kjeldahl acid digest ŽHeffernan, 1985. . Total carbon was determined using a Leco CHN-1000 Elemental Analyser. Bulk density was determined on major horizons and used to calculate the mass of carbon and phosphorus in the upper 1.00 m on an areal basis Ž T hay1 .. 2.5. Supplementary sampling A supplementary set of purposive samples was obtained after the main phase of sampling. The intention was to include sites of perceived pedologic importance that were inadequately sampled during the first phase. Most significant was the addition of sites in elevated swamps because of their stratigraphic record and the inclusion of sites that had extreme or anomalous gamma radiometric signatures. The gamma radiometric remote sensing became available after the first phase of sampling. Twenty one supplementary sites were sampled.
3. Models for environmental correlation 3.1. Explanatory Õariables for enÕironmental correlation Environmental correlation models have a form similar to Jenny’s functional factorial model but several terms differ: Si s f Ž Cl i ,Ti ,PM i , Mi . . . .
Ž6.
80
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Table 2 Environmental explanatory variables available across the Bago–Maragle study area Žitalicized variables have been used in the models presented here. Environmental variable
Description
Source or reference
Elevation
Elevation above sea level
DEM from 1:25,000 cartography
RelatiÕe eleÕation
Absolute value of cell elevation minus mean elevation in 625 m diameter circle Absolute range in elevation within 625 m circle centred on the cell Degrees clockwise from north ŽTAPESG. Measured in percent ŽTAPESG. Generated using a 3=3 window in TAPESG Generated in TAPESG Generated in TAPESG Generated in TAPESG CAr Žunit contour or cell width. Generated in TAPESG
Relief
Aspect Slope Plan curÕature Profile curvature Tangential curvature Contributing area ŽCA. Specific contributing area Ž A c . Compound Topographic Index ŽCTI. Dispersal area ŽDA. Specific dispersal area ŽA d . Degradation Index ŽDI. Stream Power Index Up-slope means for slope, plan curvature, profile curvature and tan curvature Down-slope means for slope, plan curvature, profile curvature and tan curvature Erosion Index Mean annual rainfall Mean annual temperature Net radiation Ž R n .
Generated in TAPESG with an inverted DEM DAr Žunit contour or cell width. Down-slope equivalent of CTI DI s InŽA d =Slope. SPI s lnŽA c =Slope. All up-slope averages generated using UPSUMG Žpart of TAPESG. Generated using UPSUMG on the inverted DEM Generated using the EROS software Generated using ESOCLIM and climate records Generated using ESOCLIM and climate records Generated using the SRAD software
McDonald et al. Ž1990.
Gallant and Wilson Ž1996. Gallant and Wilson Ž1996. Gallant and Wilson Ž1996. Gallant and Wilson Ž1996. Gallant and Wilson Ž1996. Gallant and Wilson Ž1996. Gallant and Wilson Ž1996. Moore et al. Ž1993.
Gallant and Wilson Ž1996.
Wilson and Gallant Ž1996. Hutchinson Ž1989b. Hutchinson Ž1989b. Wilson and Gallant Ž1999.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
81
Table 2 Žcontinued. Environmental variable
Description
Source or reference
Prescott Index
Calculated in ARCr GRID using inputs from SRAD and ESOCLIM Gamma radiometric counts derived from airborne geophysical survey Derived from airborne geophysical survey 3=3 focal mean Normalised Difference Vegetation Index calculated from Landsat TM Bands 3 and 4 ŽŽB4yB3.rŽB4qB3.. Digitized 1:100,000 geology map
See text
Radiometric potassium ŽK., thorium, uranium and total dose ŽDOSE. Magnetic intensity Landsat TM bands 1–7 NDVI
Geologic unit
Australian Geological Survey Organization Australian Geological Survey Organization Kumar and Monteith Ž1982.
where for each site, Si is the soil property or land quality observed in the field. Cl i , Ti and PM i are explanatory variables from spatial coverages of climate, terrain and parent material respectively. Mi represents other miscellaneous environmental predictors that may be available for a survey area Ž e.g., multispectral remote sensing, management operations, etc... For a cost-effective survey, the explanatory variables must be easier to obtain than soil variables otherwise intensive sampling of soil variables combined with an interpolation or surface fitting procedure would be more efficient for spatial prediction. The explanatory variables available for the Bago–Maragle study area are listed in Table 2. 3.2. Statistical models A range of data analysis methods can be applied to develop models for spatial prediction using environmental correlation. These include Bayesian rule-based systems ŽCook et al., 1996b; Skidmore et al., 1996. , neural nets, fuzzy logic ŽXhu et al., 1997., generalized linear models Ž McKenzie and Austin, 1993; Gessler et al., 1995., tree-based methods and co-kriging Že.g., Odeh et al., 1994.. Our focus here is on tree-based methods Ž Breiman et al., 1984; Clark and Pregibon, 1992. and generalized linear models Ž McCullagh and Nelder, 1989. . A thorough analysis of the advantages of different strategies for environmental correlation has yet to be done in soil survey although Austin et al. Ž 1995. have undertaken such a study for vegetation prediction. They concluded that a combination of generalized linear models and generalized additive models was superior to tree-based procedures but all were acceptable for practical applica-
82
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
tions. Neural nets were unsatisfactory because of the difficulty of interpretation and requirement for specialised skills. They also noted that the use of suitable environmental variables was more important than the choice of prediction method. The generality of these results is not clear because soil data are often noisy and conditional relationships appear more common. Odeh et al. Ž 1995. have examined various strategies for predicting soil properties using co-kriging and regression kriging. Tree-based models are fitted by successively splitting a dataset into increasingly homogenous subsets. The response variable can be either a factor Ž classification trees. or a continuous variable Ž regression trees. and the explanatory variables can be of either type. Output from the model fitting process is a decision tree and a measure of goodness of fit is provided by the reduction in variance. Clark and Pregibon Ž1992. note that compared to classical regression methods, tree-based models are sometimes easier to interpret and discuss when a mix of continuous and discrete variables are used as predictors. Missing values are dealt with in a more satisfactory manner and multinomial response variables Že.g., soil types. can be used. However, the most significant advantage of tree-based models is the capacity to model non-additive and non-linear relationships in a relatively simple way. This is particularly useful for pedologic data where interactions between the response variable and environmental explanatory variables are often conditional on other explanatory variables. The most significant disadvantage of tree-based methods is the lack of a well accepted procedure for statistical inference. The reduction in variance provides a useful measure of goodness-of-fit and results of cross-validation are also useful Žsee below.. Overfitting of trees should be avoided and rules are required for selecting an optimal tree. In this work, cross-validation has been used to determine the predictive ability of trees with decreasing numbers of nodes ŽBreiman et al., 1984; Venables and Ripley, 1994. . At each level, trees with increasing numbers of terminal nodes were fitted 20 times with 5% of the data randomly selected and withheld to provide a test of the predictive strength of the model. The optimal number of nodes was defined by the model that yielded the smallest prediction error for the response variable. Stable models exhibit a clear and consistent minimum for repeated cross-validation runs. Generalized linear models can also be used for prediction and they have the advantage of being able to use explanatory variables that are continuous Ž e.g., CTI. or nominal Ž e.g., rock type.. As a consequence, predictions are more realistic because they portray soil variation as being gradual or discontinuous. The response variable of a generalized linear model can be continuous or binary. 3.3. Spatial prediction of soil profile depth, total phosphorus and total carbon A regression tree predicting soil profile depth is presented in Fig. 7. Crossvalidation indicated that a tree with five terminal nodes was optimal. The tree
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
83
Fig. 7. Regression tree for the prediction of soil profile depth Žm.. The predicted depths are presented in the rectangles and ellipses. Numbers under the ellipses and rectangles are variances.
model accounted for a modest 42% reduction in variance compared to a null model Ži.e., a single mean. . Despite this, the fitted model has a simple process interpretation. The study area is characterised by deep soils with an average depth of 2.10 m. The length of branches in Fig. 7 is approximately proportional to the variance accounted for by the split at the parent node. The first split of the tree in Fig. 7 is based on slope with soils on steeper slopes Ž) 5%. being shallower Ž1.92 m. than those on more gentle slopes Ž- 5% and 3.20 m predicted depth.. The gently inclined sites are then split on the basis of CTI. Those with a smaller CTI Ž- 10.3. have shallower soils Ž2.95 m. and tend to be higher in the landscape on divergent slopes. Sites with a CTI ) 10.3 Ži.e., sites with convergent flows and lower landscape positions. have very deep soils Ž 4.85 m.. The sites on steeper slopes Ž) 5%. are split on the basis of relative elevation. Sites with higher local relief Ži.e., more dissected terrain. have shallower soil depths Ž1.72 m. . The lower relief sites have deeper soil Ž 2.4 m. . This subgroup is further divided on slope with sites - 28% having deeper soil Ž2.5 m. than
84
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
steeper Ž) 28%. sites Ž 1.04 m.. Sites with high relative elevation on the right hand side of the tree have a further subdivision on the basis of mean annual
Fig. 8. Predicted soil profile depth Žm. using the regression tree from Fig. 7.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
85
temperature. This is the most empirical variable in the model and is acting as a surrogate for weathering intensity and profile age. Steeper sites with higher temperatures are drier, occur at low elevations and occupy the most dissected parts of the study area—they have the shallowest predicted depths Ž0.93 m. . The predicted soil profile depth across the study area is presented in Fig. 8—it uses the model from Fig. 7. A regression tree model predicting total phosphorus ŽT hay1 . summed over the top 1.0 m of the soil profile is presented in Fig. 9. Cross-validation indicated a tree with five terminal nodes to be optimal and it accounts for 78.5% of the variance. The first split uses potassium concentration from the air-borne gamma radiometric remote sensing—it accounts for a large proportion of the variance and identifies basalt and depositional areas with high levels of phosphorus. These areas have low radiometric potassium Ž- 0.84. and are further subdivided using landform. Higher total phosphorus is associated with basalt areas having greater relief Ž) 38 m.. Sites in this category with an average downslope slope less than 7% tend to be zones of sediment accumulation and have a large total phosphorus value Ž 23.97 T hay1 . . The remaining areas with steeper slopes
Fig. 9. Regression tree for the prediction of total soil phosphorus in the upper 1.00 m ŽT hay1 .. The predicted values are presented in the rectangles and ellipses. Numbers under the ellipses and rectangles are variances.
86
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Fig. 10. Predicted total soil phosphorus for the upper 1.00 m ŽT hay1 . using the regression tree from Fig. 9.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
87
below the site are finally subdivided using plan curvature. Plan curvatures greater than 0.8 are on divergent slopes and have lower total phosphorus Ž 7.63 T hay1 . than sites on convergent slopes Ž plan curvature - 0.8 and total phosphorus 15.04 T hay1 . . The latter sites usually have larger organic matter contents Žsee below. and this contributes to the total phosphorus content. The predicted total phosphorus surface is presented in Fig. 10. A standard least squares regression model Ž i.e., generalized linear model with an identity link and normal error model. predicting lnŽtotal carbon., summed to 1.0 m, is presented in Eq. Ž7. and Table 3. The model accounts for 54% of the variance. ln Ž total carbon. s 6.444 y 2.668 Ž Prescott Index.
y1
y 0.009DOSE
q 1.599NDVI y 0.0091ln Ž Dispersal Area . q 0.046 Ž Plan Curvature.
Ž7.
The model has a straightforward interpretation. The Prescott Index accounts for most variation. Locations with a larger Prescott Index have a higher predicted carbon content and they are typically elevated Žwetter and less evaporation. or with a cool Žsoutheastern. aspect. A negative relationship between total radiometric count and predicted carbon arises because locations with a low count are invariably basaltic areas or upland swamps. The large predicted carbon content for the areas of basalt arises primarily because of their relative fertility particularly with regard to phosphorus. The higher iron content of the soils may allow more effective retention of carbon in the strongly leaching environment. NDVI was calculated from a Landsat TM scene and is indicative of leaf area and forest productivity more generally. Sites with a large dispersal area Ži.e., erosional sites high in the landscape on crests and ridges. and positive plan curvature Ži.e., divergent slopes. have lower predicted carbon contents. The predicted total carbon surface is presented in Fig. 11.
Table 3 Summary of the least squares regression model for predicting lnŽtotal carbon. summed to 1.00 m .y1
ŽPrescott Index DOSE NDVI lnŽdispersal area. Plan curvature Residuals
d.f.
Sum of squares
Mean square
F value
PrŽ F .
1 1 1 1 1 157
9.689 2.483 0.718 0.626 0.611 12.124
9.689 2.483 0.718 0.626 0.611 0.077
125.37 32.13 9.29 8.10 7.91
0.0000 0.0000 0.0027 0.0050 0.0055
One observation has been deleted from the analysis because of a local defect in the DEM and another due to the presence of a rotting log in the profile caused by logging operations.
88
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Fig. 11. Predicted total carbon for the upper 1.00 m ŽT hay1 . using an ordinary least squares regression ŽEq. Ž7...
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
89
4. Discussion 4.1. Integration with geostatistical analysis The presence of spatial dependence in soil data suggests that some form of kriging can be profitably used for spatial prediction. The environmental explanatory variables used here have surfaces with different levels of smoothness and therefore contrasting degrees of spatial dependence Ž Figs. 2–5. . Any spatial dependence present in the soil data may be accounted for by these environmental variables when they are included in a tree-based or generalised linear model. The degree to which spatial dependence in the soil variables is captured through correlation with the spatially dependent environmental explanatory variables will determine the value of subsequent geostatistical analysis. These issues will be explored in a subsequent paper. However, it should be noted that the development of reliable variograms is difficult for the Bago–Maragle study area where sampling intensity is low and a range of soil–landscapes exist Ž e.g., residual plateaux, erosional slopes, alluvial flats. with contrasting pedogenic processes and presumably different scales and patterns of spatial dependence. The relatively low sampling intensity may result in sample points being only weakly spatially dependent and the advantages for kriging are therefore diminished. 4.2. Issues of scale The scales at which processes of soil formation operate should ideally guide the scales used for measurement of soil and environmental variables. In practice this is rarely possible due to logistic constraints. For example, in this study there is a mismatch in the scale, or support, for soil measurements Ž specimen volumes of less than 0.5 m3 . and those for the digital elevation model Žgrid cell area of 625 m2 . , NDVI Ž 900–810,000 m2 . and radiometrics Ž resolution ; 10,000 m2 .. There are many issues relating to scale but we will restrict our comments to some recent observations on the scale of soil forming processes and the resolution of digital elevation models. The predictive relationships for soil profile depth, total carbon and total phosphorus demonstrate a common feature of environmental correlation statistical models. Both site scale Ž e.g., plan curvature, slope, mean annual temperature. and hillslope or more regional scale Ž e.g., relative elevation, downslope slope, relief, CTI. variables appear as explanatory variables. Gessler et al. Ž1995. worked in much smaller watersheds and obtained similar results with CTI and plan curvature being useful predictors. The CTI represented processes at the hillslope scale with length scales in the order of 200–400 m while plan curvature was calculated using a 60 = 60 m window and represented more local scale processes controlling soil patterns.
90
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
The local, hillslope and more regional scales of variation captured within the models of environmental correlation are in accord with field experience—the factors controlling soil distribution are multi-scaled. The significant advance provided by high resolution digital elevation models and gamma radiometric remote sensing is the capacity to predict soil variation at a fine grain across extensive areas. Conventional soil surveys cannot readily portray predicted variation of soils at length scales from approximately 50 to 500 m. Clearly, the appropriate resolution of a digital elevation model will depend on the scale of the processes controlling soil formation and this will be strongly landscape dependent. There are few guidelines at present although Gessler Ž1996. found that a grid cell resolution finer than 40 m was necessary in an erosional landscape. At more coarse resolutions, terrain variables behaved erratically and quickly lost their predictive capacity. McKenzie and Austin Ž1993., working in extremely low relief alluvial landscapes, found good predictive relationships between two dimensional landform profiles and soil patterns but noted that elevation data with a resolution of around 0.2 m were necessary. The resolution of the gamma radiometric remote sensing is also a key factor. Several variables control the fidelity of a gamma radiometric image but most significant are the elevation of the airborne sensor and the distance between flight lines. The Bago–Maragle study area was flown using a detector suspended beneath a helicopter flying at an elevation of 100 m above the landsurface. Flight lines were 200 m apart and this appears to be near the maximum distance if patterns at the hillslope scale are to be well resolved Ž Bierwirth, 1996. . 4.3. Capture of expert knowledge The environmental correlation models presented here can be used for spatial prediction and they can form the basis for a more scientific approach to soil survey. However, they may not utilize all of the predictive capacity in the intuitive mental models used in conventional survey. McKenzie et al. Ž in press. note that an experienced pedologist usually draws on relationships between soils and environmental variables gained in other landscapes when mapping a new area. It is difficult to include this type of information in the explicit models presented here although we used such information to make decisions in this study Že.g., in relation to the sampling plan. . Note that rule-based systems Že.g., Cook et al., 1996b. have the capacity to more thoroughly integrate qualitative information into a predictive system. Another facet of conventional practice difficult to incorporate into a more quantitative system is the large amount of semi-continuous field observation incorporated into intuitive mental models. Our quantitative models use data from observations at selected sites. However, many impressions of relationships between soil and more readily observable environmental variables are obtained as the survey area is traversed Ž e.g., opportunistic observations of road cuttings
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
91
and other exposures. . These continuous observations are incorporated into quantitative models to some extent because the general impressions provide insights that influence the selection of explanatory variables and overall model fitting process. The degree to which continuous observations improve an intuitive model has not been investigated. However, these observations may conceivably introduce bias and reduce the quality of spatial prediction Ž for example, roads in the study area follow drier parts of the landscape and impressions gained from road-cuttings may be misleading if they are used to make generalisations for the complete area.. 4.4. Landscape complexity Previous sections have highlighted the potentially complex nature of relationships between soils and terrain. McKenzie and Austin Ž 1993. , for example, found relationships between landform and soils to be more strongly expressed in younger alluvial units than in older landscape units. They suggested that the older units in their study area had been re-worked several times, experienced possible aeolian accessions and undergone strong weathering during the last 100,000 years to effectively obliterate relationships between soils and the contemporary landform. Unpublished particle size, mineralogical and geomorphic data suggests that aeolian accession is an important process across the Bago–Maragle study area and elevated, gently inclined sites on contrasting geologies have unusually similar soils. There is potential for modelling the likely sediment budgets across the area but many simplifying assumptions would be necessary. In some landscapes, relationships between soils and the geometry of the underlying weathering front may be more significant than surface morphometry. McKenzie and Austin Ž1993. also found the presence of impeding layers at shallow depths Ž - 1.5 m. to be a strong predictor of soil properties and the presence of geologic structures such as dykes and sills can control hillslope hydrology and soil patterns Že.g., Engel et al., 1987.. Air-borne magnetic data show that dyke swarms occur in the Bago–Maragle study area but their effect on soil distribution could not be discerned. Finally, there are circumstances where soil variation occurs without relation to readily observable environmental variables. In these cases, detailed sampling is unavoidable and some form of interpolation or surface fitting is required to generate spatial predictions.
5. Conclusions Environmental correlation is a general method for the spatial prediction of individual soil properties, and if desired, soil types. The statistical sampling plan
92
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
used in the Bago–Maragle study ensured a sampling of the environmental variation considered initially to be affecting soil formation and distribution across the area. In contrast to conventional survey methods, the sampling plan is explicit, consistent and repeatable. The explicit prediction of individual properties is another considerable advantage because soil properties often display contrasting scales of variation. The availability of extensive, fine grain environmental data from digital elevation models and gamma radiometric remote sensing allows predictions at local scales in a way not possible with conventional polygon-based survey methods. Environmental correlation models for soil profile depth, total phosphorus and total carbon accounted for 42%, 78% and 54% of the variance present in the sample respectively. Finally, the success of environmental correlation depends on the strength of relationships between soil and environmental variables. To be of any advantage, the environmental variables must be more readily measurable and available for the complete study area. The availability of high resolution digital elevation models and gamma radiometric remote sensing has made environmental correlation feasible in many landscapes.
Acknowledgements Thanks to all members of the Bago–Maragle team including Andrew Loughhead, Linda Ashton, David Jacquier, Peter Leppert, Bob Abell, Phil Bierwirth and John Raison. Supporting funds have been provided by CSIRO, State Forests NSW, Australian Geological Survey Organization and the Forest and Wood Products Research and Development. Several patient referees provided many useful suggestions.
References Abraham, S.A., Abraham, N.A., 1992. Soil Data System Site and Profile Information Handbook. Dept. Conservation and Land Management, NSW, Sydney, Australia. 162 pp. Austin, M.P., McKenzie, N.J., 1988. Data analysis. In: Gunn, R.H., Beattie, J.A., van der Graff, R.H.M. ŽEds.., Australian Soil and Land Survey Handbook. Guidelines for Conducting Surveys, Chap. 11. Inkata Press, Melbourne. Austin, M.P., Meyers, J.A., Belbin, L., Doherty, M.D., 1995. Simulated Data Case Study, Subproject 5, Modelling of Landscape Patterns and Processes Using Biological Data. Division of Wildlife and Ecology, CSIRO, Canberra. Bierwirth, P., 1996. Investigation of airborne gamma-ray images as a rapid mapping tool for soil and land degradation—Wagga Wagga, NSW. Record 1996r22. Australian Geological Survey Organization, Canberra. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey.
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
93
Brus, D.J., De Gruijter, J.J., 1997. Random sampling or geostatistical modelling? Choosing between design-based and model-based sampling strategies for soil. Geoderma 80, 1–59 Žwith discussion.. Clark, L.A., Pregibon, D., 1992. Tree-based models. In: Chambers, J.M., Hastie, T.J. ŽEds.., Statistical Models in S, Chap. 9. Chapman & Hall, New York. Cook, S.E., Corner, R.J., Grealish, G.J., Gessler, P.E., Chartres, C.J., 1996a. A rule based system to map soil properties. Soil Sci. Soc. Am. J. 60, 1893–1900. Cook, S.E., Corner, R.J., Groves, P.R., Grealish, G.J., 1996b. Use of airborne gamma radiometric data for soil mapping. Aust. J. Soil Res. 34, 183–194. Dietrich, W.E., Reiss, R., Hsu, M.-L. Hsu, Montgomery, D.R., 1995. A process-based model for colluvial soil depth and shallow landsliding using digital elevation data. Hydrological Processes 9, 383–400. Engel, R., McFarlane, D.J., Street, G., 1987. The influence of dolerite dykes on saline seeps in southwestern Australia. Aust. J. Soil Res. 25, 125–136. Gallant, J.C., Wilson, J.P., 1996. TAPES-G: a grid-based terrain analysis program for the environmental sciences. Computers and Geosciences 22, 713–722. Gessler, P.E., 1996. Statistical soil–landscape modelling for environmental management. PhD thesis, Australian National University. Gessler, P.E., Moore, I.D., McKenzie, N.J., Ryan, P.J., 1995. Soil–landscape modelling and spatial prediction of soil attributes. Int. J. Geographical Information Systems 4, 421–432. Heffernan, B., 1985. A Handbook of Methods of Inorganic Chemical Analysis for Forest Soils, Foliage and Water. Division of Forest Research, CSIRO, Canberra, pp. 60–65. Hewitt, A.E., 1993. Predictive modelling in soil survey. Soils and Fertilizers 3, 305–315. Hudson, B.D., 1992. The soil survey as a paradigm-based science. Soil Sci. Soc. Am. J. 56, 836–841. Hutchinson, M.F., 1989a. A new method for griding elevation and stream line data with automatic removal of pits. J. Hydrol. 106, 211–232. Hutchinson, M.F., 1989b. A new method for spatial interpolation of meteorological variables from irregular networks applied to the estimation of monthly mean solar radiation, temperature, precipitation and windrun. Tech. Memo. 89r5. Division of Water Resources, CSIRO, pp. 95–104. Jenny, H., 1941. Factors of Soil Formation—a System of Quantitative Pedology. McGraw-Hill, New York. Jenny, H., 1980. The Soil Resource—Origin and Behaviour. Springer-Verlag, New York. Knotters, M., Brus, D.J., Oude Voshaar, J.H., 1995. A comparison of kriging, co-kriging and kriging combined with regression for spatial interpolation of horizon depth with censored observations. Geoderma 67, 227–246. Kumar, M., Monteith, J.L., 1982. Remote sensing of plant growth. In: Smith, H. ŽEd.., Plants and the Daylight Spectrum. Academic Press, London. McCullagh, P., Nelder, J.A., 1989. Generalized linear models, 2nd edn. Chapman & Hall, London. McDonald, R.C., Isbell, R.F., Speight, J.G., Walker, J., Hopkins, M.S., 1990. Australian Soil and Land Survey Field Handbook, 2nd edn. Inkata Press, Melbourne. McKenzie, N.J., Austin, M.P., 1993. A quantitative Australian approach to medium and small scale surveys based on soil stratigraphy and environmental correlation. Geoderma 57, 329–355. McKenzie, N.J., Gessler, P.E., Ryan, P.J., O’Connell, D., in press. The role of terrain analysis in soil mapping. In: Wilson, J.P., Gallant, J.C. ŽEds.., Terrain Analysis: Principles and Applications, Chap. 10. Geoinformation International. Moore, I.D., 1992. Terrain Analysis Programs for the Environmental Sciences: TAPES. Agricultural Systems and Information Technology 2, 37–39.
94
N.J. McKenzie, P.J. Ryan r Geoderma 89 (1999) 67–94
Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Sci. Soc. Am. J. 57, 443–452. Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63, 197–214. Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1995. Further results on prediction of soil properties from terrain attributes: heterotopic co-kriging and regression kriging. Geoderma 67, 215–226. Prescott, J.A., 1948. A climatic index for the leaching factor in soil formation. J. Soil Sci. 1, 9–19. Priestly, C.H.B., Taylor, R.J., 1972. On the assessment of surface heat flux and evaporation using large-scale parameters. Mon. Weather Rev. 100, 81–92. Ryan, P.J., McKenzie, N.J., Loughhead, A., Ashton, L.J., 1996. New methods for forest soil surveys. In: Eldridge, K.G. ŽEd.., Environmental Management: the Role of Eucalypts and Other Fast Growing Species. Proceedings of the Joint AustralianrJapanese Workshop Held in Australia, 23rd–27th October. CSIRO Division of Forestry and Forest Products. CSIRO Publishing, Melbourne. Skidmore, A.K., Watford, F., Luckananurug, P., Ryan, P.J., 1996. An operational GIS expert system for mapping forest soils. Photogrammetric Engineering and Remote Sensing 62, 501–511. Slatyer, R.O., McIlroy, I.C., 1961. Practical Microclimatology with Special Reference to the Water Factor in Soil–Plant–Atmosphere Relationships. UNESCOrCSIRO. Stuart-Smith, P.G., 1991. The Gilmore Fault Zone—the deformation history of a possible terrane boundary within the Lachlan Fold Belt. BMR Journal of Australian Geology and Geophysics 12, 35–50. Venables, W.N., Ripley, B.D., 1994. Modern Applied Statistics with S-Plus. Springer-Verlag, New York. Voltz, M., Webster, R., 1990. A comparison of kriging, cubic splines and classification for predicting soil properties from sample information. J. Soil Sci. 41, 473–490. Webb, T.H. ŽEd.., 1994. Soil Landscape Modelling in New Zealand. Landcare Research Science Series No. 5. Manaaki Whenua Press, Lincoln. Webster, R., 1997. Soil resources and their assessment. Phil. Trans. R. Soc. London, Ser. B 352, 963–973. Wilson, J.P., Gallant, J.C., 1996. EROS: a grid-based program for estimating spatially-distributed erosion indices. Computers and Geosciences 22, 707–712. Wilson, J.P., Gallant, J.C., 1999. SRAD: a program for estimating radiation and temperature in complex terrain. Trans GIS 8 Žin press.. Wilson, J.P., Gallant, J.C., in press. Terrain Analysis: Principles and Applications. Geoinformation International. Xhu, A.X., Band, L., Vertessy, R., Dutton, B., 1997. Derivation of soil properties using a soil land inference model ŽSoLIM.. Soil Sci. Soc. Am. J. 61, 523–533.