Geoderma 160 (2010) 225–235
Contents lists available at ScienceDirect
Geoderma j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / g e o d e r m a
Data integration model to assess soil organic carbon availability Ana Horta ⁎, Amílcar Soares Center for Natural Resources and Environment, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, 1049-001 Lisbon, Portugal
a r t i c l e
i n f o
Article history: Received 29 March 2010 Received in revised form 6 August 2010 Accepted 27 September 2010 Available online 20 October 2010 Keywords: Geostatistics Joint sequential simulation Bivariate distribution Soil organic carbon assessment
a b s t r a c t Soil data acquisition and assessment are crucial phases in the evaluation of soil degradation scenarios. To overcome the lack of field data, flexible sampling approaches can be used to complement conventional soil sampling. For the assessment of soil quality, it is necessary to integrate different soil support data and to provide a coherent spatial characterization of soil properties. This study proposes a new model to combine soil data from two different supports: “point” data, which refers to the concentration measured in the topsoil layer, and “bulk” data, which refers to the concentration measured for the whole soil depth sampled. The method developed uses a geostatistical co-simulation algorithm based on the experimental bi-distribution between both types of soil supports to compute co-simulated values. This new approach was applied to assess Soil Organic Carbon (SOC) availability in the topsoil. The results were used to identify critical areas in the Left Margin of the Guadiana River; an area in the South of Portugal with a high susceptibility to desertification. © 2010 Elsevier B.V. All rights reserved.
1. Introduction Field data acquisition is one of the most important tasks in soil degradation assessment studies. In practice, a conventional sampling campaign comprises a qualitative description of the soil profile plus sampling per horizon, and further lab analysis for quantitative classification of soil quality indicators. These sampling requirements make soil quality studies expensive, time-consuming and, as a result, often lacking field data. Also, conventional soil sampling methods provide enough information to describe the vertical variability of soil properties but not their spatial continuity over a short distance. Therefore this last variable is extremely difficult to evaluate when only conventional sampling data is available. One possible way to assess spatial continuity is to collect bulk, undisturbed soil samples of the first 40 to 50 cm of soil (depending on field conditions). In the first stage of a soil campaign, bulk sampling is an alternative to sampling per horizon; it is faster, cheaper and allows for a more representative soil sampling over a larger area. Based on this assumption, this work presents a model to integrate data from both sampling approaches, namely: - “bulk” data concerning the concentration measured for the whole soil depth sampled (usually 40 to 50 cm of soil); and - “point” data, referring to a concentration measured in the topsoil layer (obtained by conventional soil sampling that starts by collecting soil from the first 5, 10 or 20 cm of the topsoil layer).
⁎ Corresponding author. Tel.: +351 21 841 74 41; fax: +351 21 841 73 89. E-mail address:
[email protected] (A. Horta). 0016-7061/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.geoderma.2010.09.026
The algorithm developed to combine soil data from two different supports uses geostatistical co-simulation. Several papers discuss the importance of including geostatistics for the prediction/characterization of soil quality (see, for example, Heuvelink and Webster, 2001; Sun et al., 2003). In general, geostatistical applications in soil science use estimation algorithms to assess the spatial distribution of a soil attribute. Besides estimation methods, stochastic simulation algorithms have been applied in the environmental field, in particular to soil quality characterization (Goovaerts, 2001). The advantages of using simulation over estimation has been discussed (Goovaerts, 2000) and can be resumed to the fact that estimation methods (kriging) minimize the estimation variance but fail to reproduce the spatial variability of main variables as it is revealed by the variograms, spatial covariances and histograms of experimental data. Moreover, stochastic simulation provides the spatial uncertainty associated with spatial estimates, a requirement in several soil studies involving impact studies and risk assessment (Goovaerts, 1999). To characterize more than one variable (attribute, soil property or pollutant) and reproduce their joint spatial pattern (given by covariograms and bi-histograms), several co-simulation algorithms have been used: sequential multi-Gaussian co-simulation (Verly, 1993), multi-Gaussian co-simulation with collocated co-kriging (Almeida and Journel, 1994), co-simulation with LU decomposition method (Myers, 1988), and simulation of autocorrelation factors (Desbarats and Dimitrakopoulos, 2000; Boucher and Dimitrakopoulos, 2009). For most practical applications, the common stochastic cosimulation algorithm applied to a set of correlated variables is based on a sequential approach. Sequential co-simulation algorithms depend on the formalism used to establish the spatial model of the random variable. Sequential Gaussian Co-simulation (Almeida and Journel, 1994), Sequential Indicator Co-simulation (Goovaerts, 1997)
226
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
and Direct Sequential Co-simulation (Soares, 2001) are examples of different approaches to perform sequential co-simulation. Larocque et al. (2006) and Parysow et al. (2003) show the application of geostatistical Gaussian co-simulation to soil studies. Franco et al. (2006) applied Direct Sequential Co-simulation to map the risk of contamination of river sediments. As required for the implementation of any stochastic simulation algorithm, these methods ensure that the variograms, co-variograms, marginal histograms and the correlation coefficient of each pair of variables (Z1 and Z2) are reproduced in the final results. However, it may happen that the variable's conditional distribution is not correctly reproduced by the set of simulated values. For example, for a given value of a covariate, say Z1(x) = z, spatially located at x, the co-simulated value of Z2(x) is not necessarily included in the conditional distribution F(Z2(x)|Z1(x) = z), given by experimental data. Thus the co-simulated value Z2(x) can be higher or lower than the experimental limits defined by the class of Z1(x) = z. Horta and Soares (2010) proposed a new algorithm of joint simulation to overcome this problem and ensure the reproduction of the bidistribution of each pair of variables. In this paper, we applied this new co-simulation approach to quantify Soil Organic Carbon (SOC) availability in the topsoil in the Left Margin of the Guadiana River; an area in the South of Portugal with high susceptibility to desertification.
2. Materials and methods 2.1. Study area description The study area is located in the South-East of Portugal (Alentejo Region) and it comprises the Left Margin of the Guadiana River with, approximately, 290.000 ha (Figure 1). This territory is delimited by the Guadiana River and the Portuguese frontier with Spain (Extremadura and Andaluzia). This is a region with high susceptibility to desertification, and is often used as a Portuguese example of agriculture abandonment and rural migration. The rainfall regime is highly variable in both space and time dimensions, being clearly Mediterranean (Costa and Soares, 2008). The mean annual temperature is about 17.5 °C. The landscape is a trademark of this Region: flat open areas covered with typical cork tree forest distributed in a coarse grid where the spaces are “filled” with non-irrigated land and pastures. The steepest area (slope between 10% and 15%) is located Northeast (municipality of Barrancos) where arid hills with olive trees define the landscape. The major soil groups are predominantly lithosols and luvisols (and in some smaller areas cambissols and vertisols), as defined by the Food and Agriculture Organization of the United Nations soil classification.
Fig. 1. Study area and “bulk” sampling points (Dataset 1).
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
227
2.2. Soil sampling For the assessment of SOC availability in the topsoil we used soil data from two different supports: - “bulk” data concerning SOC concentration measured for the whole soil depth sampled (40 to 50 cm); and - “point” data, referring to SOC concentration measured in the topsoil layer (first 5, 10 or 20 cm). These different sets of SOC data were provided by the following sources: 1. “bulk data”: sampling campaign for the study of the spatial variability of soil quality in the Left Margin of the Guadiana River (Dataset 1). 2. “point data”: sampling campaign to evaluate soil agriculture capability in the Alentejo Region (Dataset 2). Dataset 1 consists in 100 soil “bulk” samples (average 0.5 m depth) covering the entire study area (Figure 1). This sampling campaign was performed in the framework of a national project for desertification assessment, to evaluate horizontal variability of soil quality indicators (SOC included). Sampling was done in October 2005, in the beginning of the hydrological cycle, after the worst drought recorded in the last 10 years in Portugal. Sampling points were positioned in the study area using a random stratified strategy. Preferential locations were chosen according to the combination of slope, pedological and lithological classes. Soil samples were collected with a soil auger and conditioned to maintain the sample undisturbed. The samples were air-dried and ground to pass through a 2-mm sieve. Soil organic carbon was determined by dichromate-wet combustion method. For the 100 soil “bulk” samples collected in the study area, the measured SOC values range between 5.5% and 0.2% as presented in the histogram of Fig. 2(a). Spatial distribution of “bulk” SOC content per sampling point can be seen in Fig. 2(b). Dataset 2 was obtained by conventional soil sampling performed in three campaigns (1998, 2004 and 2005). The purpose of these campaigns was to collect soil data for future irrigation projects in the Alentejo Region. Dataset 2 refers to specific zones within the study area (Figure 3). In this sub-area, soil profiles were obtained for 43 sampling points, where SOC was measured in each horizon according to conventional sampling methods. Soil organic carbon was determined by dichromate-wet combustion method. For the 43 soil samples collected in the topsoil layer, the measured SOC value ranges between 1.74% and 0.38% as shown in the histogram of Fig. 4(a). The spatial distribution of “point” SOC content per sampling location is presented in Fig. 4(b). 2.3. Assessment of SOC availability using a co-simulation approach As mentioned before, this work proposes the use of a cosimulation algorithm to integrate SOC data measured in different sampling supports. Geostatistical co-simulation is applied to calculate topsoil SOC using SOC measured in a bulk core with a soil depth of approximately 0.5 m. Bulk SOC will provide valuable data to condition (as secondary information) the simulation of the new variable. Hence, for the application of the co-simulation approach it is desirable that both variables (previously designated as “point” and “bulk”) are spatially correlated. According to the co-simulation formalism, SOC “bulk” data is designed the “secondary variable” and SOC “point” data the “primary variable” or the variable to be co-simulated. In this section, we present the co-simulation algorithm applied to our case study, including some theory related to stochastic co-
Fig. 2. “Bulk” data (Dataset 1): (a) histogram; (b) SOC content per sampling point.
228
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
Fig. 3. “Point” sampling location (Dataset 2).
simulation and practical issues concerning the algorithm implementation. More details about the method are provided in Appendix A.
2.4. Direct Sequential Co-Simulation with Joint Probability Distributions Direct Sequential Co-Simulation (Co-DSS) is a stochastic simulation algorithm that reproduces the spatial relation of a set of interdependent and correlated variables (Soares, 2001). This algorithm is based on a sequential approach where each variable is simulated in turn, taking the previously simulated values as secondary information (Almeida and Journel, 1994). The main advantages of using the sequential approach are its simplicity, efficiency and the possibility of choosing which variable will condition the next covariate. Co-DSS enables the joint simulation of a set of Nv variables without any prior or subsequent transformation (unlike other cosimulation algorithms, as Sequential Gaussian Co-simulation that requires a variable transformation to obtain co-simulated values). Let us consider Co-DSS applied to two random variables, Z1(x) and Z2(x), being Z1(x) the variable with more evident spatial continuity (i.e. the one that is simulated in the first place), and Z2(x) the variable to be co-simulated. The algorithm ensures the reproduction of marginal conditional distributions (F1(z1) and F2(z2)), variograms (γ1(h) and γ2(h)) and the joint spatial pattern given by the co-
Fig. 4. “Point” data (Dataset 2): (a) Histogram; (b) SOC content per sampling point.
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
229
Fig. 6. Experimental biplot (SOC “point” vs. SOC “bulk”, Dataset 2).
Fig. 5. “Bulk” data (Dataset 2): (a) histogram; (b) SOC content per sampling point.
variogram γ1,2(h). More details about the Co-DSS algorithm are given in Appendix A and in Soares (2001). The use of Co-DSS in environmental and mining applications showed that sometimes the algorithm does not reproduce the joint distribution F[Z1(x), Z2(x)]. In fact the bi-histogram of co-simulated values can show some differences when compared with the experimental one. One possible explanation is the fact that high values of conditional variances, usually computed at the very beginning of the sequential simulation process, produce co-simulated values that do not comply with the experimental bi-histogram. In some practical applications the reproduction of the joint distribution or, more precisely, of the conditional distributions F(z2| z1) can be of paramount importance mostly when non-linear cost functions are applied to the set of co-simulated values. Horta and Soares (2010) developed a direct sequential co-simulation algorithm to ensure that the experimental bi-histogram is reproduced at the end of the co-simulation. This new approach (untitled Direct Sequential Co-simulation with Joint Probability Distributions) is based on the idea of resampling z(l) 2 (xi) not from the global/marginal cdf F[Z2(x)] as in the usual Co-DSS, but from the joint distribution F[Z1(x),Z2(x)]. In fact, this joint distribution is the same as the conditional distribution of Z2(xi), at a location xi, given the previously simulated value z(l) 1 (xi), F[(Z2(x)|Z1(x) = z(l) 1 (xi)]. Hence, Co-DSS with Joint Probability Distributions guarantees that, for a particular class of values of Z1(x) = z, the simulated values z(l) 2 (xi) will fall between the upper and lower limits of the experimental data class to which z(l) 1 (xi) belongs. Also, it will reproduce the experimental conditional histogram of Z2(x) given Z1(x) = z [F(Z2(x)|Z1(x) = z]. This new algorithm also ensures that simulated realizations of Z1(x) and Z2(x) reproduce the variograms γ1(h) and γ2(h), the marginal cdfs (F[Z1(x)] and F[Z2(x)] and the conditional cdfs (F[Z2(x)|Z1(x)]).
230
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
Fig. 7. Variograms for SOC “bulk” simulation (Dataset 1): (a) main direction; (b) minor direction.
The proposed methodology of Co-DSS with Joint Probability Distributions can be summarized as follows (Horta and Soares, 2010): i. Estimation of the global bi-distribution F[Z2(x)|Z1(x)] from experimental data. ii. Simulation of the first covariate Z1(x) using DSS. iii. Computation of the conditional cdf F[(Z2(x)|Z1(x) = z(l) 1 (xi)] from the global bi-distribution F[Z2(x)|Z1(x)]. iv. Estimation of the local mean and variance of z2(xi), identified with the collocated simple co-kriging estimate z2(xi)⁎ and σ2sk (xi) conditioned to experimental data z2(xα) and the collocated datum z1(xi). v. Definition of the interval of the conditional cdf F[(Z2(x)|Z1(x) = z(l) 1 (xi)] to be sampled. vi. Drawing of a value z(l) 2 (xi) from the conditional cdf F[(Z2(x)|Z1 (x) = z(l) 1 (xi)]. vii. Looping until all n nodes have been visited and simulated. 2.5. Spatial correlation between SOC “bulk” and “point” data The spatial correlation between both types of data, i.e., the SOC content in the topsoil layer and the corresponding SOC content in the “bulk” volume is a fundamental step for the joint simulation. For this case study, we used Dataset 2 to generate a new set of “bulk” data, by averaging the SOC content in each sampled layer:
SOCWBULK2WðxÞ =
1 ∫ SOC WPOINT2WðuÞdu jVx j Vx
ð1Þ
where SOC “Bulk2” is the spatial linear average of SOC “Point2” (SOC concentration measured for each horizon in Dataset 2), within block volume V.
Fig. 8. SOC “bulk” simulation (Dataset 1): (a) one realization; (b) mean of 20 realizations.
For this new variable, concentrations range between 1.06% and 0.31% according to the histogram in Fig. 5(a). Fig. 5(b) shows the spatial distribution of SOC “bulk” concentrations calculated for the 43 samples. Fig. 6 shows the biplot calculated between SOC “bulk” and “point” data for the 43 samples, with a correlation coefficient (Pearson) r = 0.78. Although it was calculated with a subset of samples (n = 43) in a local area, for the sake of this work we consider this relation between SOC “bulk” and “point” data valid and representative of entire area.
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
231
Fig. 10. Co-simulation of SOC “point” (Dataset 1): (a) one realization; (b) mean of 20 realizations.
3. Results and discussion
Fig. 9. “Smoothed” experimental data (Dataset 2): (a) biplot SOC “point” vs. SOC “bulk”; (b) marginal histogram of SOC “point”.
Afterwards, we used Co-DSS with Joint Probability Distributions (Horta and Soares, 2010) to obtain “point” SOC data based on the assumption that both “point” and “bulk” SOC were linear correlated with a bi-distribution presented in Fig. 6.
The application of Co-DSS with Joint Probability Distributions to compute “point” SOC (content in the topsoil) is summarized in the following steps: - Estimation of the global bi-distribution between SOC “point” and SOC “bulk” provided by Dataset 2 (see previous section). - Simulation of SOC “bulk” data (Dataset 1) for the entire study area. - Co-simulation with Joint Probability Distributions of SOC “point” for the study area, using simulated SOC “bulk” (Dataset 1) as secondary information.
232
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
Fig. 12. Variogram of co-simulated SOC “point” (Dataset 1).
3.2. Co-simulation of SOC “point” (Dataset 1)
Fig. 11. Co-DSS with Joint Probability Distributions (Dataset 1): (a) biplot SOC “point” vs. SOC “bulk”; (b) marginal histogram of co-simulated SOC “point”.
3.1. Simulation of SOC “bulk” (Dataset 1) Stochastic simulation of SOC “bulk” (Dataset 1) was performed using the Direct Sequential Simulation (DSS) algorithm (see Appendix A and Soares, 2001). The spatial continuity imposed to the simulation model was given by the variograms in Fig. 7(a–b), fitted by a spherical model with a range of 9 km. DSS was used to generate a set of 20 simulated images of SOC “bulk”, in a regular grid of nodes of 100 by 100 m, reproducing the spatial variability given by the experimental histogram (Figure 2(a)) and the variograms (Figure 7(a–b)). Fig. 8(a) shows an example of one realization obtained using DSS and Fig. 8(b) presents the mean image obtained for the set of 20 simulations.
As mentioned before, the co-simulation algorithm used is based on the experimental bi-distribution between both types of soil supports to compute the co-simulated values. Besides establishing the correlation coefficient between “point” and “bulk” SOC contents, the bi-distribution is used to define the conditional histogram in each co-simulation step from which the co-simulated value is resampled. To obtain a more representative (or less erratic fluctuations) bi-histogram, an optimization algorithm (Deutsch and Journel, 1998) was used to calculate a more smoothed bi-distribution (Figure 9(a–b)). Co-simulation of SOC “point” is then performed using Co-DSS with Joint Probability Distributions, considering SOC “point” from Dataset 2 as the experimental data z2(xα), and the previous simulated values of SOC “bulk” from Dataset 1 (z(l) 1 (x)) as the secondary variable. Fig. 10(a) presents one realization obtained for SOC “point” (Dataset 1). Fig. 10(b) shows the mean image calculated for a set of 20 simulations. The black spots represent areas with SOC content higher than 1%. These areas were not considered in this study since we assumed that for values higher than 1% soil degradation does not occur caused by a lack of SOC. The comparison between the bi-distribution of simulated values (Figure 11(a)) and the experimental bi-distribution (Figures 6 and 9(a)) reveals a good matching as well as the marginal histogram (Figure 11(b)) obtained when compared with experimental data (Figures 4(a) and 9(b)). The variogram calculated for the cosimulated values (Figure 12) was also successfully compared with the experimental one (Figure 7(a)). These results confirm that the method is reproducing global statistics and thus honouring experimental data, including the bivariate relationship between the two SOC soil supports that was assumed to be valid for the study area. Another comparison was performed using the conditional probabilities (Prob (Z2(x)|Z1(x)) obtained by co-simulation and the conditional probabilities given by the experimental bi-histogram (Figure 9(a)). The results can be considered satisfactory, in spite of some minor differences among the values. This is probably due to the fact that different datasets
Table 1 Conditional probabilities. Prob (z2|z1) Prob Prob Prob Prob
(z2 b Q1|z1 b Q1) (Q1 b z2 b Q2|Q1 b z1 b Q2) (Q2 b z2 b Q3|Q2 b z1 b Q3) (Q3 b z2 b max|Q3 b z1 b max)
Experimental bi-histogram
Simulated bi-histogram
70% 40% 31% 40%
76% 30% 42% 43%
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
Fig. 13. SOC “point” vs. SOC “bulk” with content lower than 0.5%: (a) “smoothed” experimental data (Dataset 2); (b) Co-DSS with Joint Probability Distributions (Dataset 1).
were used to compute the experimental bi-histogram and to perform the co-simulation. Table 1 presents the conditional probabilities calculated for four intervals (defined according to the quartile values). To analyse the algorithm performance for a certain subset of SOC values, namely, the one corresponding to the classes of SOC “bulk” content lower than 0.5%, we compared the correlation coefficient obtained for the experimental (Figure 13(a)) and the co-simulation data (Figure 13(b)). The result is quite satisfactory since Co-DSS with Joint Probability Distributions is able to reproduce the experimental
233
Fig. 14. Topsoil SOC availability: (a) critical areas (threshold = 0.8%); (b) comparison with land patterns.
relation between two variables even when the correlation coefficient is low. Regarding the interpretation of these results in terms of SOC variability, it should be noticed that for a low SOC content (less than 0.5%), the linear correlation between SOC “point” and “bulk” concentrations decreases. This evidence confirms the fact that, for low SOC contents, it is difficult to model SOC vertical variability due to a higher dispersion along the soil profile.
234
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
3.3. Topsoil SOC availability The output of our work provided the spatial distribution of SOC in the topsoil for the Left Margin of the Guadiana River. However, our purpose was to identify critical areas in terms of soil degradation due to SOC variability. To accomplish this goal, we defined a threshold value of 0.8% as the critical limit for SOC concentration below which soil degradation is likely to occur (Horta et al., 2008). Then, for each position in the grid defining the study area, we computed the probability of SOC content to be below 0.8%. Fig. 14(a) presents the final classification map where it is possible to identify the areas where this probability equals to 1 (“red” areas). These are the areas to be integrated in future land use planning to prevent irreversible soil degradation process. A visual validation was made comparing the classified map with satellite images available for the Left Margin of the Guadiana River. Fig. 14(a) and (b) shows an example of this practical comparison: areas 1 and 2 match with the “red” features that are easily identified in Fig. 14(b). A lack of vegetation in areas 1 and 2 can be a sign for degradation effects, and should be further investigated by a new soil sampling campaign. 4. Conclusions The purpose of this work was to develop a model that enables the integration of soil data from different sampling sources. This new approach is based on a geostatistical co-simulation algorithm that uses the experimental bi-distribution of soil data to compute cosimulated values (Co-DSS with Joint Probability Distributions). For the case study presented, Co-DSS with Joint Probability Distributions was used to obtain the spatial variability of SOC in the topsoil layer, using different soil support data. The co-simulation of topsoil SOC was conditioned to the experimental bi-distribution of “point” and “bulk” soil data. The algorithm reproduced the experimental bivariate distribution, even when the correlation coefficient was low. This is an advantage when decision-making is based on computing impact costs which, in turn, are non-linear functions of soil quality indicator contents and thus can magnify existing bias of the conditional distributions. Regarding the SOC availability map obtained for the Left Margin of the Guadiana River, it was possible to establish a visual matching between the areas with low SOC contents and areas lacking vegetation. This can be a sign for degradation effects, and should be further investigated to quantify the soil degradation extent. The success of this application can be seen as a new approach to define soil sampling strategies. In a first stage, it can be helpful to study the horizontal variability of a certain quality indicator, using soil “bulk” data. This variability provides a first “image” of critical areas and can be further combined with topsoil quality data to obtain new outputs regarding soil quality evaluation.
Considering Z(x) a continuous random variable at location x and z(l) (x) the lth realization of Z(x) at the same location, the sequential simulation of a continuous variable follows this general methodological outline: i. Randomly select the spatial location of a node xi, i = 1,…,n; in a regular grid of nodes to be simulated. ii. Estimate the local cumulative distribution (cdf) function at xi, conditioned to the experimental data z(xα) and the previous simulated values z(l)(xi). iii. Draw a value z(l)(xi)from the estimated local cdf. iv. Return to step (i) until all nodes have been visited by the random path. In step (ii), the estimation of the local cdf is usually performed with the indicator formalism (Sequential Indicator Simulation; Goovaerts, 1997) or with the multi-Gaussian approach (Sequential Gaussian Simulation; Almeida and Journel, 1994). For these simulation algorithms, to perform step (ii) it is compulsory to transform experimental data in a binary variable (Sequential Indicator Simulation — SIS) or in a Gaussian distributed variable (Sequential Gaussian Simulation — SGS). DSS is based on the concept that original z-attribute values are simulated directly without any prior transform. This approach was firstly introduced by Journel (1994).This work demonstrated that the direct simulation of a continuous variable succeeds in reproducing the covariance model provided that simulated values are drawn from local distribution centered at the simple kriging estimates with a variance corresponding to the simple kriging estimation variance. The demonstration given by Journel (1994) ensures that the semivariogram is reproduced by the simulated values however it does not guarantee the reproduction of the histogram. Later, Soares (2001) proposed a solution for this problem by developing the DSS algorithm to be used as an alternative to the existent SIS and SGS. To ensure the histogram reproduction, Soares (2001) introduced the idea of resampling the global cdf (F(x;z)) instead of the local cdf. Then, for each simulation node, simple kriging estimate (z(x)⁎) and variance (σ2sk(x)) are computed to define the sampling interval of the global cdf. Hence, in step (iii), a value z(l)(xi) is drawn from the global cdf at a specific interval centered at z(x)⁎ and with a range determined by the σ2sk(x) value. In summary, the DSS algorithm can be described by the following implementation steps (Soares, 2001): i. Definition a random path over the entire grid of nodes xi, i = 1, …n, to be simulated. ii. Estimation of the local mean and variance of z(xi), identified with the simple kriging estimate z(xi)⁎ and simple kriging variance (σ2sk(xi)), conditioned to the experimental data z(xα) and previous simulated values z(l)(xi): i
sk
zðxi Þ* = ∑ λα ðxi Þ½zðxα Þ−m + m α=1
Acknowledgements 2
This paper was produced in the context of Project “CIDMEG” (PPCDT/CLI/58865/2004) and Project “Soil contamination risk assessment” (PTDC/CTE-SPA/69127/2006) (financed by FEDER through the National Operational Science an Innovation Program 2010 with the support of the Foundation for Science and Technology). Appendix A Direct Sequential Simulation and Co-simulation Direct Sequential Simulation (DSS) and Co-Simulation (Co-DSS) are sequential stochastic simulation algorithms used to obtain realizations of spatial random fields.
i
sk
σsk ðxi Þ = C ð0Þ− ∑ λα ðxi ÞC ðxα −xi Þ α=1
(where C(0) is the variance of z(xi) and C(xα − xi) the covariance between xα and the location to be simulated xi). iii. Define the interval of F(u;z) to be sampled, using the algorithm described in Soares (2001). iv. Draw a value z(l)(xi) from the selected interval of cdf F(x;z). v. Loop until all n nodes have been visited and simulated. Because DSS deals with original variables, corrections for local means were also implemented to reproduce non stationary patterns of the primary variable or better reproduce the extreme classes of highly skewed histograms (for more details, see Soares (2001)).
A. Horta, A. Soares / Geoderma 160 (2010) 225–235
One of the main advantages of the DSS algorithm over SIS and SGS is that it allows the joint simulation of several variables, dealing directly with non-transformed experimental data. To account for the joint simulation of interdependent continuous variables, DSS was extended to Direct Sequential Co-simulation (Co-DSS). The sequential co-simulation algorithms follow a general framework: instead of simulating the N variables simultaneously, each variable can be simulated in turn as long as it is done conditionally to the previously simulated values (Gómez-Hernández and Journel, 1993; Almeida and Journel, 1994). Hence to simulate one variable in turn it is necessary to predefine, at the beginning of the simulation, the hierarchy of variables. This hierarchy is chosen according to the variable's spatial continuity or relevance for the case study (Almeida and Journel, 1994). Let us consider Co-DSS applied to two random variables, Z1(x) and Z2(x), being Z1(x) the variable with more evident spatial continuity (i.e. the one that is simulated in the first place), and Z2(x) the variable to be co-simulated. First, Z1(x) is simulated for the entire grid of nodes xi, i = 1,…,n, using DSS. The same algorithm is applied to simulate Z2(x) but, for CoDSS, collocated simple co-kriging is used to calculate z2(xi)⁎ and σ2sk (xi) conditioned to experimental data z2(xα) and the collocated datum z1(xi) (Soares, 2001). The global/marginal cdf of Z2(x) (F[Z2(x)]) is built using experimental data z2(xα) whereas z2(xi)⁎ and σ2sk(xi) are used to define the interval to resample z(l) 2 (xi). The Co-DSS method ensures the reproduction of the marginal distributions (F[Z1(x)] and F[Z2(x)]), the variograms (γ1(h)and γ2(h)) and the joint spatial pattern given by the co-variogram γ1,2(h) (Soares, 2005). However, Co-DSS may not reproduce the joint distribution F[Z1(x), Z2(x)]. In fact the bi-histogram of co-simulated values can show some differences when compared with the experimental one (Horta and Soares, 2010). Horta and Soares (2010) developed a Co-DSS algorithm to ensure that the experimental bi-histogram is reproduced at the end of the cosimulation. This new approach (untitled Direct Sequential Cosimulation with Joint Probability Distributions) is based on the idea of resampling z2(l)(xi) not from the global/marginal cdf F[Z2(x)], as in the usual Co-DSS, but from the joint distribution F[Z1(x),Z2(x)].
235
References Almeida, A., Journel, A., 1994. Joint simulation of multiple variables with a Markov-type coregionalization model. Math. Geol. 26 (5), 565–588. Boucher, A., Dimitrakopoulos, R., 2009. Block simulation of multiple correlated variables. Math. Geosci. 41, 215–237. Costa, A.C., Soares, A., 2008. Trends in extreme precipitation indices derived from a daily rainfall database for the South of Portugal. Int. J. Climatol. 29, 1956–1975. Desbarats, A.J., Dimitrakopoulos, R., 2000. Geostatistical simulation of regionalized poresize distributions using min/max autocorrelation factors. Math. Geol. 32 (8), 919–942. Deutsch, C.V., Journel, A.G., 1998. Gslib: Geostatistical Software Library and User's Guide. Oxford University Press, New York. Franco, C., Soares, A., Delgado, J., 2006. Geostatistical modelling of heavy metal contamination in the topsoil of Guadiamar river margins (S Spain) using a stochastic simulation technique. Geoderma 136, 852–864. Gómez-Hernández, J. And, Journel, A.G., 1993. Joint sequential simulation of multiGaussian fields. In: Soares, A. (Ed.), Geostatistics Troia, volume 1. Kluwer, Dordrecht, pp. 85–94. Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York. Goovaerts, P., 1999. Geostatistics in soil science: state-of-the-art and perspectives. Geoderma 89, 1–45. Goovaerts, P., 2000. Estimation or simulation of soil properties? An optimization problem with conflicting criteria. Geoderma 97, 165–186. Goovaerts, P., 2001. Geostatistical modelling of uncertainty in soil science. Geoderma 103, 3–26. Heuvelink, G.B.M., Webster, R., 2001. Modelling soil variation: past, present, and future. Geoderma 100, 269–301. Horta, A., Soares, A., 2010. Direct sequential co-simulation with joint probability distributions. Math. Geosci. 42 (3), 269–292. Horta, A., Carvalho, J., Soares, A., 2008. Assessing the quality of the soil by stochastic simulation. In: Soares, A., Pereira, M.J., Dimitrakopoulos, R. (Eds.), Proceedings of the Sixth European Conference on Geostatistics for Environmental Applications. Springer, Netherlands, pp. 385–396. Journel, A.G., 1994. Modeling uncertainty: some conceptual thoughts. In: Dimitrakopoulos, R. (Ed.), Geostatistics for the Next Century. Kluwer, Dordrecht, pp. 30–43. Larocque, G., Dutilleul, P., Pelletier, B., Fyles, J.W., 2006. Conditional Gaussian cosimulation of regionalized components of soil variation. Geoderma 134, 1–16. Myers, D.E., 1988. Vector conditional simulation. In: Armstrong, M. (Ed.), Geostatistics. Kluwer, Dordrecht, pp. 283–292. Parysow, P., Wang, G., Gertner, G., Anderson, A.B., 2003. Spatial uncertainty analysis for mapping soil erodibility based on joint sequential simulation. Catena 53, 65–78. Soares, A., 2001. Direct sequential simulation and cosimulation. Math. Geol. 33 (8), 911–926. Soares, A., 2005. Classification of mining reserves using direct sequential simulation. In: Leuangthong, O., Deutsch, C.V. (Eds.), Geostatistics Banff 2004, volume 1. Springer, Netherlands, pp. 511–522. Sun, B., Zhou, S., Zhao, Q., 2003. Evaluation of spatial and temporal changes of soil quality based on geostatistical analysis in the hill region of subtropical China. Geoderma 115, 85–99. Verly, G.W., 1993. Sequential Gaussian cosimulation: a simulation method integrating several types of information. In: Soares, A. (Ed.), Geostatistics Troia, volume 1. Kluwer, Dordrecht, pp. 543–554.