Journal of Hydrology (2008) 352, 309–321
available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/jhydrol
Estimating the necessary sampling size of surface soil moisture at different scales using a random combination method
Chunmei Wang a, Qiang Zuo a,*, Renduo Zhang b
a Department of Soil and Water Sciences, College of Resources and Environment, China Agricultural University, and Key Laboratory of Plant–Soil Interactions, MOE, Beijing 100094, China
b School of Environmental Science and Engineering, Sun Yat-Sen (Zhongshan) University, Guangzhou 510275, China
Received 10 February 2007; received in revised form 25 November 2007; accepted 17 January 2008
KEYWORDS Necessary sampling size; Random combination method; Sampling strategy; Soil moisture
Summary To develop a sampling strategy for surface soil moisture, a random combination method (RCM) was proposed and used to estimate the necessary sampling size (NSS) of soil moisture for different sampling areas. The RCM was developed based on the bootstrap sampling procedure and consideration of all possible sub-sampling combinations of available data. To examine the method, field experiments were conducted in sampling domains of 10 × 10, 20 × 20, 40 × 40, 55 × 55, 80 × 80, and 160 × 160 m². Comparisons of the RCM with other commonly used sampling methods, including the statistical, geostatistical, stratified sampling, and bootstrap methods, indicated that the RCM provided rational and efficient sampling strategies. Under the same accuracy, estimated NSS values using the RCM were much smaller than those by the statistical and bootstrap methods. In addition, the RCM has the advantage of requiring less input information, whereas the statistical and stratified sampling methods require independent, normally distributed data, the stratified sampling method requires stratified allocation information, and the geostatistical method requires the semivariogram model. The RCM was applied to estimate the NSS of soil moisture at different scales (i.e. squares with sides of 10, 20, 40, 80, and 160 m). Estimated values of the NSS under confidence levels of 90% and 95% with relative errors of 5% and 10% were linearly related to the coefficients of variation calculated from the experimental data. To enhance the calculation efficiency of the RCM, the procedure was simplified using a small sub-sample size, which dramatically reduced the computation time for the NSS estimation. © 2008 Elsevier B.V. All rights reserved.
Abbreviations: CL, confidence level; NSS, necessary sampling size; RCM, random combination method; RE, relative error.
* Corresponding author. Tel.: +86 10 6273 2504; fax: +86 10 6273 3596. E-mail address: [email protected] (Q. Zuo).
0022-1694/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.jhydrol.2008.01.011
Introduction
Soil moisture is an important state variable of the land surface system. The collection and analysis of surface soil moisture data at different spatial scales have received much attention (Narayan et al., 2004; De Lannoy et al., 2006). Typical point measurements through methods such as time domain reflectometry and neutron probe are only reliable in a small area because of soil spatial variability. Satellite remote sensing offers a useful alternative to investigate the surface soil moisture at different regional scales. However, evaluation of remote sensing data often requires ground-based sampling corresponding to the different scales or image pixels (Jacobs et al., 2004; De Lannoy et al., 2006). Therefore, the key question for soil moisture measurements at different scales (or remote sensing image pixels) is how many points must be sampled to estimate a rational mean value within a given confidence level (CL). We refer to the number of required data points as the necessary sampling size (NSS). The NSS depends on the subject of study, the variability of the sample population, the desired accuracy and CL in estimating the population mean, and sampling costs (Skopp et al., 1995).
Several sampling strategies have been developed, including the statistical, geostatistical, stratified sampling, and bootstrap methods (Gilbert, 1987; Kamgar et al., 1993; Sastre et al., 2001; Zhang, 2005). The statistical method has been widely used to estimate the necessary size of samples (Treasurer and Pope, 2000; Hupet and Vanclooster, 2004; Miyamoto et al., 2005). By assuming that sample values are independent and normally distributed, the statistical method shows that the NSS is proportional to the square of the coefficient of variation (CV) for a specified precision (Warrick, 2003; Hupet and Vanclooster, 2004). Nevertheless, the usefulness of the method is often limited because of inadequate samples and the assumption of data independence.
Considering the spatial correlations of soil properties, geostatistics offers an alternative for the optimal sampling design. McBratney and Webster (1983) applied geostatistical theory to estimate the NSS of pH and showed that geostatistics was able to increase sampling efficiency three to nine times compared with the statistical method. Di et al. (1989) and Chung et al. (1995) estimated NSS values for different soil properties with the geostatistical sampling method and showed that, under the same accuracy, the method resulted in smaller NSS values than the statistical method. However, information on the semivariance is a prerequisite for the geostatistical sampling method.
The stratified sampling method has been successfully applied to various research areas, such as population census (Seah et al., 2002), environmental analysis (Brus et al., 1999), production test (Fang et al., 2001), and soil science (Park and van de Giesen, 2004). However, successful applications of the method depend on available information for dividing the target population into internally homogeneous strata (Miyamoto et al., 2005) and on stratified allocations to the objective system (Cochran, 1977).
Without any assumption about the population distribution, the bootstrap method has been used to design sampling strategies of soil properties in recent years (Hupet and Vanclooster, 2002). It is suggested that at least 1000 bootstrap reps (B) are needed in many applications (Manly, 1997) and
sometimes the reps are set as high as 10,000. To avoid tediously long computation time, a small population size (n) or a small number of bootstrap reps (B) is often chosen in practice (Dane et al., 1986; Kamgar et al., 1993). In the bootstrapping procedure, m samples are randomly selected from n samples (m = 1, 2, ..., n) for the fixed B reps to estimate the distribution of statistics. Even for a small population size (say n = 20), the fixed B reps can hardly cover all combinations of m out of n (for example, with m = 10 and n = 20, the number of combinations is $C_n^m = 184{,}756$), which often leads to fluctuations of the confidence interval with the sample sizes (Kamgar et al., 1993). If all the combinations are taken into consideration, the results for sampling designs should be more representative.
The objective of this study was to propose a random combination method (RCM) that improves the bootstrap method by considering all combinations of samples. The RCM was applied to estimate the NSS of surface soil moisture, using experimental data measured at different scales. The accuracy and validity of the RCM were compared with those of other sampling methods, such as the statistical, geostatistical, stratified sampling, and bootstrap methods. Estimated NSS values by the RCM at different scales were related to the coefficients of variation of soil moisture measured at dry and wet periods. To enhance the calculation efficiency, the RCM was simplified by reducing the sample sizes used in the computations.
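To give a sense of scale for the combination count quoted above, the following minimal sketch uses Python's standard `math.comb`. The function name `n_combinations` is a hypothetical helper introduced here for illustration; B = 1000 is the replicate count mentioned in the text.

```python
import math

def n_combinations(n: int, m: int) -> int:
    """Number of distinct sub-samples of size m that can be drawn from n observations."""
    return math.comb(n, m)

# Worked value from the text: m = 10 out of n = 20 gives 184,756 combinations.
total = n_combinations(20, 10)
print(total)

# Upper bound on the share of combinations that B = 1000 replicates can visit
# (an upper bound because random replicates may repeat a combination).
coverage = 1000 / total
print(f"{coverage:.2%}")
```

Even in this small case the fixed replicates can reach well under 1% of the possible sub-samples, which is the motivation for considering all combinations.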
Materials and methods
Statistical sampling
In statistics, based on the assumption that the samples are independent of each other and drawn from a normal distribution, the NSS ($n_0$) required for estimating the mean value within a specific absolute error or relative error is estimated by (Gilbert, 1987)

$$n_0 = t^2_{1-\alpha/2,\,n_0-1}\,\frac{\sigma^2}{d^2} = t^2_{1-\alpha/2,\,n_0-1}\,\frac{\mathrm{CV}^2}{\kappa^2} \qquad (1)$$

With n measured values of soil moisture in a field, namely $\theta_1, \theta_2, \ldots, \theta_n$, the coefficient of variation (CV) is calculated as follows:

$$\mathrm{CV} = \frac{\sigma}{\bar{\theta}} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\theta_i - \bar{\theta})^2}}{\bar{\theta}} \qquad (2)$$

$$\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n}\theta_i \qquad (3)$$

In Eqs. (1)–(3), $\sigma$ is the standard error; $t_{1-\alpha/2}$ the value of Student's t-distribution at the chosen CL $1-\alpha$; $d = |\mu - \bar{\theta}|$ the absolute error, in which $\mu$ is the true mean of the population and $\bar{\theta}$ is the mean of the n samples; and $\kappa = d/\mu$ the relative error (RE). If a reliable value of $\sigma$ is not available, CV and $\kappa$ can be used to estimate $n_0$ because CV is usually less variable from one study site or time period to another than $\sigma$ (Gilbert, 1987).
Since $n_0$ in Eq. (1) is unknown, the proper number of degrees of freedom for estimating t is also unknown. The value of $n_0$ is therefore estimated using an iterative procedure in this study, as follows. With the values of $1-\alpha/2$ and n, the standard normal deviate $Z_{1-\alpha/2}$ that cuts off $(\alpha/2)100\%$ of the upper tail of the normal distribution is acquired and used to approximate $t_{1-\alpha/2,\,n_0-1}$ in Eq. (1) for estimating the initial iteration value of $n_0$, represented as $n_1$. Then, $t_{1-\alpha/2,\,n_0-1}$ in Eq. (1) is replaced with $t_{1-\alpha/2,\,n_1-1}$ to obtain the next iterate $n_2$. The procedure is repeated successively for $n_3, n_4, \ldots, n_k$ until $|n_k - n_{k-1}| \le \varepsilon$, in which $\varepsilon$ is an iterative control value (set to 0.5 in this study).
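The iterative procedure above can be sketched with the Python standard library only. This is an illustration under stated assumptions rather than the implementation used in the study: the Student's t quantile is obtained here by bisection on a Simpson's-rule integral of the t density (the study does not say how t values were obtained), fractional degrees of freedom are allowed, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p: float, df: float) -> float:
    """Upper quantile (p > 0.5) by bisection; assumes the quantile lies below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def nss_statistical(cv: float, rel_err: float, alpha: float = 0.05,
                    eps: float = 0.5, max_iter: int = 100) -> float:
    """Iterate Eq. (1): start from the normal deviate, then update t with n_k - 1."""
    ratio = (cv / rel_err) ** 2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n_prev = z * z * ratio                      # n_1, using Z in place of t
    for _ in range(max_iter):
        df = max(n_prev - 1, 1.0)               # guard against df <= 0 for tiny n
        t = t_quantile(1 - alpha / 2, df)
        n_next = t * t * ratio
        if abs(n_next - n_prev) <= eps:         # stopping rule |n_k - n_(k-1)| <= eps
            return n_next
        n_prev = n_next
    return n_prev                               # may not have converged for very small n

n0 = nss_statistical(cv=0.184, rel_err=0.05, alpha=0.05)
print(math.ceil(n0))
```

With CV = 0.184, RE = 5%, and CL = 95%, the sketch converges in a few iterations to a value that rounds up to roughly the NSS of about 55 reported for the statistical method in Exp. 1.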
Geostatistical sampling
In geostatistics, the spatial variability and correlations of soil properties are considered; therefore, geostatistics provides an optimal way to estimate the NSS. The spatial dependence between observations is expressed by the semivariance (Burgess and Webster, 1980)

$$\gamma(h) = \frac{1}{2N_h}\sum_{i=1}^{N_h}\left[\theta(x_i + h) - \theta(x_i)\right]^2 \qquad (4)$$

where $x_i$ represents the ith sampling location; $\gamma(h)$ is the semivariance at the lag distance h; $\theta$ the sample value of soil moisture; and $N_h$ the number of pairs $(x_i, x_i + h)$. For a grid sampling with square blocks (S) and observation points at their centers, the variance $\sigma_S^2$ of the estimated blocks is (McBratney and Webster, 1983; Chung et al., 1995)

$$\sigma_S^2 = 2\bar{\gamma}(x, S) - \bar{\gamma}(S, S) \qquad (5)$$

where $\bar{\gamma}(x, S)$ is the average value of semivariance between the central point x and the square area and $\bar{\gamma}(S, S)$ the average value of all possible semivariances within the square area itself. If the estimated values for the squares are $\theta_{S_i}$, i = 1, 2, ..., n, the average value for the region is $\bar{\theta}_S = n^{-1}\sum_{i=1}^{n}\theta_{S_i}$, approximately. The corresponding global estimation variance is approximated by (McBratney and Webster, 1983)

$$E\left[\{\bar{\theta}_S - \bar{\theta}\}^2\right] \approx \frac{1}{n^2}\sum_{i=1}^{n}E\left[\{\theta_{S_i} - \theta_i\}^2\right] = \frac{1}{n}\sigma_S^2 \qquad (6)$$

Based on Eqs. (5) and (6), the value of NSS for a given error level or a desired accuracy is obtained from a plot of the estimation variance (or standard error) vs. the sample sizes n (Chung et al., 1995).

Stratified sampling
In the stratified sampling process, the population units (M) are grouped into L non-overlapping strata, which are internally homogeneous with $M_1, M_2, \ldots, M_L$ units, respectively. The weight of the qth stratum (q = 1, 2, ..., L) is $W_q = M_q/M$. Sometimes $W_q$ is set as the area ratio between the qth stratum and the whole region (Park and van de Giesen, 2004). The population mean estimated in the stratified sampling is in the form of $\bar{\theta}_{st} = \sum_{q=1}^{L} W_q\bar{\theta}_q$, which is the weighted average of the L stratum sample means $\bar{\theta}_q$. The variance $V(\bar{\theta}_q)$ of $\bar{\theta}_q$ and the variance $V(\bar{\theta}_{st})$ of $\bar{\theta}_{st}$ are evaluated by (Cochran, 1977)

$$V(\bar{\theta}_q) = \frac{\sigma_q^2}{n_q}(1 - f_q) \qquad (7)$$

$$V(\bar{\theta}_{st}) = \sum_{q=1}^{L} W_q^2\,V(\bar{\theta}_q) = \sum_{q=1}^{L} W_q^2\,\frac{\sigma_q^2}{n_q}(1 - f_q) \qquad (8)$$

where $\sigma_q$, $n_q$, and $f_q = n_q/M_q$ are the standard error, the sampling size, and the sampling fraction, respectively, of the qth stratum. If the data are normally distributed, $V(\bar{\theta}_{st})$ in Eq. (8) is replaced by $(d/Z_{1-\alpha/2})^2$ under the CL of $1-\alpha$. Then the NSS ($n_0$) in all strata is estimated by (Gilbert, 1987)

$$n_0 = \frac{Z^2_{1-\alpha/2}\sum_{q=1}^{L} W_q\sigma_q^2/d^2}{1 + Z^2_{1-\alpha/2}\sum_{q=1}^{L} W_q\sigma_q^2/(d^2 M)} \qquad (9)$$
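As a numerical illustration of Eq. (9), the sketch below evaluates the stratified NSS from stratum weights and standard errors. The function name `nss_stratified` and the four-stratum input values are hypothetical and chosen only for demonstration; they are not data from the experiments.

```python
from statistics import NormalDist

def nss_stratified(weights, sigmas, d, M, alpha=0.05):
    """Eq. (9): total sample size over all strata for absolute error d.

    weights : stratum weights W_q (expected to sum to 1)
    sigmas  : stratum standard errors sigma_q
    M       : total number of population units
    """
    z2 = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    s = sum(w * sg ** 2 for w, sg in zip(weights, sigmas)) / d ** 2
    return z2 * s / (1 + z2 * s / M)

# Hypothetical example: four equal-area strata in a 121-unit plot.
weights = [0.25, 0.25, 0.25, 0.25]
sigmas = [0.020, 0.030, 0.025, 0.035]   # cm3 cm-3, illustrative values only
n0 = nss_stratified(weights, sigmas, d=0.0152, M=121)
print(round(n0, 1))
```

With a single stratum and a very large M, the expression reduces to the familiar $Z^2\sigma^2/d^2$, which is a convenient sanity check on the reconstruction of Eq. (9).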
Bootstrap and random combination sampling
If n measurements of surface soil moisture are taken in a plot, the mean value $\bar{\theta}$ of the plot is evaluated by averaging the measured values $\theta_i$ (i = 1, 2, ..., n). In the bootstrap technique, the "true" sampling distributions are simulated through repeated sampling from the n observations (Efron and Tibshirani, 1993). The method does not require any assumption about the statistical distribution of the samples (Dane et al., 1986). A bootstrap sampling strategy is constructed as follows:
(1) Randomly select m samples (m ranges from 1 to n) from the available n observations for B bootstrap reps (B = 1000 in this study).
(2) Calculate the mean of the selected m samples of each bootstrap replicate and obtain B mean values in total.
(3) Calculate the RE between the B mean values and the mean of the n observations, and evaluate the CL as the fraction of replicates for which the RE is less than a given value (e.g. 5% or 10%).
(4) Plot the CL against the sample size m to obtain the NSS value ($n_0$) at a certain CL (e.g. 95% or 90%) from the graph.
As stated above, the fixed B reps in the bootstrap method can hardly cover all the combinations of m out of n, which often results in fluctuations of the confidence interval with the sample size (Kamgar et al., 1993). To eliminate or diminish the fluctuations, a random combination method (RCM) is proposed based on the bootstrap to estimate the NSS at a given CL. To obtain the RCM, the fixed B reps and the B mean values in the bootstrap procedure are replaced with the random combinations s ($s = C_n^m$) and the s mean values, respectively.
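The two procedures can be contrasted in a short sketch. Several points are assumptions made here, not statements from the study: the bootstrap replicates are drawn with replacement, the RCM is implemented by exhaustively enumerating all $C_n^m$ sub-samples (feasible only for small n), and the twelve soil moisture values and all function names are hypothetical.

```python
import itertools
import random
from statistics import fmean

def within_re(sub_mean, full_mean, rel_err):
    """True when the sub-sample mean lies within the relative error of the full mean."""
    return abs(sub_mean - full_mean) <= rel_err * abs(full_mean)

def cl_bootstrap(data, m, rel_err, reps=1000, seed=0):
    """Steps (1)-(3): fraction of B replicates of size m within the RE (assumed with replacement)."""
    rng = random.Random(seed)
    full_mean = fmean(data)
    hits = sum(within_re(fmean(rng.choices(data, k=m)), full_mean, rel_err)
               for _ in range(reps))
    return hits / reps

def cl_rcm(data, m, rel_err):
    """RCM variant: the B replicates are replaced by all C(n, m) combinations."""
    full_mean = fmean(data)
    hits = total = 0
    for combo in itertools.combinations(data, m):
        total += 1
        hits += within_re(fmean(combo), full_mean, rel_err)
    return hits / total

def nss(cl_func, data, rel_err, level):
    """Step (4) without the graph: smallest m whose CL reaches the target level, else None."""
    for m in range(1, len(data) + 1):
        if cl_func(data, m, rel_err) >= level:
            return m
    return None

# Hypothetical surface soil moisture values (cm3 cm-3), for illustration only.
theta = [0.21, 0.24, 0.19, 0.22, 0.25, 0.20, 0.23, 0.18, 0.22, 0.21, 0.24, 0.20]
print(nss(cl_rcm, theta, rel_err=0.05, level=0.95))
print(nss(cl_bootstrap, theta, rel_err=0.05, level=0.95))
```

Because the enumeration is deterministic, the RCM curve of CL against m does not depend on a random seed, whereas the bootstrap curve does. Exhaustive enumeration becomes impractical quickly as n grows, which is why a reduced computation sample size matters for larger data sets.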
Field experiments
Field experiments were carried out during 2005–2006 at an experimental station for Water-Saving in Agriculture and Ecology in Gansu Province, China. The experimental station is within the typical continental arid zone between Tenggeli Desert and Badanjilin Desert of northwest China (latitude 37°51′N, longitude 102°50′E, altitude 1550 m). The region is deficient in water resources, with a mean annual precipitation of 164.4 mm and water surface evaporation of 2000 mm. The groundwater table depth is consistently below 25 m. The surface relief of the experimental field is rather uniform, with a maximal elevation difference of about 0.5 m (Fig. 1). The soil texture in the topsoil (0–20 cm) is a silty loam with contents of sand about 48%, silt 48%, and clay 4%. The spring wheat in the experimental field was irrigated four times a year (on 30 April, 20 May, 8 June, and 30 June, 2005, and on 1 May, 21 May, 10 June, and 2 July, 2006), each time with about 97.5 mm of water.
In the first experiment (Exp. 1), the plot area was 55 × 55 m² (represented as P4 in Fig. 1) and divided into 121 units, each with an area of 5 × 5 m². A portable hydro-sense instrument was used to measure the surface (0–20 cm) soil moisture at the center of each unit on 23 May 2005 (3 days after an irrigation).
The second experiment (Exp. 2) was set up for soil moisture measurements at different scales. Referring to some remote sensing image pixels (e.g. ETM, 60 m; ASTER, 90 m; TM, 120 m) and the experimental methods of Chaplot and Walter (2003) and Western et al. (2004), five scales (i.e. sampling domains of 10 × 10, 20 × 20, 40 × 40, 80 × 80, and 160 × 160 m², indicated as P1, P2, P3, P5, and P6 in Fig. 1, respectively) were considered. The plots with sides of 10, 20, 40, 80, and 160 m were each divided into 100 units with sampling unit areas of 1 × 1, 2 × 2, 4 × 4, 8 × 8, and 16 × 16 m², respectively. The sampling densities at all the scales were higher than those of Chaplot and Walter (2003) and Western et al. (2004). Seventeen sets of surface soil moisture measurements were taken at different scales in May and June 2006 during the dry and wet periods (before and after irrigation), using the hydro-sense instrument at the center of each unit (Table 1). At each measurement time in the experiments, several sets of the hydro-sense instrument were used simultaneously so that the measurements were completed within 30 min for one scale and 3 h for all the scales, to minimize the effects of changing surface soil moisture with time. Two confidence levels (CL = 95% and 90%) and two relative errors (RE = 5% and 10%) for estimating the NSS values at the different scales were discussed in this study.
Figure 1 Relative elevation (m) map of the experimental field (160 × 160 m², with the reference point at the north-west corner of the field), in which the outlined squares represent the sampling plots for different scales (P1, 10 × 10 m²; P2, 20 × 20 m²; P3, 40 × 40 m²; P4, 55 × 55 m²; P5, 80 × 80 m²; and P6, 160 × 160 m²).
Results and discussion
Comparisons of the NSS estimations using different methods
Based on the measured 121 values of surface soil moisture in Exp. 1, the best fitted semivariance was an isotropic spherical model with a correlation length (i.e. the range of the semivariance model) of about 47 m. The spatial distribution of surface soil moisture in the plot was obtained using ordinary kriging interpolation (Fig. 2). The mean and standard error (σ) were 0.304 cm³ cm⁻³ and 0.056 cm³ cm⁻³, respectively. The corresponding values of NSS under a standard error of 0.0076 cm³ cm⁻³ (equivalent to RE = 5% and CL = 95%) were about 55 and 35 for the statistical and geostatistical methods, respectively. For the same standard error, the required sample sizes using geostatistics were much smaller than those using the statistical method, which was in accordance with the results of other researchers (Di et al., 1989; Chung et al., 1995). Fig. 3 shows the relationship between σ and sample sizes estimated using the statistical and geostatistical methods, indicating that increasing sample sizes resulted in a rapid decrease of the standard error.
The plot in Exp. 1 was divided into 4 strata according to the spatial distribution of soil moisture (Fig. 2). The weight coefficient $W_q$ was set as the ratio between the area of the qth stratum and the total area. The trend of the standard error with the sample sizes from the stratified sampling was similar to those from the statistical and geostatistical methods (Fig. 3). However, the estimated NSS using the stratified sampling method was only 26 under the standard error of 0.0076 cm³ cm⁻³.
In the bootstrap method, the sample size m was selected from 1 to 121 within the 121 (n) data points and each selection was replicated 1000 times (B). For each m, the mean of the surface soil moisture was computed for each of the 1000 reps. The CL, i.e. the proportion of the 1000 reps with a mean within the given RE (5% or 10%) of the population mean (121 sampling data), was estimated for each m. Then the values of CL and RE were transformed into standard errors. The relationship between the standard error and the sample size m for the bootstrap method was nearly identical to that for the statistical method (Fig. 3), similar to the result of Hupet and Vanclooster (2002). Nonetheless, a few fluctuations appeared, which probably resulted from insufficient reps (Kamgar et al., 1993). Under the standard error of 0.0076 cm³ cm⁻³, the estimated NSS using the bootstrap was 52.
A t-test was performed to evaluate the NSS differences at different standard errors estimated by the RCM and the other sampling methods using SPSS (Statistical Product and Service Solutions, SPSS Inc., USA). The differences were considered to be significant when the significance level p < 0.1. Compared with the bootstrap, the RCM covered all the combinations of the samples, hence the corresponding sample size for a given standard error was smaller (Fig. 3). When the standard error was larger than 0.015 cm³ cm⁻³ (equivalent to RE = 5% and CL = 69%), the estimated NSS values using the RCM were similar to those by the statistical (insignificant with p = 0.477) and geostatistical (p = 0.543)
Table 1 Statistical properties (Min: minimal, Max: maximal, θ̄: mean, σ: standard error, and CV: coefficient of variation) of surface soil moisture at different scales and measurement times in Exps. 1 and 2

Scale (m)   Sampling date   Min (cm³ cm⁻³)   Max (cm³ cm⁻³)   θ̄ (cm³ cm⁻³)   σ (cm³ cm⁻³)   CV
10          12 May 2006     0.146            0.260            0.216          0.019          0.090
10          22 May 2006     0.306            0.326            0.315          0.004          0.013
10          23 May 2006     0.273            0.314            0.295          0.010          0.033
10          11 June 2006    0.290            0.310            0.305          0.006          0.020
10          12 June 2006    0.246            0.328            0.288          0.017          0.060
20          12 May 2006     0.150            0.227            0.208          0.025          0.120
20          22 May 2006     0.271            0.337            0.302          0.015          0.050
20          13 June 2006    0.303            0.329            0.279          0.020          0.073
40          12 May 2006     0.140            0.255            0.200          0.032          0.160
40          23 May 2006     0.262            0.377            0.282          0.029          0.102
55          23 May 2005     0.210            0.400            0.304          0.056          0.184
80          12 May 2006     0.120            0.250            0.190          0.034          0.180
80          9 June 2006     0.031            0.148            0.090          0.026          0.290
160         12 May 2006     0.080            0.278            0.180          0.038          0.210
160         19 May 2006     0.060            0.211            0.130          0.030          0.231
160         20 May 2006     0.040            0.128            0.080          0.022          0.270
160         26 May 2006     0.131            0.272            0.190          0.032          0.170
160         1 June 2006     0.091            0.233            0.161          0.032          0.200
Figure 2 Spatial distribution of the measured surface soil moisture (cm³ cm⁻³) in the 55 m-scale plot on 23 May, 2005 (3 days after an irrigation). The cross symbols represent the sample locations.
methods, but larger than those by the stratified sampling (significant with p = 0.086). If the standard error was between 0.015 and 0.005 cm³ cm⁻³ (equivalent to RE = 5% and CL = 99%), the estimated NSS values through the RCM were larger than those by the stratified sampling (p = 0.078), but similar to those by geostatistics (p = 0.68). For smaller standard errors, the estimated NSS values using the RCM became much smaller than those using other methods (Fig. 3). The estimated NSS using the RCM under CL = 95% and RE = 5% was 38.
Besides the sampling size, the spatial distribution of prediction errors is another important aspect for evaluating the sampling methods. A random realization from the 121 samples in Exp. 1 was used to compare the spatial distributions of the prediction errors from the sampling methods. Under CL = 95% and RE = 5%, values of the NSS in Exp. 1 were
Figure 3 Standard errors (cm³ cm⁻³) of the estimated mean surface soil moisture for different sample sizes using the different sampling methods (statistics, geostatistics, stratified sampling, bootstrap, and RCM) in the 55 m-scale plot.
estimated as 55, 52, 38, 35, and 26, respectively, using the statistical, bootstrap, RCM, geostatistical, and stratified sampling methods. With the different sampling sizes (i.e. 55, 52, 38, 35, and 26 samples were randomly selected from the 121 observations in Exp. 1) and using ordinary kriging interpolation, the spatial distributions of the relative prediction errors from the different methods were evaluated. As shown in Fig. 4a–e, in most of the plot, the relative prediction errors produced by the sampling methods were within 15%. The average relative errors between the measurements and the estimations for the statistical, bootstrap, RCM, geostatistical, and stratified sampling methods were 7.5%, 8.0%, 8.9%, 9.5%, and 9.5%, respectively. Although it used a much smaller sample size, the RCM described the soil moisture distribution about as well as the statistical and bootstrap methods.
Among the discussed sampling methods, the sampling efficiency of the geostatistical, stratified sampling, and RCM methods is much higher than that of the statistical and bootstrap methods. Although it has the highest efficiency, stratified sampling depends on stratifying the spatial distribution of soil moisture at each sampling time and on allocating samples to each stratum, both of which are difficult to accomplish in the short time available for soil moisture measurements. The sampling design of geostatistics is based on the semivariance of soil moisture, which may not be available in many situations. Therefore, it is favorable to use the RCM to estimate the NSS.
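The equivalences quoted in this section between a standard error and an (RE, CL) pair can be checked with a normal approximation. The study does not spell out the conversion it used, so the relation $\mathrm{SE} = \kappa\,\bar{\theta}/Z_{(1+\mathrm{CL})/2}$ below is an assumption, and the function names are hypothetical.

```python
from statistics import NormalDist

def cl_from_standard_error(se, mean, rel_err):
    """CL at which an interval of half-width rel_err * mean equals Z * se (normal assumption)."""
    z = rel_err * mean / se
    return 2 * NormalDist().cdf(z) - 1

def standard_error_from_cl(cl, mean, rel_err):
    """Inverse relation: the standard error matching a given CL and RE."""
    z = NormalDist().inv_cdf((1 + cl) / 2)
    return rel_err * mean / z

mean_exp1 = 0.304  # cm3 cm-3, mean soil moisture in Exp. 1
for se in (0.015, 0.0076):
    print(se, round(cl_from_standard_error(se, mean_exp1, 0.05), 2))
```

Under this assumption, standard errors of 0.015 and 0.0076 cm³ cm⁻³ correspond to confidence levels of about 69% and 95% at RE = 5%, consistent with the values quoted in the text.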
Estimation of the NSS at different scales using the RCM
Measured values of the surface soil moisture on 12 May 2006 (11 days after an irrigation, Fig. 5) in Exp. 2 were used to estimate the NSS values at different scales using the RCM. The standard error, mean, and CV of the measurements are listed in Table 1. Values of the CL for different sample sizes at the scales were calculated on the basis of RE ≤ 10% and 5%, respectively (Fig. 6a and b). The results showed that the CL values increased and approached 1.0 with increasing sample sizes. The fact that the CL values approached 1.0 in the different cases supported the assumption that 100 samples were sufficient to represent the plots in Exp. 2. For a fixed CL, more samples were needed for a smaller RE (5%) or at a larger scale. In this moderately dry period, the estimated NSS values with CL = 95% and RE = 5% for the scales of 10, 20, 40, 80, and 160 m were 12, 18, 29, 34, and 41, respectively.
Based on the estimated NSS values under CL = 95% and RE = 5%, spatial distributions of the relative prediction errors were evaluated and are shown in Fig. 7a–e for the scales of 160, 80, 40, 20, and 10 m, respectively. The relative errors in most of the plots and the average prediction errors for the different scales were less than 15%. Because of the lowest sampling density at the 160 m-scale (0.0016 m⁻²) and the smallest number of samples (12) at the 10 m-scale, which might result in larger errors when using kriging interpolation, the average relative errors at the scales of 10 and 160 m were larger than those at the other scales.
Sampling design should be based on the soil variability, and more samples are needed to estimate the mean values for a field with higher variability (Hupet and Vanclooster, 2004). The degree of variability is often characterized by the CV. In the statistical method (Eq. (1)), the NSS is proportional to the square of the CV on the assumptions of data independence and a normal distribution of the samples. However, practical sampling values often show spatial dependence (Warrick, 2003; Brocca et al., 2006). The CV is influenced by the sampling domain as well as by the properties of the variable. Large scales (sampling domains) usually include high variation (viz. high values of CV), as shown by the results in Table 1 for the different scales on 12 May 2006.
On the other hand, CV values of the surface soil moisture may change with time and increase with the drying process (Famiglietti et al., 1998; Jacobs et al., 2004; Brocca et al., 2006). In the experiment, the surface soil moisture at the 100 locations in the 160 m-scale plot was measured five times during various periods (Table 1). The mean values of soil moisture changed from 0.19 to 0.08 cm³ cm⁻³, and the corresponding CV values increased from 0.17 to 0.27.
With 17 series of the 100 sample data at the 10, 20, 40, 80, and 160 m scales in Exp. 2, values of the NSS under CL = 95% and 90% and RE = 10% and 5% were estimated using the RCM. A relationship between the NSS and the corresponding CV was established. As shown in Fig. 8a, the NSS increased almost linearly with the CV for RE = 10% or RE = 5%. Four linear regression functions, under CL = 90% and 95% and RE = 10% and 5%, were fitted with coefficients of determination (R²) ≥ 0.98 (Table 2 and Fig. 8a). To check the reliability of the fitted linear equations, estimated NSS values in the 55 m-scale plot (Exp. 1, CV = 0.184) using the regression equations in Table 2 were compared with those from the RCM. Under CL = 95%, NSS values estimated by the RCM were 12 and 38 for RE = 10% and 5%, while NSS values estimated by the fitted equations were 13 and 35, respectively. In the same way, under CL = 90%, the same NSS values were estimated by the RCM and the fitted equation, that is, 10 and 28 for RE = 10% and 5%, respectively.
Figure 4 Spatial distributions of the relative error (RE) in the 55 m-scale plot on 23 May, 2005 using: (a) the statistical method, (b) the bootstrap method, (c) the random combination method (RCM), (d) the geostatistical method, and (e) the stratified sampling method.
The relationship between the NSS and CV is discussed further as follows. According to Cochran (1977), the variance of the mean under sampling without replacement from a finite population is $(1 - m/n)\sigma^2/m$, in which n and m represent the population size and the sampling size, respectively. For the normal distribution, the probability $P_0$ for a random value z within $\pm t_{1-\alpha/2}$ standard deviations of the mean $\bar{\theta}$ is expressed by

$$P_0 = P\left(\bar{\theta} - \frac{t_{1-\alpha/2}\,\sigma}{\sqrt{m}}\sqrt{1 - \frac{m}{n}} < z < \bar{\theta} + \frac{t_{1-\alpha/2}\,\sigma}{\sqrt{m}}\sqrt{1 - \frac{m}{n}}\right) \qquad (10)$$

Given the absolute error $d \le |z - \bar{\theta}|$, the NSS is obtained as

$$n_0 = \frac{1}{\dfrac{d^2}{t^2_{1-\alpha/2}\,\sigma^2} + \dfrac{1}{n}} = \frac{1}{\dfrac{\kappa^2}{t^2_{1-\alpha/2}\,\mathrm{CV}^2} + \dfrac{1}{n}} \qquad (11)$$

with $\sigma$, $\bar{\theta}$, and $\kappa$ as defined for Eqs. (1)–(3). Obviously, Eq. (1) is a special case of Eq. (11) as the population size approaches infinity (viz. n → ∞). Ignoring 1/n results in overestimating the NSS, as happens with Eq. (1) in the statistical method. Values of the NSS under CL = 95% and RE = 5% for different CV values in Exp. 2 were estimated using Eqs. (1) and (11) (with n = 100) and compared with the result estimated using the RCM (Fig. 8b). When the finite population size n was considered (Eq. (11)), the estimated NSS for a fixed CV was reduced significantly and was nearly identical to the result from the RCM. Without the requirements of data independence and the normal distribution of measured data, the RCM should be more applicable than Eq. (11) in estimating the NSS. To demonstrate the interrelation between the NSS and CV with various population sizes n, estimated values of the NSS with CV for different n values (10, 40, 100, and 500) using Eq. (11) are also shown in Fig. 8b. For the same CV, the estimated NSS by Eq. (11) increased and approached the result from Eq. (1) (the statistical method) with increasing n.
The CV values of soil moisture change with time and the environment. For example, CV increased with the drying process of soils and with the spatial scale in this study. To obtain rational NSS estimations for soil moisture measurements, preliminary sampling work may be needed to determine the relationship between the NSS and CV (i.e. the regression equations in Table 2). Through the preliminary sampling, it is also possible to obtain general information about the pattern of CV changing with time and the environment. Fortunately, typical CV ranges for some soil properties, especially for less time-dependent properties such as porosity, bulk density, and pH, have been widely studied and are readily available in the literature (Beven et al., 1993; Mulla and McBratney, 2002). The proposed method should be useful for designing sampling strategies for these soil properties.

Figure 5 Spatial distributions of the measured surface soil moisture (cm³ cm⁻³) on 12 May, 2006 (11 days after an irrigation) at different scales of: (a) 160 m, (b) 80 m, (c) 40 m, (d) 20 m, and (e) 10 m.

Figure 6 Relationship between the estimated confidence level (CL) and the sample sizes using the random combination method (RCM) at the 10, 20, 40, 80, and 160 m scales under the relative error (RE) of (a) 10% and (b) 5%.

Simplification of the RCM
Similar to the bootstrap method, the RCM involves intensive computation time, especially under situations with high coefficients of variation and complex data analyses. Therefore, enhancing the calculation efficiency of the RCM remains a troublesome but essential task. In the RCM, the calculation efficiency is hindered by the huge number of sample combinations resulting from a large population size n. The value of n is usually set as large as possible to represent the features of the studied area adequately and to obtain the necessary accuracy. In Exp. 2, n was set as 100 and all the data at each scale were used to estimate the NSS. However, Fig. 6a and b shows that it is not necessary to include all the 100 observations in the estimations. Even for the largest scale, if the number ($n_1$) selected randomly from the 100 observations was ≥ 48, the relative error of the mean soil moisture between the selected $n_1$ data and the total n (= 100) observations was within 5% (Fig. 6b, CL = 99%). The results suggested that the NSS might be estimated with a smaller sample size $n_1$ (than the original population size n), which would enhance the calculation efficiency greatly.
As a starting point, the minimum computation sample size ($n_1$) was set not less than the NSS value estimated by the statistical method. Considering that the NSS under a given CL and RE depends on the CV (or the mean and standard error) of the total samples, we chose sub-sets of $n_1$ samples with the same values of CV as the original n samples. Two series of 100 observations in the 20 m-scale plot with CV = 0.05 and 0.12, and mean soil moisture $\bar{\theta}$ = 0.302 and 0.208 cm³ cm⁻³ (Table 1) were considered. Under CL = 99% and RE = 5%, the NSS values were calculated as 7 and 39 through Eq. (1) for CV = 0.05 and 0.12, respectively.
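One way to obtain a reduced sample of $n_1$ values with a prescribed mean and CV is sketched below. The study states only that such samples were generated by a Monte-Carlo method; drawing from a Gaussian and then rescaling to match the target moments exactly is an assumption made here, as are the function name and the use of the population (1/n) standard deviation.

```python
import random
from statistics import fmean, pstdev

def sample_with_moments(n1, mean, cv, seed=0):
    """Draw n1 Gaussian values (n1 >= 2), then shift and scale them so that the
    sample mean and the sample CV (population standard deviation / mean)
    match the targets."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    m, s = fmean(raw), pstdev(raw)
    return [mean + cv * mean * (x - m) / s for x in raw]

# Targets taken from the 20 m-scale series described in the text.
sub = sample_with_moments(n1=39, mean=0.208, cv=0.12)
print(round(fmean(sub), 3), round(pstdev(sub) / fmean(sub), 3))
```

A sub-sample built this way can then be passed to an RCM routine in place of the full 100 observations, which is the kind of reduction in combinations that the simplification aims for.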
Figure 7 Spatial distributions of the relative error (RE) under CL = 95% and RE = 5% on 12 May, 2006 at different scales of: (a) 160 m, (b) 80 m, (c) 40 m, (d) 20 m, and (e) 10 m.
Based on the original sample size n = 100, the sub-sample sizes (n1) were set as 90, 80, 70, 60, 50, 40, 30, 20, 10, and 7 for CV = 0.05, and 90, 80, 70, 60, 50, and 39 for CV = 0.12, respectively, to test the effect of n1 on the result of CL vs. sample sizes. Different samples for n1 = 90, 80, 70, 60, 50, 40, 39, 30, 20, 10, and 7 were randomly generated by the Monte-Carlo method, with the mean θ = 0.302 and 0.208 cm³ cm⁻³, and CV = 0.05 and 0.12, respectively. Thereafter, values of the CL of different sample sizes were calculated for each n1 using the RCM. The results are
Figure 8 Relationships between the coefficient of variation (CV) and the necessary sampling size (NSS) estimated by (a) the random combination method (RCM) under CL = 90% and 95% and RE = 10% and 5% and (b) Eq. (1) (the statistical method), Eq. (11) (with the sample sizes n = 10, 40, 100, and 500), and the random combination method (RCM) (n = 100) under CL = 95% and RE = 5%.
Figure 9 Relationships between the estimated confidence level (CL) and the sampling sizes using the simplified random combination method in the 20 m-scale field under RE = 5%, 10% and the coefficient of variation (CV) of (a) 0.05 and (b) 0.12.
Table 2 Relationships between the necessary sampling size (NSS, n0) and the coefficient of variation (CV) under the confidence level of CL = 90% and 95% and the relative error of RE = 5% and 10%
RE     CL     Regression equation                                         Coefficient of determination
5%     90%    n0 = 188.33 CV − 7.46 (CV ≥ 0.03); n0 = 1 (CV < 0.03)       R² = 0.99
5%     95%    n0 = 222.74 CV − 6.61 (CV ≥ 0.02); n0 = 1 (CV < 0.02)       R² = 0.99
10%    90%    n0 = 77.45 CV − 4.74 (CV ≥ 0.06); n0 = 1 (CV < 0.06)        R² = 0.98
10%    95%    n0 = 99.83 CV − 5.61 (CV ≥ 0.05); n0 = 1 (CV < 0.05)        R² = 0.98
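As an illustration of how the regressions in Table 2 might be applied, the sketch below evaluates n0 for a given CV. Three points are assumptions rather than part of the table: the intercept is taken to be subtracted (n0 = a CV − b), the result is rounded up to a whole number of samples, and it is never allowed to fall below 1. The names TABLE2 and nss_from_cv are hypothetical.

```python
import math

# (RE %, CL %) -> (slope a, intercept b, CV threshold), transcribed from Table 2.
# Assumption: the regression reads n0 = a * CV - b above the threshold.
TABLE2 = {
    (5, 90): (188.33, 7.46, 0.03),
    (5, 95): (222.74, 6.61, 0.02),
    (10, 90): (77.45, 4.74, 0.06),
    (10, 95): (99.83, 5.61, 0.05),
}


def nss_from_cv(cv, re_percent, cl_percent):
    """NSS from the fitted line; n0 = 1 below the tabulated CV threshold."""
    slope, intercept, threshold = TABLE2[(re_percent, cl_percent)]
    if cv < threshold:
        return 1
    # Rounding up and the lower bound of 1 are assumptions of this sketch.
    return max(1, math.ceil(slope * cv - intercept))


for key in TABLE2:
    print(key, nss_from_cv(0.12, *key))
```

A CV range taken from preliminary sampling or from the literature could be passed through nss_from_cv to bracket the sampling effort, bearing in mind that the regressions were fitted to the data of the studied area only.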
presented in Fig. 9a and b for CV = 0.05 and 0.12, respectively. The short error bars in the figures indicate that the results estimated from different n1 were similar to those from the 100 measured data, all within the standard error of 5% for different sample sizes. Meanwhile, the trend of CL with sample size was stable and smooth, without fluctuations. In addition, reducing the total computation sample size from n to n1 decreased the computing time exponentially. For example, given CL = 95% and RE = 5% for CV = 0.12, the running time on a personal computer (Pentium IV, CPU 2.93 GHz, Memory 256 MB) for the sample sizes of n1 = 100, 70, and 39 was 82.12, 19.63, and 0.12 h, respectively. If n1 was set larger than the NSS estimated using the statistical method under CL = 99% and RE = 5%, the simplified RCM procedure provided reasonable NSS estimations and enhanced the calculation efficiency dramatically.
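The idea behind the CL vs. sample size curves can be sketched in a few lines. The code below is not the RCM as implemented in this study: instead of considering all possible combinations, it draws a fixed number of random combinations (a Monte Carlo approximation), and it uses a synthetic, normally distributed series in place of measured data. The function names, the number of draws, and the seeds are illustrative assumptions.

```python
import random
import statistics


def estimate_cl(data, k, re, n_draws=2000, seed=0):
    """Fraction of random k-combinations whose mean lies within RE of the full mean."""
    rng = random.Random(seed)
    full_mean = statistics.fmean(data)
    hits = 0
    for _ in range(n_draws):
        subset = rng.sample(data, k)  # one combination, drawn without replacement
        if abs(statistics.fmean(subset) - full_mean) <= re * abs(full_mean):
            hits += 1
    return hits / n_draws


def smallest_k(data, re, cl, **kwargs):
    """Smallest sample size whose estimated CL reaches the target, or None."""
    for k in range(1, len(data) + 1):
        if estimate_cl(data, k, re, **kwargs) >= cl:
            return k
    return None


# Synthetic stand-in for a measured series: n1 = 39 values, mean 0.208, CV = 0.12.
rng = random.Random(1)
series = [rng.gauss(0.208, 0.12 * 0.208) for _ in range(39)]
print(smallest_k(series, re=0.05, cl=0.95))
```

Because the cost of each call grows with the size of the series and with the number of combinations examined, working with n1 values rather than all n observations shortens the run, which is the motivation for the simplification described above.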
Conclusions
Field experiments were conducted to measure soil moisture at various scales. Based on the bootstrap sampling method, a random combination method (RCM) was proposed and applied to estimate the necessary sampling sizes (NSS) of soil moisture at different scales. To show the applicability of the RCM, the method was compared with other commonly used sampling methods, including the statistical, geostatistical, stratified sampling, and bootstrap methods. Compared with the statistical and bootstrap methods, the RCM reduced the sampling sizes greatly under the same accuracy. The RCM should be more widely applicable because it requires less information, whereas the statistical and stratified sampling methods require independent data with the normal distribution, the stratified sampling method needs information on stratified allocation, and the geostatistical method needs the semivariance model. Based on the experimental data of surface soil moisture, the RCM was used to estimate NSS values at different scales (squares with sides of 10, 20, 40, 80, and 160 m). The estimated NSS values for the 10, 20, 40, 80, and 160 m scales in a moderately dry period were 12, 18, 29, 34, and 41, respectively, under the confidence level (CL) of 95% and the relative error (RE) of 5%. The spatial variability of surface soil moisture, represented by the coefficient of variation (CV), depended on the spatial scale and the degree of soil drying. The value of CV increased as the surface soils dried and as the spatial scale increased. Seventeen series of soil moisture data with different mean and CV values at different scales were used to establish a relationship between CV and the NSS. In the studied area, the estimated NSS values under different CL and RE values were found to be linear functions of the CV. Considering the computation-intensive nature of the RCM, we simplified the RCM procedure to enhance the calculation efficiency.
The simplified RCM provided results as accurate as those of the full RCM, yet reduced computation time tremendously. Potentially, besides surface soil moisture, the RCM can be used to estimate the NSS of other soil properties, especially less time-dependent properties such as pH, texture, porosity, and bulk density. Ranges of CV for soil properties, a key parameter for estimating the NSS using the RCM, may be obtained through preliminary sampling investigations or retrieved from the literature. In this study, the sampling domain was limited to the 160 m scale and the research field had relatively homogeneous soils and a single crop. The validity of the RCM at larger scales and in fields with mixed crops and heterogeneous soils needs further research.
Acknowledgements
The authors thank Mr. Xiangming Zhu, Mr. Hesong Yang, and Miss Wenjuan Zheng for their help in the experiments. This study was supported in part by the National Key Basic Research Special Funds (NKBRSF, 2006CB403406), China, the National Natural Science Foundation of China (50339030), and the Program for New Century Excellent Talents in Universities (NCET-04-0137), Ministry of Education, China.