Environmental Modelling & Software 20 (2005) 891–902 www.elsevier.com/locate/envsoft
A study on sensitivity of spatial sampling designs to a priori discretization schemes

M.C. Bueso (a), J.M. Angulo (b,*), F.J. Alonso (b), M.D. Ruiz-Medina (b)

(a) Departamento de Matemática Aplicada y Estadística, Universidad Politécnica de Cartagena, Spain
(b) Departamento de Estadística e Investigación Operativa, Universidad de Granada, Spain

Received 4 January 2003; received in revised form 18 November 2003; accepted 30 March 2004
Abstract

In a previous paper (Environ. Ecol. Stat. 5 (1998) 29) we presented an entropy-based approach to spatial sampling design in a state-space model framework. We now address the problem of sensitivity of optimal designs with respect to the configuration of the set of potential observation sites considered, as well as to the model specifications. The latter involve both the spatial dependence structure of the variable of interest and its relationship with the observable variable. To analyze several aspects related to this problem, we have developed an extensive empirical study, from which we conclude that the a priori selection of candidate observation sites can have a critical influence on the final sampling designs in different situations.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Lattice refinement; Ratio of information; Shannon's entropy; Spatial stochastic process; State-space model
* Corresponding author. Tel.: +34-958240492; fax: +34-958243267. E-mail address: [email protected] (J.M. Angulo).
1364-8152/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.envsoft.2004.03.019

1. Introduction

In this work, we consider the problem of estimating a spatial process, not directly observable, from the observation of a related spatial process, on a continuous domain. Generally, approaches to sampling network design introduced in the literature assume a finite set of potentially observable sites to be specified at a first stage of the study (see, for example, Caselton and Zidek, 1984; Christakos, 1992; Cressie, 1993; Pérez-Abreu and Rodríguez, 1996; Müller, 2001; and references therein). This assumption is natural in cases where, for example, there exists a previous network of observation stations, as commonly occurs in groundwater hydrology when measurements are taken at a subset of available wells. On the other hand, in cases where such a network does not exist, spatial discretization is usually performed on the continuous region to be sampled, often considering the nodes in a regular mesh. In this case, the level of refinement used in the discretization will affect the final results in the design. A finer grid can lead to an excessive number of nodes and a higher computational cost in the search for the optimal network. In contrast, if the grid does not properly represent the whole sampling region, the design will be more sensitive to the placement of the locations where the variable of interest must be estimated.

Network design based on entropy criteria has been considered by Caselton and Husain (1980), Caselton and Zidek (1984), Caselton et al. (1992), Wu and Zidek (1992), and Zidek et al. (2000), among others. In this paper, we use this approach in a state-space model framework, as introduced in Bueso et al. (1998).

Formally, assume that we are interested in estimating a spatial process X at a finite set of locations L, by sampling a related spatial process Y, potentially observable on a continuous region. Assume that, for practical purposes, this region has been discretized by considering a regular mesh P composed of a finite set of sites. As a criterion for the definition of an optimal sampling
design, we follow the approach consisting of the selection of an appropriate subset S from the set of candidate sites to be sampled P which provides maximum information on X at L. Information is defined here in the sense of Shannon’s entropy. At this stage, the joint distribution of the variables involved is assumed to be known or estimated from historical data. This setup can be extended to the multivariate case, considering different degrees of importance assigned to each one-dimensional variable and/or to each location of interest, as introduced in Bueso et al. (1999); Bueso and Angulo (1999), and Angulo and Bueso (2001). Under the same entropy-based approach, Angulo et al. (2000) have studied some relevant aspects inherent in the design problem in the spatial-temporal context. In this paper, an extensive empirical study is carried out to show the influence of the a priori level of the discretization considered in the continuous region to be sampled on the final optimal designs, specifically in relation to the ordering of the candidate sites as well as to the corresponding information outcomes. In Section 2, we briefly summarize the entropy-based criterion presented in Bueso et al. (1998). The most relevant results derived from the empirical study performed are described in detail in Section 3. We conclude in Section 4 with further comments on the main aspects raised by this study.
2. Entropy-based spatial sampling design

Caselton and Husain (1980) and Caselton and Zidek (1984) developed an entropy-based approach to the sampling network design problem. Applications of this approach have been considered, for example, in Caselton et al. (1992), Wu and Zidek (1992), Guttorp et al. (1993), and Zidek et al. (2000). In our study we adopt the entropy-based criterion formulated in a state-space model framework as introduced in Bueso et al. (1998). This criterion consists of finding a set of locations $S$ (selected from a certain class of admissible sets of candidate sites where variable $Y$ can be observed) that maximizes the amount of information on $X_L$ (random vector whose coordinates represent the spatial process of interest at the set of locations $L$) contained in $Y_S$ (observable process at $S$),

$$I(Y_S; X_L) = H(X_L) - H(X_L \mid Y_S),$$

with $H(X_L)$ representing the entropy of $X_L$,

$$H(X_L) = -E_{X_L}[\log f(X_L)],$$

and $H(X_L \mid Y_S)$ the mean conditional entropy of $X_L$ given $Y_S$,

$$H(X_L \mid Y_S) = -E_{(X_L, Y_S)}[\log f(X_L \mid Y_S)],$$

where $f(X_L)$ is the density (or probability mass) function of $X_L$, and $f(X_L \mid Y_S)$ the conditional density (or probability mass) function of $X_L$ given $Y_S$. Here $E_Z$ denotes the expectation with respect to the distribution of $Z$. As $H(X_L)$ is a fixed quantity, the above criterion reduces to minimizing the mean conditional entropy $H(X_L \mid Y_S)$; in particular, under the multivariate normality assumption, this is equivalent to minimizing the determinant of the conditional covariance matrix of $X_L$ given $Y_S$.

The design problem is then formulated as an optimization problem whose optimal solution, in cases of high dimensionality, can be difficult to find due to the computational cost involved, derived either from the complexity of the function to be evaluated or from the time required in the search for an optimum set $S$.

Table 1
Coordinates of locations of interest, s = (s1, s2)

(15, 20)   (25, 13)   (42, 86)   (76, 50)
(20, 10)   (25, 20)   (43, 75)   (77, 55)
(20, 25)   (35, 84)   (47, 80)   (82, 51)
(23, 15)   (38, 80)   (72, 50)   (86, 56)

Fig. 1. Locations of interest. (Scatter plot; both axes run from 0 to 100.)

To compare the effectiveness of a design $S$ with respect to a given, more informative, design $S'$, we obtain the proportion of information provided by $S$ with respect to that given by $S'$:

$$R_{S'}(S) = \frac{I(Y_S; X_L)}{I(Y_{S'}; X_L)},$$

referred to in this paper as 'ratio of information'. When $S'$ is considered to be the most informative design $P$, the
latter ratio is very useful to determine the designs that achieve a given target value for $R_P(\cdot)$.

Fig. 2. Curves of ratios of information $R_{P_0}(P_k)$, $k = 1, \ldots, 19$, vs. the number of points in the scheme, $(k+1)(k+1)$, for each value of the error variance $\sigma_\varepsilon^2$, varying the value of the range $a$.

Table 2
Values of ratios of information $R_{P_0}(P_k)$, $k = 1, \ldots, 19$, for $a = 15$ and for each value of the error variance $\sigma_\varepsilon^2$ (rows: number of points in the scheme; columns: $\sigma_\varepsilon^2$)

Scheme   0.1       0.5       1         1.5       2         2.5
4        0.00001   0.00001   0.00001   0.00001   0.00001   0.00001
9        0.00039   0.00039   0.00038   0.00037   0.00036   0.00036
16       0.00181   0.00185   0.00180   0.00175   0.00171   0.00167
25       0.13394   0.11986   0.10884   0.10202   0.09731   0.09382
36       0.10668   0.10217   0.09574   0.09113   0.08771   0.08508
49       0.18398   0.17560   0.16436   0.15637   0.15048   0.14595
64       0.27456   0.25370   0.23439   0.22176   0.21275   0.20598
81       0.35500   0.31783   0.29104   0.27458   0.26314   0.25464
100      0.36167   0.34586   0.32595   0.31190   0.30148   0.29344
121      0.55011   0.46510   0.42726   0.40543   0.39043   0.37928
144      0.42340   0.42159   0.40755   0.39600   0.38679   0.37936
169      0.57922   0.56156   0.53702   0.51957   0.50646   0.49623
196      0.61753   0.59127   0.56727   0.55127   0.53944   0.53022
225      0.80025   0.76763   0.73510   0.71375   0.69824   0.68632
256      0.80985   0.78465   0.76018   0.74408   0.73222   0.72298
289      1.00000   0.95209   0.91423   0.89163   0.87598   0.86428
324      0.76329   0.78841   0.79066   0.78990   0.78837   0.78668
361      0.96359   0.94703   0.93516   0.92895   0.92477   0.92164
400      0.98254   1.00000   1.00000   1.00000   1.00000   1.00000
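For the Gaussian case mentioned in Section 2, the criterion admits a closed form that makes the link to the conditional covariance determinant explicit. The derivation below is a standard sketch; the symbols $\Sigma_{LL} = \mathrm{Cov}(X_L)$, $\Sigma_{SS} = \mathrm{Cov}(Y_S)$, $\Sigma_{LS} = \mathrm{Cov}(X_L, Y_S)$ and $n$ (the number of locations in $L$) are notation introduced here for illustration and do not appear in the paper.

```latex
% Conditional covariance of X_L given Y_S (jointly Gaussian case)
\Sigma_{L \mid S} = \Sigma_{LL} - \Sigma_{LS}\, \Sigma_{SS}^{-1}\, \Sigma_{SL}

% Entropy and mean conditional entropy of an n-dimensional Gaussian vector
H(X_L) = \tfrac{1}{2} \log\!\left( (2\pi e)^{n} \det \Sigma_{LL} \right),
\qquad
H(X_L \mid Y_S) = \tfrac{1}{2} \log\!\left( (2\pi e)^{n} \det \Sigma_{L \mid S} \right)

% Information on X_L contained in Y_S
I(Y_S; X_L) = H(X_L) - H(X_L \mid Y_S)
            = \tfrac{1}{2} \log \frac{\det \Sigma_{LL}}{\det \Sigma_{L \mid S}}
```

Since $\det \Sigma_{LL}$ does not depend on $S$, maximizing $I(Y_S; X_L)$ over $S$ amounts to minimizing $\det \Sigma_{L \mid S}$, which is the equivalence stated in Section 2.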
3. Effect of the specification of P on optimal designs: an empirical study

3.1. Set-up

To illustrate the influence of the specification of $P$ on the sampling designs, we have considered in the study a model depending on certain parameters that allow us to specify different degrees of spatial dependence between the variables of interest and between these and the observable variables. Specifically, we assume that observation of the random process of interest $X$ is affected by additive noise according to the observation equation

$$Y(s) = X(s) + \varepsilon(s),$$
with $\varepsilon(\cdot)$ being a zero-mean spatial random process satisfying the condition

$$\mathrm{Cov}(\varepsilon(s), \varepsilon(s')) = \delta_{s,s'}\, \sigma_\varepsilon^2, \quad \forall s, s',$$

where $\sigma_\varepsilon^2 = \mathrm{Var}(\varepsilon(s))$ is constant and $X(\cdot)$ and $\varepsilon(\cdot)$ are mutually independent. We also assume that the variables involved have a Gaussian multivariate distribution and that $X$ has zero mean and an exponential variogram with range $a$,

$$\gamma(h) = 1 - \exp\left(-\frac{3h}{a}\right),$$

where $h$ represents the distance between two points, $h = \lVert s - s' \rVert$. This variogram model is widely used in geostatistical applications (see, for example, Kitanidis, 1997).

Fig. 3. Curves of ratios of information $R_{P_0}(\tilde{S}_{k,l})$, $k = 1, \ldots, 19$, vs. the number of (sequentially) selected sites $l = 1, \ldots, (k+1)(k+1)$, for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5.

Table 3
Values of ratios of information $R_{P_0}(P_k)$, $k = 1, \ldots, 19$, for $a = 150$ and for each value of the error variance $\sigma_\varepsilon^2$ (rows: number of points in the scheme; columns: $\sigma_\varepsilon^2$)

Scheme   0.1       0.5       1         1.5       2         2.5
4        0.05144   0.05227   0.04641   0.04135   0.03737   0.03422
9        0.14323   0.14758   0.13432   0.12216   0.11222   0.10411
16       0.22421   0.23547   0.22053   0.20539   0.19235   0.18128
25       0.37093   0.35111   0.32429   0.30296   0.28563   0.27118
36       0.40817   0.40328   0.38293   0.36472   0.34907   0.33550
49       0.49064   0.47860   0.45661   0.43795   0.42205   0.40826
64       0.55583   0.53916   0.51793   0.50028   0.48524   0.47211
81       0.60399   0.58728   0.56890   0.55342   0.54003   0.52821
100      0.64591   0.63393   0.61813   0.60457   0.59269   0.58210
121      0.70234   0.68295   0.66763   0.65516   0.64435   0.63474
144      0.71572   0.71477   0.70503   0.69569   0.68713   0.67928
169      0.77796   0.76405   0.75270   0.74352   0.73551   0.72833
196      0.79859   0.79359   0.78666   0.78018   0.77418   0.76862
225      0.87285   0.84814   0.83679   0.82908   0.82278   0.81731
256      0.89023   0.87632   0.86896   0.86358   0.85904   0.85500
289      0.95097   0.92073   0.91087   0.90514   0.90082   0.89720
324      0.91626   0.92737   0.92830   0.92762   0.92653   0.92530
361      0.97122   0.96844   0.96731   0.96636   0.96548   0.96466
400      1.00000   1.00000   1.00000   1.00000   1.00000   1.00000

The comparative study has been carried out considering different values for the error variance $\sigma_\varepsilon^2$ and for the range of the spatial dependence $a$. In all the cases studied, the set $L$ is composed of 16 sites clustered in three groups within the domain defined by a $100 \times 100$ square (see Table 1 and Fig. 1). For the region to be sampled we have considered the whole domain. Several discretizations of this region have been performed by using different levels of refinement leading to regular grids whose nodes have been adopted as the set $P$. More precisely, we have considered 19 different sampling schemes defined by the sets of points

$$P_k = \left\{ (s_1, s_2) : s_1 = \frac{100(i-1)}{k},\ s_2 = \frac{100(j-1)}{k};\ i, j = 1, \ldots, k+1 \right\},$$
for $k = 1, \ldots, 19$. Here, we refer to sampling schemes as the sets of candidate sites from which sampling designs are selected. For the interpretation of the results below, it is important to note that the $P_k$ sets do not follow a nested construction.

For each model parameter specification, the sites in each sampling scheme have been ordered based on the information gain obtained by the sequential addition of single sites. That is, in the first stage the best site to be sampled is selected; in the second stage, the best site is chosen from the remaining sites and added to the sampling set, and so on. For $k = 1, \ldots, 19$, denote by $\tilde{S}_{k,l}$, $l = 1, \ldots, (k+1)(k+1)$, the optimal sequential design composed of the $l$ sites obtained for each sampling scheme $P_k$. We compare the amount of information provided by the set $\tilde{S}_{k,l}$ with that provided by $P_k$, in terms of the ratios of information

$$R_{P_k}(\tilde{S}_{k,l}) = \frac{I(Y_{\tilde{S}_{k,l}}; X_L)}{I(Y_{P_k}; X_L)}.$$

Fig. 4. Curves of ratios of information $R_{P_0}(\tilde{S}_{k,l})$, $k = 1, \ldots, 19$, vs. the number of (sequentially) selected sites $l = 1, \ldots, (k+1)(k+1)$, for $a = 150$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5.
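The set-up and the sequential ordering described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a second-order stationary $X$ with unit sill, so that the covariance is $C(h) = 1 - \gamma(h) = \exp(-3h/a)$, and it uses the Gaussian closed form of $I(Y_S; X_L)$. All function names are hypothetical, and the small case $k = 3$, $a = 60$, $\sigma_\varepsilon^2 = 0.5$ is chosen only to keep the run short.

```python
import numpy as np

# Locations of interest L (coordinates from Table 1).
L_SITES = np.array([
    (15, 20), (20, 10), (20, 25), (23, 15),
    (25, 13), (25, 20), (35, 84), (38, 80),
    (42, 86), (43, 75), (47, 80), (72, 50),
    (76, 50), (77, 55), (82, 51), (86, 56),
], dtype=float)


def grid_scheme(k, side=100.0):
    """Candidate set P_k: nodes of a regular (k+1) x (k+1) grid on [0, side]^2."""
    ticks = side * np.arange(k + 1) / k
    s1, s2 = np.meshgrid(ticks, ticks, indexing="ij")
    return np.column_stack([s1.ravel(), s2.ravel()])


def exp_cov(A, B, a):
    """Assumed covariance C(h) = 1 - gamma(h) = exp(-3h/a) (unit sill)."""
    h = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-3.0 * h / a)


def mutual_information(S, L, a, sigma2_e):
    """Gaussian I(Y_S; X_L) = 0.5 * [log det Cov(X_L) - log det Cov(X_L | Y_S)]."""
    c_ll = exp_cov(L, L, a)
    c_ls = exp_cov(L, S, a)
    c_ss = exp_cov(S, S, a) + sigma2_e * np.eye(len(S))  # noise affects Y only
    c_cond = c_ll - c_ls @ np.linalg.solve(c_ss, c_ls.T)
    return 0.5 * (np.linalg.slogdet(c_ll)[1] - np.linalg.slogdet(c_cond)[1])


def sequential_design(P, L, a, sigma2_e):
    """Order the sites of P by sequentially adding the most informative one."""
    remaining = list(range(len(P)))
    chosen, info = [], []
    while remaining:
        # Information on X_L if each remaining site were added to the design.
        gains = [mutual_information(P[chosen + [j]], L, a, sigma2_e)
                 for j in remaining]
        best = int(np.argmax(gains))
        info.append(gains[best])
        chosen.append(remaining.pop(best))
    return chosen, np.array(info)


P3 = grid_scheme(3)  # 16 candidate sites
order, info = sequential_design(P3, L_SITES, a=60.0, sigma2_e=0.5)
ratios = info / info[-1]  # R_{P_k}(S~_{k,l}) for l = 1, ..., 16
```

The last entry of `info` is the information of the full scheme, so `ratios` ends at 1. Each step re-evaluates every remaining site, which is simple but costly for the finer grids; an incremental update of the conditional covariance would be one way to speed it up.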
From $P_1, \ldots, P_{19}$, we also select the most informative sampling scheme, denoted by $P_0$, and compare the amount of information provided by each set $P_k$, $k = 1, \ldots, 19$, by considering the ratios of information relative to $P_0$,

$$R_{P_0}(P_k) = \frac{I(Y_{P_k}; X_L)}{I(Y_{P_0}; X_L)}.$$

Table 4
Values of ratios of information $R_{P_0}(\tilde{S}_{k,l})$, $k = 1, \ldots, 19$ and $l = 1, \ldots, 10$, for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ (rows: number of points in the scheme; columns: number of selected points $l$; rows for the 4-point and 9-point schemes end at their total number of points)

Scheme   1         2         3         4         5         6         7         8         9         10
4        0.00001   0.00001   0.00001   0.00001
9        0.00021   0.00036   0.00037   0.00038   0.00039   0.00039   0.00039   0.00039   0.00039
16       0.00068   0.00094   0.00117   0.00138   0.00159   0.00168   0.00176   0.00179   0.00181   0.00181
25       0.09915   0.12013   0.13161   0.13276   0.13331   0.13352   0.13368   0.13382   0.13387   0.13390
36       0.05673   0.08864   0.10093   0.10267   0.10428   0.10474   0.10506   0.10531   0.10554   0.10573
49       0.06073   0.11713   0.14536   0.16192   0.17269   0.17641   0.17879   0.18000   0.18120   0.18233
64       0.09667   0.17669   0.20040   0.22355   0.23700   0.24783   0.25491   0.26090   0.26639   0.26939
81       0.13048   0.22963   0.25615   0.27660   0.29493   0.30835   0.31964   0.33054   0.33645   0.34218
100      0.09341   0.15082   0.19873   0.24017   0.27098   0.29170   0.30667   0.31997   0.33052   0.34025
121      0.22804   0.28477   0.34030   0.38883   0.41929   0.44842   0.46393   0.47571   0.48669   0.49753
144      0.06381   0.11305   0.15909   0.19889   0.23541   0.26727   0.29368   0.31920   0.34368   0.35909
169      0.09915   0.17376   0.23165   0.28735   0.34277   0.38352   0.41511   0.43887   0.46178   0.48112
196      0.14414   0.23257   0.27972   0.32350   0.36520   0.40158   0.43376   0.46351   0.48437   0.50428
225      0.12221   0.21889   0.29878   0.36387   0.42588   0.47768   0.52503   0.56024   0.59102   0.62089
256      0.15165   0.21218   0.27166   0.32809   0.38273   0.43397   0.48327   0.52886   0.56450   0.59829
289      0.13048   0.23811   0.33727   0.41600   0.49390   0.56775   0.64042   0.70305   0.74373   0.77905
324      0.06957   0.13532   0.20005   0.25173   0.30215   0.34995   0.39195   0.43362   0.47012   0.49548
361      0.17056   0.26052   0.32244   0.37932   0.43572   0.48421   0.53158   0.57529   0.61696   0.65369
400      0.08567   0.17027   0.24631   0.31004   0.36801   0.42345   0.47747   0.53023   0.58261   0.62842
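When reading the ratios $R_{P_0}(P_k)$, it helps to see concretely that the grids are not nested. The following Python sketch (the helper names `grid_scheme` and `is_nested` are illustrative and not from the paper) builds the node sets with exact rational coordinates and checks inclusion. For this grid family, $P_k \subseteq P_m$ holds exactly when $k$ divides $m$.

```python
from fractions import Fraction


def grid_scheme(k, side=100):
    """Nodes of P_k as exact rational coordinates (avoids float comparisons)."""
    ticks = [Fraction(side * i, k) for i in range(k + 1)]
    return {(s1, s2) for s1 in ticks for s2 in ticks}


def is_nested(k, m):
    """True if every node of P_k is also a node of P_m."""
    return grid_scheme(k) <= grid_scheme(m)


# Scheme sizes (k+1)^2 for a few values of k.
sizes = [len(grid_scheme(k)) for k in (1, 4, 5, 16, 17, 19)]
print(sizes)              # [4, 25, 36, 289, 324, 400]

print(is_nested(4, 5))    # False: the 25-point grid is not inside the 36-point grid
print(is_nested(4, 16))   # True: 4 divides 16
print(is_nested(16, 17))  # False: the 289-point grid is not inside the 324-point grid
```

Because the 25-point grid is not contained in the 36-point grid, nothing forces the denser scheme to be at least as informative, which is consistent with the non-monotone entries in Table 2.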
In addition, the information given by the designs $\tilde{S}_{k,l}$, $l = 1, \ldots, (k+1)(k+1)$, $k = 1, \ldots, 19$, is compared with the information provided by $P_0$ using

$$R_{P_0}(\tilde{S}_{k,l}) = \frac{I(Y_{\tilde{S}_{k,l}}; X_L)}{I(Y_{P_0}; X_L)}.$$
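The three ratios defined in this section are linked by a simple identity that follows directly from their definitions. It is written out below as a reading aid, with a numerical check that uses the tabulated (truncated) values for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$, so the result is only approximate.

```latex
% Product decomposition of the ratio relative to P_0
R_{P_0}(\tilde{S}_{k,l})
  = \frac{I(Y_{\tilde{S}_{k,l}}; X_L)}{I(Y_{P_k}; X_L)}
    \cdot \frac{I(Y_{P_k}; X_L)}{I(Y_{P_0}; X_L)}
  = R_{P_k}(\tilde{S}_{k,l}) \, R_{P_0}(P_k)

% Check for the 121-point scheme (k = 10) with l = 10 selected sites:
% R_{P_0}(\tilde{S}_{10,10}) = 0.49753 (Table 4), R_{P_0}(P_{10}) = 0.55011 (Table 2)
R_{P_{10}}(\tilde{S}_{10,10}) \approx \frac{0.49753}{0.55011} \approx 0.904
```

In words, about 90% of what the full 121-point scheme can provide is already captured by its first ten sequentially selected sites, even though that scheme as a whole reaches only about 55% of the information in $P_0$.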
Note that if we were interested in finding globally optimal designs of fixed size, we could use available procedures in the literature, e.g. the algorithm of Ko et al. (1995) to maximize entropy and its adaptation to a state-space model framework by Bueso et al. (1998). In the empirical study we have considered the values 0.1, 0.5, 1, 1.5, 2 and 2.5 for the error variance $\sigma_\varepsilon^2$, and the values 15, 30, 60, 90, 120 and 150 for the range $a$. The most significant aspects observed in the results obtained are described below.

3.2. Simulation results

In Fig. 2 the curves of the ratios of information $R_{P_0}(P_k)$, $k = 1, \ldots, 19$, are represented for each value of the error variance $\sigma_\varepsilon^2$, varying the value of the range $a$. In all cases, the most informative scheme is $P_{19}$, corresponding to the sampling scheme with the largest number of points, except for the case of $a = 15$ and $\sigma_\varepsilon^2 = 0.1$, where the optimum is given by the scheme defined by the grid with 289 points, $P_{16}$. In the last case, the configuration of the locations in this scheme provides more information on $X_L$ than the other schemes due to the low values of $a$ and $\sigma_\varepsilon^2$, as well as to the configuration of set $L$. Note that in most of these curves the information on $X_L$ increases as the total number of points in the scheme increases. However, there are certain schemes having a lower number of points that are more informative than others with a higher density. In the graphs, this is reflected by a change of sign in the slope.

The values of these ratios of information (truncated to five decimal digits) for $a = 15$ and $a = 150$, for all the values of $\sigma_\varepsilon^2$ considered, are shown in Tables 2 and 3, respectively. Observe that in the first case the scheme with 25 points is more informative than that with 36 points, and the differences between the values of the corresponding ratios of information decrease as the value of $\sigma_\varepsilon^2$ increases. This behavior can also be seen in the schemes with 289 points and 324 points. As noted above, the best scheme for the model parameters $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ is composed of 289 points, and the schemes with 225 and 256 points are better than the scheme with 324 points. In the second case, this effect occurs for the value $\sigma_\varepsilon^2 = 0.1$. On the other hand, we can see that the larger the error variance
$\sigma_\varepsilon^2$ is, the smoother the curves of the ratios of information $R_{P_0}(P_k)$ behave, and the same occurs with respect to the range $a$. These facts lead us to conclude that the a priori configuration of the points from which the final network must be selected can be a crucial aspect in the design problem and there can exist equivalent or quasi-equivalent designs defined by configurations with a different number of points.

For the above values of range $a$, the curves of the ratios of information $R_{P_0}(\tilde{S}_{k,l})$ obtained when a sequential optimal design is performed are represented in the plot on the left-hand side of Figs. 3 and 4. The plots on the right-hand side in the same figures represent the details corresponding to the range from 1 to the minimum number of sites needed to obtain 75% of information $I(Y_{P_0}; X_L)$. In these plots we can observe some crossings between the ratio-of-information curves, which mean that the corresponding configurations are almost equivalent in the neighborhood of the crossing points. Obviously, the values of these ratios depend on the grid defined by the sampling scheme. Thus, for example, the first point selected for a scheme can provide more information than the first chosen for another scheme. This can be noticed, for instance, in the case where $a = 15$ and $\sigma_\varepsilon^2 = 0.1$, with the first point selected from the scheme composed of 121 points, which provides more information than the first point of any of the remaining schemes, and more than the complete schemes composed of 4, 9, 16, 25, 36, and 49 points (see Tables 2 and 4). A conclusion from this fact is that, as the number of points in the scheme increases, there is in general more flexibility in the site selection; hence, for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$, the first seven sites selected from the scheme with 289 points give 64% of the global information, more than that provided by any scheme with 196 points or fewer.

Another aspect considered in this study is the location of the sequentially selected sites, as well as the minimum number of sites needed to obtain a certain level of information. For percentages 50, 75 and 95, and for the most comprehensive scheme, the minimum number of sites needed is presented in Table 5. Note that this number increases as either $\sigma_\varepsilon^2$ or $a$ increases, and the increment of points is more significant for high values of $\sigma_\varepsilon^2$, that is, when the degree of dependence between $X$ and $Y$ decreases. When the value of $\sigma_\varepsilon^2$ is very small, observing variable $Y$ is similar to observing $X$.

On the other hand, it can be noted that, for a fixed value of $\sigma_\varepsilon^2$, increasing the value of $a$, which means a stronger spatial dependence and, consequently, a higher redundancy of information contained in the observations, requires a larger number of points to obtain a certain level of information. The resulting curves of the ratios of information $R_{P_k}(\tilde{S}_{k,l})$, $k = 1, \ldots, 19$, $l = 1, \ldots, (k+1)(k+1)$, corresponding to the particular cases of $a = 15$ and $a = 150$, for $\sigma_\varepsilon^2 = 0.1$ and $\sigma_\varepsilon^2 = 2.5$, are represented in Fig. 5. All the curves are increasing and practically merge near the value of 1 (that is why these ratios of information are only represented vs. a subinterval of $[1, 400]$ where the most interesting features occur). The maps containing the locations selected to provide the levels of 50%, 75% and 95% of information are displayed in Figs. 6 and 7. In general, the sites selected are located close to the three clusters of sites in the set of interest. Note that the greater $a$ and $\sigma_\varepsilon^2$, the larger the minimum number of sites needed to cover the region of interest.

Table 5
Number of locations needed to obtain a 50%, 75% and 95% ratio of information $R_{P_0}(\tilde{S}_{0,\cdot})$, for each value of the parameters $a$ and $\sigma_\varepsilon^2$

                  Ratio of information
a      sigma^2    50%    75%    95%
15     0.1        6      10     19
15     0.5        9      16     31
15     1          10     17     35
15     1.5        10     19     38
15     2          11     20     41
15     2.5        11     21     43
30     0.1        7      14     27
30     0.5        9      18     35
30     1          11     22     43
30     1.5        13     25     50
30     2          14     27     56
30     2.5        15     29     62
60     0.1        7      14     28
60     0.5        10     20     42
60     1          13     26     55
60     1.5        15     30     66
60     2          17     34     75
60     2.5        18     37     83
90     0.1        7      14     30
90     0.5        10     22     48
90     1          14     29     64
90     1.5        16     35     77
90     2          19     39     88
90     2.5        21     43     98
120    0.1        7      15     31
120    0.5        11     24     53
120    1          14     32     72
120    1.5        17     38     86
120    2          20     43     99
120    2.5        22     48     110
150    0.1        7      15     33
150    0.5        11     26     57
150    1          15     34     78
150    1.5        18     41     94
150    2          21     47     109
150    2.5        23     52     121
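The "minimum number of sites" reported in Table 5 can be read off a curve of ratios as the first index at which the target level is reached. The Python sketch below illustrates this with the ten tabulated values of $R_{P_0}(\tilde{S}_{k,l})$ for the 289-point scheme (Table 4, $a = 15$, $\sigma_\varepsilon^2 = 0.1$), which is the most informative scheme in that case. The helper name `min_sites` is hypothetical, and the search assumes the ratios are non-decreasing in $l$.

```python
import numpy as np

# R_{P_0}(S~_{k,l}), l = 1..10, for the 289-point scheme
# (Table 4, a = 15, sigma_e^2 = 0.1).
ratios_289 = np.array([0.13048, 0.23811, 0.33727, 0.41600, 0.49390,
                       0.56775, 0.64042, 0.70305, 0.74373, 0.77905])


def min_sites(ratios, target):
    """Smallest l with ratios[l-1] >= target, or None if it is never reached.

    Assumes `ratios` is non-decreasing in l, as for a sequential design.
    """
    idx = int(np.searchsorted(ratios, target, side="left"))
    return idx + 1 if idx < len(ratios) else None


print(min_sites(ratios_289, 0.50))  # 6
print(min_sites(ratios_289, 0.75))  # 10
print(min_sites(ratios_289, 0.95))  # None: not reached within the 10 tabulated sites
```

The first two results agree with the entries 6 and 10 in the first row of Table 5; the 95% level (19 sites in Table 5) lies beyond the ten values tabulated in Table 4.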
Fig. 5. Curves of ratios of information $R_{P_k}(\tilde{S}_{k,l})$, $k = 1, \ldots, 19$, vs. the number of (sequentially) selected sites $l = 1, \ldots, (k+1)(k+1)$, for $a = 15$ and 150 and $\sigma_\varepsilon^2 = 0.1$ and 2.5.
4. Conclusions

In this paper we consider the problem of spatial sampling design in a state-space model context in the case where the domain of observation is continuous and spatial discretization is performed to apply sampling selection methods, generally designed for a finite search of the optimal solution. An empirical study has been carried out to show how the level of refinement of the discretization considered affects the configuration of possible alternative sampling designs and how, in some cases, a grid refinement involving a larger number of nodes may not lead to an improvement in the effectiveness of the sampling scheme with respect to the optimal criterion adopted. The results obtained suggest that the a priori definition of a prescribed set from which to select
a sampling scheme can be critical regarding the effectiveness of the results. The aim of the study presented in this paper is to show how this intuitively not surprising fact occurs in association with the values taken by significant parameters involved in the spatial dependence structure. In the study performed here a concrete but widely used model has been considered to illustrate the effect on the optimal designs of the configuration of the prescribed sampling scheme. Similar types of effects would be expected by considering alternative or more complex covariance structure models. We have focused in this paper on the problem related to discretization of the region to be sampled. A further study would be of interest to show the influence on the optimal designs of the consideration of a certain level of refinement in the discretization of the set of locations of
interest. One possible approach to avoid the excessive dimensionality due to increasing refinements would be based on applying stochastic perturbations to location coordinates for obtaining improvement on information (see, for example, Van Groenigen and Stein, 1998). Under a different philosophy, the authors are investigating the sampling design problem in the context of generalized random fields (see Angulo et al., 2003), which can be useful in situations where information on the variables involved can be measured in terms of averages using appropriate test functions.

Fig. 6. Sequentially selected locations (○) needed to obtain a 50%, 75% and 95% ratio of information $R_{P_0}(\tilde{S}_{0,\cdot})$, for the fixed value $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5. (Points ( ) represent the locations of interest.)
Acknowledgements

This work has been supported in part by Projects BFM2000-1465 and BFM2002-01836 of the DGI, Ministerio de Ciencia y Tecnología, Spain. The authors
are grateful to the referees for constructive suggestions to improve the paper.

Fig. 7. Sequentially selected locations (○) needed to obtain a 50%, 75% and 95% ratio of information $R_{P_0}(\tilde{S}_{0,\cdot})$, for the fixed value $a = 150$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5. (Points ( ) represent the locations of interest.)
References

Angulo, J.M., Bueso, M.C., 2001. Random perturbation methods applied to multivariate spatial sampling design. Environmetrics 12, 631–646.
Angulo, J.M., Bueso, M.C., Alonso, F.J., 2000. A study on sampling design for optimal prediction of space-time stochastic processes. Stochastic Environmental Research and Risk Assessment 14, 412–427.
Angulo, J.M., Ruiz-Medina, M.D., Alonso, F.J., Bueso, M.C., 2003. Generalized approaches to spatial sampling design. In: Mateu, J., Holland, D., González-Manteiga, W. (Eds.), Proceedings of the ISI International Conference on Environmental Statistics and Health. Universidade de Santiago de Compostela, pp. 11–19.
Bueso, M.C., Angulo, J.M., 1999. Criteria for multivariate spatial sampling design based on covariance matrix perturbation. In: Gómez-Hernández, J., Soares, A., Froidevaux, R. (Eds.), geoENV II: Geostatistics for Environmental Applications. Kluwer, Amsterdam, pp. 491–502.
Bueso, M.C., Angulo, J.M., Alonso, F.J., 1998. A state-space model approach to optimum spatial sampling design based on entropy. Environmental and Ecological Statistics 5, 29–44.
Bueso, M.C., Angulo, J.M., Cruz-Sanjulián, J., García-Aróstegui, J.L., 1999. Optimal spatial sampling design in a multivariate framework. Mathematical Geology 31, 507–525.
Caselton, W.F., Husain, T., 1980. Hydrologic networks: Information transmission. Journal of the Water Resources Planning and Management Division, ASCE 106, 503–520.
Caselton, W.F., Kan, L., Zidek, J.V., 1992. Quality data networks that minimize entropy. In: Walden, A., Guttorp, P. (Eds.), Statistics in Environmental and Earth Sciences. Griffin, London, pp. 10–38.
Caselton, W.F., Zidek, J.V., 1984. Optimal monitoring network designs. Statistics and Probability Letters 2, 223–227.
Christakos, G., 1992. Random Field Models in Earth Sciences. Academic Press, San Diego, CA.
Cressie, N., 1993. Statistics for Spatial Data. Wiley & Sons, New York.
Guttorp, P., Le, N.D., Sampson, P.D., Zidek, J.V., 1993. Using entropy in the redesign of an environmental monitoring network. In: Patil, G.P., Rao, C.R. (Eds.), Multivariate Environmental Statistics. Elsevier, New York, pp. 175–202.
Kitanidis, P.K., 1997. Introduction to Geostatistics: Applications to Hydrology. Cambridge University Press, New York.
Ko, C.-W., Lee, J., Queyranne, M., 1995. An exact algorithm for maximum entropy sampling. Operations Research 43, 684–691.
Müller, W.G., 2001. Collecting Spatial Data: Optimum Design of Experiments for Random Fields. Physica-Verlag, Heidelberg.
Pérez-Abreu, V., Rodríguez, J.E., 1996. Index of effectiveness of a multivariate environmental monitoring network. Environmetrics 7, 489–501.
Van Groenigen, J.W., Stein, A., 1998. Constrained optimization of spatial sampling using continuous simulated annealing. Journal of Environmental Quality 27, 1078–1086.
Wu, S., Zidek, J.V., 1992. An entropy-based analysis of data from selected NADP/NTN network sites for 1983–1986. Atmospheric Environment 26A, 2089–2103.
Zidek, J.V., Sun, W., Le, N.D., 2000. Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. Applied Statistics 49, 63–79.