A study on sensitivity of spatial sampling designs to a priori discretization schemes

Environmental Modelling & Software 20 (2005) 891–902, www.elsevier.com/locate/envsoft


M.C. Bueso (a), J.M. Angulo (b,*), F.J. Alonso (b), M.D. Ruiz-Medina (b)

(a) Departamento de Matemática Aplicada y Estadística, Universidad Politécnica de Cartagena, Spain
(b) Departamento de Estadística e Investigación Operativa, Universidad de Granada, Spain

Received 4 January 2003; received in revised form 18 November 2003; accepted 30 March 2004

Abstract

In a previous paper (Environ. Ecol. Stat. 5 (1998) 29) we presented an entropy-based approach to spatial sampling design in a state-space model framework. We now address the sensitivity of optimal designs to the configuration of the set of potential observation sites considered, as well as to the model specifications. The latter involve both the spatial dependence structure of the variable of interest and its relationship with the observable variable. To analyze several aspects of this problem, we have carried out an extensive empirical study, from which we conclude that the a priori selection of candidate observation sites can have a critical influence on the final sampling designs in different situations.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Lattice refinement; Ratio of information; Shannon's entropy; Spatial stochastic process; State-space model

* Corresponding author. Tel.: +34-958240492; fax: +34-958243267. E-mail address: [email protected] (J.M. Angulo).
1364-8152/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.envsoft.2004.03.019

1. Introduction

In this work, we consider the problem of estimating a spatial process that is not directly observable from the observation of a related spatial process on a continuous domain. Generally, approaches to sampling network design introduced in the literature assume that a finite set of potentially observable sites is specified at the first stage of the study (see, for example, Caselton and Zidek, 1984; Christakos, 1992; Cressie, 1993; Pérez-Abreu and Rodríguez, 1996; Müller, 2001; and references therein). This assumption is natural in cases where, for example, a network of observation stations already exists, as commonly occurs in groundwater hydrology when measurements are taken at a subset of available wells. On the other hand, in cases where such a network does not exist, spatial discretization is usually performed on the continuous region to be sampled, often considering the nodes in a regular mesh. In this case, the level of refinement used in the discretization will affect the final results in the design. A higher grid refinement will lead to an excessive number of nodes and a higher computational cost in the search for the optimal network. In contrast, if the grid does not properly represent the whole sampling region, the design will be more sensitive to the placement of the locations where the variable of interest must be estimated.

Network design based on entropy criteria has been considered by Caselton and Husain (1980), Caselton and Zidek (1984), Caselton et al. (1992), Wu and Zidek (1992) and Zidek et al. (2000), among others. In this paper, we use this approach in a state-space model framework, as introduced in Bueso et al. (1998). Formally, assume that we are interested in estimating a spatial process X at a finite set of locations L, by sampling a related spatial process Y, potentially observable on a continuous region. Assume that, for practical purposes, this region has been discretized by considering a regular mesh P composed of a finite set of sites. As a criterion for the definition of an optimal sampling design, we select, from the set P of candidate sites to be sampled, an appropriate subset S that provides maximum information on X at L. Information is defined here in the sense of Shannon's entropy. At this stage, the joint distribution of the variables involved is assumed to be known or estimated from historical data.

This setup can be extended to the multivariate case, considering different degrees of importance assigned to each one-dimensional variable and/or to each location of interest, as introduced in Bueso et al. (1999), Bueso and Angulo (1999), and Angulo and Bueso (2001). Under the same entropy-based approach, Angulo et al. (2000) have studied some relevant aspects inherent in the design problem in the spatial-temporal context.

In this paper, an extensive empirical study is carried out to show the influence of the a priori level of discretization of the continuous region to be sampled on the final optimal designs, specifically in relation to the ordering of the candidate sites as well as to the corresponding information outcomes. In Section 2, we briefly summarize the entropy-based criterion presented in Bueso et al. (1998). The most relevant results derived from the empirical study are described in detail in Section 3. We conclude in Section 4 with further comments on the main aspects raised by this study.

2. Entropy-based spatial sampling design

Caselton and Husain (1980) and Caselton and Zidek (1984) developed an entropy-based approach to the sampling network design problem. Applications of this approach have been considered, for example, in Caselton et al. (1992), Wu and Zidek (1992), Guttorp et al. (1993), and Zidek et al. (2000). In our study we adopt the entropy-based criterion formulated in a state-space model framework as introduced in Bueso et al. (1998). This criterion consists of finding a set of locations S (selected from a certain class of admissible sets of candidate sites where variable Y can be observed) that maximizes the amount of information on $X_L$ (random vector whose coordinates represent the spatial process of interest at the set of locations L) contained in $Y_S$ (observable process at S),

$$I(Y_S; X_L) = H(X_L) - H(X_L \mid Y_S),$$

with $H(X_L)$ representing the entropy of $X_L$,

$$H(X_L) = -E_{X_L}[\log f(X_L)],$$

and $H(X_L \mid Y_S)$ the mean conditional entropy of $X_L$ given $Y_S$,

$$H(X_L \mid Y_S) = -E_{(X_L, Y_S)}[\log f(X_L \mid Y_S)],$$

where $f(X_L)$ is the density (or probability mass) function of $X_L$, and $f(X_L \mid Y_S)$ the conditional density (or probability mass) function of $X_L$ given $Y_S$. Here $E_Z$ denotes the expectation with respect to the distribution of Z. As $H(X_L)$ is a fixed quantity, the above criterion reduces to minimizing the mean conditional entropy $H(X_L \mid Y_S)$; in particular, under the multivariate normality assumption, this is equivalent to minimizing the determinant of the conditional covariance matrix of $X_L$ given $Y_S$.

The design problem is then formulated as an optimization problem whose optimal solution, in cases of high dimensionality, can be difficult to find due to the computational cost involved, derived either from the complexity of the function to be evaluated or from the time required in the search for an optimum set S.

To compare the effectiveness of a design S with respect to a given, more informative, design S', we obtain the proportion of information provided by S with respect to that given by S':

$$R_{S'}(S) = \frac{I(Y_S; X_L)}{I(Y_{S'}; X_L)},$$

referred to in this paper as 'ratio of information'. When S' is considered to be the most informative design P, the latter ratio is very useful to determine the designs that achieve a given target value for $R_P(\cdot)$.

Table 1. Coordinates of locations of interest, s = (s1, s2)

(15, 20)  (25, 13)  (42, 86)  (76, 50)
(20, 10)  (25, 20)  (43, 75)  (77, 55)
(20, 25)  (35, 84)  (47, 80)  (82, 51)
(23, 15)  (38, 80)  (72, 50)  (86, 56)

Fig. 1. Locations of interest (both axes range from 0 to 100).

Fig. 2. Curves of ratios of information $R_{P_0}(P_k)$, $k = 1, \dots, 19$, vs. the number of points in the scheme, $(k+1)(k+1)$, for each value of the error variance $\sigma_\varepsilon^2$, varying the value of the range a.

Table 2. Values of ratios of information $R_{P_0}(P_k)$, $k = 1, \dots, 19$, for $a = 15$ and for each value of the error variance $\sigma_\varepsilon^2$ (columns)

| Scheme (points) | 0.1 | 0.5 | 1 | 1.5 | 2 | 2.5 |
|---|---|---|---|---|---|---|
| 4 | 0.00001 | 0.00001 | 0.00001 | 0.00001 | 0.00001 | 0.00001 |
| 9 | 0.00039 | 0.00039 | 0.00038 | 0.00037 | 0.00036 | 0.00036 |
| 16 | 0.00181 | 0.00185 | 0.00180 | 0.00175 | 0.00171 | 0.00167 |
| 25 | 0.13394 | 0.11986 | 0.10884 | 0.10202 | 0.09731 | 0.09382 |
| 36 | 0.10668 | 0.10217 | 0.09574 | 0.09113 | 0.08771 | 0.08508 |
| 49 | 0.18398 | 0.17560 | 0.16436 | 0.15637 | 0.15048 | 0.14595 |
| 64 | 0.27456 | 0.25370 | 0.23439 | 0.22176 | 0.21275 | 0.20598 |
| 81 | 0.35500 | 0.31783 | 0.29104 | 0.27458 | 0.26314 | 0.25464 |
| 100 | 0.36167 | 0.34586 | 0.32595 | 0.31190 | 0.30148 | 0.29344 |
| 121 | 0.55011 | 0.46510 | 0.42726 | 0.40543 | 0.39043 | 0.37928 |
| 144 | 0.42340 | 0.42159 | 0.40755 | 0.39600 | 0.38679 | 0.37936 |
| 169 | 0.57922 | 0.56156 | 0.53702 | 0.51957 | 0.50646 | 0.49623 |
| 196 | 0.61753 | 0.59127 | 0.56727 | 0.55127 | 0.53944 | 0.53022 |
| 225 | 0.80025 | 0.76763 | 0.73510 | 0.71375 | 0.69824 | 0.68632 |
| 256 | 0.80985 | 0.78465 | 0.76018 | 0.74408 | 0.73222 | 0.72298 |
| 289 | 1.00000 | 0.95209 | 0.91423 | 0.89163 | 0.87598 | 0.86428 |
| 324 | 0.76329 | 0.78841 | 0.79066 | 0.78990 | 0.78837 | 0.78668 |
| 361 | 0.96359 | 0.94703 | 0.93516 | 0.92895 | 0.92477 | 0.92164 |
| 400 | 0.98254 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |
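Under the multivariate normality assumption mentioned in Section 2, the information $I(Y_S; X_L)$ has a closed form, because the entropy of an n-dimensional Gaussian vector with covariance matrix $\Sigma$ is $\frac{1}{2}\log((2\pi e)^n \det \Sigma)$. The following is a minimal sketch in Python with NumPy, not the authors' implementation; the function and argument names are illustrative assumptions. It computes the information from the covariance blocks and forms the ratio of information for a toy one-dimensional case.

```python
import numpy as np

def gaussian_information(cov_xx, cov_xy, cov_yy):
    """I(Y_S; X_L) in nats for jointly Gaussian X_L and Y_S.

    Uses I = H(X_L) - H(X_L | Y_S) = 0.5 * (log det S_X - log det S_{X|Y}),
    with S_{X|Y} = S_XX - S_XY S_YY^{-1} S_YX (conditional covariance).
    """
    cond_cov = cov_xx - cov_xy @ np.linalg.solve(cov_yy, cov_xy.T)
    _, logdet_x = np.linalg.slogdet(cov_xx)
    _, logdet_cond = np.linalg.slogdet(cond_cov)
    return 0.5 * (logdet_x - logdet_cond)

def ratio_of_information(info_s, info_ref):
    """R_{S'}(S): information of design S relative to a reference design S'."""
    return info_s / info_ref

# Toy check (hypothetical numbers): scalar X with unit variance,
# observed as Y = X + noise with noise variance 0.5.
cov_xx = np.array([[1.0]])
cov_xy = np.array([[1.0]])   # Cov(X, Y) = Var(X): the noise is independent of X
cov_yy = np.array([[1.5]])   # Var(Y) = Var(X) + noise variance
info = gaussian_information(cov_xx, cov_xy, cov_yy)
print(info)  # equals 0.5 * log(3), roughly 0.549 nats
```

Because the term $(2\pi e)^n$ cancels in the difference of entropies, only the two log-determinants are needed, which is why minimizing the conditional entropy amounts to minimizing the determinant of the conditional covariance matrix.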

3. Effect of the specification of P on optimal designs: an empirical study

3.1. Set-up

To illustrate the influence of the specification of P on the sampling designs, we have considered in the study a model depending on certain parameters that allow us to specify different degrees of spatial dependence between the variables of interest and between these and the observable variables. Specifically, we assume that observation of the random process of interest X is affected by additive noise according to the observation equation

$$Y(s) = X(s) + \varepsilon(s),$$

with $\varepsilon(\cdot)$ being a zero-mean spatial random process satisfying the condition

$$\mathrm{Cov}(\varepsilon(s), \varepsilon(s')) = \delta_{s,s'}\, \sigma_\varepsilon^2, \quad \forall s, s',$$

where $\sigma_\varepsilon^2 = \mathrm{Var}(\varepsilon(s))$ is constant and $X(\cdot)$ and $\varepsilon(\cdot)$ are mutually independent. We also assume that the variables involved have a Gaussian multivariate distribution and that X has zero mean and an exponential variogram with range a,

$$\gamma(h) = 1 - \exp\left(-\frac{3h}{a}\right),$$

where $h = \|s - s'\|$ represents the distance between two points. This variogram model is widely used in geostatistical applications (see, for example, Kitanidis, 1997). The comparative study has been carried out considering different values for the error variance $\sigma_\varepsilon^2$ and for the range of the spatial dependence a.

In all the cases studied, the set L is composed of 16 sites clustered in three groups within the domain defined by a 100 × 100 square (see Table 1 and Fig. 1). For the region to be sampled we have considered the whole domain. Several discretizations of this region have been performed by using different levels of refinement, leading to regular grids whose nodes have been adopted as the set P. More precisely, we have considered 19 different sampling schemes defined by the sets of points

$$P_k = \left\{ (s_1, s_2) : s_1 = \frac{100(i-1)}{k},\ s_2 = \frac{100(j-1)}{k},\ i, j = 1, \dots, k+1 \right\}$$

for $k = 1, \dots, 19$. Here, we refer to sampling schemes as the sets of candidate sites from which sampling designs are selected. For the interpretation of the results below, it is important to note that the $P_k$ sets do not follow a nested construction.

For each model parameter specification, the sites in each sampling scheme have been ordered based on the information gain obtained by the sequential addition of single sites. That is, in the first stage the best site to be sampled is selected; in the second stage, the best site is chosen from the remaining sites and added to the sampling set, and so on. For $k = 1, \dots, 19$, denote by $\tilde{S}_{k,l}$, $l = 1, \dots, (k+1)(k+1)$, the optimal sequential design composed of the l sites obtained for each sampling scheme $P_k$. We compare the amount of information provided by the set $\tilde{S}_{k,l}$ with that provided by $P_k$, in terms of the ratios of information

$$R_{P_k}(\tilde{S}_{k,l}) = \frac{I(Y_{\tilde{S}_{k,l}}; X_L)}{I(Y_{P_k}; X_L)}.$$

From $P_1, \dots, P_{19}$, we also select the most informative sampling scheme, denoted by $P_0$, and compare the amount of information provided by each set $P_k$, $k = 1, \dots, 19$, by considering the ratios of information relative to $P_0$,

$$R_{P_0}(P_k) = \frac{I(Y_{P_k}; X_L)}{I(Y_{P_0}; X_L)}.$$

In addition, the information given by the designs $\tilde{S}_{k,l}$, $l = 1, \dots, (k+1)(k+1)$, $k = 1, \dots, 19$, is compared with the information provided by $P_0$ using

$$R_{P_0}(\tilde{S}_{k,l}) = \frac{I(Y_{\tilde{S}_{k,l}}; X_L)}{I(Y_{P_0}; X_L)}.$$

Table 3. Values of ratios of information $R_{P_0}(P_k)$, $k = 1, \dots, 19$, for $a = 150$ and for each value of the error variance $\sigma_\varepsilon^2$ (columns)

| Scheme (points) | 0.1 | 0.5 | 1 | 1.5 | 2 | 2.5 |
|---|---|---|---|---|---|---|
| 4 | 0.05144 | 0.05227 | 0.04641 | 0.04135 | 0.03737 | 0.03422 |
| 9 | 0.14323 | 0.14758 | 0.13432 | 0.12216 | 0.11222 | 0.10411 |
| 16 | 0.22421 | 0.23547 | 0.22053 | 0.20539 | 0.19235 | 0.18128 |
| 25 | 0.37093 | 0.35111 | 0.32429 | 0.30296 | 0.28563 | 0.27118 |
| 36 | 0.40817 | 0.40328 | 0.38293 | 0.36472 | 0.34907 | 0.33550 |
| 49 | 0.49064 | 0.47860 | 0.45661 | 0.43795 | 0.42205 | 0.40826 |
| 64 | 0.55583 | 0.53916 | 0.51793 | 0.50028 | 0.48524 | 0.47211 |
| 81 | 0.60399 | 0.58728 | 0.56890 | 0.55342 | 0.54003 | 0.52821 |
| 100 | 0.64591 | 0.63393 | 0.61813 | 0.60457 | 0.59269 | 0.58210 |
| 121 | 0.70234 | 0.68295 | 0.66763 | 0.65516 | 0.64435 | 0.63474 |
| 144 | 0.71572 | 0.71477 | 0.70503 | 0.69569 | 0.68713 | 0.67928 |
| 169 | 0.77796 | 0.76405 | 0.75270 | 0.74352 | 0.73551 | 0.72833 |
| 196 | 0.79859 | 0.79359 | 0.78666 | 0.78018 | 0.77418 | 0.76862 |
| 225 | 0.87285 | 0.84814 | 0.83679 | 0.82908 | 0.82278 | 0.81731 |
| 256 | 0.89023 | 0.87632 | 0.86896 | 0.86358 | 0.85904 | 0.85500 |
| 289 | 0.95097 | 0.92073 | 0.91087 | 0.90514 | 0.90082 | 0.89720 |
| 324 | 0.91626 | 0.92737 | 0.92830 | 0.92762 | 0.92653 | 0.92530 |
| 361 | 0.97122 | 0.96844 | 0.96731 | 0.96636 | 0.96548 | 0.96466 |
| 400 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000 |

Fig. 3. Curves of ratios of information $R_{P_0}(\tilde{S}_{k,l})$, $k = 1, \dots, 19$, vs. the number of (sequentially) selected sites $l = 1, \dots, (k+1)(k+1)$, for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5.

Fig. 4. Curves of ratios of information $R_{P_0}(\tilde{S}_{k,l})$, $k = 1, \dots, 19$, vs. the number of (sequentially) selected sites $l = 1, \dots, (k+1)(k+1)$, for $a = 150$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5.

Table 4. Values of ratios of information $R_{P_0}(\tilde{S}_{k,l})$, $k = 1, \dots, 19$ and $l = 1, \dots, 10$, for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ (rows: scheme, identified by its number of points; columns: number of selected points l)

| Scheme | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 0.00001 | 0.00001 | 0.00001 | 0.00001 | | | | | | |
| 9 | 0.00021 | 0.00036 | 0.00037 | 0.00038 | 0.00039 | 0.00039 | 0.00039 | 0.00039 | 0.00039 | |
| 16 | 0.00068 | 0.00094 | 0.00117 | 0.00138 | 0.00159 | 0.00168 | 0.00176 | 0.00179 | 0.00181 | 0.00181 |
| 25 | 0.09915 | 0.12013 | 0.13161 | 0.13276 | 0.13331 | 0.13352 | 0.13368 | 0.13382 | 0.13387 | 0.13390 |
| 36 | 0.05673 | 0.08864 | 0.10093 | 0.10267 | 0.10428 | 0.10474 | 0.10506 | 0.10531 | 0.10554 | 0.10573 |
| 49 | 0.06073 | 0.11713 | 0.14536 | 0.16192 | 0.17269 | 0.17641 | 0.17879 | 0.18000 | 0.18120 | 0.18233 |
| 64 | 0.09667 | 0.17669 | 0.20040 | 0.22355 | 0.23700 | 0.24783 | 0.25491 | 0.26090 | 0.26639 | 0.26939 |
| 81 | 0.13048 | 0.22963 | 0.25615 | 0.27660 | 0.29493 | 0.30835 | 0.31964 | 0.33054 | 0.33645 | 0.34218 |
| 100 | 0.09341 | 0.15082 | 0.19873 | 0.24017 | 0.27098 | 0.29170 | 0.30667 | 0.31997 | 0.33052 | 0.34025 |
| 121 | 0.22804 | 0.28477 | 0.34030 | 0.38883 | 0.41929 | 0.44842 | 0.46393 | 0.47571 | 0.48669 | 0.49753 |
| 144 | 0.06381 | 0.11305 | 0.15909 | 0.19889 | 0.23541 | 0.26727 | 0.29368 | 0.31920 | 0.34368 | 0.35909 |
| 169 | 0.09915 | 0.17376 | 0.23165 | 0.28735 | 0.34277 | 0.38352 | 0.41511 | 0.43887 | 0.46178 | 0.48112 |
| 196 | 0.14414 | 0.23257 | 0.27972 | 0.32350 | 0.36520 | 0.40158 | 0.43376 | 0.46351 | 0.48437 | 0.50428 |
| 225 | 0.12221 | 0.21889 | 0.29878 | 0.36387 | 0.42588 | 0.47768 | 0.52503 | 0.56024 | 0.59102 | 0.62089 |
| 256 | 0.15165 | 0.21218 | 0.27166 | 0.32809 | 0.38273 | 0.43397 | 0.48327 | 0.52886 | 0.56450 | 0.59829 |
| 289 | 0.13048 | 0.23811 | 0.33727 | 0.41600 | 0.49390 | 0.56775 | 0.64042 | 0.70305 | 0.74373 | 0.77905 |
| 324 | 0.06957 | 0.13532 | 0.20005 | 0.25173 | 0.30215 | 0.34995 | 0.39195 | 0.43362 | 0.47012 | 0.49548 |
| 361 | 0.17056 | 0.26052 | 0.32244 | 0.37932 | 0.43572 | 0.48421 | 0.53158 | 0.57529 | 0.61696 | 0.65369 |
| 400 | 0.08567 | 0.17027 | 0.24631 | 0.31004 | 0.36801 | 0.42345 | 0.47747 | 0.53023 | 0.58261 | 0.62842 |
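The set-up of Section 3.1 can be sketched as follows. This is an illustrative reconstruction in Python with NumPy, not the authors' code. It assumes that X has unit variance, so that the exponential variogram implies the covariance $C(h) = \exp(-3h/a)$, and it uses natural logarithms. The names `scheme`, `cov_x`, `information` and `scheme_ratios` are hypothetical, and the printed values are not guaranteed to reproduce Table 2.

```python
import numpy as np

# Locations of interest L, taken from Table 1.
L_SITES = np.array([
    (15, 20), (20, 10), (20, 25), (23, 15), (25, 13), (25, 20),
    (35, 84), (38, 80), (42, 86), (43, 75), (47, 80),
    (72, 50), (76, 50), (77, 55), (82, 51), (86, 56),
], dtype=float)

def scheme(k):
    """Regular grid P_k with (k+1)^2 nodes on the 100 x 100 square."""
    ticks = 100.0 * np.arange(k + 1) / k
    s1, s2 = np.meshgrid(ticks, ticks, indexing="ij")
    return np.column_stack([s1.ravel(), s2.ravel()])

def cov_x(p, q, range_a):
    """Covariance of X between point sets p and q.

    Assumption: unit sill, so gamma(h) = 1 - exp(-3h/a) gives C(h) = exp(-3h/a).
    """
    h = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return np.exp(-3.0 * h / range_a)

def information(sites, range_a, noise_var, targets=L_SITES):
    """I(Y_S; X_L) in nats under the Gaussian model Y = X + eps."""
    c_ll = cov_x(targets, targets, range_a)
    c_ls = cov_x(targets, sites, range_a)  # eps is independent of X
    c_ss = cov_x(sites, sites, range_a) + noise_var * np.eye(len(sites))
    cond = c_ll - c_ls @ np.linalg.solve(c_ss, c_ls.T)
    return 0.5 * (np.linalg.slogdet(c_ll)[1] - np.linalg.slogdet(cond)[1])

def scheme_ratios(range_a, noise_var, ks=range(1, 20)):
    """R_{P_0}(P_k) for each k, with P_0 the most informative scheme."""
    info = np.array([information(scheme(k), range_a, noise_var) for k in ks])
    return info / info.max()

ratios = scheme_ratios(range_a=15.0, noise_var=0.1)
for k, r in zip(range(1, 20), ratios):
    print(f"{(k + 1) ** 2:4d} points: {r:.5f}")
```

Because the grids $P_k$ are not nested, nothing in this construction forces the ratios to increase with k, which is the effect examined in the simulation results.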

Note that if we were interested in finding globally optimal designs of fixed size, we could use procedures available in the literature, e.g. the algorithm of Ko et al. (1995) to maximize entropy and its adaptation to a state-space model framework by Bueso et al. (1998).

In the empirical study we have considered the values 0.1, 0.5, 1, 1.5, 2 and 2.5 for the error variance $\sigma_\varepsilon^2$, and the values 15, 30, 60, 90, 120 and 150 for the range a. The most significant aspects observed in the results obtained are described below.

3.2. Simulation results

In Fig. 2 the curves of the ratios of information $R_{P_0}(P_k)$, $k = 1, \dots, 19$, are represented for each value of the error variance $\sigma_\varepsilon^2$, varying the value of the range a. In all cases, the most informative scheme is $P_{19}$, corresponding to the sampling scheme with the largest number of points, except for the case of $a = 15$ and $\sigma_\varepsilon^2 = 0.1$, where the optimum is given by the scheme defined by the grid with 289 points, $P_{16}$. In the latter case, the configuration of the locations provides more information on $X_L$ than the other schemes due to the low values of a and $\sigma_\varepsilon^2$, as well as to the configuration of set L. Note that in most of these curves the information on $X_L$ increases as the total number of points in the scheme increases. However, there are certain schemes having a lower number of points that are more informative than others with a higher density. In the graphs, this is reflected by a change of sign in the slope.

The values of these ratios of information (truncated to five decimal digits) for $a = 15$ and $a = 150$, for all the values of $\sigma_\varepsilon^2$ considered, are shown in Tables 2 and 3, respectively. Observe that in the first case the scheme with 25 points is more informative than that with 36 points, and the differences between the values of the corresponding ratios of information decrease as the value of $\sigma_\varepsilon^2$ increases. This behavior can also be seen in the schemes with 289 points and 324 points. As commented above, the best scheme for the model parameters $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ is composed of 289 points, and the schemes with 225 and 256 points are better than the scheme with 324 points. In the second case, this effect occurs for the value $\sigma_\varepsilon^2 = 0.1$. On the other hand, we can see that the larger the error variance $\sigma_\varepsilon^2$ is, the smoother the curves of the ratios of information $R_{P_0}(P_k)$ become, and the same occurs with respect to the range a. These facts lead us to conclude that the a priori configuration of the points from which the final network must be selected can be a crucial aspect in the design problem, and that there can exist equivalent or quasi-equivalent designs defined by configurations with a different number of points.

For the above values of range a, the curves of the ratios of information $R_{P_0}(\tilde{S}_{k,l})$ obtained when a sequential optimal design is performed are represented in the plots on the left-hand side of Figs. 3 and 4. The plots on the right-hand side in the same figures represent the details corresponding to the range from 1 to the minimum number of sites needed to obtain 75% of the information $I(Y_{P_0}; X_L)$. In these plots we can observe some crossings between the ratio-of-information curves, which mean that the corresponding configurations are almost equivalent in the neighborhood of the crossing points. Obviously, the values of these ratios depend on the grid defined by the sampling scheme. Thus, for example, the first point selected for a scheme can provide more information than the first chosen for another scheme. This can be noticed, for instance, in the case where $a = 15$ and $\sigma_\varepsilon^2 = 0.1$: the first point selected from the scheme composed of 121 points provides more information than the first point of any of the remaining schemes, and more than all the points of the schemes composed of 4, 9, 16, 25, 36, and 49 points taken together within each scheme (see Tables 2 and 4). A conclusion from this fact is that, as the number of points in the scheme increases, there is in general more flexibility in the site selection; hence, for $a = 15$ and $\sigma_\varepsilon^2 = 0.1$, the first seven sites selected from the scheme with 289 points give 64% of the global information, more than that provided by any scheme with 196 points or fewer.

Another aspect considered in this study is the location of the sequentially selected sites, as well as the minimum number of sites needed to obtain a certain level of information. For percentages 50, 75 and 95, and for the most comprehensive scheme, the minimum number of sites needed is presented in Table 5. Note that this number increases as either $\sigma_\varepsilon^2$ or a increases, and the increment of points is more significant for high values of $\sigma_\varepsilon^2$, that is, when the degree of dependence between X and Y decreases. When the value of $\sigma_\varepsilon^2$ is very small, observing variable Y is similar to observing X. On the other hand, it can be noted that, for a fixed value of $\sigma_\varepsilon^2$, increasing the value of a (which means a stronger spatial dependence and, consequently, a higher redundancy of the information contained in the observations) requires a larger number of points to obtain a certain level of information.

Table 5. Number of locations needed to obtain a 50%, 75% and 95% ratio of information $R_{P_0}(\tilde{S}_{0,\cdot})$, for each value of the parameters a and $\sigma_\varepsilon^2$

| a | $\sigma_\varepsilon^2$ | 50% | 75% | 95% |
|---|---|---|---|---|
| 15 | 0.1 | 6 | 10 | 19 |
| 15 | 0.5 | 9 | 16 | 31 |
| 15 | 1 | 10 | 17 | 35 |
| 15 | 1.5 | 10 | 19 | 38 |
| 15 | 2 | 11 | 20 | 41 |
| 15 | 2.5 | 11 | 21 | 43 |
| 30 | 0.1 | 7 | 14 | 27 |
| 30 | 0.5 | 9 | 18 | 35 |
| 30 | 1 | 11 | 22 | 43 |
| 30 | 1.5 | 13 | 25 | 50 |
| 30 | 2 | 14 | 27 | 56 |
| 30 | 2.5 | 15 | 29 | 62 |
| 60 | 0.1 | 7 | 14 | 28 |
| 60 | 0.5 | 10 | 20 | 42 |
| 60 | 1 | 13 | 26 | 55 |
| 60 | 1.5 | 15 | 30 | 66 |
| 60 | 2 | 17 | 34 | 75 |
| 60 | 2.5 | 18 | 37 | 83 |
| 90 | 0.1 | 7 | 14 | 30 |
| 90 | 0.5 | 10 | 22 | 48 |
| 90 | 1 | 14 | 29 | 64 |
| 90 | 1.5 | 16 | 35 | 77 |
| 90 | 2 | 19 | 39 | 88 |
| 90 | 2.5 | 21 | 43 | 98 |
| 120 | 0.1 | 7 | 15 | 31 |
| 120 | 0.5 | 11 | 24 | 53 |
| 120 | 1 | 14 | 32 | 72 |
| 120 | 1.5 | 17 | 38 | 86 |
| 120 | 2 | 20 | 43 | 99 |
| 120 | 2.5 | 22 | 48 | 110 |
| 150 | 0.1 | 7 | 15 | 33 |
| 150 | 0.5 | 11 | 26 | 57 |
| 150 | 1 | 15 | 34 | 78 |
| 150 | 1.5 | 18 | 41 | 94 |
| 150 | 2 | 21 | 47 | 109 |
| 150 | 2.5 | 23 | 52 | 121 |

The resulting curves of the ratios of information $R_{P_k}(\tilde{S}_{k,l})$, $k = 1, \dots, 19$, $l = 1, \dots, (k+1)(k+1)$, corresponding to the particular cases of $a = 15$ and $a = 150$, for $\sigma_\varepsilon^2 = 0.1$ and $\sigma_\varepsilon^2 = 2.5$, are represented in Fig. 5. All the curves are increasing and practically merge near the value of 1 (that is why these ratios of information are only represented vs. a subinterval of $[1, 400]$ where the most interesting features occur).

The maps containing the locations selected to provide the levels of 50%, 75% and 95% of information are displayed in Figs. 6 and 7. In general, the sites selected are located close to the three clusters of sites in the set of interest. Note that the greater a and $\sigma_\varepsilon^2$, the larger the minimum number of sites needed to cover the region of interest.
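The sequential ordering of sites and the site counts reported in Table 5 can be illustrated with a small greedy search. The sketch below, in Python with NumPy, is an assumption-laden illustration rather than the procedure actually coded by the authors. It assumes unit variance for X (covariance $\exp(-3h/a)$), recomputes the information naively for every candidate, and uses three hypothetical target sites and a 5 × 5 candidate grid to keep the run short.

```python
import numpy as np

def exp_cov(p, q, range_a):
    """Covariance exp(-3h/a) between point sets (unit sill is an assumption)."""
    h = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return np.exp(-3.0 * h / range_a)

def information(targets, sites, range_a, noise_var):
    """I(Y_S; X_L) in nats for the Gaussian model Y = X + eps."""
    c_ll = exp_cov(targets, targets, range_a)
    c_ls = exp_cov(targets, sites, range_a)
    c_ss = exp_cov(sites, sites, range_a) + noise_var * np.eye(len(sites))
    cond = c_ll - c_ls @ np.linalg.solve(c_ss, c_ls.T)
    return 0.5 * (np.linalg.slogdet(c_ll)[1] - np.linalg.slogdet(cond)[1])

def sequential_design(targets, candidates, range_a, noise_var, n_sites):
    """Greedy ordering: at each stage add the candidate with the largest information."""
    chosen, infos = [], []
    remaining = list(range(len(candidates)))
    for _ in range(n_sites):
        scores = [information(targets, candidates[chosen + [j]], range_a, noise_var)
                  for j in remaining]
        best = int(np.argmax(scores))
        infos.append(scores[best])
        chosen.append(remaining.pop(best))
    return chosen, np.array(infos)

def sites_needed(infos, reference_info, target):
    """Smallest l whose ratio of information reaches the target, else None."""
    hits = np.nonzero(infos / reference_info >= target)[0]
    return int(hits[0]) + 1 if hits.size else None

# Hypothetical small example: 3 target sites, 5 x 5 candidate grid (k = 4).
targets = np.array([[20.0, 15.0], [40.0, 80.0], [78.0, 52.0]])
ticks = 100.0 * np.arange(5) / 4
candidates = np.array([(x, y) for x in ticks for y in ticks])
order, infos = sequential_design(targets, candidates, 30.0, 0.5,
                                 n_sites=len(candidates))
full = information(targets, candidates, 30.0, 0.5)
for target in (0.50, 0.75, 0.95):
    print(f"{int(target * 100)}%: {sites_needed(infos, full, target)} sites")
```

A greedy ordering of this kind is not guaranteed to be globally optimal for a fixed design size; it only mirrors the stage-by-stage selection described in Section 3.1.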


Fig. 5. Curves of ratios of information $R_{P_k}(\tilde{S}_{k,l})$, $k = 1, \dots, 19$, vs. the number of (sequentially) selected sites $l = 1, \dots, (k+1)(k+1)$, for $a = 15$ and 150 and $\sigma_\varepsilon^2 = 0.1$ and 2.5.

Fig. 6. Sequentially selected locations (○) needed to obtain a 50%, 75% and 95% ratio of information $R_{P_0}(\tilde{S}_{0,\cdot})$, for the fixed value $a = 15$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5. (Points ( ) represent the locations of interest.)

4. Conclusions

In this paper we consider the problem of spatial sampling design in a state-space model context in the case where the domain of observation is continuous and spatial discretization is performed in order to apply sampling selection methods, which are generally designed for a finite search of the optimal solution. An empirical study has been carried out to show how the level of refinement of the discretization affects the configuration of possible alternative sampling designs and how, in some cases, a grid refinement involving a larger number of nodes may not improve the effectiveness of the sampling scheme with respect to the optimality criterion adopted.

The results obtained suggest that the a priori definition of a prescribed set from which to select a sampling scheme can be critical for the effectiveness of the results. The aim of the study presented in this paper is to show how this intuitively unsurprising fact occurs in association with the values taken by significant parameters involved in the spatial dependence structure. In the study performed here a concrete but widely used model has been considered to illustrate the effect on the optimal designs of the configuration of the prescribed sampling scheme. Similar types of effects would be expected with alternative or more complex covariance structure models.

We have focused in this paper on the problem related to discretization of the region to be sampled. A further study would be of interest to show the influence on the optimal designs of the level of refinement in the discretization of the set of locations of interest. One possible approach to avoid the excessive dimensionality due to increasing refinements would be based on applying stochastic perturbations to location coordinates to improve the information obtained (see, for example, Van Groenigen and Stein, 1998). Under a different philosophy, the authors are investigating the sampling design problem in the context of generalized random fields (see Angulo et al., 2003), which can be useful in situations where information on the variables involved can be measured in terms of averages using appropriate test functions.

Fig. 7. Sequentially selected locations (○) needed to obtain a 50%, 75% and 95% ratio of information $R_{P_0}(\tilde{S}_{0,\cdot})$, for the fixed value $a = 150$ and $\sigma_\varepsilon^2 = 0.1$ and 2.5. (Points ( ) represent the locations of interest.)

Acknowledgements

This work has been supported in part by Projects BFM2000-1465 and BFM2002-01836 of the DGI, Ministerio de Ciencia y Tecnología, Spain. The authors are grateful to the referees for constructive suggestions to improve the paper.

References

Angulo, J.M., Bueso, M.C., 2001. Random perturbation methods applied to multivariate spatial sampling design. Environmetrics 12, 631–646.
Angulo, J.M., Bueso, M.C., Alonso, F.J., 2000. A study on sampling design for optimal prediction of space-time stochastic processes. Stochastic Environmental Research and Risk Assessment 14, 412–427.
Angulo, J.M., Ruiz-Medina, M.D., Alonso, F.J., Bueso, M.C., 2003. Generalized approaches to spatial sampling design. In: Mateu, J., Holland, D., González-Manteiga, W. (Eds.), Proceedings of the ISI International Conference on Environmental Statistics and Health. Universidade de Santiago de Compostela, pp. 11–19.
Bueso, M.C., Angulo, J.M., 1999. Criteria for multivariate spatial sampling design based on covariance matrix perturbation. In: Gómez-Hernández, J., Soares, A., Froidevaux, R. (Eds.), geoENV II: Geostatistics for Environmental Applications. Kluwer, Amsterdam, pp. 491–502.
Bueso, M.C., Angulo, J.M., Alonso, F.J., 1998. A state-space model approach to optimum spatial sampling design based on entropy. Environmental and Ecological Statistics 5, 29–44.
Bueso, M.C., Angulo, J.M., Cruz-Sanjulián, J., García-Aróstegui, J.L., 1999. Optimal spatial sampling design in a multivariate framework. Mathematical Geology 31, 507–525.
Caselton, W.F., Husain, T., 1980. Hydrologic networks: Information transmission. Journal of the Water Resources Planning and Management Division, ASCE 106, 503–520.
Caselton, W.F., Kan, L., Zidek, J.V., 1992. Quality data networks that minimize entropy. In: Walden, A., Guttorp, P. (Eds.), Statistics in Environmental and Earth Sciences. Griffin, London, pp. 10–38.
Caselton, W.F., Zidek, J.V., 1984. Optimal monitoring network designs. Statistics and Probability Letters 2, 223–227.
Christakos, G., 1992. Random Field Models in Earth Sciences. Academic Press, San Diego, CA.
Cressie, N., 1993. Statistics for Spatial Data. Wiley & Sons, New York.
Guttorp, P., Le, N.D., Sampson, P.D., Zidek, J.V., 1993. Using entropy in the redesign of an environmental monitoring network. In: Patil, G.P., Rao, C.R. (Eds.), Multivariate Environmental Statistics. Elsevier, New York, pp. 175–202.
Kitanidis, P.K., 1997. Introduction to Geostatistics: Applications to Hydrology. Cambridge University Press, New York.
Ko, C.-W., Lee, J., Queyranne, M., 1995. An exact algorithm for maximum entropy sampling. Operations Research 43, 684–691.
Müller, W.G., 2001. Collecting Spatial Data: Optimum Design of Experiments for Random Fields. Physica-Verlag, Heidelberg.
Pérez-Abreu, V., Rodríguez, J.E., 1996. Index of effectiveness of a multivariate environmental monitoring network. Environmetrics 7, 489–501.
Van Groenigen, J.W., Stein, A., 1998. Constrained optimization of spatial sampling using continuous simulated annealing. Journal of Environmental Quality 27, 1078–1086.
Wu, S., Zidek, J.V., 1992. An entropy-based analysis of data from selected NADP/NTN network sites for 1983–1986. Atmospheric Environment 26A, 2089–2103.
Zidek, J.V., Sun, W., Le, N.D., 2000. Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. Applied Statistics 49, 63–79.
