Residual-based specification of a hidden random field included in a hierarchical spatial model

Residual-based specification of a hidden random field included in a hierarchical spatial model

Computational Statistics & Data Analysis 51 (2007) 6404 – 6422 www.elsevier.com/locate/csda Residual-based specification of a hidden random field inclu...

630KB Sizes 0 Downloads 30 Views

Computational Statistics & Data Analysis 51 (2007) 6404 – 6422 www.elsevier.com/locate/csda

Residual-based specification of a hidden random field included in a hierarchical spatial model Samuel Soubeyrand∗ , Joël ChadWuf INRA, UR546 Biostatistique et Processus Spatiaux, Domaine Saint Paul, F-84914 Avignon, France Received 24 March 2006; received in revised form 6 February 2007; accepted 6 February 2007 Available online 21 February 2007

Abstract A method for specifying a hidden random field (HRF) included in a hierarchical spatial model is proposed. In hierarchical models of interest the first stage describes, conditional on a realization of the HRF, a response variable which is observable on a continuous spatial domain; the second stage models the HRF which reflects unobserved spatial heterogeneity. The question which is investigated is how can the HRF be modeled, i.e. specified. The method developed to address this question is based on residuals obtained when the base model, i.e. the hierarchical model in which the HRF is assumed constant, is fitted to data. It is shown that the residuals are linked with the HRF, and the link is used to specify the HRF. The method is applied to simulated data in order to assess its performance, and then to real data on radionuclide concentrations on Rongelap Island. © 2007 Elsevier B.V. All rights reserved. Keywords: Distribution specification; Random effects; Residual analysis; Spatial modeling; Spatial statistics

1. Introduction A hidden random field (HRF) is an unobserved process which influences the realization of random variables observable on a spatial domain. A HRF can be of interest in itself. Modeling, estimating and mapping the HRF can be useful in furthering our understanding of the studied phenomenon. A HRF can be a nuisance process which must be included in the modeling to avoid misleading inference. In spatial statistics, various hierarchical models including various kinds of HRFs have been proposed. For instance, Diggle et al. (1998) propose generalized linear mixed models including hidden normal random fields to analyze campylobacter-infections and radionuclide-concentrations data; Henderson et al. (2002) propose a frailty model including a hidden gamma random field to analyze leukemia survival data. The question we investigate in this paper is: how do we specify a HRF included in a hierarchical model, especially when the HRF corresponds to poorly known mechanisms? Specifying the HRF means determining elements such as the forms of the point distribution, the trend, and the covariance structure. How the HRF is specified is crucial since a misspecification can result in misleading inference (Ledford and Marriott, 1998; Monestiez et al., 2005). For specifying a HRF included in a hierarchical spatial model we have followed the approach of Soubeyrand et al. (2006). They developed a method based on residuals for specifying the distribution of shared random effects included in a model for clustered data. In the context considered by Soubeyrand et al. (2006) data consist of repeated observations ∗ Corresponding author. Tel.: +33 4 32 72 21 57; fax: +33 4 32 72 21 82.

E-mail address: [email protected] (S. Soubeyrand). 0167-9473/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2007.02.008

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6405

at each cluster. In the context we consider, data are observed on a continuous spatial domain and there is usually one observation per site. Despite this difference in the type of data, the residual-based method of Soubeyrand et al. (2006) can be followed although the residuals must be defined differently. Consequently, instead of using a sample mean for defining the residuals we use a kernel smoother. In our context, the residual-based method for specifying a HRF can be described as follows. Consider a parametric hierarchical model including a HRF. Suppose the HRF is unspecified. Let the base model denote the version of the model assuming the HRF is constant. The method consists of (i) estimating parameters of the base model and computing local residuals using the kernel smoother, (ii) exhibiting an asymptotic relationship between the local residuals and the realization of the HRF, (iii) estimating the values of the HRF at the sample sites from the asymptotic relationship, and (iv) using these estimated values for specifying the HRF. The paper is organized as follows. Section 2 presents the hierarchical model including the HRF that we consider. Section 3 provides the outline of the method for specifying the HRF. Section 4 assesses whether step (iii) provides “good” estimates of the HRF values or not. Obtaining good estimates is crucial since the method of specification is based on these estimates. In Section 5 the method is applied to data on radionuclide concentrations on Rongelap Island for which Diggle et al. (1998) developed a model including a HRF. The application of our method confirms results of Christensen (2004) about the specification of the HRF. The paper ends with a discussion. 2. Model and question For any location z in a continuous spatial domain Z ⊂ Rq , q ∈ N∗ , let Y (z) denote a response variable in R and x(z) denote a vector of explanatory variables. Let (·) be a HRF defined over Z with values in a bounded subset of R. The field (·) is supposed to reflect unobserved explanatory variables which influence the response variable. For any locations z1 , . . . , zN in Z we assume that, conditionally on (·), the random variables Y (zn ), n = 1, . . . , N, are mutually independent with distributions {·|x(zn ), (zn ), }, where  is a vector of real parameters. Particular cases of the model described above were used to describe, for example, radionuclide concentrations and campylobacter infections (Diggle et al., 1998), Rhizoctonia root rot infections (Zhang, 2002), leukemia survival data (Henderson et al., 2002), vine plants mortality (Desassis et al., 2005). In our model, the HRF (·) is not specified. If the physical, biological or economical mechanisms governing (·) are poorly known, then no obvious choice for the specification of (·) exists. So, how do we specify the HRF? This is the question we investigate in this paper, following the residual-based approach of Soubeyrand et al. (2006). 3. Method of specification of the HRF This section presents the outline of the method. For the sake of clarity, we slightly simplified notations compared with those used in the appendix which provides a proof for the result used in the method. Assumptions mentioned below are listed in Appendix F. 3.1. Notations Let Z = (Z1 , . . . , ZN )T denote the vector of locations drawn from the probability measure  defined on Z. Let Y = {Y (Z1 ), . . . , Y (ZN )}T = (Y1 , . . . , YN )T denote the vector of response variables. We define the mean of the HRF by  E{(z)}( dz), (1) ¯ = Z

where the function z  → E{(z)} is assumed to be -integrable over Z and the expectation is taken under the unspecified distribution of (z) (assumption (b) in Appendix F). In addition, let  = sup |(z) − ¯ | z∈Z

(2)

measure the variation of the HRF, where | · | denotes the absolute value of real numbers.  is finite since (·) takes values in a bounded subset of R (assumption (a)).

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6406

3.2. Estimation under the base model The base model is defined as the version of the hierarchical model in which the HRF is constant, i.e. (z) = ¯ for all z ∈ Z. So, the function (·) defined over Z is replaced in the base model by the single parameter ¯ . Under the base model, the (scaled) log-likelihood for the parameters is lN (¯, ; Z, Y) =

N 1  log {Yn |x(Zn ), ¯ , }. N

(3)

n=1

The maximizer of lN (.; Z, Y), say (ˆ¯ N , ˆ N ), is viewed as an estimator of (¯, ). 3.3. Local residuals under the base model For any n in {1, . . . , N}, let rN (n) be the ordinary residual estimated under the base model rN (n) = Yn − Eˆ¯ ,ˆ (Yn |Zn ). N N

(4)

Define the local residual Rˆ N (s) at location s ∈ Z as the kernel smoother (Bosq and Lecoutre, 1987; Pagan and Ullah, 1999) of the ordinary residuals Rˆ N (s) =

N  n=1

K((s − Zn )/ hN ) rN (n), N m=1 K((s − Zm )/ hN )

(5)

where K is a kernel function and hN is a bandwidth. If data are drawn from the hierarchical model characterized by (·) and , then the local residual Rˆ N (s) is the kernel estimator of E(·), {Yn − Eˆ¯ ,ˆ (Yn |Zn ) | Zn = s}. N N This quantity is the conditional expected value, under the true hierarchical model, of the ordinary residual rN (n) obtained under the base model given that observation n is made at location s. Thus, the local residual is a function of the HRF (·). The following subsection exhibits this functional relationship. 3.4. Asymptotic relationship between local residuals and the HRF ˆ N = {Rˆ N (s1 ), . . . , Rˆ N (sI )}T be the vector of local residuals at Let s1 , . . . , sI be I ∈ N locations in Z. Let R locations s1 , . . . , sI , and  = {(Z1 ) − ¯ , . . . , (ZN ) − ¯ }T be the vector of variations of the HRF at sample locations Z1 , . . . , ZN . In the appendix we prove the following theorem. Theorem 1. Suppose assumptions (a)–(j) in Appendix F are satisfied. Then ˆ N = B + N, + N, , R

(6)

where • B is a matrix, specified in Appendix D, which can be estimated by plugging-in the estimators ˆ¯ N and ˆ N ; • under PD , i.e. the joint distribution of Z, Y and (·), lim E(e

N→∞

 q it T NhN N,

) − E(eit

TX 

)=0

∀t ∈ RI ,

where X |(·) ∼ N{0, ()}, () is diagonal and is specified by Eq. (E.6); • N, is, as N → ∞ and  → 0, PD − a.s.-negligible with respect to B and N, .

(7)

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6407

Remarks. 1. On the conditional and unconditional convergence of N, . In Appendix E we first show the following pointwise convergence: conditionally on (·)  d q N hN N, −→ N{0, ()} as N → ∞. This states the conditional asymptotic normality of N, given (·). Then we show the unconditional weak convergence of N, (Eq. (7)) in term of characteristic functions. 2. On the convergence rate of N, . Our residuals are kernel smoothers of the ordinary residuals, whereas those of Soubeyrand et al. (2006) are sample means of ordinary residuals. This convergence  difference results in different √ q rates of the stochastic fluctuation N, . In our context the rate is 1/ N hN which is slower than 1/ N , namely the rate stated by Soubeyrand et al. (2006). ˆ N of local residuals can be decomposed 3. On the interpretation of Theorem 1. The theorem states that the vector R into the sum of (i) a linear function B of the HRF values at the sample sites, (ii) a stochastic fluctuation N, tending to 0 as the sample size increases (N → ∞), and (iii) a residual term N, which is negligible as N → ∞ and the variations of the HRF tend to 0 ( → 0). 3.5. Estimation of the HRF values at the sample sites We now estimate the HRF values (Z1 ), . . . , (ZN ) at the sample sites using decomposition (6). From Theorem 1, ˆ N of local residuals can be approximated by when N is large and  is small, the vector R ˆ N ≈ B ˆ , R

(8)

where Bˆ is the estimator of B obtained by plugging-in ˆ¯ N and ˆ N . As proposed by Soubeyrand et al. (2006), solving equation ˆ N = Bt ˆ R

(9)

ˆ  = Bˆ − R ˆ N . Then, an estimator ˆ yields an estimator, say ˆ  , of  :  by using a Moore–Penrose inverse, say Bˆ − , of B, of  = {(Z1 ), . . . , (ZN )}T is given by ˆ N, ˆ = ˆ¯ N 1N + Bˆ − R

(10)

where 1N is the unit vector of size N. ˆ N is the vector of local residuals at sites s1 , . . . , sI whereas  is the vector of variations of (·) at sample Remark. R ˆ N = Bt ˆ is a system with I equations and N unknowns. In practice, to conveniently estimate sites Z1 , . . . , ZN . Thus, R all components of  , the grid {s1 , . . . , sI } must be large enough and must overlap the zone of sample sites. In the applications in Sections 4 and 5 we let sites s1 , . . . , sI be the sample sites. 3.6. Specification of the HRF We now have at our disposal estimates of the realized values of the HRF at the sample locations; in other words, the HRF was restored at the sample locations. Based on these estimates, we can propose an appropriate class of models for the HRF. Then, we can select a particular specification in this class using the estimated values of the HRF as input data. 4. Performance of the estimation of the HRF values ˆ provided by Eq. (10), of the HRF values The method of specification of the HRF is based on the estimator ,  = {(Z1 ), . . . , (ZN )}T at the sample sites. According to Theorem 1, the performance of the estimator depends on the sample size N and the variations of the HRF measured by . So, we performed a simulation-study to assess the

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6408

performance of the estimator ˆ as N and  vary. We also made the range and the anisotropy of the HRF vary in order to see the extent of applicability of the method. The model from which the simulations were drawn is a generalized linear mixed model with a HRF consisting of spatially dependent normal random effects. It belongs to the class of models defined by Diggle et al. (1998). 4.1. Model We simulated data from a 2D-spatial overdispersed-Poisson model including a HRF and an explanatory variable. Y1 , . . . , YN are counts observable at locations Z1 , . . . , ZN uniformly and independently distributed in the square Z = [0, 1]2 . Given locations Z1 = z1 , . . . , ZN = zN and the HRF (·), counts Yn , n = 1, . . . , N, are assumed to be mutually independent and Poisson distributed with means (zn ) exp{x(zn )} Yn | Zn = zn , (zn ) ∼ Poisson[(zn ) exp{x(zn )}],

(11)

where  is an unknown regression parameter ( = 1) and x(·) is a known explanatory function (x(z1 ), . . . , x(zN ) were independently and identically drawn from the normal distribution N(1, 0.52 )). HRF values (z1 ), . . . , (zN ) were drawn from a lognormal random field. The mean of the logarithm of (·) was 1 and its autocovariance function was a Gaussian model ⎧ ⎛ ⎞2 ⎫ ⎪ ⎪ 2 + ( z )2 ⎨ z b ⎟ ⎬ ⎜ a 2 2

exp −⎝ ⎠ , ∀z = (za , zb ) ∈ [0, 1] , ⎪ ⎪ ⎭ ⎩ where the standard deviation measures the degree of variation of the HRF and plays the role of ; measures the range of the spatial correlation; and measures the degree of anisotropy. Note that the lognormal HRF does not take values in a bounded set (assumption (a)); thus, we can see the applicability of the method when assumption (a) is not satisfied. Fig. 1 shows simulations of {log (z1 ), . . . , log (zN )} (left), {x(z1 ), . . . , x(zN )} (center) and {Y1 , . . . , YN } (right) for N = 200 and for different values of , and used in the simulation study. The sample locations are at the centers of the circles, and the simulated values are proportional to the circle diameters. The intervals written under the plots provide the range of simulated values; besides, for each row, the circle scale is similar for the two plots on the left; thus we can compare the HRF variations with the covariate variations. In addition, the background images on the left plots allow us to gauge the form of the spatial correlation structures of the HRFs. 4.2. Estimation of the HRF values Consider N counts drawn from the model described above. The estimate ˆ of the vector  of the HRF values is computed using Eq. (10) as follows. First, we compute the MLEs ˆ¯ N and ˆ N of the parameters of the base model, i.e. the model assuming (z1 ) = · · · = (zN ) = ¯ , which is a GLM. Second, we compute local residuals at sample sites using the normal kernel which satisfies the required assumptions (see Appendix E) and selecting bandwidth hN by generalized cross validation (Loader, 1999). Third, the generalized inverse of the matrix B is estimated. Then, ˆ can be computed. 4.3. Quantitative assessment of the fit between simulated and estimated HRF values We quantitatively assessed the goodness-of-fit between the HRF values and their estimates when N, (i.e. ), and vary by performing a study based on simulations. The performance of our method in restoring the HRF was assessed ˆ by comparing sample statistics computed, firstly, for the simulated values  and, secondly, with the estimated values . Sample mean and standard deviation: We first compared the sample mean m(·) and the sample standard deviation ˆ For comparing m() ˆ and m() (resp. sd(·) computed for any realized vector  initially and then for the restored vector . ˆ and sd()) we studied the relative bias RB m =100{m()−m()}/m() ˆ ˆ sd() (resp. RB sd =100{sd()−sd()}/sd()). Table 1 shows the averages, RB m and RB sd , of the relative biases computed for 1000 simulated data sets drawn for different couples (N, ).

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

= 0.1 = 0.1 =1

1.0

1.0

1.0

0.8

0.8

0.8

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0.0

0.0 0.0

=1 = 0.1 =1

0.2

0.4 0.6 0.8 [ 0.64 , 1.29 ]

1.0

= 0.1 =1

=1

= 0.25

1.0

0.0

0.8

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0.0 0.2

0.4 0.6 0.8 [ −1.79 , 3.13 ]

1.0

0.2

0.4

0.6

0.8

1.0

0.0

1.0

0.8

0.8

0.8

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0.0

0.2

0.4

0.6

0.8

0.0

1.0

1.0

1.0

0.8

0.8

0.8

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0.0

0.2

0.4

0.6

0.8

0.0

1.0

1.0

0.8

0.8

0.8

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0.0 0.6

0.8

[ −1.22 , 3.35 ]

1.0

0.6

0.8

1.0

0.4

0.6

0.8

1.0

0.4

0.6

0.8

1.0

0.8

1.0

[ 0 , 483 ]

1.0

0.4

0.2

[ −0.28 , 2.3 ]

1.0

0.2

0.4

0.0

0.0

0.0

1.0

[ 0 , 702005 ]

1.0

0.0

0.2

[ −0.51 , 2.64 ]

1.0

0.8

0.0

0.0

0.4 0.6 0.8 [ −0.99 , 4.39 ]

0.6

[ 0 , 114 ]

1.0

0.2

0.2

[ − 0.3 , 2.48 ]

1.0

0.4

0.0 0.0

1.0

0.2 0.4 0.6 0.8 [ −10.25 , 13.27 ]

0.2

[ 0 , 35 ] 1.0

0.0

= 0.1

0.8

0.8

0.0

=1

0.6

1.0

0.0

= 0.2

0.4

0.8

0.0

=1

0.2

1.0

0.0

=5

0.0 0.0

[ -0.59 , 2.36 ]

0.0

6409

0.0 0.0

0.2

0.4

0.6

0.8

[ − 0.17 , 2.43 ]

1.0

0.0

0.2

0.4

0.6

[ 0 , 297 ]

Fig. 1. Examples of simulations. Simulations of {log (z1 ), . . . , log (zN )} (left plots), {x(z1 ), . . . , x(zN )} (center plots) and {Y1 , . . . , YN } (right plots) for N = 200 and for different values of , and written on the left of the plots. The sample locations are at the centers of the circles, and the simulated values are proportional to the circle diameters. The intervals written under the plots provide the range of the simulated values. For each row, the circle scale is similar for the left and center plots. The background image on each left plot provides the values of the HRF for other locations than the sampled ones.

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6410

Table 1 Average relative biases for the sample mean and the sample standard deviation



0.1 0.2 0.5 1.0 2.0 5.0

RB m (%)

RB sd (%)

N

N

100

200

500

1000

100

200

500

1000

0.9 0.7 1.8 4.7 29.5 124.8

0.3 0.3 0.5 3.6 16.4 96.3

0.1 0.3 0.6 0.6 8.3 57.0

0.0 0.0 0.0 0.4 5.7 37.4

116.7 35.1 15.3 6.3 21.5 1159.0

85.1 39.6 21.9 6.5 11.8 183.1

63.9 75.7 24.4 4.2 6.8 45.5

76.1 87.0 23.5 4.0 4.9 31.2

4 Variogram

Variogram

0.08 0.04 0.00

σ= 0.2 corr = 0.9774 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

Variogram

1.2

0.12

0.8 0.4 0.0

σ= 1.0 corr = 0.9999 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

3 2 1 0

σ= 2.0 corr = 0.9981 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

Fig. 2. Effect of the standard deviation on the variograms ¯ (black dots) and ˆ (circles) based, respectively, on the simulated and estimated HRF values. Simulations were performed for N = 200, = 0.1, = 1 and in {0.2, 1.0, 2.0}. The solid lines are the theoretical variograms. The “corr” values indicated in the plots provide the sample correlations between ¯ and ˆ .

RB m decreases as the sample size N increases and as the variance decreases. Moreover, for large enough variations of (·) ( in {1.0, 2.0, 5.0}), RB sd decreases as N increases and as decreases. It reflects that the stochastic fluctuations N, and the residual term N, become small with respect to B as, respectively, “N → ∞” and “N → ∞ and ˆ N ≈ B ˆ  , from which the estimate ˆ is computed, becomes

→ 0 (i.e.  → 0)”. Consequently, the approximation R more and more valid as N → ∞ and → 0. ˆ N = B + N, + N, of For low values of (0.1, 0.2 and 0.5), RB sd is high. It reflects that in the decomposition R Theorem 1, N, is strongly varying compared with the variations of the HRF whose values are, consequently, estimated with a strong noise (note that RB sd is a relative bias). The behavior of RB sd when N and vary is complicated because asymptotics are in N and in . For better understanding how RB sd varies, the rate of convergence in both N and  should be investigated. ˆ Sample variograms: Second, we compared how a simulated sample  and the corresponding estimated sample , respectively, catch the dependence structure of the HRF (·). In practice, we compared sample variograms (Chilès and ˆ Delfiner, 1999) computed from  and . For N = 200 and different values of , and , we computed ¯ , the average of 1000 sample variograms computed from 1000 simulated samples , and ˆ , the average of 1000 sample variograms computed from 1000 estimated samples ˆ The variograms were computed for the same vector of distances varying from 0.01 to 0.6. . Fig. 2 shows ¯ (black dots) and ˆ (circles) for = 0.2, 1.0 and 2.0; the solid line is the theoretical variogram. The high correlations between ¯ and ˆ (statistics given in the plots of Fig. 2) show that ˆ allows us to catch the same dependence structure as  does, up to biases on the nugget effect and the sill. The nugget effect observed for ˆ , whereas it is zero for ¯ , is due to the estimation uncertainty caused by the stochastic fluctuation N, . The bias on the sill of ˆ , when

= 0.2 or 1.0, is mainly due to the bias on the nugget effect. But when = 2.0, the sill of ˆ underestimates the actual ˆ N ≈ B ˆ . sill because of the increase with (i.e. ) of the residual term N, which degrades the approximation R

1.0

1.0

0.8

0.8

0.8

0.6 0.4

0.0

φ= 0 corr = -0.0216

0.6 0.4

Variogram

Variogram

1.0 0.8 0.6 0.4

0.0

φ= 0.05 corr =0.9998 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

0.4 φ= 0.025 corr = 0.9879

0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

1.2

6411

0.6

0.2

0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

0.2

φ= 0.01 corr = 0.8414

0.2

0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

1.4

1.4

1.2

1.2

1.0

1.0

Variogram

0.2

Variogram

1.0

Variogram

Variogram

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

0.8 0.6 0.4

φ= 0.1 corr = 0.9999

0.2 0.0

0.8 0.6 0.4 0.2

φ= 0.2 corr = 0.9999

0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

0.0 0.1 0.2 0.3 0.4 0.5 0.6 Distance

Fig. 3. Effect of the range parameter on the variograms ¯ (black dots) and ˆ (circles) based, respectively, on the simulated and estimated HRF values. Simulations were performed for N = 200, = 1, = 1 and in {0, 0.01, 0.025, 0.05, 0.1, 0.2}. The solid lines are the theoretical variograms. The “corr” values indicated in the plots provide the sample correlations between ¯ and ˆ .

Fig. 3 shows ¯ (black dots) and ˆ (circles) for different values of the range parameter . The plots for 0.05 show that ˆ allows us to catch the same dependence structure as  does, up to a bias on the nugget effect discussed in the previous paragraph. For = 0 (no dependence) and = 0.01 or 0.025 (very short range dependence), there seems to be a smoothing effect due to the use of a kernel in the definition of the local residuals. Fig. 4 shows, for the anisotropy parameter equal to 0.25 or 0.5, the variograms ¯ (filled symbols) and ˆ (unfilled symbols) computed in the horizontal (squares) and vertical (circles) directions. The variograms in the horizontal direction are steeper because the HRF is varying more quickly horizontally than vertically (see Fig. 1 bottom left). The theoretical variograms (solid and dashed lines) and the variograms ¯ based on the simulated HRF values (filled symbols) do not exactly coincide because a /8 tolerance angle was used to compute the directional sample variograms. Fig. 4 shows that ˆ allows us to catch the anisotropy of  in the two cases which are presented. 5. Radionuclide concentrations on Rongelap Island We applied our method to investigate the nature of a HRF (·) included in a model designed for the analysis of data on radionuclide concentrations (Diggle et al., 1997, 1998). More precisely, we investigated its point distribution and covariance structure. 5.1. Context and model Diggle et al. (1998) applied their model-based geostatistics methodology to data on radionuclide concentrations on Rongelap Island in the Pacific Ocean. Data consists of -ray counts y1 , . . . , yN measured at N = 157 locations z1 , . . . , zN over the island. Let x1 , . . . , xN denote the durations of measurements (the duration of measurement is a

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6412

1.5 1.4 1.2 1.0

0.5

η= 0.25 Vertically: corr = 0.9998

Variogram

Variogram

1.0

0.8 0.6 η= 0.5

0.4

Vertically: corr = 0.9990

0.2

Horizontally: corr = 0.9998 0.0

Horizontally: corr = 0.9996 0.0 0.0

Distance

0.1

0.2

0.3 0.4 Distance

0.5

0.6

Fig. 4. Effect of the anisotropy parameter on the variograms ¯ (filled symbols) and ˆ (unfilled symbols) computed in the horizontal (squares) and vertical (circles) directions. Simulations were performed for N = 200, = 1, = 0.1 and in {0.25, 0.5}. The solid lines are the theoretical variograms. The “corr” values indicated in the plots provide the sample correlations between ¯ and ˆ in both the horizontal direction and the vertical direction.

space-varying explanatory variable). Conditionally on an unknown and space-varying intensity of the radioactivity, say (·), random counts Y1 , n = 1, . . . , N, are assumed to be mutually independent and Poisson distributed with means (zn )xn . Diggle et al. (1998) assumed that the logarithm of (·) was a normal random field with a constant mean (or trend), a powered-exponential covariance function and a nugget effect; the authors applied their methodology to predict local values of (·). Then, they studied high values of the intensity of the radioactivity (·). In the discussion on the paper by Diggle et al. (1998), Ledford and Marriott (1998) asked whether normality of the logarithm of (·) is an appropriate assumption. The question is relevant as the study of high values of (·) depends on the heaviness of the tail of the distribution chosen for (·). Christensen (2004) answered Ledford and Marriott’s concern by letting (·) be in a wider class of random fields: he assumed that (·) is a Box–Cox transformed normal random field and selected the most suitable transformation. More precisely, he assumed that there exists b 0 such that fb {(z1 )}, . . . , fb {(zN )} are generated by a stationary normal random field, where  fb (t) =

tb − 1 b log t

if b > 0, if b = 0,

(12)

and he used MCMC maximum likelihood to select b. Christensen (2004) concluded by preferring the identity link (b = 1) instead of the log link (b = 0) chosen by Diggle et al. (1998). Moreover, he selected a spatial covariance structure including an exponential covariance function and a nugget effect. Note that the optimal b was 0.84 but the author prefers b = 1 because the identity link provides a possible additive separation of (·) in two components: a measurement error component and a spatially structured component. In the following, we investigate the nature of (·) using our method and we compare Christensen’s results with ours. 5.2. Application of the method Estimation of the HRF values at the sample sites: We first fitted the base model assuming (·) congruent with a constant value ¯ . The estimator of ¯ is ˆ¯ N = N −1 N n=1 Yn and its value is 7.49. Second, we let K be the normal kernel and selected the bandwidth hN by generalized cross validation (Loader, 1999). Third, we computed the residuals Rˆ N (zn ), n = 1, . . . , N (we let s1 , . . . , sI be z1 , . . . , zN ). Fourth, using a Moore–Penrose generalized inverse, we solved

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6413

10 25 8 Semivariance

Frequency

20 15 10

6 4

5

2

0

0 0

5

10 Estimates

15

0

200 400 600 800 1000 1200 Distance (m)

Fig. 5. Characteristics of the estimates ˆ (z1 ), . . . , ˆ (zN ) of the radioactivity intensity. Left: sample distribution of the estimates. Right: sample variogram computed using the estimates.

ˆ N = Bt, ˆ whose row n is the system of equations (9), i.e. R Rˆ N (zn ) =

N  m=1

wm (zn )xm tm − x(z ˆ n)

N  m=1

xm N

l=1 xl

tm ,

  ˆ = N where wm (z) = K{(z − zm )/ hN }/ N l=1 K{(z − zl )/ hN } are the kernel smoother weights and x(z) m=1 wm (z)xm . Then, the estimate ˆ of the HRF values  at the sample sites was computed using Eq. (10). Characteristics of the estimated HRF values: Fig. 5 shows the histogram (left) and the sample variogram (right) of estimated values of (·) at z1 , . . . , zN . The histogram is symmetric and unimodal. The sample variogram is increasing. These are informal observations since no test has been done; however, these observations will allow us to propose a sensible class for the HRF, and the specification for the HRF will be selected within this class by performing tests (see next few paragraphs). Proposition of a class of specifications for the HRF: The histogram of ˆ (z1 ), . . . , ˆ (zN ) is unimodal (Fig. 5, left) and ˆ (z1 ), . . . , ˆ (zN ) shows a spatial dependence decreasing with distance (Fig. 5, right). Based on these characteristics, we assumed that (·) is in the class of the Box–Cox transformed normal random fields with constant trends and with spatial covariance structures depending on a Matérn covariance function (Stein, 1999) and a nugget effect. In the following, we select a specification for (·) within this class using ˆ (z1 ), . . . , ˆ (zN ) as data. Selection of a specification for the HRF: The Box–Cox transformation is given by expression (12). The covariance structure is given by  2

+     if z0 = 0,

2 z0 z0  (13) Cov{(z), (z + z0 )} = else, K 2−1 () where z and z0 are in R2 , 0 is the nugget effect, 2 is the variance parameter, > 0 is the range parameter,  > 0 is the smoothness parameter and K is the modified Bessel function of the third kind of order .  + 2 is the sill. The exponential covariance structure selected by Christensen (2004) corresponds to  = 0.5. We first fitted the model for (·) letting all the parameters be free and using ˆ as data. Parameters were estimated by maximum of likelihood. We found that the maximum likelihood estimate of the transformation parameter b was 0.84, the same value as the one found as optimal by Christensen (2004). Then we tested some submodels for (·). For several values of the transformation parameter b, we let all the other parameters be free and we fitted the model for (·). The left plot of Fig. 6 shows the evolution with b of the maximum loglikelihood. In particular, it provides the maximum loglikelihoods for b = 0, i.e. the link chosen by Diggle et al. (1998), and for b = 1, i.e. the link selected by Christensen (2004). We performed likelihood ratio tests (LRTs) to test the hypotheses b = 1 and then b = 0. The hypothesis b = 1 is not rejected (−2 log R = 1.31, df = 1 and p = 0.25).

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6414

365

500

370 400 Frequency

Loglikelihood

375 380 385

300 200

390 100 395 0 0.0

0.84

0.5

1.0

1.5

0

b

10

20

30 40 -2logR

50

60

Fig. 6. Left: Evolution with b of the maximum loglikelihood. Right: graphical likelihood ratio test of the hypothesis b =0; histogram: null distribution of the LRT statistic obtained by parametric bootstrap; dot: observed LRT statistic.

Table 2 Estimates for the parameters of the HRF obtained by Christensen and by this study

Christensen This study

Trend

2





6.19 7.08

2.41 2.39

338.2 340.0

2.05 4.97

For b = 0, the LRT statistic was parametrically bootstrapped (McLachlan, 1987) to get its null distribution because the value 0 lies at the border of the domain of parameter b (1000 replications were performed). The right plot of Fig. 6 shows the null distribution of the LRT statistic together with the observed LRT statistic, i.e. 64.08. The hypothesis b = 0 is rejected. So we prefer the identity link (b = 1) instead of the log link (b = 0). Moreover, we performed a LRT to test the hypothesis b = 1 and  = 0.5 (values that Christensen (2004) finally selected). This hypothesis is not rejected (−2 log R = 1.31, df = 2 and p = 0.52). Conclusion: Finally, we select the same specification for the intensity (·) as the one selected by Christensen (2004): (·) is a normal random field with constant trend and with spatial covariance structure depending on an exponential covariance function and a nugget effect. The right tail of this distribution is certainly more appropriate than the right tail of the distribution chosen by Diggle et al. (1998). Note that the selected specification allows negative values for (·) although (·) is a positive function; so, the normal distribution should be truncated at 0. Remark 1. A quite similar specification for the HRF is obtained when, instead of applying our method, the values yn /xn , n = 1, . . . , N, are directly used as estimates of the HRF values. The latter approach works well because the case which is treated is simple (no regression parameter in the model, simple data characteristics). However, one of the advantages of our method, for the case studied in this section but also for more complicated cases such as the one studied in the previous section, is that it is justified by a theoretical result. Remark 2. Table 2 shows the estimates for the parameters of the model for (·) obtained by Christensen (2004) and by us. The estimates for the covariance structure parameters ( 2 and ) obtained by Christensen and by us are almost identical but this is not the case for the constant trend and the nugget effect  (Section 4.3 provides an explanation). This comparison is informal since uncertainties about the parameters are not taken into account. However, it shows

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6415

that estimates of the HRF parameters obtained from our method must be considered with care: our method is designed to specify the HRF, not to estimate the parameters of the HRF. Consequently, once the HRF is specified using our method, parameters of the hierarchical model (including parameters of the HRF) must be estimated using, for example, the MCMC maximum likelihood of Christensen (2004). 6. Discussion In this paper we have considered hierarchical models including hidden random fields (HRFs), and studied how to specify the HRF. For specifying the HRF we have adapted the approach of Soubeyrand et al. (2006) based on a residual analysis. The idea is that residuals of any model fitted to data can be marked by elements influencing data but not integrated into the model. In our context, the residuals of the base model, which does not integrate the element ‘HRF’, are marked by the HRF; we have exhibited this mark and used it to specify the HRF. Different approaches for specifying the HRF: A first approach for specifying the HRF consists of choosing a specification before performing data analysis. This approach requires some knowledge about the underlying mechanisms of the analyzed phenomenon. If the analyst does not have at his disposal such knowledge, the risk of misspecification of the HRF is high. Alternatively, there are data-based approaches. For instance, the approach of Christensen (2004) consists of (i) estimating parameters of different hierarchical models including HRFs specified differently by using observed variables as data and (ii) selecting the more appropriate specification for the HRF. Our approach is also a data-based approach. It consists of (i ) estimating values of the HRF by using observed variables as data, (ii ) estimating parameters of different HRF models by using the estimated values ˆ as data and (iii ) selecting the more appropriate specification. In our method a preliminary analysis that provides discernment of the main characteristics of the HRF can be performed after step (i ) as a means of proposing relevant HRF models in step (ii ). Indeed, step (i ) is a sort of restoration of the HRF since it provides estimates for local values of the HRF. The histogram of these estimates can be drawn to see whether they show one or several modes (see Section 5.2), their sample variogram can be drawn to see whether they show a spatial structure or not and what kind of structure they show (see Section 5.2), and their map can be drawn to see whether they show a drift (Chilès and Delfiner, 1999) or not (not shown in this paper). Such a preliminary analysis provides guidance for proposing a relevant class of specifications for the HRF in step (ii ). In comparison, in step (i) of Christensen (2004), the HRF stays hidden and the different specifications for the HRF are proposed blindly. A comment on the asymptotics: The restoration of the HRF (step (i )) is based on a result obtained by assuming that the sample size tends to infinity (N → ∞) and that the HRF variations tends to zero ( → 0), see Section 3. When a phenomenon is studied, the sample size can be increased in general, whereas the hidden process cannot be changed. So  → 0 could seem unreasonable; but in fact,  → 0 allows us to obtain a linear approximation of the local residuals as a function of the HRF realization. Indeed, as the HRF is not included in the base model, the residuals obtained under the base model satisfy residuals = g(realization of the HRF), where g is an unknown function; this function cannot be determined in general cases; but  → 0 justifies the use of Taylor’s expansions which allows us to obtain a linear approximation of g in quite a general setting. Note that if E, (Yn |Zn ) is linear in , as in Sections 4 and 5, then terms in N, are suppressed and the linear approximation is improved. In this paper, we have approximated g using rather strong assumptions. We expect in further research to approximate g by weakening the assumptions, especially those related to the variations of the HRF. Application of the method: In practice, the simulation study (Section 4) has shown situations where our method of specification is adapted and others where it is not because local values of the HRF are reasonably estimated in the former and poorly estimated in the latter. For the Rongelap data, the situation is favorable, and Christensen’s method and our method result in the same specification for the HRF. For any data set, it would be useful to develop an approach to see whether the method of specification is adapted to the situation. Moreover, in both the simulation study and the real-case study, we have considered Poisson error distributions. According to our theorem, the method can be applied to continuous and discrete error distributions. Nevertheless, simulation studies should be carried out to view the performance of the method when non-Poisson error

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6416

distributions are used, especially when an error distribution with low information content, such as the Bernoulli, is used. Acknowledgements We thank Dr. O.F. Christensen, S.E. Townsend, the associate editor and the reviewers for their comments and suggestions about the paper. We also thank Dr. S. Simon of the Marshall Islands National Radiological Survey for making available to anyone the radionuclide concentration data through the geoRglm package of the R Statistical Software. Appendix A. Model The appendices provide a sketch of the proof for Theorem 1. The proof is quite similar to the proof of Theorem 1 of Soubeyrand et al. (2006). The main change concerns the determination of the limiting distribution of the stochastic fluctuation because of the use of a kernel smoother in the definition of the local residuals. In this appendix, we detail the definition of the hierarchical model briefly presented in Section 2 and we introduce notation. The notation has been made very similar to that used in Soubeyrand et al. (2006) to show what the differences between the two papers are. Y (Z1 ) = Y1 , . . . , Y (ZN ) = YN are response variables in R observable at locations Z1 , . . . , ZN in a spatial domain Z ⊂ Rq , q ∈ N∗ . Conditionally on a function (·) from Z to  ⊂ Rd , d ∈ N∗ , couples (Z1 , Y1 ), . . . , (ZN , YN ) are independent and identically distributed in Z × R with distribution P satisfying P (dz, dy) = {y|x(z), (z)}(dy)(dz) ∀(z, y) ∈ Z × R.

(A.1)

In expression (A.1), •  is a probability measure, with density f , on its support Z, for example the Gaussian measure on R2 or a uniform measure if Z is a bounded set; •  is a measure on its support R, for example the Lebesgue measure if the response variable is continuous over R or the Poisson measure if it takes values in N; • (z) can be viewed as a vector of parameters lying at location z and is assumed to be split in two components   (z) (z) = ,  where (·) is a varying function over the spatial domain Z and takes values in Rd1 , and  ∈ Rd2 is constant over Z (d1 + d2 = d). (·) is an unknown realization of a HRF whose distribution is unspecified, and  is a vector of unknown regression parameters. The notation (·) allows the replacement of the heavier notation (·), . Remark: although (·) is real-valued in the body of the paper, we will show that the method can be applied more generally for (·) with values in Rd1 . So, in the following,  = supz∈Z |(z) − ¯ | introduced in Eq. (2) is changed to  = sup (z) − ¯ ∞ , z∈Z

(A.2)

where · ∞ is the supremum norm of a vector.  is finite since (·) takes values in a bounded subset of Rd1 (assumption (a)); • x(·) is a function, defined over Z, of space-dependent explanatory variables; • {·|x(z), (z)} is the conditional distribution with respect to  of Yn given Zn = z ∈ Z and (·). For the sake of brevity, x(z) in  is replaced by z in the following. So, we will write {·|z, ·}. Functions , ,  and x are assumed to be known. Let D denote the unspecified probability distribution of  on a class C(Z, ) of functions from Z to , and PD the joint distribution of (Zn , Yn , ), PD (dz, dy, d) = P (dz, dy)D(d) ∀(z, y, ) ∈ Z × R × C(Z, ).

(A.3)

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6417

Let us introduce additional notation.  • Set ¯ = ( ¯ ) where ¯ = Z ED {(z)}(dz) which was already written in Eq. (1) exists under assumption (b) (assumptions mentioned in this text are detailed in Appendix F). • For  in C(Z, ), let E and V denote the expected value and the variance under P . For all  in , let Z denote the function defined over Z such that Z (z) =  for all z ∈ Z. • Let ∇ and H, respectively, be the gradient operator and the Hessian operator with respect to the vector  applied to functions depending on . Let ∇k be the kth component of ∇. Appendix B. Estimation of ¯ under the base model and expression of the bias ¯ − ˆ N Under the base model PZ ,  ∈ , the (scaled) loglikelihood function is lN (; Z, Y) =

N 1  log (Yn |Zn , ). N

(B.1)

n=1

Its maximizer ˆ N exists and is finite under assumptions of regularity (c) and (d) and concavity (e). The statistic ˆ N is ¯ viewed as an estimator of . In the following, assumptions (a)–(d) are supposed to be satisfied. From the strong law of large numbers and under integrability assumption (h2), the P -a.s.-limit of lN (; Z, Y) as N → ∞ is l ∗ (, ) = E [log (Yn |Zn , )]   = log (y|z, )P (dz, dy). Z R



Let  be the maximizer of l ∗ (·, ). Define ¯ l (, ) = E [log {Yn |Zn ,  + (Zn ) − }]   ¯  (dz, dy). = log {y|z,  + (z) − }P Z R

¯ is the maximizer of l (·, ). Moreover, from a Taylor expansion and under integrability assumption (h3) the limit of l (, ) as  → 0 is l ∗ (, ). Lemma 1. Under concavity assumptions (e) and (f), assumption (g) of uniqueness of ∗ , and integrability assumptions (h2)–(h5), ˆ N −→ ∗ P −a.s.

¯ −→ ∗

as N → ∞,

(B.2)

as  → 0,

and

(B.3) 

¯ ∗ − ˆ N ) = F(∗ , )(

 N 1  ∇ log (Yn |Zn , ∗ ) − E {∇ log (Yn |Zn , ∗ )} N n=1

+ oP −a.s. ( + ∗ − ˆ N ∞ )

as N → ∞,

N  ¯ ¯ − ∗ ) = − 1 ¯ ¯ F(∗ , )( F (Zn , ∗ , ){(Z n ) − } N n=1   N 1  ∗ ¯ ∗ ¯ − E [F (Zn ,  , ){(Z ¯ ¯ + F (Zn ,  , ){(Zn ) − } n ) − }] N n=1

+ o( + ¯ − ∗ ∞ ) as  → 0,

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6418

where ¯ = −E Z {H log (Yn |Zn , ∗ ) | Zn = z}, F (z, ∗ , ) ¯  ¯ = ¯ F (z, ∗ , )(dz) F(∗ , )

(B.4) (B.5)

Z

= − E ¯ Z {H log (Yn |Zn , ∗ )}.

(B.6)



¯ is a Fisher information matrix at location z ∈ Z and F(∗ , ) ¯ is a mean Fisher information matrix. F (z, ∗ , ) Sketch of proof. Convergences (B.2) and (B.3) can be shown by using a Rockafellar theorem and a corollary both presented in Senoussi (1990). The consecutive equations are obtained by using Taylor expansions.  Corollary 1. In addition, if F(,  ) is invertible for all  and  in  (assumption (i)), then as N → ∞ and  → 0 ¯ − ˆ N → [PD − a.s.]0 and N  ¯ ¯ ¯ −1 1 ¯ − ˆ N = − F(∗ , ) F (Zn , ∗ , ){(Z n ) − } N n=1  N  ¯ −1 1 + F(∗ , ) ∇ log (Yn |Zn , ∗ ) − E {∇ log (Yn |Zn , ∗ )} N n=1  N  1 ∗ ¯ ¯ ¯ ¯ + F (Zn , ∗ , ){(Z n ) − } − E [F (Zn ,  , ){(Zn ) − }] N n=1

+ oPD −a.s. ( + ∗ − ˆ N ∞ ) + o( + ¯ − ∗ ∞ ).

(B.7)

Appendix C. Expression of local residual Rˆ N (s) Suppose assumptions mentioned in Appendix B and assumptions of integrability (h6)–(h8) are satisfied. By using Taylor expansions and expression (B.7) of the bias ¯ − ˆ N , it can be shown that the local residual Rˆ N (s), s ∈ Z, obtained under the base model P ˆ Z and defined by Eq. (5), satisfies under P N

Rˆ N (s) =

N 



¯ − wn (s)A(Zn , ˆ N ) {(Zn ) − } T

n=1

N 

 wn (s)A(Zn , ˆ N )

T

n=1

N N   ¯ −1 1 ¯ ¯ × F(∗ , ) F (Zn , ∗ , ){(Z wn (s){Yn − E (Yn |Zn )} n ) − } + N n=1 n=1 N   N   ∗ T ∗ ¯ −1 1 + wn (s)A(Zn ,  ) F( , ) ∇ log (Yn |Zn , ∗ ) − E {∇ log (Yn |Zn , ∗ )} N n=1 n=1  N  1 ∗ ¯ ¯ ¯ ¯ + F (Zn , ∗ , ){(Z n ) − } − E [F (Zn ,  , ){(Zn ) − }] + N, (s), N n=1

(C.1)

  where wn (s) = K{(s − Zn )/ hN }/ N m=1 K{(s − Zm )/ hN } is a kernel weight, A(z, ) = R y∇(y|z, )(dy), z ∈ Z and  ∈ , and N, (Zn ) = oPD −a.s. ( + ∗ − ˆ N ∞ + ¯ − ∗ ∞ ) as N → ∞ and  → 0. Eq. (C.1) exhibits the link between a local residual and variations of the HRF (·) which appears in the first two terms at the right side of the  equality sign since (z) − ¯ = ( (z)−¯ 0 ). Moreover, the third and fourth terms are differences between weighted means and their expected values under P ; the fifth term is a negligible term as N → ∞ and  → 0 ( measures variations of (·), see Eq. (A.2)).

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6419

ˆ N of local residuals Appendix D. Decomposition of the vector R Set



A() = ⎝

A(Z1 , )T



(0) ..

⎠,

.

(0) A(ZN , ) ⎞ w1 (si ) .. ⎠ ⎝ Wi = , i = 1, . . . , I and . wN (si )

 ∈ ,

T



⎛WT ⎞ 1

W = ⎝ ... ⎠ , WIT

¯ = W A(ˆ N )[IdN − N −1 (1N ⊗ Id )F(∗ , ) ¯ −1 {F (Z1 , ∗ , ), ¯ . . . , F (ZN , ∗ , )}], ¯ B0 (ˆ N , ∗ , ) where Ik and 1k are, respectively, the identity matrix and the unit vector of dimension k ∈ N∗ , and ⊗ is the Kronecker product also called matrix direct product, UN, =

N 1  ∇ log (Yn |Zn , ∗ ) − E {∇ log (Yn |Zn , ∗ )} N n=1

+

N 1  ∗ ¯ ¯ ¯ ¯ F (Zn , ∗ , ){(Z n ) − } − E [F (Zn ,  , ){(Zn ) − }], N n=1

¯ −1 UN, , N, = W {Y − E (Y|Z)} + W A(∗ )(1N ⊗ Id )F(∗ , ) ⎞ ⎞ ⎛ ⎛ (Z1 ) − ¯ (Z1 ) − ¯ . . ⎠ and  = ⎝ ⎠. .. ..  = ⎝ (ZN ) − ¯ (ZN ) − ¯ ˆ N = {Rˆ N (s1 ), . . . , Rˆ N (sI )}T can be written Then, the vector of local residuals R ¯ + ˆ N = B0 (ˆ N , ∗ , ) R N, + ,N , where N, = {N, (s1 ), . . . , N, (sI )}T is oPD −a.s. ( + ∗ − ˆ N ∞ + ¯ − ∗ ∞ )1I as N → ∞ and  → 0. Only components 1, . . . , d1 , . . . , (I − 1)d + 1, . . . , (I − 1)d + d1 of  can be different from 0; indeed, the others ¯ and denote the correspond to . So, keep columns 1, . . . , d1 , . . . , (I − 1)d + 1, . . . , (I − 1)d + d1 of B0 (ˆ N , ∗ , ) ∗ ¯ ∗ ¯ ∗ ¯ ˆ ˆ ˆ resulting matrix by B(N ,  , ), then B0 (N ,  , ) = B(N ,  , ) . We obtain Eq. (6) ¯ + ˆ N = B(ˆ N , ∗ , ) R N, + N, . ¯ and taking the limit as N → ∞ lead to the zero matrix. Note that summing block-columns of size d of B0 (ˆ N , ∗ , ) ∗ ¯ ¯ ˆ Therefore, B0 (N ,  , ) is not asymptotically invertible and neither is B(ˆ N , ∗ , ). Appendix E. Limiting distribution of N, E.1. Limiting distribution of W {Y − E (Y|Z)} W {Y − E (Y|Z)} is the kernel estimator of the vector whose components are E {Yn − E (Yn |Zn )|Zn = si } = 0, i = 1, . . . , I . Let f denote the density of Zn over Z. Using Theorem IV-1 Bosq and Lecoutre (1987, p. 122), under q assumptions (j1)–(j4), if hN → 0 and there exists a > 0 such as N 1−a hN → ∞ as N → ∞, then 1/2  −1/2   f (si ) d q : i = 1, . . . , I NhN K2 diag W {Y − E (Y|Z)} −→ N(0, II ). (E.1) (Y |Z = s ) V i Z  n n

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6420

E.2. Limit of W A(∗ )(1N ⊗ Id ) Using Theorem II-1 in Bosq and Lecoutre (1987, p. 106), under assumptions (j1), (j2), (j5), and (j6), if hN → 0 and q NhN → ∞ as N → ∞, then ⎞ ⎛ A(s1 , ∗ )T P  . ⎠. .. W A(∗ )(1N ⊗ Id ) −→ ⎝ (E.2) ∗ T A(sI ,  ) E.3. Limiting distribution of UN, Using the central limit theorem, under assumptions (h3) and (h4), as N → ∞ √ d ¯ ¯ N UN, −→ N[0, V [∇ log (Yn |Zn , ∗ ) + F (Zn , ∗ , ){(Z n ) − }]].  E.4. Limiting distribution of

q

N hN N, conditionally on 

Combining convergences (E.2) and (E.3) leads to   q ¯ −1 UN, −P→ NhN W A(∗ )(1N ⊗ Id )F(∗ , ) 0

as N → ∞.

(E.4)

Then, from convergences (E.1) and (E.4)  d q NhN N, −→ N{0, ()} as N → ∞, where

(E.3)

(E.5)

   V (Yn |Zn = si ) : i = 1, . . . , I . K 2 diag f (si ) Z

 () =

(E.6)

Convergence (E.5) has been proved for  fixed in  (pointwise convergence). The next section shows that the convergence is not conditional on . E.5. Unconditional weak convergence Convergence (E.5) can be written in term of characteristic functions N,t () = E (e

 q it T NhN N,

) − E (eit

TX 

)→0

as N → ∞

∀t ∈ RI ,

where X ∼ N{0, ()}. Apply the dominated convergence theorem of Lebesgue (Rudin, 1998) to the sequence of functions t,N , N = 1, 2, . . . , defined over C(Z, ). For all  ∈ C(Z, ) limN→∞ N,t () = 0. In addition, for all  ∈ C(Z, ) and for all N = 1, 2, . . . , |N,t ()| 2 and the function   → 2 is D-integrable over C(Z, ) (probability measure D was introduced in Eq. (A.3)). Therefore  lim N,t ()D(d) = 0. N→∞ C(Z,)

This equation means that for all t ∈ RI lim EPD {N,t ()} = lim EPD (e

N→∞

N→∞

= 0,

 q it T NhN N,

) − EPD (eit

TX 

) (E.7)

where the probability measure PD was introduced in Eq. (A.3). The convergence to 0 of the difference between the characteristic functions is no more conditional on .

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

6421

E.6. Estimation of the limiting variance ()  q To estimate the limiting variance–covariance matrix () of N hN N, , the unknown quantity (si ) needed to compute V (Yn |Zn = si ) is replaced by its estimate obtained with our method. Appendix F. Assumptions  is a closed bounded convex subset of Rd . For all  in C(Z, ), z  → ED {(z)} is -integrable over Z. (z, y, ) → (y|z, ) is positive on Z × R × .  → (y|z, ) is 3 times differentiable on  for all (z, y) ∈ Z × R.  → lN (; Z, Y) is concave on  for all (Z, Y) ∈ (Z × R)N .   → l (, ) is concave on  for all  in C(Z, ). For all  in C(Z, ),   → l ∗ (, ) has a unique maximum in the interior of . There exists a constant M < ∞ such that for all  in C(Z, ), for all z in Z, for all ,  in , and for all K, k , k

in {1, . . . , d}, (h1) E (|Yn |3 | Zn = z)M, (h2) |E {log (Yn |Zn , ) | Zn = z}|M, (h3) E {|∇k log (Yn |Zn , )|3 | Zn = z} M, (h4) E {|∇k ∇k log (Yn |Zn , )|2 | Zn = z} M, (h5) |E {∇k ∇k ∇k

log (Yn |Zn , ) | Zn = z}|M, (h6) | R y∇k (y|z, )(dy)| M, (h7) | R y∇k ∇k (y|z, )(dy)|M, (h8) | R y∇k ∇k ∇k

(y|z, )(dy)|M, (i) F(,  ) is invertible for all  and  in . (j) (j1) K is a positive Parzen-Rozenblatt kernel, (j2) f is continuous and strictly positive in Z, (j3) z  → V (Yn |Zn = z) is continuous, strictly positive and bounded in Z for all  in C(Z, ), (j4) E {|Yn − E (Yn |Zn )|p } < ∞ for all p in N and for all  in C(Z, ), (j5) z → R y∇(y|z, ∗ )(dy) is continuous in Z,   (j6) E { R y∇k (y|Zn , ∗ )}2 (dy) < ∞ for all K in {1, . . . , d} and for all in C(Z, ).

(a) (b) (c) (d) (e) (f) (g) (h)

References Bosq, D., Lecoutre, J.-P., 1987. Théorie de l’Estimation Fonctionnelle. Economica, Paris. Chilès, J.-P., Delfiner, P., 1999. Geostatistics. Modeling Spatial Uncertainty. Wiley, New York. Christensen, O.F., 2004. Monte Carlo maximum likelihood in model-based geostatistics. J. Comput. Graph. Statist. 13, 702–718. Desassis, N., Monestiez, P., Bacro, J.N., Lagacherie, P., Robbez-Masson, J.M., 2005. Mapping unobserved factors on vine plant mortality. In: Renard, P., Demougeot-Renard, H., Froidevaux, R. (Eds.), Geostatistics for Environmental Applications. Springer, Berlin, pp. 125–136. Diggle, P.J., Harper, L., Simon, S., 1997. Geostatistical analysis of residual contamination from nuclear weapons testing. in: Barnett, V., Turkman, K.F. (Eds.), Statistics for Environment, vol. 3, Wiley, Chichester, pp. 89–107. Diggle, P.J., Tawn, J.A., Moyeed, R.A., 1998. Model-based geostatistics. J. Roy. Statist. Soc. C 47, 299–350. Henderson, R., Shimakura, S., Gorst, D., 2002. Modeling spatial variation in leukemia survival data. J. Amer. Statist. Assoc. 97, 965–972. Ledford, A.W., Marriott, P.K., 1998. Discussion on the paper by Diggle et al. (1998). J. Roy. Statist. Soc. C 47, 326–350. Loader, C.R., 1999. Bandwidth selection: classical or plug-in? Ann. Statist. 27 (2), 415–438. McLachlan, G.J., 1987. On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J Roy. Statist. Soc. C 36 (3), 318–324. Monestiez, P., Dubroca, L., Bonnin, E., Durbec, J.-P., Guinet, C., 2005. Comparison of model based geostatistical methods in ecology: application to fin whale spatial distribution in northwestern mediterranean sea. in: Leuangthong, O., Deutsch, C. V. (Eds.), Geostatistics Banff 2004, vol. 2. Springer, Berlin, pp. 777–786. Pagan, A., Ullah, A., 1999. Nonparametric Econometrics. Cambridge University Press, Cambridge. Rudin, W., 1998. Analyse Réelle et Complexe, Third ed., Dunod, Paris.

6422

S. Soubeyrand, J. ChadWuf / Computational Statistics & Data Analysis 51 (2007) 6404 – 6422

Senoussi, R., 1990. Statistique asymptotique presque-sûre de modèles statistiques convexes. Ann. Inst. Henri Poincaré 26, 19–44. Soubeyrand, S., ChadWuf, J., Sache, I., Lannou, C., 2006. Residual-based specification of the random-effects distribution for cluster data. Statist. Methodol. 3, 464–482. Stein, M.L., 1999. Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York. Zhang, H., 2002. On estimation and prediction for spatial generalized linear mixed models. Biometrics 58, 129–136.