Bayesian analysis of regression models with spatially correlated errors and missing observations


Computational Statistics & Data Analysis 39 (2002) 387–400 www.elsevier.com/locate/csda

Man-Suk Oh a,∗, Dong Wan Shin a, Han Joon Kim b

a Department of Statistics, Ewha Womans University, So-Dae-Moon Gu, Seoul 120-750, South Korea
b Korea Ocean Research & Development Institute, Ansan, Kyung-Gi-Do 425-600, South Korea

Received 1 November 2000; accepted 1 August 2001

Abstract

A Bayesian approach is proposed for estimating regression models on rectangular grids in which errors are spatially correlated and missing observations are present in the response variable. An easy and efficient Markov chain Monte Carlo algorithm is fully described for posterior inference on parameters and prediction of missing observations. Analysis of a real marine remote-sensing data set is presented to illustrate the method. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Gibbs sampling algorithm; Markov chain Monte Carlo; Missing value; Posterior inference; Spatial data

1. Introduction

Spatial data arise in many fields such as geophysics, image restoration, agriculture, and medicine. In the recent literature, diverse aspects of spatial data have been actively discussed. One important area is the class of regression models with spatially correlated errors. This class was proposed for surface modeling in geology and is also used in archaeology, geography, epidemiology, and image processing. Cressie (1991, Chapter 7) details the basic issues of model estimation for regression models with spatially correlated errors.

∗ Corresponding author. E-mail address: [email protected] (M.-S. Oh).

0167-9473/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0167-9473(01)00084-6


Several authors, such as Green (1985), Besag and Kempton (1986), Stoffer (1986), and Zimmerman and Harville (1991), utilize spatial correlations in analyzing spatial regression data. However, their analyses are one-dimensional in that correlations in only one spatial dimension are incorporated into their models. An estimation procedure utilizing two-dimensional spatial correlation is proposed by Basu and Reinsel (1994). They consider a regression model

y_ij = x_ij′β + z_ij,  i = 1, …, m,  j = 1, …, n,  (1)

on rectangular grids {(i, j): i = 1, …, m; j = 1, …, n}, where {y_ij} is an array of observations, {x_ij} is an array of p × 1 regressors, and {z_ij} is an array of spatially correlated regression errors. They consider correlations in both directions of the rectangular grids by modeling the regression errors as a two-dimensional first-order ARMA model. They develop a maximum likelihood estimation scheme assuming that complete observations are available. However, their method requires analytic expressions for the spatial correlation structure among neighboring observations in order to evaluate the likelihood, and these are not available when missing values are present. Thus, their method is not applicable to data sets with missing values. Missing values are frequently encountered in spatial data. For example, they can arise from cloud cover in satellite images, from areal unit data suppressed in census publications because of confidentiality legislation, and from loss of rainfall data because of machine malfunction. Stoffer (1986) considers method of moments estimation for spatial regressions with missing responses whose models are defined on rectangular grids consisting of m locations and n time points. Stoffer (1986) incorporates correlations only in the time direction by adding time-lags of the dependent variable to the set of regressors. In addition, as Shin and Sarkar (1995) note for time series models, when the missing portion of the data is not negligible, the method of moments estimator suffers from a lack of efficiency compared to the maximum likelihood estimator. Griffith (1988, Chapter 6) considers a simple mean model y_ij = μ + z_ij in which correlations in both directions (i, j) are assumed to be the same, and discusses estimation of the common mean and prediction of missing values.
A regression model with time series errors and missing values is studied by, among others, Wincek and Reinsel (1986) and Shin and Sarkar (1994) in the context of direct maximum likelihood estimation. In this article, we take a Bayesian approach to analyzing spatial regression models having missing values in the response variable y_ij, but no missing values in the covariates x_ij. Previous sampling-based Bayesian approaches to spatial data or time series data are those of Besag and Green (1993) and McCulloch and Tsay (1994). Besag and Green (1993) present an overview of Markov chain Monte Carlo (MCMC) methods in the context of spatial data and briefly discuss the basic idea of MCMC methods for parameter estimation in the presence of missing responses in analyzing data sets from agricultural field trials. Their model is an aggregate of a variety effect (α_i, say), a fertility effect (γ_j, say), and independent errors (independent z_ij). McCulloch and Tsay (1994) discuss application of Gibbs sampling methods to estimation of autoregressive (AR) time series models with missing responses. They provide a detailed implementation of a Gibbs sampler which consists of conditional


posterior distributions of each of the AR parameters and missing responses given all the other information. We consider regression model (1) with spatially correlated errors

z_ij = φ1 z_{i−1,j} + φ2 z_{i,j−1} − φ1 φ2 z_{i−1,j−1} + ε_ij,  i = 1, …, m,  j = 1, …, n,  (2)

where the ε_ij are independent normal errors having mean zero and variance σ², and |φ1| < 1, |φ2| < 1. The spatial model (2) is called a multiplicative AR(1) model. Its correlation function ρ_st = corr(z_ij, z_{i−s,j−t}) satisfies the reflection symmetry condition ρ_st = ρ_{−s,−t} = ρ_{s,−t} = ρ_{−s,t}. The multiplicative spatial model is widely used for analyzing field trial data; see, for example, Martin (1979, 1990), Cullis and Gleeson (1991), Basu and Reinsel (1994), and Alonso et al. (1991). Note that the correlation structure of (2) is more general than those used in the spatial regression models of Besag and Green (1993), Stoffer (1986), and Griffith (1988). For example, Besag and Green consider independent errors, Stoffer considers correlation in only one direction of (i, j), and Griffith assumes that correlations in both directions are the same. Our Bayesian method, like any other Bayesian method, imposes no technical restrictions on the regression function x_ij′β except for the trivial matrix nonsingularity conditions required for computing posterior densities. Therefore, our model allows the covariates x_ij to contain lags of y_ij such as y_{i−t,j−s}, t, s = 0, 1, …, (t, s) ≠ (0, 0), retaining independence of x_ij and ε_ij. This model contains the model of Stoffer (1986) as a special case. Also, our model allows the situation x_ij′β = α_i + γ_j, say, which is the model of Besag and Green (1993). Clearly, our regression model (1) contains the simple mean model of Griffith (1988). Therefore, our spatial missing-response regression model can be considered a generalization of all the models of previous researchers in terms of both regression models and spatial correlation structures. The Bayesian approach has several advantages: it takes into account all possible values of missing observations instead of one maximum point, it utilizes available prior information, and it can incorporate restrictions on parameters easily.
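To make the multiplicative AR(1) structure concrete, the following minimal sketch simulates errors from model (2) on a rectangular grid. Python/NumPy is used purely for illustration (the paper's own computations were done in FORTRAN/IMSL), and the zero boundary condition is a simplifying assumption, not the stationary edge distribution of the model.

```python
import numpy as np

def simulate_multiplicative_ar1(m, n, phi1, phi2, sigma=1.0, seed=None):
    """Simulate z_ij = phi1*z_{i-1,j} + phi2*z_{i,j-1}
    - phi1*phi2*z_{i-1,j-1} + eps_ij on an m x n grid, with
    eps_ij ~ N(0, sigma^2).  Cells outside the grid are taken
    as 0 -- a simplifying boundary convention."""
    rng = np.random.default_rng(seed)
    z = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            a = z[i - 1, j] if i > 0 else 0.0
            b = z[i, j - 1] if j > 0 else 0.0
            c = z[i - 1, j - 1] if i > 0 and j > 0 else 0.0
            z[i, j] = phi1 * a + phi2 * b - phi1 * phi2 * c \
                      + sigma * rng.standard_normal()
    return z

z = simulate_multiplicative_ar1(50, 80, phi1=0.55, phi2=0.12, seed=0)
```

Because the model is multiplicative, the same field could equivalently be produced by applying two one-dimensional AR(1) filters, one along each index.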
However, a key difficulty in Bayesian analysis is that posterior estimation of the unknown parameters and prediction of missing observations require high-dimensional integrations which are not analytically tractable. As a numerical solution to this problem, we suggest a sampling-based method using the Gibbs sampling algorithm of Gelfand and Smith (1990). The Gibbs sampling algorithm is particularly useful for prediction of missing values since it only requires the full conditional distribution of each missing observation given all the others, whose analytical form is often easily derived. This is in contrast with most direct maximum likelihood methods, which require the joint conditional distribution of all missing observations given the observed values; this is usually difficult to obtain because the correlation structure among the observations is hard to derive. The Gibbs sampling algorithm also requires the full conditional posterior distributions of the components of the unknown parameters. Some of these full conditional distributions may not have convenient forms for random generation, as will be shown later. In such cases we suggest an efficient Metropolis–Hastings algorithm (Hastings, 1970) for generating the component in each step of the Gibbs sampling algorithm.


In the remainder of this paper, a detailed description of Bayesian estimation via the Gibbs sampler is provided in Section 2 and a real data set is analyzed in Section 3. A discussion is provided in Section 4.

2. Bayesian estimation via the Gibbs sampler

Let Y = (y_11, y_21, …, y_m1, …, y_1n, …, y_mn)′, Z = (z_11, z_21, …, z_m1, …, z_1n, …, z_mn)′, X = (x_11, x_21, …, x_m1, …, x_1n, …, x_mn)′, and let M be the set of indices corresponding to the missing observations. We assume that the y_ij, (i, j) ∈ M, are missing, but that the regressors x_ij are fully observed. Let Y_mis (Y_obs) be the subvector of Y with all missing (observed) values. We assume missing observations are "missing completely at random" (MCAR) or "missing at random" (MAR) in the sense of Rubin (1976). Let φ = (φ1, φ2)′. The likelihood function is given by

L(β, σ², φ | Y_obs) = ∫ f(Y | β, σ², φ) dY_mis,  (3)

where f(Y | β, σ², φ) = (2πσ²)^{−mn/2} |Σ|^{−1/2} exp[−0.5 σ^{−2} h(β)′ Σ^{−1} h(β)] is the joint density of Y, h(β) = Y − Xβ, Σ = σ^{−2} Var(Z), and |Σ| is the determinant of Σ. According to Basu and Reinsel (1994),

|Σ|^{−1} = (1 − φ1²)^n (1 − φ2²)^m,  h(β)′ Σ^{−1} h(β) = h∗(β)′ h∗(β),

where h∗(β) = H h(β), H = H1 ⊗ H2,

H_k = ⎡ √(1 − φ_k²)   0      0    ⋯  0 ⎤
      ⎢   −φ_k        1      0    ⋯  0 ⎥
      ⎢    0        −φ_k     1    ⋯  0 ⎥
      ⎢    ⋮          ⋮      ⋱        ⋮ ⎥
      ⎣    0          0      0    ⋯  1 ⎦ ,   k = 1, 2,

and ⊗ is the Kronecker product. Analytic evaluation of the likelihood L(β, σ², φ | Y_obs) is impossible because the missing values are entangled with the spatial correlation structure, and hence analytic posterior inference is not possible.
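The whitening transform behind h∗(β) can be checked numerically. The sketch below (an illustration in Python/NumPy, not the authors' code) builds the bidiagonal factors H_k and verifies both the determinant identity and the relation between H_k′H_k and the AR(1) correlation matrix; note that the ordering of the Kronecker factors must match how Y is stacked.

```python
import numpy as np

def whitening_factor(size, phi):
    """Bidiagonal AR(1) factor H_k: sqrt(1 - phi^2) in the (1,1)
    entry, 1 on the rest of the diagonal, -phi on the subdiagonal.
    Then H_k' H_k = (1 - phi^2) * R^{-1}, where R is the AR(1)
    correlation matrix R_st = phi^{|s-t|}."""
    H = np.eye(size)
    H[0, 0] = np.sqrt(1.0 - phi ** 2)
    H[np.arange(1, size), np.arange(size - 1)] = -phi
    return H

m, n, phi1, phi2 = 4, 5, 0.55, 0.12
H1 = whitening_factor(m, phi1)   # factor for the i index
H2 = whitening_factor(n, phi2)   # factor for the j index
H = np.kron(H1, H2)              # ordering must match the stacking of Y

# determinant identity behind |Sigma|^{-1} = (1-phi1^2)^n (1-phi2^2)^m
det_sq = np.linalg.det(H) ** 2
```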


To get around this difficulty, we adopt the Gibbs sampling algorithm of Gelfand and Smith (1990). Considering both the unknown parameters and the missing observations as unknown variables, the Gibbs sampler generates samples of the unknown variables via iterative sampling of each component from its full conditional posterior distribution. Thus, to implement the Gibbs sampler, we only need to derive the full conditional posterior distributions of the component variables, which are low-dimensional and usually given in simple forms. For Bayesian analysis of the model, we need to choose priors for the parameters. We assume independence among β, σ², φ1, and φ2, so that the prior of (β, σ², φ) is π(β, σ², φ) = π(β)π(σ²)π(φ1)π(φ2). For the prior of each parameter, we first consider the conjugate prior: for β, N_p(β0, A), a p-variate multivariate normal distribution with mean β0 and variance A; for σ², IG(α, λ), the inverse gamma distribution whose density is proportional to (σ²)^{−(α+1)} exp[−λ/σ²], where α and λ are given positive real numbers; for φ_i, N(φ_{i0}, τ_i²) I(|φ_i| < 1), a univariate normal distribution with mean φ_{i0} and variance τ_i² restricted to the interval (−1, 1), for i = 1, 2. Combining the above prior and the density function of Y, we have the following full conditional posterior distributions of the components of the parameters:

β | all others ∼ N( (X∗′X∗/σ² + A^{−1})^{−1} (X∗′Y∗/σ² + A^{−1}β0), (X∗′X∗/σ² + A^{−1})^{−1} ),

σ² | all others ∼ IG( mn/2 + α, h∗(β)′h∗(β)/2 + λ ),

π(φ1 | all others) ∝ (1 − φ1²)^{n/2} ϕ(φ1; φ̃1, τ̃1²) I(|φ1| < 1),

π(φ2 | all others) ∝ (1 − φ2²)^{m/2} ϕ(φ2; φ̃2, τ̃2²) I(|φ2| < 1),

where

X∗ = H X,  Y∗ = H Y,

τ̃1² = (S11/σ² + 1/τ1²)^{−1},  φ̃1 = τ̃1² [ (S12/σ²) + (1/τ1²) φ_{10} ],

τ̃2² = (S21/σ² + 1/τ2²)^{−1},  φ̃2 = τ̃2² [ (S22/σ²) + (1/τ2²) φ_{20} ],

S11 = Σ_{i=2}^{m−1} z_{i·}(β)′ H2′ H2 z_{i·}(β),  S12 = Σ_{i=2}^{m} z_{i·}(β)′ H2′ H2 z_{i−1,·}(β),
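The two conjugate draws above (for β and σ²) can be sketched as follows, assuming NumPy and treating the whitened X∗ and Y∗ as given; the function names are mine, not the paper's.

```python
import numpy as np

def draw_beta(Xs, Ys, sigma2, A_inv, beta0, rng):
    """beta | rest ~ N(V (Xs'Ys/sigma2 + A^{-1} beta0), V),
    V = (Xs'Xs/sigma2 + A^{-1})^{-1}, where Xs = X*, Ys = Y*
    are the whitened design and response."""
    V = np.linalg.inv(Xs.T @ Xs / sigma2 + A_inv)
    return rng.multivariate_normal(V @ (Xs.T @ Ys / sigma2 + A_inv @ beta0), V)

def draw_sigma2(hstar, alpha, lam, rng):
    """sigma2 | rest ~ IG(mn/2 + alpha, h*(beta)'h*(beta)/2 + lambda),
    sampled as the reciprocal of a Gamma(shape, scale=1/rate) draw."""
    shape = hstar.size / 2.0 + alpha
    rate = hstar @ hstar / 2.0 + lam
    return 1.0 / rng.gamma(shape, 1.0 / rate)

# toy usage with a hypothetical whitened design
rng = np.random.default_rng(1)
Xs = rng.standard_normal((200, 2))
Ys = Xs @ np.array([1.0, -2.0]) + rng.standard_normal(200)
b = draw_beta(Xs, Ys, 1.0, np.eye(2) * 1e-6, np.zeros(2), rng)
s2 = draw_sigma2(Ys - Xs @ b, 2.0, 1.0, rng)
```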


S21 = Σ_{j=2}^{n−1} z_{·j}(β)′ H1′ H1 z_{·j}(β),  S22 = Σ_{j=2}^{n} z_{·j}(β)′ H1′ H1 z_{·,j−1}(β),

z_{i·}(β) = [z_{i1}(β), …, z_{in}(β)]′,  z_{·j}(β) = [z_{1j}(β), …, z_{mj}(β)]′,

z_ij(β) = y_ij − x_ij′β, ϕ(x; a, b) is the density function of N(a, b) at x, and π(φ_i | all others) is the full conditional posterior density of φ_i given all the other values. When the noninformative priors π(β) = 1, π(σ²) = 1/σ², π(φ1) = I(|φ1| < 1), π(φ2) = I(|φ2| < 1) are chosen, the full conditional posterior distributions are obtained by letting A^{−1} = α = λ = 1/τ1² = 1/τ2² = 0 in the above formulas. We also need to find the full conditional posterior distributions of the missing observations y_ij, (i, j) ∈ M. Letting

λ_{11} = (1 − φ1²)(1 − φ2²),  λ_{i1} = 1 − φ1², i ≥ 2,  λ_{1j} = 1 − φ2², j ≥ 2,  λ_{ij} = 1, i, j ≥ 2,

z_ij(β) = 0  if i < 1, j < 1, i > m, or j > n,

it can be shown that

y_ij | all others ∼ N(μ_ij, σ²/v_ij)  if (i, j) ∈ M,

where

μ_ij = x_ij′β + v_ij^{−1} [ λ_{ij} {φ1 z_{i−1,j}(β) + φ2 z_{i,j−1}(β) − φ1φ2 z_{i−1,j−1}(β)}
  + λ_{i+1,j} φ1 {z_{i+1,j}(β) − φ2 z_{i+1,j−1}(β) + φ1φ2 z_{i,j−1}(β)}
  + λ_{i,j+1} φ2 {z_{i,j+1}(β) − φ1 z_{i−1,j+1}(β) + φ1φ2 z_{i−1,j}(β)}
  − λ_{i+1,j+1} φ1φ2 {z_{i+1,j+1}(β) − φ1 z_{i,j+1}(β) − φ2 z_{i+1,j}(β)} ],

v_ij = λ_{ij} + λ_{i+1,j} φ1² + λ_{i,j+1} φ2² + λ_{i+1,j+1} φ1²φ2².

The full conditional posterior distributions of β, the y_ij, (i, j) ∈ M, and σ² are given in standard forms, and random number generation from these distributions is straightforward. On the other hand, the full conditional posterior distributions of φ1 and φ2 are not of standard form. Thus, we suggest a Metropolis–Hastings step (Hastings, 1970) for sample generation of φ1 and φ2. Specifically, given the current sample value φ_i^C of φ_i, a new sample φ_i^N is obtained as follows:

(i) generate a candidate sample φ_i∗ from a distribution having density g(φ_i);
(ii) let φ_i^N = φ_i∗ with probability min{1, w(φ_i∗)/w(φ_i^C)}, and φ_i^N = φ_i^C with probability 1 − min{1, w(φ_i∗)/w(φ_i^C)},

where w(φ_i) = π(φ_i | all others)/g(φ_i). Note that the normalizing constant of π(φ_i | all others) is not required since only the ratio of w's is used. Efficiency of the algorithm depends on the sample generating density g(φ_i). Desirable properties of g(φ_i) are that it be a good approximation to the target density π(φ_i | all others) and that sample generation from


the corresponding distribution is simple. In our model, if |φ1| is not large and n is not small,

π(φ1 | all others) ∝ (1 − φ1²)^{n/2} ϕ(φ1; φ̃1, τ̃1²) I(|φ1| < 1)
  ≈ exp[−n φ1²/2] ϕ(φ1; φ̃1, τ̃1²) I(|φ1| < 1)
  ∝ ϕ(φ1; 0, 1/n) ϕ(φ1; φ̃1, τ̃1²) I(|φ1| < 1)
  ∝ ϕ(φ1; φ̃1/(n τ̃1² + 1), τ̃1²/(n τ̃1² + 1)) I(|φ1| < 1).

Thus, ϕ(φ1; φ̃1/(n τ̃1² + 1), τ̃1²/(n τ̃1² + 1)) I(|φ1| < 1) would be a good sample generating density g(φ1) for φ1. Similarly, ϕ(φ2; φ̃2/(m τ̃2² + 1), τ̃2²/(m τ̃2² + 1)) I(|φ2| < 1) would be a good sample generating density g(φ2) for φ2. For sample generation from the restricted univariate normal distribution, there are many efficient algorithms, such as the inverse cdf method (Devroye, 1986) and the mixed rejection algorithm (Geweke, 1991).

Starting with initial values of (β, σ², φ, Y_mis), the Gibbs sampler generates samples iteratively from the full conditional posterior distributions given above, and samples taken after burn-in can be used for parameter estimation and missing value prediction. It should be noted that samples from MCMC methods may not be independent; samples taken from consecutive iterations may be autocorrelated. To get independent samples one needs to take samples from every ℓth iteration, where ℓ is the lag size for which the autocorrelation becomes ignorable.

Let (β^(k), σ²^(k), φ^(k), Y_mis^(k)) be the kth sample value of (β, σ², φ, Y_mis) obtained from the MCMC algorithm. Then the posterior mean of β can be simply estimated by

β̂ = Ê(β | Y_obs) = (1/K) Σ_{k=1}^{K} β^(k).
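The truncated-normal proposal derived above gives a simple independence Metropolis–Hastings update for φ1 (and symmetrically for φ2, with m in place of n). The sketch below, an illustration rather than the authors' implementation, enforces the (−1, 1) restriction by plain rejection instead of the inverse-cdf or mixed rejection algorithms cited above.

```python
import numpy as np

def log_target(phi, n, phi_t, tau2):
    """log pi(phi | all others) up to a constant:
    (n/2) log(1 - phi^2) plus the N(phi_t, tau2) log-density."""
    if abs(phi) >= 1.0:
        return -np.inf
    return 0.5 * n * np.log1p(-phi ** 2) - 0.5 * (phi - phi_t) ** 2 / tau2

def mh_step(phi_cur, n, phi_t, tau2, rng):
    """One independence M-H update using the proposal
    g = N(phi_t/(n*tau2+1), tau2/(n*tau2+1)) I(|phi| < 1)."""
    gm = phi_t / (n * tau2 + 1.0)
    gv = tau2 / (n * tau2 + 1.0)
    while True:                       # rejection for the restriction
        cand = gm + np.sqrt(gv) * rng.standard_normal()
        if abs(cand) < 1.0:
            break
    log_g = lambda p: -0.5 * (p - gm) ** 2 / gv
    # accept with probability min{1, w(cand)/w(cur)}, w = target/g
    log_w = (log_target(cand, n, phi_t, tau2) - log_g(cand)) \
          - (log_target(phi_cur, n, phi_t, tau2) - log_g(phi_cur))
    return cand if np.log(rng.uniform()) < log_w else phi_cur

rng = np.random.default_rng(2)
phis = [0.0]
for _ in range(500):
    phis.append(mh_step(phis[-1], n=394, phi_t=0.5, tau2=0.01, rng=rng))
```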

In some cases, some improvement of this estimator can be achieved by averaging conditional expectations of β given σ²^(k), φ^(k), Y_mis^(k), Y_obs, k = 1, …, K; see Gelfand and Smith (1990). The posterior variances and the standard errors of estimates can also be obtained. For instance, when the samples are independent, the posterior variance and standard deviation of β can be estimated, respectively, by

V̂ar(β | Y_obs) = (1/K) Σ_{k=1}^{K} (β^(k) − Ê(β | Y_obs))²  and  ŜD(β | Y_obs) = √V̂ar(β | Y_obs).

In addition, the standard error ŜE(β̂) of β̂ is estimated by ŜE(β̂) = ŜD(β | Y_obs)/√K. When serial correlations of the samples taken from consecutive iterations are not negligible, one can construct a better estimate of Var(β | Y_obs) taking advantage of the serial correlations in β^(k); see Geweke (1992). With these estimates, one can form an approximate 100(1 − α)% Bayesian credible interval for β_i as β̂_i ± z_{α/2} ŜE(β̂_i).
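The posterior mean, posterior SD, Monte Carlo standard error, and the interval β̂_i ± z_{α/2} ŜE(β̂_i) of the preceding paragraph can be computed from the retained draws as follows (a sketch for independent draws; the function name is mine, not the paper's):

```python
import numpy as np

def posterior_summary(draws, z=1.96):
    """Posterior mean, posterior SD, Monte Carlo standard error
    SE = SD/sqrt(K), and the interval mean +/- z*SE, for draws of
    shape (K, p) assumed (near-)independent."""
    K = draws.shape[0]
    mean = draws.mean(axis=0)
    sd = draws.std(axis=0)
    se = sd / np.sqrt(K)
    return mean, sd, se, (mean - z * se, mean + z * se)

# toy usage with synthetic draws standing in for MCMC output
rng = np.random.default_rng(3)
draws = rng.normal(loc=[0.55, 0.12], scale=0.05, size=(500, 2))
mean, sd, se, ci = posterior_summary(draws)
```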


As starting values for the Gibbs sampler, an initial value of β can be chosen as β̂^O = (X_obs′X_obs)^{−1} X_obs′Y_obs, the ordinary least squares estimator based on the observed values, where X_obs is the submatrix of X corresponding to Y_obs. An initial value of σ² can be chosen as σ̂²_O, the variance estimator in the ordinary least squares regression of Y_obs on X_obs. By regressing z_ij(β̂^O) on z_{i−1,j}(β̂^O), z_{i,j−1}(β̂^O), z_{i−1,j−1}(β̂^O) for (i, j) such that (y_ij, y_{i−1,j}, y_{i,j−1}, y_{i−1,j−1}) are all available, we get estimates of φ1 and φ2, which can be used as initial values. Finally, initial values of y_ij, (i, j) ∈ M, can be chosen as ŷ_ij^0 = x_ij′β̂^O.
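The OLS-based initialization can be sketched as follows (illustrative Python; `initial_values` is my name, not the paper's):

```python
import numpy as np

def initial_values(X_obs, Y_obs):
    """OLS starting values from the observed rows only:
    beta^O = (X_obs' X_obs)^{-1} X_obs' Y_obs and the usual
    residual variance estimate for sigma^2."""
    beta_o, *_ = np.linalg.lstsq(X_obs, Y_obs, rcond=None)
    resid = Y_obs - X_obs @ beta_o
    dof = max(len(Y_obs) - X_obs.shape[1], 1)
    return beta_o, resid @ resid / dof

# toy usage on synthetic observed data
rng = np.random.default_rng(4)
X_obs = np.column_stack([np.ones(300), rng.standard_normal(300)])
Y_obs = X_obs @ np.array([2.0, 0.5]) + rng.standard_normal(300)
beta_o, sigma2_o = initial_values(X_obs, Y_obs)
```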

3. An example

We apply the proposed Bayesian method in analyzing a real data set obtained from the East Sea, which is located between Korea and Japan. In constructing contour plots of the seafloor, acoustic remote sensing with multibeam echo-sounders is widely used (Moustier and Matsumoto, 1993). In order to construct shape images of the seafloor of certain regions of the East Sea, the Korea Ocean Research & Development Institute collected data using an acoustic remote sensing system. The multibeam echo-sounder has 121 beams located on a long bar crossing the ship. A ship (the Onnuriho) equipped with the echo-sounder moves, and each beam generates a sound pulse at regular time intervals. The sound signals, after reaching the seafloor, echo back to the sensor located in the ship. The center beam (beam #61) is oriented directly toward the sea bottom. Beam #60 (#62) is slanted to the right (left) by 1°, and so on, so that beam #1 (beam #121) is slanted by 60° to the right (left). Therefore, if the seafloor is flat, the signal emitted from beam #61 has the largest received amplitude because it travels the shortest distance, while the signals emitted from beams #1 and #121 have the weakest received amplitudes because they travel the longest distance. The amplitudes of the echoes are used to form an acoustic image of the sea bottom. Amplitudes of sound signals are affected by two factors: the depth of the sea bottom and the angle orientations of the beams. In order to obtain a benchmark for adjusting the angle orientation effect, the Onnuriho collected amplitude data from a place known to be flat. The Onnuriho moved along a straight line in a flat area and gathered 394 sets of 121 amplitudes, one from each beam, at regular time intervals. Fig. 1 shows a surface plot of the 121 × 394 amplitude (dB) data. The original data were transformed by changing their signs for a better view; thus, larger values correspond to weaker amplitudes.
Due to malfunction of the sensor located in the ship, some of the data are missing. In this case, malfunction of the sensor (missingness) is not related to the shape of the seafloor (response); hence the missing observations can be considered missing at random in the sense of Rubin (1976). The total number of missing values is 5546, so the percentage of missing values is 5546/(121 × 394) = 11.6%. The missing data are coded by zeros and are revealed in Fig. 1 as long lines dropping to the x–y plane.


Fig. 1. Surface plot of amplitude: lines dropping to the x–y plane correspond to missing values.

One of our goals is to estimate the mean response for each beam. Thus, our model is

y_ij = μ_i + z_ij,  z_ij = φ1 z_{i−1,j} + φ2 z_{i,j−1} − φ1 φ2 z_{i−1,j−1} + ε_ij,  i = 1, …, 121,  j = 1, …, 394,

where μ_i represents the mean response for beam i. Another goal is to predict the values of the missing observations, which can be used in constructing a contour plot of the seafloor surface. Since we do not have any prior information, we choose the noninformative prior π(β, σ², φ) = π(β)π(σ²)π(φ) = (1/σ²) I(|φ1| < 1, |φ2| < 1) for Bayesian inference on the model and follow the algorithm described in Section 2. As a sensitivity analysis of the prior choice, we tried some other vague priors of the conjugate forms described in Section 2, but they all gave similar results. Sample generations are done in the sequence φ1, φ2, σ², β, and Y_mis. For initialization of the Gibbs sampler, the ordinary least squares estimates β̂^O and σ̂²_O are used as initial values of β and σ², respectively. For simplicity, φ1 = φ2 = 0 and y_ij = μ̂_i^O, (i, j) ∈ M, are chosen as initial values of φ and the missing values. We observe that convergence of the Gibbs sampler is very fast and a burn-in size of around 10 or 20 is large enough (see Fig. 2). We investigate the autocorrelation function (ACF) of the series of parameter samples and observe that the ACF becomes negligible at lag sizes of 5 or larger (see Fig. 3). Thus, we choose a burn-in size of 20, collect 500 samples from every 5th iteration after the burn-in to obtain independent samples, and make all posterior inference based on these samples. We use the FORTRAN subroutines RNMVN, RNGAM, and RNUN in IMSL (1989) to generate multivariate normal, gamma, and uniform samples, respectively. The inverse cdf method is used for candidate sample generation of φ_i in the Metropolis algorithm, using the IMSL subroutine DFNOR for the normal cdf.
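The lag-5 thinning rule used above amounts to reading off where the sample ACF becomes negligible. A hedged sketch of that diagnostic (my helper names, not IMSL's):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations of a scalar chain at lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [x[:-k] @ x[k:] / denom
                             for k in range(1, max_lag + 1)])

def thinning_lag(x, max_lag=50, cutoff=0.05):
    """Smallest lag at which |ACF| first drops below cutoff."""
    a = acf(x, max_lag)
    for k in range(1, max_lag + 1):
        if abs(a[k]) < cutoff:
            return k
    return max_lag

chain = np.random.default_rng(5).standard_normal(4000)  # stand-in chain
lag = thinning_lag(chain)
```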


Fig. 2. Time sequence plots of parameters.

The estimated values of φ and σ² are

φ̂1 = 0.5528 (0.1866 × 10⁻³),  φ̂2 = 0.1175 (0.2159 × 10⁻³),  σ̂² = 0.5840 (0.1874 × 10⁻³),

where the numbers in parentheses are the standard errors of the estimates. We see that both of the spatial parameters φ1 and φ2 are highly significant. We also observe that φ̂1 is greater than φ̂2, implying that the correlation of errors in the port direction is stronger than that in the direction of the ship's movement.

Fig. 4 shows the estimated mean amplitudes μ̂_i = Ê(μ_i | Y_obs), i = 1, …, 121. The general shape of μ̂_i is bell-shaped and almost symmetric about i = 61. We also calculate approximate 95% and 99.9% credible intervals for the mean amplitudes, μ̂_i ± 1.96 ŜE(μ̂_i)


Fig. 3. Autocorrelation coefficients of parameters.

and μ̂_i ± 3 ŜE(μ̂_i), respectively. However, the credible bands are visually indistinguishable from the plot of μ̂_i in Fig. 4 because the ŜE(μ̂_i) are very small compared with the range of μ̂_i. Thus, to investigate the shape of the credible intervals for different beams, we superimpose μ̂_i ± 3 ŜD(μ_i | Y_obs) as dotted lines in Fig. 4. Note that ŜD(μ_i | Y_obs) is proportional to ŜE(μ̂_i), that is, ŜD(μ_i | Y_obs) = √K ŜE(μ̂_i), K = 500. Therefore, the dotted-line band in Fig. 4 can be thought of as a magnification of the credible band by the factor √K. From Fig. 4, we observe that the lengths between the


Fig. 4. Mean amplitudes (μ̂_i) and the μ̂_i ± 3 ŜD(μ_i | Y_obs) lines.

Fig. 5. Surface plot of the predicted missing values.

two dotted lines are shorter for i around 61 than for i close to 1 or 121. This implies that data obtained from the center beams are more reliable than those from the marginal beams. A surface plot of the predicted missing values is provided in Fig. 5. It reveals a close resemblance to the surface plot of the observed values in Fig. 1. Thus, we


would be able to get a better surface plot if we replace the missing values in Fig. 1 by the predicted ones.

4. Discussion

In this paper, we have proposed a Bayesian method for two-dimensional spatial data with spatially correlated errors and missing observations, using an MCMC algorithm. A real seafloor acoustic sensing data set has been analyzed to illustrate the method, and it was found that the method provided reasonable parameter estimates and missing value predictions. Key advantages of the proposed method are: it provides a systematic method for general regression models with general multiplicative spatial AR(1) errors; it handles missing responses very easily via MCMC, in contrast with classical maximum likelihood estimation or method of moments estimation; and the proposed MCMC is so simple and efficient that it could easily be implemented even by non-experts.

The proposed algorithm can also be used in the E-step of the EM algorithm, in which one needs the expectation of the log-likelihood with respect to the conditional distribution of the missing observations Y_mis, given the parameter values provided in the previous M-step. With the other parameter values fixed, iterative generation of y_ij for (i, j) ∈ M using the proposed algorithm may be used to estimate this conditional expectation.

Non-conjugate priors may be used for the parameters. However, use of non-conjugate priors prohibits direct implementation of the Gibbs sampling algorithm, and hence one would need a Metropolis–Hastings algorithm for each parameter, which requires some tailoring for efficient application. An alternative to a non-conjugate prior would be a mixture of conjugate priors with appropriate mixing weights and component parameters. A mixture of conjugate priors yields simple full conditional posterior density functions while allowing great flexibility in the prior distributions. However, it increases the number of parameters in the model, and hence there should be a compromise between variability and simplicity of the MCMC.
We note that our method is limited to models with normally distributed errors and multiplicative spatial AR(1) errors. Also, our method is not applicable to models having missing values in covariates. Extensions to nonnormal models, general higher-order spatial error models, or models with missing values in covariates would be challenging work. Owing to the simplicity of the Bayesian MCMC method, such extensions would be much more tractable than classical methods based on large-sample theory.

Acknowledgements The authors are very grateful for the valuable comments of the three referees and the previous Editor, Dr. Naeve. Also, we are indebted to Professor Paul Green for


the careful proofreading of this paper. The first author was supported by the 1997 Ewha Womans University Research Fund.

References

Alonso, F.J., Angulo, J.M., Bueso, M.C., 1991. Application of EM-type algorithms to spatial data. Comm. Statist. Theory Methods 26, 669–683.
Basu, S., Reinsel, G.C., 1994. Regression models with spatially correlated errors. J. Amer. Statist. Assoc. 89, 88–99.
Besag, J.E., Green, P.J., 1993. Spatial statistics and Bayesian computation (with discussion). J. Roy. Statist. Soc. Ser. B 55, 25–37.
Besag, J.E., Kempton, R., 1986. Statistical analysis of field experiments using neighbouring plots. Biometrics 42, 231–251.
Cressie, N., 1991. Statistics for Spatial Data. Wiley, New York.
Cullis, B.R., Gleeson, A.C., 1991. Spatial analysis of field experiments — an extension to two dimensions. Biometrics 47, 1449–1460.
Devroye, L., 1986. Non-Uniform Random Variate Generation. Springer, New York.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398–409.
Geweke, J., 1991. Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In: Keramidas, E.M. (Ed.), Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. Interface Foundation of North America, Fairfax Station, USA.
Geweke, J., 1992. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bayesian Statistics, Vol. 4. Oxford University Press, Oxford, pp. 169–194.
Green, P.J., 1985. Linear models for field trials, smoothing and cross-validation. Biometrika 72, 527–537.
Griffith, D.A., 1988. Advanced Spatial Statistics: Special Topics in the Exploration of Quantitative Spatial Data Series. Kluwer Academic Publishers, Dordrecht.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
IMSL, 1989. Fortran Subroutines for Statistical Analysis. IMSL, Inc., Houston, USA.
Martin, R.J., 1979. A subclass of lattice processes applied to a problem in planar sampling. Biometrika 66, 209–217.
Martin, R.J., 1990. The use of time series models and methods in the analysis of agricultural field trials. Comm. Statist. Theory Methods 19, 55–81.
McCulloch, R.E., Tsay, R.S., 1994. Bayesian analysis of autoregressive time series via the Gibbs sampler. J. Time Series Anal. 15, 235–250.
Moustier, C.D., Matsumoto, H., 1993. Seafloor acoustic remote sensing with multibeam echo-sounders and bathymetric sidescan sonar systems. Marine Geophys. Res. 15, 27–42.
Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592.
Shin, D.W., Sarkar, S., 1994. Parameter estimation in regression models with autocorrelated errors using irregular data. Comm. Statist. Theory Methods 23, 3567–3580.
Shin, D.W., Sarkar, S., 1995. Estimation for stationary AR(1) models with nonconsecutively observed samples. Sankhya Ser. A 57, 287–298.
Stoffer, D.S., 1986. Estimation and identification of space-time ARMAX models in the presence of missing data. J. Amer. Statist. Assoc. 81, 762–772.
Wincek, M.A., Reinsel, G.C., 1986. An exact maximum likelihood estimation procedure for regression-ARMA time series models with possibly nonconsecutive data. J. Roy. Statist. Soc. Ser. B 48, 303–313.
Zimmerman, D.L., Harville, D.A., 1991. A random field approach to the analysis of field-plot experiments and other spatial experiments. Biometrics 47, 223–239.