Journal of Statistical Planning and Inference 138 (2008) 1038 – 1058 www.elsevier.com/locate/jspi
Spatially varying dynamic coefficient models

Marina S. Paez*, Dani Gamerman, Flávia M.P.F. Landim, Esther Salazar

Departamento de Métodos Estatísticos, Instituto de Matemática, Universidade Federal do Rio de Janeiro, Brazil

Received 19 October 2005; received in revised form 13 March 2007; accepted 15 March 2007. Available online 10 July 2007.
Abstract

In this work we present a flexible class of linear models for observations made in discrete time and continuous space, in which the regression coefficients vary smoothly in time and space. This kind of model is particularly appealing in situations where the effect of one or more explanatory processes on the response presents substantial heterogeneity in both dimensions. We describe how to perform inference for this class of models, as well as forecasting in time and interpolation in space, using simulation techniques. The performance of the algorithm in estimating the model parameters and predicting in time is investigated with simulated data sets. The proposed methodology is used to model pollution levels in the Northeast of the United States.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Bayesian statistics; Monte Carlo methods; Predictive density; Spatial interpolation
1. Introduction

Modeling environmental data sets has been the subject of much research in statistics over the last decades. In particular, space–time models arise naturally for this kind of data, since environmental processes generally exhibit smooth transitions in time and space. In this paper our aim is to propose a flexible class of models for environmental processes observed in discrete time and continuous space, allowing smooth space–time variation of the effects of the covariates on the response process. The models are formulated with dynamic transitions in time, leading to a class of dynamic models for spatio-temporal data.

Dynamic models have been used many times to deal with spatio-temporal data under both classical and Bayesian approaches. Under a classical approach, this kind of model was proposed by Huang and Cressie (1996), Wikle and Cressie (1999) and Mardia et al. (1998), among others; under a Bayesian approach, they were proposed by Sansó and Guenni (1999), Stroud et al. (2001) and Huerta et al. (2004), among others. Quintana (1987) proposed an extension of dynamic linear models to the case of multivariate responses and multiple measurements in time, written in matrix form, where the observational and evolution covariance matrices are defined by common scale factors. Gamerman and Migon (1993) proposed hierarchical dynamic models for univariate responses, also restricting the variances in the model to a common scale factor. Combining both approaches, Landim and Gamerman (2000) presented a class of hierarchical dynamic models for matrix-variate observations.
In this paper we explore the same idea as Landim and Gamerman (2000), but impose a parametric structure to account for the spatial correlation between observations made at different locations in space. With that restriction, the spatial correlation can be captured without the need to estimate completely unknown covariance matrices. Instead of specifying a completely unstructured covariance matrix as in Landim and Gamerman (2000), we specify a spatially structured matrix with a small number of parameters. That way, if there are N monitoring stations, the number of parameters needed for the estimation of the matrix drops from N(N + 1)/2 to only a few (typically two: one for range and one for smoothness), which can be a great improvement for large values of N. As a consequence, the models proposed here allow the interpolation of the response variable to any set of non-observed locations in space for any fixed period of time, taking into account the spatial variation captured from the data.

Gelfand et al. (2005) use a similar approach to deal with this same kind of problem, where the temporal variation is described through dynamic components which capture the association between measurements made at a fixed location in space and period of time, as well as the dependence between space and time themselves. The multivariate spatial processes described in that paper were developed through the use of coregionalization techniques and non-separable space–time covariance structures. Here we propose a different spatial structure for model components that is easier to handle computationally. Also, in our application, when dealing with multivariate response models, both intercepts and regression coefficients are allowed to vary in space, while in Gelfand et al. (2005) only the intercept varies in this dimension. Other research in this area, accounting for the temporal variation only, was developed by Cargnoni et al.
(1997) and Aguilar and West (1998); models accounting for spatial variation only were developed by Gelfand et al. (2003), who also offer extensions to generalized linear models.

We work under a fully Bayesian approach to inference, so that both time forecasting and space interpolation can be performed naturally, based on the probabilistic description of the model. Inference is performed through the estimation of the posterior distribution of the model parameters using MCMC (Markov chain Monte Carlo) methods (see Gamerman and Lopes, 2006 for a review).

This paper is organized as follows. The model is described in Section 2. In Section 3, some results of inference for the unknown parameters of the model are presented (with details given in Appendix A), as well as the algorithm used to generate samples from their posterior distributions. Sections 4 and 5 show how to perform forecasting and spatial interpolation, respectively, under the proposed model. In Section 6 the model is applied to a pollutant data set collected in the Northeast of the United States, and finally conclusions are presented in Section 7.

2. Model description

In this section we describe a class of space–time models, first for univariate responses and then for multivariate responses. The model can be described in matrix-variate notation, which has the advantage, especially in the case of multivariate responses, of a more compact form.

2.1. Univariate response model

Consider a set of discrete periods of time, t = 1, . . . , T, where for each t a random process Y_t(·) is observed at a set of locations in space S = {s_1, . . . , s_N}. Let X_t(s) be a p-dimensional vector of covariates, observed at time t and location s. Suppose Y_t(·) can be modeled through a regression equation, where the effects of the covariates vary smoothly along time and space, as specified below:

Y_t(s) = X_t(s)′β_{1,t}(s) + v_{1t}(s),   v_{1t}(·) ~ N(0, V_1) independently,
β_{1,t}(s) = β_{2,t} + v_{2t}(s),      v_{2t}(·) ~ f(v_{2t}) independently,
β_{2,t} = β_{2,t−1} + w_t,         w_t ~ N(0, W) independently,    (1)
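As a sanity check on the notation, model (1) can be simulated forward in time. The sketch below is hypothetical: it assumes a single covariate plus intercept (p = 2), an exponential spatial correlation for the second-level errors v_{2t}, and invented variance values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, p = 5, 50, 2                          # sites, time points, coefficients
coords = rng.uniform(0.0, 1.0, size=(N, 2))  # hypothetical site locations
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
C = np.exp(-d / 0.3)                         # exponential spatial correlation
L = np.linalg.cholesky(C + 1e-10 * np.eye(N))
V1 = 0.1                                     # observational variance
W = 0.01 * np.eye(p)                         # evolution covariance

X = np.dstack([np.ones((T, N)), rng.normal(size=(T, N))])  # X_t(s) = (1, x_t(s))'
beta2 = np.zeros(p)                          # beta_{2,0}
Y = np.empty((T, N))
for t in range(T):
    beta2 = beta2 + rng.multivariate_normal(np.zeros(p), W)  # random-walk evolution
    v2 = 0.2 * (L @ rng.normal(size=(N, p)))   # one independent spatial field per coefficient
    beta1 = beta2[None, :] + v2                # beta_{1,t}(s) = beta_{2,t} + v_{2t}(s)
    Y[t] = np.einsum('np,np->n', X[t], beta1) + rng.normal(0.0, np.sqrt(V1), N)
```

The resulting T × N array Y has coefficient surfaces that drift in time and are spatially smooth at each t.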
for t = 1, . . . , T and s = s_1, . . . , s_N. N(μ, σ²) denotes the normal distribution with mean μ and variance σ². The vectors β_{1,t}(s) and β_{2,t} have dimension p and are independent, respectively, of v_{1t}(s) and v_{2t}(s) by hypothesis; likewise, β_{2,t−1} and w_t are independent. f(v_{2t}) denotes the distribution of the errors v_{2t}, which depends on parameters θ. Suppose that this distribution defines a spatial correlation structure for these errors, and consequently for β_{1,t}(·). That way,
supposing no spatial structure in V_1, the spatial correlation present in the response variable is induced only through the regression parameters β_{1,t}. Many definitions for f(v_{2t}) are possible, including the three standard types of Gaussian process models for multivariate spatial data presented below:

• f(v_{2t}) = N(0, V ρ(φ, ·)). In this specification, the matrix V defines the covariance between coefficients corresponding, for example, to different covariates. These coefficients share the same spatial correlation defined through ρ(φ, ·).
• f(v_{2t}) = A ∏_{l=1}^{p} N(0, ρ_l(φ_l, ·)), φ = (φ_1, . . . , φ_p). This case differs from the previous one as it mixes different Gaussian processes through the matrix A. This model is known in the literature as the linear model of coregionalization (Wackernagel, 1995; Gelfand et al., 2003).
• f(v_{2t}) = ∏_{l=1}^{p} N(0, ρ_l(φ_l, ·)), φ = (φ_1, . . . , φ_p). This permits different coefficients to follow different Gaussian processes independently.

In all the above cases, the correlation function ρ can be defined by any function able to capture the spatial correlation structure in the data. Working under the hypothesis of isotropy, this function typically depends on d_ij, the distance between locations s_i and s_j, and on parameters φ. Another way of describing this model is through a matrix-variate notation, given by
Y_t = F_{1t} β_{1,t} + v_{1t},    v_{1t} ~ N(0, V_1) independently,
β_{1,t} = F_{2t} β_{2,t} + v_{2t},   v_{2t} ~ f(v_{2t}) independently,
β_{2,t} = G_t β_{2,t−1} + w_t,    w_t ~ N(0, W) independently,    (2)
for t = 1, . . . , T. β_{1,t} is an Np-dimensional vector, β_{2,t} is a p-dimensional vector, the matrix F_{1t} is N × Np, F_{2t} is Np × p and G_t is p × p. The matrices F_{1t}, F_{2t} and G_t are assumed known, with F_{1t} and F_{2t} incorporating covariate values. In (1), F_{1t} = diag(X_t(s_1)′, . . . , X_t(s_N)′), F_{2t} = 1_N ⊗ I_p and G_t = I_p (where I_p denotes an identity matrix of order p, and 1_N denotes a column vector of size N with all elements equal to 1), but more general forms can be considered. The distribution of v_{2t}, f(v_{2t}), is specified through one of the three cases presented previously, for example. Here we consider the case where the spatial correlation present in the response variables is induced only through the regression parameters β_{1,t}, specifying V_1 = σ²_{v1} I_N. The model is completed with independent inverted Wishart (Dawid, 1981) prior distributions for the covariance matrix W, an inverted gamma prior distribution for σ²_{v1}, a normal prior distribution for β_{2,0} and prior distributions for the parameters θ.

2.1.1. Example 1

In this example consider a response variable Y observed at N = 3 locations in space and T = 10 periods of time. Suppose that for each location s_i, i = 1, 2, 3 and time t = 1, . . . , T, a single covariate X is also observed. The regression equation which defines the relation between Y and X is given by

Y_t(s_i) = α_t(s_i) + β_t(s_i) X_t(s_i) + v_{1t}(s_i),   i = 1, 2, 3,  t = 1, . . . , T,

where α_t(s_i) and β_t(s_i) are regression parameters that vary in time and space, and v_{1t}(s_1), v_{1t}(s_2) and v_{1t}(s_3) are independent for fixed t, so that V_1 = σ²_1 I_3. According to model (2), F_{1t} are 3 × 6 matrices defined by

F_{1t} = [ 1  X_t(s_1)  0  0         0  0
           0  0         1  X_t(s_2)  0  0
           0  0         0  0         1  X_t(s_3) ],   t = 1, . . . , T
and β_{1,t}, which is the collection of regression parameters in the first level of the hierarchy, has dimension 6 and can be written as

β_{1,t} = (α_t(s_1), β_t(s_1), α_t(s_2), β_t(s_2), α_t(s_3), β_t(s_3))′,   t = 1, . . . , T.

Suppose that for a fixed period of time t the parameters α_t(s_1), α_t(s_2) and α_t(s_3) have the same mean α_t, and the parameters β_t(s_1), β_t(s_2) and β_t(s_3) have the same mean β_t. The vector β_{2,t} and the matrix F_{2t} can therefore be defined as

β_{2,t} = (α_t, β_t)′,   F_{2t} = 1_3 ⊗ I_2 = [ 1 0
                                                0 1
                                                1 0
                                                0 1
                                                1 0
                                                0 1 ],   t = 1, . . . , T.

They lead to the equations

α_t(s_i) = α_t + v^{(α)}_{2t}(s_i),
β_t(s_i) = β_t + v^{(β)}_{2t}(s_i),   i = 1, 2, 3.
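For concreteness, the design matrices of Example 1 and the separable covariance C ⊗ V can be assembled numerically as follows (covariate and correlation values are invented for illustration):

```python
import numpy as np

N, p = 3, 2
x = np.array([0.7, -1.2, 0.4])            # X_t(s_1..s_3) at some fixed t (made up)

# F_1t = diag(X_t(s_1)', X_t(s_2)', X_t(s_3)'), a 3 x 6 block-diagonal matrix
F1t = np.zeros((N, N * p))
for i in range(N):
    F1t[i, i * p:(i + 1) * p] = [1.0, x[i]]

F2t = np.kron(np.ones((N, 1)), np.eye(p))  # 1_3 (x) I_2, a 6 x 2 matrix

# separable covariance of (beta_{1,t} | beta_{2,t}): C (x) V
C = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])            # spatial correlation matrix (illustrative)
V = np.array([[1.0, 0.2],
              [0.2, 0.5]])                 # covariance between the two coefficients
cov = np.kron(C, V)                        # 6 x 6 covariance of beta_{1,t}
```

The 2 × 2 diagonal blocks of `cov` all equal V, while the off-diagonal blocks are scaled by the spatial correlations in C, which is exactly the separability described above.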
Suppose that (α_t(s_1), α_t(s_2), α_t(s_3)) are spatially correlated, and so are (β_t(s_1), β_t(s_2), β_t(s_3)), and that both sets of parameters share the same spatial correlation matrix C. Suppose also that the correlation between the parameters α_t and β_t at any fixed location is given through V, and does not depend on this location. This way, the correlation matrix of (β_{1,t}|β_{2,t}) is separable and can be written as C ⊗ V. This corresponds to f(v_{2t}) = N(0, V ρ(φ, ·)), where C is the matrix specified through the correlation function ρ(φ, ·). Note that the covariance matrices of α_t and β_t are not the same, unlike their correlation matrices. To define the last level of the hierarchy, suppose that β_{2,t} is a random walk, such that β_{2,t} = β_{2,t−1} + w_t, with W being the covariance matrix of w_t; this defines G_t = I_2.

2.2. Multivariate response model

Suppose we observe not one but q (q > 1) response variables at a discrete set of periods of time t = 1, . . . , T and a set of locations in continuous space S = {s_1, . . . , s_N}. In matrix notation, after imposing a restriction of separability on the covariance matrices, we are led to the following model:

Y_t = F_{1t} β_{1,t} + v_{1t},    v_{1t} ~ N(0, V_1, Σ) independently,
β_{1,t} = F_{2t} β_{2,t} + v_{2t},   v_{2t} ~ N(0, V_2, Σ) independently,
β_{2,t} = G_t β_{2,t−1} + w_t,    w_t ~ N(0, W, Σ) independently,    (3)
for t = 1, . . . , T, where N(M, C, Σ) denotes a matrix-variate normal distribution with mean M, left covariance matrix C and right covariance matrix Σ (see Landim, 1998 for details). Note that the imposed restriction states that the part of the conditional covariance between elements of Y_t which accounts for the different responses is specified in Σ. The same restriction is imposed on the conditional covariances of β_{1,t} and β_{2,t}, with all sharing the same structure to account
for the covariance between elements coming from different responses. This assumption is usually made in multivariate dynamic models (see, for example, West and Harrison, 1997, Chapter 16; Quintana, 1987). It is reasonable since the magnitude of the covariance between elements at different hierarchical levels is not imposed to be the same. The models above could be specified and operated without this restriction, but that would substantially increase parameter dimensionality.

The matrix of regression parameters β_{1,t} is Np × q, and it can be written as a collection of blocks β_{1,t} = [β′_{1,1,t}, . . . , β′_{N,1,t}]′, where β_{j,1,t} is p × q and Np = N × p. β_{2,t} is p × q, F_{1t} is N × Np, F_{2t} is Np × p, and G_t is p × p. Assume that the matrices F_{1t}, F_{2t} and G_t are known, with F_{1t} and F_{2t} possibly incorporating covariate values. We do not impose spatial structure on v_{1t}, for t = 1, . . . , T, and define V_1 = I_N. The variance matrix V_2 is specified through parametric structures describing spatial dependency. In particular, we define V_2 = C ⊗ V, which corresponds to v_{2t} ~ N(0, V ρ(φ, ·), Σ), where C is the matrix specified through the correlation function ρ(φ, ·). The model is completed with independent inverted Wishart prior distributions for the covariance matrices Σ, W and V, a matrix-variate normal prior distribution for (β_{2,0} | Σ) and prior distributions for the parameters φ which define the spatial correlation matrix C.

2.2.1. Example 2

Consider now a model for q = 2 response variables Y_1 and Y_2, and one covariate X, observed at N = 3 monitoring stations and T = 10 periods of time. Suppose also that a second variable Z_t(s_i) was observed for each i = 1, 2, 3 and t = 1, . . . , T, helping to explain the variability of the regression parameters β_{1,t} in the first level of the hierarchy.
For a given period of time t and location in space s_i, define each response through the regression equation

Y_{j,t}(s_i) = α_{j,t}(s_i) + β_{j,t}(s_i) X_t(s_i) + v_{j1t}(s_i),   j = 1, 2,

where α_{j,t}(s_i) and β_{j,t}(s_i) are the regression coefficients which vary in time and space for each response variable. This way, for each response there are two regression parameters (p = 2). Suppose now, as in Example 1, that for a fixed period of time t and response variable j, the errors v_{j1t}(s_i), i = 1, 2, 3, are independent, such that V_1 = I_3. For fixed t and s_i, the covariance between v_{11t}(s_i) and v_{21t}(s_i) is given through Σ. This way, the first hierarchical level of the model is defined as in (3), where the matrices F_{1t} and β_{1,t} are given by

F_{1t} = [ 1  X_t(s_1)  0  0         0  0
           0  0         1  X_t(s_2)  0  0
           0  0         0  0         1  X_t(s_3) ],
β_{1,t} = [ α_{1,t}(s_1)  α_{2,t}(s_1)
            β_{1,t}(s_1)  β_{2,t}(s_1)
            α_{1,t}(s_2)  α_{2,t}(s_2)
            β_{1,t}(s_2)  β_{2,t}(s_2)
            α_{1,t}(s_3)  α_{2,t}(s_3)
            β_{1,t}(s_3)  β_{2,t}(s_3) ].

In the second hierarchical level, suppose that the variability of the parameters α and β, which define the matrix β_{1,t}, can be explained by the covariate Z_t(s_i), i = 1, 2, 3, t = 1, . . . , 10, such that

α_{j,t}(s_i) = α_{j,t} + α*_{j,t} Z_t(s_i) + v^{(α)}_{j2t}(s_i),
β_{j,t}(s_i) = β_{j,t} + β*_{j,t} Z_t(s_i) + v^{(β)}_{j2t}(s_i),   i = 1, . . . , 3,  j = 1, 2,  t = 1, . . . , 10.

That way, F_{2t} and β_{2,t} are defined by

F_{2t} = [ 1  Z_t(s_1)  0  0
           0  0         1  Z_t(s_1)
           1  Z_t(s_2)  0  0
           0  0         1  Z_t(s_2)
           1  Z_t(s_3)  0  0
           0  0         1  Z_t(s_3) ],
β_{2,t} = [ α_{1,t}   α_{2,t}
            α*_{1,t}  α*_{2,t}
            β_{1,t}   β_{2,t}
            β*_{1,t}  β*_{2,t} ].
As in Example 1, suppose that the covariance matrix of (β_{1,t}|β_{2,t}) is separable and can be written as V_2 = C ⊗ V. The matrix C defines the spatial correlation structure in the model, and the matrix V defines the correlation structure between
the collections of parameters α and β. For fixed t and s_i, the correlation matrix between the parameters α, as well as between the parameters β, is given by the 2 × 2 matrix V. For the evolution of β_{2,t} we specify G_t = I_2, as in Example 1, defining the equation β_{2,t} = β_{2,t−1} + w_t, and suppose that for a given t, w_t | Σ, W ~ N(0, W, Σ). For a fixed response j and time t, W corresponds to the covariance matrix between the elements of column j of the matrix β_{2,t}: α_{j,t}, α*_{j,t}, β_{j,t}, β*_{j,t}. Σ is the matrix that defines the correlations between the elements of each of the following pairs of parameters: α_{1,t} and α_{2,t}; β_{1,t} and β_{2,t}; α*_{1,t} and α*_{2,t}; and β*_{1,t} and β*_{2,t}. It is also important to notice that Σ defines the correlation between the response variables in the first level of the hierarchical model in (3).

3. Inference and computational methods

From now on we consider the most general model, the one with multivariate responses given in (3), under the restrictions made in Section 2.2 that V_1 = I_N and V_2 = C ⊗ V. Similar results can be obtained for model (2). Under model (3), the unknown quantities to be estimated are Σ, V, W, β_{1,t}, for t = 1, . . . , T, β_{2,t}, for t = 0, . . . , T, and the set of parameters φ of the spatial correlation function which defines the matrix C. Consider here the prior distributions proposed in Section 2.2, with the specification of the prior distribution for φ discussed later in this text, as it depends on each particular definition of C. The prior distributions for the parameters other than φ are given by

β_{2,0} | Σ ~ N(M_2, C_2, Σ),   Σ ~ IW(T_{Σ0}, S_{Σ0}),
W ~ IW(T_{W0}, S_{W0}),     V ~ IW(T_{V0}, S_{V0}),    (4)
where IW(T_0, S_0) denotes the inverted Wishart distribution with mean S_0/(T_0 − 2), T_0 > 2. Define Y = {Y_1, . . . , Y_T}, {β_1} = {β_{1,1}, . . . , β_{1,T}}, {β_2} = {β_{2,1}, . . . , β_{2,T}}, θ = {Σ, V, W, φ} and Ψ as the collection of all the parameters, Ψ = {{β_1}, {β_2}, β_{2,0}, θ}. The joint posterior distribution of the unknown parameters in model (3) is given by

p(Ψ | Y) ∝ p(β_{2,0} | Σ) p(Σ) p(V) p(W) p(φ) ∏_{t=1}^{T} p(Y_t | β_{1,t}, Σ) p(β_{1,t} | β_{2,t}, Σ, V, φ) p(β_{2,t} | β_{2,t−1}, Σ, W).
As this posterior distribution cannot be obtained analytically, the parameters in Ψ are sampled through an efficient Gibbs sampling algorithm proposed by Landim and Gamerman (2000). {β_1}, Σ, V and W are sampled from their full conditionals, and (β_{2,0}, {β_2}) is sampled through FFBS (forward filtering backward sampling; Frühwirth-Schnatter, 1994; Carter and Kohn, 1994). For details on the full conditionals and the FFBS algorithm see Appendix A. Additional Metropolis–Hastings steps (Metropolis et al., 1953; Hastings, 1970) were included to sample from φ. The proposed algorithm to generate from the posterior distributions of the parameters is given below:

1. Set initial values for all the parameters and set j = 1.
2. Generate {β_1}, Σ, V and W from their full conditional distributions.
3. Sample (β_{2,0}, {β_2}) through FFBS.
4. Sample φ through Metropolis–Hastings steps, after specifying prior distributions and suitable proposals to generate from φ.
5. Set j = j + 1 and return to step 2 until convergence is reached.
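The FFBS step can be illustrated in a deliberately simplified setting: a scalar local-level model y_t = θ_t + v_t, θ_t = θ_{t−1} + w_t with known variances V and W (a stripped-down stand-in for the matrix-variate version detailed in Appendix A; function and argument names are hypothetical):

```python
import numpy as np

def ffbs_local_level(y, V, W, m0=0.0, C0=1e6, rng=None):
    """One FFBS draw of the state path for y_t = theta_t + v_t, theta_t = theta_{t-1} + w_t."""
    rng = rng or np.random.default_rng()
    T = len(y)
    m, C = np.empty(T), np.empty(T)
    a, R = m0, C0 + W                    # prior moments for theta_1
    for t in range(T):                   # forward filtering (Kalman recursions)
        A = R / (R + V)                  # adaptive (Kalman gain) coefficient
        m[t] = a + A * (y[t] - a)
        C[t] = (1 - A) * R
        a, R = m[t], C[t] + W
    theta = np.empty(T)                  # backward sampling
    theta[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):
        B = C[t] / (C[t] + W)            # C_t R_{t+1}^{-1} for a random-walk state
        h = m[t] + B * (theta[t + 1] - m[t])
        H = (1 - B) * C[t]
        theta[t] = rng.normal(h, np.sqrt(H))
    return theta
```

With a tiny observational variance the sampled path essentially reproduces the data, which is a convenient correctness check.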
After reaching convergence, the above algorithm is used to obtain samples from the posterior distributions of the unknown quantities. 4. Prediction Suppose that we are interested in the distribution of the response variable h steps ahead, given the past observations. This way, we are interested in obtaining samples from the distribution of (YT +h |Y ).
The joint posterior distribution of Y_{T+h}, β_{1,T+h}, β_{2,T+h} and Ψ can be written as

p(Y_{T+h}, β_{1,T+h}, β_{2,T+h}, Ψ | Y)
 = p(Y_{T+h}, β_{1,T+h}, β_{2,T+h} | Y, Ψ) p(Ψ | Y)
 = p(Y_{T+h} | β_{1,T+h}, β_{2,T+h}, Ψ, Y) p(β_{1,T+h} | β_{2,T+h}, Ψ, Y) p(β_{2,T+h} | Ψ, Y) p(Ψ | Y)
 = p(Y_{T+h} | β_{1,T+h}, Σ) p(β_{1,T+h} | β_{2,T+h}, Σ, V, φ) p(β_{2,T+h} | β_{2,T}, Σ, W) p(Ψ | Y).

p(Ψ | Y) can be sampled through the algorithm described in the previous section. The distributions of (Y_{T+h} | β_{1,T+h}, Σ) and (β_{1,T+h} | β_{2,T+h}, Σ, V, φ) are known, and p(β_{2,T+h} | β_{2,T}, Σ, W) can easily be found through repeated use of the evolution equation β_{2,t} = G_t β_{2,t−1} + w_t, with w_t | W, Σ ~ N(0, W, Σ). Using properties of the matrix-variate normal distribution, we have

β_{2,T+h} | β_{2,T}, W, Σ ~ N( (∏_{t=T+1}^{T+h} G_t) β_{2,T},  W + Σ_{i=T+1}^{T+h−1} (∏_{t=i+1}^{T+h} G_t) W (∏_{t=i+1}^{T+h} G_t)′,  Σ ).
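When G_t = I_p, as in the random-walk evolutions used in our examples, the products of evolution matrices reduce to identities and the left covariance collapses to hW. For a univariate response, the three added sampling steps can then be sketched as follows (one draw per stored posterior sample; all names are hypothetical):

```python
import numpy as np

def forecast_draw(beta2_T, W, V, C, sigma2, F1, F2, h, rng=None):
    """One draw of Y_{T+h} given one posterior draw (univariate response, G_t = I)."""
    rng = rng or np.random.default_rng()
    # step 1: beta_{2,T+h} | beta_{2,T}, W ~ N(beta_{2,T}, h W)
    beta2 = rng.multivariate_normal(beta2_T, h * W)
    # step 2: beta_{1,T+h} | beta_{2,T+h} ~ N(F2 beta2, C (x) V)
    beta1 = rng.multivariate_normal(F2 @ beta2, np.kron(C, V))
    # step 3: Y_{T+h} | beta_{1,T+h} ~ N(F1 beta1, sigma2 I_N)
    return rng.normal(F1 @ beta1, np.sqrt(sigma2))
```

Repeating this over all stored posterior draws yields a sample from the h-step-ahead predictive distribution.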
That way, to generate a sample from the posterior distribution of Y_{T+h}, one adds a step to the algorithm of the previous section in which β_{2,T+h} is sampled from p(β_{2,T+h} | β_{2,T}, Σ, W), then β_{1,T+h} is sampled from p(β_{1,T+h} | β_{2,T+h}, Σ, V, φ), and finally Y_{T+h} is sampled from p(Y_{T+h} | β_{1,T+h}, Σ).

5. Interpolation

Suppose the data are observed at a set of locations S = {s_1, . . . , s_N}, and our interest now lies in interpolating the response variables at other N_n locations, collected in the set S_n = {s_{N+1}, . . . , s_{N+N_n}}, for a given period of time t. Let Y_t^o be the N × q matrix of observed values, and let Y_t^n be the N_n × q matrix of values to be interpolated for a fixed period of time t. In the same way, define F_{1t}^o as the matrix with the N rows corresponding to the observed locations, β_{1,t}^o as its coefficients, and v_{1t}^o = (v_{1t}^o(s_1), . . . , v_{1t}^o(s_N))′. Analogously, define F_{1t}^n as the matrix with the N_n rows corresponding to the non-observed locations, β_{1,t}^n as its coefficients, and v_{1t}^n = (v_{1t}^n(s_1), . . . , v_{1t}^n(s_{N_n}))′. Define as well v_{2t}^o = (v_{2t}^o(s_1), . . . , v_{2t}^o(s_N))′ and v_{2t}^n = (v_{2t}^n(s_1), . . . , v_{2t}^n(s_{N_n}))′, and let θ be the collection of parameters θ = {Σ, V, W, φ}. The extended model can be written as

[Y_t^o; Y_t^n] = [F_{1t}^o 0; 0 F_{1t}^n] [β_{1,t}^o; β_{1,t}^n] + [v_{1t}^o; v_{1t}^n],   [v_{1t}^o; v_{1t}^n] ~ N(0, I_{N+N_n}, Σ),

[β_{1,t}^o; β_{1,t}^n] = [F_{2t}^o; F_{2t}^n] β_{2,t} + [v_{2t}^o; v_{2t}^n],   [v_{2t}^o; v_{2t}^n] | φ, V, Σ ~ N( 0, [C^o C^{on}; C^{no} C^n] ⊗ V, Σ ),

β_{2,t} = G_t β_{2,t−1} + w_t,   w_t | Σ, W ~ N(0, W, Σ),
where C^o, C^{no}, C^{on} and C^n are partitions of C, obtained through the function which defines the spatial structure of β_{1,t}. Under the hypothesis that F_{1t}^o, F_{1t}^n, F_{2t}^o, F_{2t}^n and G_t are known, the joint posterior distribution of Y_t^n, β_{1,t}^n, β_{1,t}^o, β_{2,t} and θ is given by

p(Y_t^n, β_{1,t}^n, β_{1,t}^o, β_{2,t}, θ | Y_t^o)
 = p(Y_t^n, β_{1,t}^n | β_{1,t}^o, β_{2,t}, θ, Y_t^o) p(β_{1,t}^o, β_{2,t}, θ | Y_t^o)
 = p(Y_t^n | β_{1,t}^n, θ, Y_t^o) p(β_{1,t}^n | β_{1,t}^o, β_{2,t}, θ) p(β_{1,t}^o, β_{2,t}, θ | Y_t^o),    (5)

which is obtained by noting that given (β_{1,t}^n, θ, Y_t^o), Y_t^n does not depend on β_{1,t}^o and β_{2,t}, and given (β_{1,t}^o, β_{2,t}, θ), β_{1,t}^n does not depend on Y_t^o. To sample from this distribution we sample from each distribution in (5) separately. First, a sample from p(β_{1,t}^o, β_{2,t}, θ | Y_t^o) is obtained through the algorithm described in the previous section, noting that β_{1,t}^o, β_{2,t} and θ are a subset of Ψ. Then a sample from the distribution of (β_{1,t}^n | β_{1,t}^o, β_{2,t}, θ) is obtained using the result below, which follows from properties of the normal distribution: (β_{1,t}^n | β_{1,t}^o, β_{2,t}, θ) ~ N(M, H, Σ), where M = F_{2t}^n β_{2,t} + (C^{no} ⊗ V)(C^o ⊗ V)^{−1}(β_{1,t}^o − F_{2t}^o β_{2,t}) and H = (C^n − C^{no}(C^o)^{−1} C^{on}) ⊗ V.
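Computationally, the Kronecker structure keeps M and H cheap: by the mixed-product rule, (C^{no} ⊗ V)(C^o ⊗ V)^{−1} = (C^{no}(C^o)^{−1}) ⊗ I_p, so only the N × N matrix C^o needs inverting. A sketch for a univariate response (function and argument names hypothetical):

```python
import numpy as np

def interp_moments(beta1_o, beta2, F2o, F2n, Co, Con, Cno, Cn, V):
    """Mean M and covariance H of (beta_{1,t}^n | beta_{1,t}^o, beta_{2,t}, theta)."""
    A = Cno @ np.linalg.inv(Co)          # C^{no} (C^o)^{-1}
    p = V.shape[0]
    # (C^{no} (x) V)(C^o (x) V)^{-1} = A (x) I_p by the Kronecker mixed-product rule
    M = F2n @ beta2 + np.kron(A, np.eye(p)) @ (beta1_o - F2o @ beta2)
    H = np.kron(Cn - A @ Con, V)         # Schur complement (x) V
    return M, H
```

As a check on the formulas: when a "new" site coincides with an observed one, M reproduces the observed coefficients at that site and H vanishes, as expected of an exact interpolator.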
Finally, we sample from (Y_t^n | β_{1,t}^n, θ, Y_t^o), using an analogous result: (Y_t^n | β_{1,t}^n, θ, Y_t^o) ~ N(M*, H*, Σ), where M* = F_{1t}^n β_{1,t}^n and H* = I_{N_n}.
6. Application

An application of the proposed model to a pollutant data set from the Northeast of the United States is presented in this section. The data are made available by the Clean Air Status and Trends Network (CASTNet) at the web site www.epa.gov/castnet. CASTNet is the primary source for data on dry acidic deposition and rural ground-level ozone in the United States. Operating since 1987, it is used in conjunction with other national monitoring networks to provide information for evaluating the effectiveness of national emission control strategies. CASTNet consists of over 70 sites across the eastern and western United States and is cooperatively operated and funded with the National Park Service.

In this application we focus on 24 monitoring stations in the Northeast of the country, presented in Fig. 1, to model two pollutants, SO2 and NO3, measured in micrograms per cubic meter of air (μg/m³). We work with weekly measurements made from the 1st of January 1998 to the 30th of December 2004, a total of 342 periods of time. The data set, however, is not completely available: about 4% of it is missing. The estimation of the missing data is easily incorporated into our algorithm, as under the Bayesian point of view any unknown quantity can be treated as a parameter. As a way of measuring the forecasting performance of the proposed model, we worked with 22 of the 24 stations and predicted the other 2 stations (SPD and LRL) over time, so that predictions can be compared with observed values.

We work with the logarithmic transformations of SO2 and NO3 as response variables, since these transformed variables are closer to the normal distribution according to preliminary analysis. For prediction, each value sampled from Y(·) receives an exponential transformation, so that samples in the original scale of the pollutants SO2 and NO3 are obtained. As an illustration, Fig.
2 shows log(SO2) and log(NO3) through time for five monitoring stations, showing a clear seasonal trend in log(SO2) and a possible seasonal trend in log(NO3). This seasonality must be taken into consideration for the proposed model to be able to perform forecasting and spatial interpolation efficiently. Besides that, preliminary analysis shows strong correlation between the transformed response variables log(SO2) and log(NO3), estimated at 0.622. In this application we first estimate the model for a single response variable, in Section 6.1, and then using both responses log(SO2) and log(NO3), in Section 6.2.

6.1. Univariate model

Here we consider models with a single response variable (Y_t = log(SO2)_t and Y_t = log(NO3)_t). As explanatory variables, sine and cosine waves with periodicity of one year were incorporated in F_t. As we observe 52 weeks a year, we have

F_t(s_i) = (1, sin(2πt/52), cos(2πt/52))′.    (6)
That way, β_{1,t}(s_i) is a vector with three elements, β_{1,t}(s_i) = (β_{1,t}^{(1)}(s_i), β_{1,t}^{(2)}(s_i), β_{1,t}^{(3)}(s_i))′, and for every particular element of the matrix Y_t we have the equation

Y_t(s_i) = β_{1,t}^{(1)}(s_i) + β_{1,t}^{(2)}(s_i) sin(2πt/52) + β_{1,t}^{(3)}(s_i) cos(2πt/52) + v_{1t}(s_i),

for i = 1, . . . , 22 and t = 1, . . . , 342, where Y_t(s_i) is the ith element of Y_t. β_{1,t} is defined by β_{1,t} = F_{2t} β_{2,t} + v_{2t}, where v_{2t} ~ N(0, C ⊗ V, Σ), and F_{2t} is a 66 × 3 matrix given by F_{2t} = 1_{22} ⊗ I_3. The matrix C defines the spatial correlation between the regression coefficients, which in this application comprise the intercept and the sine and cosine waves. The temporal evolution of the parameters β_{2,t} is a random walk: β_{2,t} = β_{2,t−1} + w_t.
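The harmonic regressors in (6) can be generated as, for example:

```python
import numpy as np

def Ft(t, period=52):
    """Design vector (1, sin(2*pi*t/52), cos(2*pi*t/52)) of Eq. (6) for week t."""
    w = 2.0 * np.pi * t / period
    return np.array([1.0, np.sin(w), np.cos(w)])
```

At t = 0 the vector is (1, 0, 1)′, and a quarter of the way through the year (t = 13) it is approximately (1, 1, 0)′, so the pair of waves spans any annual phase.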
Fig. 1. Map of the 24 monitoring stations in the Northeast United States. The coordinates Latitude and Longitude are expressed in decimal degrees.
Fig. 2. Response variables log(SO2) and log(NO3) through time for five randomly selected monitoring stations.
The spatial correlation function which defines the matrix C was specified in the Matérn family (Matérn, 1986; Handcock and Stein, 1993), given by

C(i, j) = (1 / (2^{ν−1} Γ(ν))) (d_ij / φ)^ν K_ν(d_ij / φ),   φ, ν > 0,    (7)

where K_ν is the modified Bessel function of order ν. This is a flexible family, containing the exponential function (ν = 0.5) and the squared exponential function, which is the limiting case when ν → ∞. φ is a range parameter that controls how fast the correlation decays with distance, and ν is a smoothness (or roughness) parameter that controls
geometrical properties of the random field. For the calculation of the distance between sites, due to the large size of the region of interest, the coordinates of the monitoring stations were converted from latitude–longitude (decimal degrees) to universal transverse Mercator coordinates, which give appropriate distances in kilometers. In this application d_ij is measured in thousands of kilometers. The original scale of decimal degrees is still preserved in the figures presented throughout this section.

Zhang (2004) showed that scale and range parameters in the Matérn family cannot be estimated consistently. In our model, the scale parameters of the spatial structure are the matrices V and Σ, and φ is the range parameter. Our main goal here, however, is to be able to produce consistent interpolation in space, which according to Zhang (2004) is possible when the smoothness parameter ν is fixed. We picked one value for ν from a collection of possible values, ν ∈ {0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0}, which covers a wide range of smoothness, including the exponential function (ν = 0.5). We chose to work with ν = 2.5 after comparing forecasting performances.

The prior chosen for φ is based on the reference prior of Berger et al. (2001). They proposed the use of the reference prior for a data set consisting of N observations from a single realization of a random field. If this random field has a constant mean and variance and correlation matrix C, with ξ being the vector of parameters which define the correlation function (in our case, ξ = φ), the reference prior for ξ is given by

p(ξ) ∝ [ tr(W²) − (1/(N − 1)) (tr(W))² ]^{1/2},

where tr(A) denotes the trace of matrix A, W = ((∂/∂ξ)C) C^{−1} P and P = I_N − 1_N (1_N′ 1_N)^{−1} 1_N′. (∂/∂ξ)C denotes the matrix obtained by differentiating C element by element. We apply this idea to our model as if v_{2t} were an observed realization of a random field with constant variance. Finally, the prior distributions for the other unknown parameters of the model are non-informative, given by

β_{2,0} ~ N(0, I_4),   σ_{v1}^{−2} ~ G(10^{−5}, 10^{−5}),   W ~ IW(5, I_4),   V ~ IW(5, I_4),   Σ ~ IW(5, I_4).

Note that the first parameter in the inverted Wishart prior distributions was set to 5. This was due to Cholesky decomposition problems that occurred when this parameter was set to smaller values in simulation exercises with these models (Paez et al., 2007).
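The Matérn correlation in (7) can be evaluated with standard Bessel-function routines; a sketch (scipy is assumed available, and the guard at zero distance is our own addition since K_ν diverges there):

```python
import numpy as np
from scipy.special import gamma, kv

def matern(d, phi, nu):
    """Matern correlation of Eq. (7): (d/phi)^nu K_nu(d/phi) / (2^(nu-1) Gamma(nu))."""
    u = np.asarray(d, dtype=float) / phi
    safe = np.where(u > 0, u, 1.0)               # avoid evaluating K_nu at 0
    c = safe**nu * kv(nu, safe) / (2.0**(nu - 1.0) * gamma(nu))
    return np.where(u > 0, c, 1.0)               # correlation is 1 at zero distance
```

For ν = 0.5 this reduces to the exponential correlation exp(−d/φ), which provides a quick numerical check.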
Fig. 3. The 2.5%, 50%, and 97.5% percentiles of the sample from the posterior distribution of the elements of β_{2,t}, t = 1, . . . , 342, under the univariate model for Y_t = log(SO2).
Fig. 4. SO2 levels (μg/m³) observed at sites SPD and LRL, for t = 1, . . . , 342, and the 2.5%, 50%, and 97.5% percentiles of the sample from the posterior distribution of SO2 (μg/m³) under the univariate model. The true trajectory is in red.
Fig. 5. NO3 levels (μg/m³) observed at sites SPD and LRL, for t = 1, . . . , 342, and the 2.5%, 50%, and 97.5% percentiles of the sample from the posterior distribution of NO3 (μg/m³) under the univariate model. The true trajectory is in red.
Samples from the posterior distributions of the parameters were obtained through the algorithm described in Section 3, with a truncated normal proposal to sample from φ in the Metropolis–Hastings algorithm. The algorithm was programmed in Ox version 3.30 (Doornik, 2002). Convergence was assessed visually, through the comparison of two chains started at different locations of the parameter space. The elements of β₂ were shown to change significantly in time, as can be seen in Fig. 3 for the SO2 model. Predictions were obtained under both models: Yt = SO2 (Fig. 4) and Yt = NO3 (Fig. 5) for the sites SPD and LRL over time. The predictions for SO2 at the monitoring station SPD are good, with the posterior median almost coinciding with the observed values, which, with very few exceptions, fall inside their credibility intervals. The values of SO2 observed at the monitoring station LRL also fall, most of the time, inside their credibility intervals. However, the model is clearly underpredicting the values at this
Fig. 6. Histograms of the posterior distributions of σ₁₁, ρ₁₂, and σ₂₂.
Fig. 7. Histogram of the posterior distribution of the elements of the main diagonal in matrix V .
station. Fig. 5 shows that the model is overpredicting the values of NO3 at both stations. Besides that, the credibility intervals for the predicted values of NO3 are so wide that they are nearly useless. Most of these problems are solved by the multivariate analysis below.

6.2. Multivariate model

Consider now two response variables Yt = (Y1,t, Y2,t)′, where Y1,t = log(SO2)t and Y2,t = log(NO3)t. The model proposed here is, like (3), a multivariate version of the ones proposed for the single response of the previous section. Define
Fig. 8. Histogram of the posterior distribution of the elements of the main diagonal in matrix W .
Fig. 9. Histogram of the posterior distribution of the parameter φ.
Ft as in (6) and, for i = 1, . . . , 22, β_{1,t}(si) = (β_{1,1,t}(si)′, β_{1,2,t}(si)′)′, where β_{1,j,t}(si) = (β1,j,t(si), β2,j,t(si), β3,j,t(si))′, j = 1, 2. Here j = 1 corresponds to the coefficients of Y1 = log(SO2) and j = 2 to the coefficients of Y2 = log(NO3). Once again, the matrix C is defined through the Matérn function in (7). The prior distributions for the parameters in the model are the same as specified in the previous section, including the inverted Wishart prior distribution Σ ∼ IW(5, I₂). Once again samples from the posterior distributions of the parameters were obtained through the algorithm described in Section 3, and convergence was assessed visually. Fig. 6 shows the histograms of the elements σ₁₁ and σ₂₂ (where σij denotes the (i, j) element of Σ) and of the correlation between the two response variables, ρ₁₂ = σ₁₂/√(σ₁₁σ₂₂). It can be seen that the values taken by σ₁₁ are about double the values taken by σ₂₂, showing that log(SO2) is more volatile than log(NO3). Besides that, the estimated correlation between the two response variables is significant (as suggested by preliminary analysis). Figs. 7 and 8 show, respectively, the histograms obtained from the posterior distributions of the diagonal elements of matrices V and W. We can see that the elements of Σ have low values compared to the values assumed by the elements of V and W. The values assumed by the elements of the main diagonal of W are also small compared to those of V, showing smaller variation of the parameters β₂ through time. It was also verified that the elements of V which correspond to the covariance between the coefficients of sine and cosine are significant, with mean 3.43. The other off-diagonal elements of the matrix V are not significantly different from zero. Also, none of the off-diagonal elements of matrix W are significant, showing that there is no significant correlation in the evolution of the coefficients.
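The correlation summary ρ₁₂ = σ₁₂/√(σ₁₁σ₂₂) is computed draw by draw from the posterior sample of Σ. A minimal sketch, using simulated placeholder draws (not the paper's actual MCMC output) with magnitudes roughly in the range of Fig. 6:

```python
import math, random

# Summarize rho_12 = sigma_12 / sqrt(sigma_11 * sigma_22) from posterior
# draws of Sigma; the draws below are simulated placeholders.
random.seed(0)
draws = [(0.015 + 0.002 * random.random(),     # sigma_11
          0.003 + 0.0005 * random.random(),    # sigma_12
          0.006 + 0.001 * random.random())     # sigma_22
         for _ in range(1000)]

rho = sorted(s12 / math.sqrt(s11 * s22) for s11, s12, s22 in draws)
median = rho[len(rho) // 2]
lo, hi = rho[int(0.025 * len(rho))], rho[int(0.975 * len(rho))]
print(round(median, 3), round(lo, 3), round(hi, 3))
```

Transforming each draw before summarizing propagates the posterior uncertainty in Σ directly to ρ₁₂.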
Fig. 10. The 2.5%, 50%, and 97.5% percentiles of the sample from the posterior distribution of the elements of β_{2,t}, t = 1, . . . , 342.
Fig. 11. Predicted maps of the sample from the posterior distribution of the elements of β_{1,342}(·).
Fig. 9 shows the histogram of the parameter φ of the covariance function which defines the matrix C. The sample from the posterior distribution of φ defines correlations varying from approximately 0.7 to zero between the monitoring stations.
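This range of correlations can be checked directly. Assuming the half-integer closed form of the Matérn correlation at κ = 2.5, ρ(u) = (1 + u + u²/3)e^{−u} with u = d/φ (the exact parameterization of (7) may differ by a rescaling of φ), the correlation implied by φ between roughly 0.05 and 0.08 decays as follows with distance d in thousands of kilometers:

```python
import math

def matern25(d, phi):
    # Matern correlation with smoothness kappa = 2.5 (closed form);
    # parameterization assumed, may differ from Eq. (7) by rescaling phi
    u = d / phi
    return (1.0 + u + u * u / 3.0) * math.exp(-u)

for d in (0.05, 0.1, 0.2, 0.4, 0.8):          # thousands of km
    print(d, round(matern25(d, 0.05), 3), round(matern25(d, 0.08), 3))
```

At short inter-station distances the implied correlation is high, while at several hundred kilometers it is essentially zero, consistent with the range quoted above.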
Fig. 12. Predicted surface of SO2 levels (μg/m³) for t = 342.
Fig. 13. Predicted surface of NO3 levels (μg/m³) for t = 342.
Fig. 10 shows the median and the 95% credibility intervals of the elements of β₂ through time. Note that the variation of the elements of β₂ in time is smaller than the variation observed in the univariate models. This result suggests that the higher variation of these parameters in time tries to compensate for the smaller amount of information
Fig. 14. Plot of the posterior means of the residuals obtained for SO2 and NO3 at times t = 340, 341, and 342. Black circles represent positive residuals and white circles represent negative residuals, with the size of the circle proportional to their absolute values. The biggest circle represents 1.1 μg/m³.
provided by the univariate models. Nevertheless, the variation of some elements of β₂ in time is still significant, and models which do not allow this variation would possibly lead to poorer predictions. Thus, the use of temporally varying coefficients seems to be justified in this application. Interpolation of β₁ was performed on an equally spaced grid of points, for a fixed period of time t = 342. Fig. 11 shows a significant spatial variation of the elements of β₁, which supports the importance of allowing these parameters to vary in this dimension. Thus, the use of spatially varying coefficients also seems to be justified in this application. The variation of these elements is smooth in space, especially for the coefficients of sine and cosine of NO3. Based on the interpolated coefficients β₁, the response process Yt(·) is also interpolated at t = 342. Each value sampled from Y1,342(·) and Y2,342(·) receives an exponential transformation, so that a sample of values in the original scale of the pollutants SO2 and NO3 is obtained. The spatial variation of the posterior median of these pollutants is shown, in the original scale, in Figs. 12 and 13. To study possible spatial structure left in the residuals, we analyze the residuals obtained for both response variables, for fixed times t = 340, 341, and 342, at each of the 22 sites. Fig. 14 shows a plot of the posterior means of these residuals in space. This figure suggests there is no spatial structure left in the residuals, indicating that the model is able to capture all the spatial structure present in the data. It can also be observed that apparently there is no correlation between the residuals of SO2 and NO3. Besides that, the magnitude of the residuals is low compared to the magnitude of the data, showing that the model is able to explain most of the variability of the response variables SO2 and NO3. Finally, Figs.
15 and 16 show the predictions obtained for SO2 and NO3 at the sites SPD and LRL over time. It can be seen that the predictions are much improved compared to the ones obtained under the univariate models, showing the advantage of modeling the response variables jointly in this application. Under the multivariate model, predictions for SO2 at both sites were very satisfactory. Predictions for NO3 are also good at the station SPD, but a slight overprediction can still be observed at the monitoring station LRL.
Fig. 15. SO2 levels (μg/m³) observed at sites SPD and LRL, for t = 1, . . . , 342, and the 2.5%, 50%, and 97.5% percentiles of the sample from the posterior distribution of SO2 (μg/m³) under the multivariate model. The true trajectory is in red.
Fig. 16. NO3 levels (μg/m³) observed at sites SPD and LRL, for t = 1, . . . , 342, and the 2.5%, 50%, and 97.5% percentiles of the sample from the posterior distribution of NO3 (μg/m³) under the multivariate model. The true trajectory is in red.
7. Conclusion

A spatio-temporal class of models to treat observations made in discrete time and continuous space was presented, with examples given for univariate and multivariate responses. These models can be seen as hierarchical regression models where the coefficients (which correspond to different covariates, for example) vary smoothly in time and space. The evolution of the coefficients is made through an autoregressive process in time, and through the specification of a spatial structure in the covariance matrix (between the regression coefficients) to define a smooth transition of these coefficients in space.
The objective of this research is not only to propose this flexible class of models, but also to propose, from the Bayesian point of view, an efficient way of estimating the unknown parameters and of performing forecasting in time and interpolation in space. The proposed model was applied to a pollutant data set, where univariate and bivariate versions of the model were compared. The obtained results justify the use of two response variables in the model, which provided a great improvement in the predictions. The methodology described here was shown to be efficient for the estimation of the proposed models, which are flexible enough to give a reasonable description of many kinds of environmental processes. Computations can be problematic when a large number of monitoring stations, say more than 100, are present, but can be performed straightforwardly for the number of stations considered here. This model formulation also has the advantage of allowing a reasonably easy way of performing forecasting in time and interpolation in space, being therefore adequate for applications where predictions in time and space are desired.

Acknowledgment

The first author thanks the financial support of CAPES, the second author thanks CNPq-Brazil for research grants, and the fourth author thanks the financial support of FAPERJ. The authors also wish to thank the Editor, the Associate Editor and the two referees for their helpful comments, which led to substantial improvements in the paper.

Appendix A. Details of MCMC algorithm

Let ψ_{−Σ} be the collection of all the parameters of model (3) except Σ. ψ_{−{β₁}}, ψ_{−{β₂}}, ψ_{−β_{2,0}}, ψ_{−φ}, ψ_{−V}, and
ψ_{−W} can be analogously defined. Define Y*_t = Y_t − F′_{1t} β_{1,t}, β*_{1,t} = β_{1,t} − F′_{2t} β_{2,t}, β*_{2,t} = β_{2,t} − G_t β_{2,t−1}, t = 1, . . . , T, and β*_{2,0} = β_{2,0} − M₂. In the sequel some probabilistic results are presented. These results are needed for the implementation of the algorithm presented in Section 3, with the objective of generating samples from the posterior distributions of the unknown parameters of the model. The first result is the full conditional distribution of Σ, denoted by p(Σ | ψ_{−Σ}, Y), which is given below:

p(Σ | ψ_{−Σ}, Y) ∝ p(β_{2,0} | Σ) p(Σ) ∏_{t=1}^{T} p(Y_t | β_{1,t}, Σ) p(β_{1,t} | β_{1,t−1}, β_{2,t}, Σ, V, φ) p(β_{2,t} | β_{2,t−1}, Σ, W)

∝ |Σ|^{−N₂/2} exp{−(1/2) tr(β*′_{2,0} C₂⁻¹ β*_{2,0} Σ⁻¹)} |Σ|^{−((T₀/2)+q)} exp{−(1/2) tr(S₀ Σ⁻¹)}

 × |Σ|^{−T(N+N₁+N₂)/2} exp{−(1/2) tr[ ∑_{t=1}^{T} (Y*′_t Y*_t + β*′_{1,t} V₂⁻¹ β*_{1,t} + β*′_{2,t} W⁻¹ β*_{2,t}) Σ⁻¹ ]}

= |Σ|^{−(((T₀+N₂+T(N+N₁+N₂))/2)+q)} exp{−(1/2) tr[ (S₀ + β*′_{2,0} C₂⁻¹ β*_{2,0} + ∑_{t=1}^{T} (Y*′_t Y*_t + β*′_{1,t} V₂⁻¹ β*_{1,t} + β*′_{2,t} W⁻¹ β*_{2,t})) Σ⁻¹ ]}

⇒ (Σ | ψ_{−Σ}, Y) ∼ IW(T₀ + N₂ + T(N₁ + N₂ + N), S),

where S = S₀ + β*′_{2,0} C₂⁻¹ β*_{2,0} + ∑_{t=1}^{T} (Y*′_t Y*_t + β*′_{1,t} V₂⁻¹ β*_{1,t} + β*′_{2,t} W⁻¹ β*_{2,t}).
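Since this full conditional is inverted Wishart, the corresponding Gibbs step reduces to accumulating residual cross-products into the scale matrix and drawing from an IW distribution. A sketch with toy dimensions and simulated residuals standing in for Y*_t, β*_{1,t}, and β*_{2,t} (a Bartlett-decomposition Wishart sampler is assumed; this is not the paper's Ox code):

```python
import numpy as np

rng = np.random.default_rng(42)

def rwishart(df, scale):
    # Wishart(df, scale) draw via the Bartlett decomposition
    q = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((q, q))
    for i in range(q):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        for j in range(i):
            A[i, j] = rng.standard_normal()
    LA = L @ A
    return LA @ LA.T

def sample_sigma(df, S):
    # Sigma ~ IW(df, S) obtained by inverting a Wishart(df, S^{-1}) draw
    return np.linalg.inv(rwishart(df, np.linalg.inv(S)))

# toy residual matrices (N2 = 2 columns) and toy degree-of-freedom count
S = np.eye(2)                      # prior scale S_0
for _ in range(50):                # T = 50 toy time steps
    e = rng.standard_normal((3, 2))
    S += e.T @ e                   # residual cross-product contribution
Sigma = sample_sigma(df=5 + 50 * 3, S=S)
print(Sigma.shape)
```

The same recipe applies to every inverted Wishart full conditional in this appendix; only the scale accumulation and the degrees of freedom change.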
As it is easy to sample from the full conditional distribution of Σ, a Gibbs sampling step can be used to sample this parameter in the MCMC algorithm. The full conditional distributions of the parameters W, V, and {β₁} are also well-known, easy-to-sample distributions, as shown in the results below, so they can all be sampled through Gibbs steps. The full conditional distribution of W is given by
p(W | ψ_{−W}, Y) ∝ p(W) ∏_{t=1}^{T} p(β_{2,t} | W, Σ, β_{2,t−1})

∝ |W|^{−((T_{W0}/2)+N₂)} exp{−(1/2) tr(S_{W0} W⁻¹)} |W|^{−Tq/2} exp{−(1/2) tr[ ∑_{t=1}^{T} β*_{2,t} Σ⁻¹ β*′_{2,t} W⁻¹ ]}

= |W|^{−((T_{W0}/2)+(Tq/2)+N₂)} exp{−(1/2) tr[ (S_{W0} + ∑_{t=1}^{T} β*_{2,t} Σ⁻¹ β*′_{2,t}) W⁻¹ ]}.

It can be noticed that the density above is proportional to an inverted Wishart density, and we have that

W | ψ_{−W}, Y ∼ IW(T_{W0} + Tq; S_{W0} + ∑_{t=1}^{T} β*_{2,t} Σ⁻¹ β*′_{2,t}).
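The W update therefore amounts to accumulating the sandwich products β*_{2,t} Σ⁻¹ β*′_{2,t} and counting degrees of freedom. An illustrative sketch, with toy dimensions and residuals and with T_{W0} = 5 as in the prior of Section 6:

```python
import numpy as np

# Assemble the inverted-Wishart full conditional of W: degrees of
# freedom T_W0 + T*q and scale S_W0 + sum_t b_t Sigma^{-1} b_t', with
# toy residuals b_t standing in for beta*_{2,t} (q x N2).
rng = np.random.default_rng(7)
T, q, N2 = 30, 4, 2
Sigma_inv = np.linalg.inv(np.array([[0.02, 0.004],
                                    [0.004, 0.008]]))

S = np.eye(q)                          # prior scale S_W0
for _ in range(T):
    b = 0.1 * rng.standard_normal((q, N2))
    S += b @ Sigma_inv @ b.T           # sandwich accumulation
df = 5 + T * q                         # T_W0 = 5 from the prior IW(5, I)
print(df, S.shape)
```

A draw from IW(df, S) then completes the Gibbs step, exactly as sketched for Σ above.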
The full conditional distribution of V is given by

p(V | ψ_{−V}, Y) ∝ p(V) ∏_{t=1}^{T} p(β_{1,t} | β_{2,t}, C, V, Σ)

∝ |V|^{−((T_{V0}/2)+N₂)} exp{−(1/2) tr(S_{V0} V⁻¹)} |V₂|^{−Tq/2} exp{−(1/2) tr[ ∑_{t=1}^{T} β*′_{1,t} V₂⁻¹ β*_{1,t} Σ⁻¹ ]}

∝ |V|^{−(((T_{V0}+TNq)/2)+N₂)} exp{−(1/2) tr( S_{V0} V⁻¹ + ∑_{t=1}^{T} β*′_{1,t} (C ⊗ V)⁻¹ β*_{1,t} Σ⁻¹ )}.

Using the property (C ⊗ V)⁻¹ = C⁻¹ ⊗ V⁻¹, and defining K_{ij} = (C⁻¹)_{ij}, we have the following result:

p(V | ψ_{−V}, Y) ∝ |V|^{−(((T_{V0}+TNq)/2)+N₂)} exp{−(1/2) tr[ (S_{V0} + ∑_{t=1}^{T} ∑_{i=1}^{N} ∑_{j=1}^{N} K_{ji} (β_{i,1,t} − β_{2,t}) Σ⁻¹ (β_{j,1,t} − β_{2,t})′) V⁻¹ ]}.

The above density is proportional to an inverted Wishart density, which is given below:

V | ψ_{−V}, Y ∼ IW(T_{V0} + TNq; S_{V0} + ∑_{t=1}^{T} ∑_{i=1}^{N} ∑_{j=1}^{N} K_{ji} (β_{i,1,t} − β_{2,t}) Σ⁻¹ (β_{j,1,t} − β_{2,t})′).
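The simplification above rests on the Kronecker identity (C ⊗ V)⁻¹ = C⁻¹ ⊗ V⁻¹, which avoids ever forming the Nq × Nq inverse directly. It can be verified numerically for small matrices:

```python
import numpy as np

# Numerical check of (C (x) V)^{-1} = C^{-1} (x) V^{-1} for a small
# exponential-correlation C and a positive-definite V (both illustrative).
rng = np.random.default_rng(0)
C = np.exp(-np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0))) / 2)
A = rng.standard_normal((3, 3))
V = A @ A.T + 3 * np.eye(3)        # a positive-definite stand-in for V

lhs = np.linalg.inv(np.kron(C, V))
rhs = np.kron(np.linalg.inv(C), np.linalg.inv(V))
print(np.allclose(lhs, rhs))
```

In the V update, only C⁻¹ (of size N × N) and V⁻¹ (of size q × q) are ever needed.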
The full conditional distribution of {β₁} is given by

p({β₁} | ψ_{−{β₁}}, Y) ∝ ∏_{t=1}^{T} p(Y_t | β_{1,t}, Σ) p(β_{1,t} | β_{2,t}, Σ, V, φ)

∝ exp{−(1/2) tr[ ∑_{t=1}^{T} (Y*′_t Y*_t + β*′_{1,t} V₂⁻¹ β*_{1,t}) Σ⁻¹ ]}

= exp{−(1/2) tr[ ∑_{t=1}^{T} (Y′_t Y_t − β′_{1,t} F_{1t} Y_t − Y′_t F′_{1t} β_{1,t} + β′_{1,t} F_{1t} F′_{1t} β_{1,t} + β′_{2,t} F_{2t} V₂⁻¹ F′_{2t} β_{2,t} − β′_{2,t} F_{2t} V₂⁻¹ β_{1,t} − β′_{1,t} V₂⁻¹ F′_{2t} β_{2,t} + β′_{1,t} V₂⁻¹ β_{1,t}) Σ⁻¹ ]}

∝ exp{−(1/2) tr[ ∑_{t=1}^{T} ((β_{1,t} − B_t⁻¹ M_t)′ B_t (β_{1,t} − B_t⁻¹ M_t)) Σ⁻¹ ]},

where B_t = F_{1t} F′_{1t} + V₂⁻¹ and M_t = F_{1t} Y_t + V₂⁻¹ F′_{2t} β_{2,t}. We conclude that β_{1,t} | ψ_{−β_{1,t}}, Y ∼ N(B_t⁻¹ M_t; B_t⁻¹; Σ).

The full conditional distribution of {β₂} does not have an easy-to-sample closed form, and is given by

p({β₂} | ψ_{−{β₂}}, Y) = p(β_{2,T} | β₁, W, Σ, φ, V) ∏_{t=0}^{T−1} p(β_{2,t} | β_{2,t+1}, . . . , β_{2,T}, β₁, W, Σ, φ, V).

To sample from this distribution we use the FFBS algorithm. This algorithm is based on the following results:

p(β_{2,t} | β_{2,t+1}, . . . , β_{2,T}, β₁, W, Σ, φ, V) = p(β_{2,t} | β_{2,t+1}, β_{1,1}, . . . , β_{1,t}, W, Σ, φ, V)

and

p(β_{2,t} | β_{2,t+1}, β_{1,1}, . . . , β_{1,t}, W, Σ, φ, V) ∝ p(β_{2,t+1} | β_{2,t}, W, Σ) p(β_{2,t} | β_{1,1}, . . . , β_{1,t}, W, Σ, φ, V),

for t = 1, . . . , T − 1. It can be shown that (β_{2,t} | β_{2,t+1}, W, Σ) ∼ N(h_t, H_t, Σ), where H_t = (C_t⁻¹ + G′_t W⁻¹ G_t)⁻¹, h_t = H_t (C_t⁻¹ M_t + G′_t W⁻¹ β_{2,t+1}), and

M_t = G_t M_{t−1} + R_t F_{2t} Q_t⁻¹ (β_{1,t} − F′_{2t} G_t M_{t−1}), M₀ = M_{2,0} = M₂,

C_t = R_t − R_t F_{2t} Q_t⁻¹ F′_{2t} R_t, Q_t = V₂ + F′_{2t} R_t F_{2t}, R_t = W + G_t C_{t−1} G′_t, C₀ = C_{2,0} = C₂.

The above quantities are obtained through the usual updating equations (West and Harrison, 1997), taking β₁ as if they were the response variables. After successive updates, β_{2,T} | ψ_{−β_{2,T}} ∼ N(M_T, C_T, Σ) is obtained, where M_T and C_T are defined through the equations for M_t and C_t given above, with t = T. The algorithm works as follows: for t varying from 1 to T, evaluate M_t, C_t, Q_t, and R_t. Then sample β_{2,T} from N(M_T, C_T, Σ), and for t varying from (T − 1) to zero sample β_{2,t} from N(h_t, H_t, Σ).
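The FFBS pass has the same shape in the simplest scalar case. A sketch for a univariate local-level model (G_t = F_t = 1), which mirrors the matrix-variate recursions above with all quantities scalar and all numbers illustrative:

```python
import random, math

random.seed(3)
T, V, W = 100, 1.0, 0.1             # obs. variance V, evolution variance W
theta = [0.0]
y = []
for t in range(T):                   # simulate a state path and observations
    theta.append(theta[-1] + random.gauss(0, math.sqrt(W)))
    y.append(theta[-1] + random.gauss(0, math.sqrt(V)))

# forward filter: m_t, C_t (Kalman updating with G = F = 1)
m, C = [0.0], [10.0]                 # prior mean M_0 and variance C_0
for t in range(T):
    R = C[-1] + W                    # R_t = C_{t-1} + W
    Q = R + V                        # Q_t = R_t + V
    A = R / Q
    m.append(m[-1] + A * (y[t] - m[-1]))
    C.append(R - A * R)              # C_t = R_t - R_t^2 / Q_t

# backward sampler: draw theta_T, then theta_t | theta_{t+1}, D_t
draw = [0.0] * (T + 1)
draw[T] = random.gauss(m[T], math.sqrt(C[T]))
for t in range(T - 1, -1, -1):
    H = 1.0 / (1.0 / C[t] + 1.0 / W)          # H_t = (C_t^{-1} + W^{-1})^{-1}
    h = H * (m[t] / C[t] + draw[t + 1] / W)   # h_t
    draw[t] = random.gauss(h, math.sqrt(H))
print(len(draw))
```

Each sweep produces one joint draw of the whole state trajectory, which is what the Gibbs step for {β₂} requires.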
References

Aguilar, O., West, M., 1998. Analysis of hospital quality monitors using hierarchical time series models. In: Gatsonis, C., et al. (Eds.), Bayesian Statistics in Science and Technology: Case Studies 4. Springer, Berlin.
Berger, J.O., de Oliveira, V., Sansó, B., 2001. Objective Bayesian analysis of spatially correlated data. J. Amer. Statist. Assoc. 96, 1361–1374.
Cargnoni, C., Müller, P., West, M., 1997. Bayesian forecasting of multinomial time series through conditionally Gaussian dynamic models. J. Amer. Statist. Assoc. 87, 493–500.
Carter, C.K., Kohn, R., 1994. On Gibbs sampling for state space models. Biometrika 81, 541–553.
Dawid, A.P., 1981. Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika 68, 265–274.
Doornik, J.A., 2002. Object-Oriented Matrix Programming Using Ox, third ed. Timberlake Consultants Press, London. www.nuff.ox.ac.uk/Users/Doornik.
Frühwirth-Schnatter, S., 1994. Data augmentation and dynamic linear models. J. Time Ser. Anal. 15, 183–202.
Gamerman, D., Lopes, H.F., 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, second ed. Chapman & Hall, London.
Gamerman, D., Migon, H.S., 1993. Dynamic hierarchical models. J. Roy. Statist. Soc. Ser. B 55 (3), 629–642.
Gelfand, A.E., Banerjee, S., Gamerman, D., 2005. Spatial process modelling for univariate and multivariate dynamic spatial data. Environmetrics 16, 465–479.
Gelfand, A.E., Kim, H.-J., Sirmans, C.F., Banerjee, S., 2003. Spatial modeling with spatially varying coefficient processes. J. Amer. Statist. Assoc. 98, 387–396.
Handcock, M.S., Stein, M.L., 1993. A Bayesian analysis of kriging. Technometrics 35, 403–410.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Huang, H.C., Cressie, N.A.C., 1996. Spatio-temporal prediction of snow water equivalent using the Kalman filter. Comput. Statist. Data Anal. 22, 157–175.
Huerta, G., Sansó, B., Stroud, J.R., 2004. A spatio-temporal model for Mexico City ozone levels. Appl. Statist. 53, 231–248.
Landim, F.M.P.F., 1998. Dynamic hierarchical matrix-variate models. Unpublished Ph.D. Thesis, Universidade Federal do Rio de Janeiro (in Portuguese).
Landim, F.M.P.F., Gamerman, D., 2000. Dynamic hierarchical models—an extension to matrix-variate observations. Comput. Statist. Data Anal. 35, 11–42.
Mardia, K.V., Goodall, C., Redfern, E., Alonso, F., 1998. The kriged Kalman filter. Test 7, 217–285.
Matérn, B., 1986. Spatial Variation, second ed. Springer, Berlin.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092.
Paez, M.S., Gamerman, D., Landim, F.M.P.F., Salazar, E., 2007. Spatially varying dynamic coefficient models. Technical Report, Statistical Laboratory, Universidade Federal do Rio de Janeiro.
Quintana, J., 1987. Multivariate Bayesian forecasting models. Unpublished Ph.D. Thesis, University of Warwick.
Sansó, B., Guenni, L., 1999. Venezuelan rainfall data analyzed using a Bayesian space–time model. Appl. Statist. 48, 345–362.
Stroud, J.R., Müller, P., Sansó, B., 2001. Dynamic models for spatiotemporal data. J. Roy. Statist. Soc. Ser. B 63, 673–689.
Wackernagel, H., 1995. Multivariate Geostatistics. Springer, Berlin.
West, M., Harrison, P.J., 1997. Bayesian Forecasting and Dynamic Models, second ed. Springer, New York.
Wikle, C.K., Cressie, N., 1999. A dimension-reduction approach to space–time Kalman filtering. Biometrika 86, 815–829.
Zhang, H., 2004. Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. J. Amer. Statist. Assoc. 99, 250–261.