An autoregressive spatio-temporal precipitation model

An autoregressive spatio-temporal precipitation model

Available online at www.sciencedirect.com Procedia Environmental Sciences 3 (2011) 2–7 1st Conference on Spatial Statistics 2011 – Mapping Global Ch...

154KB Sizes 2 Downloads 80 Views

Available online at www.sciencedirect.com

Procedia Environmental Sciences 3 (2011) 2–7

1st Conference on Spatial Statistics 2011 – Mapping Global Change

An autoregressive spatio-temporal precipitation model Fabio Sigrist∗, Hans R. K¨unsch, Werner A. Stahel Seminar for Statistics, ETH Zurich, 8092 Zurich, Switzerland

Abstract A spatio-temporal model for precipitation is presented. It is assumed that precipitation follows a censored and power-transformed normal distribution. Through a regression term, precipitation is linked to covariates. Spatial and temporal dependencies are accounted for by a latent Gaussian variable that follows a Markovian temporal evolution combined with spatially correlated innovations. Such a specification allows for nonseparable covariances in space and time. Further, the Markovian structure yields computational efficiency and it exploits in a natural way the unidirectional flow of time. In addition, the model is space as well as time resolution consistent. The model is applied to three-hourly Swiss rainfall data, collected at 26 stations. Keywords: Precipitation modeling, Space-time model, Bayesian hierarchical model, Markov chain Monte Carlo (MCMC) method, Censoring, Gaussian random field

1. Introduction Precipitation is a very complex phenomenon that varies in space and time. It can be characterized by statistical models. Statistical models are used to address important problems in areas such as agriculture, climate science, ecology, and hydrology. They can be used as stochastic generators to provide realistic inputs to flooding, runoff, and crop growth models. Moreover, they can be applied as components within general circulation models used in climate change studies, or for postprocessing precipitation forecasts. A characteristic feature of precipitation is that its distribution consists of a discrete component, indicating occurrence of precipitation, and a continuous one, determining the amount. As a consequence, there are two basic statistical modeling approaches. The continuous and the discrete part are either modelled separately ([1], [2], [3]) or together ([4], [5], [6], [7], [8]). Since precipitation exhibits structured variation across space and time, models need to incorporate spatial as well as temporal dependencies. A simple approach combines correlations at a single site across time with correlations at a single time point across space. For realistic models for data where the time spacing is relatively small, e.g., smaller than a day, this is not enough, and a non-separable spatio-temporal covariance structure is needed. ∗ Corresponding

author. Tel.:+41 44 633 23 10. Email address: [email protected] (Fabio Sigrist)

1878-0296 © 2011 Published by Elsevier doi:10.1016/j.proenv.2011.02.002

F. Sigrist et al. / Procedia Environmental Sciences 3 (2011) 2–7

3

2. The model The model presented in the following determines the distribution of the rainfall amounts and the probability of rainfall together by using a censored distribution. Originally, this approach goes back to Tobin [9] who analyzed household expenditure on durable goods. For modeling precipitation, [10] took up this idea and modified it by including a power-transformation for the non-zero part so that the model can account for skewness. The model presented in the following is a regression model, which means that precipitation is linked to covariates. Spatiotemporal dependencies are modeled via a latent Gaussian process that follows a temporal autoregressive convolution with spatially correlated innovations. To be more specific, let the rainfall Yt (si ) at time t = 1, . . . , T on site si , i = 1, . . . , N, depend on a latent normal variable Wt (si ) through Yt (si ) = 0,

if Wt (si ) ≤ 0, λ

= Wt (si ) , if Wt (si ) > 0,

(1)

where λ > 0. The latent variable Wt (si ) can be interpreted as a precipitation potential. The variables Wt (si ) are assumed to depend linearly on the regressors xt (si ) ∈ Rk with an error term showing both spatial and temporal correlations. For notational convenience, we split the error term into an uncorrelated “nugget” part νt (si ) and a part t (si ) accounting for correlations, Wt (si ) = xt (si )T b + t (si ) + νt (si ),

(2)

where b ∈ Rk and the νt (si ) are independent and identically distributed (iid), νt (si ) ∼ N(0, τ2 ), τ2 > 0. 2.1. Modeling spatio-temporal dependencies For modeling the process t (si ), we assume an explicit time evolution with spatially correlated innovations ([11], [12]). Writing  t = (t (s1 ), . . . , t (sN )) , it is assumed that  t = φGt−1 + ξt , G ∈ RN×N .

(3)

Note that we assume a linear autoregressive function, i.e., a vector autoregression, so that  t remains Gaussian for all t. The innovations ξt are assumed to be independent over time and to follow a stationary, isotropic Gaussian random depend on the distances field with mean zero, ξt ∼ N(0, σ2 · V ρ0 ), σ2 > 0. It is assumed  covariances  the spatial   that between sites through an exponential covariance function, i.e., V ρ0 = exp −di j /ρ0 , ρ0 > 0, 1 ≤ i, j, ≤ N,where ij di j denotes the distance between two sites i and j. In contrast to assuming an explicit space-time covariance function (see, e.g., [13], [14], [15], [16], [17]), we exploit the unidirectional flow of time. This approach has computational benefits, compared to an explicit space-time covariance specification, since it allows for a convenient factorization of the likelihood, thus avoiding extensive matrix decompositions. 2.2. The convolution autoregressive model The N × N matrix G governing the evolution is specified using a parametric function. This has the obvious advantage that less parameters are needed than in the general case, in which each entry in the matrix has to be estimated, resulting in N 2 parameters. Moreover, the parametric approach allows for making predictions at sites where no measurements are available, which is often essential in applications. We propose to use a model that is motivated by an autoregressive convolution of the form  t (s) = φ hθ (s − s )t−1 (s )ds + ξt (s), s ∈ R2 , (4) R2

where hθ (·) is a parametric spatial convolution kernel with parameters θ. We opt for a Gaussian kernel, hθ (·),   hθ (s − s ) = exp −(s − s − μ)T Σ−1 (s − s − μ) ,

(5)

4

F. Sigrist et al. / Procedia Environmental Sciences 3 (2011) 2–7

where θ is the vector containing the elements of μ and Σ. [18] show that the Gaussian function is the only kernel for a stationary, continuous time process that satisfies a reasonable constraint called the Lindeberg condition. For our application, we restrict the area over which the integral is taken to the area A ⊂ R2 in which the measurement sites lie. The integral is then approximated as follows. Assuming that the t (si )’s lie on a grid with disjoint cells N Ai , we approximate Ai , i = 1, . . . , N, A = i=1  t (si ) = φ A

hθ (si − s )t−1 (s )ds + ξt (si ) ≈ φ

N 

hθ (si − s j )t−1 (s j )|A j | + ξt (si ),

(6)

j=1

where |A j | denotes the area of cell A j . If the sites si do not lie on a regular grid, we propose to use the Voronoi tessellation ([19]) which decomposes the space as follows. Each site si has a corresponding Voronoi cell consisting of all points closer to si than to any other site s j , j  i. See, e.g., [20] for more details. N ˜ Regarding the determination of the area A, we first calculate the Voronoi tessellation R2 = i=1 Ai . The area of the cells at the border is then set equal to the average of the neighbouring non-border cells. In Figure 1, an example of the Voronoi tessellation, used in the application below, is shown. The choice of A also needs to be specified. Rather than specifying somewhat arbitrarily an area A, over which the N ˜ Ai . The area of the cells at the border is convolution is made, we first calculate the Voronoi tessellation R2 = i=1 then set equal to the average of the neighbouring non-border cells, thus obtaining a tessellation Ai , i = 1, . . . , N, that N yields an area A = i=1 Ai . In Figure 1, an example of the Voronoi tessellation for the Swiss stations, used in the application below, is shown. 2.3. Specific parametrizations of the kernel function Concerning the parameters μ and Σ of the kernel, note that μ is a parameter that can be interpreted as an external drift, whereas Σ determines the range of spatial correlation and can account for non-isotropy. First, assuming no external drift and isotropy, we consider ⎞ ⎛1 ⎜⎜⎜ ρ2 0 ⎟⎟⎟ −1 1 μ = 0 and Σ = ⎜⎜⎝ (7) ⎟⎟ . 0 ρ12 ⎠ 1



 The convolution kernel then reduces to hθ (s − s ) = exp − ((s − s )/ρ1 )2 . This model will be referred to as the isotropic convolution autoregressive model. An extension is obtained by relaxing the isotropy assumption and by allowing for μ  0. Let

T 

 cos α/A sin α/A cos α/A sin α/A cos ϕ , (8) μ=R· and Σ−1 = − sin α/B cos α/B − sin α/B cos α/B sin ϕ where R ≥ 0, ϕ ∈ [−π, π], 0 ≤ A ≤ B, and α ∈ [−π/2, π/2]. The drift term μ could even be time dependent. For instance, if information on wind is available, this quantity would lend itself naturally to be used as drift. We call this model the non-isotropic drift convolution autoregressive model. Finally, taking G to be the identity, a simple time autoregressive model ([21]) is obtained  t = φ t−1 + ξt .

(9)

With this specification, each point at time t − 1 only has an influence on itself at time t. I.e., there is no spatio-temporal interaction. Henceforth, we will refer to this model as the simple autoregressive model. 2.4. Discussion of the model The convolution model has the advantage that it is “space resolution consistent”, i.e., it retains its temporal Markovian structure if a site is removed and the distribution of the latent process does not depend on the locations of the stations. A random field t (s), (s, t) ∈ R2 × R is said to have a separable covariance structure (see [22]) if there exist purely spatial and purely temporal covariance functions CS and CT , respectively, such that cov(t1 (s1 ), t2 (s2 )) =

F. Sigrist et al. / Procedia Environmental Sciences 3 (2011) 2–7

CS (s1 , s2 ) · CT (t1 , t2 ). The convolution based approach allows for nonseparable covariance structures, whereas the simple autoregressive model has a separable covariance structure. Finally, concerning stationarity of  t , we note that the largest eigenvalue of φG needs to be smaller than 1 in order that  t is stationary. In our application, we check this condition after fitting the models. 2.5. Fitting Fitting is done using a Markov chain Monte Carlo method (MCMC) known as the Metropolis-Hastings algorithm ([23], [24]). In analogy to  t , we define the vectors W t , t = 1, . . . , T , as W t = (Wt (s1 ), . . . , Wt (sN )) . Since we have censored data and sometimes missing values, we follow a data augmentation approach proposed by [25]. Our goal is then to simulate from the joint posterior distribution of τ2 , σ2 , φ, ρ0 , θ, λ, b,  = ( 1 , . . . ,  T ),  0 , W = (W 1 , . . . , W T ). We note that those Wt (si ) that correspond to observed values above zero Yt (si ) > 0 are known. In that case, the full conditional 2 2 distribution consists of  a Dirac distribution. The prior distributions are specified as P[τ , σ , φ, ρ0 , ρ1 , λ, b,  0 ] ∝ 1 1 2 2 2 P [θ] P  0 |σ , ρ0 with  0 having a normal prior P[ 0 |σ , ρ0 ] = N(0, σ · V ρ0 ). The prior of θ is defined in the τ2 σ2 application in Section 3. The joint posterior distribution is then proportional to

(N(T +1))/2+1 NT/2+1   Yt (si )1/λ−1 1 1 1 1  −1 −(T +1)/2 · exp − |V ρ0 |  V  0 · P[θ] · 1{Wt (si )≤0;Yt (si )=0} λ 2 σ2 0 ρ0 σ2 τ2 Yt (si )>0 (10)

T    1 1 1  T 2 −1 · exp − ||W t − xt b −  t || + 2  t − φg( t−1 ) V ρ0  t − φg( t−1 ) . 2 t=1 τ2 σ The product in the first line is the Jacobian for the power transformation in (1). For most parameters, including the latent field, we have known distribution functions as full conditionals. Gibbs steps ([26]) are therefore applied for these parameters. For λ, ρ0 , and θ, Metropolis steps will be used. In doing so, ρ0 and θ are sampled together.

SHA GUT BAS

250

RUE

KLO REH SMA

BUS

WYN

STG HOE

WAE

CHA

CDF

VAD BER

200

TAE

NAP

LUZ

GLA ALT

FRE PLF INT

150

MLS

100

AIG

500

550

600

650

700

750

800

Figure 1: Locations of stations. Both axis are in km using the Swiss coordinate system (CH1903). The lines illustrate the Voronoi tessellation. The boundary cells are represented by circles. The area of a circle corresponds to the area of a boundary cell and is determined as outlined in Section 2.2

3. An application 3.1. The data The models are applied to a dataset of three-hourly precipitation amounts collected by 26 stations around the Swiss Middleland between December 2008 and March 2009. The data is kindly provided by MeteoSchweiz. The locations

5

6

F. Sigrist et al. / Procedia Environmental Sciences 3 (2011) 2–7

of these stations are shown in Figure 1. The covariates consist of the x- and y-coordinates, altitude, global radiation, relative air humidity 2 m above ground, sunshine duration, and air temperature. Covariates are centered around their means in order to avoid correlations of the regression coefficients with the intercept and to reduce posterior correlations. We also include an interaction term of the coordinates, and the squared altitude. Occasionally, some covariates are not observed at all locations. In that case, we apply spatial interpolation using thin plate splines ([27], [28]). The number of basis functions is chosen using generalized cross-validation ([29]). 3.2. Results The three different models (see (7), (8), and (9)) presented in Section 2 were fitted. Concerning the isotropic convolution model as defined in, we observe very strong posterior correlation between φ and ρ1 when using noninformative priors. This results in strong autocorrelation and slow mixing properties of the corresponding Markov chains. This problem appears similar to the difficulties in model-based geostatistics when estimating the variance and scale parameters of the exponential covariogram (see, e.g., [30], [31], [32], [33], [34]). Hence, we fit the isotropic convolution model for various choices of ρ1 and then select the one which minimizes the deviance information criterion (DIC) ([35]). Concerning the non-isotropic drift convolution model, we assume a uniform prior on [−π/2, π/2] for α, a uniform prior on [−π, π] for ϕ, and independent and locally uniform priors for A, B, and R, i.e., P[A, B, R] ∝ 1. Here, we do not observe such slow-mixing problems as in the case of the isotropic model. For all models, after a burn-in of 5 000 draws, 495 000 samples from the Markov chain were used to characterize posterior distributions. Convergence was monitored by inspecting trace plots. Model checking was done as briefly outlined in the following. We use a tool called the Primary Posterior Predictive Distribution (PPPD), see [36] for more details, for selecting the model that provides the best fit to the spatiotemporal dependencies observed in the data. Based on this, we conclude that the simple autoregressive model does not accurately account for the spatio-temporal correlations. This is a clear indicator that the separability assumption is too restrictive and needs to be relaxed. On the other hand, both the isotropic convolution model and the non-isotropic drift convolution model provide good fits to the spatio-temporal dependence structure of the observed precipitation. Since the isotropic convolution model has a lower DIC than the non-isotropic drift convolution model, we conclude that the isotropic convolution model provides the best fit to the data. We have also examined the fit of the marginal distribution at individual stations and found good agreement between observed quantities and fitted ones. 4. Conclusion A spatio-temporal model for precipitation has been presented. The model determines the probability of rainfall and the rainfall amount distribution together. This is done using a latent Gaussian variable that depends linearly on covariates. A Markovian temporal evolution with spatially correlated innovations accounts for spatio-temporal dependency. The model can be extended by relaxing one or several important assumptions. First, an external drift, such as wind, can be included in the specification of the autoregressive kernel function. In the case of spatially high resolved data, the innovation terms can be Markov random fields. Finally, the primary parameters might evolve dynamically over time. References [1] R. Coe, R. Stern, Fitting models to daily rainfall data., Journal of Applied Meteorology 21 (7) (1982) 1024–1031. [2] D. Wilks, Multisite downscaling of daily precipitation with a stochastic weather generator, Climate Research 11 (2) (1999) 125–136. [3] E. Bellone, J. Hughes, P. Guttorp, A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts, Climate Research 15 (1) (2000) 1–12. [4] T. Bell, A space-time stochastic model of rainfall for satellite remote-sensing studies, Journal of Geophysical Research 92 (D8) (1987) 9631–9643. [5] D. Wilks, Maximum likelihood estimation for the gamma distribution using data containing zeros, Journal of Climate 3 (12) (1990) 1495– 1501.

F. Sigrist et al. / Procedia Environmental Sciences 3 (2011) 2–7 g

/

(

)

[6] A. Bardossy, E. Plate, Space-time model for daily rainfall using atmospheric circulation patterns, Water Resources Research 28 (5) (1992) 1247–1259. [7] M. Hutchinson, Stochastic space-time weather models from ground-based data, Agricultural and Forest Meteorology 73 (3-4) (1995) 237– 264. [8] B. Sanso, L. Guenni, A Bayesian approach to compare observed rainfall data to deterministic simulations, Environmetrics 15 (6) (2004) 597–612. [9] J. Tobin, Estimation of relationships for limited dependent variables, Econometrica 26 (1958) 24–36. [10] C. K. Stidd, Estimating the precipitation climate, Water Resources Research 9 (1973) 1235–1241. [11] K. Solna, P. Switzer, Time trend estimation for a geographic region, Journal of the American Statistical Association 91 (434) (1996) 577–589. [12] C. K. Wikle, N. Cressie, A dimension-reduced approach to space-time Kalman filtering, Biometrika 86 (4) (1999) 815–829. [13] R. Jones, Y. Zhang, Models for continuous stationary space-time processes, in: T. Gregoire, D. R. Brillinger, P. J. Diggle, E. Russek-Cohen, W. G. Warren, R. Wolfinger (Eds.), Modelling Longitudinal and Spatially Correlated Data, Vol. 122 of Lecture Notes in Statistics, SpringerVerlag, New-York, 1997, pp. 289–298. [14] N. Cressie, H.-C. Huang, Classes of nonseparable, spatio-temporal stationary covariance functions, Journal of the American Statistical Association 94 (448) (1999) 1330–1340. [15] T. Gneiting, Nonseparable, stationary covariance functions for space-time data, Journal of the American Statistical Association 97 (458) (2002) 590–600. [16] C. Ma, Families of spatio-temporal stationary covariance models, Journal of Statistical Planning and Inference 116 (2) (2003) 489 – 501. [17] M. L. Stein, Space-time covariance functions, Journal of the American Statistical Association 100 (2005) 310–321. [18] P. E. Brown, K. F. Karesen, G. O. Roberts, S. Tonellato, Blur-generated non-separable space-time models, Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62 (4) (2000) 847–860. [19] G. Voronoi, Nouvelles applications des param`etres continus a` la th´eorie des formes quadratiques. Deuxi`eme m´emoire. Recherches sur les parall´ello`edres primitifs., Journal f¨ur die reine und angewandte Mathematik (Crelles Journal) 1908 (134) (1908) 198–287. [20] A. Okabe, B. Boots, K. Sugihara, S. N. Chiu, Spatial tessellations: concepts and applications of Voronoi diagrams, Wiley Series in Probability and Statistics, John Wiley & Sons Ltd., Chichester, 2000. [21] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, Time Series Analysis; Forecasting and Control, 3rd Edition, Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1994. [22] T. Gneiting, M. G. Genton, P. Guttorp, Geostatistical space-time models, stationarity, separability and full symmetry, in: B. Finkenst¨adt, L. Held, V. Isham (Eds.), Modelling Longitudinal and Spatially Correlated Data, Vol. 107 of Monographs on Statistics and Applied Probability, Chapman & Hall/CRC, Boca Raton, 2007, pp. 151–175. [23] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21 (6) (1953) 1087–1092. [24] W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1) (1970) 97–109. [25] A. F. M. Smith, G. O. Roberts, Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods, Journal of the Royal Statistical Society. Series B (Methodological) 55 (1) (1993) 3–23. [26] A. E. Gelfand, A. F. M. Smith, Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85 (410) (1990) 398–409. [27] G. Wahba, Spline Models for Observational Data, Vol. 59 of CBMS-NSF Regional conference series in applied mathematics, SIAM, 1990. [28] P. J. Green, B. W. Silverman, Nonparametric Regression and Generalized Linear Models, no. 58 in Monographs on Statistics and Applied Probability, Chapman and Hall, 1994. [29] P. Craven, G. Wahba, Smoothing noisy data with spline functions. estimating the correct degree of smoothing by the method of generalized cross-validation, Numer. Math. 31 (4) (1979) 377–403. [30] J. J. Warnes, B. D. Ripley, Problems with likelihood estimation of covariance functions of spatial Gaussian processes, Biometrika 74 (3) (1987) pp. 640–642. [31] K. V. Mardia, A. J. Watkins, On multimodality of the likelihood in the spatial linear model, Biometrika 76 (2) (1989) pp. 289–295. [32] P. J. Diggle, J. A. Tawn, R. A. Moyeed, Model-based geostatistics, Journal of the Royal Statistical Society. Series C (Applied Statistics) 47 (3) (1998) pp. 299–350. [33] H. Zhang, Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics, Journal of the American Statistical Association 99 (465) (2004) pp. 250–261. [34] M. Stein, Uniform asymptotic optimality of linear predictions of a random field using an incorrect second-order structure, The Annals of Statistics 18 (2) (1990) pp. 850–872. [35] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, A. v. d. Linde, Bayesian measures of model complexity and fit, Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64 (4) (2002) 583–639. [36] F. Sigrist, H. R. Kuensch, W. A. Stahel, A dynamic spatio-temporal precipitation model, Preprint (http://arxiv.org/abs/1102.4210).

7