Evaluating spatio-temporal representations in daily rainfall sequences from three stochastic multi-site weather generation approaches




Advances in Water Resources 32 (2009) 948–962

Contents lists available at ScienceDirect

Advances in Water Resources journal homepage: www.elsevier.com/locate/advwatres

R. Mehrotra *, Ashish Sharma

The University of New South Wales, Water Research Centre, Kensington, Sydney, NSW 2052, Australia

Article info

Article history: Received 2 December 2008; Received in revised form 2 March 2009; Accepted 18 March 2009; Available online 25 March 2009

Keywords: Spatio-temporal rainfall distribution; Sydney; Modified Markov model; KNN model; Kernel density estimation

Abstract

Many hydrological and agricultural studies require simulations of weather variables reflecting observed spatial and temporal dependence at multiple point locations. This paper assesses three multi-site daily rainfall generators for their ability to model different spatio-temporal rainfall attributes over the study area. The approaches considered consist of a multi-site modified Markov model (MMM), a reordering method for reconstructing space–time variability, and a nonparametric k-nearest neighbour (KNN) model. Our results indicate that all the approaches adequately reproduce the observed spatio-temporal pattern of the multi-site daily rainfall. However, the different techniques used to represent longer time scale temporal and spatial dependences in the simulated sequences reproduce these characteristics with varying success. While each approach comes with its own advantages and disadvantages, the MMM has an overall advantage in offering a mechanism for modelling varying orders of serial dependence at each point location, while still maintaining the observed spatial dependence with sufficient accuracy. The reordering method is simple and intuitive and produces good results. However, it is primarily driven by the reshuffling of the simulated values across realisations and therefore may not be suited to applications where data length is limited or to situations where the simulation process is governed by exogenous conditioning variables, for example downscaling studies, where KNN and MMM can be used with confidence.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Weather generators are commonly used to generate synthetic sequences of rainfall and other weather variables to enhance our understanding of hydrological system response, and to assist in the design and operation of water resource systems. Single-site weather generators are relatively simple to formulate, but cannot be used in situations that require the information of weather variables simultaneously at multiple locations, such as studying the hydrological or agricultural behaviour of a region, discharge of a river, and floods.

A number of multi-site parametric stochastic models have been proposed in the literature to address the crucial issue of spatial dependence among weather variables at multiple locations, both with and without conditioning on exogenous atmospheric predictors. These models simulate weather variable sequences simultaneously at multiple locations, usually at a daily time step (e.g. [4,35,12,1,39,13–15]). However, these models are constrained to be used on a small network of stations, as the number of parameters of the model grows almost exponentially with increase in the number of stations.

* Corresponding author. Tel.: +61 2 9385 5140; fax: +61 2 9385 6139. E-mail address: [email protected] (R. Mehrotra). 0309-1708/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.advwatres.2009.03.005

It is also common to incorporate synoptic information in the generation of weather variables in an attempt to improve the spatial and temporal attributes of the simulated series [14,15,21,33]. This is also found to improve the representation of the low frequency variability in the simulated rainfall [16]. The low frequency variability is important for applications that are sensitive to proper representation of low frequency persistence, that is, the representation of sustained droughts and periods of above average rainfall (or above average wet days) in the generated record.

To overcome the theoretical as well as computational difficulties of parameter estimation of multi-site stochastic models, Wilks [36] proposed a simple extension of the commonly used single-site Markov model for precipitation to multiple sites by driving individual single-site models with temporally independent yet spatially correlated random numbers. This approach successfully simulated daily precipitation while reproducing the observed spatial correlation and preserving the individual behaviours of the local models. Since then, this logic has been successfully applied for simultaneous simulations of rainfall and other weather variables at multiple locations [37,38,27,23,25,26].

Recently, in an attempt to reproduce the spatial and temporal structure of observed multi-site series of weather variables without introducing additional complexities, Clark [7,8] introduced a method to reconstruct the observed spatial (intersite) and temporal correlation statistics in the multi-site generated sequences by reordering the generated series of weather variables at individual locations. Their approach involves reshuffling across ensembles (realisations) of the simulated observations to match the observed spatial and temporal characteristics in the reshuffled output. In essence, the procedure requires generating multiple realisations (ensembles) individually at each site and reshuffling the simulated series based on the observed information available at these sites. They claimed that the reordering procedure successfully reproduced the number of wet days, rainfall amounts and spell characteristics of the rainfall series at a number of stations, without requiring a complex model structure and tedious parameter estimation procedures.

Another class of weather generators is based on nonparametric alternatives. These methods offer a different rationale for generation of climate variables and have been used extensively for this purpose in recent years [29,3,6,31,10,11,24]. These methods provide an alternative way of developing the temporal and spatial relationships among weather variables without a priori assumptions on the joint probability distribution associated with these variables. For multi-site resampling, since the variables at multiple locations are simulated concurrently, dependence across space is accurately preserved. The k-nearest neighbour bootstrap (KNN) is a technique that conditionally resamples values from the observed record based on the conditional relationship specified. The lack of any assumptions defining the joint distribution of the weather variables helps ensure an accurate representation of features such as nonlinearity, asymmetry or multimodality in the observed record of the variables being modelled.

This paper assesses these three modelling strategies for daily rainfall generation at multiple locations. All the approaches follow a two-step rainfall generation procedure: rainfall occurrences are generated in the first step, and amounts are modelled subsequently on the days identified as wet by the occurrence models, using a nonparametric kernel density estimation procedure. The approaches considered are (a) a multi-site modified Markov model (MMM) [26]; (b) reordering of ensemble output (Clark [7,8]); and (c) a nonparametric k-nearest neighbour multi-site model [6,19]. Rainfall amounts for these models are generated using the kernel density estimation (KDE) approach [31,11,24]. These models (also including the rainfall amounts) are hereafter referred to as (a) MMM, (b) reordering and (c) KNN. Results of these models are evaluated against each other, against the observed data, and against the results obtained using an unconditional (zero-order) Markov model (for rainfall occurrences) and an unconditional KDE model (for rainfall amounts) applied to each site in isolation without any treatment for spatial dependence. This model is hereafter referred to as the independent model. A 43 year long record of daily rainfall at a network of 45 locations near Sydney, Australia, is used to compare the methods.
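The two-step procedure can be illustrated with a minimal sketch of the independent model at a single site. This is an illustration under stated assumptions, not the authors' implementation: the function name, the fixed bandwidth, the choice of a Gaussian kernel applied to log-transformed amounts, and the clamping of simulated amounts to the wet-day threshold are all hypothetical choices made here. The 0.3 mm threshold is the wet-day definition given in the data description of the paper.

```python
import math
import random

WET_THRESHOLD_MM = 0.3  # wet-day threshold quoted in the paper's data section


def generate_independent_series(observed, p_wet, n_days, bandwidth=0.3, seed=0):
    """Two-step sketch: zero-order occurrences, then kernel-smoothed amounts."""
    rng = random.Random(seed)
    # Log-transformed wet-day amounts act as kernel centres (assumption).
    log_wet = [math.log(x) for x in observed if x >= WET_THRESHOLD_MM]
    if not log_wet:
        raise ValueError("observed record contains no wet days")
    series = []
    for _ in range(n_days):
        # Step 1: occurrence drawn independently of previous days.
        if rng.random() >= p_wet:
            series.append(0.0)
            continue
        # Step 2: amount sampled from a Gaussian kernel density estimate by
        # picking a random kernel centre and adding kernel noise.
        centre = rng.choice(log_wet)
        amount = math.exp(centre + bandwidth * rng.gauss(0.0, 1.0))
        # Keep simulated wet-day amounts at or above the threshold (assumption).
        series.append(max(amount, WET_THRESHOLD_MM))
    return series
```

Because each site would be simulated with its own random stream, a generator of this kind carries no spatial or day-to-day dependence, which is why it serves only as a baseline.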

The paper is organised as follows. The methodological aspects of the models used and the data are presented in Section 2. Section 3 provides applications of the different models and presents their comparison. Finally, in Section 4, the results are summarised and conclusions are drawn.

2. Methodology and data used

The approaches considered in the study mainly differ in the procedure used to generate the rainfall occurrences, with the general model structure for rainfall amount generation being common to all approaches. Table 1 summarises the spatial and temporal dependence structures of the rainfall occurrence and amount processes used in these approaches. The general structure of the models used in the study is described briefly below.

2.1. Modified Markov model (MMM)

We denote rainfall occurrence at a location $k$ and time $t$ as $R_t(k)$ and at the $p$th time step before the current as $R_{t-p}(k)$. The MMM as proposed in Mehrotra and Sharma [26] is based on the conditional simulation of $R_t(k) \mid Z_t(k)$ within the general framework of a Markov process. Here, $Z_t(k)$ represents a vector of conditioning variables at a location $k$ and at time $t$ that, in addition to previous time step values of rainfall imparting daily or short term persistence, can also include other continuous variables explaining the higher time scale persistence (denoted as $X_t(k)$). Thus, for the specific case of rainfall occurrence generation as considered here, $X_t(k)$ is a vector of aggregated rainfall predictors at site $k$ representing the wetness over the recent past, to convey the slowly varying temporal persistence that creates sustained low or high rainfall periods, and is expressed as:

$$
X_t(k) \in \{X(k)_{j_1,t},\, X(k)_{j_2,t},\, \ldots,\, X(k)_{j_m,t}\}, \qquad X(k)_{j_i,t} = \frac{1}{j_i} \sum_{l=1}^{j_i} R(k)_{t-l} \tag{1}
$$

where $m$ is the number of such predictors and $X(k)_{j_i,t}$ describes how wet it has been over the preceding $j_i$ days, with $R(k)_{t-l}$ being the rainfall occurrences at site $k$. Similarly, an area averaged wetness state over a time period is defined as:

$$
\bar{X}_{j_i,t} = \frac{1}{NS} \sum_{k=1}^{NS} \frac{1}{j_i} \sum_{l=1}^{j_i} R_{k,t-l} \tag{2}
$$

where $NS$ is the number of stations and $\bar{X}_{j_i,t}$ defines the area averaged wetness of rainfall occurrence/amount over the preceding $j_i$ days, with $R_{k,t-l}$ being the rainfall occurrence/amount at site $k$. For brevity, site notations are dropped in the subsequent discussions. The parameters (or the transition probabilities) of a Markov model expressing the order one Markovian dependence (first order Markov model) are defined by $P(R_t \mid R_{t-1})$ with $Z_t$ consisting

Table 1. Details of temporal and spatial dependences considered in the models used.

| Model | Process | Temporal dependence of order | Spatial dependence by |
|---|---|---|---|
| Independent | Occurrence | Zero | None |
| Independent | Amount | Zero | None |
| MMM | Occurrence | One and annual wetness state | Spatially correlated random numbers |
| MMM | Amount | One | Spatially correlated random numbers |
| Reordering | Occurrence | Using reordering | Reordering |
| Reordering | Amount | Using reordering | Reordering |
| KNN | Occurrence | Spatially averaged order one and 365 days wetness state | Simultaneously picking up observations at all stations |
| KNN | Amount | One | Spatially correlated random numbers |


of $R_{t-1}$ only. Inclusion of additional predictors $X_t$ in the conditioning vector $Z_t$ would modify these transition probabilities as $P(R_t \mid R_{t-1}, X_t)$. The following parameterization is adopted to estimate $P(R_t \mid R_{t-1}, X_t)$:

$$
\begin{aligned}
P(R_t = 1 \mid R_{t-1} = i, X_t) &= \frac{P(R_t = 1, R_{t-1} = i, X_t)}{P(R_{t-1} = i, X_t)} = \frac{f(X_t \mid R_t = 1, R_{t-1} = i)\, P(R_t = 1, R_{t-1} = i)}{f(X_t \mid R_{t-1} = i)\, P(R_{t-1} = i)} \\
&= p_{1i} \times \frac{f(X_t \mid R_t = 1, R_{t-1} = i)}{[f(X_t \mid R_t = 1, R_{t-1} = i)\, P(R_t = 1 \mid R_{t-1} = i)] + [f(X_t \mid R_t = 0, R_{t-1} = i)\, P(R_t = 0 \mid R_{t-1} = i)]}
\end{aligned}
\tag{3}
$$

The first term of (3) defines the transition probabilities $P(R_t \mid R_{t-1})$ of a first order Markov model (representing order one dependence, when $R_{t-1} = i$ and $R_t = 1$), while the second term signifies the effect of inclusion of the predictor set $X_t$ in the conditioning vector $Z_t$. A closer look at Eq. (3) reveals that the baseline transition probability $p_{1i}$ is re-scaled in proportion to the influence of the predictor variable set $X_t$ on the original $p_{1i}$ and $p_{0i}$, thereby keeping the sum of the modified transition probabilities $p_{1i}$ and $p_{0i}$ (or $1 - p_{1i}$) again as 1. If $X_t$ consists of derived measures (typically linear combinations) of summation of number of wet days in pre-specified aggregation time periods (as specified by Eq. (1) or (2)), one could approximate the associated probability with a multivariate normal distribution, leading to the following simplification for $P(R_t \mid R_{t-1}, X_t)$:

$$
P(R_t \mid R_{t-1}, X_t) = p_{1i} \times \frac{\dfrac{1}{\det(V_{1,i})^{1/2}} \exp\left\{-\tfrac{1}{2}(X_t - \mu_{1,i})\, V_{1,i}^{-1}\, (X_t - \mu_{1,i})'\right\}}{\dfrac{1}{\det(V_{1,i})^{1/2}} \exp\left\{-\tfrac{1}{2}(X_t - \mu_{1,i})\, V_{1,i}^{-1}\, (X_t - \mu_{1,i})'\right\} p_{1i} + \dfrac{1}{\det(V_{0,i})^{1/2}} \exp\left\{-\tfrac{1}{2}(X_t - \mu_{0,i})\, V_{0,i}^{-1}\, (X_t - \mu_{0,i})'\right\} (1 - p_{1i})}
\tag{4}
$$

where $\mu_{1,i}$ represents the mean vector $E(X_t \mid R_t = 1, R_{t-1} = i)$ and $V_{1,i}$ is the corresponding variance–covariance matrix. Similarly, $\mu_{0,i}$ and $V_{0,i}$ represent, respectively, the mean vector and the variance–covariance matrix of the subset of $X$ when $R_{t-1} = i$ and $R_t = 0$. The $p_{1i}$ represents the baseline transition probabilities of the first order Markov model defined by $P(R_t = 1 \mid R_{t-1} = i)$, and $\det(\cdot)$ represents the determinant operation. The parameters of the MMM for $P(R_t \mid R_{t-1}, X_t)$ in (4) are $\mu_0$, $V_0$, $\mu_1$, $V_1$ and $p_{1i}$, which can be estimated directly from the data by forming different conditional sub-sets of the given rainfall series and calculating the conditional probabilities, means, variances and covariances.

It may be noted that for some applications, as the number of conditioning variables increases, the assumption of a multivariate normal may not be sufficient. In such situations, the conditional probabilities $P(X_t \mid R_t, R_{t-1})$ and $P(X_t \mid R_{t-1})$ may be estimated either using more appropriate probability distributions or using nonparametric alternatives (such as kernel density estimation). However, for the current application, the assumption of a multivariate normal is found to be appropriate [26].

2.2. The reordering method

The ensemble (realisation) reordering method was proposed by Clark [7,8]. This method is fairly simple and intuitive, and involves a reordering of ensemble outputs for maintaining the space–time variability in the generated series. The major difference between this method and other multi-site weather generators is that the space–time variability is not preserved intrinsically, but is reconstructed as a post-processing step. In brief, the procedure can be described as follows.

For a given day, the generated ensemble members ($n$) are ranked from lowest to highest. A short moving window of pre-specified length (say 31 days) is formed, centred on the given day. A subset of $n$ observed dates (the number of observations being the same as the number of generated realisations) is randomly selected from the days of the historical record falling within this moving window (dates can be drawn from all years in the historical record except the year for which reordering is being performed). This observed subset is also ranked from lowest to highest, separately for each variable and station. For each variable and station, each observation of the ranked observed subset is tagged with the corresponding ranked generated ensemble member, and is re-ordered to its original position together with the tagged generated record. The random selection of dates used to construct the observed subset is made only for the first day, and is persisted for subsequent days.

The stepwise procedure to be followed in formulating the reordering method is described briefly below. For a given station $j$ and variable $k$, let $\vec{X}$ be a vector of $n$ generated observations drawn from the $n$ ensemble members for a given day and let $\vec{\chi}$ be the sorted vector of $\vec{X}$, that is,

$$
\vec{X} = (x_1, x_2, \ldots, x_n), \qquad \vec{\chi} = (x_{(1)}, x_{(2)}, x_{(3)}, \ldots, x_{(n)}), \qquad x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \tag{5}
$$

Also, let $\vec{Y}$ be a vector of $n$ historical observations that are selected randomly from the observed dataset within a moving window of pre-specified days, and let $\vec{\gamma}$ be the sorted vector of $\vec{Y}$, that is,

$$
\vec{Y} = (y_1, y_2, \ldots, y_n), \qquad \vec{\gamma} = (y_{(1)}, y_{(2)}, y_{(3)}, \ldots, y_{(n)}), \qquad y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)} \tag{6}
$$

Additionally, let $\vec{B}$ be the vector of indices describing the original observation number that corresponds to the values in the ordered vector $\vec{\gamma}$. The final step involves constructing the reordered vector $\vec{X}^{o}$, which is the final reordered output:

$$
\vec{X}^{o} = (x^{o}_{1}, x^{o}_{2}, \ldots, x^{o}_{n}) \tag{7}
$$

where $x^{o}_{m} = x_{(r)}$, $m = \vec{B}[r]$ and $r = 1, \ldots, n$, with the subscripts in parentheses referring to the elements in the sorted vector $\vec{\chi}$. Further details on the reordering method are available in Clark [7,8].

This approach is intended to preserve both spatial and temporal correlations in the reordered generated series. Considering the spatial dependence at two stations, if these are highly correlated, then the observations at these stations on a given day are likely to have a similar rank. The rank of each simulated realisation at the two stations is matched with the rank of each randomly selected observation, meaning that, for all realisations, the rank will be similar at the two stations. When this process is repeated for all days, the ranks of a given realisation will on average be similar for the two stations, and the spatial correlation will be reconstructed once the randomly selected days are re-sorted to their original order.

In order to maintain the temporal persistence at each station, the random selection of dates used to construct the observed subset is made only for the first day. At a given station, historical observations with high temporal persistence would, on average, have a similar rank on subsequent days. The ensemble output is assigned ranks identical to those of the randomly selected observations, and thus the temporal persistence is reconstructed once the ensemble output is re-sorted.
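The reordering step of Eqs. (5)–(7) for one station, variable and day can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `reorder` is hypothetical, indices are zero-based, and ties (common for dry-day zeros) are resolved by the stable ordering of Python's sort, which is an assumption of this sketch.

```python
def reorder(x, y):
    """Rearrange ensemble values x so that their rank order matches that of y.

    x: the n generated values for one day (vector X in Eq. (5)).
    y: the n randomly selected historical observations (vector Y in Eq. (6)).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    chi = sorted(x)  # sorted ensemble values, chi in Eq. (5)
    # b[r] is the original position of the r-th smallest observation (vector B).
    b = sorted(range(len(y)), key=lambda idx: y[idx])
    x_o = [0.0] * len(x)
    for r, m in enumerate(b):
        x_o[m] = chi[r]  # x^o_m = x_(r) with m = B[r], Eq. (7)
    return x_o
```

For example, `reorder([5, 1, 3], [0.2, 0.9, 0.1])` returns `[3, 5, 1]`: the largest generated value is placed in the slot of the largest observation, and so on. Applying the same selected dates at every station gives each ensemble member consistent ranks across stations, which is how the spatial correlation is reconstructed.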

2.3. k-Nearest neighbour (KNN) approach

In the context of multi-site generation of rainfall occurrences, the KNN approach considers simultaneous sampling with replacement of the rainfall occurrences at multiple locations from the historical records of rainfall. To preserve lagged correlations, resampling is conditioned on the days in the historical record that have similar characteristics to those of the previously simulated days. The conditional probability $R_t \mid Z_t$ is estimated based on the k-nearest neighbours of the conditioning vector $Z_t$, where $R_t$ represents the rainfall occurrence vector at all stations and $Z_t$ represents a vector of conditioning variables used to find the similarity. In this approach, the $k$ patterns in the historical series of predictors that are most similar to the current pattern are found, and the $k$ sets of corresponding predictands (in our case, rainfall occurrences at all stations) are specified as likely realisations the system may have. Lall and Sharma [19] suggested that, as a basic guideline, $k$ can be chosen as the square root of the length of the dataset. The Lall and Sharma [19] k-nearest neighbour conditional PDF is expressed as:

$$
p(i) = \frac{1/i}{\sum_{j=1}^{k} 1/j} \tag{8}
$$

where $p(i)$ is the probability that the $i$th nearest neighbour will be resampled, and $k$ is the number of nearest neighbours considered based on Euclidean distances from the conditioning vector $Z_t$. Generation of rainfall occurrences at the network of raingauge stations proceeds by defining the vector $Z_i$ for every day $i$ of all years, except the $i$th day of the current year, in the seasonal subset of the historical series of rainfall occurrences formed by the moving window centred at the Julian day corresponding to $t$, which is the current day in the generated sequence. The values used in the calculation of Euclidean distances are made dimensionless through standardization to ensure that each predictor has the same weighting in the selection of nearest neighbours. The $k$ most similar patterns of $Z_t$ are selected based on the minimum Euclidean distances between $Z_i$ and $Z_t$. These Euclidean distances $E_i$ are calculated as:

$$
E_i = \sqrt{\sum_{j=1}^{m} \left\{ s_j (Z_{j,i} - Z_{j,t}) \right\}^2} \tag{9}
$$

where $m$ is the number of variables included in the $Z$ vector and $s_j$ is the scaling weight for the $j$th predictor. Scaling of variables imparts an equal weight to each predictor in the selection of the nearest neighbours. Scaling is also necessary when the feature vector consists of a combination of discrete and continuous variables [19,6]. The reciprocal of the sample standard deviation of each predictor variable is a common choice for the scaling weights [19,29,10,2]. The corresponding $k$ sets of rainfall occurrences, $R_i$, chosen from the historical rainfall records of all stations are termed nearest neighbours, and they form an empirical estimate of the conditional probability $R_t \mid Z_t$ for the day $t$ that is based directly on the characteristics of the historical sample. The generated rainfall occurrence at all stations for day $t$ from these $k$ nearest neighbours is estimated by using (8) and a random number $u \sim U(0,1)$. The day $i$ for which $u$ is closest to $p_i$ is picked as the nearest neighbour of day $t$, and the corresponding $R_i$ vector forms the estimated values of rainfall occurrence at all stations. In KNN, the spatial rainfall distribution structure is maintained by resampling simultaneously at all stations. Seasonal variations in the generation process are accommodated using a moving window approach. Further details on the method are available in Rajagopalan and Lall [29], Buishand and Brandsma [6], Beersma and Buishand [2] and Mehrotra et al. [21].

2.4. The single-site (independent) rainfall occurrence model

The independent model used for comparison generates rainfall occurrence at each individual site, ignoring the at-site temporal and across-site spatial dependence. Similarly, the ensemble reordering method requires ensembles of generated rainfall occurrence at each site before introducing spatial and temporal dependence by reshuffling. In the present application, the single-site daily rainfall occurrence model used in both these approaches is a zero-order Markov model.
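The KNN resampling step of Section 2.3 (Eqs. (8) and (9)) can be sketched as below. This is an illustration only: the function and argument names are hypothetical, the seasonal moving window and the exclusion of the current year are omitted for brevity, and the neighbour is drawn with weights $p(i)$ via `random.choices`, which is assumed here to be an acceptable stand-in for the selection based on a uniform random number $u$.

```python
import math
import random


def knn_probabilities(k):
    """Eq. (8): resampling probability of the i-th nearest neighbour, i = 1..k."""
    norm = sum(1.0 / j for j in range(1, k + 1))
    return [(1.0 / i) / norm for i in range(1, k + 1)]


def knn_resample(z_t, z_hist, r_hist, k, scales, rng):
    """Return one historical multi-site occurrence vector conditional on z_t.

    z_hist[i] is the conditioning vector of historical day i and r_hist[i]
    the occurrences at all stations on that day; scales holds the weights s_j.
    """
    distances = []
    for i, z_i in enumerate(z_hist):
        # Eq. (9): scaled Euclidean distance between Z_i and Z_t.
        d = math.sqrt(sum((s * (a - b)) ** 2 for s, a, b in zip(scales, z_i, z_t)))
        distances.append((d, i))
    distances.sort()
    neighbours = [i for _, i in distances[:k]]
    weights = knn_probabilities(len(neighbours))
    chosen = rng.choices(neighbours, weights=weights, k=1)[0]
    # Occurrences at all stations are taken from the same day, which is what
    # preserves the spatial dependence.
    return r_hist[chosen]
```

With $k = 3$ the weights are 6/11, 3/11 and 2/11, so closer neighbours are favoured but more distant ones can still be selected.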

2.5. The single-site rainfall amount model

A nonzero rainfall amount must be generated for each wet day and location of the generated sequences of all the occurrence models described in the previous sub-sections. The model for rainfall amounts presented here is nonparametric, and is based on the kernel density procedure described in Sharma [32], Sharma and O’Neill [31], Harrold et al. [11] and Mehrotra and Sharma [26]. The model is referred to hereafter as the KDE model. For the KNN and MMM approaches, it simulates the rainfall amount at individual stations conditional on the previous day’s rainfall amount at the station using spatially correlated random numbers, thereby imparting temporal and spatial dependences in the generated series. For the reordering and independent models, the model generates rainfall amounts at individual stations unconditionally using spatially uncorrelated random numbers.

2.6. Modelling observed spatial dependence in the generated rainfall field

The MMM accommodates the spatial correlation structure in the generated multi-site rainfall occurrences using spatially correlated random numbers. Similarly, the KDE model used to generate rainfall amounts for MMM and KNN imparts spatial dependence in the generated rainfall amounts by making use of spatially correlated and serially independent random numbers during the generation of rainfall amounts at individual stations. The general logic of estimating the correlation matrices of random numbers for rainfall occurrence/amounts is available in Wilks [36] and Mehrotra et al. [23].
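The idea of driving single-site occurrence models with spatially correlated, serially independent random numbers can be sketched as follows. This is a hedged illustration rather than the procedure of Wilks [36] in full: the function names are hypothetical, the wet-day probabilities are held constant in time, and the correlation matrix of the underlying normal deviates is taken as a given input, whereas in the study it is estimated empirically so that the simulated rainfall reproduces the observed correlations.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular factor L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 0.0))
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_occurrences(p_wet, corr, n_days, seed=0):
    """Daily wet (1) / dry (0) states at all sites from correlated uniforms."""
    rng = random.Random(seed)
    low = cholesky(corr)
    n_sites = len(p_wet)
    std_normal = NormalDist()
    days = []
    for _ in range(n_days):
        # Independent normals, redrawn every day (serially independent).
        e = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
        # Correlate them across sites, then map to uniforms with the normal CDF.
        w = [sum(low[i][m] * e[m] for m in range(i + 1)) for i in range(n_sites)]
        u = [std_normal.cdf(v) for v in w]
        # A site is wet when its uniform number falls below its wet probability.
        days.append([1 if p_k > u_k else 0 for u_k, p_k in zip(u, p_wet)])
    return days
```

In an MMM-style application the constant `p_wet` would be replaced, day by day and site by site, with the conditional probability from Eq. (4).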


2.7. Data and study area

The study region is located around Sydney, eastern Australia, spanning between 149°E and 152°E longitude and 32°S–36°S latitude (Fig. 1). The physio-geographical conditions in the Sydney region cause large climatic gradients even over short distances, e.g. from lowland areas to mountain regions and from the coast to the inland. Most significant rainfall events in winter (March–August months) in this region involve air masses that have been brought over from the east-coast low-pressure systems. Orographic uplift of these air masses when they strike coastal ranges or the Great Dividing Range often produces very heavy rain. Several of the most severe floods experienced east of the Great Dividing Range have resulted from east-coast low-pressure systems. In summer (September–February months), tropical depressions moving southwards from Queensland into New South Wales bring warm, cloudy and drizzly weather to coastal regions of eastern Australia. These conditions can result in heavy rainfall if some means of lifting is available. The inter-station distances between station pairs vary approximately from 20 to 340 km. Missing values at some stations (less than 0.5%) are estimated using the records of nearby stations. A day is considered wet if the rainfall amount is greater than or equal to 0.3 mm.

3. Application of models and result discussion

For MMM, following Mehrotra and Sharma [26], an order one Markov dependence and the previous 365 days wetness state variable, which explains the low frequency variability, are considered as conditioning variables for all stations. The reordering method is applied to the unconditionally generated series of rainfall amounts only; the observed dependence structure of rainfall occurrences is automatically preserved once the amounts are reordered. For KNN, based on the results of a sensitivity analysis to different choices of the width of the moving window $l$, a value of $l = 31$ days is chosen for use in our study. Similarly, an analysis is performed to find the optimal value of $k$, the number of nearest neighbours, and consequently a value of $k = 10$ is adopted in the study. For KNN, the area averaged wetness states (or wetness fractions) of the previous day and the preceding 365 days are selected as conditioning variables. Similarly, for the reordering and independent rainfall occurrence models, daily unconditional wet day probabilities are estimated from observations falling within a moving window of length 31 days centred on the current day. The relationship between correlations of the series of normally distributed random numbers and the corresponding simulated rainfall series at each station pair is ascertained empirically on a daily basis, considering the observations falling within a moving window of length 31 days centred on the current day. In all the approaches, results are ascertained by generating 100 ensembles or realisations of the rainfall, based on which statistical performance measures are estimated. The comparison of results is based on the reproduction of various statistics of interest, representing a wide range of spatial, temporal and combined spatio-temporal dependence characteristics of rainfall, including those of importance to water resources planning and management.
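For the MMM configuration described above (order one dependence plus a single 365-day wetness predictor), Eq. (4) reduces to a ratio of univariate normal densities. The sketch below illustrates that special case; the function names and all parameter values in the usage note are hypothetical, and in practice the conditional means and standard deviations would be estimated from the observed record for each station and each previous-day state $i$.

```python
from statistics import NormalDist


def wetness_state(occurrences, t, j):
    """Eq. (1): fraction of wet days over the j days preceding day t."""
    if j > t:
        raise ValueError("need at least j days of history before day t")
    return sum(occurrences[t - j:t]) / j


def mmm_wet_probability(x_t, p1i, mean_wet, sd_wet, mean_dry, sd_dry):
    """Eq. (4) with one predictor: P(R_t = 1 | R_{t-1} = i, X_t = x_t).

    p1i is the baseline first order transition probability; the wet/dry
    means and standard deviations describe X_t given R_t = 1 or R_t = 0
    (both conditional on R_{t-1} = i).
    """
    f_wet = NormalDist(mean_wet, sd_wet).pdf(x_t)  # f(X_t | R_t = 1, R_{t-1} = i)
    f_dry = NormalDist(mean_dry, sd_dry).pdf(x_t)  # f(X_t | R_t = 0, R_{t-1} = i)
    return p1i * f_wet / (p1i * f_wet + (1.0 - p1i) * f_dry)
```

When the wet and dry conditional densities coincide, the predictor carries no information and the function returns the baseline `p1i`; a recent wetness value closer to the wet-day mean raises the probability above the baseline, which is how the slowly varying persistence enters the simulation.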
The graphical comparison of different models is performed on the basis of:
(a) spatial dependence statistics, namely, log-odds ratio – a measure of the spatial correlation in the daily rainfall occurrence, and cross correlations – a measure of the spatial correlation in the daily rainfall amounts and aggregated wet days and rainfall amounts in a month and year;
(b) average number of wet days and rainfall amounts in a month and year and their variability – a measure of the frequency at which wet days and amounts are simulated and their distribution within and across a year;
(c) distributional attributes of wet days and rainfall amount in a year – again a measure of the year-to-year variability of rainfall occurrence and amounts;
(d) spell length rainfall characteristics, i.e. wet and dry spells – a measure of day-to-day dependence in the rainfall series;
(e) extreme rainfall characteristics, i.e. maximum wet and dry spells and daily maximum rainfall amount in excess of a threshold – a measure of hydrologic extremes, i.e. floods and droughts; and
(f) combined spatio-temporal statistics, i.e. weather states and probabilities of area averaged series – a measure of spatial as well as temporal rainfall distribution behaviour over the study region.

3.1. Spatial dependence

Fig. 1. Map of the study area showing the locations of raingauge stations.

Fig. 2 presents observed and modelled log-odds ratios (for daily rainfall occurrences) and cross correlations (for monthly and annual wet days) for all station pairs and models. A log-odds ratio is expressed as $lr_{i,j} = \log\left(\dfrac{p00_{i,j}\, p11_{i,j}}{p10_{i,j}\, p01_{i,j}}\right)$, where $lr_{i,j}$ is the log-odds ratio between the $i$ and $j$ pair of stations; $p11_{i,j}$, $p00_{i,j}$, $p10_{i,j}$ and $p01_{i,j}$ are the joint probabilities of rain at both stations, no rain at either station, rain at station $i$ and no rain at station $j$, and no rain at station $i$ and rain at station $j$, respectively. A higher value indicates better-defined spatial dependence. In these plots, observed statistics are plotted on the horizontal axes while the corresponding statistics computed from the simulated series are plotted on the vertical axes. For daily and yearly plots, points are shown for each station pair, while for monthly plots these are shown for each month and station pair. Considering spatial dependence of daily rainfall occurrences, as the independent model simulates rainfall occurrences at individual station in isolation, the


[Figure 2 panels: Daily, Monthly and Annual scatter plots for each of the Independent, MMM, Reordering and KNN models; horizontal axes: Observed, vertical axes: Simulated.]

Fig. 2. Scatter plots of observed and model simulated daily log-odds ratios and cross correlations of monthly and annual wet days. For daily and annual plots points are shown for each station pair while for monthly plots these are shown for each station pair and month.

generated sequences from the model fail to preserve this characteristic. Use of spatially correlated random number in the MMM helps reproducing the spatial dependence in the generated sequences quite successfully. Similarly, by virtue of assigning equal ranked observations at all stations, the reordering method is able to maintain the ranked spatial dependence, albeit some bias for highly correlated stations. As the KNN approach generates rainfall occurrences concurrently at all stations, the dependence between the stations is automatically preserved by the model. At higher time scales of month and year, the simulated series from KNN and MMM overestimate the observed correlations. The ranking procedure of the reordering model is able to maintain the observed monthly and annual spatial correlations in the simulated series with some success. Similar to Figs. 2 and 3 presents the cross correlations of rainfall amounts at daily, monthly and annual time scales. As can be seen, all the models show consistent under estimation of daily cross correlations for highly correlated stations with marked bias by the reordering model. At time scales of month and year all models show large scatter with nominal under estimation for highly correlated stations. 3.2. Mean wet days, rainfall totals and associated variability It is vital that the mean number of wet days and average rainfall totals at the raingauge network be reproduced accurately before using the simulated rainfall series as an input to any water balance modelling exercise. Fig. 4 presents the scatter plots of observed and modelled monthly and annual wet days and rainfall totals at all stations for different models. As expected, all models, including the independent model, provide a good fit to the mean number

of wet days and rainfall totals. It is easy to model the mean monthly and annual rainfall totals by properly reproducing the probability of occurrence of wet days and the average rainfall amount per wet day. However, what is more desirable, and often difficult to achieve, is the successful reproduction of both inter- and intra-annual variability in the rainfall occurrence and amount processes; the tendency of generated series to understate this variability is termed “overdispersion” [17]. The monthly and/or annual variance of rainfall totals is difficult to reproduce properly because it is directly related not only to the rainfall amount variance, the probability of a wet day and the average rainfall amount per wet day, but also to the monthly/annual variance of wet days [16,18]. It has been observed that models based on low-order Markov processes, in general, undersimulate the variance at aggregated time scales of month, season and year [5,38]. However, this overdispersion characteristic is of importance to applications that are sensitive to proper representation of sustained droughts and periods of above average rainfall (or above average wet days) in the generated record.

[Figure 3 graphic omitted. Panels: columns for the Independent, MMM, Reordering and KNN models; rows for Daily, Monthly and Annual time scales; axes: Observed, Simulated.]

Fig. 3. Scatter plots of observed and model simulated cross correlations of daily, monthly and annual rainfall amounts. For daily and annual plots points are shown for each station pair while for monthly plots these are shown for each station pair and month.

[Figure 4 graphic omitted. Panels: columns for the Independent, MMM, Reordering and KNN models; rows for wet days in a year, wet days in a month, annual rainfall totals in mm and monthly rainfall totals in mm; axes: Observed, Simulated.]

Fig. 4. Scatter plots of observed and model simulated monthly and annual wet days and rainfall totals. For annual plots points on the graph are shown for each station while for monthly plots these are shown for each station and month.

Fig. 5 presents the scatter plots of observed and modelled standard deviations of monthly and annual wet days and rainfall totals. As can be seen from these plots, at the annual time scale the KNN model underestimates the variability, the reordering model shows large scatter and the MMM provides a good fit to the variance of aggregated annual wet days (top row). At the time scale of a month, all the models are successful in reproducing the variability (second row). Clearly, the use of the 365 days area averaged wetness state in the KNN model is not sufficient to explain the variability at individual stations at the annual level. The MMM, which conditions on the previous 365 days wetness state at each individual station, is able to capture this variability quite well. Similarly, the reordering model, which follows the calendar dates in the selection of observed samples, is also able to reproduce this variability with some success. All models are more or less successful in reproducing the observed variances of monthly and annual rainfall totals (third and last rows) in the simulated series, with nominal bias for highly correlated stations and large scatter at the monthly time scale.

Another important rainfall statistic related to the annual variance is the year-to-year (inter-annual) distribution of annual wet days and annual rainfall totals. Fig. 6 presents the probability plots of the distribution of annual wet days and total annual rainfall for a representative station. The vertical axes in these plots show statistics computed from the observed and model simulated series, and the horizontal axes show the exceedance probabilities of occurrence of the events. The first column shows that the driest year on record has only 60 wet days, whereas the wettest year has approximately 180 wet days. The generated sequences from the independent model do not reproduce these distributions, whereas MMM and reordering successfully capture these highs and lows, with KNN being partly successful at reproducing this distribution. The standard deviation of the number of wet days per year is directly related to this distribution, and thus generated sequences from the independent model also under-represent the historical annual-level standard deviation (Fig. 5). The distribution of annual rainfall (right column of Fig. 6) follows the same trends as annual wet days, with less variation.

3.3. Extreme rainfall characteristics

It has been noted that assuming a low-order Markov dependence, in general, undersimulates long dry spells (runs of consecutive dry days) [5,9,28,30,38]. Fig. 7 presents scatter plots of observed and model simulated maximum wet and dry spell lengths (in days) and the number of days in a year with daily rainfall amount >35 mm, at all stations. The threshold of 35 mm is chosen to define adequately the extreme rainfall conditions over the study area. The reordering approach is more successful in reproducing the maximum wet spell length (top row) at all stations, while MMM and KNN somewhat overestimate this characteristic at the majority of stations. Similarly, generated sequences from the KNN underestimate the maximum dry spell lengths at the majority of stations, whereas MMM and reordering provide a marginally better representation of this characteristic at all stations (middle row), although the differences amongst the models are not pronounced. The number of days in a year with daily rainfall >35 mm, being a function of the rainfall amount model only, is reproduced adequately by all the models at the majority of stations (bottom row).

3.4. Combined spatio-temporal rainfall characteristics

In this section we examine the performance of the models in simulating a few rainfall characteristics exhibiting the combined spatio-temporal behaviour of the observed record over the study region. One such set of statistics, indicative of proper reproduction of spatial as well as temporal dependence, is the marginal and conditional probabilities of the series of area averaged wetness state of rainfall occurrence, and the corresponding rainfall amounts, at the time scale of a day. Mean area averaged rainfall in a class is calculated by summing all daily area averaged rainfall values in that class and dividing by the number of days falling in that class. For marginal probabilities, 10 discrete states with a class interval of 0.1 are formed, while for the conditional probabilities 4 discrete states with a class interval of 0.25 are formed. Thus class 1 corresponds to days when both the previous and current day's area averaged wetness state is less than 0.25. Class 2 corresponds to days when the previous day's area averaged wetness state is less than 0.25 while the current day's value is between 0.25 and 0.5. Similarly, the last class contains the days when both the previous and current day's area averaged wetness state is >0.75.

Fig. 8 presents the probability plots of observed and modelled daily area averaged wetness state and mean area averaged rainfall amount in these classes. In these plots observed statistics are shown as solid circles while simulated values from the different models are shown as lines of different styles. The MMM, reordering and KNN successfully reproduce the marginal and conditional probabilities, with reordering showing minor bias for the lower wetness state classes (top left plot of Fig. 8). However, the mean area averaged rainfall amount in a class is better reproduced by reordering (plots in the right column). As KNN conditions on the area averaged wetness state of the previous day as well as of the previous 365 days, it is structured to reproduce this statistic appropriately. The reordering tends to match the observed ranks in the generated rainfall amounts and therefore matches the amounts in these classes quite well. KNN and MMM exhibit similar performances for rainfall amounts. It appears that for proper reproduction of the temporal dependence of area averaged rainfall in the MMM, modelling of the spatial and at-site lagged temporal correlations might not suffice. Consideration of lagged cross correlations, or conditioning on a variable representing short-term area averaged conditions, might help improve these results.

Table 2 presents a number of observed and model simulated rainfall attributes exhibiting the combined spatio-temporal behaviour of rainfall. Note that the choice of the thresholds used in the table to define extreme wet and dry conditions over the region is purely subjective and is adopted solely for model evaluation purposes. The first and second rows of the table present, respectively, the number of days in a year when more than 85% of stations in the study area receive rainfall and the average number of days in a year when area averaged rainfall is >25 mm. MMM and KNN reproduce these attributes better than

reordering. Similarly, the next three rows depict the annual frequency of occurrence of instances when the area averaged wetness state is >0.60 and this condition has persisted for a few days (3–4, 5–6 and more than 6 days) in succession (defining area averaged wet spells, in a sense). The corresponding rainfall amounts in these states are provided in the subsequent three rows. Finally, the last three rows provide the average annual occurrence of runs of consecutive days (5–8, 9–14 and more than 14 days) with area averaged wetness state less than 0.20 (defining area averaged dry spells, in a sense). These results indicate that all the models reproduce these rainfall attributes with varying success. The spatially averaged wet spells are reproduced better by MMM and KNN, while the spatially averaged wet spell rainfall amounts and dry spells are captured better by reordering.

[Figure 5 graphic omitted. Panels: columns for the Independent, MMM, Reordering and KNN models; rows for SD of annual wet days, SD of monthly wet days, SD of annual rainfall totals in mm and SD of monthly rainfall totals in mm; axes: Observed, Simulated.]

Fig. 5. Scatter plots of observed and model simulated standard deviations (SD) of monthly and annual wet days and rainfall totals. For annual plots points on the graph are shown for each station while for monthly plots these are shown for each station and month.

Another important statistic indicative of the spatio-temporal rainfall behaviour is the frequency of occurrence of defined discrete weather states, as well as the spatial distribution of rainfall in these weather states. These weather states correspond to the dominant precipitation patterns over the study area and thus provide a means of analysing and comparing the rainfall distribution patterns in the observed and simulated records. These discrete weather states can be formed from the observed rainfall distribution based on objective and/or automated weather classification procedures [34]. In the present study, we follow the approach of Mehrotra and Sharma [22] for formulating the weather on a day and classify each day's weather into a few discrete categories. They defined the weather on a day on the basis of the spatial rainfall occurrence distribution and expressed it in terms of an area averaged wetness state and its location from a fixed origin. Following this, six discrete weather states are formed, namely: (a) when the majority of stations are dry, (b) when most of them are wet, and (c, d, e and f) when the north, south, east and west parts of the study area, respectively, are receiving rainfall. Each day of the observed and simulated record is categorised using this discrete weather state classification. The first column of Fig. 9 presents bar plots of the observed and simulated occurrences of these weather states, whereas the second column presents the mean area averaged rainfall amount in these weather states for all the models. As can be seen from the plots, all models, except independent,

are successful in reproducing adequately the average occurrence of these weather states and the mean rainfall amount in each state. Most of the rainfall occurs in the second weather state, where the majority of stations receive rainfall; this is somewhat better reproduced by reordering than by KNN and MMM. Similarly, Fig. 10 presents the scatter plots of the probability of a wet day in these weather states at all stations, computed from the observed and model simulated series. All the models are able to reproduce the observed wet day probabilities under a weather state at all stations quite successfully. As KNN simulates rainfall occurrences concurrently at all stations, the spatial patterns, and hence the wet day probabilities under each weather state, are reproduced better than by reordering and MMM. Transition probabilities of going from one weather state (of the previous day) to another (of the current day) are also reproduced reasonably well by all the models, with reordering and MMM better reproducing the transition of weather states (results not included).

[Figure 6 graphic omitted. Panels: rows for the Independent, Modified Markov, Reordering and KNN models; columns for annual wet days (days) and annual rainfall totals (mm); horizontal axis: percent of time exceeded; legend: Observed, 5 and 95% limits, Median.]

Fig. 6. Distribution plots of observed and model simulated annual wet days and rainfall amounts at a representative station (Sydney). Historical observations are shown as circles while 5 and 95 percentile simulated values are shown as dotted lines with simulated median values as solid lines.

Finally, we compare the distribution of area averaged wet days and rainfall amounts at monthly and annual time scales using the observed data and the results of all models. Distributions of wet days are shown in the left column and those of rainfall amounts in the right column of Fig. 11. In these plots the variables are shown on the vertical axes and the exceedance probabilities on the horizontal axes, with circles representing observed values and lines representing model generated values. At the time scale of a month, KNN, reordering and MMM exhibit similar performances, reproducing the observed distribution quite successfully. However, at the annual level, area averaged wet days are slightly overestimated by MMM at lower exceedance probabilities, whereas area averaged annual rainfall is slightly underestimated by KNN at lower exceedance probabilities.
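The exceedance curves of area averaged annual wet days compared above can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function names, the nested-list data layout and the Weibull plotting position `rank / (n + 1)` are all assumptions, since the text does not state which plotting formula was used.

```python
from typing import List, Sequence, Tuple


def area_averaged_wet_days(occurrence: Sequence[Sequence[Sequence[int]]]) -> List[float]:
    """Area averaged wet-day count for each year.

    occurrence[y][d][s] is 1 if station s is wet on day d of year y, else 0
    (hypothetical layout chosen for this sketch).
    """
    totals = []
    for year in occurrence:
        # Each day contributes the fraction of stations that are wet.
        totals.append(sum(sum(day) / len(day) for day in year))
    return totals


def exceedance_curve(values: Sequence[float]) -> List[Tuple[float, float]]:
    """(percent of time exceeded, value) pairs, largest value first.

    Uses the Weibull plotting position rank / (n + 1); this is an assumption,
    not a detail taken from the paper.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(100.0 * rank / (n + 1), value)
            for rank, value in enumerate(ordered, start=1)]


# Toy record: 3 years, 2 days per year, 2 stations (made-up numbers).
toy = [
    [[1, 0], [1, 1]],
    [[0, 0], [1, 0]],
    [[1, 1], [1, 1]],
]
curve = exceedance_curve(area_averaged_wet_days(toy))
```

Applying the same two functions to the observed record and to each simulated realisation would give the circles and lines of a plot like Fig. 11; replacing the 0/1 flags with rainfall depths gives the corresponding curve for area averaged rainfall.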

[Figure 7 graphic omitted. Panels: columns for the Independent, MMM, Reordering and KNN models; rows for maximum wet spell length (days), maximum dry spell length (days) and number of days in a year with rainfall >35 mm; axes: Observed, Simulated.]

Fig. 7. Scatter plots of observed and model simulated maximum wet and dry spells and number of days with daily rainfall amount >35 mm. Points on the graph are shown for each station.
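The three statistics plotted in Fig. 7 can be computed from a single station's daily series roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the 0.3 mm wet-day threshold is an illustrative default rather than a value taken from this section; only the 35 mm threshold comes from the text.

```python
from typing import Dict, Sequence


def longest_run(flags: Sequence[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    longest = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest


def extreme_stats(daily_mm: Sequence[float], n_years: float,
                  wet_threshold_mm: float = 0.3,     # assumed wet-day threshold
                  heavy_threshold_mm: float = 35.0   # threshold used for Fig. 7
                  ) -> Dict[str, float]:
    """Maximum wet/dry spell lengths and mean yearly count of heavy-rain days."""
    wet = [r >= wet_threshold_mm for r in daily_mm]
    dry = [not w for w in wet]
    heavy_days = sum(1 for r in daily_mm if r > heavy_threshold_mm)
    return {
        "max_wet_spell": longest_run(wet),
        "max_dry_spell": longest_run(dry),
        "heavy_days_per_year": heavy_days / n_years,
    }


# Made-up ten-day series treated as one "year" purely for illustration.
sample = [0.0, 5.0, 40.0, 2.0, 0.0, 0.0, 0.0, 36.0, 0.1, 0.0]
stats = extreme_stats(sample, n_years=1)
```

Running the same function on the observed series and on each simulated series at every station would yield the point pairs of a scatter plot like Fig. 7.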

[Figure 8 graphic omitted. Panels: area averaged wetness state marginal probabilities; rainfall totals (mm) in the marginal wetness state classes; area averaged wetness state conditional probabilities; rainfall totals (mm) in the conditional probability classes. Horizontal axes: wetness state (0.1 to 1) and probability classes (1 to 16); legend: KNN, Modified Markov, Reordering, Independent, Observed.]

Fig. 8. Marginal and conditional probabilities and average rainfall amounts under discrete area averaged wetness states. For marginal probabilities the wetness state classes vary from 0 to 1 with a bin width of 0.1, while for conditional probabilities the bin width is 0.25, constituting 4 classes and 16 possible transitions from one wetness state (of the previous day) to another (of the current day). Thus these 16 classes define the transition states, with the previous and current day's area averaged wetness state each being less than 0.25, between 0.25 and 0.5, between 0.5 and 0.75, or >0.75.
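The class definitions in the caption above can be made concrete with a short sketch. It assumes, hypothetically, that the daily record is a list of per-station 0/1 wet flags, that class boundaries belong to the upper class, and that the "conditional probabilities" are P(current class | previous class); none of these details is confirmed by the text, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence


def wetness_bin(wet_count: int, n_stations: int, n_bins: int) -> int:
    """0-based class of the area averaged wetness state (fraction of wet stations).

    Integer arithmetic avoids floating-point trouble at class boundaries;
    a fraction of exactly 1 is placed in the last class.
    """
    return min(wet_count * n_bins // n_stations, n_bins - 1)


def marginal_probabilities(occ: Sequence[Sequence[int]], n_bins: int = 10) -> List[float]:
    """Relative frequency of each wetness class (class interval 0.1 by default)."""
    n_stations = len(occ[0])
    counts = Counter(wetness_bin(sum(day), n_stations, n_bins) for day in occ)
    return [counts[b] / len(occ) for b in range(n_bins)]


def transition_probabilities(occ: Sequence[Sequence[int]], n_bins: int = 4) -> Dict[int, float]:
    """P(current class | previous class) for the 16 classes numbered 1..16.

    Class number = previous_bin * n_bins + current_bin + 1, so class 1 is
    (<0.25, <0.25) and class 2 is (<0.25, 0.25 to 0.5), as described in the text.
    """
    n_stations = len(occ[0])
    bins = [wetness_bin(sum(day), n_stations, n_bins) for day in occ]
    pair_counts = Counter(zip(bins[:-1], bins[1:]))
    prev_counts = Counter(bins[:-1])
    probs = {}
    for prev in range(n_bins):
        for cur in range(n_bins):
            cls = prev * n_bins + cur + 1
            total = prev_counts[prev]
            probs[cls] = pair_counts[(prev, cur)] / total if total else 0.0
    return probs


# Made-up five-day record at four stations.
toy_occ = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
marginal = marginal_probabilities(toy_occ)
transitions = transition_probabilities(toy_occ)
```

The mean area averaged rainfall in a class follows the same pattern: accumulate the daily area averaged rainfall per class and divide by the number of days in that class.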

Table 2
Some important observed and model simulated area averaged rainfall attributes.

Rainfall attribute | Observed | MMM | Reordering | KNN | IND
No. of days in a year with area averaged wetness state >0.85 | 42.77 | 42.11 | 37.76 | 42.43 | 0.00
No. of days in a year with area averaged rainfall >25 mm | 6.51 | 6.71 | 7.10 | 6.30 | 0.00
No. of times in a year with area average wetness state >0.60 (consecutively for 3–4 days) | 7.42 | 7.61 | 6.84 | 7.25 | 0.00
No. of times in a year with area average wetness state >0.60 (consecutively for 5–6 days) | 3.40 | 3.21 | 2.84 | 3.37 | 0.00
No. of times in a year with area average wetness state >0.60 (consecutively for 7 days or more) | 1.12 | 0.98 | 0.82 | 1.15 | 0.00
Average rainfall amount with area average wetness state >0.60 (consecutively for 3–4 days) | 36.06 | 35.45 | 39.77 | 34.63 | 0.00
Average rainfall amount with area average wetness state >0.60 (consecutively for 5–6 days) | 73.61 | 64.45 | 76.05 | 62.95 | 0.00
Average rainfall amount with area averaged wetness state >0.60 (consecutively for 7 days or more) | 132.55 | 106.74 | 123.59 | 109.76 | 0.00
No. of times in a year with area average wetness state <0.20 (consecutively for 5–7 days) | 10.26 | 9.92 | 10.16 | 9.35 | 0.00
No. of times in a year with area average wetness state <0.20 (consecutively for 8–15 days) | 3.21 | 3.17 | 3.12 | 3.64 | 0.00
No. of times in a year with area average wetness state <0.20 (consecutively for 15 days or more) | 0.70 | 0.61 | 0.71 | 1.14 | 0.00

[Figure 9 graphic omitted. Panels: rows for the Independent, Modified Markov, Reordering and KNN models; columns for average occurrence of weather states (percent occurrence) and average rainfall in a weather state (rainfall in mm); horizontal axis: weather state (1 to 6); legend: Observed, Simulated.]

Fig. 9. Percent occurrence of weather states and average rainfall in these weather states for observed and model simulated series. These weather states are formed on the basis of the rainfall distribution over the study region and are defined as (1) majority of stations are dry, (2) most are wet, and the (3) north, (4) south, (5) east and (6) west parts of the study area are receiving rainfall.

[Figure 10 graphic omitted. Panels: Independent model, Modified Markov model, Reordering model, KNN model; axes: Observed, Simulated (0 to 1); legend: WS1 to WS6.]

Fig. 10. Scatter plots of observed and model simulated rain probabilities at stations in six discrete weather states. WS indicates a weather state as defined in the previous figure. Points on the plots are shown for each station under all weather states.

[Figure 11 graphic omitted. Panels: distribution of area averaged monthly wet days, monthly rainfall, annual wet days and annual rainfall; vertical axes: wet days or rainfall in mm; horizontal axis: exceedance probabilities; legend: KNN, Modified Markov, Reordering, Independent, Observed.]

Fig. 11. Distribution of area averaged wet days and rainfall amounts at monthly and annual time scales for all models.

4. Summary and conclusions

This paper has presented an elaborate assessment of three multi-site models of daily rainfall, which are based on different rationales for reproducing the observed spatial and temporal dependence in the generated rainfall. The study region exhibits substantial topographic and spatio-temporal rainfall variations, and thus provides a challenging setting in which to evaluate the performance of any stochastic weather generator. These schemes do not pretend to model directly the physical processes underlying the spatial and higher order temporal distributions of weather variations, but rather are based on relatively simple stochastic simulation of weather derived from the behaviour of the observed rainfall. The performance of these approaches is evaluated using a wide range of rainfall occurrence and amount attributes depicting at-site temporal, across-site spatial and combined spatio-temporal rainfall behaviour at different time scales. Table 3 presents an overall broad assessment of the relative performances of these approaches in reproducing each observed rainfall attribute considered in the paper. This assessment indicates that for the majority of statistics, MMM and reordering perform better than KNN. It may be noted that the ranking provided in this table by no means reflects the superiority of one approach over the others, as for many rainfall attributes two or more approaches are found to provide almost similar results (see other plots and tables for details).

Representing spatial dependence has always been a topic of concern among researchers when it comes to stochastic simulation of rainfall. The three methods presented here address this concern following entirely different approaches. While KNN resamples the observed spatial dependence, reordering maintains the observed rank dependence in the simulated series. The MMM maintains the observed spatial dependence by making use of random numbers that are spatially correlated in such a manner as to reproduce the observed spatial structure in the simulated series. In general, all three methods provide a simple yet effective logic for reproducing this characteristic at a network of stations.
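To illustrate how rank reordering carries the observed dependence into simulated values, the following minimal sketch rearranges each station's simulated values so that their ranks follow the ranks of an observed sample, in the spirit of the reordering method of [7]. It is not the authors' implementation: the function names are hypothetical, and ties (such as the many zero-rainfall days) are simply broken by position, which is an arbitrary choice.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """0-based rank of each element; ties are broken by position (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0] * len(values)
    for r, i in enumerate(order):
        rank[i] = r
    return rank


def reorder(simulated: Sequence[float], observed: Sequence[float]) -> List[float]:
    """Rearrange simulated values so their rank order matches the observed one."""
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed samples must have equal length")
    sorted_sim = sorted(simulated)
    return [sorted_sim[r] for r in ranks(observed)]


# Made-up four-day samples at two stations; both observed series peak on day 2.
obs_a, obs_b = [0.0, 12.0, 3.0, 7.0], [1.0, 10.0, 2.0, 8.0]
sim_a, sim_b = [5.0, 1.0, 9.0, 2.0], [4.0, 6.0, 0.0, 3.0]
new_a = reorder(sim_a, obs_a)
new_b = reorder(sim_b, obs_b)
```

Because both stations are reordered against observations from the same dates, the largest simulated values land on the same day at both stations, which is how the rank structure in space and time is inherited. The simulated marginal values at each station are unchanged; only their order differs.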

Table 3
Rainfall attributes and relative rankings of approaches used.

Rainfall attribute | MMM | Reordering | KNN
Log-odds ratio of daily rainfall occurrences | 2 | 3 | 1
Cross correlations of monthly wet days | 2 | 1 | 3
Cross correlations of annual wet days | 2 | 1 | 3
Cross correlations of daily rainfall | 2 | 3 | 1
Cross correlations of monthly rainfall | 2 | 1 | 3
Cross correlations of annual rainfall | 1 | 3 | 2
Mean annual wet days | 2 | 1 | 3
Mean monthly wet days | 2 | 1 | 3
Mean annual rainfall | 2 | 1 | 3
Mean monthly rainfall | 2 | 1 | 3
Standard deviation of annual wet days | 1 | 2 | 3
Standard deviation of monthly wet days | 1 | 2 | 3
Standard deviation of annual rainfall | 2 | 1 | 3
Standard deviation of monthly rainfall | 3 | 1 | 2
Distribution of annual wet days at a station | 2 | 1 | 3
Distribution of annual rainfall at a station | 1 | 2 | 3
Maximum wet spell | 2 | 1 | 3
Maximum dry spell | 2 | 1 | 3
No. of days with daily rainfall >35 mm | 2 | 1 | 3
Area averaged wetness state – marginal probabilities | 1 | 3 | 2
Area averaged wetness state – rainfall amount in a probability class | 3 | 1 | 2
Area averaged conditional probabilities | 1 | 3 | 2
Rainfall amount in area averaged conditional probability class | 1 | 3 | 2
Average occurrence of a weather state | 1 | 3 | 2
Average rainfall amount under each weather state | 3 | 1 | 2
Probability of a wet day under each weather state | 3 | 2 | 1
Distribution of area averaged monthly wet days | 1 | 2 | 3
Distribution of area averaged monthly rainfall | 1 | 3 | 2
Distribution of area averaged annual wet days | 3 | 1 | 2
Distribution of area averaged annual rainfall | 1 | 2 | 3
No. of days in a year with area averaged wetness state >0.85 | 2 | 3 | 1
No. of days in a year with area averaged rainfall >25 mm | 1 | 3 | 2
No. of times in a year with area average wetness state >0.60 (consecutively for 3–4 days) | 2 | 3 | 1
No. of times in a year with area average wetness state >0.60 (consecutively for 5–6 days) | 2 | 3 | 1
No. of times in a year with area average wetness state >0.60 (consecutively for 7 days or more) | 2 | 3 | 1
Average rainfall amount with area average wetness state >0.60 (consecutively for 3–4 days) | 1 | 3 | 2
Average rainfall amount with area average wetness state >0.60 (consecutively for 5–6 days) | 2 | 1 | 3
Average rainfall amount with area averaged wetness state >0.60 (consecutively for 7 days or more) | 3 | 1 | 2
No. of times in a year with area average wetness state <0.20 (consecutively for 5–7 days) | 2 | 1 | 3
No. of times in a year with area average wetness state <0.20 (consecutively for 8–15 days) | 1 | 2 | 3
No. of times in a year with area average wetness state <0.20 (consecutively for 15 days or more) | 2 | 1 | 3
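The rankings in Table 3 can be summarised with a simple tally of how often each approach is ranked first, second or third. The sketch below transcribes the three rank columns in table order; the variable and function names are hypothetical. It is only a bookkeeping aid: as the text cautions, a rank does not show how close the approaches are for a given attribute.

```python
from collections import Counter
from typing import Dict, List

# Rank columns transcribed from Table 3, in the order the attributes are listed.
TABLE3_RANKS: Dict[str, List[int]] = {
    "MMM": [2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 3, 2, 1, 2, 2, 2, 1, 3,
            1, 1, 1, 3, 3, 1, 1, 3, 1, 2, 1, 2, 2, 2, 1, 2, 3, 2, 1, 2],
    "Reordering": [3, 1, 1, 3, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 3, 1,
                   3, 3, 3, 1, 2, 2, 3, 1, 2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 2, 1],
    "KNN": [1, 3, 3, 1, 3, 2, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 2, 2,
            2, 2, 2, 2, 1, 3, 2, 2, 3, 1, 2, 1, 1, 1, 2, 3, 2, 3, 3, 3],
}


def tally(rank_table: Dict[str, List[int]]) -> Dict[str, Counter]:
    """Count how many attributes each approach ranks 1st, 2nd and 3rd on."""
    return {name: Counter(column) for name, column in rank_table.items()}


def mean_rank(rank_table: Dict[str, List[int]]) -> Dict[str, float]:
    """Average rank of each approach over all attributes (lower is better)."""
    return {name: sum(column) / len(column) for name, column in rank_table.items()}


counts = tally(TABLE3_RANKS)
averages = mean_rank(TABLE3_RANKS)
```

Both summaries weight every attribute equally, which is a simplification; a user interested mainly in, say, dry spell statistics would look at the relevant rows of Table 3 instead.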

We found that models with low-order Markovian dependence, such as the independent model and, to some extent, KNN (as it uses area-averaged values as conditioning variables), are partially successful in reproducing extended dry spells at individual stations with the proper frequency. This agrees with the findings of other researchers (e.g. [5,28,20]). The use of a higher time scale wetness state (MMM), or following the dependence of the observed record (reordering), helps retain the higher order dependence at individual locations in the generated simulations. The KNN as used here is structured to maintain the temporal dependence of the area-averaged series only and therefore, when used over large areas, may not adequately reproduce the higher time scale rainfall characteristics at individual stations, which are of interest in many applications.

The reordering is based on the simple logic of reproducing the ranked statistics at individual stations, which in turn helps reproduce the desired spatial and temporal statistics in the shuffled generated sequences. The approach applies equally well to other weather variables. However, it has limitations if the observed data contain many identical values, as is the case with rainfall, which contains many zeros. The poor simulation of daily spatial correlations is the result of having many zeros at quite a few stations. Additionally, were the approach to use a small moving window and a limited observed record size, it would simulate realisations that have a rank structure (in space and time) identical to what is observed. While the spatio-temporal dependence statistics would be represented well (perfectly in the rank space), the stochasticity of the simulated sequences would be compromised. For the method to succeed, the observed record should be sufficiently long, with few repeated observations. It should be noted that the good simulation of difficult-to-represent statistics such as long-term persistence arises because the observed rank structure is replicated without significant alteration in formulating the ensembles. The approach in its present form is not suited to downscaling or climate change studies, to applications where the spatio-temporal structure of rainfall is expected to behave differently under changed climate conditions, or to cases where simulation is carried out conditioned on past observations. Other approaches such as KNN and MMM can easily be extended for use in such situations.

The KNN offers a simple modelling mechanism; however, being a nonparametric model, it is expensive in terms of computer time in comparison with reordering and MMM. The reordering alternative is easy to formulate and implement, while MMM has a complex model structure and requires matrix operations. However, once structured, MMM offers a novel mechanism for modelling varying orders of serial dependence at each point location, while still maintaining the observed spatial dependence. It is helpful to note that, as the structure of MMM supports rainfall simulation at individual stations in isolation, the at-site model structure can easily be modified to accommodate station-specific needs in rainfall statistics.

Finally, we feel that the choice amongst the models


presented here may be made in large part according to user convenience or to constraints arising from data and resource availability or the simulation context.

References

[1] Bardossy A, Plate EJ. Space–time model for daily rainfall using atmospheric circulation patterns. Water Resour Res 1992;28:1247–59.
[2] Beersma JJ, Buishand TA. Multi-site simulation of daily precipitation and temperature conditional on the atmospheric circulation. Climate Res 2003;25:121–33.
[3] Brandsma T, Buishand TA. Simulation of extreme precipitation in the Rhine basin by nearest neighbour resampling. Hydrol Earth Syst Sci 1998;2:195–209.
[4] Bras R, Rodriguez-Iturbe I. Rainfall generation: a nonstationary time varying multidimensional model. Water Resour Res 1976;12:450–6.
[5] Buishand TA. Some remarks on the use of daily rainfall models. J Hydrol 1978;36:295–308.
[6] Buishand TA, Brandsma T. Multi-site simulation of daily precipitation and temperature in the Rhine basin by nearest-neighbor resampling. Water Resour Res 2001;37:2761–76.
[7] Clark MP, Gangopadhyay S, Hay LE, Rajagopalan B, Wilby RL. The Schaake shuffle: a method for reconstructing space–time variability in forecasted precipitation and temperature fields. J Hydrometeorol 2004;5:243–62.
[8] Clark MP, Gangopadhyay S, Brandon D, Werner K, Hay L, Rajagopalan B, et al. A resampling procedure for generating conditioned daily weather sequences. Water Resour Res 2004;40:W04304. doi:10.1029/2003WR002747.
[9] Guttorp P. Stochastic modeling of scientific data. London: Chapman and Hall; 1995 [chapter 2].
[10] Harrold TI, Sharma A, Sheather SJ. A nonparametric model for stochastic generation of daily rainfall occurrence. Water Resour Res 2003;39(12):1343. doi:10.1029/2003WR002570.
[11] Harrold TI, Sharma A, Sheather SJ. A nonparametric model for stochastic generation of daily rainfall amounts. Water Resour Res 2003;39(12):1300. doi:10.1029/2003WR002570.
[12] Hay LE, McCabe Jr GJ, Wolock DM, Ayres MA. Simulation of precipitation by weather type analysis. Water Resour Res 1991;27:493–501.
[13] Hughes JP, Lettenmaier DP, Guttorp P. A stochastic approach for assessing the effect of changes in synoptic circulation patterns on gauge precipitation. Water Resour Res 1993;29:3303–15.
[14] Hughes JP, Guttorp P. A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. Water Resour Res 1994;30:1535–46.
[15] Hughes JP, Guttorp P, Charles SP. A nonhomogeneous hidden Markov model for precipitation occurrence. Appl Stat 1999;48(1):15–30.
[16] Katz RW, Parlange MB. Effects of an index of atmospheric circulation on stochastic properties of precipitation. Water Resour Res 1993;29:2335–44.
[17] Katz RW, Parlange MB. Overdispersion phenomenon in stochastic modeling of precipitation. J Climate 1998;11:591–601.
[18] Katz RW, Parlange MB, Tebaldi C. Stochastic modeling of the effects of large-scale circulation on daily weather in the southeastern US. Climatic Change 2003;60:189–216.
[19] Lall U, Sharma A. A nearest neighbor bootstrap for time series resampling. Water Resour Res 1996;32:679–93.
[20] Lettenmaier D. Stochastic modeling of precipitation with applications to climate model downscaling. In: von Storch H, Navarra A, editors. Analysis of climate variability. Applications of statistical techniques. Berlin: Springer-Verlag; 1995. p. 197–212.
[21] Mehrotra R, Sharma A, Cordery I. Comparison of two approaches for downscaling synoptic atmospheric patterns to multi-site precipitation occurrence. J Geophys Res 2004;109:D14107. doi:10.1029/2004JD004823.
[22] Mehrotra R, Sharma A. A nonparametric nonhomogeneous hidden Markov model for downscaling of multi-site daily rainfall occurrences. J Geophys Res 2005;110:D16108. doi:10.1029/2004JD005677.
[23] Mehrotra R, Srikanthan R, Sharma A. A comparison of three stochastic multi-site precipitation occurrence generators. J Hydrol 2006;331:280–92. doi:10.1016/j.jhydrol.2006.05.016.
[24] Mehrotra R, Sharma A. A nonparametric stochastic downscaling framework for daily rainfall at multiple locations. J Geophys Res 2006;111:D15101. doi:10.1029/2005JD006637.
[25] Mehrotra R, Sharma A. A semi-parametric model for stochastic generation of multi-site daily rainfall exhibiting low-frequency variability. J Hydrol 2007;335:180–93. doi:10.1016/j.jhydrol.2006.11.011.
[26] Mehrotra R, Sharma A. Preserving low-frequency variability in generated daily rainfall sequences. J Hydrol 2007;345:102–20. doi:10.1016/j.jhydrol.2007.08.003.
[27] Qian B, Corte-Real JA, Xu H. Multi-site stochastic weather models for impact studies. Int J Climatol 2002;22:1377–97.
[28] Racsko P, Szeidl L, Semenov M. A serial approach to local stochastic weather models. Ecol Model 1991;57:27–41.
[29] Rajagopalan B, Lall U. A k-nearest neighbour simulator for daily precipitation and other weather variables. Water Resour Res 1999;35:3089–101.
[30] Semenov MA, Porter JR. Climatic variability and the modelling of crop yields. Agr Forest Meteorol 1995;73:265–83.
[31] Sharma A, O’Neill R. A nonparametric approach for representing interannual dependence in monthly streamflow sequences. Water Resour Res 2002;38(7):5.1–5.10.
[32] Sharma A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3 – a nonparametric probabilistic forecast model. J Hydrol 2000;239:249–58.
[33] Stehlik J, Bardossy A. Multivariate stochastic downscaling model for generating daily precipitation series based on atmospheric circulation. J Hydrol 2002;256:120–41.
[34] Vrac M, Stein M, Hayhoe K. Statistical downscaling of precipitation through nonhomogeneous stochastic weather typing. Climate Res 2007;34:169–84.
[35] Waymire E, Gupta VK, Rodriguez-Iturbe I. A spectral theory of rainfall at the meso-β scale. Water Resour Res 1984;20:1453–65.
[36] Wilks DS. Multi-site generalization of a daily stochastic precipitation model. J Hydrol 1998;210:178–91.
[37] Wilks DS. Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agr Forest Meteorol 1999;96:85–101.
[38] Wilks DS. Multi-site downscaling of daily precipitation with a stochastic weather generator. Climate Res 1999;11:125–36.
[39] Wilson LL, Lettenmaier DP, Skyllingstad E. A hierarchical stochastic model of large-scale atmospheric circulation patterns and multiple station daily precipitation. J Geophys Res 1992;D3:2791–809.