Advances in Water Resources 32 (2009) 948–962
Evaluating spatio-temporal representations in daily rainfall sequences from three stochastic multi-site weather generation approaches
R. Mehrotra *, Ashish Sharma
The University of New South Wales, Water Research Centre, Kensington, Sydney, NSW 2052, Australia
Article info
Article history: Received 2 December 2008; received in revised form 2 March 2009; accepted 18 March 2009; available online 25 March 2009
Keywords: Spatio-temporal rainfall distribution; Sydney; Modified Markov model; KNN model; Kernel density estimation
Abstract
Many hydrological and agricultural studies require simulations of weather variables that reflect the observed spatial and temporal dependence at multiple point locations. This paper assesses three multi-site daily rainfall generators for their ability to model different spatio-temporal rainfall attributes over the study area. The approaches considered are a multi-site modified Markov model (MMM), a reordering method for reconstructing space–time variability, and a nonparametric k-nearest neighbour (KNN) model. Our results indicate that all the approaches adequately reproduce the observed spatio-temporal pattern of the multi-site daily rainfall. However, the different techniques used to represent the observed temporal and spatial dependences at longer time scales in the simulated sequences reproduce these characteristics with varying success. While each approach comes with its own advantages and disadvantages, the MMM has an overall advantage in offering a mechanism for modelling varying orders of serial dependence at each point location, while still maintaining the observed spatial dependence with sufficient accuracy. The reordering method is simple and intuitive and produces good results. However, it is primarily driven by the reshuffling of the simulated values across realisations and therefore may not be suited to applications where data length is limited, or to situations where the simulation process is governed by exogenous conditioning variables, for example downscaling studies, where KNN and MMM can be used with confidence. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Weather generators are commonly used to generate synthetic sequences of rainfall and other weather variables to enhance our understanding of hydrological system response, and to assist in the design and operation of water resource systems. Single-site weather generators are relatively simple to formulate, but cannot be used in situations that require information on weather variables simultaneously at multiple locations, such as studies of the hydrological or agricultural behaviour of a region, the discharge of a river, or floods. A number of multi-site parametric stochastic models have been proposed in the literature to address the crucial issue of spatial dependence among weather variables at multiple locations, both with and without conditioning on exogenous atmospheric predictors. These models simulate weather variable sequences simultaneously at multiple locations, usually at a daily time step (e.g. [4,35,12,1,39,13–15]). However, their use is restricted to small networks of stations, as the number of model parameters grows almost exponentially with the number of stations.
* Corresponding author. Tel.: +61 2 9385 5140; fax: +61 2 9385 6139. E-mail address: [email protected] (R. Mehrotra).
0309-1708/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.advwatres.2009.03.005
It is also common to incorporate synoptic information in the generation of weather variables in an attempt to improve the spatial and temporal attributes of the simulated series [14,15,21,33]. This has also been found to improve the representation of low frequency variability in the simulated rainfall [16]. Low frequency variability is important for applications that are sensitive to the proper representation of low frequency persistence, that is, of sustained droughts and periods of above average rainfall (or above average wet days) in the generated record.

To overcome the theoretical as well as computational difficulties of parameter estimation in multi-site stochastic models, Wilks [36] proposed a simple extension of the commonly used single-site Markov model for precipitation to multiple sites: individual single-site models are driven with temporally independent yet spatially correlated random numbers. This approach successfully simulated daily precipitation while reproducing the observed spatial correlation and preserving the individual behaviour of the local models. Since then, this logic has been successfully applied to the simultaneous simulation of rainfall and other weather variables at multiple locations [37,38,27,23,25,26].

Recently, in an attempt to reproduce the spatial and temporal structure of observed multi-site series of weather variables without introducing additional complexities, Clark [7,8] introduced a method to reconstruct the observed spatial (intersite)
and temporal correlation statistics in the multi-site generated sequences by reordering the generated series of weather variables at individual locations. Their approach involves reshuffling the simulated observations across ensembles (realisations) so that the reshuffled output matches the observed spatial and temporal characteristics. In essence, the procedure requires generating multiple realisations (ensembles) individually at each site and reshuffling the simulated series based on the observed information available at these sites. They claimed that the reordering procedure successfully reproduced the number of wet days, rainfall amounts and spell characteristics of the rainfall series at a number of stations, without requiring a complex model structure or tedious parameter estimation procedures.

Another class of weather generators is based on nonparametric alternatives. These methods offer a different rationale for the generation of climate variables and have been used extensively for this purpose in recent years [29,3,6,31,10,11,24]. They allow the temporal and spatial relationships among weather variables to be developed without a priori assumptions about the joint probability distribution of these variables. For multi-site resampling, since the variables at multiple locations are simulated concurrently, dependence across space is accurately preserved. The k-nearest neighbour bootstrap (KNN) is a technique that conditionally resamples values from the observed record based on the conditional relationship specified. The absence of assumptions defining the joint distribution of the weather variables helps ensure an accurate representation of features such as nonlinearity, asymmetry or multimodality in the observed record of the variables being modelled.

This paper assesses these three modelling strategies for daily rainfall generation at multiple locations.
All the approaches follow a two-step rainfall generation procedure: rainfall occurrences are generated in the first step, and amounts are then modelled on the days identified as wet by the occurrence models, using a nonparametric kernel density estimation procedure. The approaches considered are (a) a multi-site modified Markov model (MMM) [26]; (b) reordering of ensemble output (Clark [7,8]); and (c) a nonparametric k-nearest neighbour multi-site model [6,19]. Rainfall amounts for these models are generated using a kernel density estimation (KDE) approach [31,11,24]. These models (including the rainfall amounts) are hereafter referred to as (a) MMM, (b) reordering and (c) KNN. The results of these models are evaluated against each other, against the observed data, and against the results obtained using an unconditional (zero-order) Markov model (for rainfall occurrences) and an unconditional KDE model (for rainfall amounts) applied to each site in isolation without any treatment of spatial dependence. This last model is hereafter referred to as the independent model. A 43 year long record of daily rainfall at a network of 45 locations near Sydney, Australia, is used to compare the methods.
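The two-step structure is easiest to see in the independent model. The following is a minimal sketch for a single site, not the authors' implementation: the function name, the log transform, the Gaussian kernel and the `bandwidth` factor are all illustrative assumptions, and the seasonal moving window used in the paper is ignored. The 0.3 mm wet-day threshold is the one stated in the data section.

```python
import numpy as np

def simulate_independent_site(obs, n_days, wet_threshold=0.3, bandwidth=0.25, rng=None):
    """Two-step daily rainfall generation for one site, ignoring all dependence.

    obs is a 1-D array of observed daily rainfall (mm). The kernel choices
    below are assumptions made for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    obs = np.asarray(obs, dtype=float)
    wet = obs >= wet_threshold
    p_wet = wet.mean()                    # zero-order Markov: one wet-day probability
    log_amounts = np.log(obs[wet])

    # Step 1: occurrences (1 = wet, 0 = dry), independent from day to day.
    occ = (rng.random(n_days) < p_wet).astype(int)

    # Step 2: amounts on wet days from a smoothed bootstrap, i.e. sampling from a
    # Gaussian kernel density estimate of the log of observed wet-day amounts.
    n_wet = int(occ.sum())
    centres = rng.choice(log_amounts, size=n_wet, replace=True)
    h = bandwidth * log_amounts.std()
    sim_amounts = np.exp(centres + h * rng.standard_normal(n_wet))

    rain = np.zeros(n_days)
    rain[occ == 1] = np.maximum(sim_amounts, wet_threshold)  # keep wet days wet
    return occ, rain
```

Because each site is simulated in isolation and each day is drawn independently, a sketch of this kind reproduces means but carries no spatial or serial dependence, which is why it serves only as a baseline.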
The paper is organised as follows. The methodological aspects of the models and the data used are presented in Section 2. Section 3 applies the models and compares their results. Finally, Section 4 summarises the results and draws conclusions.

2. Methodology and data used

The approaches considered in the study differ mainly in the procedure used to generate the rainfall occurrences; the general model structure for rainfall amount generation is common to all approaches. Table 1 summarises the spatial and temporal dependence structures of the rainfall occurrence and amount processes used in these approaches. The general structure of each model is described briefly below.

2.1. Modified Markov model (MMM)

We denote rainfall occurrence at a location $k$ and time $t$ as $R_t(k)$, and at the $p$th time step before the current one as $R_{t-p}(k)$. The MMM as proposed in Mehrotra and Sharma [26] is based on the conditional simulation of $R_t(k) \mid Z_t(k)$ within the general framework of a Markov process. Here, $Z_t(k)$ represents a vector of conditioning variables at location $k$ and time $t$ that, in addition to rainfall values at previous time steps (which impart daily or short-term persistence), can also include other continuous variables explaining persistence at longer time scales (denoted as $X_t(k)$). Thus, for the specific case of rainfall occurrence generation considered here, $X_t(k)$ is a vector of aggregated rainfall predictors at site $k$ representing the wetness over the recent past. It conveys the slowly varying temporal persistence that creates sustained low or high rainfall periods, and is expressed as:
$$X_t(k) \in \{X(k)_{j_1,t},\, X(k)_{j_2,t},\, \ldots,\, X(k)_{j_m,t}\}, \qquad X(k)_{j_i,t} = \frac{1}{j_i} \sum_{l=1}^{j_i} R(k)_{t-l} \qquad (1)$$
where $m$ is the number of such predictors and $X(k)_{j_i,t}$ describes how wet it has been over the preceding $j_i$ days, with $R(k)_{t-l}$ being the rainfall occurrences at site $k$. Similarly, an area averaged wetness state over a time period is defined as:
$$\bar{X}_{j_i,t} = \frac{1}{NS}\,\frac{1}{j_i} \sum_{k=1}^{NS} \sum_{l=1}^{j_i} R_{k,t-l} \qquad (2)$$
where $NS$ is the number of stations and $\bar{X}_{j_i,t}$ defines the area averaged wetness of rainfall occurrence/amount over the preceding $j_i$ days, with $R_{k,t-l}$ being the rainfall occurrence/amount at site $k$. For brevity, site notations are dropped in the subsequent discussions. The parameters (or the transition probabilities) of a Markov model expressing order one Markovian dependence (first order Markov model) are defined by $P(R_t \mid R_{t-1})$ with $Z_t$ consisting
Table 1. Details of temporal and spatial dependences considered in the models used.

| Model | Process | Temporal dependence of order | Spatial dependence by |
|---|---|---|---|
| Independent | Occurrence | Zero | None |
| Independent | Amount | Zero | None |
| MMM | Occurrence | One and annual wetness state | Spatially correlated random numbers |
| MMM | Amount | One | Spatially correlated random numbers |
| Reordering | Occurrence | Using reordering | Reordering |
| Reordering | Amount | Using reordering | Reordering |
| KNN | Occurrence | Spatially averaged order one and 365 days wetness state | Simultaneously picking up observations at all stations |
| KNN | Amount | One | Spatially correlated random numbers |
of $R_{t-1}$ only. Inclusion of additional predictors $X_t$ in the conditioning vector $Z_t$ would modify these transition probabilities as $P(R_t \mid R_{t-1}, X_t)$. The following parameterization is adopted to estimate $P(R_t \mid R_{t-1}, X_t)$:
$$
\begin{aligned}
P(R_t = 1 \mid R_{t-1} = i, X_t) &= \frac{P(R_t = 1, R_{t-1} = i, X_t)}{P(R_{t-1} = i, X_t)} = \frac{f(X_t \mid R_t = 1, R_{t-1} = i)\, P(R_t = 1, R_{t-1} = i)}{f(X_t \mid R_{t-1} = i)\, P(R_{t-1} = i)} \\
&= p_{1i}\, \frac{f(X_t \mid R_t = 1, R_{t-1} = i)}{\left[f(X_t \mid R_t = 1, R_{t-1} = i)\, P(R_t = 1 \mid R_{t-1} = i)\right] + \left[f(X_t \mid R_t = 0, R_{t-1} = i)\, P(R_t = 0 \mid R_{t-1} = i)\right]}
\end{aligned}
\qquad (3)
$$

The first term of (3) defines the transition probabilities $P(R_t \mid R_{t-1})$ of a first order Markov model (representing order one dependence, when $R_{t-1} = i$ and $R_t = 1$), while the second term signifies the effect of including the predictor set $X_t$ in the conditioning vector $Z_t$. A closer look at Eq. (3) reveals that the baseline transition probability $p_{1i}$ is re-scaled in proportion to the influence of the predictor set $X_t$ on the original $p_{1i}$ and $p_{0i}$, so that the modified transition probabilities $p_{1i}$ and $p_{0i}$ (or $1 - p_{1i}$) still sum to 1. If $X_t$ consists of derived measures (typically linear combinations) of the number of wet days in pre-specified aggregation periods (as specified by Eq. (1) or (2)), one could approximate the associated probability with a multivariate normal distribution, leading to the following simplification for $P(R_t \mid R_{t-1}, X_t)$:

$$
P(R_t \mid R_{t-1}, X_t) = p_{1i}\,
\frac{\dfrac{1}{\det(V_{1,i})^{1/2}} \exp\left\{-\tfrac{1}{2}(X_t - \mu_{1,i})\, V_{1,i}^{-1}\, (X_t - \mu_{1,i})'\right\}}
{\dfrac{1}{\det(V_{1,i})^{1/2}} \exp\left\{-\tfrac{1}{2}(X_t - \mu_{1,i})\, V_{1,i}^{-1}\, (X_t - \mu_{1,i})'\right\} p_{1i}
+ \dfrac{1}{\det(V_{0,i})^{1/2}} \exp\left\{-\tfrac{1}{2}(X_t - \mu_{0,i})\, V_{0,i}^{-1}\, (X_t - \mu_{0,i})'\right\} (1 - p_{1i})}
\qquad (4)
$$

where $\mu_{1,i}$ represents the mean vector $E(X_t \mid R_t = 1, R_{t-1} = i)$ and $V_{1,i}$ is the corresponding variance–covariance matrix. Similarly, $\mu_{0,i}$ and $V_{0,i}$ represent, respectively, the mean vector and the variance–covariance matrix of the subset of $X$ for which $R_{t-1} = i$ and $R_t = 0$. The $p_{1i}$ represent the baseline transition probabilities of the first order Markov model defined by $P(R_t = 1 \mid R_{t-1} = i)$, and $\det(\cdot)$ represents the determinant operation. The parameters of the MMM for $P(R_t \mid R_{t-1}, X_t)$ in (4) are $\mu_0$, $V_0$, $\mu_1$, $V_1$ and $p_{1i}$, which can be estimated directly from the data by forming different conditional subsets of the given rainfall series and calculating the conditional probabilities, means, variances and covariances. It may be noted that in some applications, as the number of conditioning variables increases, the assumption of a multivariate normal distribution may not be adequate. In such situations, the conditional probabilities $P(X_t \mid R_t, R_{t-1})$ and $P(X_t \mid R_{t-1})$ may be estimated either using more appropriate probability distributions or using nonparametric alternatives (such as kernel density estimation). However, for the current application, the assumption of a multivariate normal distribution is found to be appropriate [26].

2.2. The reordering method

The ensemble (realisation) reordering method was proposed by Clark [7,8]. This method is fairly simple and intuitive, and involves a reordering of ensemble outputs to maintain the space–time variability in the generated series. The major difference between this method and other multi-site weather generators is that the space–time variability is not preserved intrinsically, but is reconstructed as a post-processing step. In brief, the procedure can be described as follows.

For a given day, the generated ensemble members ($n$) are ranked from lowest to highest. A short moving window of a pre-specified number of days (say 31 days) is formed, centred on the given day. A subset of $n$ observed dates (the number of observations being the same as the number of generated realisations) is randomly selected from the days of the historical record falling within this moving window (dates can be drawn from all years in the historical record except the year for which reordering is being performed). This observed subset is also ranked from lowest to highest, separately for each variable and station. For each variable and station, each observation of the ranked observed subset is tagged with the corresponding ranked generated ensemble member, and is re-ordered to its original position together with the tagged generated record. The random selections of dates that are used to construct the observed subset are made only for the first day, and are persisted for subsequent days.

The stepwise procedure followed in formulating the reordering method is described briefly below. For a given station $j$ and variable $k$, let $\vec{X}$ be a vector of $n$ generated observations drawn from the $n$ ensemble members for a given day, and let $\vec{\chi}$ be the sorted vector of $\vec{X}$, that is,
$$\vec{X} = (x_1, x_2, \ldots, x_n); \qquad \vec{\chi} = (x_{(1)}, x_{(2)}, x_{(3)}, \ldots, x_{(n)}), \qquad x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} \qquad (5)$$
Also, let $\vec{Y}$ be a vector of $n$ historical observations that are selected randomly from the observed dataset within a moving window of a pre-specified number of days, and let $\vec{\gamma}$ be the sorted vector of $\vec{Y}$, that is,
$$\vec{Y} = (y_1, y_2, \ldots, y_n); \qquad \vec{\gamma} = (y_{(1)}, y_{(2)}, y_{(3)}, \ldots, y_{(n)}), \qquad y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)} \qquad (6)$$
Additionally, let $\vec{B}$ be the vector of indices describing the original observation number that corresponds to the values in the ordered vector $\vec{\gamma}$. The final step involves constructing the reordered vector $\vec{X}^{o}$, which is the final reordered output:
$$\vec{X}^{o} = (x^{o}_1, x^{o}_2, \ldots, x^{o}_n) \qquad (7)$$
where $x^{o}_m = x_{(r)}$, $m = \vec{B}[r]$ and $r = 1, \ldots, n$, with the subscripts in parentheses referring to the elements in the sorted vector $\vec{\chi}$. Further details on the reordering method are available in Clark [7,8].

This approach is intended to preserve both spatial and temporal correlations in the reordered generated series. Consider first the spatial dependence at two stations: if the stations are highly correlated, then the observations at these stations on a given day are likely to have a similar rank. The rank of each simulated realisation at the two stations is matched with the rank of each randomly selected observation, meaning that, for all realisations, the rank will be similar at the two stations. When this process is repeated for all days, the ranks of a given realisation will on average be similar for the two stations, and the spatial correlation will be reconstructed once the randomly selected days are re-sorted to their original order.

To maintain the temporal persistence at each station, the random selections of dates that are used to construct the observed subset are made only for the first day. At a given station with high temporal persistence, the historical observations on subsequent days will, on average, have a similar rank. The ensemble output is assigned ranks identical to those of the randomly selected observations, and thus the temporal persistence is reconstructed once the ensemble output is re-sorted.
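The reordering step of Eqs. (5)–(7) for one day, station and variable can be sketched as below. This is a minimal illustration rather than the code of Clark [7,8]: the function name is hypothetical, and ties (for example several dry days with zero rainfall) are broken by the order in which they appear, which is an assumption the text does not address.

```python
import numpy as np

def reorder_ensemble(generated, observed_subset):
    """Reorder n ensemble values so their ranks follow the observed subset."""
    x = np.asarray(generated, dtype=float)
    y = np.asarray(observed_subset, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("both inputs must be 1-D with the same length n")
    chi = np.sort(x)                   # x_(1) <= ... <= x_(n), Eq. (5)
    B = np.argsort(y, kind="stable")   # B[r]: original position of the r-th smallest y
    x_o = np.empty_like(chi)
    x_o[B] = chi                       # x^o_m = x_(r) with m = B[r], Eq. (7)
    return x_o

# Two stations that share the same four randomly selected historical dates.
obs = np.array([[0.0, 5.0, 2.0, 9.0],    # station A on the selected dates
                [0.0, 4.0, 1.0, 7.0]])   # station B on the same dates
gen = np.array([[3.0, 0.0, 8.0, 1.0],    # independently generated ensembles
                [6.0, 2.0, 0.0, 0.5]])
out = np.vstack([reorder_ensemble(g, o) for g, o in zip(gen, obs)])
# out[0] -> [0.0, 3.0, 1.0, 8.0]
# out[1] -> [0.0, 2.0, 0.5, 6.0]
```

In the example the generated values at each station are unchanged as a set, but after reordering the largest value at both stations falls in the same realisation, mirroring the rank agreement of the observed dates.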
2.3. k-Nearest neighbour (KNN) approach

In the context of multi-site generation of rainfall occurrences, the KNN approach simultaneously samples, with replacement, the rainfall occurrences at multiple locations from the historical rainfall record. To preserve lagged correlations, resampling is conditioned on the days in the historical record that have characteristics similar to those of the previously simulated days. The conditional probability $R_t \mid Z_t$ is estimated based on the k-nearest neighbours of the conditioning vector $Z_t$, where $R_t$ represents the rainfall occurrence vector at all stations and $Z_t$ represents the vector of conditioning variables used to assess similarity. In this approach, the $k$ patterns in the historical series of predictors that are most similar to the current pattern are found, and the $k$ sets of corresponding predictands (in our case, rainfall occurrences at all stations) are taken as likely realisations of the system. Lall and Sharma [19] suggested that, as a basic guideline, $k$ can be chosen as the square root of the length of the dataset. The Lall and Sharma [19] k-nearest neighbour conditional PDF is expressed as:

$$p(i) = \frac{1/i}{\sum_{j=1}^{k} 1/j} \qquad (8)$$

where $p(i)$ is the probability that the $i$th nearest neighbour will be resampled, and $k$ is the number of nearest neighbours considered based on Euclidean distances from the conditioning vector $Z_t$. Generation of rainfall occurrences at a network of raingauge stations proceeds by defining the vector $Z_i$ for every day $i$ of all years, except the $i$th day of the current year, in the seasonal subset of the historical series of rainfall occurrences formed by the moving window centred at the Julian day corresponding to $t$, the current day in the generated sequence. The values used in the calculation of Euclidean distances are made dimensionless through standardization to ensure that each predictor has the same weighting in the selection of nearest neighbours. The $k$ patterns most similar to $Z_t$ are selected based on the minimum Euclidean distances between $Z_i$ and $Z_t$. These Euclidean distances $E_i$ are calculated as:

$$E_i = \sqrt{\sum_{j=1}^{m} \left\{ s_j \left( Z_{j,i} - Z_{j,t} \right) \right\}^2} \qquad (9)$$

where $m$ is the number of variables included in the $Z$ vector and $s_j$ is the scaling weight for the $j$th predictor. Scaling of variables imparts an equal weight to each predictor in the selection of the nearest neighbours. Scaling is also necessary when the feature vector consists of a combination of discrete and continuous variables [19,6]. The reciprocal of the sample standard deviation of each predictor variable is a common choice for the scaling weights [19,29,10,2]. The corresponding $k$ sets of rainfall occurrences, $R_i$, chosen from the historical rainfall records of all stations are termed the nearest neighbours, and they form an empirical estimate of the conditional probability $R_t \mid Z_t$ for day $t$ that is based directly on the characteristics of the historical sample. The generated rainfall occurrence at all stations for day $t$ is obtained from these $k$ nearest neighbours by using (8) and a random number $u \sim U(0,1)$. The day $i$ for which $u$ is closest to $p_i$ is picked as the nearest neighbour of day $t$, and the corresponding $R_i$ vector forms the estimated values of rainfall occurrence at all stations. In KNN, the spatial rainfall distribution structure is maintained by resampling simultaneously at all stations. Seasonal variations in the generation process are accommodated using a moving window approach. Further details on the method are available in Rajagopalan and Lall [29], Buishand and Brandsma [6], Beersma and Buishand [2] and Mehrotra et al. [21].

2.4. The single-site (independent) rainfall occurrence model

The independent model used for comparison generates rainfall occurrences at each individual site, ignoring the at-site temporal and across-site spatial dependence. Similarly, the ensemble reordering method requires ensembles of generated rainfall occurrences at each site before spatial and temporal dependence is introduced by reshuffling. In the present application, the single-site daily rainfall occurrence model used in both these approaches is a zero-order Markov model.
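A single KNN resampling step that combines the resampling kernel of Eq. (8) with the scaled Euclidean distance of Eq. (9) might look as follows. This is a sketch under stated assumptions: the function and argument names are hypothetical, the candidate days are assumed to have already been restricted to the seasonal moving window, every predictor is assumed to have non-zero variance, and the neighbour is drawn by sampling ranks with probabilities $p(i)$, which is one reading of the selection rule described in the text.

```python
import numpy as np

def knn_resample_day(Z_hist, R_hist, z_now, k, rng=None):
    """Pick one historical day and return its occurrence vector for all stations.

    Z_hist : (n_days, m) conditioning vectors of the candidate historical days
    R_hist : (n_days, n_stations) 0/1 rainfall occurrences on those days
    z_now  : (m,) conditioning vector of the day being generated
    """
    rng = np.random.default_rng() if rng is None else rng
    Z_hist = np.asarray(Z_hist, dtype=float)
    R_hist = np.asarray(R_hist)
    s = 1.0 / Z_hist.std(axis=0, ddof=1)             # weights: 1 / sample std dev
    dist = np.sqrt((((Z_hist - z_now) * s) ** 2).sum(axis=1))   # Eq. (9)
    nearest = np.argsort(dist, kind="stable")[:k]    # the k nearest days, closest first
    ranks = np.arange(1, k + 1)
    p = (1.0 / ranks) / (1.0 / ranks).sum()          # Eq. (8)
    chosen = rng.choice(nearest, p=p)                # closer days are more likely
    return R_hist[chosen], chosen, p
```

Because the whole row `R_hist[chosen]` is returned, the occurrences at all stations come from the same historical day, which is how the approach carries the observed spatial pattern into the simulation.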
2.5. The single-site rainfall amount model

A nonzero rainfall amount must be generated for each wet day and location in the sequences generated by all the occurrence models described in the previous sub-sections. The model for rainfall amounts presented here is nonparametric, and is based on the kernel density procedure described in Sharma [32], Sharma and O'Neill [31], Harrold et al. [11] and Mehrotra and Sharma [26]. The model is hereafter referred to as the KDE model. For the KNN and MMM approaches, it simulates the rainfall amount at individual stations conditional on the previous day's rainfall amount at the station using spatially correlated random numbers, thereby imparting temporal and spatial dependence to the generated series. For the reordering and independent models, the model generates rainfall amounts at individual stations unconditionally using spatially uncorrelated random numbers.

2.6. Modelling observed spatial dependence in the generated rainfall field

The MMM accommodates the spatial correlation structure in the generated multi-site rainfall occurrences using spatially correlated random numbers. Similarly, the KDE model used to generate rainfall amounts for MMM and KNN imparts spatial dependence to the generated rainfall amounts by making use of spatially correlated and serially independent random numbers during the generation of rainfall amounts at individual stations. The general logic of estimating the correlation matrices of random numbers for rainfall occurrences/amounts is available in Wilks [36] and Mehrotra et al. [23].
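The idea of driving site-wise models with spatially correlated, serially independent random numbers can be illustrated as follows. This is a sketch only: the function names are hypothetical, and the correlation matrix `corr` of the underlying normal variates is taken as given, whereas in the approach of Wilks [36] it has to be found so that the simulated rainfall, not the random numbers, reproduces the observed correlation.

```python
import math
import numpy as np

def correlated_uniforms(corr, n_days, rng=None):
    """Serially independent, spatially correlated U(0,1) numbers, shape (n_days, n_sites)."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(np.asarray(corr, dtype=float))
    w = rng.standard_normal((n_days, L.shape[0])) @ L.T   # correlated N(0,1) variates
    # Standard normal CDF maps each variate to a uniform number.
    phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return phi(w)

def occurrences_from_uniforms(u, p_wet):
    """A site is wet when its uniform number falls below its wet-day probability.

    p_wet may have shape (n_sites,) or (n_days, n_sites), for example the
    day-specific transition probabilities of an occurrence model.
    """
    return (u < np.asarray(p_wet)).astype(int)
```

Each day's row is drawn independently of the previous one, so the random numbers add no serial dependence of their own; any temporal persistence has to come from the site-wise model through `p_wet`.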
2.7. Data and study area

The study region is located around Sydney, eastern Australia, spanning 149°E to 152°E longitude and 32°S to 36°S latitude (Fig. 1). The physio-geographical conditions in the Sydney region cause large climatic gradients even over short distances, e.g. from lowland areas to mountain regions and from the coast to the inland. Most significant rainfall events in winter (March–August months) in this region involve air masses that have been brought over from the east-coast low-pressure systems. Orographic uplift of these air masses when they strike coastal ranges or the Great Dividing Range often produces very heavy rain. Several of the most severe floods experienced east of the Great Dividing Range have resulted from east-coast low-pressure systems. In summer (September–February months), tropical depressions moving southwards from Queensland into New South Wales bring warm, cloudy and drizzly weather to coastal regions of eastern Australia. These conditions can result in heavy rainfall if some means of lifting is available. The distances between station pairs vary from approximately 20 to 340 km. Missing values at some stations (<0.5%) are estimated using the records of nearby stations. A day is considered wet if the rainfall amount is greater than or equal to 0.3 mm.

3. Application of models and result discussion

For MMM, following Mehrotra and Sharma [26], order one Markov dependence and a wetness state variable over the previous 365 days, which explains the low frequency variability, are used as conditioning variables for all stations. The reordering method is applied to the unconditionally generated series of rainfall amounts only; the observed dependence structure of rainfall occurrences is automatically preserved once the amounts are reordered. For KNN, based on the results of a sensitivity analysis of different choices of the moving window width $l$, a value of $l = 31$ days is chosen for use in our study. Similarly, an analysis is performed to find the optimal value of $k$, the number of nearest neighbours, and consequently a value of $k = 10$ is adopted in the study. For KNN, the area averaged wetness states (or wetness fractions) of the previous day and of the preceding 365 days are selected as conditioning variables. For the reordering and independent rainfall occurrence models, daily unconditional wet day probabilities are estimated from observations falling within a moving window of length 31 days centred on the current day. The relationship between the correlations of the series of normally distributed random numbers and those of the corresponding simulated rainfall series at each station pair is ascertained empirically on a daily basis, considering the observations falling within a moving window of length 31 days centred on the current day.

In all the approaches, results are obtained by generating 100 ensembles (realisations) of the rainfall, from which statistical performance measures are estimated. The comparison of results is based on the reproduction of various statistics of interest, representing a wide range of spatial, temporal and combined spatio-temporal dependence characteristics of rainfall, including those of importance to water resources planning and management.
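The MMM conditioning described above, order one dependence combined with a wetness state over the preceding days, can be sketched for a single site and a single predictor using Eqs. (1) and (4). This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and with one predictor the multivariate normal of Eq. (4) reduces to a univariate normal with variances `var1` and `var0` in place of $V_{1,i}$ and $V_{0,i}$.

```python
import numpy as np

def wetness_state(occ, j):
    """Eq. (1): fraction of wet days over the preceding j days (NaN for the first j days)."""
    occ = np.asarray(occ, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(occ)))
    x = np.full(occ.size, np.nan)
    x[j:] = (csum[j:-1] - csum[:-j - 1]) / j    # mean of occ[t-j], ..., occ[t-1]
    return x

def mmm_wet_probability(x, p1i, mu1, var1, mu0, var0):
    """Eq. (4) for a scalar predictor x, given that yesterday's state was i.

    p1i        : baseline P(R_t = 1 | R_{t-1} = i)
    mu1, var1  : mean and variance of x on days with R_t = 1, R_{t-1} = i
    mu0, var0  : mean and variance of x on days with R_t = 0, R_{t-1} = i
    """
    def normal_pdf(v, mu, var):
        return np.exp(-0.5 * (v - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    f1 = normal_pdf(x, mu1, var1)               # f(X_t | R_t = 1, R_{t-1} = i)
    f0 = normal_pdf(x, mu0, var0)               # f(X_t | R_t = 0, R_{t-1} = i)
    return p1i * f1 / (f1 * p1i + f0 * (1.0 - p1i))
```

When the two conditional densities coincide, the function returns the baseline `p1i` unchanged; a recent past that looks more like the wet-day subset than the dry-day subset raises the probability above the baseline.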
The graphical comparison of the different models is performed on the basis of:

(a) spatial dependence statistics, namely the log-odds ratio (a measure of the spatial correlation in the daily rainfall occurrence) and cross correlations (a measure of the spatial correlation in the daily rainfall amounts and in the aggregated wet days and rainfall amounts in a month and year);
(b) the average number of wet days and rainfall amounts in a month and year and their variability (a measure of the frequency at which wet days and amounts are simulated and their distribution within and across a year);
(c) distributional attributes of wet days and rainfall amount in a year (again a measure of the year-to-year variability of rainfall occurrence and amounts);
(d) spell length rainfall characteristics, i.e. wet and dry spells (a measure of day-to-day dependence in the rainfall series);
(e) extreme rainfall characteristics, i.e. maximum wet and dry spells and daily maximum rainfall amount in excess of a threshold (a measure of hydrologic extremes, i.e. floods and droughts); and
(f) combined spatio-temporal statistics, i.e. weather states and probabilities of area averaged series (a measure of spatial as well as temporal rainfall distribution behaviour over the study region).

Fig. 1. Map of the study area showing the locations of raingauge stations.

3.1. Spatial dependence
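The log-odds ratio used as the spatial dependence measure for daily occurrences is the logarithm of $p_{00}\,p_{11} / (p_{10}\,p_{01})$, where the four terms are the joint probabilities of the wet/dry combinations at a pair of stations. A minimal sketch of its computation from two 0/1 occurrence series is given below; the function name is hypothetical, and pairs for which one of the mixed states never occurs (so that the ratio is undefined) are not handled.

```python
import numpy as np

def log_odds_ratio(occ_i, occ_j):
    """Log-odds ratio between the daily occurrence series of two stations."""
    a = np.asarray(occ_i).astype(bool)
    b = np.asarray(occ_j).astype(bool)
    p11 = np.mean(a & b)      # rain at both stations
    p00 = np.mean(~a & ~b)    # no rain at both stations
    p10 = np.mean(a & ~b)     # rain at i, no rain at j
    p01 = np.mean(~a & b)     # no rain at i, rain at j
    return np.log((p00 * p11) / (p10 * p01))
```

For two independent series the ratio is close to zero, and it grows as the stations tend to be wet or dry together.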
Fig. 2 presents observed and modelled log-odds ratios (for daily rainfall occurrences) and cross correlations (for monthly and annual wet days) for all station pairs and models. The log-odds ratio is expressed as $lr_{i,j} = \log\left( p00_{i,j}\, p11_{i,j} / (p10_{i,j}\, p01_{i,j}) \right)$, where $lr_{i,j}$ is the log-odds ratio between the pair of stations $i$ and $j$, and $p11_{i,j}$, $p00_{i,j}$, $p10_{i,j}$ and $p01_{i,j}$ are the joint probabilities of rain at both stations, no rain at both stations, rain at station $i$ and no rain at station $j$, and no rain at station $i$ and rain at station $j$, respectively. A higher value indicates better-defined spatial dependence. In these plots, observed statistics are plotted on the horizontal axes while the corresponding statistics computed from the simulated series are plotted on the vertical axes. For daily and yearly plots, points are shown for each station pair, while for monthly plots they are shown for each month and station pair. Considering the spatial dependence of daily rainfall occurrences, as the independent model simulates rainfall occurrences at each station in isolation, the
[Fig. 2 plot area: scatter panels for each time scale (Daily, Monthly, Annual) and model (Independent, MMM, Reordering, KNN); horizontal axes: Observed; vertical axes: Simulated.]
Fig. 2. Scatter plots of observed and model simulated daily log-odds ratios and cross correlations of monthly and annual wet days. For daily and annual plots points are shown for each station pair while for monthly plots these are shown for each station pair and month.
generated sequences from the model fail to preserve this characteristic. The use of spatially correlated random numbers in the MMM helps reproduce the spatial dependence in the generated sequences quite successfully. Similarly, by virtue of assigning equally ranked observations at all stations, the reordering method is able to maintain the ranked spatial dependence, albeit with some bias for highly correlated stations. As the KNN approach generates rainfall occurrences concurrently at all stations, the dependence between the stations is automatically preserved by the model. At the longer time scales of month and year, the simulated series from KNN and MMM overestimate the observed correlations. The ranking procedure of the reordering model is able to maintain the observed monthly and annual spatial correlations in the simulated series with some success.

Similar to Fig. 2, Fig. 3 presents the cross correlations of rainfall amounts at daily, monthly and annual time scales. As can be seen, all the models consistently underestimate the daily cross correlations for highly correlated stations, with a marked bias in the reordering model. At the time scales of month and year, all models show large scatter with slight underestimation for highly correlated stations.

3.2. Mean wet days, rainfall totals and associated variability

It is vital that the mean number of wet days and the average rainfall totals at the raingauge network be reproduced accurately before the simulated rainfall series is used as an input to any water balance modelling exercise. Fig. 4 presents the scatter plots of observed and modelled monthly and annual wet days and rainfall totals at all stations for the different models. As expected, all models, including the independent model, provide a good fit to the mean number
of wet days and rainfall totals. It is easy to model the mean monthly and annual rainfall totals by properly reproducing the probability of occurrence of wet days and the average per wet day rainfall amount. However, what is more desirable, and often difficult to achieve, is the successful reproduction of both inter- and intra-annual variability in the rainfall occurrence and amount processes; the tendency of models to underestimate this variability is termed “overdispersion” [17]. The monthly and/or annual variance is difficult to reproduce properly because it is directly related not only to the rainfall amount variance, the probability of a wet day and the average per wet day rainfall amount, but also to the monthly/annual variance of wet days [16,18]. It has been observed that models based on low-order Markov processes, in general, undersimulate the variance at aggregated time scales of month, season and year [5,38]. However, this overdispersion characteristic is of importance to applications that are sensitive to proper representation of sustained droughts and periods of above average rainfall (or above average wet days) in the generated record. Fig. 5 presents the scatter plots of observed and modelled standard deviations of monthly and annual wet days and rainfall totals. As can be seen from these plots, at the annual time scale the KNN model underestimates the variability, the reordering model shows a large scatter and the MMM provides a good fit to the variance of aggregated annual wet days (top row). At the monthly time scale, all the models are successful in reproducing the variability (second row). Clearly, the use of the 365-day area averaged wetness state in the KNN model is not sufficient to explain the variability at individual stations at the annual level. The MMM, which conditions on the previous 365 days' wetness state at each individual station, is able to capture this variability quite well. Similarly, the reordering model, which follows the calendar dates in the selection of observed samples, is also able to reproduce this variability with some success. All models are more or less successful in reproducing the observed variances of monthly and annual rainfall totals (third and last rows) in the simulated series, with nominal bias for highly correlated stations and large scatter at the monthly time scale.

Another important rainfall statistic related to the annual variance is the year-to-year (inter-annual) distribution of annual wet days and annual rainfall totals. Fig. 6 presents the probability plots of the distribution of annual wet days and total annual rainfall for a representative station. The vertical axes in these plots show statistics computed from the observed and model simulated series, and the horizontal axes show the exceedance probabilities of occurrence of the events. The first column shows that the driest year on record has only 60 wet days, whereas the wettest year has approximately 180 wet days. The generated sequences from the independent model do not reproduce these distributions, whereas MMM and reordering successfully capture these highs and lows, with KNN being partly successful. The standard deviation of the number of wet days per year is directly related to this distribution, and thus generated sequences from the independent model also under-represent the historical annual-level standard deviation (Fig. 5). The distribution of annual rainfall (right column of Fig. 6) follows the same trends as annual wet days, with less variation.

[Figure 3: scatter plots with axes Observed and Simulated; panel rows: Daily, Monthly, Annual; panel columns: Independent, MMM, Reordering, KNN.]
Fig. 3. Scatter plots of observed and model simulated cross correlations of daily, monthly and annual rainfall amounts. For daily and annual plots points are shown for each station pair while for monthly plots these are shown for each station pair and month.

3.3. Extreme rainfall characteristics

It has been noted that assuming a low-order Markov dependence, in general, undersimulates long dry spells (runs of consecutive dry days) [5,9,28,30,38]. Fig. 7 presents scatter plots of the observed and model simulated maximum wet and dry spell lengths in days, and of the number of days with daily rainfall amount >35 mm, at all stations. The threshold of 35 mm is chosen to define adequately the extreme rainfall conditions over the study area. The reordering approach is more successful in reproducing the maximum wet spell length (top row) at all stations, while MMM and KNN somewhat overestimate this characteristic at the majority of stations. Similarly, generated sequences from the KNN underestimate the maximum dry spell lengths at the majority of stations, whereas MMM and reordering provide a marginally better representation of this characteristic at all stations (middle row); however, the differences amongst the models are not pronounced. The number of days in a year with daily rainfall >35 mm, being a function of the rainfall amount model only, is reproduced adequately by all the models at the majority of stations (bottom row).

[Figure 4: scatter plots with axes Observed and Simulated; panel titles: Wet days in a year, Wet days in a month, Annual rainfall totals in mm, Monthly rainfall totals in mm; panel columns: Independent, MMM, Reordering, KNN.]
Fig. 4. Scatter plots of observed and model simulated monthly and annual wet days and rainfall totals. For annual plots points on the graph are shown for each station while for monthly plots these are shown for each station and month.

3.4. Combined spatio-temporal rainfall characteristics

In this section we examine the performance of the models in simulating a few rainfall characteristics that exhibit the combined spatio-temporal behaviour of the observed record over the study region. One such statistic, indicative of proper reproduction of spatial as well as temporal dependence, is the set of marginal and conditional probabilities of the daily series of area averaged wetness state of rainfall occurrence, together with the corresponding rainfall amounts. The mean area averaged rainfall in a class is calculated by summing all daily area averaged rainfall values in that class and dividing by the number of days falling in that class. For marginal probabilities, 10 discrete states with a class interval of 0.1 are formed, while for the conditional probabilities 4 discrete states with a class interval of 0.25 are formed. Thus class 1 corresponds to days when both the previous and current days' area averaged wetness states are less than 0.25. Class 2 corresponds to days when the previous day's area averaged wetness state is less than 0.25 while the current day's value is between 0.25 and 0.5. Similarly, the last class contains the days when both the previous and current days' area averaged wetness states are >0.75. Fig. 8 presents the probability plots of observed and modelled daily area averaged wetness state and mean area averaged rainfall amount in these classes. In these plots observed statistics are shown as solid circles while simulated values from different models are shown as lines with different notations. The MMM, reordering and KNN successfully reproduce the marginal and conditional probabilities, with reordering showing minor bias for lower wetness state classes (top left plot of Fig. 8). However, the mean area averaged rainfall amount in a class is better reproduced by reordering (plots in the right column). As KNN considers conditioning on area averaged wetness state of the previous day as well as
the previous 365 days, it is structured to reproduce this statistic appropriately. The reordering tends to match the observed ranks in the generated rainfall amounts and therefore matches the amounts in these classes quite well. KNN and MMM exhibit similar performances for rainfall amounts. It appears that, for proper reproduction of the temporal dependence of area averaged rainfall in the MMM, modelling of the spatial and at-site lagged temporal correlations might not suffice. Consideration of lagged cross correlations, or conditioning on a variable representing short-term area averaged conditions, might help in improving these results. Table 2 presents several observed and model simulated rainfall attributes exhibiting the combined spatio-temporal behaviour of rainfall. Note that the choice of the thresholds used in the table to define extreme wet and dry conditions over the region is purely subjective and is adopted solely for model evaluation purposes. The first and second rows of the table present, respectively, the number of days in a year when more than 85% of stations in the study area receive rainfall and the average number of days in a year when area averaged rainfall is >25 mm. MMM and KNN reproduce these attributes better than
[Figure 5: scatter plots with axes Observed and Simulated; panel titles: SD of annual wet days, SD of monthly wet days, SD of annual rainfall totals in mm, SD of monthly rainfall totals in mm; panel columns: Independent, MMM, Reordering, KNN.]
Fig. 5. Scatter plots of observed and model simulated standard deviations (SD) of monthly and annual wet days and rainfall totals. For annual plots points on the graph are shown for each station while for monthly plots these are shown for each station and month.
reordering. Similarly, the next three rows give the annual frequency of occurrence of instances when the area averaged wetness state is >0.60 and this condition has persisted for a few days (3–4, 5–6 and more than 6 days) in succession (defining area averaged wet spells, in a sense). The corresponding rainfall amounts in these states are provided in the subsequent three rows. Finally, the last three rows provide the average annual occurrence of runs of consecutive days (5–8, 9–14 and more than 14 days) with area averaged wetness state less than 0.20 (defining area averaged dry spells, in a sense). These results indicate that all the models reproduce these rainfall attributes with varying success. The spatially averaged wet spells are reproduced better by MMM and KNN, while spatially averaged wet spell rainfall amounts and dry spells are captured better by reordering. Another important statistic indicative of the spatio-temporal rainfall behaviour is the frequency of occurrence of defined discrete weather states, as well as the spatial distribution of rainfall in these weather states. These weather states correspond to the dominant precipitation patterns over the study area and thus provide a means of analysing and comparing the rainfall distribution patterns in the observed and simulated records. These discrete weather states can be formed from the observed rainfall distribution based on objective and/or automated weather classification procedures [34]. In the present study, we follow the approach of Mehrotra and Sharma [22] to define the weather on a day and classify each day's weather into a few discrete categories. They defined weather on a day on the basis of the spatial distribution of rainfall occurrence and expressed it in terms of an area averaged wetness state and its location from a fixed origin. Following this, six discrete weather states are formed, namely: (a) when the majority of stations are dry, (b) when most of them are wet, and (c, d, e and f) when the north, south, east and west parts of the study area, respectively, are receiving rainfall. Each day of the observed and simulated record is categorised using this discrete weather state classification. The first column of Fig. 9 presents bar plots of the observed and simulated occurrences of these weather states, whereas the second column presents the mean area averaged rainfall amount in these weather states for all the models. As can be seen from the plots, all models, except the independent model,
[Figure 6: probability plots for the Independent, Modified Markov, Reordering and KNN models; left column: Annual wet days (Days); right column: Annual rainfall totals (mm); horizontal axis: Percent of time exceeded; legend: Observed, 5 and 95% limits, Median.]
Fig. 6. Distribution plots of observed and model simulated annual wet days and rainfall amounts at a representative station (Sydney). Historical observations are shown as circles, the 5 and 95 percentile simulated values as dotted lines and the simulated median values as solid lines.
are successful in adequately reproducing the average occurrence of these weather states and the mean rainfall amount in each state. Most of the rainfall occurs in the second weather state, where the majority of stations receive rainfall, and this is somewhat better reproduced by reordering than by KNN and MMM. Similarly, Fig. 10 presents the scatter plots of the probability of a wet day in these weather states at all stations, computed from the observed and model simulated series. All the models reproduce the observed wet day probabilities under a weather state at all stations quite successfully. As KNN simulates rainfall occurrences concurrently at all stations, the spatial patterns, and hence the wet day probabilities under each weather state, are reproduced better than by reordering and MMM. Transition probabilities of going from one weather state (of the previous day) to another (of the current day) are also reproduced reasonably well by all the
models, with reordering and MMM better reproducing the transition of weather states (results not included). Finally, we compare the distributions of area averaged wet days and rainfall amounts at monthly and annual time scales using the observed data and the results of all models. Distributions of wet days are shown in the left column of Fig. 11 and those of rainfall amounts in the right column. In these plots the variables are shown on the vertical axes and the exceedance probabilities on the horizontal axes, with circles representing observed values and lines representing model generated values. At the monthly time scale, KNN, reordering and MMM perform similarly, reproducing the observed distribution quite successfully. However, at the annual level, area averaged wet days are slightly overestimated by MMM at lower exceedance probabilities, whereas area averaged annual rainfall is slightly underestimated by KNN at lower exceedance probabilities.
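The exceedance curves of area averaged annual wet days discussed above can be illustrated with a minimal sketch. It is not the authors' code: the wet-day threshold of 0.3 mm, the Weibull plotting position rank/(n + 1), the function names and the toy record are all assumptions made for illustration.

```python
WET_THRESHOLD_MM = 0.3  # assumed wet-day threshold; not stated in this section


def annual_area_averaged_wet_days(rain_by_year, wet_threshold_mm=WET_THRESHOLD_MM):
    """Wet days per year, averaged over stations.

    rain_by_year[y][d][s] is the rainfall (mm) in year y, on day d, at station s.
    """
    totals = []
    for year in rain_by_year:
        n_stations = len(year[0])
        # Count wet station-days in the year, then average over stations.
        wet_station_days = sum(
            sum(1 for r in day if r >= wet_threshold_mm) for day in year
        )
        totals.append(wet_station_days / n_stations)
    return totals


def exceedance_curve(values):
    """Pair each value with the percent of time it is equalled or exceeded.

    Uses the Weibull plotting position 100 * rank / (n + 1) (an assumption).
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(100.0 * rank / (n + 1), v) for rank, v in enumerate(ordered, start=1)]


# Hypothetical toy record: three "years" of three days at two stations.
toy = [
    [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]],
    [[5.0, 5.0], [1.0, 1.0], [0.0, 3.0]],
    [[0.0, 0.0], [0.0, 0.0], [4.0, 0.0]],
]
curve = exceedance_curve(annual_area_averaged_wet_days(toy))
```

Applying the same two functions to the observed record and to each simulated realisation would give the circles and lines of a plot of this kind; percentile bands across realisations would need an extra step not shown here.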
[Figure 7: scatter plots with axes Observed and Simulated; panel rows: Maximum wet spell length - days (top), Maximum dry spell length - days (middle), Number of days in a year with rainfall >35 mm (bottom); panel columns: Independent, MMM, Reordering, KNN.]
Fig. 7. Scatter plots of observed and model simulated maximum wet and dry spells and number of days with daily rainfall amount >35 mm. Points on the graph are shown for each station.
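The three station statistics named in this caption can be computed from a daily series as in the sketch below. The 35 mm threshold comes from the text; the 0.3 mm wet-day threshold, the function names and the sample series are assumptions for illustration only.

```python
from itertools import groupby

WET_THRESHOLD_MM = 0.3  # assumed wet-day threshold; not stated in this section


def max_spell_lengths(daily_rain_mm, wet_threshold_mm=WET_THRESHOLD_MM):
    """Return (longest wet spell, longest dry spell) in days for one station."""
    longest = {True: 0, False: 0}
    wet_flags = (r >= wet_threshold_mm for r in daily_rain_mm)
    # groupby yields one group per run of consecutive equal flags.
    for is_wet, run in groupby(wet_flags):
        longest[is_wet] = max(longest[is_wet], sum(1 for _ in run))
    return longest[True], longest[False]


def heavy_rain_days_per_year(daily_rain_mm, n_years, threshold_mm=35.0):
    """Average number of days per year with rainfall above threshold_mm."""
    return sum(1 for r in daily_rain_mm if r > threshold_mm) / n_years


# Hypothetical ten-day series at one station (mm).
series = [0, 0, 5, 12, 40, 0, 0, 0, 1, 36]
wet_max, dry_max = max_spell_lengths(series)
heavy = heavy_rain_days_per_year(series, n_years=1)
```

Evaluating these functions station by station on the observed and simulated series gives one point per station for each panel row.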
[Figure 8: four line plots; top left: Area averaged wetness state marginal probabilities (Marginal probabilities against Wetness state, 0.1 to 1); top right: Rainfall totals in area averaged wetness state marginal probabilities (mm against Wetness state); bottom left: Area averaged wetness state conditional probabilities (Conditional probabilities against Probability classes 1 to 16); bottom right: Rainfall totals in area averaged wetness state conditional probabilities (mm against Probability classes 1 to 16); legend: KNN, Modified Markov, Reordering, Independent, Observed.]
Fig. 8. Marginal and conditional probabilities and average rainfall amounts under discrete area averaged wetness states. For marginal probabilities, wetness state classes vary from 0 to 1 with a bin width of 0.1, while for conditional probabilities the bin width is 0.25, constituting 4 classes and 16 possible transitions from one wetness state (of the previous day) to another (of the current day). These 16 classes thus define the transition states, with the previous and current days' area averaged wetness state each being less than 0.25, between 0.25 and 0.5, between 0.5 and 0.75, or >0.75.
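The class definitions in this caption can be made concrete with a short sketch. It assumes that the area averaged wetness state is the fraction of stations that are wet on a day, that class boundaries are closed on the left (so 0.25 falls in the second class), and that the conditional probability of a transition class is its count divided by the count of days in the same previous-day class. All function names are hypothetical.

```python
def class_index(n_wet, n_stations, n_classes):
    """Zero-based wetness class of a day with n_wet wet stations.

    Integer arithmetic avoids floating-point trouble at class boundaries;
    a fully wet day (fraction 1.0) is placed in the last class.
    """
    return min(n_wet * n_classes // n_stations, n_classes - 1)


def marginal_probabilities(wet_counts, n_stations, n_classes=10):
    """Relative frequency of each wetness class (bin width 1 / n_classes)."""
    counts = [0] * n_classes
    for n_wet in wet_counts:
        counts[class_index(n_wet, n_stations, n_classes)] += 1
    return [c / len(wet_counts) for c in counts]


def transition_probabilities(wet_counts, n_stations, n_classes=4):
    """Map transition class 1..n_classes**2 to P(current class | previous class).

    Class 1 is (previous < 0.25, current < 0.25), class 2 is
    (previous < 0.25, current in [0.25, 0.5)), and so on.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for prev, cur in zip(wet_counts, wet_counts[1:]):
        i = class_index(prev, n_stations, n_classes)
        j = class_index(cur, n_stations, n_classes)
        counts[i][j] += 1
    probs = {}
    for i, row in enumerate(counts):
        total = sum(row)
        for j, c in enumerate(row):
            probs[i * n_classes + j + 1] = c / total if total else 0.0
    return probs


# Hypothetical week at a 4-station network: number of wet stations per day.
wet_counts = [0, 0, 1, 4, 4, 2, 0]
marginal = marginal_probabilities(wet_counts, n_stations=4)
conditional = transition_probabilities(wet_counts, n_stations=4)
```

The mean rainfall in a class would follow the same grouping, summing the daily area averaged rainfall per class and dividing by the class count.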
4. Summary and conclusions

This paper has presented a detailed assessment of three multi-site models of daily rainfall, which are based on different rationales for reproducing the observed spatial and temporal dependence in the generated rainfall. The study region exhibits
substantial topographic and spatio-temporal rainfall variations, and thus provides a challenging setting in which to evaluate the performance of any stochastic weather generator. These schemes do not pretend to model directly the physical processes underlying the spatial and higher order temporal distributions of weather variations, but rather are based on relatively simple stochastic simulation of weather derived from the behaviour of the observed rainfall. The performance of these approaches is evaluated using a wide range of rainfall occurrence and amount attributes depicting at-site temporal, across-site spatial and combined spatio-temporal rainfall behaviour at different time scales. Table 3 presents an overall broad assessment of the relative performance of these approaches in reproducing each observed rainfall attribute considered in the paper. This assessment indicates that, for the majority of statistics, MMM and reordering perform better than KNN. It may be noted that the ranking provided in this table by no means reflects the superiority of one approach over the others, as for many rainfall attributes two or more approaches are found to provide almost similar results (see other plots and tables for details).

Table 2
Some important observed and model simulated area averaged rainfall attributes.
Rainfall attribute | Observed | MMM | Reordering | KNN | IND
No. of days in a year with area averaged wetness state >0.85 | 42.77 | 42.11 | 37.76 | 42.43 | 0.00
No. of days in a year with area averaged rainfall >25 mm | 6.51 | 6.71 | 7.10 | 6.30 | 0.00
No. of times in a year with area average wetness state >0.60 (consecutively for 3–4 days) | 7.42 | 7.61 | 6.84 | 7.25 | 0.00
No. of times in a year with area average wetness state >0.60 (consecutively for 5–6 days) | 3.40 | 3.21 | 2.84 | 3.37 | 0.00
No. of times in a year with area average wetness state >0.60 (consecutively for 7 days or more) | 1.12 | 0.98 | 0.82 | 1.15 | 0.00
Average rainfall amount with area average wetness state >0.60 (consecutively for 3–4 days) | 36.06 | 35.45 | 39.77 | 34.63 | 0.00
Average rainfall amount with area average wetness state >0.60 (consecutively for 5–6 days) | 73.61 | 64.45 | 76.05 | 62.95 | 0.00
Average rainfall amount with area averaged wetness state >0.60 (consecutively for 7 days or more) | 132.55 | 106.74 | 123.59 | 109.76 | 0.00
No. of times in a year with area average wetness state <0.20 (consecutively for 5–7 days) | 10.26 | 9.92 | 10.16 | 9.35 | 0.00
No. of times in a year with area average wetness state <0.20 (consecutively for 8–15 days) | 3.21 | 3.17 | 3.12 | 3.64 | 0.00
No. of times in a year with area average wetness state <0.20 (consecutively for 15 days or more) | 0.70 | 0.61 | 0.71 | 1.14 | 0.00

[Figure 9: bar plots for the Independent, Modified Markov, Reordering and KNN models; left column: Average occurrence of weather states (Percent occurrence); right column: Average rainfall in a weather state (Rainfall in mm); horizontal axis: Weather state (1 to 6); legend: Observed, Simulated.]
Fig. 9. Percent occurrence of weather states and average rainfall in these weather states for observed and model simulated series. These weather states are formed on the basis of the rainfall distribution over the study region and are defined as (1) majority of stations are dry, (2) most are wet, and the (3) north, (4) south, (5) east and (6) west parts of the study area are receiving rainfall.

[Figure 10: scatter plots with axes Observed and Simulated (0 to 1); panels: Independent model, Modified Markov model, Reordering model, KNN model; legend: WS1, WS2, WS3, WS4, WS5, WS6.]
Fig. 10. Scatter plots of observed and model simulated rain probabilities at stations in six discrete weather states. WS indicates a weather state as defined in the previous figure. Points on the plots are shown for each station under all weather states.

[Figure 11: four line plots against Exceedence probabilities; panels: Distribution of area averaged monthly wet days, Distribution of area averaged monthly rainfall, Distribution of area averaged annual wet days, Distribution of area averaged annual rainfall; vertical axes: Wet days, Rainfall in mm; legend: KNN, Modified Markov, Reordering, Independent, Observed.]
Fig. 11. Distribution of area averaged wet days and rainfall amounts at monthly and annual time scales for all models.

Representing spatial dependence has always been a topic of concern among researchers when it comes to stochastic simulation of rainfall. The three methods presented here address this concern following entirely different approaches. While KNN resamples the observed spatial dependence, reordering maintains the observed rank dependence in the simulated series. The MMM maintains the observed spatial dependence by making use of random numbers that are spatially correlated in such a manner as to reproduce the observed spatial structure in the simulated series. In general, all three methods provide a simple yet effective logic for reproducing this characteristic at a network of stations.
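The idea that reordering maintains the observed rank dependence can be illustrated with a minimal sketch in the spirit of the shuffle of Clark et al. [7]. It is not the authors' implementation: the function name, the tie-breaking rule (ties in the observed series are broken by time order) and the toy values are assumptions.

```python
def reorder_to_observed_ranks(simulated, observed):
    """Rearrange `simulated` so that its rank order in time matches `observed`.

    The k-th smallest simulated value is placed at the time step holding the
    k-th smallest observed value. Ties in `observed` keep their time order
    (Python's sort is stable), which is an arbitrary choice.
    """
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    sim_sorted = sorted(simulated)
    time_by_rank = sorted(range(len(observed)), key=lambda t: observed[t])
    reordered = [0.0] * len(observed)
    for rank, t in enumerate(time_by_rank):
        reordered[t] = sim_sorted[rank]
    return reordered


# Hypothetical observed and independently simulated values at two stations.
observed = {"A": [3.0, 0.0, 10.0, 1.0], "B": [2.0, 0.5, 8.0, 0.0]}
simulated = {"A": [7.0, 0.0, 2.0, 4.0], "B": [1.0, 6.0, 0.0, 3.0]}

# Using the same observed dates at every station carries the observed rank
# dependence between stations over to the reordered simulated values.
shuffled = {
    name: reorder_to_observed_ranks(simulated[name], observed[name])
    for name in observed
}
```

In this toy case the largest simulated values at both stations end up on the third day, which is the wettest observed day at both, while each station keeps exactly its own set of simulated values.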
Table 3
Rainfall attributes and relative rankings of approaches used.
Rainfall attribute | MMM | Reordering | KNN
Log-odds ratio of daily rainfall occurrences | 2 | 3 | 1
Cross correlations of monthly wet days | 2 | 1 | 3
Cross correlations of annual wet days | 2 | 1 | 3
Cross correlations of daily rainfall | 2 | 3 | 1
Cross correlations of monthly rainfall | 2 | 1 | 3
Cross correlations of annual rainfall | 1 | 3 | 2
Mean annual wet days | 2 | 1 | 3
Mean monthly wet days | 2 | 1 | 3
Mean annual rainfall | 2 | 1 | 3
Mean monthly rainfall | 2 | 1 | 3
Standard deviation of annual wet days | 1 | 2 | 3
Standard deviation of monthly wet days | 1 | 2 | 3
Standard deviation of annual rainfall | 2 | 1 | 3
Standard deviation of monthly rainfall | 3 | 1 | 2
Distribution of annual wet days at a station | 2 | 1 | 3
Distribution of annual rainfall at a station | 1 | 2 | 3
Maximum wet spell | 2 | 1 | 3
Maximum dry spell | 2 | 1 | 3
No. of days with daily rainfall >35 mm | 2 | 1 | 3
Area averaged wetness state – marginal probabilities | 1 | 3 | 2
Area averaged wetness state – rainfall amount in a probability class | 3 | 1 | 2
Area averaged conditional probabilities | 1 | 3 | 2
Rainfall amount in area averaged conditional probability class | 1 | 3 | 2
Average occurrence of a weather state | 1 | 3 | 2
Average rainfall amount under each weather state | 3 | 1 | 2
Probability of a wet day under each weather state | 3 | 2 | 1
Distribution of area averaged monthly wet days | 1 | 2 | 3
Distribution of area averaged monthly rainfall | 1 | 3 | 2
Distribution of area averaged annual wet days | 3 | 1 | 2
Distribution of area averaged annual rainfall | 1 | 2 | 3
No. of days in a year with area averaged wetness state >0.85 | 2 | 3 | 1
No. of days in a year with area averaged rainfall >25 mm | 1 | 3 | 2
No. of times in a year with area average wetness state >0.60 (consecutively for 3–4 days) | 2 | 3 | 1
No. of times in a year with area average wetness state >0.60 (consecutively for 5–6 days) | 2 | 3 | 1
No. of times in a year with area average wetness state >0.60 (consecutively for 7 days or more) | 2 | 3 | 1
Average rainfall amount with area average wetness state >0.60 (consecutively for 3–4 days) | 1 | 3 | 2
Average rainfall amount with area average wetness state >0.60 (consecutively for 5–6 days) | 2 | 1 | 3
Average rainfall amount with area averaged wetness state >0.60 (consecutively for 7 days or more) | 3 | 1 | 2
No. of times in a year with area average wetness state <0.20 (consecutively for 5–7 days) | 2 | 1 | 3
No. of times in a year with area average wetness state <0.20 (consecutively for 8–15 days) | 1 | 2 | 3
No. of times in a year with area average wetness state <0.20 (consecutively for 15 days or more) | 2 | 1 | 3
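The rankings in Table 3 can be tallied with a few lines of code. The sketch below assumes the three rank columns are read in the order MMM, Reordering, KNN, one value per attribute in table order; the variable and function names are hypothetical.

```python
from collections import Counter

# Rank columns transcribed from Table 3 (41 attributes, in table order).
RANKS = {
    "MMM": "2 2 2 2 2 1 2 2 2 2 1 1 2 3 2 1 2 2 2 1 3 1 1 1 3 3 1 1 3 1 "
           "2 1 2 2 2 1 2 3 2 1 2",
    "Reordering": "3 1 1 3 1 3 1 1 1 1 2 2 1 1 1 2 1 1 1 3 1 3 3 3 1 2 2 3 1 2 "
                  "3 3 3 3 3 3 1 1 1 2 1",
    "KNN": "1 3 3 1 3 2 3 3 3 3 3 3 3 2 3 3 3 3 3 2 2 2 2 2 2 1 3 2 2 3 "
           "1 2 1 1 1 2 3 2 3 3 3",
}


def parse(ranks):
    """Turn each space-separated rank string into a list of integers."""
    return {model: [int(r) for r in column.split()] for model, column in ranks.items()}


def rank_tally(ranks):
    """Count how often each model receives rank 1, 2 and 3."""
    return {model: Counter(values) for model, values in parse(ranks).items()}


def rows_are_permutations(ranks):
    """Check that every attribute assigns ranks 1, 2 and 3 exactly once."""
    columns = parse(ranks)
    return all(sorted(row) == [1, 2, 3] for row in zip(*columns.values()))


tally = rank_tally(RANKS)
first_places = {model: counts[1] for model, counts in tally.items()}
```

Read this way, reordering is ranked first on 20 of the 41 attributes, MMM on 14 and KNN on 7. As noted in the text, the ranks do not measure how close the approaches are on any single attribute, so the tally is only a rough summary.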
We found that models with low-order Markovian dependence, such as the independent model and to some extent KNN (as it considers area averaged values as conditioning variables), are only partially successful in producing extended dry spells at individual stations with proper frequency. This is in agreement with the findings of other researchers (e.g. [5,20,28]). The use of a higher time scale wetness state (MMM), or following the dependence of the observed record (reordering), facilitates retaining the higher order dependence at individual locations in the generated simulations. The KNN as used here is structured to maintain temporal dependence of the area averaged series only and therefore, when used over large areas, may not adequately reproduce the higher time scale rainfall characteristics at individual stations, which are of interest in many applications. The reordering is based on a simple logic of reproducing the ranked statistics at individual stations, which in turn helps reproduce the desired spatial and temporal statistics in the shuffled generated sequences. The approach holds equally well for other weather variables. However, the approach has limitations if there are many identical values in the observed data, as is the case with rainfall, which contains many zeros. The poor simulation of daily spatial correlations is the result of having many zeros at quite a few stations. Additionally, were the approach to use a small moving window and a limited observed record size, it would simulate realisations that have a rank structure (in space and time) identical to what is observed. While the spatio-temporal dependence
statistics will be represented well (perfectly in the rank space), the stochasticity of the simulated sequences will be compromised. For the method to succeed, the observed record should be sufficiently long, with not many repeated observations. It should be noted that the good simulation of difficult-to-represent statistics such as long term persistence arises because the observed rank structure is replicated without significant alteration in formulating the ensembles. The approach in its present form is not suited for downscaling or climate change related studies, for applications where the observed rainfall spatio-temporal structure is expected to behave differently under changed climate conditions, or where simulation is carried out conditioning on past observations. Other approaches like KNN and MMM can easily be extended for use in such situations. The KNN offers a simple modelling mechanism; however, being a nonparametric model, it is expensive in terms of computer time in comparison to reordering and MMM. The reordering alternative is easy to formulate and implement, while MMM has a complex model structure and requires matrix operations. However, once structured, MMM offers a novel mechanism for modelling varying orders of serial dependence at each point location, while still maintaining the observed spatial dependence. It is helpful to note that, as the structure of MMM supports rainfall simulation at individual stations in isolation, the at-site model structure can easily be modified to accommodate station specific needs in rainfall statistics. Finally, we feel that the final choice amongst the models
presented here may be made in large part according to user convenience, constraints arising from data and resource availability, or the simulation context.
References
[1] Bardossy A, Plate EJ. Space–time model for daily rainfall using atmospheric circulation patterns. Water Resour Res 1992;28:1247–59.
[2] Beersma JJ, Buishand TA. Multi-site simulation of daily precipitation and temperature conditional on the atmospheric circulation. Climate Res 2003;25:121–33.
[3] Brandsma T, Buishand TA. Simulation of extreme precipitation in the Rhine basin by nearest neighbour resampling. Hydrol Earth Syst Sci 1998;2:195–209.
[4] Bras R, Rodriguez-Iturbe I. Rainfall generation: a nonstationary time varying multidimensional model. Water Resour Res 1976;12:450–6.
[5] Buishand TA. Some remarks on the use of daily rainfall models. J Hydrol 1978;36:295–308.
[6] Buishand TA, Brandsma T. Multi-site simulation of daily precipitation and temperature in the Rhine basin by nearest-neighbor resampling. Water Resour Res 2001;37:2761–76.
[7] Clark MP, Gangopadhyay S, Hay LE, Rajagopalan B, Wilby RL. The Schaake shuffle: a method for reconstructing space–time variability in forecasted precipitation and temperature fields. J Hydrometeorol 2004;5:243–62.
[8] Clark MP, Gangopadhyay S, Brandon D, Werner K, Hay L, Rajagopalan B, et al. A resampling procedure for generating conditioned daily weather sequences. Water Resour Res 2004;40:W04304. doi:10.1029/2003WR002747.
[9] Guttorp P. Stochastic modeling of scientific data. London: Chapman and Hall; 1995 [chapter 2].
[10] Harrold TI, Sharma A, Sheather SJ. A nonparametric model for stochastic generation of daily rainfall occurrence. Water Resour Res 2003;39(12):1343. doi:10.1029/2003WR002570.
[11] Harrold TI, Sharma A, Sheather SJ. A nonparametric model for stochastic generation of daily rainfall amounts. Water Resour Res 2003;39(12):1300. doi:10.1029/2003WR002570.
[12] Hay LE, McCabe Jr GJ, Wolock DM, Ayres MA. Simulation of precipitation by weather type analysis. Water Resour Res 1991;27:493–501.
[13] Hughes JP, Lettenmaier DP, Guttorp P. A stochastic approach for assessing the effect of changes in synoptic circulation patterns on gauge precipitation. Water Resour Res 1993;29:3303–15.
[14] Hughes JP, Guttorp P. A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. Water Resour Res 1994;30:1535–46.
[15] Hughes JP, Guttorp P, Charles SP. A nonhomogeneous hidden Markov model for precipitation occurrence. Appl Stat 1999;48(1):15–30.
[16] Katz RW, Parlange MB. Effects of an index of atmospheric circulation on stochastic properties of precipitation. Water Resour Res 1993;29:2335–44.
[17] Katz RW, Parlange MB. Overdispersion phenomenon in stochastic modeling of precipitation. J Climate 1998;11:591–601.
[18] Katz RW, Parlange MB, Tebaldi C. Stochastic modeling of the effects of large-scale circulation on daily weather in the southeastern US. Climatic Change 2003;60:189–216.
[19] Lall U, Sharma A. A nearest neighbor bootstrap for time series resampling. Water Resour Res 1996;32:679–93.
[20] Lettenmaier D. Stochastic modeling of precipitation with applications to climate model downscaling. In: von Storch H, Navarra A, editors. Analysis of climate variability. Applications of Statistical Techniques. Berlin: Springer-Verlag; 1995. p. 197–212.
[21] Mehrotra R, Sharma A, Cordery I. Comparison of two approaches for downscaling synoptic atmospheric patterns to multi-site precipitation occurrence. J Geophys Res 2004;109:D14107. doi:10.1029/2004JD004823.
[22] Mehrotra R, Sharma A. A nonparametric nonhomogeneous hidden Markov model for downscaling of multi-site daily rainfall occurrences. J Geophys Res 2005;110:D16108. doi:10.1029/2004JD005677.
[23] Mehrotra R, Srikanthan R, Sharma A. A comparison of three stochastic multi-site precipitation occurrence generators. J Hydrol 2006;331:280–92. doi:10.1016/j.jhydrol.2006.05.016.
[24] Mehrotra R, Sharma A. A nonparametric stochastic downscaling framework for daily rainfall at multiple locations. J Geophys Res 2006;111:D15101. doi:10.1029/2005JD006637.
[25] Mehrotra R, Sharma A. A semi-parametric model for stochastic generation of multi-site daily rainfall exhibiting low-frequency variability. J Hydrol 2007;335:180–93. doi:10.1016/j.jhydrol.2006.11.011.
[26] Mehrotra R, Sharma A. Preserving low-frequency variability in generated daily rainfall sequences. J Hydrol 2007;345:102–20. doi:10.1016/j.jhydrol.2007.08.003.
[27] Qian B, Corte-Real JA, Xu H. Multi-site stochastic weather models for impact studies. Int J Climatol 2002;22:1377–97.
[28] Racsko P, Szeidl L, Semenov M. A serial approach to local stochastic weather models. Ecol Model 1991;57:27–41.
[29] Rajagopalan B, Lall U. A k-nearest neighbour simulator for daily precipitation and other weather variables. Water Resour Res 1999;35:3089–101.
[30] Semenov MA, Porter JR. Climatic variability and the modelling of crop yields. Agr Forest Meteorol 1995;73:265–83.
[31] Sharma A, O’Neill R. A nonparametric approach for representing interannual dependence in monthly streamflow sequences. Water Resour Res 2002;38(7):5.1–5.10.
[32] Sharma A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3 – a nonparametric probabilistic forecast model. J Hydrol 2000;239:249–58.
[33] Stehlik J, Bardossy A. Multivariate stochastic downscaling model for generating daily precipitation series based on atmospheric circulation. J Hydrol 2002;256:120–41.
[34] Vrac M, Stein M, Hayhoe K. Statistical downscaling of precipitation through nonhomogeneous stochastic weather typing. Climate Res 2007;34:169–84.
[35] Waymire E, Gupta VK, Rodriguez-Iturbe I. A spectral theory of rainfall at the meso-β scale. Water Resour Res 1984;20:1453–65.
[36] Wilks DS. Multi-site generalization of a daily stochastic precipitation model. J Hydrol 1998;210:178–91.
[37] Wilks DS. Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agr Forest Meteorol 1999;96:85–101.
[38] Wilks DS. Multi-site downscaling of daily precipitation with a stochastic weather generator. Climate Res 1999;11:125–36.
[39] Wilson LL, Lettenmaier DP, Skyllingstad E. A hierarchical stochastic model of large-scale atmospheric circulation patterns and multiple station daily precipitation. J Geophys Res 1992;D3:2791–809.