Accepted Manuscript
Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency Simon Michael Papalexiou PII: DOI: Reference:
S0309-1708(17)30989-2 10.1016/j.advwatres.2018.02.013 ADWR 3099
To appear in:
Advances in Water Resources
Received date: Revised date: Accepted date:
25 October 2017 14 February 2018 14 February 2018
Please cite this article as: Simon Michael Papalexiou , Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency, Advances in Water Resources (2018), doi: 10.1016/j.advwatres.2018.02.013
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
1
Highlight:
2 3 4 5 6
Stochastic modelling reproducing any marginal distribution and linear correlation Applicable in univariate, cyclostationary and multivariate cases Precise modelling of precipitation, river discharge, wind, etc. at any time scale Parametric correlation transformation functions unify and simplify the scheme Empirical correlation representation through parsimonious parametric functions
AC
CE
PT
ED
M
AN US
CR IP T
7
1
ACCEPTED MANUSCRIPT
8 9
10
Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency Simon Michael Papalexiou
13
Abstract
14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Hydroclimatic processes come in all “shapes and sizes”. They are characterized by different spatiotemporal correlation structures and probability distributions that can be continuous, mixed-type, discrete or even binary. Simulating such processes by reproducing precisely their marginal distribution and linear correlation structure, including features like intermittency, can greatly improve hydrological analysis and design. Traditionally, modelling schemes are case specific and typically attempt to preserve few statistical moments providing inadequate and potentially risky distribution approximations. Here, a single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent” Gaussian process. A novel mathematical representation of this scheme, introducing parametric correlation transformation functions, enables straightforward estimation of the parent-Gaussian process yielding the target process after the marginal back transformation, while it provides a general description that supersedes previous specific parameterizations, offering a simple, fast and efficient simulation procedure for every stationary process at any spatiotemporal scale. This framework, also applicable for cyclostationary and multivariate modelling, is augmented with flexible parametric correlation structures that parsimoniously describe observed correlations. Real-world simulations of various hydroclimatic processes with different correlation structures and marginals, such as precipitation, river discharge, wind speed, humidity, extreme events per year, etc., as well as a multivariate example, highlight the flexibility, advantages, and complete generality of the method.
AN US
M
ED
PT
CE
Keywords: stochastic modeling, weather generator, parent-Gaussian framework, transformations, hydroclimatic processes, precipitation, river discharge, temperature, wind speed, humidity
AC
32 33
CR IP T
12
Department of Civil and Environmental Engineering, University of California, Irvine, CA, USA (
[email protected])
11
2
ACCEPTED MANUSCRIPT
34
Introduction “Make things as simple as possible, but not simpler.” ~Albert Einstein
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
The stochastic structure and the marginal probability laws of hydroclimatic processes like precipitation, river discharge, temperature, wind, or relevant processes with discrete or binary marginal distributions like the number of extremes per year, wet-dry sequences, etc., show complexities that current methodologies are unable to unify in a single modelling framework. These processes differ in their spatiotemporal correlation structures (CS) with stronger structures being observed for example in temperature than in precipitation (e.g., Wilks and Wilby, 1999). Moreover, they have different marginal distributions, e.g., temperature is normally distributed (Breinl et al., 2015; Kilsby et al., 2007) while the intermittent and highly skewed character of precipitation at fine spatiotemporal scales demands the use of mixed-type distributions (Herr and Krzysztofowicz, 2005; Papalexiou and Koutsoyiannis, 2016; Stern and Coe, 1984; Wilks, 1998). Complexities of these processes are further enhanced by seasonality and diurnal variation, as well as a tendency to normality as we shift from fine to coarser scales due to the central limit theorem. These reasons led to a very large number of stochastic modelling schemes (see e.g., Aghakouchak et al., 2010; Bárdossy and Pegram, 2009; Bardossy and Plate, 1992; Buishand, 1978; Cowpertwait et al., 2007; Foufoula-Georgiou and Krajewski, 1995; Foufoula-Georgiou and Lettenmaier, 1987; Grimaldi and Serinaldi, 2006; Hering et al., 2015; Kleiber et al., 2012; Lee and Salas, 2011; Lombardo et al., 2013; Mehrotra and Sharma, 2009; Onof et al., 2000; Onof and Wheater, 1993; Over and Gupta, 1996; Papalexiou et al., 2011; Paschalis et al., 2014; Rodriguez-Iturbe et al., 1987; Salas et al., 2006; Serinaldi, 2009; Serinaldi and Kilsby, 2014a, 2016; Sirangelo et al., 2007; Villarini et al., 2014; Waymire and Gupta, 1981; Wheater et al., 2005; Wilks, 1999; Youngman and Stephenson, 2016) to name just a few.
56 57 58 59
We posit that a general unified approach is however possible by assuming that an arbitrary (target) process * ( )+ with prescribed marginal distribution ( ) and autocorrelation structure (ACS) ( ) has a parent (or else an equivalent) Gaussian process (pGp) * ( )+ with standard Gaussian ( ) and a specific linear ACS ( ). Therefore, modelling and simulating * ( )+ only marginal
60
requires the definition of two functions
61
While
62 63
does not preserve the linear ACS (expressed by the Pearson correlation coefficient) but only rank correlations as is nonlinear and the ACS depends on the marginal distribution ( ) (Embrechts et al.,
65 66 67 68
AN US
M
ED
PT
CE
is easily identified as ( )
AC
64
CR IP T
35
and (
such that ( )
( )
(
( )).
( )), this transformation of the marginal distribution
2002). In particular, for bivariate normally distributed vectors ( , one can show that
( ( )) and
. ( ) ( )/
(
) and an arbitrary transformation )
(Kendall and Stuart, 1979, p.
600). This means that we need inflated values to obtain the target . Therefore, the modelling problem reduces to the definition of a correlation transformation function (CTF) to estimate the parent-Gaussian autocorrelation (pGACS) ( ) from a given ( ). 3
ACCEPTED MANUSCRIPT
This modeling framework was previously considered in several fields to solve specific problems, thus resulting in a sparse literature dealing with simulation of cross-correlated but serially independent realizations from multivariate distributions with specified binary, discrete, and/or continuous marginal distributions (Demirtas, 2014, 2017; Emrich and Piedmonte, 1991; Macke et al., 2009), or independent time series with specified serial correlation and binary, discrete, and/or continuous marginal distributions (Cario and Nelson, 1997, 1998; Kugiumtzis, 2002; Macke et al., 2009; Serinaldi and Lombardo, 2017), or, focusing on periodic processes (Tsoukalas et al., 2018). Moreover, most of these studies focused on specific classes of marginal distributions and CS’s along with ad hoc solutions for the definition of the CTF .
78 79 80 81 82 83 84 85 86 87 88
In this study, we present a unified and fully general mathematical representation of the parentGaussian modeling framework. Going beyond specific cases, we highlight its generality and its ability to model and simulate processes with both cross- and autocorrelations such as spatio-temporal hydrological processes. We introduce a general and efficient procedure to define the function based on a simple and flexible parametric function, as well as a general-purpose technique to simulate the pGp with suitably inflated prescribed CS. Finally, we illustrate the performance of the generalized parentGaussian framework by using a variety of CS’s and marginal distributions of discrete, continuous and mixed type that provide a suitable representation for most of hydroclimatic processes. These diverse examples emphasize how the generalized parent-Gaussian theory can overcome all ad hoc solutions previously proposed in the literature, thus offering a powerful but simple tool that improves the analysis of hydroclimatic variables.
89 90 91 92 93 94 95 96 97 98 99 100
The paper is organized as follows: Section 0 provides the theoretical background introducing a unified mathematical representation of the parent-Gaussian framework starting from the univariate case (i.e., serially autocorrelated processes) and then discussing the multivariate extension. Section 0 starts with a review of marginal distributions and proposes a general framework on how to define and use parametric auto- or cross-correlation structures suitable to describe a large variety of hydroclimatic processes covering most of the cases of practical interest in hydrological studies. Then we present a novel general-purpose CTF along with a general technique to simulate the parent Gaussian process, concluding with a step-by-step synopsis of the simulation algorithm. Section 0 reports several case studies involving hydroclimatic processes with heterogeneous marginal distributions (discrete, continuous, and mixed type) and different CS’s, e.g., precipitation, river discharge, wind speed, relative humidity, number of extreme events per year, etc., as well as, a multivariate case of rainfall, wind and relative humidity. Finally, conclusions are summarized in Section 0.
102 103 104
AN US
M
ED
PT
CE
AC
101
CR IP T
69 70 71 72 73 74 75 76 77
Theoretical framework Univariate case + be a stationary stochastic process over an indexed set T, having an arbitrary Let * ( ) | autocorrelation structure (ACS) ( ) ( ) for lag , and with the random variable (r.v.) ( ) following an 4
ACCEPTED MANUSCRIPT
( )(
). Stationarity entails no change in the joint probability
105
arbitrary marginal distribution function
106 107
distribution of any order and for any shift in time, and thus, * ( )+ (for brevity is omitted ( ) hereafter) is defined by a single ACS ( ) and a single marginal distribution ( ), as ( ) ( ) + be a standard Gaussian process (Gp) with CS ( ) for any and ( ) ( ) . Let * ( ) |
110 111 112 113 114 115 116
( ), implying that any linear combination of ( ( )
distribution (see e.g., Feller, 1971, p. 87) and any ( ) follows the standard Normal distribution ( ). Gaussian processes have well-defined properties and their simulation is easy, therefore, the key concept is to identify a parent Gp that can be transformed into the process * ( )+. Therefore, to simulate/reproduce a stochastic process ( ) with a given linear ACS ( ) and a given marginal ( ) we need to assess and transform the pGp. As the marginal of the Gp is known, this transformation only requires to assess the ACS of the parent-Gaussian process (pGACS) ( ) yielding the desired target ACS.
117 118 119 120
Let
( ( )) transforming * ( )+ into * ( )+, and
be a real one-to-one function with ( )
( ( )) transforming * ( )+ into * ( )+. It follows that ( ( )) maps Gaussian r.v.’s ( ) ( ) and the pGACS ( ) to the target ACS ( ), and vice versa, if ( ) to r.v.’s ( ) ( ( )) is applied to * ( )+. This creates a unique mapping between the two processes, i.e., ( )
* ( )| ( )
(
)
↦ ( )+ ↤
( )
( ( ))
* ( )| ( )
( )
( )+
(1)
( ( ))
M
( )
Now, let us define the transformations
ED
121
( )) has a joint standard Normal
CR IP T
109
AN US
108
( )
( ))
(
( )
( ))
PT
(
(2) (3)
125 126
mapping X to Z. Thus, for the r.v.’s ( ) ( ) and ( ) ( ), where ( ) is any well-defined distribution function, a pair of transformation exists transforming one into the other and vice versa.
AC
127 128
CE
124
( ) ( ) with where ( are, respectively, the √ ) and ( ) √ distribution and quantile functions of the standard Gaussian variate Z, and ( ) the quantile function of X. It is well-known that ( ( )) maps Z to X, with ( ( )) being its inverse transformation
122 123
( ) (this definition is used for notational simplicity) be a pair of * ( )+ Let ( ) and ( ) r.v.’s at an arbitrary time point , lagged by , and having a correlation coefficient ( ) given by
( )
.( ( )
( )) ( )
( ( )
( ) )/
( )
5
( ( ) ( ))
(4)
ACCEPTED MANUSCRIPT
129 130
where and are the mean and the variance of the r.v. X (given that they are finite as there are r.v.’s with infinite mean and variance in which case the correlation cannot be defined), and
( ( ) ( ))
∫ ∫ ( ) ( )
( ) ( )(
( ) ( ))
( )
( )
(5)
( ) ( )) is the joint probability density function (pdf) of ( ) and ( ) which, in
( ) ( )(
where
132 133
general, does not have a simple parametric form. Yet since ( ) and ( ) are jointly normally distributed with correlation ( ) and with bivariate pdf
134 135
( ) ( ))
(
( )
√
( )
( ) ( ) ( ) ( ) ) (
( )
)
AN US
( ) ( )(
CR IP T
131
(6)
we can use the theory of r.v.’s transformations (see e.g., Feller, 1971) and Eq. (3) to estimate directly the joint pdf of ( ) and ( ), i.e., ( ) ( )(
( ) ( ))
where (( ( ) ( ( ))
( ( ))
( )
( ( ))
( ( ))
( )/ ( ( ) ( ))
(7)
139 140
with known ( ) and ( ) parameters we can calculate numerically the integral, and thus, the corresponding ( ) from Eq. (4). This creates a one-to-one link between ( ) and ( ).
141 142
The above approach has not been previously proposed in the literature, and although it results in complex ( ) ( ) ( ( ) ( )) expressions that may affect the numerical integration speed, in some
143 144 145 146 147 148
cases (encountered in numerical experiments, e.g., when Gamma type marginals are involved), it was faster than the more standard approach described next. Note that the computation of the Jacobian demands the existence of partial derivatives and thus for discrete marginal cases this approach may not be applicable. Another approach to estimate the ( ) is by applying the fundamental probability theorem (Feller, 1971, p. 5) stating that given an r.v. Y with known pdf ( ) and an r.v. ( ), where is a real function, the expectation of X can be expressed in terms of ( ), i.e., ( )
ED
PT
CE
149
AC
137
150
knowledge of
151 152
M
138
| is the Jacobian of the ( ( )) ( ) transformation (see e.g., Papoulis and Pillai, 2002, p. 244). Inspection of (7) reveals that ( ) parameters and of ( ). If we insert Eq. (7) into Eq. (5), ( ) ( ) ( ( ) ( )) is a function of the
136
|
( ) ( ).
( ( ))
∫
( ) ( )
, avoiding the calculation of ( )
∫
( )
which requires
( ). This theorem applies in higher dimensions and can be used to evaluate
( ( ) ( )) in terms of the bivariate standard Gaussian pdf (Eq. (6)) in combination with the Eq. (3) transformation. The autocovariance transformation integral ( 6
( )) is given then by
ACCEPTED MANUSCRIPT
( ))
( ∫ ∫
.
( ( ) ( ))
( ( ))/
(
( ( ))/
.
.
( ( ))/
( ) (
)( ( ) ( )
( ( ))/)
.
( ))
( )
( )
(8)
where ( ) is the parameter vector of the distribution ( ). In the copula literature Eq. (9) is known as the Nataf transformation (Nataf, 1962; Liu and Der Kiureghian, 1986; Li et al., 2008; Lebrun
155
and Dutfoy, 2009; Xiao, 2014). In general, the double integral ( solution, yet its numerical estimation is trivial.
158
( )) into Eq. (4), we get the autocorrelation transformation integral (ACTI)
Introducing (
157 (
( )) does not have an analytical
( )), i.e.,
( )
( ))
(
( )) is a function of
( ))
(
AN US
156
CR IP T
153 154
( ) as
(9)
159
Clearly, (
160
combining
161
the
162 163 164 165
, transforming essentially the Gaussian ACS ( ) into the ACS ( ). Note, this relationship transforms a known ( ) value into a ( ) value, while our goal is to transform an observed or a given ( ) value into the corresponding ( ) value; this is described in detail in Section 0.
166 167 168 169 170 171 172
Multivariate and cyclostationary case
174 175
and
( )with Eq. (4) to calculate
are functions of
as well. Likewise
( ) the ACTI creates a link between
( ), and thus, can be applied for any pair of ( ( ) ( )), or else for all lag values
ED
M
( ) and
( ) ( ))
CE
PT
Equations previously shown refer to a single stationary stochastic process, yet Eq. (9) can be applied more generally. As natural processes vary in space and time we need multivariate and cyclostationary models (Bartolini et al., 1988; Salas, 1980; Salas et al., 1982; Shao and Lund, 2004). Let us assume processes (e.g., a single process observed in different locations, different processes in the same location, etc.), and seasons (e.g., 12 months, 365 days, etc.), i.e., a total of processes, and with * ( )+ denoting the -th process at -th season, where , and . Accordingly, for ( ), and for every pair of processes each process we assume a marginal distribution ( ), an ACS
AC
173
( ) ( )(
and of
with
a cross-correlation structure (CCS)
( ), where
, and
. In this case,
Eq. (8) becomes
( ))
(
7
(10)
ACCEPTED MANUSCRIPT
∫ ∫
( ( ))/
(
( ( )))
( )(
( )
( ) ( )
( ))
( )
( )
and thus, the cross-correlation transformation integral (CCTI) becomes
( )
177
where
178
Thus,
( ))
(
are parameter vectors of
( ))
(
( ) and
(11)
( ) distributions, respectively.
( )) can be seen as a function transforming a given lag- correlation
(
( ) in a pair of standard Gp’s, into the lag- correlation
179
CR IP T
176
.
( ) between *
( )+ and
* ( )+ processes. Thus, the difference, compared with the univariate stationary case, is that here we have a set of processes * ( )+ and we want to assess a set of parent Gp’s * ( )+, or else,
182
their pGCCS’s
183
set of *
184 185 186
result, the problem of generating time series from a multivariate cyclostationary model comprising processes with different characteristics, simplifies in generating time series from a multivariate cyclostationary standard Gp.
187
From theory to practice
188
Marginal distributions of hydroclimatic processes
189 190 191 192 193 194 195 196 197 198 199 200 201
Continuous Common hydroclimatic processes like temperature, wind, precipitation, river discharge, relative humidity, etc., have continuous, or, mixed-type (partly continuous and partly discrete) marginal probability distributions. The number of parametric distributions found in the literature to describe hydroclimatic r.v.’s is very large: air temperatures is commonly described by the Gaussian distribution (e.g., Klein Tank and Können, 2003); the two-parameter Weibull has been the prevailing model for wind speeds (e.g., Justus et al., 1978; Seguro and Lambert, 2000), yet Gamma, Lognormal and Pearson type I distributions have also been used for wind (for a review see Carta et al., 2009); nonzero daily precipitation has been modelled, by Gamma (e.g., Buishand, 1978; Bruhn et al., 1979; Geng et al., 1986), Weibull (e.g., Swift and Schreuder, 1981; Wilson and Toumi, 2005), Exponential and mixed Exponentials distributions (e.g., Smith and Schreiber, 1974; Todorovic and Woolhiser, 1975; Wilks, 1999), or by heavier-tailed distributions like the Lognormal (e.g., Swift and Schreuder, 1981), Kappa (Mielke Jr, 1973; Mielke Jr and Johnson, 1973; Hosking, 1994; Park et al., 2009), and the generalized Beta distribution
AN US
180 181
( ). Given this, a multivariate and cyclostationary Gp can be transformed into the ( )
.
( ( ))/. As a
AC
CE
PT
ED
M
( )+ processes using the corresponding transformations
8
ACCEPTED MANUSCRIPT
(Mielke Jr and Johnson, 1974) (for a detailed global investigation on the distribution of seasonal daily precipitation see Papalexiou and Koutsoyiannis (2016) where the Generalized Gamma and the Burr type XII have been used; for global analyses on precipitation extremes see (Papalexiou et al., 2013; Papalexiou and Koutsoyiannis, 2013; Serinaldi and Kilsby, 2014b) ); river flows have been described by the Lognormal and Pareto distributions (e.g., Bowers et al., 2012); the Pearson Type III and the 3parameter lognormal distributions were suggested for low stream flows (Kroll and Vogel, 2002); the Beta distribution has been found to describe very well relative humidity values (e.g., Yao, 1974), and also, is a candidate model for percentage r.v.’s like probability dry, percentage of area affected by extreme temperature or precipitation, etc.
211 212 213 214 215 216 217 218 219
A complete review on the probability models applied in geophysical and hydroclimatic processes would be a difficult task by itself and out of the scope this study. A common practice is to fit many distributions and select the best fitted, yet this approach is susceptible to sample variations. In an attempt to simplify this procedure, and based on entropy information measures, a general theoretical framework was proposed (Papalexiou and Koutsoyiannis, 2012) that led to consistent distributions for positive geophysical and hydroclimatic r.v.’s. Particularly, the Generalized Gamma ( ) (Stacy, 1962; Stacy and Mihram, 1965) and the Burr type XII distributions ( ) (Burr, 1942; Tadikamalla, 1980) were derived, yet the Burr type III ( ) is also proposed here as a general and flexible power-type model. The probability density function of distribution and the distribution functions of the
220
and
)
)
( )
( ( ) )
(12)
are given, respectively, by
(
)
)
(
(
( ) )
( )
AC
CE
PT
(
222 223 224 225 226 227 228
(
ED
(
M
AN US
CR IP T
202 203 204 205 206 207 208 209 210
)
(13)
(14)
These distributions, defined for , have the scale parameter , and the shape parameters and controlling the left and right tails, respectively. Their pdf’s are J-shaped for and bell-shaped for , while the parameter controls mainly the “heaviness” of the right tail, and thus, the behaviour of extreme events. Note that is of exponential form with finite moments, while and are power-type distributions that can have very heavy tails and infinite moments. The can be considered as a generalization of the two-parameter Gamma and Weibull distributions, while the includes as special cases the Pareto type II for , or the Weibull distribution as limiting 9
ACCEPTED MANUSCRIPT
case for . Clearly, the principle of parsimony, which requires to always seek for the simplest model, i.e., to use the smallest number of parameters possible (see e.g., Box et al., 2008, p. 16), should always be applied. For example, if a Weibull distribution performs satisfactorily (under desired criteria) then it should be preferred over the three-parameter ; the same holds for the power-type distributions and where the simpler Pareto type II should be the first choice under adequate performance.
235 236 237 238 239 240 241 242
Mixed type, discrete and binary Several natural processes like precipitation at fine temporal scales (e.g., at daily or subdaily scales), discharge of small streams, or even wind, are intermittent processes. Thus, their marginal distribution comprises a probability mass concentrated at zero (e.g., probability dry in precipitation), and a continuous part for the positive values, expressed by a parametric distribution function. The framework to simulate intermittent processes is the same, (see section 0); the difference lies in the expressions of the distribution function ( ) and the quantile function ( ) that are now related to the conditional expressions for | .
243 244
Particularly, if ( given, respectively, by
)
is probability zero, then the cdf, pdf and quantile function of
are
( )
(
(15)
)
{
|
( )
(16)
)
(17)
(
|
PT
(the Dirac delta notation can also be used to define the pdf). Additionally, the expressions of the mean and variance of , used to estimate the CTF’s in Eq. (9) or Eq. (11) become
CE 250
( )
|
(
( )
)
(
|
(18) (19)
|
AC
247 248 249
{(
ED
( )
)
M
( )
245 246
AN US
CR IP T
229 230 231 232 233 234
)
|
The probability zero, can be easily assessed from the empirical data as , with and denoting, respectively, number of dry and total days, while for positive values any suitable parametric continuous distribution | ( ) could be used. This framework can be extended to include cases where values above a threshold are considered, i.e., the discrete part can express the , and the continuous part values for |
(
) for an
251
arbitrary value
252 253
marginal distributions can be considered, e.g., the water storage in a reservoir varies theoretically from zero to a maximum level when capacity is reached, thus, as a variable it may have two discrete states, 10
. Finally, more general mixed-type
ACCEPTED MANUSCRIPT
i.e., probability zero and probability for maximum capacity; all in between values can be described by a continuous bounded distribution.
256 257 258 259 260 261 262 263
Important aspects of various geophysical processes can be considered to have a discrete or a binary marginal distribution (see e.g., Chung and Salas, 2000; Molotch et al., 2005; Serinaldi and Lombardo, 2017). For example, the number of consecutive wet and dry spells (in days, hours, etc.), the number of days per year having a specific property, e.g., temperature, precipitation, or wind speed above a threshold (extremes), the number of wet days per year, binary sequences to describe wet and dry days, and many more. As in the mixed-type case the methodology remains the same, modified only by using expressions for the quantile, mean and variance that correspond to the discrete or the binary probability mass function chosen.
264 265 266 267 268 269 270 271
Autocorrelation (ACS) and cross-correlation structures (CCS)
272 273 274 275 276 277 278 279 280 281 282 283 284 285
As in Waldo Tobler’s First Law of Geography, stating that “near things are more related than distant things”, it is well-known “things” closer in time are more related than distant “things”. Thus, we ) can assume that a monotonically decreasing function of lag (or distance) having ( and ( ) can serve as an ACS. A general class of functions, proposed here, with these properties are the complementary cumulative distribution functions (ccdf), but now treated as functions and not as probability laws. Note that valid ACS’s and CCS’s have to be positive definite to guarantee that the parent Gaussian processes are stationary (see. e.g., Yaglom, 2012, p. 57). We deem that by definition a ccdf function fulfils this conditions (for a treatise on strictly positive definite functions see Chang, 1996). However, pathological cases may be possible to be formed. For example a function that equals to 1 up to a given lag and then gets 0 cannot be a proper correlation function (for more examples see e.g, Dolloff et al., 2006); yet in real world cases this specific issue cannot be observed as correlation equal to 1 in a dataset indicates an exact linear deterministic relationship, and thus, the correlations for any lag would also equal to 1 excluding the possibility for any other value. In any case, positive definiteness of a function should be considered when we construct theoretical ACF's and CCF's.
CE
PT
ED
M
AN US
Empirical autocorrelation (or cross-correlation) ̂( ), especially for small samples and for large lags , is sensitive to sample variations and calculated up to a maximum lag (1/3 of the sample size is usually suggested). A better approach may be to fit a parametric ACS to ̂( ), expressed by a simple ) where function ( ( ) is the parameter space of the ACS, essentially interpolating the values of ̂( ) and providing a smoother result. Also, it could be desirable to assign a prescribed ACS or CCS, e.g., in sensitivity analysis scenarios or if regional information is transferred to a location of interest.
AC
286
CR IP T
254 255
Based on the previously described rational, the Weibull ACS is then given by
(
)
( . / )
11
(20)
ACCEPTED MANUSCRIPT
287 288
with
and , and forms a generalization of the celebrated Markovian ACS as it is identical for ( ) can be expressed also in terms , yet decays slower for , and faster for . Note that
289 290 291
( ) of lag-1 correlation , i.e., . A similar stretched exponential form was used for space correlation of rainfall (Ciach and Krajewski, 2006). Likewise, a power-type ACS is formed by the Pareto II ccdf, i.e.,
)
.
/
⁄
(21)
CR IP T
( with
and . The PII ACS decays always slower than the Markovian, yet it becomes identical as . A similar two-parameter power-type ACS, named Cauchy class, was also proposed by Gneiting and Schlather (2004) as generalization of the fractional Gaussian noise (fGn) ACS.
295 296
The Weibull and PII ACS’s can be merged into a three-parameter expression, which coincides with the Burr type XII ccdf, i.e.,
(
and
; for
(
(
,
(22)
. / ) )
.
/, and for
297
with
it
298 299 300 301
equals the PII ACS. This general expression, with a different parametrization, was also used in Papalexiou et al. (2011), yet it was not suggested within the general framework proposed here and was not identified as the Burr type XII ccdf. Note that we can form ACS’s decaying even slower than power-type ACS’s, e.g., a general logarithmic expression (GL) proposed here, generalizing also the Markovian ACS, is
ED
M
,
)
AN US
292 293 294
PT
(
)
.
.
(23)
//
( ) with and ; for coincides with the Markovian ACS, i.e., ( ). This ACS for large values of can have extremely slow decay rates, and may be useful for processes at the second or millisecond time scales.
305 306 307
Another noteworthy ACS is the celebrated fGn, or, Hurst ACS (e.g., Beran, 1994, p. 52), linked to long-term persistence (LTP) and fGn processes (Hurst, 1951; Mandelbrot and Wallis, 1968, 1969), given by
AC
CE
302 303 304
(
)
(|
|
| |
12
|
|
)
(
)
(24)
ACCEPTED MANUSCRIPT
where is the Hurst coefficient. Also, FARIMA models (Granger and Joyeux, 1980; Hosking, ( ), yet their ACS is more flexible as it 1981)have a similar ACS, asymptotically equivalent to the allows to control short-term correlations (see e.g., Hosking, 1984; Montanari et al., 1997 for applications in hydrology). Noteworthy recent global studies on annual precipitation (Iliopoulou et al., 2016) and river discharge (Markonis et al., 2018) challenge the LTP hypothesis as the findings suggest very low values of the coefficient that could also emerge from Markovian processes.
314 315
Here we propose a convenient generalization of the fGn ACS, i.e., the generalized fGn (gfGn) given by
CR IP T
308 309 310 311 312 313
(
(
)
(
)(
(
(
)
)
(
)
)
)
(
)
(25)
This ACS, well-defined for , converges very fast to the classical fGn ACS, yet for lag we have ( ) and thus it allows full control of the short-term properties of the ACS. Also, various
318 319
other correlation functions have been proposed, e.g., for wind and atmospheric data (Buell, 1972; Gneiting, 1999).
320
In the cross-correlation case between two process, e.g., of * ( )+ and * ( )+, since
322
(
) for
, and
( )
( )
we can still use parametric CCS’s similar to the ACS’s framework,
if this is desired, by modifying them as
{
ED
( )
M
321
AN US
316 317
( (
) )
(26)
where ( ) can be any parametric CS like the Weibull, PII, etc. with different parameter values for and to fulfil the condition of ( ) ( ). Of course an infinite number of ACS or
325 326
CCS’s can be formed, with many more parameters, yet according to the parsimony principle we should choose the one with the less possible parameters (Box et al., 2008, p. 16).
327 328 329 330 331 332 333 334 335 336
Although bias issues in estimating empirical correlations are beyond the scope of this study; yet it is crucial to mention that the empirical ACS of a process having a very intense underlying ACS, e.g., an fGn with high values of , will be observed (most probably) weaker (especially for small samples), and will also affect the sample standard deviation. In general, a straightforward and easy method to estimate any linear ACS without bias does not exist; formulas exist only for special cases like the fGn ACS (see e.g., Beran, 1994) and should be applied (if there is evidence for LTP) before estimating the pGACS. For most hydroclimatic processes, however, and when dealing with time scales larger than the daily (or even hourly for precipitation) the empirical ACS’s are not intense, and thus, the bias is not expected to be significant; also as afore-mentioned the existence of LTP in processes like river discharge is challenged (see e.g, Markonis et al., 2018). Nevertheless, for processes at fine temporal scales we may 13
AC
CE
PT
323 324
ACCEPTED MANUSCRIPT
observe very strong ACS’s (see the example in Section 0) that can lead to significant biases in the estimation of the underlying ACS and of the standard deviation (see e.g., Papalexiou et al., 2011 and references therein).
340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357
Finally, it should be clear that the parent Gaussian framework reproduces linear correlation structures as expressed by the Pearson correlation coefficient, and does not consider other types of correlations, e.g., rank correlations (Kendal’s Tau, Spearman’s Rho, etc.), or, more general correlation topologies that can emerge by other models. For example, there is an increasing interest over the last decade on the spatial dependence structure of extreme events using max-stable processes (see e.g., Davison et al., 2012; Huser et al., 2016; Huser and Genton, 2016; Ribatet, 2013; Stephenson et al., 2016), an approach that may lead to structures which the framework presented here may not be able to reproduce. Yet it is still a matter of debate if modelling extremes directly is a better approach compared to modelling the process itself, e.g., in the presence of LTP it was shown that the latter approach should be preferred (Serinaldi et al., 2018). Also, note that the behaviour of spatial extremes is essentially governed by the tail of a multivariate distribution or by the random field properties, and thus it is rational to assume that if the multivariate process or the field itself is simulated accurately then the structure of extremes will be also consistently reproduced. Certainly, both approaches have advantages and disadvantages that depend on many factors. For example, analysis of above-a-threshold extremes provides a more accurate estimation of the distribution tail (see e.g., Papalexiou et al., 2013) than using the whole set of observations; in contrast, using annual maxima may lead to increased uncertainty in the estimation of the heaviness of the tail especially when fixed-block maxima are used (Papalexiou et al., 2016; van Montfort, 1990).
358
Auto- and cross-correlation transformation function
359 360 361 362 363 364 365 366 367 368
Autocorrelation transformation function (ACTF) The ACTI in Eq. (9) transforms a specific autocorrelation value between a pair of ( ) and ( ) r.v.’s of the Gp lagged by , into the autocorrelation between the ( ) and ( ) r.v.’s, of the target process, yet the inverse relationship is desired for all practical purposes. That is an ACTF ( ) where we introduce any value of or any ACS ( ) and get, respectively, the pGp value or the pGACS ( ). We assume here positive correlations ranging in , -, yet the framework can be expanded easily for negative correlations too. Essentially, we evaluate the ACTI in a small set of values to create ( ) points and interpolate them by fitting a simple parametric function. Since ( ) ( ) (Kendall and Stuart, 1979, p. 600) the ACTF should have the potential to be concave and also pass from the ( ) and (1,1) points, as for equal to 0 and 1, is also 0 and 1.
AN US
M
ED
PT
CE
AC
369 370
CR IP T
337 338 339
A simple two-parameter function with these properties that is proposed here, is heuristically evaluated and found to perform perfectly in all cases of continuous and mixed-type marginals studied, is
14
ACCEPTED MANUSCRIPT
(
) (
(27)
)
which is concave for and , while for and we have . Unless stated this will be the basic ACTF used in the examples of this study. The previous function performs also well for the discrete and binary case, yet the alternative
(
)
(
)
CR IP T
371 372 373
(
)
(28)
seems to perform better for discrete cases and especially for the binary case. This function is concave for and ; for and we have . Note that Eq. (28) coincides with Kumaraswamy cdf (Kumaraswamy, 1980), yet here is used as parametric function and not as probability law. Thus, we can now form a one-to-one relationship that enables the calculation of the pGp either as a single value or as a parametric pGACS if a parametric ACS ( ) is plugged into Eq. (27) or Eq. (28). ) functions can be easily fitted if we evaluate the ACTI (Eq. (9)) for Any of the ACTF’s ( * + and use the corresponding set of ( ) points. We stress that this approach is simpler and by far more efficient than iterative methods and polynomial approximations previously proposed (see e.g., Demirtas, 2014; Ferrari and Barbiero, 2012; Kugiumtzis, 2002).
384
)) marginal distribution, Let us assume that we target a process * ( )+ having a Weibull ( ( ( ) ( ) ( ( ) ) and a Markovian ACS ( ) ( ) with . For illustration and comparison purposes we assume three different distributions (Fig. 1a) with scale * +, respectively. The estimated ( parameter and shape parameters ) points ) functions (Fig. 1b) show very different transformation (gray dots in Fig. 1b) and the fitted ( profiles, i.e., for the bell-shaped ( ) which resembles the Gaussian distribution, essentially ) has a and the pGACS coincides with ( ) (Fig. 1c). As we deviate from normality, e.g., the ( heavy tail and very high skewness, the ACTF becomes more concave resulting in very different pGACS’s (Fig. 1c). Note that a Markovian ACS ( ) is not reproduced by a Markovian pGACS, e.g., for the ( ) case the ( ) corresponds to ( ) which does not imply that the Markovian ACS ( ) will reproduce the ( ) , it will only preserve the ( ).
395 396 397 398 399 400 401
M
ED
PT
CE
AC
385 386 387 388 389 390 391 392 393 394
AN US
374 375 376 377 378 379 380 381 382 383
Another example, showing how the concavity of the ACTF changes, regards mixed-type marginal distributions suitable to model intermittent processes like precipitation (or wind and discharge in some )) cases). Let us target for an intermittent process * ( )+ having probability zero , a Pareto II ( ( ( ) ( ) marginal distribution ( ) for nonzero values with and ( ) with , and an fGn ACS (Eq. (24)) ( ) . For comparison we use * +, with the unconditional pdf’s shown in Fig. 1d (note that it is common to observe equal to high values as 0.90 and 0.99 for daily and hourly rainfall records, respectively). The ( ) 15
ACCEPTED MANUSCRIPT
) functions (Fig. 1e) show how concavity increases as increases points and the fitted ( resulting in very different pGACS (Fig. 1c), and thus, for large values of a much stronger ( ) is necessary to preserve the ( ). As in the previous example the pGACS of the fGn ACS is not an fGN ACS with just a different value.
406 407 408 409 410 411 412 413 414
In summary, we note that the concavity of the ACTF, or else, the difference between the target ACS and the pGACS, depends on how much the target marginal deviates from normality. Cleary, a unique method to quantify this deviation does not exist, as deviations may regard the general shape (e.g., skewed vs. symmetric densities), the heaviness of the tail (e.g., exponential vs. power-type tails), or the type of the density, i.e., mixed-type vs. continuous type. Consequently, we could “speculate” that when the target marginal is of continuous type, has a moderately skewed and bell-shaped pdf, e.g., as in the bell-shaped Weibull pdf example (Fig. 1a), then the linear correlation has a small bias. In other words, ( ) could be approximated by the actual ( ), and thus, we can directly transform the Gaussian time series avoiding calculating the integrals and fitting the ACTF.
PT
ED
M
AN US
CR IP T
402 403 404 405
415
CE
Fig. 1. Demonstration of the autocorrelation transformation function (ACTF) and the resulting parent-Gaussian autocorrelation structures (pGACS): (a) different Weibull ( ) marginal pdf’s, (b) ACTF’s (indicated with the same colour) corresponding to the distributions, (c) pGACS’s for a Markovian ACS ( ) with corresponding to the three distributions (note that ( ) is overlapped by the pGACS of the ( ) case), (d) unconditional Pareto II ( ) pdf’s for three values, (e) ACTF’s corresponding to the mixed-type distributions, (f) pGACS for an fGn ACS ( ) with Hurst coefficient corresponding to the mixed-type distributions.
AC
416 417 418 419 420 421
424
Cross-correlation transformation function (CCTF) In the previous case, the ACTF was constrained to pass from the (1,1) point, yet for cross-correlation this is not mandatory. The cross-correlation between two stationary processes * ( )+ and { ( )} has
425
boundaries depending on the marginal distributions
426
admissible
422 423
( ) and
( ), thus there is a range of
values according to the Frechet-Hoeffding theoretical limits, i.e., 16
(
)
ACCEPTED MANUSCRIPT
427
(
)
(Embrechts et al., 2002; Fréchet, 1957; Hoeffding, 1994). These boundary
values can be checked for each specific application by using for example the procedure suggested by Demirtas and Hedeker (2011). However, this check is not required in our approach as the interpolation procedure automatically determines the feasible range of and thus the ( ).
431 432
Here, we propose as CCTF a simple two-parameter function, that performed extremely well in all cases studied, i.e.,
.
/
.
/ .
CR IP T
428 429 430
/
(29)
Note that if more flexibility is needed the three-parameter
434 435 436 437
function (Eq. (29)) is essentially the same as the ACTF in Eq. (27) but without the denominator part, which forced the function to pass from (1,1); it is valid for , concave for , linear for , and convex for . As in the ACTF case, we can easily fit the CCTF by evaluating the CCTI in Eq. (11) for * + and creating the corresponding ( )
438
points. Once the CCTF parameters are estimated we can determine easily the
439
in Eq. (29) and solve for
AN US
433
by setting
(30)
M
, i.e.,
can be used. This
Let us assume a process * ( )+ with a Weibull marginal ( ) and a process * ( )+ with a Beta marginal ( ). For comparison purposes we use three different Beta pdf’s having negative, zero, and positive skewness (Fig. 2a). We use the CCTI of Eq. (11) to estimate the ( ) points
443 444 445
(grey dots in Fig. 2b) and we fit the CCTF in Eq. (29). Noteworthy, for the negatively skewed Beta the CCTF is convex, almost a straight line for the symmetric, and concave for the positively skewed, while the corresponding limits, estimated from Eq. (30), are 0.53, 0.64 and 0.75, respectively. This
446
indicates that the more the two marginals differ the more their maximum cross-correlation value
447 448
will be decreased. We also show the pGCCS of a Markovian CCS (Fig. 2c); clearly, not all CCS are feasible due to the maximum limit.
PT
CE
AC
449 450 451 452
ED
440 441 442
Another interesting example is to assume cross-correlated processes with mixed-type marginals that differ in probability zero. We assume a heavy-tailed continuous marginal ( ) for the first process * ( )+, while for * ( )+ we use three different mixed-type ( ) marginals with * + (Fig. 2d). Using the CCTI of Eq. (11) we calculate the ( ) points (grey dots
453
in Fig. 2e) and we fit the CCTF of Eq. (29). All CCTF’s are concave while the corresponding
454 455
estimated from Eq. (30), are 0.98, 0.89 and 0.66, respectively, indicating a decrease in the maximum cross-correlation as increases, or else, as the two marginals differ more. In Fig. 2f we show the pGCCS 17
limits,
ACCEPTED MANUSCRIPT
of the Markovian CCS used in the previous example; clearly, as needed to reproduce the target CCS.
increases much stronger pGCCS are
AN US
CR IP T
456 457
458
Fig. 2. Demonstration of the cross-correlation transformation function (CCTF) and the resulting parent-Gaussian crosscorrelation structures (pGCCS): (a) a Weibull ( ) marginal and three Beta ( ) pdf’s, (b) corresponding (indicated with the same colour) cross-correlation transformation functions (CCTF) of the distributions with the three Beta, (c) corresponding pGCCS for a Markovian target CCS , (d) the Pareto II ( ) continuous marginal and the three mixed-type marginal pdf’s for three values, (e) corresponding CCTF’s, (f) corresponding pGCCS for the Markovian CCS with .
464 465 466
Gaussian univariate and multivariate processes with arbitrary CS
M
459 460 461 462 463
PT
ED
We can easily simulate a standard Gp, i.e., with ( ) by using an autoregressive process of order
( )
∑
and , having an arbitrary parametric ACS ( ( )) defined by
(
)
(31)
( )
where ( ) ( ) is Gaussian random noise with parameters. For a given parametric ACS ( ) the
469
using the Yule-Walker equations (e.g., Box et al., 2008, p. 67), i.e.,
470 471 472 473
the parameter vector, , ( ) ( )- is the correlation vector up to lag , and , (| |) - is the correlation matrix with and denoting, respectively, the -th row and -th column, e.g., the , - element is simply the value of ( ) (for more on AR models see Box et al., 2008)
∑
( ), and are the model can be easily and analytically computed , where
,
- is
AC
CE
467 468
475
( ) model preserves exactly a given ACS ( ) up to lag and then for ∑ decays according to ( ), a relation that can be easily applied recursively. For ( )( )
476
example, Fig. 3 shows the approximation of two parametric ACS up to various lags, i.e., of a Weibull with
474
Thus, an
18
ACCEPTED MANUSCRIPT
477 478 479
(decaying thus slower than the Markovian ACS), and a very strong Pareto II ACS with . Clearly, the closer the ACS is to the Markovian the better will be approximated by the ; in any case, a parametric ACS can be approximated up to any desired lag, e.g.,
CR IP T
even an AR(5000) can be easily fitted. To avoid any confusion, we emphasize that here we fit the theoretical ACF of an ( ) model to a target parametric ACS. This method only requires estimation of a couple of parameters, i.e., those of the fitted parametric ACS to the data, while the parameters of the fitted ( ) model are simply coefficients/constants (estimated analytically and without uncertainty) corresponding to the target parametric ACF. Thus, this approach is parsimonious and efficient, e.g., an AR(5000) fitted to a two-parameter ACS is essentially a two-parameter model.
AN US
480 481 482 483 484 485
(
and and ) ( ) for
486
Fig. 3. Approximation of two parametric ACS by AR( ) models of large order.
488 489 490
Another approach to generate time series from a standard Gp having an arbitrary ACS ( ) is based on approximating ( ) by a sum of independent AR(1) processes, an idea originally proposed and used by Mandelbrot (1971) to approximate the fGn ACS. Particularly, if we define
ED
M
487
PT
( )
with ( )
492
standard deviation
493
processes are independent, the variance and the CS of the sum process are
496 497 498
( )
( ), and ( )
( (
( ) )
,
), then, since the AR(1) ∑
, respectively. Of course, if we demand * ( )+ to have
and , then
AC
495
∑
( ) being the -th AR(1) process in the sum, having mean
, lag-1 correlation
CE
( )
)
(32)
491
494
( ) (
∑ ( )
∑
is also constrained to equal 1. The AR(1) parameters can be estimated by numerically
minimizing an error norm between ( ) and ( ) up to a desired lag . For an application using five independent AR(1) approximating the ACS given Eq. (22) see Papalexiou et al. (2011). The corresponding multivariate autoregressive model MAR( ) of
19
processes is defined by
ACCEPTED MANUSCRIPT
( )
,
)
(33)
( )
499
where ( )
500 501 502 503 504 505 506
( ) , ( ) ( )- is an -vector of independent standard Gaussian variables. For a general algorithm to fit MAR( ) models of large order see Neumaier and Schneider (2001), Schneider and Neumaier (2001), and Schlögl (2006). Clearly, large order MAR models are complicated and often unnecessary, e.g., when Markovian ACS and CCS are observed a MAR(1) is sufficient. The parameters of ( ) the MAR(1) model ( ) ( ) can be estimated by using the relations ( ) ( ) ( ) and ( ) (thus, can be found using, e.g., Choleski decomposition), ( ) ( ) where and are the correlation matrices of the Gaussian processes (see also section 0 for
507 508 509 510
an application). Clearly, analytical or exact estimation of demands the matrix to be positive definite, a condition that is not always fulfilled. In this case the matrix can be approximated using near-matrix methods (Higham, 1988, see e.g., 2002).Finally, note that cyclostationarity can be introduced by using different MAR processes for each season.
511 512
A stochastic modelling recipe step-by-step
513 514 515 516 517
1. 2.
Assess suitable marginal distributions and correlation structures (auto- and cross-correlations for the multivariate case) from the empirical time series. Estimate the ( ) points for the proposed values (see section 0) using the ACTI in Eq. (9) and fit the parametric ACTF of Eq. (27) (or Eq. (28) for binary marginals) to the points; for the multivariate case estimate ( ) points using the CCTI in Eq. (11) and fit the CCTF in Eq. (29).
518
3.
Insert the target ACS
524 525 526 527 528
are
parameter matrices, and
AN US
CR IP T
and
M
ED
( ) into the fitted ACTF (
5.
.
( )
) to estimate the pGACS
/ to find the pGCCS
( ); for the
( ).
Use the estimated pGACS and the pGCCS to fit an AR( ) or MAR( ) model and generate Gaussian univariate or multivariate time series. Transform the Gaussian time series using the corresponding transformation
CE
4.
PT
multivariate case use the fitted CCTF
( )
( )
.
( ( )/ based on the fitted distributions from step 1.
AC
523
( )- ,
The proposed stochastic modelling framework is summarized in the following steps:
519 520 521 522
( )
(
∑
Real world examples Storm events of fine temporal resolution The method is applied to simulate rainfall events at fine temporal scale (10 seconds resolution), a process displaying strong ACS, highly skewed and heavy tailed continuous marginal distribution. The original data are seven storm events recorded at the Hydrometeorology Laboratory at the Iowa 20
ACCEPTED MANUSCRIPT
CR IP T
University (Georgakakos et al., 1994); note this dataset has been used also in other studies (see e.g., Papalexiou et al. (2011) for a statistical and stochastic analysis, Cârsteanu and Foufoula-Georgiou (1996) for a multifractal analysis, and Kumar and Foufoula-Georgiou (1997) for a wavelet analysis. Assuming that these events are the outcome of a single process (Papalexiou et al., 2011), and thus, the characteristics of the process are better determined if all events are merged into one set despite their large statistical differences (see Fig. 4a). Nonetheless, the aim here is not a detailed analysis of the data, but rather to illustrate the applicability of the method in a highly non-Gaussian process with strong ACS. In addition, other issues like the bias in the estimation of the ACS are not considered (for these details see Papalexiou et al., 2011).
PT
ED
M
AN US
529 530 531 532 533 534 535 536 537
538
545 546 547 548 549 550
CE
Fig. 4. A graphical step by step demonstration of the method applied to 10 s rainfall events: (a) recorded events; (b) empirical distribution of 10 s rainfall and the fitted BrXII distribution; (c) observed ACS and the fitted parametric Weibull ACS ( ); (d) the fitted autocorrelation transformation function (ACTF); (e) the parent-Gaussian autocorrelation structure ( ); (f) generated Gaussian events with ACS ( ); (g) synthetic events by transforming the Gaussian time series; (h) empirical distribution of the synthetic rainfall compared with the target BrXII fitted to the original data; (i) empirical autocorrelations of Gaussian and synthetic time series compared with ( ) and ( ).
AC
539 540 541 542 543 544
The method is graphically demonstrated in Fig. 4. First, a parametric distribution is fitted to the empirical distribution (Fig. 4b); here the is used. The estimated parameters, using the method of L-moments (Hosking, 1990), are , , and (see Papalexiou and Koutsoyiannis (2016) for details about fitting the using L-moments). Second, the observed ACS ̂ ( ) is ( ) given in Eq. (20), with estimated approximated by numerically fitting the parametric Weibull ACS ( ). Third, the parameters and (Fig. 4c), i.e. it is assumed that ( ) 21
ACCEPTED MANUSCRIPT
) in Eq. (9) is applied using the fitted ΑCTI ( distribution to estimate a few ( ) points (grey dots in Fig. 4d) which are “interpolated” by the ACTF in Eq. (27) with parameters and ( ) to obtain the pGACS ( ) . Fourth, the fitted ACTF is applied to the estimated (Fig. 4e). Fifth, a large order AR model is used, i.e., AR(1000), to reproduce the fitted pGACS ( ) and generate Gaussian time series (Fig. 4f). Sixth, synthetic time series emerge by simply transforming the Gaussian time series using Eq. (2) and the fitted (Fig. 4e). For verification the empirical distribution of the synthetic time series is compared with the target that was fitted to the original data (Fig. 4h), while the empirical ACS of the Gaussian and synthetic time series are compared with the target ( ) and ( ) (Fig. 4i).
560 561 562 563 564 565 566 567 568 569
Hourly precipitation
570 571 572 573 574 575 576 577 578 579 580 581 582 583 584
The framework presented here is “scale-free” as it can be applied at any time scale, including hourly. We demonstrate the performance of the method by using an hourly precipitation record from the Unites States (randomly selected from the NCDC database (Hammer and Steurer, 1997); station code: 063456) covering the period from 01/08/1954 to 30/11/2011 (Fig. 5a). Assuming hourly precipitation as a cyclostationary process, i.e., stationary at each month, we apply the steps described in Section 0 twelve times. In brief, high probability dry values are observed within the twelve months ranging from 0.90 to 0.95 (Fig. 5c,d and Fig. A.1), while the nonzero hourly rainfall is adequately described by fitted distributions (Eq. (12)); the estimated parameters for all months are given in Fig. A.1. The Pareto II ACS (Eq. (21)) interpolates accurately the linear empirical ACS (see correlograms and estimated parameters in Fig. 5e,f and Fig. A.2) and the pGACS for each month is estimated using the expressions for the mixed-type case (Eqs (17)-(19)). High probability dry values and high skewness of the target marginals inflate significantly the pGACS which approximates 0 in all months for , thus, an ( ) was used to generate the Gaussian time series. The synthetic rainfall was generated for the same period as the observed one (Fig. 5b) and emerged by transforming the Gaussian time series using the corresponding marginal-back transformations of each month.
585 586 587
CR IP T
551 552 553 554 555 556 557 558 559
AC
CE
PT
ED
M
AN US
Stochastic modelling of hourly precipitation has been the subject of active research for more than three decades having important applications, e.g., in rainfall-run off modeling. Previous methods have mainly focus on point process models like the Barlett-Lewis and the Neyman-Scott (see e.g., Istok and Boersma, 1989; Onof et al., 2000; Onof and Wheater, 1993; Rodriguez-Iturbe et al., 1987; Verhoest et al., 1997; Wheater et al., 2005). These models do not reproduce the actual marginal distribution but rather attempt to approximate it by preserving a number of statistical moments (usually up to the third aiming to reproduce skewness). At their core, Exponential and Gamma distributions are used that may underestimate return levels which is a well-known weakness that has motivated alternative approaches (see e.g., Onof et al., 2005).
The empirical probability distributions (Fig. 5c,d and Fig. A.1) as well as the empirical ACS’s (Fig. 5e,f and Fig. A.2)) of the synthetic time series show the precise reproduction of their target counterparts. Of course any minor deviations shown are due to random sample variations and not due 22
ACCEPTED MANUSCRIPT
to any systematic bias of the method. We also highlight the “one-state” framework of this method, as common precipitation generators, e.g., point process models, use two states, one to simulate the wet/dry days’ sequence and a second one to calculate the intensity of wet days (see also Wilks, 1998). Here, the intermittency is introduced “naturally” by applying the mixed-type marginal-back transformation. Note that this framework, compared to the afore-mentioned popular models, has the advantage of reproducing the actual marginal and ACS, while it can be equally or even more parsimonious, depending, of course, on the selected marginal distribution and parametric ACS. For example, the total number of parameters we estimate to model hourly precipitation for each month is just five, i.e., three for the marginal and two for the ACS; all the rest are deterministically estimated, and thus, are not essentially parameters.
CR IP T
588 589 590 591 592 593 594 595 596 597
599 600 601 602
AC
CE
PT
ED
M
AN US
598
Fig. 5. Simulation of cyclostationary hourly precipitation: (a) observed hourly precipitation (1954-2011) at a randomly selected station from the NCDC database (code: 063456); (b) synthetic time series; (c-d) probability plots for two characteristic months showing the fitted distributions (estimated parameters are given in the legend) to the observed
23
ACCEPTED MANUSCRIPT
603 604
nonzero data and the empirical distributions of the synthetic data; (e-f) correlograms showing the fitted PII ACS (estimated parameters are given in the legend) to the empirical values and the empirical ACS of the synthetic time series.
605 606 607 608 609
Daily precipitation, river discharge and wind
610 611 612 613 614 615 616 617 618 619 620 621
distribution (Eq. (12)) (Papalexiou and Koutsoyiannis, 2016), and a Weibull ACS (Eq.(20)) was fitted to the empirical ACS, i.e., ( ) ( ) (Fig. 6e). The ACTI in Eq. (9) was used to estimate ( ) points (Fig. 6b) using the corresponding expressions for the mixed-type case, i.e., the quantile function, mean and variance, are given now by Eqs (17), (18), and (19), respectively. The correlation transformation function ( ) fitted perfectly to the ( ) points (Fig. 6b) and was used to estimate the pGACS, i.e., ( ) ( ( ) ) (Fig. 6e). The ( ) for , thus, an AR(20) was fitted to ( ) for and used to generate a Gaussian time series of 1000 months ( values). The Gaussian time series was transformed to synthetic rainfall (Fig. 6c shows sample size equal with the original time series) by applying the transformation in Eq. (2) with being the mixed-type quantile in Eq. (17) with the estimated and the fitted quantile. For verification, the empirical distribution and the empirical ACS of the synthetic sample is compared with the target ( ) distribution (Fig. 6d) and the target ACS ( ) ( )
622
(Fig. 6e).
M
AN US
CR IP T
Daily precipitation is another important case of an intermittent process that plays a crucial role in hydrology. Here, we use October’s daily rainfall recorded at the National Observatory of Athens in Greece from 24/04/1927 to 20/02/1990 (Fig. 6a), assuming stationarity within the month. The probability dry was found ; the nonzero rainfall was described by an (
AC
CE
PT
ED
623
24
)
625 626 627 628 629 630 631 632 633
Fig. 6. Real-world simulations: (a) daily precipitation of October recorded at the National observatory of Athens, Greece; (b) the corresponding autocorrelation transformation function; (c) simulated time series; (d) probability plot verifying that the marginal of the synthetic time series is the same as the fitted to the real nonzero data; (e) the fitted daily precipitation ACS ( ); the estimated pGACS ( ), and the empirical values from the generated Gaussian and synthetic time series; (f-j) similar to (a-e) for daily river discharge of April in the Middle Fork Snoqualmie river in USA; (k-o) similar to (a-e) for daily wind speed of January recorded at Maastricht, Netherlands.
AC
624
CE
PT
ED
M
AN US
CR IP T
ACCEPTED MANUSCRIPT
Another important variable is daily river discharge; here we use daily values of April observed in the Middle Fork Snoqualmie river at the USA from 01/02/1961 to 06/11/2016 (Fig. 6f), assuming again a stationary process for same-month days. Daily discharge was found to be well described by the heavy 25
ACCEPTED MANUSCRIPT
( ) distribution (Eq.(14)), while the empirical ACS fitted well to a relatively tailed ( ) (Fig. 6j). The estimated correlation slowly decaying Weibull ACS (Eq.(20)), i.e., ( ) ) (Fig. 6g) was applied to evaluate the pGACS ( ) (Fig. 6j), transformation function ( which gets ( ) for ; thus, a Gaussian time series of 1000 months ( values) was generated from a fitted AR(25) and was transformed to synthetic daily discharge (Fig. 6h) by applying the transformation in Eq. (2) with being the quantile of the fitted . The synthetic time series ( ) ( ) (Fig. have the expected marginal (Fig. 6i) and the expected ACS 6j).
642 643 644
As a final example the method is applied to daily wind speed of January recorded at Maastricht in the Netherlands from 01/01/1957 to 31/12/2016 (Fig. 6k). Daily wind speed is described here by the ( ) distribution (Eq.(12)), as commonly used models like the two-parameter Weibull
645 646 647 648 649 650 651 652
distribution or the two-parameter Gamma distribution, did not perform well; also zero values were not recorded and thus the marginal is assumed continuous. The PII ACS in Eq. (21) was fitted to the empirical ( ). The estimated correlation transformation function ( ) ACS, i.e., ( ) ( ) ( ) (Fig. 6l) shows no concavity, and thus, the pGACS coincides with (lines overlap in Fig. 6o). ( ) The ( ) for , and thus, an AR(25) was used to generate a Gaussian time series ( values) which was then converted to synthetic daily wind speed (Fig. 6m) by applying the transformation in Eq. (2) with being the quantile of the fitted . The synthetic time series have the expected marginal (Fig. 6n) and the expected ACS (Fig. 6o).
653 654 655 656 657 658 659 660 661 662 663
Relative humidity, discrete and binary cases
M
AN US
CR IP T
634 635 636 637 638 639 640 641
AC
CE
PT
ED
Many variables in nature can be expressed in the , - range, e.g., relative humidity, probability dry, percentage variables, etc. Here, the method is applied on November daily values of relative humidity recorded at Maastricht in the Netherlands from 01/01/1957 to 31/12/2016 (Fig. 7a). A two-parameter ) and a PII ACS (Eq. (21)) was fitted to the empirical Beta ( ) distribution was used, i.e., ( ( ). The estimated ACTF ( ) (Fig. 7b) is a straight line and ACS, i.e., ( ) ( ) the pGACS ( ) coincides with ( ) (lines overlap in Fig. 7e). The ( ) for , and thus, an AR(30) was used to generate a Gaussian time series ( values) which was transformed to synthetic daily relative humidity (Fig. 7c) by applying the transformation in Eq. (2) with being the quantile of the fitted Beta. The synthetic time series have the expected marginal (Fig. 7d) and the expected ACS (Fig. 7e).
26
665 666 667 668 669 670 671 672 673
Fig. 7. Simulation of relative humidity, number of extremes per year and occurrences of high/low rainfall years: (a) relative humidity of November recorded at Maastricht, Netherlands; (b) the ACTF; (c) simulated time series; (d) probability plot verifying that the marginal of the synthetic time series is the target Beta distribution; (e) the ACS ( ), the pGACS ( ), and the empirical values form the generated Gaussian and synthetic time series; (f-j) similar to (a-e) for number of extremes per year in the station ASN00008067 from GHCN-daily database; the discrete Polya-Aeppli distribution is used; (k-o) similar to (a-e) for occurrences of low/high rainfall years at the station ASN00008067.
AC
664
CE
PT
ED
M
AN US
CR IP T
ACCEPTED MANUSCRIPT
As an example of a process with discrete marginal distribution we use the number of extreme daily precipitation events per year, i.e., in an -year record we identify the largest values and their dates and count the number in each year. Here, we used a long daily rainfall record, randomly selected 27
ACCEPTED MANUSCRIPT
from the GHCN-Daily database (ID: ASN00008067), to form the number of extremes per year time series (Fig. 7f). A two-parameter discrete distribution was used, i.e., the Polya-Aeppli ( ) distribution ( ), to describe the probability to observe a given number of extremes in any year. The empirical autocorrelation is essentially 0, and thus, for demonstration purposes, a PII ACS (Eq. (21)) was ( ). We fit the Kumaraswamy ACTF in Eq. (28) , i.e., ( ) prescribed, i.e., ( ) (Fig. 7g) while the pGACS ( ) for (Fig. 7j), and thus, an AR(30) was used to generate a Gaussian time series (3000 years) which was transformed to synthetic number of extremes per year (Fig. 7h) by applying the transformation in Eq. (2) with being the quantile of the fitted Polya-Aeppli distribution. The synthetic time series have the expected marginal (Fig. 7i) and the expected ACS (Fig. 7j).
684 685 686 687 688 689 690 691 692 693
In many cases we may desire to express a variable in two states, e.g., rain and no-rain. We use the same daily rainfall record (ASN00008067), aggregated to the annual timescale, to form a binary time series by assigning 0 to annual precipitation values that belong to the first quartile (low values indicating ) drought years) and 1 to the rest (Fig. 7k). The Bernoulli ( ) distribution was used with ( to describe the probability to observe a drought year, and a Weibull ACS was prescribed, i.e., ( ) ( ) to demonstrate the method as the empirical correlation is almost zero. The ) (Fig. 7l) shows moderate to strong concavity with a estimated Kumaraswamy ACTF ( perfect fit to the points, while the pGACS ( ) for (Fig. 7o). Gaussian time series (3000 years) were generated from an AR(30) and transformed to binary time series accordingly (Fig. 7m). The synthetic binary time series have the expected marginal (Fig. 7n) and the expected ACS (Fig. 7o).
694 695 696 697 698 699 700 701
A Multivariate simulation of precipitation, wind, and relative humidity
M
AN US
CR IP T
674 675 676 677 678 679 680 681 682 683
CE
PT
ED
As a final example we demonstrate the method in a multivariate simulation of three different processes at daily scale, i.e., of precipitation * ( )}, wind speed * ( )+ and relative humidity * ( )+. For precipitation, we assume probability dry and for positive values a power-type ( ) distribution; for wind speed probability calmness and for positive values the two-parameter Weibull ( ) distribution; for relative humidity a Kumaraswamy ( ) marginal distribution (Fig. 8a-c). Thus, we have two mixed-type marginals, with different and different continuous part, and a continuous marginal bounded in ( ). Also we assume lag-0 and lag-1 correlation matrices
702 703 704
(
AC
( )
with the ( -
( )
)
(
) element expressing the cross-correlation coefficients
and - process. Of course the diagonal in ( ) is always symmetric while ( ) is not.
)
(34)
( ) for lags 0 or 1 between the
( ) refers to the lag-1 autocorrelation. Note that
28
ED
M
AN US
CR IP T
ACCEPTED MANUSCRIPT
705
Fig. 8. Hypothetical multivariate case of daily precipitation with , wind speed with , and relative humidity: (a-c) the the target marginal pdf’s of the three variables; (d-f) the corresponding ACTF’s; (g-i) the CCTF’s; and (j-l) synthetic times series that preserve the , the marginal distributions, and the lag-0 and lag-1 cross-correlations among them.
PT
706 707 708
The ACTF’s (Fig. 8d-f) show moderate concavity for the precipitation marginal, slight for wind, and essentially zero for relative humidity. The CCTF is concave for rainfall-wind (Fig. 8g) with estimated upper limit 0.80, while it is clearly convex for rainfall-humidity with limit 0.42 (Fig. 8h), and slightly convex for the wind-humidity case with limit 0.76 (Fig. 8i). These functions were used to estimate the pGp correlation matrices, i.e.,
AC
710 711 712 713 714
CE
709
( )
(
( )
)
29
(
)
(35)
ACCEPTED MANUSCRIPT
715 716
which accordingly were used to fit a multivariate AR(1) and generate a multivariate time series of 10000 values. Accordingly, as in the previous cases, each Gaussian time series was transformed into its target
717
process time series using the corresponding transformations
718 719 720 721
time series (Fig. 8j-l) preserve all desired properties, i.e., , marginal distributions, and the lag-0 and lag-1 correlation matrices, while the actual “topology” of the correlation structure can be seen in the scatter plots of Fig. A.3. Of course, it is straightforward to apply this framework for the multivariate cyclostationary case, and use higher order MAR models if desired.
722
Summary and Conclusions
723 724 725 726 727 728 729 730
Natural processes have a “dynamic” randomness unveiled by structural dependencies in time and/or space and by probability laws governing the frequency of their values. There is no unique correlation structure or probability law to describe the pluralism and the polyphony observed in natural processes. Therefore, and in order to avoid ad hoc solutions, we need a general stochastic framework capable of reproducing this “dynamic” randomness irrespective of the correlation structure or the marginal probability law. The framework presented here, unifies, introduces new elements, extends and simplifies previous attempts proposing a single approach for stochastic modelling of any hydroclimatic process and beyond.
731 732 733 734 735 736 737
The basic assumption is that a stationary or cyclostationary process * ( )+, with any prescribed marginal distribution ( ) and any linear correlation structure ( ) has a parent-Gaussian process that can be easily assessed. For the multivariate case, this implies that a set of processes have a corresponding set of parent-Gaussian processes. In this direction, a unified framework is introduced based on simple analytical auto- and cross-correlation transformation functions that enable fast and easy estimation of the parent-Gaussian processes avoiding iterative and case specific methods proposed in the past.
738 739 740 741 742 743 744 745 746 747 748
The method is applied in a large variety of hydroclimatic processes, ranging from intermittent variables like precipitation and wind speed, to processes with heavy-tail marginals like river discharge, and to processes with bounded marginals like relative humidity, or even to discrete and binary cases. The versatility of the method is also demonstrated in a multivariate case where processes with very different marginals and linear correlation structures are reproduced exactly. As a final remark, the framework proposed enables an easy estimation of the limit or the maximum feasible cross-correlation between two processes and reveals that when their marginals differ significantly in shape, then their linear cross-correlation cannot be intense. The same remark holds also for single processes, and although there is no limit and the autocorrelation can reach up to 1, processes with very different marginals from the Gaussian, e.g., highly skewed and heavy-tailed, will be observed with weak linear autocorrelation structures, indicating, maybe, that nature likes to “hide” its internal structure.
.
( ( )/. The synthetic
AC
CE
PT
ED
M
AN US
CR IP T
( )
30
ACCEPTED MANUSCRIPT
Acknowledgements
750 751 752
I honestly thank: Y Markonis for providing the river discharge, wind and humidity data as well as for his general support; KM Andreadis for spotting many typos; and the reviewers for meticulously reading the manuscript and providing constructive remarks which led in a significantly improved version.
753
References
754 755
Aghakouchak, A., Habib, E., Bárdossy, A., 2010. A comparison of three remotely sensed rainfall ensemble generators. Atmospheric Research 98, 387–399.
756 757
Bárdossy, A., Pegram, G.G.S., 2009. Copula based multisite model for daily precipitation simulation. Hydrology and Earth System Sciences 13, 2299.
758 759
Bardossy, A., Plate, E.J., 1992. Space-time model for daily rainfall using atmospheric circulation patterns. Water Resour. Res. 28, 1247–1259. https://doi.org/10.1029/91WR02589
760 761
Bartolini, P., Salas, J.D., Obeysekera, J.T.B., 1988. Multivariate periodic ARMA (1, 1) processes. Water Resources Research 24, 1237–1246.
762
Beran, J., 1994. Statistics for Long-Memory Processes. CRC Press.
763 764
Bowers, M.C., Tung, W.W., Gao, J.B., 2012. On the distributions of seasonal river flows: Lognormal or power law? Water Resour. Res. 48, W05536. https://doi.org/10.1029/2011WR011308
765 766
Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 2008. Time Series Analysis: Forecasting and Control, 4 edition. ed. Wiley, Hoboken, N.J.
767 768 769
Breinl, K., Turkington, T., Stowasser, M., 2015. Simulating daily precipitation and temperature: a weather generation framework for assessing hydrometeorological hazards. Meteorological Applications 22, 334–347.
770 771 772
Bruhn, J.A., Fry, W.E., Fick, G.W., 1979. Simulation of Daily Weather Data Using Theoretical Probability Distributions. J. Appl. Meteor. 19, 1029–1036. https://doi.org/10.1175/15200450(1980)019<1029:SODWDU>2.0.CO;2
773 774
Buell, C.E., 1972. Correlation Functions for Wind and Geopotential on Isobaric Surfaces. J. Appl. Meteor. 11, 51–59. https://doi.org/10.1175/1520-0450(1972)011<0051:CFFWAG>2.0.CO;2
777 778 779
AN US
M
ED
PT
CE
AC
775 776
CR IP T
749
Buishand, T.A., 1978. Some remarks on the use of daily rainfall models. Journal of Hydrology 36, 295– 308. https://doi.org/10.1016/0022-1694(78)90150-6 Burr, I.W., 1942. Cumulative Frequency Functions. The Annals of Mathematical Statistics 13, 215–232. Cario, M.C., Nelson, B.L., 1998. Numerical methods for fitting and simulating autoregressive-to-anything processes. INFORMS Journal on Computing 10, 72–81.
31
ACCEPTED MANUSCRIPT
Cario, M.C., Nelson, B.L., 1997. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.
783 784 785
Cârsteanu, A., Foufoula-Georgiou, E., 1996. Assessing dependence among weights in a multiplicative cascade model of temporal rainfall. J. Geophys. Res. 101, 26363–26370. https://doi.org/10.1029/96JD01657
786 787 788
Carta, J.A., Ramírez, P., Velázquez, S., 2009. A review of wind speed probability distributions used in wind energy analysis. Renewable and Sustainable Energy Reviews 13, 933–955. https://doi.org/10.1016/j.rser.2008.05.005
789 790
Chang, K.-F., 1996. Strictly Positive Definite Functions. Journal of Approximation Theory 87, 148–158. https://doi.org/10.1006/jath.1996.0097
791 792
Chung, C., Salas, J.D., 2000. Drought occurrence probabilities and risks of dependent hydrologic processes. Journal of Hydrologic Engineering 5, 259–268.
793 794 795
Ciach, G.J., Krajewski, W.F., 2006. Analysis and modeling of spatial correlation structure in small-scale rainfall in Central Oklahoma. Advances in Water Resources 29, 1450–1463. https://doi.org/10.1016/j.advwatres.2005.11.003
796 797 798
Cowpertwait, P., Isham, V., Onof, C., 2007. Point process models of rainfall: developments for fine-scale structure. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 463, 2569–2587. https://doi.org/10.1098/rspa.2007.1889
799 800
Davison, A.C., Padoan, S.A., Ribatet, M., 2012. Statistical Modeling of Spatial Extremes. Statistical Science 27, 161–186.
801 802
Demirtas, H., 2017. Concurrent generation of binary and nonnormal continuous data through fifth-order power polynomials. Communications in Statistics-Simulation and Computation 46, 344–357.
803 804
Demirtas, H., 2014. Joint generation of binary and nonnormal continuous data. Journal of Biometrics & Biostatistics 1.
805 806
Demirtas, H., Hedeker, D., 2011. A practical way for computing approximate lower and upper correlation bounds. The American Statistician 65, 104–109.
807 808 809
Dolloff, J., Lofy, B., Sussman, A., Taylor, C., 2006. Strictly positive definite correlation functions. Presented at the Signal Processing, Sensor Fusion, and Target Recognition XV, International Society for Optics and Photonics, p. 62351A. https://doi.org/10.1117/12.663967
812 813 814 815
AN US
M
ED
PT
CE
AC
810 811
CR IP T
780 781 782
Embrechts, P., McNeil, A., Straumann, D., 2002. Correlation and dependence in risk management: properties and pitfalls. Risk management: value at risk and beyond 176223. Emrich, L.J., Piedmonte, M.R., 1991. A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304. Feller, W., 1971. Introduction to the Theory of Probability and its Applications, Vol. 2. II (2. Ed.) New York: Wiley. 32
ACCEPTED MANUSCRIPT
Ferrari, P.A., Barbiero, A., 2012. Simulating Ordinal Data. Multivariate Behavioral Research 47, 566–589. https://doi.org/10.1080/00273171.2012.692630
818 819
Foufoula-Georgiou, E., Krajewski, W., 1995. Recent advances in rainfall modeling, estimation, and forecasting. Rev. Geophys. 33, 1125–1137. https://doi.org/10.1029/95RG00338
820 821
Foufoula-Georgiou, E., Lettenmaier, D.P., 1987. A Markov Renewal Model for rainfall occurrences. Water Resources Research 23, 875–884. https://doi.org/10.1029/WR023i005p00875
822 823
Fréchet, M., 1957. Les tableaux de corrélation et les programmes linéaires. Revue de l’Institut international de statistique 23–40.
824 825 826
Geng, S., Penning de Vries, F.W.T., Supit, I., 1986. A simple method for generating daily rainfall data. Agricultural and Forest Meteorology 36, 363–376. https://doi.org/10.1016/01681923(86)90014-6
827 828
Georgakakos, K.P., Carsteanu, A.A., Sturdevant, P.L., Cramer, J.A., 1994. Observation and analysis of Midwestern rain rates. Journal of applied meteorology 33, 1433–1444.
829 830
Gneiting, T., 1999. Correlation functions for atmospheric data analysis. Q.J.R. Meteorol. Soc. 125, 2449– 2464. https://doi.org/10.1002/qj.49712555906
831 832
Gneiting, T., Schlather, M., 2004. Stochastic Models That Separate Fractal Dimension and the Hurst Effect. SIAM Rev. 46, 269–282. https://doi.org/10.1137/S0036144501394387
833 834
Granger, C.W., Joyeux, R., 1980. An introduction to long-memory time series models and fractional differencing. Journal of time series analysis 1, 15–29.
835 836
Grimaldi, S., Serinaldi, F., 2006. Asymmetric copula in multivariate flood frequency analysis. Advances in Water Resources 29, 1155–1167.
837 838
Hammer, G.R., Steurer, P.M., 1997. Data set documentation for hourly precipitation data. NOAA/NCDC TD3240 Documentation Series, Asheville, NC 18.
839 840
Hering, A.S., Kazor, K., Kleiber, W., 2015. A Markov-switching vector autoregressive stochastic wind generator for multiple spatial and temporal scales. Resources 4, 70–92.
841 842
Herr, H.D., Krzysztofowicz, R., 2005. Generic probability distribution of rainfall in space: The bivariate model. Journal of Hydrology 306, 234–263.
843 844
Higham, N.J., 2002. Computing the nearest correlation matrix—a problem from finance. IMA J Numer Anal 22, 329–343. https://doi.org/10.1093/imanum/22.3.329
AN US
M
ED
PT
CE
AC
845 846
CR IP T
816 817
Higham, N.J., 1988. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications 103, 103–118. https://doi.org/10.1016/0024-3795(88)90223-6
847 848
Hoeffding, W., 1994. Scale—invariant correlation theory, in: The Collected Works of Wassily Hoeffding. Springer, pp. 57–107.
849 850
Hosking, J.R., 1984. Modeling persistence in hydrological time series using fractional differencing. Water resources research 20, 1898–1908. 33
ACCEPTED MANUSCRIPT
Hosking, J.R., 1981. Fractional differencing. Biometrika 68, 165–176.
852 853
Hosking, J.R.M., 1994. The four-parameter kappa distribution. IBM Journal of Research and Development 38, 251–258.
854 855
Hosking, J.R.M., 1990. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. Journal of the Royal Statistical Society. Series B (Methodological) 52, 105–124.
856 857
Hurst, H.E., 1951. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770–808.
858 859
Huser, R., Davison, A.C., Genton, M.G., 2016. Likelihood estimators for multivariate extremes. Extremes 19, 79–103. https://doi.org/10.1007/s10687-015-0230-4
860 861
Huser, R., Genton, M.G., 2016. Non-Stationary Dependence Structures for Spatial Extremes. JABES 21, 470–491. https://doi.org/10.1007/s13253-016-0247-4
862 863
Iliopoulou, T., Papalexiou, S.M., Markonis, Y., Koutsoyiannis, D., 2016. Revisiting long-range dependence in annual precipitation. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2016.04.015
864 865
Istok, J.D., Boersma, L., 1989. A stochastic cluster model for hourly precipitation data. Journal of Hydrology 106, 257–285. https://doi.org/10.1016/0022-1694(89)90076-0
866 867 868
Justus, C.G., Hargraves, W.R., Mikhail, A., Graber, D., 1978. Methods for Estimating Wind Speed Frequency Distributions. J. Appl. Meteor. 17, 350–353. https://doi.org/10.1175/15200450(1978)017<0350:MFEWSF>2.0.CO;2
869
Kendall, M., Stuart, A., 1979. Handbook of Statistics. Griffin & Company, London.
870 871 872
Kilsby, C.G., Jones, P.D., Burton, A., Ford, A.C., Fowler, H.J., Harpham, C., James, P., Smith, A., Wilby, R.L., 2007. A daily weather generator for use in climate change studies. Environmental Modelling & Software 22, 1705–1719.
873 874 875
Kleiber, W., Katz, R.W., Rajagopalan, B., 2012. Daily spatiotemporal precipitation simulation using latent and transformed Gaussian processes. Water Resour. Res. 48, W01523. https://doi.org/10.1029/2011WR011105
876 877 878
Klein Tank, A.M.G., Können, G.P., 2003. Trends in Indices of Daily Temperature and Precipitation Extremes in Europe, 1946–99. J. Climate 16, 3665–3680. https://doi.org/10.1175/15200442(2003)016<3665:TIIODT>2.0.CO;2
879 880
Kroll, C.N., Vogel, R.M., 2002. Probability distribution of low streamflow series in the United States. Journal of Hydrologic Engineering 7, 137–146.
883 884
AN US
M
ED
PT
CE
AC
881 882
CR IP T
851
Kugiumtzis, D., 2002. Statically transformed autoregressive process and surrogate data test for nonlinearity. Physical Review E 66, 025201. Kumar, P., Foufoula-Georgiou, E., 1997. Wavelet analysis for geophysical applications. Rev. Geophys. 35, 385–412. https://doi.org/10.1029/97RG00427
34
ACCEPTED MANUSCRIPT
Kumaraswamy, P., 1980. A generalized probability density function for double-bounded random processes. Journal of Hydrology 46, 79–88. https://doi.org/10.1016/0022-1694(80)90036-0
887 888 889
Lebrun, R., Dutfoy, A., 2009. An innovating analysis of the Nataf transformation from the copula viewpoint. Probabilistic Engineering Mechanics 24, 312–320. https://doi.org/10.1016/j.probengmech.2008.08.001
890 891
Lee, T., Salas, J.D., 2011. Copula-based stochastic simulation of hydrological data applied to Nile River flows. Hydrology Research 42, 318–330.
892 893
Li, H., Lü, Z., Yuan, X., 2008. Nataf transformation based point estimate method. Chin. Sci. Bull. 53, 2586. https://doi.org/10.1007/s11434-008-0351-0
894 895 896
Liu, P.-L., Der Kiureghian, A., 1986. Multivariate distribution models with prescribed marginals and covariances. Probabilistic Engineering Mechanics 1, 105–112. https://doi.org/10.1016/02668920(86)90033-0
897 898 899
Lombardo, F., Volpi, E., Koutsoyiannis, D., Papalexiou, S.M., 2013. Just two moments! A cautionary note against use of high-order moments in multifractal models in hydrology. Hydrology and Earth System Sciences Discussions 10, 4627–4654. https://doi.org/10.5194/hessd-10-4627-2013
900 901
Macke, J.H., Berens, P., Ecker, A.S., Tolias, A.S., Bethge, M., 2009. Generating spike trains with specified correlation coefficients. Neural computation 21, 397–423.
902
Mandelbrot, B.B., 1971. A Fast Fractional Gaussian Noise Generator. Water Resour. Res. 7, 543–553.
903 904
Mandelbrot, B.B., Wallis, J.R., 1969. Some long-run properties of geophysical records. Water Resour. Res. 5, 321–340. https://doi.org/10.1029/WR005i002p00321
905 906
Mandelbrot, B.B., Wallis, J.R., 1968. Noah, Joseph, and Operational Hydrology. Water Resour. Res. 4, 909–918. https://doi.org/10.1029/WR004i005p00909
907 908 909
Markonis, Y., Moustakis, Y., Nasika, C., Sychova, P., Dimitriadis, P., Hanel, M., Máca, P., Papalexiou, S.M., 2018. Global estimation of long-term persistence in annual river runoff. Advances in Water Resources 113, 1–12. https://doi.org/10.1016/j.advwatres.2018.01.003
910 911 912
Mehrotra, R., Sharma, A., 2009. Evaluating spatio-temporal representations in daily rainfall sequences from three stochastic multi-site weather generation approaches. Advances in Water Resources 32, 948–962.
913 914
Mielke Jr, P.W., 1973. Another Family of Distributions for Describing and Analyzing Precipitation Data. Journal of Applied Meteorology 12, 275–280.
918 919
AN US
M
ED
PT
CE
AC
915 916 917
CR IP T
885 886
Mielke Jr, P.W., Johnson, E.S., 1974. Some generalized beta distributions of the second kind having desirable application features in hydrology and meteorology. Water Resources Research 10, 223–226. Mielke Jr, P.W., Johnson, E.S., 1973. Three-Parameter Kappa Distribution Maximum Likelihood Estimates and Likelihood Ratio Tests. Monthly Weather Review 101, 701–707.
35
ACCEPTED MANUSCRIPT
Molotch, N.P., Colee, M.T., Bales, R.C., Dozier, J., 2005. Estimating the spatial distribution of snow water equivalent in an alpine basin using binary regression tree models: the impact of digital elevation data and independent variable selection. Hydrological Processes 19, 1459–1479.
923 924 925
Montanari, A., Rosso, R., Taqqu, M.S., 1997. Fractionally differenced ARIMA models applied to hydrologic time series: Identification, estimation, and simulation. Water resources research 33, 1035–1044.
926 927 928
Nataf, A., 1962. Statistique mathematique-determination des distributions de probabilites dont les marges sont donnees. COMPTES RENDUS HEBDOMADAIRES DES SEANCES DE L ACADEMIE DES SCIENCES 255, 42.
929 930
Neumaier, A., Schneider, T., 2001. Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software (TOMS) 27, 27–57.
931 932 933
Onof, C., Chandler, R.E., Kakou, A., Northrop, P., Wheater, H.S., Isham, V., 2000. Rainfall modelling using Poisson-cluster processes: a review of developments. Stochastic Environmental Research and Risk Assessment 14, 384–411. https://doi.org/10.1007/s004770000043
934 935 936
Onof, C., Townend, J., Kee, R., 2005. Comparison of two hourly to 5-min rainfall disaggregators. Atmospheric Research, Precipitation in Urban Areas6th International Workshop on Precipitation in Urban Areas 77, 176–187. https://doi.org/10.1016/j.atmosres.2004.10.022
937 938 939 940
Onof, C., Wheater, H.S., 1993. Modelling of British rainfall using a random parameter Bartlett-Lewis Rectangular Pulse Model. Journal of Hydrology, Conventry Groundwater Investigation: Sources and Movement of Chlorinated Hydrocarbon Solvents 149, 67–95. https://doi.org/10.1016/00221694(93)90100-N
941 942
Over, T.M., Gupta, V.K., 1996. A space-time theory of mesoscale rainfall using random cascades. J. Geophys. Res. 101, 26319–26331. https://doi.org/10.1029/96JD02033
943 944 945
Papalexiou, S.M., Dialynas, Y.G., Grimaldi, S., 2016. Hershfield factor revisited: Correcting annual maximum precipitation. Journal of Hydrology 542, 884–895. https://doi.org/10.1016/j.jhydrol.2016.09.058
946 947 948
Papalexiou, S.M., Koutsoyiannis, D., 2016. A global survey on the seasonal variation of the marginal distribution of daily precipitation. Advances in Water Resources 94, 131–145. https://doi.org/10.1016/j.advwatres.2016.05.005
949 950
Papalexiou, S.M., Koutsoyiannis, D., 2013. Battle of extreme value distributions: A global survey on extreme daily rainfall. Water Resour. Res. 49, 187–201. https://doi.org/10.1029/2012WR012557
954 955 956
AN US
M
ED
PT
CE
AC
951 952 953
CR IP T
920 921 922
Papalexiou, S.M., Koutsoyiannis, D., 2012. Entropy based derivation of probability distributions: A case study to daily rainfall. Advances in Water Resources, Space-Time Precipitation from Urban Scale to Global Change 45, 51–57. https://doi.org/10.1016/j.advwatres.2011.11.007 Papalexiou, S.M., Koutsoyiannis, D., Makropoulos, C., 2013. How extreme is extreme? An assessment of daily rainfall distribution tails. Hydrol. Earth Syst. Sci. 17, 851–862. https://doi.org/10.5194/hess-17-851-2013 36
ACCEPTED MANUSCRIPT
Papalexiou, S.M., Koutsoyiannis, D., Montanari, A., 2011. Can a simple stochastic model generate rich patterns of rainfall events? Journal of Hydrology 411, 279–289. https://doi.org/10.1016/j.jhydrol.2011.10.008
960 961
Papoulis, A., Pillai, S.U., 2002. Probability, random variables, and stochastic processes. Tata McGraw-Hill Education.
962 963
Park, J.-S., Seo, S.-C., Kim, T.Y., 2009. A kappa distribution with a hydrological application. Stoch Environ Res Risk Assess 23, 579–586. https://doi.org/10.1007/s00477-008-0243-5
964 965 966
Paschalis, A., Molnar, P., Fatichi, S., Burlando, P., 2014. On temporal stochastic modeling of precipitation, nesting models across scales. Advances in Water Resources 63, 152–166. https://doi.org/10.1016/j.advwatres.2013.11.006
967 968
Ribatet, M., 2013. Spatial extremes: Max-stable processes at work. Journal de la Société Française de Statistique 154, 156–177.
969 970 971
Rodriguez-Iturbe, I., Cox, D.R., Isham, V., 1987. Some Models for Rainfall Based on Stochastic Point Processes. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 410, 269–288. https://doi.org/10.1098/rspa.1987.0039
972
Salas, J.D., 1980. Applied modeling of hydrologic time series. Water Resources Publication.
973 974
Salas, J.D., Boes, D.C., Smith, R.A., 1982. Estimation of ARMA Models with seasonal parameters. Water Resour. Res. 18, 1006–1010. https://doi.org/10.1029/WR018i004p01006
975 976
Salas, J.D., Sveinsson, O.G., Lane, W.L., Frevert, D.K., 2006. Stochastic streamflow simulation using SAMS-2003. Journal of irrigation and drainage engineering 132, 112–122.
977 978
Schlögl, A., 2006. A comparison of multivariate autoregressive estimators. Signal processing 86, 2426– 2429.
979 980 981
Schneider, T., Neumaier, A., 2001. Algorithm 808: ARfit—A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software (TOMS) 27, 58–65.
982 983 984
Seguro, J.V., Lambert, T.W., 2000. Modern estimation of the parameters of the Weibull wind speed distribution for wind energy analysis. Journal of Wind Engineering and Industrial Aerodynamics 85, 75–84. https://doi.org/10.1016/S0167-6105(99)00122-1
985 986
Serinaldi, F., 2009. A multisite daily rainfall generator driven by bivariate copula-based mixed distributions. Journal of Geophysical Research: Atmospheres 114.
AN US
M
ED
PT
CE
AC
987 988
CR IP T
957 958 959
Serinaldi, F., Kilsby, C.G., 2016. A Blueprint for Full Collective Flood Risk Estimation: Demonstration for European River Flooding. Risk Analysis.
989 990
Serinaldi, F., Kilsby, C.G., 2014a. Simulating daily rainfall fields over large areas for collective risk estimation. Journal of Hydrology 512, 285–302.
991 992
Serinaldi, F., Kilsby, C.G., 2014b. Rainfall extremes: Toward reconciliation after the battle of distributions. Water Resour. Res. 50, 336–352. https://doi.org/10.1002/2013WR014211 37
ACCEPTED MANUSCRIPT
Serinaldi, F., Kilsby, C.G., Lombardo, F., 2018. Untenable nonstationarity: An assessment of the fitness for purpose of trend tests in hydrology. Advances in Water Resources 111, 132–155. https://doi.org/10.1016/j.advwatres.2017.10.015
996 997
Serinaldi, F., Lombardo, F., 2017. General simulation algorithm for autocorrelated binary processes. Physical Review E 95, 023312.
998 999
Shao, Q., Lund, R., 2004. Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. Journal of Time Series Analysis 25, 359–372.
CR IP T
993 994 995
Sirangelo, B., Versace, P., De Luca, D.L., 2007. Rainfall nowcasting by at site stochastic model PRAISE. Hydrology and Earth System Sciences Discussions 4, 151–177.
1002 1003
Smith, R.L., Schreiber, H.A., 1974. Point Processes of Seasonal Thunderstorm Rainfal 2. Rainfall Depth Probabilities. Water Resources Research.
1004 1005
Stacy, E.W., 1962. A Generalization of the Gamma Distribution. The Annals of Mathematical Statistics 33, 1187–1192.
1006 1007
Stacy, E.W., Mihram, G.A., 1965. Parameter Estimation for a Generalized Gamma Distribution. Technometrics 7, 349–358.
1008 1009 1010
Stephenson, A.G., Lehmann, E.A., Phatak, A., 2016. A max-stable process model for rainfall extremes at different accumulation durations. Weather and Climate Extremes 13, 44–53. https://doi.org/10.1016/j.wace.2016.07.002
1011 1012
Stern, R.D., Coe, R., 1984. A Model Fitting Analysis of Daily Rainfall Data. Journal of the Royal Statistical Society. Series A (General) 147, 1. https://doi.org/10.2307/2981736
1013 1014 1015
Swift, L.W., Schreuder, H.T., 1981. Fitting Daily Precipitation Amounts Using the SB Distribution. Monthly Weather Review 109, 2535–2540. https://doi.org/10.1175/15200493(1981)109<2535:FDPAUT>2.0.CO;2
1016 1017
Tadikamalla, P.R., 1980. A Look at the Burr and Related Distributions. International Statistical Review / Revue Internationale de Statistique 48, 337–344.
1018 1019
Todorovic, P., Woolhiser, D.A., 1975. A stochastic model of n-day precipitation. Journal of Applied Meteorology 14, 17–24.
1020 1021 1022
Tsoukalas, I., Efstratiadis, A., Makropoulos, C., 2018. Stochastic Periodic Autoregressive to Anything (SPARTA): Modeling and Simulation of Cyclostationary Processes With Arbitrary Marginal Distributions. Water Resour. Res. n/a-n/a. https://doi.org/10.1002/2017WR021394
1025 1026 1027
M
ED
PT
CE
AC
1023 1024
AN US
1000 1001
van Montfort, M.A.J., 1990. Sliding maxima. Journal of Hydrology 118, 77–85. https://doi.org/10.1016/0022-1694(90)90251-R Verhoest, N., Troch, P.A., De Troch, F.P., 1997. On the applicability of Bartlett–Lewis rectangular pulses models in the modeling of design storms at a point. Journal of Hydrology 202, 108–120. https://doi.org/10.1016/S0022-1694(97)00060-7
38
ACCEPTED MANUSCRIPT
Villarini, G., Seo, B.-C., Serinaldi, F., Krajewski, W.F., 2014. Spatial and temporal modeling of radar rainfall uncertainties. Atmospheric research 135, 91–101.
1030 1031 1032
Waymire, E., Gupta, V.K., 1981. The mathematical structure of rainfall representations: 1. A review of the stochastic rainfall models. Water Resources Research 17, 1261–1272. https://doi.org/10.1029/WR017i005p01261
1033 1034 1035
Wheater, H.S., Chandler, R.E., Onof, C.J., Isham, V.S., Bellone, E., Yang, C., Lekkas, D., Lourmas, G., Segond, M.-L., 2005. Spatial-temporal rainfall modelling for flood risk estimation. Stoch Environ Res Ris Assess 19, 403–416. https://doi.org/10.1007/s00477-005-0011-8
1036 1037
Wilks, D.S., 1999. Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agricultural and Forest meteorology 96, 85–101.
1038 1039
Wilks, D.S., 1998. Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology 210, 178–191. https://doi.org/10.1016/S0022-1694(98)00186-3
1040 1041
Wilks, D.S., Wilby, R.L., 1999. The weather generation game: a review of stochastic weather models. Progress in physical geography 23, 329–357.
1042 1043
Wilson, P.S., Toumi, R., 2005. A fundamental probability distribution for heavy rainfall. Geophys. Res. Lett. 32, L14812. https://doi.org/10.1029/2005GL022465
1044 1045
Xiao, Q., 2014. Evaluating correlation coefficient for Nataf transformation. Probabilistic Engineering Mechanics 37, 1–6. https://doi.org/10.1016/j.probengmech.2014.03.010
1046 1047
Yaglom, A.M., 2012. Correlation Theory of Stationary and Related Random Functions: Supplementary Notes and References. Springer Science & Business Media.
1048 1049
Yao, A.Y.M., 1974. A Statistical Model for the Surface Relative Humidity. J. Appl. Meteor. 13, 17–21. https://doi.org/10.1175/1520-0450(1974)013<0017:ASMFTS>2.0.CO;2
1050 1051
Youngman, B.D., Stephenson, D.B., 2016. A geostatistical extreme-value framework for fast simulation of natural hazard events, in: Proc. R. Soc. A. The Royal Society, p. 20150855.
AC
CE
PT
ED
M
AN US
CR IP T
1028 1029
39
ACCEPTED MANUSCRIPT
Appendix A
ED
M
AN US
CR IP T
1052
CE
Fig. A.1. Probability plots for all months related to the hourly precipitation example described in Section 0. The fitted distributions (estimated parameters are given in the legend) to the observed nonzero data are shown as well as the empirical distributions of the synthetic data.
AC
1054 1055 1056
PT
1053
40
ED
M
AN US
CR IP T
ACCEPTED MANUSCRIPT
1057
CE
PT
Fig. A.2. Correlograms for all months related to the hourly precipitation example described in Section 0. The PII ACS was fitted to empirical values (estimated parameters are given in the legend) while for comparison the empirical ACS’s of the synthetic time series are also shown.
AC
1058 1059 1060
41
ACCEPTED MANUSCRIPT
1063 1064
Fig. A.3. Scatter plots related to the multivariate simulation example described in Section 0. The plots express the correlations given in the lag-0 and lag-1 correlation matrices.
AC
1062
CE
PT
ED
M
AN US
CR IP T
1061
42