Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency

Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency

Accepted Manuscript Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, an...

2MB Sizes 0 Downloads 18 Views

Accepted Manuscript

Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency Simon Michael Papalexiou PII: DOI: Reference:

S0309-1708(17)30989-2 10.1016/j.advwatres.2018.02.013 ADWR 3099

To appear in:

Advances in Water Resources

Received date: Revised date: Accepted date:

25 October 2017 14 February 2018 14 February 2018

Please cite this article as: Simon Michael Papalexiou , Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency, Advances in Water Resources (2018), doi: 10.1016/j.advwatres.2018.02.013

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

1

Highlight:

2 3 4 5 6

    

Stochastic modelling reproducing any marginal distribution and linear correlation Applicable in univariate, cyclostationary and multivariate cases Precise modelling of precipitation, river discharge, wind, etc. at any time scale Parametric correlation transformation functions unify and simplify the scheme Empirical correlation representation through parsimonious parametric functions

AC

CE

PT

ED

M

AN US

CR IP T

7

1

ACCEPTED MANUSCRIPT

8 9

10

Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency Simon Michael Papalexiou

13

Abstract

14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Hydroclimatic processes come in all “shapes and sizes”. They are characterized by different spatiotemporal correlation structures and probability distributions that can be continuous, mixed-type, discrete or even binary. Simulating such processes by reproducing precisely their marginal distribution and linear correlation structure, including features like intermittency, can greatly improve hydrological analysis and design. Traditionally, modelling schemes are case specific and typically attempt to preserve few statistical moments providing inadequate and potentially risky distribution approximations. Here, a single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent” Gaussian process. A novel mathematical representation of this scheme, introducing parametric correlation transformation functions, enables straightforward estimation of the parent-Gaussian process yielding the target process after the marginal back transformation, while it provides a general description that supersedes previous specific parameterizations, offering a simple, fast and efficient simulation procedure for every stationary process at any spatiotemporal scale. This framework, also applicable for cyclostationary and multivariate modelling, is augmented with flexible parametric correlation structures that parsimoniously describe observed correlations. Real-world simulations of various hydroclimatic processes with different correlation structures and marginals, such as precipitation, river discharge, wind speed, humidity, extreme events per year, etc., as well as a multivariate example, highlight the flexibility, advantages, and complete generality of the method.

AN US

M

ED

PT

CE

Keywords: stochastic modeling, weather generator, parent-Gaussian framework, transformations, hydroclimatic processes, precipitation, river discharge, temperature, wind speed, humidity

AC

32 33

CR IP T

12

Department of Civil and Environmental Engineering, University of California, Irvine, CA, USA ([email protected])

11

2

ACCEPTED MANUSCRIPT

34

Introduction “Make things as simple as possible, but not simpler.” ~Albert Einstein

36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

The stochastic structure and the marginal probability laws of hydroclimatic processes like precipitation, river discharge, temperature, wind, or relevant processes with discrete or binary marginal distributions like the number of extremes per year, wet-dry sequences, etc., show complexities that current methodologies are unable to unify in a single modelling framework. These processes differ in their spatiotemporal correlation structures (CS) with stronger structures being observed for example in temperature than in precipitation (e.g., Wilks and Wilby, 1999). Moreover, they have different marginal distributions, e.g., temperature is normally distributed (Breinl et al., 2015; Kilsby et al., 2007) while the intermittent and highly skewed character of precipitation at fine spatiotemporal scales demands the use of mixed-type distributions (Herr and Krzysztofowicz, 2005; Papalexiou and Koutsoyiannis, 2016; Stern and Coe, 1984; Wilks, 1998). Complexities of these processes are further enhanced by seasonality and diurnal variation, as well as a tendency to normality as we shift from fine to coarser scales due to the central limit theorem. These reasons led to a very large number of stochastic modelling schemes (see e.g., Aghakouchak et al., 2010; Bárdossy and Pegram, 2009; Bardossy and Plate, 1992; Buishand, 1978; Cowpertwait et al., 2007; Foufoula-Georgiou and Krajewski, 1995; Foufoula-Georgiou and Lettenmaier, 1987; Grimaldi and Serinaldi, 2006; Hering et al., 2015; Kleiber et al., 2012; Lee and Salas, 2011; Lombardo et al., 2013; Mehrotra and Sharma, 2009; Onof et al., 2000; Onof and Wheater, 1993; Over and Gupta, 1996; Papalexiou et al., 2011; Paschalis et al., 2014; Rodriguez-Iturbe et al., 1987; Salas et al., 2006; Serinaldi, 2009; Serinaldi and Kilsby, 2014a, 2016; Sirangelo et al., 2007; Villarini et al., 2014; Waymire and Gupta, 1981; Wheater et al., 2005; Wilks, 1999; Youngman and Stephenson, 2016) to name just a few.

56 57 58 59

We posit that a general unified approach is however possible by assuming that an arbitrary (target) process * ( )+ with prescribed marginal distribution ( ) and autocorrelation structure (ACS) ( ) has a parent (or else an equivalent) Gaussian process (pGp) * ( )+ with standard Gaussian ( ) and a specific linear ACS ( ). Therefore, modelling and simulating * ( )+ only marginal

60

requires the definition of two functions

61

While

62 63

does not preserve the linear ACS (expressed by the Pearson correlation coefficient) but only rank correlations as is nonlinear and the ACS depends on the marginal distribution ( ) (Embrechts et al.,

65 66 67 68

AN US

M

ED

PT

CE

is easily identified as ( )

AC

64

CR IP T

35

and (

such that ( )

( )

(

( )).

( )), this transformation of the marginal distribution

2002). In particular, for bivariate normally distributed vectors ( , one can show that

( ( )) and

. ( ) ( )/

(

) and an arbitrary transformation )

(Kendall and Stuart, 1979, p.

600). This means that we need inflated values to obtain the target . Therefore, the modelling problem reduces to the definition of a correlation transformation function (CTF) to estimate the parent-Gaussian autocorrelation (pGACS) ( ) from a given ( ). 3

ACCEPTED MANUSCRIPT

This modeling framework was previously considered in several fields to solve specific problems, thus resulting in a sparse literature dealing with simulation of cross-correlated but serially independent realizations from multivariate distributions with specified binary, discrete, and/or continuous marginal distributions (Demirtas, 2014, 2017; Emrich and Piedmonte, 1991; Macke et al., 2009), or independent time series with specified serial correlation and binary, discrete, and/or continuous marginal distributions (Cario and Nelson, 1997, 1998; Kugiumtzis, 2002; Macke et al., 2009; Serinaldi and Lombardo, 2017), or, focusing on periodic processes (Tsoukalas et al., 2018). Moreover, most of these studies focused on specific classes of marginal distributions and CS’s along with ad hoc solutions for the definition of the CTF .

78 79 80 81 82 83 84 85 86 87 88

In this study, we present a unified and fully general mathematical representation of the parentGaussian modeling framework. Going beyond specific cases, we highlight its generality and its ability to model and simulate processes with both cross- and autocorrelations such as spatio-temporal hydrological processes. We introduce a general and efficient procedure to define the function based on a simple and flexible parametric function, as well as a general-purpose technique to simulate the pGp with suitably inflated prescribed CS. Finally, we illustrate the performance of the generalized parentGaussian framework by using a variety of CS’s and marginal distributions of discrete, continuous and mixed type that provide a suitable representation for most of hydroclimatic processes. These diverse examples emphasize how the generalized parent-Gaussian theory can overcome all ad hoc solutions previously proposed in the literature, thus offering a powerful but simple tool that improves the analysis of hydroclimatic variables.

89 90 91 92 93 94 95 96 97 98 99 100

The paper is organized as follows: Section 0 provides the theoretical background introducing a unified mathematical representation of the parent-Gaussian framework starting from the univariate case (i.e., serially autocorrelated processes) and then discussing the multivariate extension. Section 0 starts with a review of marginal distributions and proposes a general framework on how to define and use parametric auto- or cross-correlation structures suitable to describe a large variety of hydroclimatic processes covering most of the cases of practical interest in hydrological studies. Then we present a novel general-purpose CTF along with a general technique to simulate the parent Gaussian process, concluding with a step-by-step synopsis of the simulation algorithm. Section 0 reports several case studies involving hydroclimatic processes with heterogeneous marginal distributions (discrete, continuous, and mixed type) and different CS’s, e.g., precipitation, river discharge, wind speed, relative humidity, number of extreme events per year, etc., as well as, a multivariate case of rainfall, wind and relative humidity. Finally, conclusions are summarized in Section 0.

102 103 104

AN US

M

ED

PT

CE

AC

101

CR IP T

69 70 71 72 73 74 75 76 77

Theoretical framework Univariate case + be a stationary stochastic process over an indexed set T, having an arbitrary Let * ( ) | autocorrelation structure (ACS) ( ) ( ) for lag , and with the random variable (r.v.) ( ) following an 4

ACCEPTED MANUSCRIPT

( )(

). Stationarity entails no change in the joint probability

105

arbitrary marginal distribution function

106 107

distribution of any order and for any shift in time, and thus, * ( )+ (for brevity is omitted ( ) hereafter) is defined by a single ACS ( ) and a single marginal distribution ( ), as ( ) ( ) + be a standard Gaussian process (Gp) with CS ( ) for any and ( ) ( ) . Let * ( ) |

110 111 112 113 114 115 116

( ), implying that any linear combination of ( ( )

distribution (see e.g., Feller, 1971, p. 87) and any ( ) follows the standard Normal distribution ( ). Gaussian processes have well-defined properties and their simulation is easy, therefore, the key concept is to identify a parent Gp that can be transformed into the process * ( )+. Therefore, to simulate/reproduce a stochastic process ( ) with a given linear ACS ( ) and a given marginal ( ) we need to assess and transform the pGp. As the marginal of the Gp is known, this transformation only requires to assess the ACS of the parent-Gaussian process (pGACS) ( ) yielding the desired target ACS.

117 118 119 120

Let

( ( )) transforming * ( )+ into * ( )+, and

be a real one-to-one function with ( )

( ( )) transforming * ( )+ into * ( )+. It follows that ( ( )) maps Gaussian r.v.’s ( ) ( ) and the pGACS ( ) to the target ACS ( ), and vice versa, if ( ) to r.v.’s ( ) ( ( )) is applied to * ( )+. This creates a unique mapping between the two processes, i.e., ( )

* ( )| ( )

(

)

↦ ( )+ ↤

( )

( ( ))

* ( )| ( )

( )

( )+

(1)

( ( ))

M

( )

Now, let us define the transformations

ED

121

( )) has a joint standard Normal

CR IP T

109

AN US

108

( )

( ))

(

( )

( ))

PT

(

(2) (3)

125 126

mapping X to Z. Thus, for the r.v.’s ( ) ( ) and ( ) ( ), where ( ) is any well-defined distribution function, a pair of transformation exists transforming one into the other and vice versa.

AC

127 128

CE

124

( ) ( ) with where ( are, respectively, the √ ) and ( ) √ distribution and quantile functions of the standard Gaussian variate Z, and ( ) the quantile function of X. It is well-known that ( ( )) maps Z to X, with ( ( )) being its inverse transformation

122 123

( ) (this definition is used for notational simplicity) be a pair of * ( )+ Let ( ) and ( ) r.v.’s at an arbitrary time point , lagged by , and having a correlation coefficient ( ) given by

( )

.( ( )

( )) ( )

( ( )

( ) )/

( )

5

( ( ) ( ))

(4)

ACCEPTED MANUSCRIPT

129 130

where and are the mean and the variance of the r.v. X (given that they are finite as there are r.v.’s with infinite mean and variance in which case the correlation cannot be defined), and

( ( ) ( ))

∫ ∫ ( ) ( )

( ) ( )(

( ) ( ))

( )

( )

(5)

( ) ( )) is the joint probability density function (pdf) of ( ) and ( ) which, in

( ) ( )(

where

132 133

general, does not have a simple parametric form. Yet since ( ) and ( ) are jointly normally distributed with correlation ( ) and with bivariate pdf

134 135

( ) ( ))

(

( )



( )

( ) ( ) ( ) ( ) ) (

( )

)

AN US

( ) ( )(

CR IP T

131

(6)

we can use the theory of r.v.’s transformations (see e.g., Feller, 1971) and Eq. (3) to estimate directly the joint pdf of ( ) and ( ), i.e., ( ) ( )(

( ) ( ))

where (( ( ) ( ( ))

( ( ))

( )

( ( ))

( ( ))

( )/ ( ( ) ( ))

(7)

139 140

with known ( ) and ( ) parameters we can calculate numerically the integral, and thus, the corresponding ( ) from Eq. (4). This creates a one-to-one link between ( ) and ( ).

141 142

The above approach has not been previously proposed in the literature, and although it results in complex ( ) ( ) ( ( ) ( )) expressions that may affect the numerical integration speed, in some

143 144 145 146 147 148

cases (encountered in numerical experiments, e.g., when Gamma type marginals are involved), it was faster than the more standard approach described next. Note that the computation of the Jacobian demands the existence of partial derivatives and thus for discrete marginal cases this approach may not be applicable. Another approach to estimate the ( ) is by applying the fundamental probability theorem (Feller, 1971, p. 5) stating that given an r.v. Y with known pdf ( ) and an r.v. ( ), where is a real function, the expectation of X can be expressed in terms of ( ), i.e., ( )

ED

PT

CE

149

AC

137

150

knowledge of

151 152

M

138

| is the Jacobian of the ( ( )) ( ) transformation (see e.g., Papoulis and Pillai, 2002, p. 244). Inspection of (7) reveals that ( ) parameters and of ( ). If we insert Eq. (7) into Eq. (5), ( ) ( ) ( ( ) ( )) is a function of the

136

|

( ) ( ).

( ( ))



( ) ( )

, avoiding the calculation of ( )



( )

which requires

( ). This theorem applies in higher dimensions and can be used to evaluate

( ( ) ( )) in terms of the bivariate standard Gaussian pdf (Eq. (6)) in combination with the Eq. (3) transformation. The autocovariance transformation integral ( 6

( )) is given then by

ACCEPTED MANUSCRIPT

( ))

( ∫ ∫

.

( ( ) ( ))

( ( ))/

(

( ( ))/

.

.

( ( ))/

( ) (

)( ( ) ( )

( ( ))/)

.

( ))

( )

( )

(8)

where ( ) is the parameter vector of the distribution ( ). In the copula literature Eq. (9) is known as the Nataf transformation (Nataf, 1962; Liu and Der Kiureghian, 1986; Li et al., 2008; Lebrun

155

and Dutfoy, 2009; Xiao, 2014). In general, the double integral ( solution, yet its numerical estimation is trivial.

158

( )) into Eq. (4), we get the autocorrelation transformation integral (ACTI)

Introducing (

157 (

( )) does not have an analytical

( )), i.e.,

( )

( ))

(

( )) is a function of

( ))

(

AN US

156

CR IP T

153 154

( ) as

(9)

159

Clearly, (

160

combining

161

the

162 163 164 165

, transforming essentially the Gaussian ACS ( ) into the ACS ( ). Note, this relationship transforms a known ( ) value into a ( ) value, while our goal is to transform an observed or a given ( ) value into the corresponding ( ) value; this is described in detail in Section 0.

166 167 168 169 170 171 172

Multivariate and cyclostationary case

174 175

and

( )with Eq. (4) to calculate

are functions of

as well. Likewise

( ) the ACTI creates a link between

( ), and thus, can be applied for any pair of ( ( ) ( )), or else for all lag values

ED

M

( ) and

( ) ( ))

CE

PT

Equations previously shown refer to a single stationary stochastic process, yet Eq. (9) can be applied more generally. As natural processes vary in space and time we need multivariate and cyclostationary models (Bartolini et al., 1988; Salas, 1980; Salas et al., 1982; Shao and Lund, 2004). Let us assume processes (e.g., a single process observed in different locations, different processes in the same location, etc.), and seasons (e.g., 12 months, 365 days, etc.), i.e., a total of processes, and with * ( )+ denoting the -th process at -th season, where , and . Accordingly, for ( ), and for every pair of processes each process we assume a marginal distribution ( ), an ACS

AC

173

( ) ( )(

and of

with

a cross-correlation structure (CCS)

( ), where

, and

. In this case,

Eq. (8) becomes

( ))

(

7

(10)

ACCEPTED MANUSCRIPT

∫ ∫

( ( ))/

(

( ( )))

( )(

( )

( ) ( )

( ))

( )

( )

and thus, the cross-correlation transformation integral (CCTI) becomes

( )

177

where

178

Thus,

( ))

(

are parameter vectors of

( ))

(

( ) and

(11)

( ) distributions, respectively.

( )) can be seen as a function transforming a given lag- correlation

(

( ) in a pair of standard Gp’s, into the lag- correlation

179

CR IP T

176

.

( ) between *

( )+ and

* ( )+ processes. Thus, the difference, compared with the univariate stationary case, is that here we have a set of processes * ( )+ and we want to assess a set of parent Gp’s * ( )+, or else,

182

their pGCCS’s

183

set of *

184 185 186

result, the problem of generating time series from a multivariate cyclostationary model comprising processes with different characteristics, simplifies in generating time series from a multivariate cyclostationary standard Gp.

187

From theory to practice

188

Marginal distributions of hydroclimatic processes

189 190 191 192 193 194 195 196 197 198 199 200 201

Continuous Common hydroclimatic processes like temperature, wind, precipitation, river discharge, relative humidity, etc., have continuous, or, mixed-type (partly continuous and partly discrete) marginal probability distributions. The number of parametric distributions found in the literature to describe hydroclimatic r.v.’s is very large: air temperatures is commonly described by the Gaussian distribution (e.g., Klein Tank and Können, 2003); the two-parameter Weibull has been the prevailing model for wind speeds (e.g., Justus et al., 1978; Seguro and Lambert, 2000), yet Gamma, Lognormal and Pearson type I distributions have also been used for wind (for a review see Carta et al., 2009); nonzero daily precipitation has been modelled, by Gamma (e.g., Buishand, 1978; Bruhn et al., 1979; Geng et al., 1986), Weibull (e.g., Swift and Schreuder, 1981; Wilson and Toumi, 2005), Exponential and mixed Exponentials distributions (e.g., Smith and Schreiber, 1974; Todorovic and Woolhiser, 1975; Wilks, 1999), or by heavier-tailed distributions like the Lognormal (e.g., Swift and Schreuder, 1981), Kappa (Mielke Jr, 1973; Mielke Jr and Johnson, 1973; Hosking, 1994; Park et al., 2009), and the generalized Beta distribution

AN US

180 181

( ). Given this, a multivariate and cyclostationary Gp can be transformed into the ( )

.

( ( ))/. As a

AC

CE

PT

ED

M

( )+ processes using the corresponding transformations

8

ACCEPTED MANUSCRIPT

(Mielke Jr and Johnson, 1974) (for a detailed global investigation on the distribution of seasonal daily precipitation see Papalexiou and Koutsoyiannis (2016) where the Generalized Gamma and the Burr type XII have been used; for global analyses on precipitation extremes see (Papalexiou et al., 2013; Papalexiou and Koutsoyiannis, 2013; Serinaldi and Kilsby, 2014b) ); river flows have been described by the Lognormal and Pareto distributions (e.g., Bowers et al., 2012); the Pearson Type III and the 3parameter lognormal distributions were suggested for low stream flows (Kroll and Vogel, 2002); the Beta distribution has been found to describe very well relative humidity values (e.g., Yao, 1974), and also, is a candidate model for percentage r.v.’s like probability dry, percentage of area affected by extreme temperature or precipitation, etc.

211 212 213 214 215 216 217 218 219

A complete review on the probability models applied in geophysical and hydroclimatic processes would be a difficult task by itself and out of the scope this study. A common practice is to fit many distributions and select the best fitted, yet this approach is susceptible to sample variations. In an attempt to simplify this procedure, and based on entropy information measures, a general theoretical framework was proposed (Papalexiou and Koutsoyiannis, 2012) that led to consistent distributions for positive geophysical and hydroclimatic r.v.’s. Particularly, the Generalized Gamma ( ) (Stacy, 1962; Stacy and Mihram, 1965) and the Burr type XII distributions ( ) (Burr, 1942; Tadikamalla, 1980) were derived, yet the Burr type III ( ) is also proposed here as a general and flexible power-type model. The probability density function of distribution and the distribution functions of the

220

and

)

)

( )

( ( ) )

(12)

are given, respectively, by

(

)

)

(

(

( ) )

( )

AC

CE

PT

(

222 223 224 225 226 227 228

(

ED

(

M

AN US

CR IP T

202 203 204 205 206 207 208 209 210

)

(13)

(14)

These distributions, defined for , have the scale parameter , and the shape parameters and controlling the left and right tails, respectively. Their pdf’s are J-shaped for and bell-shaped for , while the parameter controls mainly the “heaviness” of the right tail, and thus, the behaviour of extreme events. Note that is of exponential form with finite moments, while and are power-type distributions that can have very heavy tails and infinite moments. The can be considered as a generalization of the two-parameter Gamma and Weibull distributions, while the includes as special cases the Pareto type II for , or the Weibull distribution as limiting 9

ACCEPTED MANUSCRIPT

case for . Clearly, the principle of parsimony, which requires to always seek for the simplest model, i.e., to use the smallest number of parameters possible (see e.g., Box et al., 2008, p. 16), should always be applied. For example, if a Weibull distribution performs satisfactorily (under desired criteria) then it should be preferred over the three-parameter ; the same holds for the power-type distributions and where the simpler Pareto type II should be the first choice under adequate performance.

235 236 237 238 239 240 241 242

Mixed type, discrete and binary Several natural processes like precipitation at fine temporal scales (e.g., at daily or subdaily scales), discharge of small streams, or even wind, are intermittent processes. Thus, their marginal distribution comprises a probability mass concentrated at zero (e.g., probability dry in precipitation), and a continuous part for the positive values, expressed by a parametric distribution function. The framework to simulate intermittent processes is the same, (see section 0); the difference lies in the expressions of the distribution function ( ) and the quantile function ( ) that are now related to the conditional expressions for | .

243 244

Particularly, if ( given, respectively, by

)

is probability zero, then the cdf, pdf and quantile function of

are

( )

(

(15)

)

{

|

( )

(16)

)

(17)

(

|

PT

(the Dirac delta notation can also be used to define the pdf). Additionally, the expressions of the mean and variance of , used to estimate the CTF’s in Eq. (9) or Eq. (11) become

CE 250

( )

|

(

( )

)

(

|

(18) (19)

|

AC

247 248 249

{(

ED

( )

)

M

( )

245 246

AN US

CR IP T

229 230 231 232 233 234

)

|

The probability zero, can be easily assessed from the empirical data as , with and denoting, respectively, number of dry and total days, while for positive values any suitable parametric continuous distribution | ( ) could be used. This framework can be extended to include cases where values above a threshold are considered, i.e., the discrete part can express the , and the continuous part values for |

(

) for an

251

arbitrary value

252 253

marginal distributions can be considered, e.g., the water storage in a reservoir varies theoretically from zero to a maximum level when capacity is reached, thus, as a variable it may have two discrete states, 10

. Finally, more general mixed-type

ACCEPTED MANUSCRIPT

i.e., probability zero and probability for maximum capacity; all in between values can be described by a continuous bounded distribution.

256 257 258 259 260 261 262 263

Important aspects of various geophysical processes can be considered to have a discrete or a binary marginal distribution (see e.g., Chung and Salas, 2000; Molotch et al., 2005; Serinaldi and Lombardo, 2017). For example, the number of consecutive wet and dry spells (in days, hours, etc.), the number of days per year having a specific property, e.g., temperature, precipitation, or wind speed above a threshold (extremes), the number of wet days per year, binary sequences to describe wet and dry days, and many more. As in the mixed-type case the methodology remains the same, modified only by using expressions for the quantile, mean and variance that correspond to the discrete or the binary probability mass function chosen.

264 265 266 267 268 269 270 271

Autocorrelation (ACS) and cross-correlation structures (CCS)

272 273 274 275 276 277 278 279 280 281 282 283 284 285

As in Waldo Tobler’s First Law of Geography, stating that “near things are more related than distant things”, it is well-known “things” closer in time are more related than distant “things”. Thus, we ) can assume that a monotonically decreasing function of lag (or distance) having ( and ( ) can serve as an ACS. A general class of functions, proposed here, with these properties are the complementary cumulative distribution functions (ccdf), but now treated as functions and not as probability laws. Note that valid ACS’s and CCS’s have to be positive definite to guarantee that the parent Gaussian processes are stationary (see. e.g., Yaglom, 2012, p. 57). We deem that by definition a ccdf function fulfils this conditions (for a treatise on strictly positive definite functions see Chang, 1996). However, pathological cases may be possible to be formed. For example a function that equals to 1 up to a given lag and then gets 0 cannot be a proper correlation function (for more examples see e.g, Dolloff et al., 2006); yet in real world cases this specific issue cannot be observed as correlation equal to 1 in a dataset indicates an exact linear deterministic relationship, and thus, the correlations for any lag would also equal to 1 excluding the possibility for any other value. In any case, positive definiteness of a function should be considered when we construct theoretical ACF's and CCF's.

CE

PT

ED

M

AN US

Empirical autocorrelation (or cross-correlation) ̂( ), especially for small samples and for large lags , is sensitive to sample variations and calculated up to a maximum lag (1/3 of the sample size is usually suggested). A better approach may be to fit a parametric ACS to ̂( ), expressed by a simple ) where function ( ( ) is the parameter space of the ACS, essentially interpolating the values of ̂( ) and providing a smoother result. Also, it could be desirable to assign a prescribed ACS or CCS, e.g., in sensitivity analysis scenarios or if regional information is transferred to a location of interest.

AC

286

CR IP T

254 255

Based on the previously described rational, the Weibull ACS is then given by

(

)

( . / )

11

(20)

ACCEPTED MANUSCRIPT

287 288

with

and , and forms a generalization of the celebrated Markovian ACS as it is identical for ( ) can be expressed also in terms , yet decays slower for , and faster for . Note that

289 290 291

( ) of lag-1 correlation , i.e., . A similar stretched exponential form was used for space correlation of rainfall (Ciach and Krajewski, 2006). Likewise, a power-type ACS is formed by the Pareto II ccdf, i.e.,

)

.

/



(21)

CR IP T

( with

and . The PII ACS decays always slower than the Markovian, yet it becomes identical as . A similar two-parameter power-type ACS, named Cauchy class, was also proposed by Gneiting and Schlather (2004) as generalization of the fractional Gaussian noise (fGn) ACS.

295 296

The Weibull and PII ACS’s can be merged into a three-parameter expression, which coincides with the Burr type XII ccdf, i.e.,

(

and

; for

(

(

,

(22)

. / ) )

.

/, and for

297

with

it

298 299 300 301

equals the PII ACS. This general expression, with a different parametrization, was also used in Papalexiou et al. (2011), yet it was not suggested within the general framework proposed here and was not identified as the Burr type XII ccdf. Note that we can form ACS’s decaying even slower than power-type ACS’s, e.g., a general logarithmic expression (GL) proposed here, generalizing also the Markovian ACS, is

ED

M

,

)

AN US

292 293 294

PT

(

)

.

.

(23)

//

( ) with and ; for coincides with the Markovian ACS, i.e., ( ). This ACS for large values of can have extremely slow decay rates, and may be useful for processes at the second or millisecond time scales.

305 306 307

Another noteworthy ACS is the celebrated fGn, or, Hurst ACS (e.g., Beran, 1994, p. 52), linked to long-term persistence (LTP) and fGn processes (Hurst, 1951; Mandelbrot and Wallis, 1968, 1969), given by

AC

CE

302 303 304

(

)

(|

|

| |

12

|

|

)

(

)

(24)

ACCEPTED MANUSCRIPT

where is the Hurst coefficient. Also, FARIMA models (Granger and Joyeux, 1980; Hosking, ( ), yet their ACS is more flexible as it 1981)have a similar ACS, asymptotically equivalent to the allows to control short-term correlations (see e.g., Hosking, 1984; Montanari et al., 1997 for applications in hydrology). Noteworthy recent global studies on annual precipitation (Iliopoulou et al., 2016) and river discharge (Markonis et al., 2018) challenge the LTP hypothesis as the findings suggest very low values of the coefficient that could also emerge from Markovian processes.

314 315

Here we propose a convenient generalization of the fGn ACS, i.e., the generalized fGn (gfGn) given by

CR IP T

308 309 310 311 312 313

(

(

)

(

)(

(

(

)

)

(

)

)

)

(

)

(25)

This ACS, well-defined for , converges very fast to the classical fGn ACS, yet for lag we have ( ) and thus it allows full control of the short-term properties of the ACS. Also, various

318 319

other correlation functions have been proposed, e.g., for wind and atmospheric data (Buell, 1972; Gneiting, 1999).

320

In the cross-correlation case between two process, e.g., of * ( )+ and * ( )+, since

322

(

) for

, and

( )

( )

we can still use parametric CCS’s similar to the ACS’s framework,

if this is desired, by modifying them as

{

ED

( )

M

321

AN US

316 317

( (

) )

(26)

where ( ) can be any parametric CS like the Weibull, PII, etc. with different parameter values for and to fulfil the condition of ( ) ( ). Of course an infinite number of ACS or

325 326

CCS’s can be formed, with many more parameters, yet according to the parsimony principle we should choose the one with the less possible parameters (Box et al., 2008, p. 16).

327 328 329 330 331 332 333 334 335 336

Although bias issues in estimating empirical correlations are beyond the scope of this study; yet it is crucial to mention that the empirical ACS of a process having a very intense underlying ACS, e.g., an fGn with high values of , will be observed (most probably) weaker (especially for small samples), and will also affect the sample standard deviation. In general, a straightforward and easy method to estimate any linear ACS without bias does not exist; formulas exist only for special cases like the fGn ACS (see e.g., Beran, 1994) and should be applied (if there is evidence for LTP) before estimating the pGACS. For most hydroclimatic processes, however, and when dealing with time scales larger than the daily (or even hourly for precipitation) the empirical ACS’s are not intense, and thus, the bias is not expected to be significant; also as afore-mentioned the existence of LTP in processes like river discharge is challenged (see e.g, Markonis et al., 2018). Nevertheless, for processes at fine temporal scales we may 13

AC

CE

PT

323 324

ACCEPTED MANUSCRIPT

observe very strong ACS’s (see the example in Section 0) that can lead to significant biases in the estimation of the underlying ACS and of the standard deviation (see e.g., Papalexiou et al., 2011 and references therein).

340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357

Finally, it should be clear that the parent Gaussian framework reproduces linear correlation structures as expressed by the Pearson correlation coefficient, and does not consider other types of correlations, e.g., rank correlations (Kendal’s Tau, Spearman’s Rho, etc.), or, more general correlation topologies that can emerge by other models. For example, there is an increasing interest over the last decade on the spatial dependence structure of extreme events using max-stable processes (see e.g., Davison et al., 2012; Huser et al., 2016; Huser and Genton, 2016; Ribatet, 2013; Stephenson et al., 2016), an approach that may lead to structures which the framework presented here may not be able to reproduce. Yet it is still a matter of debate if modelling extremes directly is a better approach compared to modelling the process itself, e.g., in the presence of LTP it was shown that the latter approach should be preferred (Serinaldi et al., 2018). Also, note that the behaviour of spatial extremes is essentially governed by the tail of a multivariate distribution or by the random field properties, and thus it is rational to assume that if the multivariate process or the field itself is simulated accurately then the structure of extremes will be also consistently reproduced. Certainly, both approaches have advantages and disadvantages that depend on many factors. For example, analysis of above-a-threshold extremes provides a more accurate estimation of the distribution tail (see e.g., Papalexiou et al., 2013) than using the whole set of observations; in contrast, using annual maxima may lead to increased uncertainty in the estimation of the heaviness of the tail especially when fixed-block maxima are used (Papalexiou et al., 2016; van Montfort, 1990).

358

Auto- and cross-correlation transformation function

359 360 361 362 363 364 365 366 367 368

Autocorrelation transformation function (ACTF) The ACTI in Eq. (9) transforms a specific autocorrelation value between a pair of ( ) and ( ) r.v.’s of the Gp lagged by , into the autocorrelation between the ( ) and ( ) r.v.’s, of the target process, yet the inverse relationship is desired for all practical purposes. That is an ACTF ( ) where we introduce any value of or any ACS ( ) and get, respectively, the pGp value or the pGACS ( ). We assume here positive correlations ranging in , -, yet the framework can be expanded easily for negative correlations too. Essentially, we evaluate the ACTI in a small set of values to create ( ) points and interpolate them by fitting a simple parametric function. Since ( ) ( ) (Kendall and Stuart, 1979, p. 600) the ACTF should have the potential to be concave and also pass from the ( ) and (1,1) points, as for equal to 0 and 1, is also 0 and 1.

AN US

M

ED

PT

CE

AC

369 370

CR IP T

337 338 339

A simple two-parameter function with these properties that is proposed here, is heuristically evaluated and found to perform perfectly in all cases of continuous and mixed-type marginals studied, is

14

ACCEPTED MANUSCRIPT

(

) (

(27)

)

which is concave for and , while for and we have . Unless stated this will be the basic ACTF used in the examples of this study. The previous function performs also well for the discrete and binary case, yet the alternative

(

)

(

)

CR IP T

371 372 373

(

)

(28)

seems to perform better for discrete cases and especially for the binary case. This function is concave for and ; for and we have . Note that Eq. (28) coincides with Kumaraswamy cdf (Kumaraswamy, 1980), yet here is used as parametric function and not as probability law. Thus, we can now form a one-to-one relationship that enables the calculation of the pGp either as a single value or as a parametric pGACS if a parametric ACS ( ) is plugged into Eq. (27) or Eq. (28). ) functions can be easily fitted if we evaluate the ACTI (Eq. (9)) for Any of the ACTF’s ( * + and use the corresponding set of ( ) points. We stress that this approach is simpler and by far more efficient than iterative methods and polynomial approximations previously proposed (see e.g., Demirtas, 2014; Ferrari and Barbiero, 2012; Kugiumtzis, 2002).

384

)) marginal distribution, Let us assume that we target a process * ( )+ having a Weibull ( ( ( ) ( ) ( ( ) ) and a Markovian ACS ( ) ( ) with . For illustration and comparison purposes we assume three different distributions (Fig. 1a) with scale * +, respectively. The estimated ( parameter and shape parameters ) points ) functions (Fig. 1b) show very different transformation (gray dots in Fig. 1b) and the fitted ( profiles, i.e., for the bell-shaped ( ) which resembles the Gaussian distribution, essentially ) has a and the pGACS coincides with ( ) (Fig. 1c). As we deviate from normality, e.g., the ( heavy tail and very high skewness, the ACTF becomes more concave resulting in very different pGACS’s (Fig. 1c). Note that a Markovian ACS ( ) is not reproduced by a Markovian pGACS, e.g., for the ( ) case the ( ) corresponds to ( ) which does not imply that the Markovian ACS ( ) will reproduce the ( ) , it will only preserve the ( ).

395 396 397 398 399 400 401

M

ED

PT

CE

AC

385 386 387 388 389 390 391 392 393 394

AN US

374 375 376 377 378 379 380 381 382 383

Another example, showing how the concavity of the ACTF changes, regards mixed-type marginal distributions suitable to model intermittent processes like precipitation (or wind and discharge in some )) cases). Let us target for an intermittent process * ( )+ having probability zero , a Pareto II ( ( ( ) ( ) marginal distribution ( ) for nonzero values with and ( ) with , and an fGn ACS (Eq. (24)) ( ) . For comparison we use * +, with the unconditional pdf’s shown in Fig. 1d (note that it is common to observe equal to high values as 0.90 and 0.99 for daily and hourly rainfall records, respectively). The ( ) 15

ACCEPTED MANUSCRIPT

) functions (Fig. 1e) show how concavity increases as increases points and the fitted ( resulting in very different pGACS (Fig. 1c), and thus, for large values of a much stronger ( ) is necessary to preserve the ( ). As in the previous example the pGACS of the fGn ACS is not an fGN ACS with just a different value.

406 407 408 409 410 411 412 413 414

In summary, we note that the concavity of the ACTF, or else, the difference between the target ACS and the pGACS, depends on how much the target marginal deviates from normality. Cleary, a unique method to quantify this deviation does not exist, as deviations may regard the general shape (e.g., skewed vs. symmetric densities), the heaviness of the tail (e.g., exponential vs. power-type tails), or the type of the density, i.e., mixed-type vs. continuous type. Consequently, we could “speculate” that when the target marginal is of continuous type, has a moderately skewed and bell-shaped pdf, e.g., as in the bell-shaped Weibull pdf example (Fig. 1a), then the linear correlation has a small bias. In other words, ( ) could be approximated by the actual ( ), and thus, we can directly transform the Gaussian time series avoiding calculating the integrals and fitting the ACTF.

PT

ED

M

AN US

CR IP T

402 403 404 405

415

CE

Fig. 1. Demonstration of the autocorrelation transformation function (ACTF) and the resulting parent-Gaussian autocorrelation structures (pGACS): (a) different Weibull ( ) marginal pdf’s, (b) ACTF’s (indicated with the same colour) corresponding to the distributions, (c) pGACS’s for a Markovian ACS ( ) with corresponding to the three distributions (note that ( ) is overlapped by the pGACS of the ( ) case), (d) unconditional Pareto II ( ) pdf’s for three values, (e) ACTF’s corresponding to the mixed-type distributions, (f) pGACS for an fGn ACS ( ) with Hurst coefficient corresponding to the mixed-type distributions.

AC

416 417 418 419 420 421

424

Cross-correlation transformation function (CCTF) In the previous case, the ACTF was constrained to pass from the (1,1) point, yet for cross-correlation this is not mandatory. The cross-correlation between two stationary processes * ( )+ and { ( )} has

425

boundaries depending on the marginal distributions

426

admissible

422 423

( ) and

( ), thus there is a range of

values according to the Frechet-Hoeffding theoretical limits, i.e., 16

(

)

ACCEPTED MANUSCRIPT

427

(

)

(Embrechts et al., 2002; Fréchet, 1957; Hoeffding, 1994). These boundary

values can be checked for each specific application by using for example the procedure suggested by Demirtas and Hedeker (2011). However, this check is not required in our approach as the interpolation procedure automatically determines the feasible range of and thus the ( ).

431 432

Here, we propose as CCTF a simple two-parameter function, that performed extremely well in all cases studied, i.e.,

.

/

.

/ .

CR IP T

428 429 430

/

(29)

Note that if more flexibility is needed the three-parameter

434 435 436 437

function (Eq. (29)) is essentially the same as the ACTF in Eq. (27) but without the denominator part, which forced the function to pass from (1,1); it is valid for , concave for , linear for , and convex for . As in the ACTF case, we can easily fit the CCTF by evaluating the CCTI in Eq. (11) for * + and creating the corresponding ( )

438

points. Once the CCTF parameters are estimated we can determine easily the

439

in Eq. (29) and solve for

AN US

433

by setting

(30)

M

, i.e.,

can be used. This

Let us assume a process * ( )+ with a Weibull marginal ( ) and a process * ( )+ with a Beta marginal ( ). For comparison purposes we use three different Beta pdf’s having negative, zero, and positive skewness (Fig. 2a). We use the CCTI of Eq. (11) to estimate the ( ) points

443 444 445

(grey dots in Fig. 2b) and we fit the CCTF in Eq. (29). Noteworthy, for the negatively skewed Beta the CCTF is convex, almost a straight line for the symmetric, and concave for the positively skewed, while the corresponding limits, estimated from Eq. (30), are 0.53, 0.64 and 0.75, respectively. This

446

indicates that the more the two marginals differ the more their maximum cross-correlation value

447 448

will be decreased. We also show the pGCCS of a Markovian CCS (Fig. 2c); clearly, not all CCS are feasible due to the maximum limit.

PT

CE

AC

449 450 451 452

ED

440 441 442

Another interesting example is to assume cross-correlated processes with mixed-type marginals that differ in probability zero. We assume a heavy-tailed continuous marginal ( ) for the first process * ( )+, while for * ( )+ we use three different mixed-type ( ) marginals with * + (Fig. 2d). Using the CCTI of Eq. (11) we calculate the ( ) points (grey dots

453

in Fig. 2e) and we fit the CCTF of Eq. (29). All CCTF’s are concave while the corresponding

454 455

estimated from Eq. (30), are 0.98, 0.89 and 0.66, respectively, indicating a decrease in the maximum cross-correlation as increases, or else, as the two marginals differ more. In Fig. 2f we show the pGCCS 17

limits,

ACCEPTED MANUSCRIPT

of the Markovian CCS used in the previous example; clearly, as needed to reproduce the target CCS.

increases much stronger pGCCS are

AN US

CR IP T

456 457

458

Fig. 2. Demonstration of the cross-correlation transformation function (CCTF) and the resulting parent-Gaussian crosscorrelation structures (pGCCS): (a) a Weibull ( ) marginal and three Beta ( ) pdf’s, (b) corresponding (indicated with the same colour) cross-correlation transformation functions (CCTF) of the distributions with the three Beta, (c) corresponding pGCCS for a Markovian target CCS , (d) the Pareto II ( ) continuous marginal and the three mixed-type marginal pdf’s for three values, (e) corresponding CCTF’s, (f) corresponding pGCCS for the Markovian CCS with .

464 465 466

Gaussian univariate and multivariate processes with arbitrary CS

M

459 460 461 462 463

PT

ED

We can easily simulate a standard Gp, i.e., with ( ) by using an autoregressive process of order

( )



and , having an arbitrary parametric ACS ( ( )) defined by

(

)

(31)

( )

where ( ) ( ) is Gaussian random noise with parameters. For a given parametric ACS ( ) the

469

using the Yule-Walker equations (e.g., Box et al., 2008, p. 67), i.e.,

470 471 472 473

the parameter vector, , ( ) ( )- is the correlation vector up to lag , and , (| |) - is the correlation matrix with and denoting, respectively, the -th row and -th column, e.g., the , - element is simply the value of ( ) (for more on AR models see Box et al., 2008)



( ), and are the model can be easily and analytically computed , where

,

- is

AC

CE

467 468

475

( ) model preserves exactly a given ACS ( ) up to lag and then for ∑ decays according to ( ), a relation that can be easily applied recursively. For ( )( )

476

example, Fig. 3 shows the approximation of two parametric ACS up to various lags, i.e., of a Weibull with

474

Thus, an

18

ACCEPTED MANUSCRIPT

477 478 479

(decaying thus slower than the Markovian ACS), and a very strong Pareto II ACS with . Clearly, the closer the ACS is to the Markovian the better will be approximated by the ; in any case, a parametric ACS can be approximated up to any desired lag, e.g.,

CR IP T

even an AR(5000) can be easily fitted. To avoid any confusion, we emphasize that here we fit the theoretical ACF of an ( ) model to a target parametric ACS. This method only requires estimation of a couple of parameters, i.e., those of the fitted parametric ACS to the data, while the parameters of the fitted ( ) model are simply coefficients/constants (estimated analytically and without uncertainty) corresponding to the target parametric ACF. Thus, this approach is parsimonious and efficient, e.g., an AR(5000) fitted to a two-parameter ACS is essentially a two-parameter model.

AN US

480 481 482 483 484 485

(

and and ) ( ) for

486

Fig. 3. Approximation of two parametric ACS by AR( ) models of large order.

488 489 490

Another approach to generate time series from a standard Gp having an arbitrary ACS ( ) is based on approximating ( ) by a sum of independent AR(1) processes, an idea originally proposed and used by Mandelbrot (1971) to approximate the fGn ACS. Particularly, if we define

ED

M

487

PT

( )

with ( )

492

standard deviation

493

processes are independent, the variance and the CS of the sum process are

496 497 498

( )

( ), and ( )

( (

( ) )

,

), then, since the AR(1) ∑

, respectively. Of course, if we demand * ( )+ to have

and , then

AC

495



( ) being the -th AR(1) process in the sum, having mean

, lag-1 correlation

CE

( )

)

(32)

491

494

( ) (

∑ ( )



is also constrained to equal 1. The AR(1) parameters can be estimated by numerically

minimizing an error norm between ( ) and ( ) up to a desired lag . For an application using five independent AR(1) approximating the ACS given Eq. (22) see Papalexiou et al. (2011). The corresponding multivariate autoregressive model MAR( ) of

19

processes is defined by

ACCEPTED MANUSCRIPT

( )

,

)

(33)

( )

499

where ( )

500 501 502 503 504 505 506

( ) , ( ) ( )- is an -vector of independent standard Gaussian variables. For a general algorithm to fit MAR( ) models of large order see Neumaier and Schneider (2001), Schneider and Neumaier (2001), and Schlögl (2006). Clearly, large order MAR models are complicated and often unnecessary, e.g., when Markovian ACS and CCS are observed a MAR(1) is sufficient. The parameters of ( ) the MAR(1) model ( ) ( ) can be estimated by using the relations ( ) ( ) ( ) and ( ) (thus, can be found using, e.g., Choleski decomposition), ( ) ( ) where and are the correlation matrices of the Gaussian processes (see also section 0 for

507 508 509 510

an application). Clearly, analytical or exact estimation of demands the matrix to be positive definite, a condition that is not always fulfilled. In this case the matrix can be approximated using near-matrix methods (Higham, 1988, see e.g., 2002).Finally, note that cyclostationarity can be introduced by using different MAR processes for each season.

511 512

A stochastic modelling recipe step-by-step

513 514 515 516 517

1. 2.

Assess suitable marginal distributions and correlation structures (auto- and cross-correlations for the multivariate case) from the empirical time series. Estimate the ( ) points for the proposed values (see section 0) using the ACTI in Eq. (9) and fit the parametric ACTF of Eq. (27) (or Eq. (28) for binary marginals) to the points; for the multivariate case estimate ( ) points using the CCTI in Eq. (11) and fit the CCTF in Eq. (29).

518

3.

Insert the target ACS

524 525 526 527 528

are

parameter matrices, and

AN US

CR IP T

and

M

ED

( ) into the fitted ACTF (

5.

.

( )

) to estimate the pGACS

/ to find the pGCCS

( ); for the

( ).

Use the estimated pGACS and the pGCCS to fit an AR( ) or MAR( ) model and generate Gaussian univariate or multivariate time series. Transform the Gaussian time series using the corresponding transformation

CE

4.

PT

multivariate case use the fitted CCTF

( )

( )

.

( ( )/ based on the fitted distributions from step 1.

AC

523

( )- ,

The proposed stochastic modelling framework is summarized in the following steps:

519 520 521 522

( )

(



Real world examples Storm events of fine temporal resolution The method is applied to simulate rainfall events at fine temporal scale (10 seconds resolution), a process displaying strong ACS, highly skewed and heavy tailed continuous marginal distribution. The original data are seven storm events recorded at the Hydrometeorology Laboratory at the Iowa 20

ACCEPTED MANUSCRIPT

CR IP T

University (Georgakakos et al., 1994); note this dataset has been used also in other studies (see e.g., Papalexiou et al. (2011) for a statistical and stochastic analysis, Cârsteanu and Foufoula-Georgiou (1996) for a multifractal analysis, and Kumar and Foufoula-Georgiou (1997) for a wavelet analysis. Assuming that these events are the outcome of a single process (Papalexiou et al., 2011), and thus, the characteristics of the process are better determined if all events are merged into one set despite their large statistical differences (see Fig. 4a). Nonetheless, the aim here is not a detailed analysis of the data, but rather to illustrate the applicability of the method in a highly non-Gaussian process with strong ACS. In addition, other issues like the bias in the estimation of the ACS are not considered (for these details see Papalexiou et al., 2011).

PT

ED

M

AN US

529 530 531 532 533 534 535 536 537

538

545 546 547 548 549 550

CE

Fig. 4. A graphical step by step demonstration of the method applied to 10 s rainfall events: (a) recorded events; (b) empirical distribution of 10 s rainfall and the fitted BrXII distribution; (c) observed ACS and the fitted parametric Weibull ACS ( ); (d) the fitted autocorrelation transformation function (ACTF); (e) the parent-Gaussian autocorrelation structure ( ); (f) generated Gaussian events with ACS ( ); (g) synthetic events by transforming the Gaussian time series; (h) empirical distribution of the synthetic rainfall compared with the target BrXII fitted to the original data; (i) empirical autocorrelations of Gaussian and synthetic time series compared with ( ) and ( ).

AC

539 540 541 542 543 544

The method is graphically demonstrated in Fig. 4. First, a parametric distribution is fitted to the empirical distribution (Fig. 4b); here the is used. The estimated parameters, using the method of L-moments (Hosking, 1990), are , , and (see Papalexiou and Koutsoyiannis (2016) for details about fitting the using L-moments). Second, the observed ACS ̂ ( ) is ( ) given in Eq. (20), with estimated approximated by numerically fitting the parametric Weibull ACS ( ). Third, the parameters and (Fig. 4c), i.e. it is assumed that ( ) 21

ACCEPTED MANUSCRIPT

) in Eq. (9) is applied using the fitted ΑCTI ( distribution to estimate a few ( ) points (grey dots in Fig. 4d) which are “interpolated” by the ACTF in Eq. (27) with parameters and ( ) to obtain the pGACS ( ) . Fourth, the fitted ACTF is applied to the estimated (Fig. 4e). Fifth, a large order AR model is used, i.e., AR(1000), to reproduce the fitted pGACS ( ) and generate Gaussian time series (Fig. 4f). Sixth, synthetic time series emerge by simply transforming the Gaussian time series using Eq. (2) and the fitted (Fig. 4e). For verification the empirical distribution of the synthetic time series is compared with the target that was fitted to the original data (Fig. 4h), while the empirical ACS of the Gaussian and synthetic time series are compared with the target ( ) and ( ) (Fig. 4i).

560 561 562 563 564 565 566 567 568 569

Hourly precipitation

570 571 572 573 574 575 576 577 578 579 580 581 582 583 584

The framework presented here is “scale-free” as it can be applied at any time scale, including hourly. We demonstrate the performance of the method by using an hourly precipitation record from the Unites States (randomly selected from the NCDC database (Hammer and Steurer, 1997); station code: 063456) covering the period from 01/08/1954 to 30/11/2011 (Fig. 5a). Assuming hourly precipitation as a cyclostationary process, i.e., stationary at each month, we apply the steps described in Section 0 twelve times. In brief, high probability dry values are observed within the twelve months ranging from 0.90 to 0.95 (Fig. 5c,d and Fig. A.1), while the nonzero hourly rainfall is adequately described by fitted distributions (Eq. (12)); the estimated parameters for all months are given in Fig. A.1. The Pareto II ACS (Eq. (21)) interpolates accurately the linear empirical ACS (see correlograms and estimated parameters in Fig. 5e,f and Fig. A.2) and the pGACS for each month is estimated using the expressions for the mixed-type case (Eqs (17)-(19)). High probability dry values and high skewness of the target marginals inflate significantly the pGACS which approximates 0 in all months for , thus, an ( ) was used to generate the Gaussian time series. The synthetic rainfall was generated for the same period as the observed one (Fig. 5b) and emerged by transforming the Gaussian time series using the corresponding marginal-back transformations of each month.

585 586 587

CR IP T

551 552 553 554 555 556 557 558 559

AC

CE

PT

ED

M

AN US

Stochastic modelling of hourly precipitation has been the subject of active research for more than three decades having important applications, e.g., in rainfall-run off modeling. Previous methods have mainly focus on point process models like the Barlett-Lewis and the Neyman-Scott (see e.g., Istok and Boersma, 1989; Onof et al., 2000; Onof and Wheater, 1993; Rodriguez-Iturbe et al., 1987; Verhoest et al., 1997; Wheater et al., 2005). These models do not reproduce the actual marginal distribution but rather attempt to approximate it by preserving a number of statistical moments (usually up to the third aiming to reproduce skewness). At their core, Exponential and Gamma distributions are used that may underestimate return levels which is a well-known weakness that has motivated alternative approaches (see e.g., Onof et al., 2005).

The empirical probability distributions (Fig. 5c,d and Fig. A.1) as well as the empirical ACS’s (Fig. 5e,f and Fig. A.2)) of the synthetic time series show the precise reproduction of their target counterparts. Of course any minor deviations shown are due to random sample variations and not due 22

ACCEPTED MANUSCRIPT

to any systematic bias of the method. We also highlight the “one-state” framework of this method, as common precipitation generators, e.g., point process models, use two states, one to simulate the wet/dry days’ sequence and a second one to calculate the intensity of wet days (see also Wilks, 1998). Here, the intermittency is introduced “naturally” by applying the mixed-type marginal-back transformation. Note that this framework, compared to the afore-mentioned popular models, has the advantage of reproducing the actual marginal and ACS, while it can be equally or even more parsimonious, depending, of course, on the selected marginal distribution and parametric ACS. For example, the total number of parameters we estimate to model hourly precipitation for each month is just five, i.e., three for the marginal and two for the ACS; all the rest are deterministically estimated, and thus, are not essentially parameters.

CR IP T

588 589 590 591 592 593 594 595 596 597

599 600 601 602

AC

CE

PT

ED

M

AN US

598

Fig. 5. Simulation of cyclostationary hourly precipitation: (a) observed hourly precipitation (1954-2011) at a randomly selected station from the NCDC database (code: 063456); (b) synthetic time series; (c-d) probability plots for two characteristic months showing the fitted distributions (estimated parameters are given in the legend) to the observed

23

ACCEPTED MANUSCRIPT

603 604

nonzero data and the empirical distributions of the synthetic data; (e-f) correlograms showing the fitted PII ACS (estimated parameters are given in the legend) to the empirical values and the empirical ACS of the synthetic time series.

605 606 607 608 609

Daily precipitation, river discharge and wind

610 611 612 613 614 615 616 617 618 619 620 621

distribution (Eq. (12)) (Papalexiou and Koutsoyiannis, 2016), and a Weibull ACS (Eq.(20)) was fitted to the empirical ACS, i.e., ( ) ( ) (Fig. 6e). The ACTI in Eq. (9) was used to estimate ( ) points (Fig. 6b) using the corresponding expressions for the mixed-type case, i.e., the quantile function, mean and variance, are given now by Eqs (17), (18), and (19), respectively. The correlation transformation function ( ) fitted perfectly to the ( ) points (Fig. 6b) and was used to estimate the pGACS, i.e., ( ) ( ( ) ) (Fig. 6e). The ( ) for , thus, an AR(20) was fitted to ( ) for and used to generate a Gaussian time series of 1000 months ( values). The Gaussian time series was transformed to synthetic rainfall (Fig. 6c shows sample size equal with the original time series) by applying the transformation in Eq. (2) with being the mixed-type quantile in Eq. (17) with the estimated and the fitted quantile. For verification, the empirical distribution and the empirical ACS of the synthetic sample is compared with the target ( ) distribution (Fig. 6d) and the target ACS ( ) ( )

622

(Fig. 6e).

M

AN US

CR IP T

Daily precipitation is another important case of an intermittent process that plays a crucial role in hydrology. Here, we use October’s daily rainfall recorded at the National Observatory of Athens in Greece from 24/04/1927 to 20/02/1990 (Fig. 6a), assuming stationarity within the month. The probability dry was found ; the nonzero rainfall was described by an (

AC

CE

PT

ED

623

24

)

625 626 627 628 629 630 631 632 633

Fig. 6. Real-world simulations: (a) daily precipitation of October recorded at the National observatory of Athens, Greece; (b) the corresponding autocorrelation transformation function; (c) simulated time series; (d) probability plot verifying that the marginal of the synthetic time series is the same as the fitted to the real nonzero data; (e) the fitted daily precipitation ACS ( ); the estimated pGACS ( ), and the empirical values from the generated Gaussian and synthetic time series; (f-j) similar to (a-e) for daily river discharge of April in the Middle Fork Snoqualmie river in USA; (k-o) similar to (a-e) for daily wind speed of January recorded at Maastricht, Netherlands.

AC

624

CE

PT

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

Another important variable is daily river discharge; here we use daily values of April observed in the Middle Fork Snoqualmie river at the USA from 01/02/1961 to 06/11/2016 (Fig. 6f), assuming again a stationary process for same-month days. Daily discharge was found to be well described by the heavy 25

ACCEPTED MANUSCRIPT

( ) distribution (Eq.(14)), while the empirical ACS fitted well to a relatively tailed ( ) (Fig. 6j). The estimated correlation slowly decaying Weibull ACS (Eq.(20)), i.e., ( ) ) (Fig. 6g) was applied to evaluate the pGACS ( ) (Fig. 6j), transformation function ( which gets ( ) for ; thus, a Gaussian time series of 1000 months ( values) was generated from a fitted AR(25) and was transformed to synthetic daily discharge (Fig. 6h) by applying the transformation in Eq. (2) with being the quantile of the fitted . The synthetic time series ( ) ( ) (Fig. have the expected marginal (Fig. 6i) and the expected ACS 6j).

642 643 644

As a final example the method is applied to daily wind speed of January recorded at Maastricht in the Netherlands from 01/01/1957 to 31/12/2016 (Fig. 6k). Daily wind speed is described here by the ( ) distribution (Eq.(12)), as commonly used models like the two-parameter Weibull

645 646 647 648 649 650 651 652

distribution or the two-parameter Gamma distribution, did not perform well; also zero values were not recorded and thus the marginal is assumed continuous. The PII ACS in Eq. (21) was fitted to the empirical ( ). The estimated correlation transformation function ( ) ACS, i.e., ( ) ( ) ( ) (Fig. 6l) shows no concavity, and thus, the pGACS coincides with (lines overlap in Fig. 6o). ( ) The ( ) for , and thus, an AR(25) was used to generate a Gaussian time series ( values) which was then converted to synthetic daily wind speed (Fig. 6m) by applying the transformation in Eq. (2) with being the quantile of the fitted . The synthetic time series have the expected marginal (Fig. 6n) and the expected ACS (Fig. 6o).

653 654 655 656 657 658 659 660 661 662 663

Relative humidity, discrete and binary cases

M

AN US

CR IP T

634 635 636 637 638 639 640 641

AC

CE

PT

ED

Many variables in nature can be expressed in the , - range, e.g., relative humidity, probability dry, percentage variables, etc. Here, the method is applied on November daily values of relative humidity recorded at Maastricht in the Netherlands from 01/01/1957 to 31/12/2016 (Fig. 7a). A two-parameter ) and a PII ACS (Eq. (21)) was fitted to the empirical Beta ( ) distribution was used, i.e., ( ( ). The estimated ACTF ( ) (Fig. 7b) is a straight line and ACS, i.e., ( ) ( ) the pGACS ( ) coincides with ( ) (lines overlap in Fig. 7e). The ( ) for , and thus, an AR(30) was used to generate a Gaussian time series ( values) which was transformed to synthetic daily relative humidity (Fig. 7c) by applying the transformation in Eq. (2) with being the quantile of the fitted Beta. The synthetic time series have the expected marginal (Fig. 7d) and the expected ACS (Fig. 7e).

26

665 666 667 668 669 670 671 672 673

Fig. 7. Simulation of relative humidity, number of extremes per year and occurrences of high/low rainfall years: (a) relative humidity of November recorded at Maastricht, Netherlands; (b) the ACTF; (c) simulated time series; (d) probability plot verifying that the marginal of the synthetic time series is the target Beta distribution; (e) the ACS ( ), the pGACS ( ), and the empirical values form the generated Gaussian and synthetic time series; (f-j) similar to (a-e) for number of extremes per year in the station ASN00008067 from GHCN-daily database; the discrete Polya-Aeppli distribution is used; (k-o) similar to (a-e) for occurrences of low/high rainfall years at the station ASN00008067.

AC

664

CE

PT

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

As an example of a process with discrete marginal distribution we use the number of extreme daily precipitation events per year, i.e., in an -year record we identify the largest values and their dates and count the number in each year. Here, we used a long daily rainfall record, randomly selected 27

ACCEPTED MANUSCRIPT

from the GHCN-Daily database (ID: ASN00008067), to form the number of extremes per year time series (Fig. 7f). A two-parameter discrete distribution was used, i.e., the Polya-Aeppli ( ) distribution ( ), to describe the probability to observe a given number of extremes in any year. The empirical autocorrelation is essentially 0, and thus, for demonstration purposes, a PII ACS (Eq. (21)) was ( ). We fit the Kumaraswamy ACTF in Eq. (28) , i.e., ( ) prescribed, i.e., ( ) (Fig. 7g) while the pGACS ( ) for (Fig. 7j), and thus, an AR(30) was used to generate a Gaussian time series (3000 years) which was transformed to synthetic number of extremes per year (Fig. 7h) by applying the transformation in Eq. (2) with being the quantile of the fitted Polya-Aeppli distribution. The synthetic time series have the expected marginal (Fig. 7i) and the expected ACS (Fig. 7j).

684 685 686 687 688 689 690 691 692 693

In many cases we may desire to express a variable in two states, e.g., rain and no-rain. We use the same daily rainfall record (ASN00008067), aggregated to the annual timescale, to form a binary time series by assigning 0 to annual precipitation values that belong to the first quartile (low values indicating ) drought years) and 1 to the rest (Fig. 7k). The Bernoulli ( ) distribution was used with ( to describe the probability to observe a drought year, and a Weibull ACS was prescribed, i.e., ( ) ( ) to demonstrate the method as the empirical correlation is almost zero. The ) (Fig. 7l) shows moderate to strong concavity with a estimated Kumaraswamy ACTF ( perfect fit to the points, while the pGACS ( ) for (Fig. 7o). Gaussian time series (3000 years) were generated from an AR(30) and transformed to binary time series accordingly (Fig. 7m). The synthetic binary time series have the expected marginal (Fig. 7n) and the expected ACS (Fig. 7o).

694 695 696 697 698 699 700 701

A Multivariate simulation of precipitation, wind, and relative humidity

M

AN US

CR IP T

674 675 676 677 678 679 680 681 682 683

CE

PT

ED

As a final example we demonstrate the method in a multivariate simulation of three different processes at daily scale, i.e., of precipitation * ( )}, wind speed * ( )+ and relative humidity * ( )+. For precipitation, we assume probability dry and for positive values a power-type ( ) distribution; for wind speed probability calmness and for positive values the two-parameter Weibull ( ) distribution; for relative humidity a Kumaraswamy ( ) marginal distribution (Fig. 8a-c). Thus, we have two mixed-type marginals, with different and different continuous part, and a continuous marginal bounded in ( ). Also we assume lag-0 and lag-1 correlation matrices

702 703 704

(

AC

( )

with the ( -

( )

)

(

) element expressing the cross-correlation coefficients

and - process. Of course the diagonal in ( ) is always symmetric while ( ) is not.

)

(34)

( ) for lags 0 or 1 between the

( ) refers to the lag-1 autocorrelation. Note that

28

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

705

Fig. 8. Hypothetical multivariate case of daily precipitation with , wind speed with , and relative humidity: (a-c) the the target marginal pdf’s of the three variables; (d-f) the corresponding ACTF’s; (g-i) the CCTF’s; and (j-l) synthetic times series that preserve the , the marginal distributions, and the lag-0 and lag-1 cross-correlations among them.

PT

706 707 708

The ACTF’s (Fig. 8d-f) show moderate concavity for the precipitation marginal, slight for wind, and essentially zero for relative humidity. The CCTF is concave for rainfall-wind (Fig. 8g) with estimated upper limit 0.80, while it is clearly convex for rainfall-humidity with limit 0.42 (Fig. 8h), and slightly convex for the wind-humidity case with limit 0.76 (Fig. 8i). These functions were used to estimate the pGp correlation matrices, i.e.,

AC

710 711 712 713 714

CE

709

( )

(

( )

)

29

(

)

(35)

ACCEPTED MANUSCRIPT

715 716

which accordingly were used to fit a multivariate AR(1) and generate a multivariate time series of 10000 values. Accordingly, as in the previous cases, each Gaussian time series was transformed into its target

717

process time series using the corresponding transformations

718 719 720 721

time series (Fig. 8j-l) preserve all desired properties, i.e., , marginal distributions, and the lag-0 and lag-1 correlation matrices, while the actual “topology” of the correlation structure can be seen in the scatter plots of Fig. A.3. Of course, it is straightforward to apply this framework for the multivariate cyclostationary case, and use higher order MAR models if desired.

722

Summary and Conclusions

723 724 725 726 727 728 729 730

Natural processes have a “dynamic” randomness unveiled by structural dependencies in time and/or space and by probability laws governing the frequency of their values. There is no unique correlation structure or probability law to describe the pluralism and the polyphony observed in natural processes. Therefore, and in order to avoid ad hoc solutions, we need a general stochastic framework capable of reproducing this “dynamic” randomness irrespective of the correlation structure or the marginal probability law. The framework presented here, unifies, introduces new elements, extends and simplifies previous attempts proposing a single approach for stochastic modelling of any hydroclimatic process and beyond.

731 732 733 734 735 736 737

The basic assumption is that a stationary or cyclostationary process * ( )+, with any prescribed marginal distribution ( ) and any linear correlation structure ( ) has a parent-Gaussian process that can be easily assessed. For the multivariate case, this implies that a set of processes have a corresponding set of parent-Gaussian processes. In this direction, a unified framework is introduced based on simple analytical auto- and cross-correlation transformation functions that enable fast and easy estimation of the parent-Gaussian processes avoiding iterative and case specific methods proposed in the past.

738 739 740 741 742 743 744 745 746 747 748

The method is applied in a large variety of hydroclimatic processes, ranging from intermittent variables like precipitation and wind speed, to processes with heavy-tail marginals like river discharge, and to processes with bounded marginals like relative humidity, or even to discrete and binary cases. The versatility of the method is also demonstrated in a multivariate case where processes with very different marginals and linear correlation structures are reproduced exactly. As a final remark, the framework proposed enables an easy estimation of the limit or the maximum feasible cross-correlation between two processes and reveals that when their marginals differ significantly in shape, then their linear cross-correlation cannot be intense. The same remark holds also for single processes, and although there is no limit and the autocorrelation can reach up to 1, processes with very different marginals from the Gaussian, e.g., highly skewed and heavy-tailed, will be observed with weak linear autocorrelation structures, indicating, maybe, that nature likes to “hide” its internal structure.

.

( ( )/. The synthetic

AC

CE

PT

ED

M

AN US

CR IP T

( )

30

ACCEPTED MANUSCRIPT

Acknowledgements

750 751 752

I honestly thank: Y Markonis for providing the river discharge, wind and humidity data as well as for his general support; KM Andreadis for spotting many typos; and the reviewers for meticulously reading the manuscript and providing constructive remarks which led in a significantly improved version.

753

References

754 755

Aghakouchak, A., Habib, E., Bárdossy, A., 2010. A comparison of three remotely sensed rainfall ensemble generators. Atmospheric Research 98, 387–399.

756 757

Bárdossy, A., Pegram, G.G.S., 2009. Copula based multisite model for daily precipitation simulation. Hydrology and Earth System Sciences 13, 2299.

758 759

Bardossy, A., Plate, E.J., 1992. Space-time model for daily rainfall using atmospheric circulation patterns. Water Resour. Res. 28, 1247–1259. https://doi.org/10.1029/91WR02589

760 761

Bartolini, P., Salas, J.D., Obeysekera, J.T.B., 1988. Multivariate periodic ARMA (1, 1) processes. Water Resources Research 24, 1237–1246.

762

Beran, J., 1994. Statistics for Long-Memory Processes. CRC Press.

763 764

Bowers, M.C., Tung, W.W., Gao, J.B., 2012. On the distributions of seasonal river flows: Lognormal or power law? Water Resour. Res. 48, W05536. https://doi.org/10.1029/2011WR011308

765 766

Box, G.E.P., Jenkins, G.M., Reinsel, G.C., 2008. Time Series Analysis: Forecasting and Control, 4 edition. ed. Wiley, Hoboken, N.J.

767 768 769

Breinl, K., Turkington, T., Stowasser, M., 2015. Simulating daily precipitation and temperature: a weather generation framework for assessing hydrometeorological hazards. Meteorological Applications 22, 334–347.

770 771 772

Bruhn, J.A., Fry, W.E., Fick, G.W., 1979. Simulation of Daily Weather Data Using Theoretical Probability Distributions. J. Appl. Meteor. 19, 1029–1036. https://doi.org/10.1175/15200450(1980)019<1029:SODWDU>2.0.CO;2

773 774

Buell, C.E., 1972. Correlation Functions for Wind and Geopotential on Isobaric Surfaces. J. Appl. Meteor. 11, 51–59. https://doi.org/10.1175/1520-0450(1972)011<0051:CFFWAG>2.0.CO;2

777 778 779

AN US

M

ED

PT

CE

AC

775 776

CR IP T

749

Buishand, T.A., 1978. Some remarks on the use of daily rainfall models. Journal of Hydrology 36, 295– 308. https://doi.org/10.1016/0022-1694(78)90150-6 Burr, I.W., 1942. Cumulative Frequency Functions. The Annals of Mathematical Statistics 13, 215–232. Cario, M.C., Nelson, B.L., 1998. Numerical methods for fitting and simulating autoregressive-to-anything processes. INFORMS Journal on Computing 10, 72–81.

31

ACCEPTED MANUSCRIPT

Cario, M.C., Nelson, B.L., 1997. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois.

783 784 785

Cârsteanu, A., Foufoula-Georgiou, E., 1996. Assessing dependence among weights in a multiplicative cascade model of temporal rainfall. J. Geophys. Res. 101, 26363–26370. https://doi.org/10.1029/96JD01657

786 787 788

Carta, J.A., Ramírez, P., Velázquez, S., 2009. A review of wind speed probability distributions used in wind energy analysis. Renewable and Sustainable Energy Reviews 13, 933–955. https://doi.org/10.1016/j.rser.2008.05.005

789 790

Chang, K.-F., 1996. Strictly Positive Definite Functions. Journal of Approximation Theory 87, 148–158. https://doi.org/10.1006/jath.1996.0097

791 792

Chung, C., Salas, J.D., 2000. Drought occurrence probabilities and risks of dependent hydrologic processes. Journal of Hydrologic Engineering 5, 259–268.

793 794 795

Ciach, G.J., Krajewski, W.F., 2006. Analysis and modeling of spatial correlation structure in small-scale rainfall in Central Oklahoma. Advances in Water Resources 29, 1450–1463. https://doi.org/10.1016/j.advwatres.2005.11.003

796 797 798

Cowpertwait, P., Isham, V., Onof, C., 2007. Point process models of rainfall: developments for fine-scale structure. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 463, 2569–2587. https://doi.org/10.1098/rspa.2007.1889

799 800

Davison, A.C., Padoan, S.A., Ribatet, M., 2012. Statistical Modeling of Spatial Extremes. Statistical Science 27, 161–186.

801 802

Demirtas, H., 2017. Concurrent generation of binary and nonnormal continuous data through fifth-order power polynomials. Communications in Statistics-Simulation and Computation 46, 344–357.

803 804

Demirtas, H., 2014. Joint generation of binary and nonnormal continuous data. Journal of Biometrics & Biostatistics 1.

805 806

Demirtas, H., Hedeker, D., 2011. A practical way for computing approximate lower and upper correlation bounds. The American Statistician 65, 104–109.

807 808 809

Dolloff, J., Lofy, B., Sussman, A., Taylor, C., 2006. Strictly positive definite correlation functions. Presented at the Signal Processing, Sensor Fusion, and Target Recognition XV, International Society for Optics and Photonics, p. 62351A. https://doi.org/10.1117/12.663967

812 813 814 815

AN US

M

ED

PT

CE

AC

810 811

CR IP T

780 781 782

Embrechts, P., McNeil, A., Straumann, D., 2002. Correlation and dependence in risk management: properties and pitfalls. Risk management: value at risk and beyond 176223. Emrich, L.J., Piedmonte, M.R., 1991. A method for generating high-dimensional multivariate binary variates. The American Statistician 45, 302–304. Feller, W., 1971. Introduction to the Theory of Probability and its Applications, Vol. 2. II (2. Ed.) New York: Wiley. 32

ACCEPTED MANUSCRIPT

Ferrari, P.A., Barbiero, A., 2012. Simulating Ordinal Data. Multivariate Behavioral Research 47, 566–589. https://doi.org/10.1080/00273171.2012.692630

818 819

Foufoula-Georgiou, E., Krajewski, W., 1995. Recent advances in rainfall modeling, estimation, and forecasting. Rev. Geophys. 33, 1125–1137. https://doi.org/10.1029/95RG00338

820 821

Foufoula-Georgiou, E., Lettenmaier, D.P., 1987. A Markov Renewal Model for rainfall occurrences. Water Resources Research 23, 875–884. https://doi.org/10.1029/WR023i005p00875

822 823

Fréchet, M., 1957. Les tableaux de corrélation et les programmes linéaires. Revue de l’Institut international de statistique 23–40.

824 825 826

Geng, S., Penning de Vries, F.W.T., Supit, I., 1986. A simple method for generating daily rainfall data. Agricultural and Forest Meteorology 36, 363–376. https://doi.org/10.1016/01681923(86)90014-6

827 828

Georgakakos, K.P., Carsteanu, A.A., Sturdevant, P.L., Cramer, J.A., 1994. Observation and analysis of Midwestern rain rates. Journal of applied meteorology 33, 1433–1444.

829 830

Gneiting, T., 1999. Correlation functions for atmospheric data analysis. Q.J.R. Meteorol. Soc. 125, 2449– 2464. https://doi.org/10.1002/qj.49712555906

831 832

Gneiting, T., Schlather, M., 2004. Stochastic Models That Separate Fractal Dimension and the Hurst Effect. SIAM Rev. 46, 269–282. https://doi.org/10.1137/S0036144501394387

833 834

Granger, C.W., Joyeux, R., 1980. An introduction to long-memory time series models and fractional differencing. Journal of time series analysis 1, 15–29.

835 836

Grimaldi, S., Serinaldi, F., 2006. Asymmetric copula in multivariate flood frequency analysis. Advances in Water Resources 29, 1155–1167.

837 838

Hammer, G.R., Steurer, P.M., 1997. Data set documentation for hourly precipitation data. NOAA/NCDC TD3240 Documentation Series, Asheville, NC 18.

839 840

Hering, A.S., Kazor, K., Kleiber, W., 2015. A Markov-switching vector autoregressive stochastic wind generator for multiple spatial and temporal scales. Resources 4, 70–92.

841 842

Herr, H.D., Krzysztofowicz, R., 2005. Generic probability distribution of rainfall in space: The bivariate model. Journal of Hydrology 306, 234–263.

843 844

Higham, N.J., 2002. Computing the nearest correlation matrix—a problem from finance. IMA J Numer Anal 22, 329–343. https://doi.org/10.1093/imanum/22.3.329

AN US

M

ED

PT

CE

AC

845 846

CR IP T

816 817

Higham, N.J., 1988. Computing a nearest symmetric positive semidefinite matrix. Linear Algebra and its Applications 103, 103–118. https://doi.org/10.1016/0024-3795(88)90223-6

847 848

Hoeffding, W., 1994. Scale—invariant correlation theory, in: The Collected Works of Wassily Hoeffding. Springer, pp. 57–107.

849 850

Hosking, J.R., 1984. Modeling persistence in hydrological time series using fractional differencing. Water resources research 20, 1898–1908. 33

ACCEPTED MANUSCRIPT

Hosking, J.R., 1981. Fractional differencing. Biometrika 68, 165–176.

852 853

Hosking, J.R.M., 1994. The four-parameter kappa distribution. IBM Journal of Research and Development 38, 251–258.

854 855

Hosking, J.R.M., 1990. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. Journal of the Royal Statistical Society. Series B (Methodological) 52, 105–124.

856 857

Hurst, H.E., 1951. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770–808.

858 859

Huser, R., Davison, A.C., Genton, M.G., 2016. Likelihood estimators for multivariate extremes. Extremes 19, 79–103. https://doi.org/10.1007/s10687-015-0230-4

860 861

Huser, R., Genton, M.G., 2016. Non-Stationary Dependence Structures for Spatial Extremes. JABES 21, 470–491. https://doi.org/10.1007/s13253-016-0247-4

862 863

Iliopoulou, T., Papalexiou, S.M., Markonis, Y., Koutsoyiannis, D., 2016. Revisiting long-range dependence in annual precipitation. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2016.04.015

864 865

Istok, J.D., Boersma, L., 1989. A stochastic cluster model for hourly precipitation data. Journal of Hydrology 106, 257–285. https://doi.org/10.1016/0022-1694(89)90076-0

866 867 868

Justus, C.G., Hargraves, W.R., Mikhail, A., Graber, D., 1978. Methods for Estimating Wind Speed Frequency Distributions. J. Appl. Meteor. 17, 350–353. https://doi.org/10.1175/15200450(1978)017<0350:MFEWSF>2.0.CO;2

869

Kendall, M., Stuart, A., 1979. Handbook of Statistics. Griffin & Company, London.

870 871 872

Kilsby, C.G., Jones, P.D., Burton, A., Ford, A.C., Fowler, H.J., Harpham, C., James, P., Smith, A., Wilby, R.L., 2007. A daily weather generator for use in climate change studies. Environmental Modelling & Software 22, 1705–1719.

873 874 875

Kleiber, W., Katz, R.W., Rajagopalan, B., 2012. Daily spatiotemporal precipitation simulation using latent and transformed Gaussian processes. Water Resour. Res. 48, W01523. https://doi.org/10.1029/2011WR011105

876 877 878

Klein Tank, A.M.G., Können, G.P., 2003. Trends in Indices of Daily Temperature and Precipitation Extremes in Europe, 1946–99. J. Climate 16, 3665–3680. https://doi.org/10.1175/15200442(2003)016<3665:TIIODT>2.0.CO;2

879 880

Kroll, C.N., Vogel, R.M., 2002. Probability distribution of low streamflow series in the United States. Journal of Hydrologic Engineering 7, 137–146.

883 884

AN US

M

ED

PT

CE

AC

881 882

CR IP T

851

Kugiumtzis, D., 2002. Statically transformed autoregressive process and surrogate data test for nonlinearity. Physical Review E 66, 025201. Kumar, P., Foufoula-Georgiou, E., 1997. Wavelet analysis for geophysical applications. Rev. Geophys. 35, 385–412. https://doi.org/10.1029/97RG00427

34

ACCEPTED MANUSCRIPT

Kumaraswamy, P., 1980. A generalized probability density function for double-bounded random processes. Journal of Hydrology 46, 79–88. https://doi.org/10.1016/0022-1694(80)90036-0

887 888 889

Lebrun, R., Dutfoy, A., 2009. An innovating analysis of the Nataf transformation from the copula viewpoint. Probabilistic Engineering Mechanics 24, 312–320. https://doi.org/10.1016/j.probengmech.2008.08.001

890 891

Lee, T., Salas, J.D., 2011. Copula-based stochastic simulation of hydrological data applied to Nile River flows. Hydrology Research 42, 318–330.

892 893

Li, H., Lü, Z., Yuan, X., 2008. Nataf transformation based point estimate method. Chin. Sci. Bull. 53, 2586. https://doi.org/10.1007/s11434-008-0351-0

894 895 896

Liu, P.-L., Der Kiureghian, A., 1986. Multivariate distribution models with prescribed marginals and covariances. Probabilistic Engineering Mechanics 1, 105–112. https://doi.org/10.1016/02668920(86)90033-0

897 898 899

Lombardo, F., Volpi, E., Koutsoyiannis, D., Papalexiou, S.M., 2013. Just two moments! A cautionary note against use of high-order moments in multifractal models in hydrology. Hydrology and Earth System Sciences Discussions 10, 4627–4654. https://doi.org/10.5194/hessd-10-4627-2013

900 901

Macke, J.H., Berens, P., Ecker, A.S., Tolias, A.S., Bethge, M., 2009. Generating spike trains with specified correlation coefficients. Neural computation 21, 397–423.

902

Mandelbrot, B.B., 1971. A Fast Fractional Gaussian Noise Generator. Water Resour. Res. 7, 543–553.

903 904

Mandelbrot, B.B., Wallis, J.R., 1969. Some long-run properties of geophysical records. Water Resour. Res. 5, 321–340. https://doi.org/10.1029/WR005i002p00321

905 906

Mandelbrot, B.B., Wallis, J.R., 1968. Noah, Joseph, and Operational Hydrology. Water Resour. Res. 4, 909–918. https://doi.org/10.1029/WR004i005p00909

907 908 909

Markonis, Y., Moustakis, Y., Nasika, C., Sychova, P., Dimitriadis, P., Hanel, M., Máca, P., Papalexiou, S.M., 2018. Global estimation of long-term persistence in annual river runoff. Advances in Water Resources 113, 1–12. https://doi.org/10.1016/j.advwatres.2018.01.003

910 911 912

Mehrotra, R., Sharma, A., 2009. Evaluating spatio-temporal representations in daily rainfall sequences from three stochastic multi-site weather generation approaches. Advances in Water Resources 32, 948–962.

913 914

Mielke Jr, P.W., 1973. Another Family of Distributions for Describing and Analyzing Precipitation Data. Journal of Applied Meteorology 12, 275–280.

918 919

AN US

M

ED

PT

CE

AC

915 916 917

CR IP T

885 886

Mielke Jr, P.W., Johnson, E.S., 1974. Some generalized beta distributions of the second kind having desirable application features in hydrology and meteorology. Water Resources Research 10, 223–226. Mielke Jr, P.W., Johnson, E.S., 1973. Three-Parameter Kappa Distribution Maximum Likelihood Estimates and Likelihood Ratio Tests. Monthly Weather Review 101, 701–707.

35

ACCEPTED MANUSCRIPT

Molotch, N.P., Colee, M.T., Bales, R.C., Dozier, J., 2005. Estimating the spatial distribution of snow water equivalent in an alpine basin using binary regression tree models: the impact of digital elevation data and independent variable selection. Hydrological Processes 19, 1459–1479.

923 924 925

Montanari, A., Rosso, R., Taqqu, M.S., 1997. Fractionally differenced ARIMA models applied to hydrologic time series: Identification, estimation, and simulation. Water resources research 33, 1035–1044.

926 927 928

Nataf, A., 1962. Statistique mathematique-determination des distributions de probabilites dont les marges sont donnees. COMPTES RENDUS HEBDOMADAIRES DES SEANCES DE L ACADEMIE DES SCIENCES 255, 42.

929 930

Neumaier, A., Schneider, T., 2001. Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software (TOMS) 27, 27–57.

931 932 933

Onof, C., Chandler, R.E., Kakou, A., Northrop, P., Wheater, H.S., Isham, V., 2000. Rainfall modelling using Poisson-cluster processes: a review of developments. Stochastic Environmental Research and Risk Assessment 14, 384–411. https://doi.org/10.1007/s004770000043

934 935 936

Onof, C., Townend, J., Kee, R., 2005. Comparison of two hourly to 5-min rainfall disaggregators. Atmospheric Research, Precipitation in Urban Areas6th International Workshop on Precipitation in Urban Areas 77, 176–187. https://doi.org/10.1016/j.atmosres.2004.10.022

937 938 939 940

Onof, C., Wheater, H.S., 1993. Modelling of British rainfall using a random parameter Bartlett-Lewis Rectangular Pulse Model. Journal of Hydrology, Conventry Groundwater Investigation: Sources and Movement of Chlorinated Hydrocarbon Solvents 149, 67–95. https://doi.org/10.1016/00221694(93)90100-N

941 942

Over, T.M., Gupta, V.K., 1996. A space-time theory of mesoscale rainfall using random cascades. J. Geophys. Res. 101, 26319–26331. https://doi.org/10.1029/96JD02033

943 944 945

Papalexiou, S.M., Dialynas, Y.G., Grimaldi, S., 2016. Hershfield factor revisited: Correcting annual maximum precipitation. Journal of Hydrology 542, 884–895. https://doi.org/10.1016/j.jhydrol.2016.09.058

946 947 948

Papalexiou, S.M., Koutsoyiannis, D., 2016. A global survey on the seasonal variation of the marginal distribution of daily precipitation. Advances in Water Resources 94, 131–145. https://doi.org/10.1016/j.advwatres.2016.05.005

949 950

Papalexiou, S.M., Koutsoyiannis, D., 2013. Battle of extreme value distributions: A global survey on extreme daily rainfall. Water Resour. Res. 49, 187–201. https://doi.org/10.1029/2012WR012557

954 955 956

AN US

M

ED

PT

CE

AC

951 952 953

CR IP T

920 921 922

Papalexiou, S.M., Koutsoyiannis, D., 2012. Entropy based derivation of probability distributions: A case study to daily rainfall. Advances in Water Resources, Space-Time Precipitation from Urban Scale to Global Change 45, 51–57. https://doi.org/10.1016/j.advwatres.2011.11.007 Papalexiou, S.M., Koutsoyiannis, D., Makropoulos, C., 2013. How extreme is extreme? An assessment of daily rainfall distribution tails. Hydrol. Earth Syst. Sci. 17, 851–862. https://doi.org/10.5194/hess-17-851-2013 36

ACCEPTED MANUSCRIPT

Papalexiou, S.M., Koutsoyiannis, D., Montanari, A., 2011. Can a simple stochastic model generate rich patterns of rainfall events? Journal of Hydrology 411, 279–289. https://doi.org/10.1016/j.jhydrol.2011.10.008

960 961

Papoulis, A., Pillai, S.U., 2002. Probability, random variables, and stochastic processes. Tata McGraw-Hill Education.

962 963

Park, J.-S., Seo, S.-C., Kim, T.Y., 2009. A kappa distribution with a hydrological application. Stoch Environ Res Risk Assess 23, 579–586. https://doi.org/10.1007/s00477-008-0243-5

964 965 966

Paschalis, A., Molnar, P., Fatichi, S., Burlando, P., 2014. On temporal stochastic modeling of precipitation, nesting models across scales. Advances in Water Resources 63, 152–166. https://doi.org/10.1016/j.advwatres.2013.11.006

967 968

Ribatet, M., 2013. Spatial extremes: Max-stable processes at work. Journal de la Société Française de Statistique 154, 156–177.

969 970 971

Rodriguez-Iturbe, I., Cox, D.R., Isham, V., 1987. Some Models for Rainfall Based on Stochastic Point Processes. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 410, 269–288. https://doi.org/10.1098/rspa.1987.0039

972

Salas, J.D., 1980. Applied modeling of hydrologic time series. Water Resources Publication.

973 974

Salas, J.D., Boes, D.C., Smith, R.A., 1982. Estimation of ARMA Models with seasonal parameters. Water Resour. Res. 18, 1006–1010. https://doi.org/10.1029/WR018i004p01006

975 976

Salas, J.D., Sveinsson, O.G., Lane, W.L., Frevert, D.K., 2006. Stochastic streamflow simulation using SAMS-2003. Journal of irrigation and drainage engineering 132, 112–122.

977 978

Schlögl, A., 2006. A comparison of multivariate autoregressive estimators. Signal processing 86, 2426– 2429.

979 980 981

Schneider, T., Neumaier, A., 2001. Algorithm 808: ARfit—A Matlab package for the estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Transactions on Mathematical Software (TOMS) 27, 58–65.

982 983 984

Seguro, J.V., Lambert, T.W., 2000. Modern estimation of the parameters of the Weibull wind speed distribution for wind energy analysis. Journal of Wind Engineering and Industrial Aerodynamics 85, 75–84. https://doi.org/10.1016/S0167-6105(99)00122-1

985 986

Serinaldi, F., 2009. A multisite daily rainfall generator driven by bivariate copula-based mixed distributions. Journal of Geophysical Research: Atmospheres 114.

AN US

M

ED

PT

CE

AC

987 988

CR IP T

957 958 959

Serinaldi, F., Kilsby, C.G., 2016. A Blueprint for Full Collective Flood Risk Estimation: Demonstration for European River Flooding. Risk Analysis.

989 990

Serinaldi, F., Kilsby, C.G., 2014a. Simulating daily rainfall fields over large areas for collective risk estimation. Journal of Hydrology 512, 285–302.

991 992

Serinaldi, F., Kilsby, C.G., 2014b. Rainfall extremes: Toward reconciliation after the battle of distributions. Water Resour. Res. 50, 336–352. https://doi.org/10.1002/2013WR014211 37

ACCEPTED MANUSCRIPT

Serinaldi, F., Kilsby, C.G., Lombardo, F., 2018. Untenable nonstationarity: An assessment of the fitness for purpose of trend tests in hydrology. Advances in Water Resources 111, 132–155. https://doi.org/10.1016/j.advwatres.2017.10.015

996 997

Serinaldi, F., Lombardo, F., 2017. General simulation algorithm for autocorrelated binary processes. Physical Review E 95, 023312.

998 999

Shao, Q., Lund, R., 2004. Computation and characterization of autocorrelations and partial autocorrelations in periodic ARMA models. Journal of Time Series Analysis 25, 359–372.

CR IP T

993 994 995

Sirangelo, B., Versace, P., De Luca, D.L., 2007. Rainfall nowcasting by at site stochastic model PRAISE. Hydrology and Earth System Sciences Discussions 4, 151–177.

1002 1003

Smith, R.L., Schreiber, H.A., 1974. Point Processes of Seasonal Thunderstorm Rainfal 2. Rainfall Depth Probabilities. Water Resources Research.

1004 1005

Stacy, E.W., 1962. A Generalization of the Gamma Distribution. The Annals of Mathematical Statistics 33, 1187–1192.

1006 1007

Stacy, E.W., Mihram, G.A., 1965. Parameter Estimation for a Generalized Gamma Distribution. Technometrics 7, 349–358.

1008 1009 1010

Stephenson, A.G., Lehmann, E.A., Phatak, A., 2016. A max-stable process model for rainfall extremes at different accumulation durations. Weather and Climate Extremes 13, 44–53. https://doi.org/10.1016/j.wace.2016.07.002

1011 1012

Stern, R.D., Coe, R., 1984. A Model Fitting Analysis of Daily Rainfall Data. Journal of the Royal Statistical Society. Series A (General) 147, 1. https://doi.org/10.2307/2981736

1013 1014 1015

Swift, L.W., Schreuder, H.T., 1981. Fitting Daily Precipitation Amounts Using the SB Distribution. Monthly Weather Review 109, 2535–2540. https://doi.org/10.1175/15200493(1981)109<2535:FDPAUT>2.0.CO;2

1016 1017

Tadikamalla, P.R., 1980. A Look at the Burr and Related Distributions. International Statistical Review / Revue Internationale de Statistique 48, 337–344.

1018 1019

Todorovic, P., Woolhiser, D.A., 1975. A stochastic model of n-day precipitation. Journal of Applied Meteorology 14, 17–24.

1020 1021 1022

Tsoukalas, I., Efstratiadis, A., Makropoulos, C., 2018. Stochastic Periodic Autoregressive to Anything (SPARTA): Modeling and Simulation of Cyclostationary Processes With Arbitrary Marginal Distributions. Water Resour. Res. n/a-n/a. https://doi.org/10.1002/2017WR021394

1025 1026 1027

M

ED

PT

CE

AC

1023 1024

AN US

1000 1001

van Montfort, M.A.J., 1990. Sliding maxima. Journal of Hydrology 118, 77–85. https://doi.org/10.1016/0022-1694(90)90251-R Verhoest, N., Troch, P.A., De Troch, F.P., 1997. On the applicability of Bartlett–Lewis rectangular pulses models in the modeling of design storms at a point. Journal of Hydrology 202, 108–120. https://doi.org/10.1016/S0022-1694(97)00060-7

38

ACCEPTED MANUSCRIPT

Villarini, G., Seo, B.-C., Serinaldi, F., Krajewski, W.F., 2014. Spatial and temporal modeling of radar rainfall uncertainties. Atmospheric research 135, 91–101.

1030 1031 1032

Waymire, E., Gupta, V.K., 1981. The mathematical structure of rainfall representations: 1. A review of the stochastic rainfall models. Water Resources Research 17, 1261–1272. https://doi.org/10.1029/WR017i005p01261

1033 1034 1035

Wheater, H.S., Chandler, R.E., Onof, C.J., Isham, V.S., Bellone, E., Yang, C., Lekkas, D., Lourmas, G., Segond, M.-L., 2005. Spatial-temporal rainfall modelling for flood risk estimation. Stoch Environ Res Ris Assess 19, 403–416. https://doi.org/10.1007/s00477-005-0011-8

1036 1037

Wilks, D.S., 1999. Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agricultural and Forest meteorology 96, 85–101.

1038 1039

Wilks, D.S., 1998. Multisite generalization of a daily stochastic precipitation generation model. Journal of Hydrology 210, 178–191. https://doi.org/10.1016/S0022-1694(98)00186-3

1040 1041

Wilks, D.S., Wilby, R.L., 1999. The weather generation game: a review of stochastic weather models. Progress in physical geography 23, 329–357.

1042 1043

Wilson, P.S., Toumi, R., 2005. A fundamental probability distribution for heavy rainfall. Geophys. Res. Lett. 32, L14812. https://doi.org/10.1029/2005GL022465

1044 1045

Xiao, Q., 2014. Evaluating correlation coefficient for Nataf transformation. Probabilistic Engineering Mechanics 37, 1–6. https://doi.org/10.1016/j.probengmech.2014.03.010

1046 1047

Yaglom, A.M., 2012. Correlation Theory of Stationary and Related Random Functions: Supplementary Notes and References. Springer Science & Business Media.

1048 1049

Yao, A.Y.M., 1974. A Statistical Model for the Surface Relative Humidity. J. Appl. Meteor. 13, 17–21. https://doi.org/10.1175/1520-0450(1974)013<0017:ASMFTS>2.0.CO;2

1050 1051

Youngman, B.D., Stephenson, D.B., 2016. A geostatistical extreme-value framework for fast simulation of natural hazard events, in: Proc. R. Soc. A. The Royal Society, p. 20150855.

AC

CE

PT

ED

M

AN US

CR IP T

1028 1029

39

ACCEPTED MANUSCRIPT

Appendix A

ED

M

AN US

CR IP T

1052

CE

Fig. A.1. Probability plots for all months related to the hourly precipitation example described in Section 0. The fitted distributions (estimated parameters are given in the legend) to the observed nonzero data are shown as well as the empirical distributions of the synthetic data.

AC

1054 1055 1056

PT

1053

40

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

1057

CE

PT

Fig. A.2. Correlograms for all months related to the hourly precipitation example described in Section 0. The PII ACS was fitted to empirical values (estimated parameters are given in the legend) while for comparison the empirical ACS’s of the synthetic time series are also shown.

AC

1058 1059 1060

41

ACCEPTED MANUSCRIPT

1063 1064

Fig. A.3. Scatter plots related to the multivariate simulation example described in Section 0. The plots express the correlations given in the lag-0 and lag-1 correlation matrices.

AC

1062

CE

PT

ED

M

AN US

CR IP T

1061

42