Multivariate skew-t approach to the design of accumulation risk scenarios for the flooding hazard

Multivariate skew-t approach to the design of accumulation risk scenarios for the flooding hazard

Advances in Water Resources 33 (2010) 1243–1255 Contents lists available at ScienceDirect Advances in Water Resources j o u r n a l h o m e p a g e ...

2MB Sizes 2 Downloads 113 Views

Advances in Water Resources 33 (2010) 1243–1255

Contents lists available at ScienceDirect

Advances in Water Resources j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / a d v wa t r e s

Multivariate skew-t approach to the design of accumulation risk scenarios for the flooding hazard Tatiana Ghizzoni a, Giorgio Roth b,⁎, Roberto Rudari c a b c

Corporate Underwriting — Geo Risks, Munich Re, 80802 Munich, Germany University of Genoa, 16145 Genoa, Italy CIMA Foundation, 17100 Savona, Italy

a r t i c l e

i n f o

Article history: Received 24 November 2009 Received in revised form 29 July 2010 Accepted 16 August 2010 Available online 31 August 2010 Keywords: Flood scenarios Skew-t distribution Flood risk management Multivariate processes Cross-entropy

a b s t r a c t The multivariate version of the skew-t distribution provides a powerful analytical description of the joint behavior of multivariate processes. It enjoys valuable properties: from the aptitude to model skewed as well as leptokurtic datasets to the availability of moments and likelihood analytical expressions. Moreover, it offers a wide range of extremal dependence strength, allowing for upper and lower tail dependence. The idea underneath this work is to employ the multivariate skew-t distribution to provide an estimation of the joint probability of flood events in a multi-site multi-basin approach. This constitutes the basis for the design and evaluation of flood hazard scenarios for large areas in terms of their intensity, extension and frequency, i.e. those information required by civil protection agencies to put in action mitigation strategies and by insurance companies to price the flooding risk and to evaluate portfolios. Performances of the skew-t distribution and the corresponding t copula function, introduced to represent the state of the art for multivariate simulations, are discussed with reference to the Tanaro Basin, North-western Italy. To enhance the characteristics of the correlation structure, three nested and non-nested gauging stations are selected with contributing areas from 1500 to 8000 km2. A dataset of 76 trivariate flood events is extracted from a mean daily discharges database available for the time period from January 1995 to December 2003. Applications include the generation of multivariate skew-t and t copula samples and models' comparison through the principle of minimum cross-entropy, here revised for the application to multivariate samples. Copula and skew-t based scenario return period estimations are provided for the November 1994 flood event, i.e. the worst on record in the 1801–2001 period. Results are encouraging: the skew-t distribution seems able to describe the joint behavior, being close to the observations. Marginal distributions derived from the skew-t multivariate fit are comparable to the observed ones, and the model is also able to describe the tail behavior. © 2010 Elsevier Ltd. All rights reserved.

1. Introduction One of the main consequences of the demographic and economic development and of markets and trades globalization is represented by risks cumulus. Risks are cumulated whenever a single realization of an event can lead, directly or indirectly, to a number of severe negative effects. In most cases, the cumulus of risks intuitively arises from the geographic concentration of a number of vulnerable elements in a single place, e.g., an urban, commercial or industrial district. Nevertheless, in many cases the cumulus of risks may be associated to financial or economic relations induced by the globalization process: by following those links, event's effects can emphasize and spread around the entire global society, hitting communities that are located far away from the area originally stroked. Finally, and primarily ⁎ Corresponding author. DIST, Università di Genova, Via Opera Pia, 13, 16145 Genova, Italy. Tel.: + 39 010 353 2983; fax: + 39 010 353 2948. E-mail address: [email protected] (G. Roth). 0309-1708/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.advwatres.2010.08.003

for natural events, risks cumulus can be associated, in addition to intensity, also to event's extension. In this case, the magnitude can be such that large areas, that may include many regions or even large portions of different countries, are stroked by single, catastrophic, events. Among natural risks, the impact of the flooding hazard on society cannot be understated: total economic losses produced by major floods in Europe over the 1970–2006 period amounted to 140 billion – expressed in 2006 US$ normalized values – with an average annual flood loss of 3.8 billion [12]. As a direct consequence, the flooding hazard influences land use and territorial planning and development at both the physical and regulatory levels. To cope with, a variety of mitigation actions can be put in place: from the improvement of monitoring and alert systems to the development of hydraulic structures, throughout land use restrictions, civil protection, financial and insurance plans. All of those viable options present social and economic impacts, either positive or negative, whose proper estimate should rely on the assumption of appropriate – present and future –

1244

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

flood risk scenarios, here defined as quantitative event descriptions in terms of i) the flood hazard scenario, with its probability of occurrence, extension, intensity, and duration, ii) the value of the exposed elements and iii) their vulnerability. An increase in the ability to provide reliable hazard and risk scenarios is further required by insurance and financial markets since they begun in merging their interest in seeking capital to recover from natural catastrophes. In fact, the evaluation of instruments such as weather derivatives and catastrophe bonds, as well as the pricing of insurances against the flooding risk or the evaluation of insurance companies' portfolio in terms of Probable Maximum Loss (PML) and Maximum Possible Loss (MPL), all require a deep knowledge of probabilistic concepts applied to hazard and risk [20,36–38]. At present, scientific and technical communities active in the hydrologic and hydraulic fields has devoted initial attention to the design of flood scenarios, or ensembles of them, and to the evaluation of their frequency of occurrence. Instead, hydrologic and hydraulic models are usually applied for the identification of flood prone areas, and their characterization in terms of return period. The resulting maps, which generally represent the ensemble of all floods associated to the assigned return period, are commonly used to direct structural and non-structural interventions to minimize the flooding risk. This approach presents some limits: it is a useful tool for risk analysis in case of a single element exposed to the hazard, but it is not appropriate for risk analysis in case of multiple exposures, i.e. when more than one element is exposed to the hazard. In this second case the ability to provide reliable scenarios is required, with a particular emphasis when the goal is the simulation of flood events over large areas. It is therefore necessary to identify proper statistical methodologies, able to describe the multivariate aspects of the involved physical processes and their spatial dependence. For multiple elements, the cumulated flood risk is therefore a matter of event's intensity and extension, associated to locations and vulnerabilities of the elements that constitute the portfolio. Not only in hydrology and meteorology, but also in finance and insurance practice, it has early been recognized that classical statistical theory distributions (e.g., the normal and gamma families) are of restricted use for modeling multivariate spatial data. In this context, copulas can be viewed as alternative tools for dealing with multivariate simulations. In fact, they represent an alternative way of formalizing dependence structures of random vectors (see, e.g., [66]). Although copulas have been known since the work of Sklar [66], they have been rediscovered relatively recently in applied sciences. The interest in the copula function has grown up, in fact, in different fields: mathematical finance, statistics, extreme values theory, and risk management among others. Many authors have applied copulas to provide statistical models for a variety of hydrological processes, from rainfall [17,58–60] to streamflow [11,18,21,27,28,55,62] through soil moisture and extreme value analyses [16,19,33,34,42,63] (for a comprehensive reference list on the subject see, e.g., the IAHS working group on Statistic in Hydrology website, www.stahy.org). The copula theory can overcome many of the disadvantages of the standard approaches to multivariate analysis (e.g., marginal distributions do not have to belong to the same family) but, although appealing, it also presents drawbacks, especially when the number of variate increases. Moreover, difficulties derive from the lack of objective methods for copula family selection, and from the absence of efficient inferential methods (see, e.g., [46,48,55,57]). Although many advances have been made formalizing the practical steps involved in the copulas application and in fitting and goodness of fit testing [23,24], still this relatively new path has to be reinforced. In fact, it is difficult to apply to copulas well-established inferential theoretical basis, such as maximum likelihood and other new procedures that are not directly linked to standard inferential statistical theory [55]. One goodness of fit test based on information entropy is proposed later in the paper.

On the other hand, recent research efforts have been directed towards developing statistical models capable of describing the forms of asymmetry manifest in data sets. This, in particular, for the quite frequent case of phenomena whose empirical outcome behaves in a non-normal fashion, but still maintains some broad similarity with the multivariate normal distribution. Fruitful approaches were recognized in the use of flexible models, which include the normal distribution as a special or limiting case (e.g., the skew-normal or skew-t distributions). When sampling from a symmetric, heavy tailed distribution or when robustness issues are present, the t distribution is a common choice [41]. Unfortunately, the symmetry of the t distribution prevents its use for modeling skewed datasets. Branco and Dey [13] generalized the t distribution as well as the skew-normal by introducing the skew-t distribution: an asymmetric version of the multivariate Student's t distribution. Azzalini and Capitanio [7] explored some of its probabilistic and statistical properties. They also obtained a more tractable form of its PDF. The application of the bivariate skew-t distribution to a 63 years time series of hourly sea levels has shown that it fits the data better than the normal distribution [25]. Other applications of the skew-t distribution include some income distributions [8], coastal flooding [68], portfolio selection [47] and financial time series [46,57]. Among the variety of extremes in natural processes that could probably be described through the skew-t approach, the present paper constitutes an attempt to provide an estimation of the joint probability distribution able to describe flood events in a multi-site multi-basin fashion, an issue that at present is not effectively solved, through the multivariate skew-t distribution, which allows to analytically define the joint probability distribution. Once the distribution is known, and its parameters calibrated and validated, multi-site multi-basin flood scenario design will be possible in terms of intensity, extension and frequency, and hazards finally estimated for given portfolios, composed of wherever located risks. In this framework, the empirical estimate of the joint distribution provided by the multivariate copula function will be first introduced, and subsequently used as reference for comparison purposes. 2. Multivariate statistic methods 2.1. Copula function The name copula was first used in Sklar [66] to describe a function that links a multidimensional distribution to its one-dimensional marginals. The mathematic formulation comes from Sklar [67] and Nelsen [50]. An n-dimensional copula is the joint Cumulative Distribution Function (CDF) of any n-dimensional random vector with standard uniform marginal distribution, i.e., a function C:[0,1]n → [0,1]. Sklar's theorem states that for any n-dimensional CDF, F, with marginal distribution functions F1, F2, …, Fn, there exist a copula, C, such that F ðx1 ;::::: xn Þ = C ðF1 ðx1 Þ;:::::;Fn ðxn ÞÞ:

ð1Þ

The C function is used to model dependence between variables, while marginal distribution can be described in the usual way. Consequently, the copula function can be used to derive a multivariate distribution with the desired marginal distribution. Conversely, let F be a multivariate distribution with marginal distribution F1, F2, …, Fn, it can be proven that C exist [66], which means that any multivariate distribution can be written in the form of Eq. (1). Moreover, if all the marginal CDFs, Fi, are continuous, then C is unique. Another immediate consequences of the Sklar's theorem is that, for every u = (u1, …, un)∈[0,1]n   −1 −1 C ðu1 ;::::: un Þ = F F1 ðu1 Þ;::::: ;Fn ðun Þ 1 is the generalized inverse of Fi. where F− i

ð2Þ

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

An important property of the copula is that it does not change under strictly increasing transformations of the marginals. This allows the transformation of marginals from one continuous distribution to another without changing the copula: if the marginal Xi has a CDF Fi, 1 then G− (Fi(Xi)) has CDF Gi, and the copula does not change, since i 1 both Fi and G− are increasing. i For given marginal distributions, infinity of multivariate joint distributions can be derived, and a parametric family has to be chosen to model the dependence. Among others, copulas related to elliptical distributions are very useful in practical applications since they have several properties of the multivariate normal distribution. The most well-known elliptical copulas are the multivariate Gaussian copula and the multivariate Student t copula. A brief description of the t copula is provided here, as it will be used in the following (for a more extensive list see, e.g., [61]). Recent works (e.g., [14]), have shown that the empirical fit of the t copula is generally superior to that of the Gaussian copula. The main reason is in the ability of the t copula to better capture the phenomenon of dependent extreme values. The d-dimensional random vector X = (X1, …, Xd) is said to have a (nonsingular) multivariate t distribution with υ degrees of freedom, mean vector μ and positive-definite dispersion matrix Σ, denoted X ~ td (υ, μ, Σ), if its density is given by   υ+d !−υ + d 2 ðx−μ Þ′ Σ−1 ðx−μ Þ 2 : f ðxÞ =  qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 + υ υ Γ ðπυÞd jΣj 2 Γ

ð3Þ

In this standard parameterization cov(X) = υ/(υ − 2)Σ, so that the covariance matrix is not equal to Σ, and it is only defined if υ N 2. As said before, the copula remains invariant under any series of strictly increasing transformations of the components of the random vector X, so it remains invariant also under a standardization of the marginal distributions. This means that the copula of a td(υ, μ, Σ) is identical to that of a td(υ, 0, P) distribution, where P is the correlation matrix implied by the dispersion matrix Σ. The unique copula is thus given by

t

Cυ;P ðuÞ = ∫

tυ−1 ðu1 Þ

−∞

…∫

tυ−1 ðud Þ

−∞

  υ+d !−υ Γ x′ P −1 x 2 υqffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 + υ Γ ðπυÞd jPj 2

+ d 2

dx

ð4Þ

1 denotes the quantile function of a standard univariate tυ where t− υ distribution. Finally, the Gaussian copula can be thought as a limiting case of the t copula as υ → ∞.

2.2. Skew-normal distribution

• when multivariate homoscedasticity is required, this often requires a different transformation from the one for normality. Alternatively, there exist several other parametric classes of multivariate distributions to choose from, although the choice is not as wide as in univariate case (for a review see, e.g., [29]). In the last decades there is a general tendency in the statistical literature towards more flexible methods, to represent features of the data as adequately as possible and reduce unrealistic assumptions [6]. This tendency leads to a growing interest in the literature of parametric families of multivariate distributions that represent departures from the multivariate normal family. Recent research efforts have been directed towards developing statistical models capable of describing the forms of asymmetry manifest in data sets, with special attention to upper and lower tail dependence, which describe the dependence among extremes. This, in particular, for the quite frequent case of phenomena whose empirical outcome behaves in a non-normal fashion, but still maintains some broad similarity with the multivariate normal distribution. Possible solutions might be flexible models, which include the normal distribution as a special or limiting case. One of such models is the skew-normal distribution. The term Skew-Normal (SN) is used to indicate a parametric class of probability distributions that extends the normal distribution by an additional shape parameter that regulates the skewness, allowing for a continuous variation from normality to non-normality. From a theoretical point of view, the SN class has the advantage of being mathematically tractable and of having a good number of properties in common with the standard normal distribution (e.g., SN densities are unimodal). The systematic treatment of the SN class in the scalar case is due to Azzalini [4,5]; subsequently, Azzalini and Dalla Valle [9] introduced a multivariate version of the skew-normal density, while Azzalini and Capitanio [6] examined further probabilistic properties of the distribution, investigating also the most relevant statistical aspects. The skew-normal distribution appears in stochastic frontier analysis [2] and in threshold models [3]. Some other applications of the SN class include the analysis of financial, property and labor markets [15,44,52], the study of the distribution of relative-price changes which influence the rate of inflation [10], and to model visual acuity data [43]. Mukhopadhyay and Vidakovic [49] chose it as a prior distribution in Bayesian analysis. It is important to cite Adcock and Shutes [1], as they present a theoretical exposition of a model based upon the multivariate skew-normal distribution for portfolio selection. A random variable X is standard skew-normal distributed with shape parameter λ, X ~ SN(λ), if its density is of the form φðx; λÞ = 2ϕðxÞΦðλ xÞ

For the treatment of continuous multivariate observations within a parametric approach, it is necessary to point out the overwhelming role played by the assumption of normality, which underlies most of methods used for multivariate analysis. A major reason for this is certainly due to the unrivalled mathematical tractability of the multivariate normal distribution, in particular its simplicity when dealing with fundamental operations like linear combinations, marginalization and conditioning, and indeed its closure under these operations. From a practical viewpoint, the most commonly adopted approach is the transformation of the variables to achieve multivariate normality. While in a number of cases this works satisfactorily, there are however problems: • transformations are usually on each component separately, and achievement of joint normality is only hoped for; • transformed variables are more difficult to deal with as for interpretation, especially when each variable is transformed using a different function;

1245

−∞ b xb∞;−∞ b λ b ∞

ð5Þ

where ϕ(⋅) and Φ(⋅) are the standard normal density and the cumulative distribution function of a standard normal distribution, respectively. When the shape parameter equals zero, φ(x;λ) is the Probability Density Function (PDF) of a normal distribution. As λ → + ∞ the distribution tends to the standard folded normal distribution, i.e. that of X = ∣Z∣, where Z ~ N(0,1). The distribution SN (−λ) is the reflection of SN(λ) in x = 0, and hence as λ → − ∞ the limiting density is the standard negative folded normal distribution, i.e. that of X = −∣Z∣. In practice, to fit real data it is possible to generalize the expression with the introduction of location and scale parameters, ξ and ψ, respectively, with ξ∈R and ψ N 0. Thus, if X ~ SN(λ), then Y = ξ + ψX is a skew-normal random variable, with PDF

fSN ðy;ξ;ψ;λÞ =

    2 y−ξ y−ξ ϕ Φ λ : ψ ψ ψ

ð6Þ

1246

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

The following properties come from the definition: 1. 2. 3. 4.

the PDF of the SN(0) is identical to the PDF of the N(0,1); as λ → + ∞, φ(x;λ) tends to 2ϕ(z)Iz N 0, which is the half normal PDF; if Z ~ SN(λ), then Z ~ SN(−λ); the PDF of an SN(λ) random variable is unimodal, because log (φ(y;λ)) is a concave function of y; 5. if Y ~ N(0,1) and Z ~ SN(λ) then ∣Z∣ and ∣Y∣ have the same PDF.

where Z is skew-normal distributed with ξ = 0 and V ~ χ2/υ, independent of Z. Some algebra leads to the density of Y, which is, in the multivariate case

T

−1

fY ð yÞ = 2tp ð y;υÞT1 λ ω

!1 = 2 ! υ+p ðy−ξÞ ;υ + p Qy + υ

ð11Þ

where: The multivariate extension of the above introduced univariate skew-normal distribution appears to be very important, since in the multivariate case there are few distributions available for dealing with non-normal data, primarily when the skewness of the marginals is quite moderate. In the standard form, a random vector y = (Y1,.... YP)T   is p-dimensional skew-normal, y e SNP Ω; λ , if it is continuous with PDF    T  φðyÞ = 2ϕP y; Ω Φ λ y

ð7Þ

  with y∈RP, and ϕP y; Ω denotes the PDF of the p-dimensional multivariate normal distribution with standardized marginals and correlation matrix Ω. In the same way, as in the univariate case, we can refer to λ as the vector of shape parameters, even if its derivation is more complex: when λ = 0, we recover the regular multinormal density. Introducing a full rank d × d covariance matrix Ω = (ωrs), it is possible to define    1 ω = diag ω1 ;:::::;ωp = diag ω11 ;:::::;ωpp 2

ð8Þ

and Ω = ω−1 Ωω−1 is the associated correlation matrix. Let ξ, λ ∈ Rp we can write   T −1 y e SNP ðξ; Ω; λÞ = 2ϕP ð y−ξ;ΩÞΦ λ ω ðy−ξÞ

ð9Þ

and make reference to ξ, Ω, λ as the location, dispersion and shape or skewness parameters, respectively. By varying λ, it is possible to obtain a variety of shapes. Applications of the skew-normal distribution to real datasets might be problematic due to its limited range of skewness and kurtosis, since the third and fourth standardized cumulants do not exceed one in absolute value [4]. Moreover, the Fisher information matrix is singular when the shape parameter equals zero and maximum likelihood estimate of the shape parameter might be infinite with positive sampling probability [6,51]. 2.3. Skew-t distribution When sampling from a symmetric, heavy tailed distribution or when robustness issues are present, the t distribution is a common choice [41]. The usual construction of the t distribution is via the ratio of a normal variate and an appropriate transformation of a chi-square. Unfortunately, the symmetry of the t distribution prevents its use for modeling skewed datasets. If one wants to introduce an asymmetric variant of the t distribution, a quite natural option is to replace the normal variate by a skew-normal one [4,5,9]. It is thus possible to link the skew-t distribution with the skew-normal one through

Y =ξ+

Z V1=2

ð10Þ

   1 ω = diag ω1 ;:::::;ωp = diag ω11 ;:::::;ωpp 2 T

−1

Qy = ðy−ξÞ Ω

tp ðy;υÞ =

ðy−ξÞ

ð12Þ ð13Þ

  υ+p   Qy − 2 1 Γ½ðυ + pÞ = 2 g Q ;υ = 1 + : p y υ jΩj1 = 2 jΩj1 = 2 Γðυ = 2ÞðυπÞd = 2 ð14Þ

Here tp(y,υ) is the PDF of a p-dimensional t variate with υ degrees of freedom. The notation T1(x;υ + p) denotes the scalar t cumulative distribution function with υ + p degrees of freedom. It is therefore possible to call fY(y) skew-t and write Y e Stp ðξ;Ω;λ;υÞ:

ð15Þ

The PDF of a multivariate skew-t reduces to the one of the multivariate t distribution tp(ξ, Ω, υ) when the shape parameter is null, i.e., λ = 0. Moreover, with υ → + ∞, the skew-normal distribution is a limiting case of the skew-t distribution. Alternative univariate skew-t distributions, constructed similarly to the so-called two-piece normal density, have been proposed [22,30]. Jones and Faddy [32] introduced further developments, based on a suitable transformation of a beta density. A multivariate form of the skew-t distribution has been proposed [31], but the associated inferential aspects have not been discussed. The alternative form of multivariate skew-t distribution considered by Sahu et al. [56] coincides with the presented PDF in the case p = 1; for general p, their density involves the multivariate t distribution function. The multivariate skew-t distribution enjoys various useful formal properties. Among them, one could notice that skewness is unbounded and kurtosis can take any values greater than 3. Hence, the skew-t distribution can model very skewed as well as very leptokurtic datasets. Moreover, we would like to call attention on the fact that, while the skew-normal family does not lead to fat tails and is therefore unsuitable for predicting extremes, the skew-t distribution is suitable for this usage [56]. As regard to extremal dependence in its multivariate version, it can be proven that the skew-t belongs to the one of the generating family, i.e., asymptotically dependent. However, within its limiting class, it can provide a wider range of extremal dependence strength than the generating distribution. In addition, the skew-t distribution allows different upper and lower tail dependence. All these characteristics are of crucial importance when extremes are of concern, as in the following application. Finally, analytical expressions of the moments and of the likelihood are also available. Moments' equations can be used, in the univariate and in the multivariate case, as an instrument for the estimation of the distribution's parameters. With reference to the inferential aspects, maximization of the likelihood or log-likelihood function must be accomplished numerically. To improve efficiency, the derivatives of the log-likelihood can be supplied to an optimization algorithm. For a discussion on the inferential properties of the skew-t distribution see Ghizzoni [26].

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

2.4. Model's comparison via the principle of minimum cross-entropy Describing multivariate joint distributions of flood events, no quantitative goodness of fit test is available to help in selecting the best model. In this work the principle of minimum cross-entropy will be used to compare model's results. Suppose to have an observed random sample q1, q2, …, qn: there may be many probability distributions consistent with given constraints (e.g., moments of the distribution). Out of those, it is necessary to choose the one that is nearest to the given distribution. The principle used to find the nearest distribution is similar to the principle of maximum entropy. The concept of information entropy was introduced by Shannon [65] as a measure for information lost in communication across a noisy channel. In our application, instead of saying that we should be as uncertain about the missing information as possible, we should be as near as possible to the supposed distribution. According to this principle, we need a measure of distance or discrepancy or divergence of a supposed distribution, P, from a given empirical distribution, Q. Such a measure was given by Kullback and Leibler [39], even before the principle of maximum entropy was explicitly stated. This measure is n

DðP : Q Þ = ∑ pi ln i=1

pi qi

ð16Þ

It can be easily verified that: • D(P:Q) ≥ 0; • D(P:Q) = 0 if and only if pi = qi for each i; • D(P:Q) is a convex function of p1, p2, …, pn so that when it is minimized its local minimum is also its global minimum. Therefore, minimizing D(P:Q) would be equivalent to maximizing Shannon's measure of entropy. According to Kullback's [40] principle of minimum cross-entropy, given a priority distribution Q it is necessary to choose the probability distribution P which minimizes the cross-entropy. For our purpose, the estimation of the cross-entropy can be used as a measure of distance between the observed sample and the simulated one via copula function or skew-t distribution. In particular, the distance D between the distributions in a n-variate framework has been evaluated by dividing the space of the observations in x1, x2, …, xn bins and calculating the empirical frequency of the simulated sample, p, and of the observed one, q, in each bin  x1

 x2

 xn

DðP : Q Þ = ∑ ∑ ::::: ∑ pðx1 ; x2 ;…;xn Þ ln x1 = 1 x2 = 1

xn = 1

pðx1 ;x2 ;…;xn Þ qðx1 ;x2 ;…;xn Þ

ð17Þ

where p(x1, x2, …, xn) is the simulated empirical probability for the x1, x2, …, xn bin, and q(x1, x2, …, xn) is the observed empirical probability for the x1, x2, …, xn bin. To evaluate the value of D one remark is necessary: the argument of the logarithm has to be defined, and then the empirical probability in each bin has to be greater than zero. In the univariate case, it is easy to satisfy this constrain by bounding the analysis between the minimum and the maximum observed values. Unfortunately, this is not sufficient in most multivariate cases: in fact, due to the correlation structure, there may be many bins with observed or simulated empirical probability equal to zero between the lower and upper bounds. The bins with null values don't have to be taken into account in the estimation of D; anyway a detailed analysis of this aspect has to be provided in order to compare the differences between the copula function and the skew-t. The aim is the estimation of the cross-entropy for the trivariate copula and the trivariate skew-t: the lower value should highlight which is the simulation closest to the observed sample.

1247

The first aspect to be taken into account is the number of bins in which the space of observations should be partitioned. It has been shown [64] that the optimal histogram bin size, which provides the most efficient and unbiased estimation of the probability density function, is achieved when W = 3:49σ ⋅ N

−13

ð18Þ

where W is the width of the histogram bin, σ is the standard deviation of the distribution, and N is the number of available samples. 3. Application to the Tanaro basin This work is intended to verify the ability of flexible models, specifically joint probability distributions pertaining to the skew-t family, to improve the description of multi-site flood events. In this framework, the copula function will be introduced to represent the state of the art for multivariate simulations. For the sake of simplicity in the results' analysis and presentation, performances of the skew-t and of the corresponding t copula will be discussed with reference to the trivariate case of the Tanaro basin, presented in Fig. 1. The proposed gauging station selection include locations i) close enough to provide some amount of cross-correlation among discharge values; ii) nested and non-nested to check model flexibility; and iii) closing drainage basins of different contributing area (here assumed as a surrogate variable for discharge) to account for both river and torrentlike behaviors. 3.1. Basin corography and relevant databases Tanaro is a 276 km-long river in North-western Italy. It rises in the Ligurian Alps, close to the border with France and is the most significant right-side tributary to the Po River in terms of length, drainage area (partly Alpine and partly Apennine) and discharge. At its junction with the Po River, the Tanaro drains about 8000 km2, 500 km2 of which are in mountainous terrain. Major tributaries are the Stura di Demonte (contributing area 1430 km2), Alto Tanaro (1840 km2 ), Belbo (480 km2 ), Bormida (1640 km2) and Orba (840 km2). Morphological variability allows for the identification of three main zones with characteristic behaviors. The mountain part, with an average 6% slope, very steep catchments and deep river beds; the mild part, with an average 1% slope, shallower river beds and mildly steep catchments; finally, the alluvial part, characterized by very low slope values. Unique among the Po right-side tributaries, the river has an Alpine origin. Nevertheless, the Ligurian Alps have a not high enough altitude, and are located too close to the sea, to allow for the formation of snow fields or glaciers large enough to provide a steady source of water during the dry season. Furthermore, the Alpine zone forms only a part of the basin drained by the Tanaro. The discharge is therefore subject to a great deal of variation, and the seasonal regime of the river is therefore more typical of an Appenine torrent, with maximum discharges in spring and autumn and a very small flow rate in summer. The river is highly prone to flooding. During the two hundred year period 1801–2001, the Tanaro basin was affected by floods on 136 occasions, the most devastating being that of November 1994 when the whole of the river valley was affected by severe flooding and the town of Alessandria was especially stricken [45]. For the purpose of the present work, discharge time series recorded in the following gauging stations have been selected: Tanaro at Alba with a drained area of 3415 km2, Bormida at Cassine with a drained area of 1508 km2, and Tanaro at Montecastello with a drained area of 7985 km2. The location of these three gauging stations is reported in Fig. 1, their main characteristics are summarized in Table 1. From these, one could notice that the Montecastello station is

1248

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

Fig. 1. Location map of the Tanaro basin, North-western Italy, and of the gauging stations of Alba (contributing area 3415 km2), Cassine (1508 km2), and Montecastello (7985 km2).

located downstream of both Alba and Cassine, and that the subcatchments drained by these two stations are not nested. Moreover, drainage areas of Alba and Cassine are significantly different. Different correlation structures can be therefore expected in analyzing multivariate joint behaviors. A dataset of mean daily discharges is available for the time period from January 1995 to December 2003 [54]. Dealing with multivariate approach to flood events, the construction of a proper statistical population is an open issue. The usual approach to univariate studies of extreme events is simply to extract a subset from the available observations, e.g. by selecting only annual maxima, and to fit to this new statistic population a proper distribution, e.g. Gumbel. In multivariate cases this procedure cannot be applied, since annual maxima are generally not synchronous, i.e. they are observed in different stations at different time. Selecting only the annual maxima, regardless of the time at which they are experienced, will modify the spatial correlation. Selecting stations' annual maxima plus concomitant discharge values will produce, at the

Table 1 Main characteristics for the selected gauging stations and their contributing drainage basins. Gauging station

Drainage basin

Area [km2]

Elevation [m a.s.l.]

Average basin elevation [m a.s.l.]

Flow concentration time [h]

Alba Cassine Montecastello

Tanaro Bormida Tanaro

3415 1508 7985

172 123 216

1050 493 663

14.5 9.5 22.5

annual scale of analysis, a non-homogeneous subset composed of a mixture of annual maxima and medium-range discharge values in the proportion 1 to (n − 1), where n is the number of stations. In the present work, main events are first chosen on the basis of the Montecastello record (i.e., the gauging station with the largest drained area) according to a peak over threshold methodology operating at the 0.80 quantile threshold. To create a trivariate sample, maximum discharge values observed for Alba and Cassine within a 3day time window are identified and associated to each Montecastello event. This first step singles out 62 trivariate events. Second, the event dataset is improved by adding severe but localized events for Alba and Cassine. For these stations, a 0.95 quantile threshold respectively selects 38 and 40 events, some of which already contained in the Montecastello events dataset. The overall procedure selects 76 trivariate events. 3.2. Copula function approach The first analysis has been obtained via the copula approach, based on the idea that a simple transformation can be made of each marginal in such a way that each transformed variable has a uniform distribution. As a consequence, the first step consists in the identification of the appropriate marginal distribution for each variate: the results of this analysis are shown, in terms of PP plot and density function, in Fig. 2. For the specific case under analysis, the generalized extreme value distribution has been identified as the best distribution for fitting the marginals; its estimated parameter sets are reported in Table 2. All the fits have been verified via the Kolmogorov– Smirnov and the Anderson–Darling tests (KS p-values are 0.77, 0.74

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

1249

Fig. 2. Marginal distribution fitting in terms of probability density function and PP plot for Alba, Cassine, and Montecastello gauging stations.

and 0.86 and AD p-values are 0.67, 0.51 and 0.69 respectively for Alba, Cassine and Montecastello). The second step consists in the evaluation of the rank of correlation among variables: the Spearman's Rho has been therefore calculated. Finally, the analysis of the transformed sample allows proper copula identification. In this case a t copula has been chosen; the estimated parameter set is reported in Table 3. A multivariate random sample has been generated from a t copula with rank of correlation ρ by using a Monte Carlo simulation, and each marginal retransformed by using the inverse of the CDF. This way a simulated sample congruent with the observed one in terms of both marginal distribution and correlation structure has been obtained (Fig. 3). 3.3. Skew-t approach The selected dataset has been analyzed by applying a trivariate skew-t distribution. Fig. 4 shows bivariate scatter plots for daily Table 2 Estimated parameters and 0.95 confidence intervals for the copula marginal GEV distributions. Gauging station

Shape

Scale

Location

Alba 0.37 (0.15÷0.60) 105.78 (84.04÷133.16) 148.11 (120.60÷175.62) Cassine 0.86 (0.48÷1.25) 56.05 (40.08÷78.38) 52.64 (36.33÷68.95) Montecastello 1.02 (0.63÷1.40) 150.37 (104.88÷215.59) 270.32 (227.85÷312.78) Table 3 Estimated parameters for the t copula and 0.95 confidence intervals for the degrees of freedom. Linear correlation matrix

Degrees of freedom

1.00 0.36 0.66 0.36 1.00 0.78 0.66 0.78 1.00

5.28 (1.07÷10.49)

discharge recorded at the three selected stations with superimposed contours of the fitted trivariate skew-t distribution. The plots reveal both skewness and heavy tails. To verify the applicability of this model to the data sample a goodness of fit test has been used. The trivariate fit, in terms of both PP plot and QQ-plot, is shown in Figs. 5 and 6, respectively. The comparison indicates how the fit to the data provided by the skew-t distribution is markedly superior to the one provided by the normal distribution. The maximum likelihood method has been used to estimate the trivariate skew-t parameters (ξ = [121.40, 26.64, 152.65]; Ω = [9194.73, 4324.16, 18008.14; 4324.16, 7849.39, 21404.86; 18008.14, 21404.85, 69109.11]; λ = [18.09, 83.85, 703.39] and υ = 2.52). While the full list of estimated parameters is not of great interest, it is important to observe that the estimated degrees of freedom confirms the presence of longer tails than the normal distribution has. Moreover, it indicates the presence of extreme discharge values and corroborates the behavior highlighted by the bivariate scatter plots. After the estimation of the parameters values, the next step consists in the generation of a trivariate skew-t random sample, Stp(ξ, Ω, α, υ), congruent with the observed one. To do this it is necessary to: • generate a random sample from a skew-normal distribution with zero location, and the same scale and shape parameter fitted for the selected events with the ML method, Z ~ SN(0, Ω, α); • generate a χ2 random variable with the same degrees of freedom obtained with the ML method, Y ~ χ2(υ); • multiply Z by (υ/Y)1/2 and add ξ to this product, obtaining ξ+(υ/Y)1/2Z~ Stp(ξ, Ω, α, υ). This way a simulation of multivariate sample has been obtained. In Fig. 7 the simulations' results are shown in terms of bivariate projections: results seem encouraging and the skew-t distribution seems to be able to describe the joint behavior. Once the patterns of the observed data and of the simulated ones have been analyzed, it is very interesting to analyze marginal distributions' behavior. While the copula function is based on the

1250

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

Fig. 3. Bivariate projections for the observed (•) and the t copula (+) simulated samples for Alba, Cassine, and Montecastello.

Fig. 4. Bivariate scatter plots of observed discharge values (•) for Alba, Cassine, and Montecastello and fitted trivariate log-skew-t contours.

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

1251

Fig. 5. Healy plots for normal and skew-t distributions.

idea of separately analyzing the marginal distributions and the correlation structure – and to select, for each single variable, its best marginal distribution – in the case of the skew-t distribution each marginal is obtained from a multivariate fit. So, it is necessary to verify if and how this multivariate distribution, which seems able to describe well the joint behavior, describes the marginals. In Fig. 8 marginal distributions' histograms for the observed sample and for the simulated one, via trivariate skew-t distribution, are shown for each

station. All the fits have been verified via the Kolmogorov–Smirnov and the Anderson–Darling tests (KS p-values are 0.45, 0.3 and 0.50, respectively for Alba, Cassine and Montecastello and AD p-values are 0.20 and 0.19, respectively for Alba and Montecastello). The AD test is not passed for Cassine. With respect to these results, it is important to note that skew-t marginal distributions come from a trivariate fit, i.e. results are obtained by comparing univariate observed samples with the simulated one obtained via a trivariate fit.

Fig. 6. QQ plots for normal and skew-t distributions.

1252

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

Fig. 7. Bivariate projections for the observed (•) and the log-skew-t (*) simulated samples for Alba, Cassine, and Montecastello.

Fig. 8. Marginal distribution histograms for the observed (continuous line) and the log–skew-t simulated samples (dotted line).

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

3.4. Model's comparison The cross-entropy method has been applied to compare the results of simulations obtained for the Tanaro basin. To estimate the number of bins, the estimated standard deviation of the observed sample, s, for each marginal has been used with N = 76 (i.e., the length of the observed sample). This way, each marginal has been divided in 6 bins. According to this division, the values of cross-entropy are D = 0.093 for the copula function and D = 0.047 for the skew-t distribution. As said before, it is necessary to choose the probability distribution that minimizes the cross-entropy: as a consequence the skew-t distribution seems to be the best model. One could question i) if the cross-entropy value results to be a function of the number of bins and in which way, ii) if the skew-t distribution presents a lower D value only for optimal bins, or finally iii) if the skew-t is always the closest distribution to the observed empirical one. Trying to answer such questions, D values for the copula function and the skew-t distribution were evaluated for different bin numbers. In Fig. 9 the cross-entropy, D, evaluated only for those bins with observed empirical probability greater than zero, is reported against the number of bins, N, in which each marginal is divided, producing, in this trivariate case, a total bin number equal to N3. Even if the principle of minimum cross-entropy is not a quantitative goodness of fit test, one can assume that, at least, it provides a first effort for the identification of the best joint behavior model. In this perspective, the results reported in Fig. 9 show that the two approaches seem to be equivalent, none of them being always closer to the observations, independently from the selected number of bins. 4. Scenario's return period estimation In hydrology, many efforts are devoted to the development of flood frequency analysis techniques able to link the event magnitude and its frequency, expressed in terms of return period, i.e., the average number of years between events of magnitude equal to or greater than a given one. On the other side, events for given return periods are generally adopted for design purposes. Indeed, the return period provides a very simple and efficient basis for risk analysis. Most of the literature investigated the flood event return period using a univariate approach: the estimation of the return period associated with a given section is a well-known problem. But, as a matter of fact, hydrologic events should be better characterized by a description of the joint behavior of several, usually non-independent, variables. To mention few examples, it is possible to consider storm duration and intensity, flood peak and volume or the duration–magnitude–intensity char-

1253

acteristics of drought events. As a consequence, there is a growing interest in the estimation of the multivariate return period of hydrologic events, also because it has been recognized that univariate approaches can lead to both over and under-estimates of the associated risks (e.g., [17,35,53]). In spite of the relevance of the related applications, it is quite difficult to find literature related to the estimation of the return period of flood scenarios, i.e., in which the problem of the estimation of the return period of a flood that affects more than one section is approached. To cope with this topic, it is necessary to write the multivariate cumulative distribution function. Unfortunately, an analytical expression of the multivariate skew-t CDF is not available: this means that the return period has to be calculated via a multivariate integration of the probability density function between the origin and the value of the observed discharges as qˆ



qˆ 3

0

0

0

∫ 1 ∫ 2 …∫

fSt ð y;ξ;Ω;α;υÞdq1 dq2 …dqn

ð19Þ

where fSt(y;ξ, Ω, α, υ) is the multivariate skew-t density function. The assessment of the return period of a multivariate flood event could be derived from the definition of extremal events proposed by Salvadori et al. [61]: a bivariate event (X, Y) is defined as extreme if i) either X or Y exceed given thresholds or ii) both X and Y are larger than the prescribed values. We estimate the return period of the event by using the second hypothesis, i.e. identifying events that, for all the stations, present discharge values greater than given values. According with the procedure introduced for events selection, a 3-days time window is used for peak identification. The November 1994 flood event, i.e. the worst among the 136 flood events observed in the Tanaro basin during the two hundred year period 1801–2001 [45], has been assumed as case study for the present analysis. For simplicity, the multivariate return period has been estimated empirically, i.e., the return period has been calculated as an empirical frequency. To obtain a multivariate skew-t sample congruent with the observed statistical properties, the multivariate skew-t parameter set estimated in the previous paragraph by using maximum likelihood method have been used in conjunction with a Monte Carlo simulation procedure based on a conditioning method. To reach convergence in terms of return period a long sample was generated, from which the return period associated with the November 1994 scenario has been estimated to be about 150 years. When the same procedure is applied to the t copula, the estimated return period is about 300 years. Unfortunately, there is no possibility of comparing multivariate skew-t results with alternative estimates, as no study of this sort is published in the literature or available. To verify the consistency between multivariate estimates and return period values obtained via a standard statistical approach, a univariate skew-t model was constructed for each gauging station. For the November 1994 flood event, a good congruence has been observed, in particular for Alba and Montecastello, which return period estimates are respectively of 100 and 20 years, close to the values proposed in literature for this event (e.g., [54]). Finally, a simple control can be done with respect to the return periods evaluated from the marginals: the event's frequency should be smaller than any of its marginal frequencies, and this is actually the case for both the multivariate skew-t distribution and the copula function approaches. 5. Conclusions

Fig. 9. Copula function (+) and skew-t (*) distribution cross-entropies as a function of the number of bins.

In this study we investigated the significance of the multivariate skew-t distribution in describing flood events from a multi-site perspective. Results obtained with reference to the application to a trivariate case for the Tanaro basin seem to indicate skew-t performances similar to those provided by the state of the art approach to multivariate hydrologic processes, i.e., the copula function.

1254

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255

The skew-t advantage associated with the availability of analytical expressions for the moments and the likelihood are important in estimating the distribution's parameters, even if maximization of the likelihood function must be accomplished numerically. Unfortunately, benefits derived from the availability of the analytical expression of the skew-t PDF are not extended to the CDF, an analytical expression of which is not yet available. Therefore, return periods of flood scenarios are supposed to be estimated via a multivariate PDF numerical integration between the origin and the value of the observed discharges or through empirical approaches based on the generation of long samples, as accomplished for the Tanaro application. On the basis of overall performances, we are confident that the ability of flexible models, specifically joint probability distributions pertaining to the skew-normal and skew-t families, to improve the description of multivariate hydrologic processes will foster their diffusion in the hydrologic community. Acknowledgements We gratefully acknowledge Nicola Loperfido, University of Urbino, Italy, for his helpful advice during research work and for his comments on some sections of the manuscript. Angela Celeste Taramasso, University of Genoa, Italy, is also greatly acknowledged for her advice. This research was partially supported by the Italian National Civil Protection Department. References [1] Adcock CJ, Shutes K. Portfolio selection based on the multivariate skew-normal distribution. In: Skulimowski A, editor. Financial modelling. Krakow: Progress & Business Publishers; 2001. [2] Aigner DJ, Lovell CA, Schimdt P. Formulation and estimation of stochastic frontier production function model. J Econometrics 1977;12:21–37. [3] Andel J, Netuka I, Zvàra K. On threshold autoregressive processes. Kybernetika 1984;20:89–106. [4] Azzalini A. A class of distributions which includes the normal ones. Scand J Stat 1985;12:171–8. [5] Azzalini A. Further results on a class of distributions which includes the normal ones. Scand J Stat 1986;46:199–208. [6] Azzalini A, Capitanio A. Statistical applications of the multivariate skew-normal distributions. J R Stat Soc B 1999;61:579–602. [7] Azzalini A, Capitanio A. Distribution generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J R Stat Soc B 2003;65:367–89. [8] Azzalini A, Dal Cappello T, Kotz S. Log–skew-normal and log–skew-t distributions as model for family income data. J Income Distrib 2003;11:12–20. [9] Azzalini A, Dalla Valle A. The multivariate skew-normal distribution. Biometrika 1996;83:715–26. [10] Ball L, Mankiw NG. Relative-price changes as aggregate supply shocks. Q J Econ CX 1995;110:161–93. [11] Bárdossy A, Pegram G. Copula based multisite model for daily precipitation simulation. Hydrol Earth Syst Sci 2009;13(12):2299–314. [12] Barredo JI. Normalised flood losses in Europe: 1970–2006. Nat Hazards Earth Syst Sci 2009;9:97–104 www.nat-hazards-earth-syst-sci.net/9/97/2009/. [13] Branco M, Dey D. A general class of skew-elliptical distributions. J Multivariate Anal 2001;79:99–113. [14] Breymann W, Dias A, Embrechts P. Dependence structures for multivariate highfrequency data in finance. Quant Finance 2003;3:1–14. [15] Caudill SM. Estimating the cost of partial coverage rent controls: a stochastic frontier approach. Rev Econ Stat 1993;75:727–31. [16] Chebana F, Ouarda T. Index flood-based multivariate regional frequency analysis. Water Resour Res 2009;45, doi:10.1029/2008WR007490. [17] De Michele C, Salvadori G. A generalized Pareto intensity–duration model of storm rainfall exploiting 2-copulas. J Geophys Res 2003;108:4067. [18] De Michele C, Salvadori G, Canossi M, Petaccia A, Rosso R. Bivariate statistical approach to check adequacy of dam spillway. ASCE J Hydrol Eng 2005;10:50–7. [19] Durante F, Salvadori G. On the construction of multivariate extreme value models via copulas. Environmetrics 2010;21(2):143–61, doi:10.1002/env.988. [20] Elsner JB, Burch RK, Jagger TH. Catastrophe finance: an emerging discipline. EOS 2009;90:281–2. [21] Favre AC, El Adlouni S, Perreault L, Thiémonge N, Bobée B. Multivariate hydrological frequency analysis using copulas. Water Resour Res 2004;40:W01101, doi: 10.1029/2003WR002456. [22] Fernandez C, Steel MFJ. On Bayesian modelling of fat tails and skewness. J Am Statist Assoc 1998;93:359–71. [23] Genest C, Favre A-C. Everything you always wanted to know about copula modeling but were afraid to ask. J Hydrol Eng 2007;12(4):347–68.

[24] Genest C, Favre A-C, Béliveau J, Jacques C. Metaelliptical copulas and their use in frequency analysis of multivariate hydrological data. Water Resour Res 2007;43: W09401, doi:10.1029/2006WR005275. [25] Genton MG, Thompson KR. Skew-elliptical time series with application to flooding risk. In: Brillinger DR, Robinson EA, Schoenberg FP, editors. Time series analysis and applications to geophysical systems, 139. IMA Volume in Mathematics and its Applications; 2003. p. 169–86. [26] Ghizzoni T. Accumulation risk analysis for the flooding hazard. PhD dissertation, University of Genoa, Italy; 2009. [27] Grimaldi S, Serinaldi F. Asymmetric copula in multivariate flood frequency analysis. Advan Water Resour 2006;29:1155–67. [28] Grimaldi S, Serinaldi F, Napolitano F, Ubertini L. A 3-copula function application for design hyetograph analysis. In: Dragan A, et al, editor. Sustainable water management solutions for large cities, 293. IAHS Publ.; 2005. ISBN 1-901502-97-X. 203-212. [29] Johnson NL, Kotz S. Distributions in statistics: continuous multivariate distributions. New York: Wiley; 1972. 333 pp. [30] Jones MC. A skew-t distribution. In: Charalambides CA, Koutras MV, Balakrishnan N, editors. Probability and statistical models with applications: a volume in honor of Theophilos Cacoullos. London: Chapman and Hall; 2001. p. 269–78. [31] Jones MC. Multivariate t and beta distributions associated with the multivariate F distributions. Metrika 2002;54:215–31. [32] Jones MC, Faddy MJ. A skew extension of the t-distribution, with applications. J R Stat Soc B 2003;65:159–74. [33] Kao S-C, Govindaraju RS. Trivariate statistical analysis of extreme rainfall events via the Plackett family of copulas. Water Resour Res 2008;44:W02415, doi: 10.1029/2007WR006261. [34] Kao S-C, Govindaraju RS. A copula-based joint deficit index for droughts. J Hydrol 2010;380(1–2):121–34, doi:10.1016/j.jhydrol.2009.10.029. [35] Kim T, Valdés B, Yoo C. Nonparametric approach for estimating return periods of droughts in arid regions. J Hydrol Eng 2003;8:237–46. [36] Kreibich H, Muller M, Thieken AH, Merz B. Flood precaution of companies and their ability to cope with the flood in august 2002 in Saxony, Germany. Water Resour Res 2007;43:W03408, doi:10.1029/2005WR004691. [37] Kron W. Flood risk = hazard.value.vulnerability. Water Int 2005;30:58–68. [38] Kron W, Willems W. Flood zoning and loss accumulation analysis for Germany. Proc of the International conference on flood estimation. Switzerland: Berne; 2002. p. 549–58. [39] Kullback S, Leibler RA. On information and sufficiency. Ann Math Stat 1951;22:79–86. [40] Kullback S. On information theory and statistics. New York: John Wiley; 1959. 399 pp. [41] Lange KL, Little JA, Taylor MG. Robust statistical modelling using the t distribution. J Am Stat Ass 1989;84:881–96. [42] Leonard M, Metcalfe A, Lambert M. Frequency analysis of rainfall and streamflow extremes accounting for seasonal and climatic partitions. J Hydrol 2008;348(1–2): 135–47. [43] Loperfido N. Modelling maxima of longitudinal contralateral observations. Test 2007;17:370–80. [44] Louis H, Blenman LP, Thatcher JS. Interest rate parity and the behavior of the BidAsk Spread. J Financ Res 1999;22:189–206. [45] Luino F. Flooding vulnerability of a town in the Tanaro basin: the case of Ceva (Piedmont–Northwest Italy). In: Thorndycraft VR, Benito G, Barriendos M, Llasat MC, editors. Proc PHEFRA Workshop “Palaeofloods, historical data & climatic variability: application in flood risk assessment”; Barcelona, Spain; 2002. p. 321–6. [46] MacKenzie D. End-of-the-world trade. Lond Rev Books 2008;30:24–6. [47] Meucci A. Beyond Black–Litterman: views on non-normal markets. Risk 2006;19: 87–92. [48] Mikosch T. Copulas: tales and facts. Extremes 2006;9:3–20. [49] Mukhopadhyay S, Vidakovic B. Efficiency of linear bayes rules for a normal mean: skewed prior class. J R Stat Soc D 1995;44:389–97. [50] Nelsen RB. An introduction to copulas. New York: Springer-Verlag; 1998. 265 pp. [51] Pewsey A. Problem of inference for Azzalini's skew-normal distribution. J Appl Stat 2000;27:859–70. [52] Polachek SW, Robst J. Employee labor market information: comparing direct world of work measures of workers' knowledge to stochastic frontier estimates. Lab Econ 1998;5:859–70. [53] Raynal-Villasenor JA, Salas DJ. Multivariate extreme value distributions in hydrological analyses. In: Rodda JC, Matalas NC, editors. Water for the future: hydrology in perspective, Rome, Italy; 1987. p. 111–9. [54] Regione Piemonte – Direzione Regionale Servizi Tecnici di Prevenzione, Rapporto sull'evento alluvionale del 13–16 Ottobre 2000 – Parte I, 2000. www.regione. piemonte.it/alluvione/dwd/rapp 1.pdf. [55] Renard B, Lang M. Use of a Gaussian copula for multivariate extreme values analysis: some case studies in hydrology. Advan Water Resour 2007;30:897–912. [56] Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with applications to Bayesian regression models. Can J Stat 2003;31:129–50. [57] Salmon F. Recipe for disaster: the formula that killed Wall Street. Wired 2009:17 www.wired.com/techbiz/it/magazine/17-03/wp_quant. [58] Salvadori G, De Michele C. Analytical calculation of storm volume statistics with Pareto-like intensity–duration marginals. Geophys Res Lett 2004;31:L04502. [59] Salvadori G, De Michele C. Frequency analysis via copulas: theoretical aspects and applications to hydrological events. Water Resour Res 2004;40:W12511. [60] Salvadori G, De Michele C. Statistical characterization of temporal structure of storms. Adv Water Resour 2006;29:827–42. [61] Salvadori G, De Michele C, Kottegoda NT, Rosso R. Extremes in nature: an approach using copulas. Water science and technology librarySpringer; 2007.

T. Ghizzoni et al. / Advances in Water Resources 33 (2010) 1243–1255 [62] Samaniego L, Bárdossy A, Kumar R. Streamflow prediction in ungauged catchments using copula-based dissimilarity measures. Water Resour Res 2010;46:W02506, doi:10.1029/2008WR007695. [63] Serinaldi F, Bonaccorso B, Cancelliere A, Grimaldi S. Probabilistic characterization of drought properties through copulas. Phys Chem Earth 2009;34(10–12): 596–605, doi:10.1016/j.pce.2008.09.004. [64] Scott D. On optimal and data-based histograms. Biometrika 1979;66:605–10. [65] Shannon C.E. A mathematical theory of communication. Bell System Tech J 1948; 27: 379-423 and 623-656.

1255

[66] Sklar A. Fonctions de repartition a n dimensions et leurs marges. Publ Inst Stat Univ Paris 1959;8:229–31. [67] Sklar A. Random variables, distribution functions, and copulas — a personal look backward and forward. In: Rüschendorff L, Schweizer B, Taylor M, editors. Distribution with fixed marginals and related topics. Hayward, CA: Institute of Mathematical Statistics; 1996. p. 1–14. [68] Thompson KR, Shen Y. Coastal flooding and the multivariate skew-t distribution. In: Genton MG, editor. Skew-elliptical distributions and their applications: a journey beyond normality. Boca Raton: Chapman & Hall; 2004. p. 243–58.