Accepted Manuscript
Copula Theory and Probabilistic Sensitivity Analysis: Is there a Connection? Elmar Plischke, Emanuele Borgonovo PII: DOI: Reference:
S0377-2217(19)30292-9 https://doi.org/10.1016/j.ejor.2019.03.034 EOR 15739
To appear in:
European Journal of Operational Research
Received date: Revised date: Accepted date:
20 April 2018 20 March 2019 21 March 2019
Please cite this article as: Elmar Plischke, Emanuele Borgonovo, Copula Theory and Probabilistic Sensitivity Analysis: Is there a Connection?, European Journal of Operational Research (2019), doi: https://doi.org/10.1016/j.ejor.2019.03.034
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights • Establishes a link between copula theory and probabilistic sensitivity analysis
• Introduces new graphical visualization tools
CR IP T
• Paves the way to new estimators for global sensitivity measures
AC
CE
PT
ED
M
AN US
• All the insights can be obtained from a Monte Carlo sample
1
ACCEPTED MANUSCRIPT
Copula Theory and Probabilistic Sensitivity Analysis: Is there a Connection?
a Institut
CR IP T
Elmar Plischkea , Emanuele Borgonovob,∗ für Endlagerforschung, Technische Universität Clausthal, 38678 Clausthal-Zellerfeld, Germany b Department of Decision Sciences, Bocconi University, 20136 Milano, Italy
Abstract
AN US
Copula theory is concerned with defining dependence structures given appropriate marginal distributions. Probabilistic sensitivity analysis is concerned with quantifying the strength
of the dependence among the output of a simulator and the uncertain simulator inputs. In this work, we investigate the connection between these two families of methods. We define four classes of sensitivity measures based on the distance between the empirical copula and the product copula. We discuss the new classes in the light of transformation invari-
M
ance and Rényi’s postulate D of dependence measures. The connection is constructive: the new classes extend the current definition of sensitivity measures and one gains an of un-
ED
derstanding which sensitivity measures in use are, in fact, copula-based. Also a set of new visualization tools can be obtained. These tools ease the communication of results to the modeler and provide insights not only on statistical dependence but also on the partial be-
PT
haviour of the output as a function of the inputs. Application to the benchmark simulator for sensitivity analysis concludes the work.
CE
Keywords: Robustness and Sensitivity Analysis; Simulation; Copula Theory
1. Introduction
AC
Computational modelling is increasingly supporting decision making in business and policy making. According to the recent Blackett Review (Government Office for Science, 2018),
modelling can help drive performance improvement of products and services, achieve productivity and efficiency gains, and create new innovative smart products and services. Increased computing power allows analysts to build simulators of notable complexity that can accommodate a variety of industry problems. The resulting simulator is often pro∗ Corresponding
author Email addresses: elmar.plischke@tu-clausthal-de (Elmar Plischke),
[email protected] (Emanuele Borgonovo)
Preprint submitted to Elsevier
March 27, 2019
ACCEPTED MANUSCRIPT
grammed in a computer code and the dependence of the quantities of interest on the simulator inputs has typically no closed form expression. In these situations, sensitivity analysis plays a crucial role in helping the analyst to understand the model behavior and the manager in
CR IP T
gaining confidence in the model results. Under uncertainty, the analyst’s goal is frequently to understand the strength of the statistical dependence between the simulator output and the simulator inputs. This information helps investigators in identifying areas in which addi-
tional data collection or modelling efforts are needed. In this context, the use of probabilistic (or global) sensitivity measures is recommended not only by scientific investigators (Saltelli
and Tarantola, 2002), but also by international agencies and institutions such as the US En-
AN US
vironmental Protection Agency, (US EPA, 2009), the US Nuclear Regulatory Commission
(US NRC, 2011) and the Intergovernmental Panel on Climate Change (Mastrandrea et al., 2010).
Several probabilistic sensitivity measures have been proposed for simulation experiments over the years (see Borgonovo and Plischke (2016) for a recent review). They can be divided into the categories of regression-based (Helton et al., 2006), variance-based (Saltelli and
M
Tarantola, 2002) and distribution-based methods (Borgonovo and Tarantola, 2008). This last class of methods has appeared more recently on the ground of obtaining sensitivity measures
ED
that are transformation invariant and possess the nullity-implies-independence property. This property is the probabilistic sensitivity equivalent of Rényi’s postulate D for measures of statistical dependence (Rényi, 1959, p. 443): A measure of statistical dependence has a
PT
null value if and only two random variables X and Y are independent. In our context, this is equivalent to say that a null value of a probabilistic sensitivity measure between Y (the
CE
simulator output) and X (the simulator input) reassures the analyst that Y does not depend on X. This allows for an interpretation of the probabilistic sensitivity measure as a measure of dependence.
AC
Copulas are an indispensable tool towards modelling statistical dependence (Nelsen,
2006; Bedford and Cooke, 2002) and are widely applied in operations research (see Section 2 for a synthetic review). In fact, previous works have proposed the use of dependence measures coming from copula theory as sensitivity measures in simulation experiments (Wei et al., 2014). Specifically, these authors discuss one type of density-based sensitivity measure that can be read in a copula theory framework. However, whether this connection is restricted to a single probabilistic sensitivity measure or can be extended to others is an open research question. In this work, we propose a systematic investigation that brings these two research streams
3
ACCEPTED MANUSCRIPT
together. Our analysis benefits from the fact that several (but not all) probabilistic sensitivity measures can be defined as mean distance operators between the conditional and marginal simulator output distributions. We will name this general approach to probabilistic sensi-
CR IP T
tivity measures common rationale, henceforth (Borgonovo et al., 2016). We introduce the notion of copula-based dependence measures as the distance between the independence copula and the bivariate copula associated with both the simulator output and the simulator input
of interest. We show that this definition not only includes all sensitivity measures in the com-
mon rationale, but extends beyond to accommodate new dependence measures. Therefore, several copula dependence measures turn out to be suitable probabilistic sensitivity mea-
AN US
sures. We examine these sensitivity measures also with regard to transformation invariance and Renyi’s postulate D.
While these implications broaden some aspects concerning the interpretation of probabilistic sensitivity measures, we investigate also whether the copula framework can be the ground of computational and result communication advantages. In fact, efficient estimation of global sensitivity measures is a known challenge (Saltelli, 2002). Recently, the given data
M
methodology has enabled the estimation of sensitivity measures contained in the common rationale with just a once-through strategy of Monte Carlo evaluations (Plischke et al., 2013;
ED
Strong and Oakley, 2014). The copula framework allows us to introduce a set of consistent estimators that maintain this computational cost. Moreover, the framework permits us to introduce a set of new graphical visualization
PT
tools to facilitate the interpretation and communication of results. From the graphical tools the analyst obtains a way for easily communicating not only the key-drivers of uncertainty,
CE
but also the trend of the simulator response. Furthermore, the framework is not restricted to simulators with deterministic response, but is applicable to simulators with stochastic response as well.
AC
The remainder of the work is organized as follows. Section 2 overviews the related lit-
erature on copula and probabilistic sensitivity analysis and introduces some formal elements and preliminary notation. Section 3 presents a framework for defining sensitivity measures based on copula theory. Section 4 discusses copula-based estimators. Section 5 discusses result visualization tools. Section 6 highlights the findings also in light of Rényi’s postulate D. Section 7 presents an application to the Level E code, the benchmark simulator for sensitivity studies. Section 8 concludes the work.
4
ACCEPTED MANUSCRIPT
2. Probabilistic Sensitivity Analysis and Copula Theory: Synthetic Overviews We divide the discussion into two blocks, each subsection consisting of a literature review, followed by the corresponding theoretical setup. In the first part, we review works on prob-
CR IP T
abilistic sensitivity analysis. In the second part, we address works making use of copulas to support decision making.
2.1. Probabilistic Sensitivity Analysis: Review and Formal Setup
Probabilistic (or global) sensitivity methods are the gold standard for studying the simulator
input-output behavior under uncertainty and have been increasingly studied since the early
AN US
1990’s. Works such as Saltelli and Marivoet (1990); Helton (1993) are among the first to
propose dependence measures such as Pearson’s, Spearman’s rank and the partial rank correlation coefficients to study the strength of the dependence between uncertain simulator inputs and output. Campolongo and Saltelli (1997) observe that the linearity assumption underlining these dependence measures leads to a shortcoming when the simulator inputoutput mapping is non-linear and non-additive, and advocate the use of variance-based sen-
M
sitivity measures. Indeed, these are also independently proposed in Iman and Hora (1990), Sobol’ (1993) and Wagner (1995) and can be seen as a fresh take at Pearson’s correlation
ED
ratio (Pearson, 1905). Since then, variance-based sensitivity measures have been intensively investigated — see Borgonovo et al. (2018b) for a recent account. When regarded as measures of statistical dependence, none of the techniques discussed
PT
thus far complies with postulate D in Rényi’s requirements for measures of statistical dependence: their null value does not imply that the input and the output are independent, see also
CE
Ebrahimi et al. (2014, p. 35). This has led to the introduction of the class of distributionbased (or moment-independent) sensitivity measures (Borgonovo, 2007; Da Veiga, 2015; Gamboa et al., 2015). These measures, when based on operators that consider the discrep-
AC
ancy between densities or cumulative distribution functions, do comply with such a postulate. If, in addition, the operator is transformation invariant, then the corresponding global sensitivity measure becomes invariant for monotonic transformations. The work of Baucells and Borgonovo (2013) discusses in detail the decision-analysis implications of transformation invariance. Transformations are often employed by analysts to accelerate numerical convergence in the estimation of probabilistic sensitivity measures. Rank transformation may improve linear regression fit (see Conover and Iman (1981), among others) and is advantageous in association with the use of non-parametric sensitivity approaches (Kleijnen and Helton, 1999).
5
ACCEPTED MANUSCRIPT
(Borgonovo et al., 2014) show that using transformation invariant probabilistic sensitivity allows one to exploit the computational advantage, while avoiding scaling interpretation problems.
CR IP T
Transformations can be regarded as a conceptual link between probabilistic sensitivity analysis and copula theory. In fact, a rank transformation may turn a measure of statistical dependence (e.g., the Pearson correlation coefficient commonly used as a probabilistic sen-
sitivity measure) into a copula-based dependence measure (the Spearman rank correlation coefficient).
The setup of simulation experiments foresees that a simulator computes one or more
AN US
quantities of interest in a given problem. For notation simplicity, we shall consider a single simulator output (Y, henceforth). One writes
Y = g(X) + ε(X)
(1)
where X = (X1 , . . . , Xi , . . . , Xk ) ∈ Rk is the vector of simulation inputs, g(X) is a determin-
istic input-output mapping and ε(X) is a zero-mean error term dependent on X. If this term
M
is absent, the simulator is deterministic. The distinction between deterministic or stochastic simulators is immaterial for the rest of our work. We also assume that the mapping in (1) is
ED
represented by a set of equations encoded in a computer program and its analytical expression is not known explicitly. We also denote with (X, B(X), PX ) the probability space associated with X, where PX is the simulator inputs distribution, with cumulative distribution
PT
function (cdf) FX (x). Similarly, let (Y, B(Y), PY ) denote the simulator output probability space and FY (y) denote the cdf of Y.
CE
Consider that the analyst is interested in the influence of factor j, characterized through the random variable X j . The pair (X j , Y) is then the input to all sensitivity measures. A first relevant class of probabilistic sensitivity indices is represented by the so-called non-
AC
parametric measures. If a linear regression surface accurately fits the input-output mapping, then natural measures of statistical dependence between Y and X j are the standardized regres-
sion coefficients (Kleijnen and Helton, 1999), which, under the above assumptions, coincide with Pearson’s correlation coefficient of Y and X j (Pearson, 1901) %(X j ) =
Cov(Y, X j ) , σi σY
(2)
where Cov(Y, X j ) is the covariance between Y and X j and σY , σ j are the standard deviations of Y and X j , respectively. Further nonparametric methods include linear regression after logor rank-transforming the data.
6
ACCEPTED MANUSCRIPT
With PY being the marginal distribution of Y, let PY|X be the distribution after one receives information about X. Let d(·, ·) denote an operator between distributions with the requirement that d(P, P) = 0. The function ζ d (x) = d(PY , PY|X=x ) is called an inner operator
CR IP T
(Borgonovo et al., 2016). It quantifies the effect of knowing that X = x as the separation between PY|X and PY . Then, the expectation of ζ d (x) with respect to the marginal distribution of X defines the global sensitivity measure of X with respect to Y based on operator ζ d (·) (Borgonovo et al., 2016):
ξd = E[ζ d (X)] = E[d(PY , PY|X )].
(3)
AN US
The requirement ζ d (PY , PY ) = 0 reassures one that ξd is null if X and Y are independent.
Borgonovo et al. (2016) call (3) common rationale. The reason is that several of the currently used sensitivity measures are, in fact, particular instances of the functional ξd that measures the separation between the unconditional and the conditional model output distributions. To illustrate, let E[Y], E[Y|X] and V[Y] denote the unconditional and conditional expected values of Y, and the variance of Y. Setting ζ v (X) =
(E[Y]−E[Y|X])2 V[Y]
and averaging, we
(4)
ED
M
obtain the variance-based sensitivity measure (Saltelli and Tarantola, 2002) h i 2 v E (E[Y] − E[Y|X]) V[E[Y|X]] η = E ζ (X) = = , V[Y] V[Y]
where the last identity follows from the law of total variance. In case X is an individual simulator input, η equals Pearson’s correlation ratio (Pearson, 1905) (correlation fraction
PT
in (Ebrahimi et al., 2014)) which is identical to the well know first order variance-based sensitivity measures Plischke (2012).
CE
For pdf-based sensitivity measures, assuming that Y is absolutely continuous, let fY (y)
and fY|X (y) denote the unconditional and conditional densities, respectively. Consider then to quantify their separation using any member of the family of Csiszár f -divergences (Csiszár,
AC
2008) with a convex function h(·) satisfying h(1) = 0. One writes "Z ! # fY (y) ξCsiszar = E fY|X (y) h dy , fY|X (y) Y
(5)
which defines a family of density-based sensitivity measures discussed, among others, in (Borgonovo et al., 2014; Da Veiga, 2015; Rahman, 2016). In fact, selecting h(t) = 12 |t − 1|
and h = t log(t) (see Table 1 in Rahman (2016)), we obtain inner operators Z Z 1 1 f (y) − f (y) dy and ζ KL (X) = ζ L (X) = fY|X (y) log fY|X (y) − log fY (y) dy Y Y|X 2 Y Y
based, respectively, on the L1 -norm and the divergence of (Kullback and Leibler, 1951) 7
ACCEPTED MANUSCRIPT
between densities. The corresponding global sensitivity measures are "Z # "Z # 1 δB = 2 E fY (y) − fY|X (y) dy , δKL = E fY|X (y) log fY|X (y) − log fY (y) dy Y
(6)
Y
CR IP T
which have been proposed in Borgonovo (2007) and Critchfield and Willard (1986), respec-
tively. Alternatively, consider taking Renyi’s α-divergence (Ebrahimi et al., 2014; Rényi, 1961), d(PY , PY|X ) =
1 log α−1
Z
α fY|X
Y
fYα−1
dy, with α , 1, α > 0,
AN US
as inner operator in (3). One obtains the family of global sensitivity measures: Z fα 1 Y|X . ξRenyi,α = E dy log α−1 α−1 Y fY
(7)
(8)
For cdf-based sensitivity measures, one can consider an operator between cdfs, ζ = ζ(FY , FY|X ). If ζ KS (FY , FY|X ) denotes the Kolmogorov–Smirnov distance, one obtains the probabilistic sensitivity measure proposed in Baucells and Borgonovo (2013)
ζ KS (X) = sup FY (y) − FY|X (y) .
(9)
y∈Y
M
βKS = E[ζ KS (X)],
ED
Using the Cramér–von Mises distance, one obtains Z βCvM = E[ζ CvM (X)], ζ CvM (x) =
Y
FY (y) − FY|X (y) 2 dFY (y),
(10)
a probabilistic sensitivity measures recently introduced in Gamboa et al. (2015).
PT
Some aspects are worth underlining. First, several of the sensitivity measures defined above are studied in the literature as measures of statistical dependence. For instance, Ebrahimi et al. (2014), in their work that characterizes alternative forms of partial depen-
CE
dence in statistics, rely on Pearson’s correlation coefficient (2), Pearson’s correlation ratio
(4), and on Spearman’s rank correlation coefficient. They also define a further indicator,
AC
called the mutual information index, written as (Ebrahimi et al., 2014, p. 59): δ2 = 1 − e−2δMu ,
(11)
where δMu is the mutual information between X and Y: ! Z Z fXY (y, x) δMu = fXY (y, x) log dydx. fY (y) fX (x) X Y We note that δMu coincides with δKL in equation (6), so that δ2 = 1 − e−2δKL . Thus, the mu-
tual information index can be regarded as the transformation of a global sensitivity measures that falls in the common rationale of equation (3). Second, not all probabilistic sensitivity
8
ACCEPTED MANUSCRIPT
measures in use fall in the above-mentioned rationale, e.g., Pearson’s or Spearman’s correlation coefficients. Third, for sensitivity measures falling in the common rationale, it is intuitive that their properties depend on the inner operator. In particular, regarding Renyi’s
CR IP T
Postulate D, we have the following. This postulate reads : ξ = 0 ⇔ X and Y are independent. It can be shown (Borgonovo et al., 2018a) that a global sensitivity measure in the
rationale (3) complies with Renyi’s postulate D if and only if the inner operator satisfies d(P, Q) = 0 ⇐⇒ P = Q. Divergences as well as distances between distributions satisfy
this property; specifically, this condition is satisfied by any separation measurement which is at least a pre-metric, a functional satisfying d(P, Q) ≥ 0, d(P, P) = 0.
AN US
Regarding transformation invariance, let T = t(Y) be a monotonic transformation of Y.
Consider then ξXT , the global sensitivity measure of X on the transformed output. Then, a global sensitivity measure ξX is transformation invariant if it assumes the same value for the transformed and the original simulator output (denoted by ξXY ), that is, if ξXY = ξXT for all monotonic transformations t and for all X. Now, copulas are by construction transformation invariant. We may then argue that, if we were able to define global sensitivity measures
M
starting from a copula-theory rationale, we would define sensitivity measures which are monotonic transformation invariant. Obtaining a general formulation is one of the purposes
ED
of this paper — see Section 3.
2.2. Copulas: Review and Formal Setup
PT
Copulas are widely used to define dependence structures. Even restricting attention to the literature in operational research, the field is extremely vast and, due to space constraints, we cannot claim exhaustiveness. However, we propose a brief overview to let one glimpse the
CE
variety of applications. A first set of applications is represented by the aggregation of expert opinions, see (Jouini and Clemen, 1996) and (Hammitt and Zhang, 2013), among others. An
AC
application to scenario uncertainty in risk management is presented in (Palomo et al., 2007). Works such as Abbas (2009, 2013); Abbas and Sun (2015) address the problem of building multiattribute utility functions from marginal utility assessment while considering dependence among the attributes. In particular, Abbas (2009) shows that the functional forms of Sklar’s copulas can be used to generate copula utility functions. Applications in finance and financial econometrics are numerous and range from portfolio and credit risk management, to default probability estimation to the modelling of financial contagion (Glasserman and Li, 2005; Gupton et al., 1997; Sak et al., 2010; Kakouris and Rustem, 2014; Ye et al., 2012; Jayech, 2016; Ye et al., 2017). In reliability analysis, (Navarro et al., 2015) use copulas to
9
ACCEPTED MANUSCRIPT
describe the statistical dependence between lifetimes for coherent systems with dependent failures. Copulas are used to create the required dependence structure among the involved uncertainties in modelling with event and decision trees in Clemen and Reilly (1999) and
CR IP T
Wang et al. (2016). Biller (2009) and Biller and Corlu (2011) are representative works which make use of copulas to reproduce the correct dependence structure for the multivariate time-series inputs of a stochastic simulation. For greater details and historical remarks, we refer to Nelsen (2006) and Durante and Sempi (2015).
Regarding the formal setup, a fundamental result in copula theory is Sklar’s theorem.
Theorem 1 (Sklar’s Theorem, Sklar (1959); Nelsen (2006)). For a bivariate random vec-
AN US
tor (X, Y) there exists a copula C which satisfies F XY (x, y) = C(F X (x), FY (y)) where F X and FY are the marginal cdf of X and Y and F XY is the joint cdf of the pair (X, Y). If F X and FY are continuous, then C is unique. If the functional inverses of the marginal distributions exist, the copula is uniquely defined via C(u, v) = F XY (F X−1 (u), FY−1 (v)), (u, v) ∈ [0, 1]2 .
Setting u = F X (x) and v = FY (y), C(u, v) is a joint distribution function with uniform
M
marginal distributions U, V ∼ U[0, 1]. Hence, the copula C(u, v) satisfies C(u, 0) = 0 =
C(0, v) and C(u, 1) = u, C(1, v) = v.
PT
ED
The copula density associated with C(F X (x), FY (y)) is defined by c(F X (x), FY (y)) = fXY (x, y) ∂2 C and the following equality holds c(u, v) = ∂u∂v (u, v) (Nelsen, 2006). We regfX (x) fY (y) ister c(u, v) ≥ 0 for all u, v ∈ [0, 1] (Durante and Sempi, 2015). Under absolute continuity, RuRv one obtains the copula C(u, v) = 0 0 c(s, t) dt ds through double integration of the copula density.
CE
The notion of conditional copula plays an important role in this paper. The conditional R1 copula is defined as C(v|u) = 0 c(u, v) dv = ∂C(u,v) ∂u . Thus, it is the marginal integral of the
copula density with respect to just one of the two variables involved. Correspondingly, the conditional copula density is simply c(u|v) = c(u, v) = c(v|u), see Nelsen (2006) for greater
AC
details. Two random variables X and Y are called comonotone if U = V, and contramonotone if U = 1 − V. The Fréchet-Hoeffding copulas are associated with these two cases:
Comonotonicity is modeled by M(u, v) = min{u, v} and contramonotonicity by W(u, v) =
max{u + v − 1, 0}. All bivariate copulas C are bounded by W(u, v) ≤ C(u, v) ≤ M(u, v) (Fréchet–Hoeffding bounds).
Finally, we note that in this work we regard copulas from a different perspective. Given realizations of random variables X1 , X2 , . . . and Y, we want to use copulas to create global sensitivity measures, that is, indicators that quantify numerically the strength of the dependence between Xi , i = 1, 2, . . . , and Y. In particular, we shall use these indicators to discrimi10
ACCEPTED MANUSCRIPT
nate the degree of statistical dependence of Y on Xi from the degree of statistical dependence of Y on X j , j , i. In this way, we can use the indication provided by a copula-based sensitivity measure to rank model inputs and identify the ones that deserve further attention. A
CR IP T
formal bridge is proposed in the next section.
3. Bridging Probabilistic Sensitivity Measures and Copula Theory
In this section, we propose a series of definitions introducing a formal connection between copula theory and probabilistic sensitivity analysis. Specifically, we identify four niveaus of probabilistic sensitivity measures that can be created based on copula-theory, starting with
AN US
traditional concordance measures. For each niveau, we discuss the corresponding probabilistic sensitivity measures and determine whether they fall into the common rationale in (3). Note that the niveaus differ with regard to the complexity of the underlying assumptions
on X and Y. An overall result for Rényi’s postulate D is proposed in Section 3.5. We observe that all probabilistic sensitivity measures presented in the following subsections depend on the copula C, but not explicitly on the pair (X, Y).
M
3.1. Sensitivity Measures and Copula Theory: Niveau 0 The copula framework has allowed researchers to introduce several measures of association
ED
and concordance, which have become popular over time. Among the two most well known are Spearman’s %∗ and Kendall’s τ∗ (Schreyer et al., 2017). Spearman’s %∗ (Spearman, 1904)
PT
equals Pearson’s correlation coefficient between U = F X (X) and V = FY (Y). By the Hoeffding formula for the covariance of two variables on Cov(U, V), one obtains the well known
CE
relation
Cov(U, V) % = √ = 12 V[U] V[V] ∗
where E[U] = E[V] =
1 2
ZZ
and V[U] = V[V] =
[0,1]2
1 12 .
C(u, v) du dv − 3,
(12)
The works of (Scarsini, 1984; Liebscher,
∗
AC
2014) show that Spearman’s % can be also written as %∗ = 3
ZZ
(2u − 1)(2v − 1) dC(u, v) = 12
ZZ
uv dC(u, v) − 3,
(13)
a rewriting that evidences the existence of a Monte Carlo estimator, as we discuss later on. Spearman’s correlation coefficient has the following properties: it is comprised within −1
and 1, with values equal to 1 for the comonotonic copula M and −1 for the contramonotonic copula W.
Kendall’s τ∗ is a measure of rank correlation obtained by comparing the concordance probability P(X< x, Y< y ∨ X > x, Y > y) to the discordance probability P(X< x, Y > y ∨ X > 11
ACCEPTED MANUSCRIPT
x, Y< y) (Nelsen, 2006) given by τ∗ = 4
ZZ
[0,1]2
C(u, v) dC(u, v) − 1.
(14)
Kendall’s τ∗ has had limited application as a sensitivity measure in simulation experiments,
CR IP T
as opposed to Pearson’s and Spearman’s correlation coefficients, which were among the first probabilistic sensitivity measures in use.
Variance-based sensitivity measures cannot be encompassed in any copula framework. Nonetheless, due to their importance, we discuss a way to obtain a corresponding copula-
AN US
version. This operation leads to a fresh look at a set of well known results in copula-theory.
Proposition 2. The rank correlation ratio is given in copula form by !2 Z 1 Z 1 V[E[V|U]] η∗ = = 12 C(v|u) dv du − 3. V[V] 0 0
(15)
Note that the non-ranked correlation ratio η of (4) is invariant with respect to affine linear transformations of Y. Replacing X with U and Y with V creates a modified ratio that is
M
invariant with respect to general monotonic transformations. There is a direct link between
ED
η∗ and the Kruskal–Wallis test (Kleijnen and Helton, 1999) which satisfies H = (n − 1)b η∗ , 2 3 PP v p − 1 is an estimator of (15) obtained using a piecewise constant where b η∗ = P−1 m=1 2¯ h h p−1 p ii b V U ∈ approximation v¯ p = E with a partition of size P. This estimator simplifies P , P the one used in Pearson (1905) for η because E[V] and V[V] are known for the uniform
PT
output distribution V.
Thus, the rank-transformed sensitivity measures in Saltelli and Sobol’ (1995) are, indeed, the rank correlation ratios defined here, whose connection with copula-theory has not been
CE
revealed before. The fact that these sensitivity measures might be associated with a test for insignificant contributions (the Kruskal–Wallis test) has not yet caught the attention of
AC
investigators.
The next example shows that none of the sensitivity/dependence measures discussed in
this section complies with Renyi’s postulate D. Example 3. Let Y = X1 · X2 · X3 , with Xi ∼ N(0, 1) standard normal and independent. Then %(Xi ) = %∗ (Xi ) = τ∗ (Xi ) = η(Xi ) = η∗ (Xi ) = 0 for all inputs Xi . However, the conditional distributions of Y differ from the unconditional distribution whenever we fix Xi , i = 1, 2, 3.
3.2. Sensitivity Measures and Copula Theory: Niveau 1 Example 3 shows that none of the proposed measures of dependence at niveau 0 satisfies Rényi’s postulate D. We then investigate alternative setups (called niveaus 1, 2 and 3) with 12
ACCEPTED MANUSCRIPT
the aim of creating copula-based sensitivity measures that actually comply with this postulate. A general result will be presented in Section 3.5. Definition 4. Let d(C, D) be a pre-metric between copulas C and D. Then, we define the the product copula.
CR IP T
global sensitivity measure associated with copula C as γd∗ = d (C, Π) where Π(u, v) = uv is
The above definition is quite general, as the class of pre-metrics comprises both distances and divergence operators. If we use the L p norm, we obtain sZ Z p
[0,1]2
|C(u, v) − D(u, v)| p du dv,
AN US
d p (C, D) =
with p ≥ 1 and d∞ = max[0,1]2 |C(u, v) − D(u, v)|. Then we define the (normalized) probabilistic sensitivity measure based on the L p distance between copulas as γ∗p =
d p (C, Π) . d p (M, Π)
(16)
Some notable dependence measures falling under Niveau 1 are summarized in the next
M
example.
ED
Example 5. The Sigma Index of Schweizer and Wolff (1981), ZZ s∗ = 12 |C(u, v) − uv| du dv,
(17)
[0,1]2
PT
Hoeffding’s Omega Statistic (Hoeffding, 1948; Blum et al., 1961; Schmid et al., 2010), s ZZ w∗ =
90
[0,1]2
(C(u, v) − uv)2 du dv,
(18)
and the normalized Star Discrepancy (Schweizer and Wolff, 1981; Niederreiter, 1992; Dick
CE
and Pillichshammer, 2014) D∗ = 4 max |C(u, v) − uv| .
(19)
[0,1]2
AC
are particular cases of Definition 4 and (16), with p = 1, 2, and ∞, respectively. Note that the star discrepancy is normally presented without the normalization factor. Such factors are obtained by considering the distance between the product copula and the comonotone copula. Proposition 6.
1. For p < ∞ the distance of the independent copula to the co- or contra-
monotone copula is given by
−2/p
d p (M, Π) = d p (W, Π) = (p + 1) and lim p→∞ d p (M, Π) =
1 4
= d∞ (M, Π). 13
2p + 1 p
!−1/p
(20)
ACCEPTED MANUSCRIPT
2. For all copulas C, d p (M, Π) ≥ d p (C, Π) and d p (W, Π) ≥ d p (C, Π). 3. γ∗p ∈ [0, 1] and γ∗p = 1 in case of co- or contra-monotone dependence, γ∗p = 0 in case of independence.
CR IP T
The family γ∗p of dependence measures in Definition 4 (niveau 1, henceforth) is based on bivariate cdfs. However, the γ∗p measures do not necessarily fall into the common ra-
tionale discussed in (3). The reason is that Definition 4 does not foresee the expectation
with respect to the marginal distribution of X. However, in the next section, we introduce a class of copula-based sensitivity measures that fall into the common rationale of probabilistic
AN US
sensitivity measures in (3).
3.3. Sensitivity Measures and Copula Theory: Niveau 2 Definition 7. Given two copulas C, D and a distance d(·, ·) between two one-dimensional cdfs, we define ζ(u; C, D) = d(C(·|u), D(·|u)), the conditional copula distance given u ∈
[0, 1]. Setting D = Π (and therefore D(v|u) = v), the conditional copula sensitivity measure
M
associated with the copula C is derived by averaging over all U and normalizing, E[ζ(U; C, Π)] . (21) E[ζ(U; M, Π)] qR Using any L p norm, one has ζ p (u; C, D) = p [0,1] |C(v|u) − D(v|u)| p dv. For the L∞ norm,
ED
β∗ =
ones defines ζ∞ (u; C, D) = maxv∈[0,1] |C(v|u) − D(v|u)|. The renormalization constant for β∗p
PT
is given by
E[ζ p (U; M, Π)] =
Z
0
1
q
1 p+1
p
(1 − u) p+1 + u p+1 du.
(22)
This quantity is generally not available analytically. However, we find analytical results for
AC
CE
p = 1, p = 2 and p = ∞. Specifically, we have E[ζ1 (U; M, Π)] = √ √ 2 √3+3 1 1 3 + log . For p = ∞ we obtain 6 4 2 3−3 lim E[ζ p (U; M, Π)] =
p→∞
Z
1 3
1
0
max{u, (1 − u)} du =
and E[ζ2 (U; M, Π)] =
3 . 4
(23)
Instead of the norm itself, as in (21), one might also use the pth power of a p-norm. In this
case, the normalization constants are available analytically and the normalized sensitivity measures are given by βp =
E[ζ pp (U; C, Π)] E[ζ pp (U;
M, Π)]
=
(p + 1)(p + 2) 2
ZZ
|C(v|u) − v| p dv du.
(24)
For p = 1, β∗1 and β1 coincide. The next proposition shows that the dependence/sensitivity measures are normalized in [0, 1]. 14
ACCEPTED MANUSCRIPT
Proposition 8. For all copulas C and for all p ∈ 1, 2, . . . , β∗p ∈ [0, 1] and βp ∈ [0, 1]. Example 9. Setting p = ∞ in (21) leads to a global sensitivity measure based on the β∗∞ =
4 3
Z
1
CR IP T
Kolmogorov–Smirnov distance,
max |C(v|u) − v| du,
0 v∈[0,1]
which differs by the normalization factor
4 3
(25)
from the Kolmogorov–Smirnov sensitivity mea-
sure introduced in Baucells and Borgonovo (2013). Considering p = 2 in (24) we obtain the sensitivity measure based upon the Cramér–von Mises distance ZZ (C(v|u) − v)2 du dv β2 = 6
(26)
AN US
[0,1]2
discussed in Gamboa et al. (2015). These two global sensitivity measures fall under the common rationale.
There is a further link between the sensitivity measure β2 and copula theory. In fact, Dette et al. (2012) introduce this measure under the name regression dependence. Furthermore, it is discussed in a sensitivity context by Gamboa et al. (2015) where the renormalization measure (Yitzhaki, 2003).
1 6
is also derived. Moreover, β2 is a transformation invariant Gini
M
constant E[ζ22 (U; M, Π)] =
ED
3.4. Sensitivity Measures and Copula Theory: Niveau 3 In analogy with niveaus 1 and 2, we can now proceed by considering distances between
PT
copula densities and the independent copula density. Noting that the product copula density is given by the constant π = 1, we formulate the following definition.
CE
Definition 10. Let c(·, ·) be the density of copula C and d(·, ·) a distance between scalar
AC
values. We define the associated copula-density sensitivity measure as ZZ ∗ δd (c) = d(c(u, v), 1) du dv.
(27)
[0,1]2
To illustrate, setting d(c1 , c2 ) = |c1 − c2 | p one finds ZZ δ∗p (c) = |c(u, v) − 1| p du dv.
(28)
[0,1]2
We have δ∗d (π) = δ∗p (π) = 0. For a sensitivity measure in Equation (27) there may not RR exist a normalization factor of the type d(m(u, v), 1)du dv, because the density m of the
co-monotonic copula is a point measure along the diagonal, i.e., m(u, v) = δDirac (u − v)
and its integral may not be defined. The following example communicated by Durante and
Fernández-Sáchez shows that for p > 1 in (28) the normalization factor given by the distance to the co-monotonity copula density diverges. 15
ACCEPTED MANUSCRIPT
CR IP T
Example 11. For an integer m > 1 consider the copula Cm whose density cm is given by `−1 ` 2 m, if (u, v) ∈ [ m , m ] , ` = 1, . . . , m, cm (u, v) = (29) 0, otherwise.
Hence Cm has m uniforms blocks along the diagonal. A small computation shows δ∗p (cm ) = m m2 −m 1 p p p−1 · (m − 1) + · 1 = (m − 1) (m − 1) + 1 . For m → ∞, Cm approximates the 2 2 m m m comonotone copula, and we have δ∗p (cm ) → ∞ for p > 1 and δ∗1 (cm ) → 2 for p = 1.
Therefore, the only distance for which (28) is well-defined in case of co-monotone dependence is the L1 distance. For p = 1, the normalization with factor
1 2
yields the δ-measure
2014) as δ∗1
1 = 2
AN US
introduced by Borgonovo (2007), which can be written in copula density terms (Wei et al., ZZ
[0,1]2
|c(u, v) − 1| du dv.
(30)
Definition 10 can be made more general as follows. Consider a function h(·) such that h(1) = 0 and c 7→ h( 1c ), c > 0, is convex. Then, define the copula sensitivity measure by
(31)
[0,1]2
ED
[0,1]2
M
considering the Riemann-Stieltjes integral of h(c) with respect to the copula C, ZZ ZZ c(u, v)h(c(u, v)) du dv. h(c(u, v)) dC(u, v)= δh (c) = This definition then accommodates f -divergences, as the next example shows.
Example 12. The moment-independent importance measure in (30), the negentropy / mutual
PT
information measure (Joe, 1989) obtained via Kullback-Leibler discrepancy ZZ ZZ ∗ log c(u, v) dC(u, v) δKL = c(u, v) log c(u, v) du dv =
CE
(32)
[0,1]2
[0,1]2
(using the convention 0 log(0) = 0) and the sensitivity measure derived from the squared
AC
Hellinger distance, δ∗H = 1 −
ZZ
[0,1]2
p
c(u, v) du dv =
1 2
ZZ
[0,1]2
1−
2 p c(u, v) du dv,
(33)
are part of the class of sensitivity measures in Definition 10. These are all transformation invariant sensitivity measures which satisfy the common rationale (3).
3.5. Niveaus and Rényi’s Postulate D Our assumption that the metric d measuring the separation between copulas in the previous definitions is a distance leads to the following result. Theorem 13. Let Π = u · v denote the independent copula. We have: 16
ACCEPTED MANUSCRIPT
1. At niveau 1, C = Π ⇐⇒ d(C, Π) = 0. 2. At niveau 2, C = Π ⇐⇒ d(C(v|u), v) = 0 for all u ∈ [0, 1].
3. At niveau 3, C = Π ⇐⇒ d(c(u, v), 1) = 0 for all (u, v) ∈ [0, 1]2 .
CR IP T
That is, all sensitivity measures at niveau’s 1, 2 and 3 satisfy Rényi’s postulate D.
4. Copula-Based Estimators
Framing a probabilistic sensitivity analysis within copula theory allows one to take a fresh
look at estimators of global sensitivity measures. If the copula-based sensitivity measure can be written in the form γ =
ZZ
ZZ
AN US
∗
G(u, v)c(u, v) du dv =
G(u, v)dC(u, v)
(34)
where G(u, v) is an appropriate bivariate function, then we have a corresponding Monte Carlo (MC) estimator
b γ = n−1
n X
G(ui , vi )
(35)
i=1
M
given the realizations (ui , vi )i=1,...,n which follow the C copula distribution. One can directly note that this MC estimator being a Riemann sum is asymptotically consistent.
ED
Consider now niveau 0 dependence measures. The expressions in (13) and (14) are of the
PT
form (34). Thus, a MC estimator of Spearman’s correlation coefficient is obtained through P b %∗ = 12 1n ni=1 ui vi − 3 and a (bias-corrected) estimator for Kendall’s τ∗ is given by n 1 X b b τ = C(ui , vi ) − (n + 3) , 4 n−1 i=1 ∗
AC
CE
b i , vi ) is the empirical copula C b for the sample (ui , vi )i=1,...,n , where C(u
(36)
n
X b v) = 1 1{u j ≤ u}1{v j ≤ v}. C(u, n j=1
(37)
An algorithm for the empirical τ∗ which also works on tied data is found in Joe (2014). At niveau 1, another measure that can be estimated from the empirical copula is (19), as b i , v j ) − ui · v j . Dˆ ∗ = max C(u i, j=1,...,n
(38)
Hoeffding’s omega statistic (18) can be directly estimated from rank-transformed data. For its square, the estimator coincides with Warnock’s formula and has the expression n
wˆ ∗2 =
n
n
90 X X 45 X (1 − max{u j , u` })(1 − max{v j , v` }) − (1 − u2j )(1 − v2j ) + 10, 2 n j=1 n j=1 `=1 17
(39)
ACCEPTED MANUSCRIPT
see also the discussion in Dick and Pillichshammer (2014). Niveau 2 and 3 dependence measures cannot be estimated directly from a Monte Carlo sample, because their expression is not necessarily of the form (34). To compute them, one
CR IP T
needs access to the conditional copula, which is obtained from a directional derivative of the copula, and to the copula density, which is a second derivative of the copula. However, we show that it is possible to obtain given-data estimators via a Bernstein copula approach.
Definition 14 (Sancetta and Satchell (2004); Schmid et al. (2010)). Given C(u, v), the Bernstein copula of degree m is defined by m X m X
C
i j m, m
Pmi (u)Pm j (v)
AN US
Bm [C](u, v) =
i=1 j=1
(40)
with Bernstein polynomials Pm` (t) = m` t` (1 − t)m−` , t ∈ [0, 1]. The Bernstein conditional P P copula is given by Bm [C](v|u) = i j C mi , mj P0mi (u)Pm j (v) and the Bernstein copula den P P sity is defined by bm [C](u, v) = i j C mi , mj P0mi (u)P0m j (v).
M
For defining Bm [C](u, v), Bm [C](v|u) and bm [C](u, v) the original copula C(u, v) has to be
known on m × m regularly spaced grid-points. The functional basis of Bernstein polyno mk (u) = m Pm−1,k−1 (u) − Pm−1,k (u) . mials is closed under differentiation, so that P0mk (u) = dPdu
ED
b into (40) we obtain the Bernstein estimates Bm [C](u, b v), Plugging the empirical copula C b b v) for the copula, the conditional copula and the copula density Bm [C](v|u) and bm [C](u,
PT
which then are used to estimate transformation invariant sensitivity measures. Note that for b t) , t , C(t, b 1) holds, hence Bm [C] b is only an approximate copula. the empirical copula C(1,
CE
Example 15. Examples for Bernstein-copula based estimator for sensitivity measures are
AC
provided by:
• for niveau 1 b γ∗p (C)
• for niveau 2
b β∗p (C)
1 ! N N i j i j p p 2p + 1 1 X X 2 b Bm [C] = (p + 1) , − 2 , p N N N 2 i=1 j=1 N p 1p N N j i 1 X 1 X j b Bm [C] = E(ζ p (U; M, Π)) − , N i=1 N j=1 N N N
(41)
−1
p N N (p + 1)(p + 2) 1 X X b b j i − j , βp (C) = B [ C] m 2 N N N N 2 i=1 j=1 18
(42)
ACCEPTED MANUSCRIPT
• for niveau 3
p N N 1 X X b i j b bm [C] δ∗p (C) = 2 , − 1 N N N i=1 j=1
(43)
n
CR IP T
and using (35) on the Kullback–Leibler divergence (32), an estimator is given by 1X b b i , vi ). δ∗KL (C) = log bm [C](u n i=1
(44)
Theorem 16. Sensitivity estimators built through the empirical Bernstein copula which are
AN US
based on L p distances are consistent when Bernstein polynomial degree m, sample size n q and quadrature grid size N satisfy n, m, N → ∞ and m log nlog n → 0, N m • for niveau 1,
• for niveau 2 if the copula C is C2 ([0, 1]2 )
• for niveau 3 if the copula C is C3 ([0, 1]2 ).
The Bernstein degree m and the number of outer points N 2 influence the precision of the
M
estimates in Example 15. Theorem 16 shows that the polynomial degree m should be chosen √ larger than the root of the sample size, n. For large n, care must therefore be taken to use
ED
numerically stable algorithms to compute high-degree Bernstein polynomials, e.g. with the well-known de Casteljau algorithm for spline approximation (Prautzsch et al., 2002).
PT
Example 17. Let Y represent a random arrival time, modeled through Y ∼ Exp(λ = λ0 +X1 + X2 ) with λ0 = 0.05 where X1 and X2 follow independent generalized trapezoidal distributions
(van Dorp and Kotz, 2003) — see the Online Supplement. Given this setup, it is possible
CE
to obtain the marginal and conditional distributions of Y analytically (Borgonovo et al., 2018a). To test the estimators in Example 15, at niveau 1, γ1∗ (Xi ) = s∗ (Xi ), the Schweizer– Wolff sensitivity measure in (16) with p = 1, is used; at niveau 2 the Gini/Cramér/von Mises
AC
sensitivity measure β2 in (24) with p = 2 is used. At niveau 3 we use the Kullback-Leibler information, (32). Thus, estimators for the sensitivity measures are in (41) with p = 1, (42) with p = 2, and in (44). The analytical values of the sensitivity measures are γ1∗ (X1 ) = 0.249,
γ1∗ (X2 ) = 0.195, β2 (X1 ) = 0.0372, β2 (X2 ) = 0.0254, and δ∗KL (X1 ) = 0.0433 and δ∗KL (X2 ) = 0.0348. Figure 1 shows the results for numerical experiments at increasing sample sizes. Results are based on a quasi-Monte-Carlo sample (Dick et al., 2013) of size 215 , and the sensitivity measures are computed with multiples of 256 realizations. The dotted lines represent values computed from the analytical expressions presented in the Online Supplement. Different strategies of selecting the Bernstein polynomial degree were tested, using the 19
ACCEPTED MANUSCRIPT
Schweizer Wolff γ1∗
0.3
Gini Cramer von Mises β2⋄
0.04
0.04
0.28
0.24 0.22
0.035
Sensitivity
Sensitivity
0.03
0.03 0.025 0.02
0.025 0.2
X1 X2
0.015 0
1
2
Sample Size
3
0.02
0
1
2
Sample Size
× 104
3 × 104
CR IP T
Sensitivity
0.035 0.26
0.18
∗ Kullback Leibler δKL
0.045
0.01
0
1
2
Sample Size
3
× 104
Figure 1: Bernstein estimators as the sample size increases for the sensitivity measures γ1∗ = s∗ (17), β2 = 6βCvM
(26) and δKL (32) for the stochastic simulator in Example 17. Dotted lines represent analytical values. Solid lines use m = bn2/5 c, dashed lines m = bn1/2 c, and dash-dotted lines use m = bn1/3 c.
AN US
guidance of Theorem 16. Note that the cube root strategy as used in histogram binning and kernel density bandwidth selection is suboptimal. The grid size for estimating the sensitivity
measures was kept constant at N = 20, so that the empirical copulas are evaluated at 400 points. Estimating δ∗KL with a Monte-Carlo approach is also consistent.
5. Visualization: A Variety of Graphical Options
M
For improving result communication, analysts benefit from the use of graphical representations. Copula theory opens the door to a variety of graphical options that can ease the com-
ED
munication of sensitivity analysis results. A commonly used visualization method is a contour plot of the level sets of a copula. Refining this intuition, Genest and Boies (2003) proposed a graphical method called Kendall plot. In this plot, one compares the area P(UV < t)
PT
given by the t contour of the independent copula against an estimate from the empirical b i , vi ) and copula. In particular, one defines the (empirical) diagonal copula vector Ci = C(u
CE
the empirical independence vector Πi = ui vi using a sample pair (ui , vi )i=1,...,n . The diagonal copula is the empirical copula (37) evaluated at the available observations. The Kendall
plot is obtained by plotting the increasingly ordered values pi = Π(i) and zi = C(i) against
AC
each other. If U and V are independent then all values are located along the diagonal of the Kendall plot. While maintaining its spirit, we propose a slight modification of the Kendall plot by √ considering the quantities qi = pi instead of pi and 4(zi − pi ) = 4(zi − q2i ) instead of zi .
Hence we visualize the curve
kC : qi 7→ 4(zi − q2i ).
(45)
In this modification, the proportionality constant 4 stems from the maximum discrepancy, so that the modified plots for the co- and contra-monotone copulas M and W are bounded by 1 or −1, respectively. Also, independence is signaled by a flat line, so that the information 20
ACCEPTED MANUSCRIPT
originally condensed at the diagonal is magnified. It is interesting to note the expressions that the lines for the co- and contra-monotone copulas assume in this graph. Proposition 18. For the co-monotone copula, the modified Kendall curve in (45) is given by q 7→ 4q(1 − q); for the contra-monotone copula, the modified Kendall
curve is given by kW : [0, 21 ] → [−1, 0],
q 7→ −4q2 .
Modified Kendall Plot
0.02
Copula Distance and Rank CUSUNORO
0.01
0
0 x1
-0.02
x2
-0.01
C(u,v)-uv
AN US
-0.04 -0.06 -0.08 -0.1
-0.02 -0.03 -0.04
-0.12
x1
-0.05
-0.14 -0.16
CR IP T
k M : [0, 1] → [0, 1],
0
0.2
0.4 √
0.6
0.8
-0.06
1
0
0.2
0.4
0.6
0.8
1
u
M
uv
x2
Figure 2: Modified Kendall curves and copula distance plots with rank cusunoro curves, 1024 QMC realizations
ED
for the stochastic simulator of Example 17.
To illustrate, the left graph in Figure 2 shows the modified Kendall curves for the simulator inputs in Example 17. Both of them lie on the negative quadrant, signaling that both
PT
simulator inputs have an expected contra-monotonic effect on Y. Despite the rather low sensitivity, with this moderately sized sample of n = 210 the ranking of X1 and X2 seems
CE
robust.
We propose also an alternative visualization technique (copula-distance plot, hence-
forth). The intuition is to graph the difference between the empirical copula and the product
AC
copula that would apply on the same data. Given a (X, Y) sample pair, the associated transformed values from the empirical marginal cdfs ui = Fˆ X (xi ), vi = Fˆ Y (yi ) populate a Latin ˆ i , vi ) − ui vi ) form the graph square, ignoring the case of tied data. Hence, the pairs (ui , C(u of a function ϕ that maps { ni ; i = 1, . . . , n} into [−0.25, 0.25]. Thus, one can report ui on the
ˆ i , vi ) − ui vi on the vertical axis, obhorizontal axis and the corresponding points ϕ(ui ) = C(u ˆ i , vi )−ui vi ). Note that the function C(u, v)−uv is the common taining the scatterplot of (ui , C(u
term in the star discrepancy (19), Spearman (12), Schweizer–Wolff (17) and Hoeffding (18) RR dependence measures. Thus, dependence measures of the form h(C(u, v) − uv) dC(u, v)
might be straightforwardly extracted from the plot. 21
ACCEPTED MANUSCRIPT
The intuition behind this plot is that low statistical dependence is evidenced by values of ˆ i , vi ) − ui vi close to zero. To illustrate, the right graph of Figure 2 reports the copula disC(u ˆ v) − uv) associated with the empirical tance plot for our running example. The pairs (u, C(u,
CR IP T
copula for (X1 , Y) (marked with blue ∗ symbols in Figure 2) lie further from the horizontal
axes than the ones for the (X2 , Y) pair (red ◦). Thus, we gauge a stronger dependence of Y on X1 rather than on X2 , in accordance with the results in Example 17.
In addition, a copula distance plot yields insights on trend as well as non-linearity. The √ P black lines in the right graph of Figure 2 are given by u 7→ 2n3 i:ui ≤u (1 − 2vi ) and we call them rank cusunoro curves (cusunoro=cumulative sum of normalized reordered output).
AN US
They are a copula-version of a graphical tool introduced in Plischke (2012). The mean square gradient of this curve is directly linked to the rank correlation ratio, as we discuss in the Online Supplement. A cusunoro curve lying entirely above the horizontal axis signals a co-monotonic effect of a simulator input parameter, a line lying below signals a contramonotonic effect. An off-center turning-point shows a non-linear effect on the output. Thus,
the right graph in Figure 2 shows that X1 and X2 have an expected contra-monotonic effect,
ED
6. Discussion
M
which is nearly rank-linear for X1 and non-linear for X2 .
We have seen that measuring statistical dependence in simulation experiments within a copula-theory framework can give rise to alternative sensitivity measures, new estimators
PT
and graphical approaches. Regarding measures, Table 1 lists the dependence measures discussed in this paper. For each measure, the table reports whether it can be regarded as copula-based, whether it belongs to the common rationale of global sensitivity analysis and
CE
whether it complies with Rényi’s postulate D. From the fourth column, we deduce that, out of the seventeen measures listed, Pearson’s
AC
% and η are the only global sensitivity measures that cannot be regarded from the point of view of copula theory. In fact these two sensitivity measures are not transformation invariant. If a sensitivity measure is not transformation invariant, it cannot be copula-based. In this respect, consider the class of sensitivity measures in the common rationale based on Rényi’s divergence in (8). For any value of α for which this divergence is defined, the inner operator
is not transformation invariant and, therefore, ξRenyi cannot be a copula-based sensitivity measure. The fifth column reports whether the dependence measure falls within the common rationale of equation (3). Outside the common rationale one finds Pearson’s and Spearman’s
22
CR IP T
ACCEPTED MANUSCRIPT
Table 1: Sensitivity measures used in this work. Copula: The measure is transformation invariant and can be formulated in copula terms. Comm. Rat.: The measure is part of the common rationale. Rényi’s D: The measure
AN US
complies with Rényi’s postulate.
Name/Type
Symbol(s)
Eq.
Copula
%
(2)
No
No
%∗
(12)
Yes
No
∗
τ
(14)
Yes
No
η
(4)
No
Yes
No
η∗
(15)
Yes
Yes
No
γ∗p
(16)
Yes
Yes
s∗ , γ1∗
Pearson Corr. Coeff. Spearman Corr. Coeff Kendall tau Rank Corr. Ratio
(17)
Yes
Yes
, γ2∗
(18)
Yes
Yes
∗ , γ∞
(19)
Yes
Yes
β∗p
(21)
Yes
Yes
Yes
βKS , β∗∞
(25)
Yes
Yes
Yes
βCvM , β2
(26)
Yes
Yes
Yes
Copula Density, Niveau 3
δ∗d
(27)
Yes
Yes
Yes
Borgonovo delta
δB , δ∗1
(30)
Yes
Yes
Yes
Kullback–Leibler
δKL , δ∗KL
(32)
Yes
Yes
Yes
δ∗H
(33)
Yes
Yes
Yes
ED
Schweizer Wolff sigma
Rényi’s D
PT
Copula Distance, Niveau 1
M
Pearson Corr. Ratio
Comm. Rat.
Hoeffding omega Star Discrepancy
Conditional Copula, Niveau 2 Kolmogorov–Smirnov
AC
CE
Cramér–von Mises
Hellinger
∗
w D
∗
23
ACCEPTED MANUSCRIPT
correlation coefficients, Kendall’s τ∗ , and all the copula-based sensitivity measures introduced at niveau 1. While there is a non-empty intersection between the set of copula-based sensitivity measures and measures that are in the common rationale of (3), there are also
CR IP T
copula-based sensitivity measures that are not in the common rationale (niveau 1) as well as sensitivity measures in the common rationale that cannot be regarded as copula-based (e.g., variance-based).
The sixth column of Table 1 analyzes the dependence measures in light of Rényi’s postulate D.
Let us come to implementation. Table 1 shows a variety of sensitivity measures. Compu-
AN US
tational cost could serve as a discriminant for pointing the analyst towards a specific measure.
However, in principle, all these sensitivity measures can be estimated from the same sample simultaneously (see Section 4). This might be advantageous because alternative sensitivity measures take into account different facets of the input-output dependence (see Ebrahimi et al. (2014)). Thus, our suggestion is to employ a handful of measures chosen at different niveaus. For instance, one can select a copula-based sensitivity measure which looks at vari-
M
ance such as η∗ . Because this sensitivity measure does not comply with Rényi’s postulate D, one can accompany the analysis with sensitivity measures at niveaus 1, 2 and 3 that look
ED
at the density and cdf of the simulator output. If all the sensitivity measures flag a model input as important, the indication is more robust than if it came from a single sensitivity measure. Moreover, the analyst reduces the risk of overlooking relevant simulator inputs.
PT
For instance, in Example 17, Pearson’s and Spearman’s correlation coefficients and Pearson’s correlation ratio (all at niveau 0) would erroneously indicate that X1 , X2 and X3 are not
CE
relevant. However, complementing the insights by using sensitivity measures at niveaus 1, 2 or 3, the analyst would obtain a non-null value of the sensitivity measures, recovering the correct indication that the output depends indeed on these inputs.
AC
Finally, from the same sample the analyst can build modified Kendall and rank cusunoro
plots (Section 5). These graphical tools allow the analyst to gain additional insights on the simulator partial behavior with respect to the individual simulator inputs, as they yield information on whether there is an expected monotonic relationship between the simulator output and the input of interest, and on whether the dependence is close to rank-linear.
7. Benchmark Application: The Level E code The Level E code is the outcome of a multi-year international effort on a benchmark safety assessment case study of a waste disposal system (Nuclear Energy Agency, 1989) and has
24
ACCEPTED MANUSCRIPT
become a benchmark for probabilistic sensitivity analysis over the years (see Saltelli et al. (2000); Saltelli and Tarantola (2002); Borgonovo et al. (2014) to mention a few). The code has 12 uncertain inputs, whose distributions have been determined by expert elicitation and
CR IP T
used in subsequent experiments. We calculate the sensitivity measures on an available quasi-Monte Carlo input-output
sample of size 215 . We analyze two quantities of interest, the peak dose and the dose at
t = 320, 000 years. This point in time is significant from a physical perspective, as the fast decaying iodine starts fading off and the neptunium decay chain starts dominating the radioactive release. Following the discussion in Section 6, besides η∗ at niveau 0, we select
AN US
two density-based sensitivity measures at niveau 3, δKL and δB , and two cdf-based sensitivity measures, β2 and D∗ , at niveaus 2 and 1 respectively.
0.9 0.8 0.7 0.6
0.4 0.3 0.2 0.1 0
T
kI
kC v1
l1 R1I
1 C
v2
b)
1
M
0.5
ED
Rescaled Dependence Measure Value
Peak Dose
Rescaled Dependence Measure Value
a)
1
l2 R2I
2 C
0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
W
Dose at t=320,000
T
kI
kC v1
l1 R1I
1 C
v2
l2 R2I
2 C
W
PT
Figure 3: Sensitivity measures η∗ , β2 , δKL , δB , D∗ for the peak dose (graph a) and the dose at t=320,000 years
CE
(graph b). For display purposes, the values of the sensitivity measures are normalized by the maximum value (e.g., for η∗ , the graphs report maxi=1,2,...,12 η∗ (Xi ) −1 η∗ (Xi )).
Figure 3 displays the values of the five sensitivity measures for the peak dose (graph a)
and the dose at t = 320, 000 years (graph b). When the peak dose is the output of interest,
AC
the biosphere parameter W is the simulator input on which Y depends more strongly, followed by v1 , the water velocity in geosphere layer 1. The simulator inputs characterizing the remaining physical properties of geosphere layer 1 (l1 , RC1 , γC1 ) are all significant. The cannister failure time T , the leaching rates kC and kI , the Neptunium decay chain parameters γC1 and γC2 have minor influence. When the dose at t = 320, 000 is the output of interest, the sensitivity measures identify three groups of simulator inputs (Figure 3): {v1 , l1 }, then {γC1 , R1I , v2 , l2 , γC2 , W} followed by {kI , kC , R2I , T }. Notably, the most important simulator in-
put on the peak dose, W, loses importance in favor of v1 and l1 . The values of the sensitivity
measures of v1 and l1 are about one order of magnitude higher than the ones of the second 25
ACCEPTED MANUSCRIPT
group, indicating that the dependence of the simulator output on these two inputs is much stronger than on the other inputs. The values of the sensitivity measures of the simulator inputs in the least relevant group are, in turn, one order of magnitude lower than the values
Level E, maximum dose. CvM sensitivity β2⋄ , MC sampling
100
Level E, dose at 320,000 years. CvM sensitivity β2⋄ , MC sampling
100
T kI
T kI
10
kC
-1
v1
10
-1
10
-2
10
-3
kC v1 l1
l1
RI1
RI1
10
γ 1C 2
-2
γ 1C v2
v
l2
l2
10
R2I
-3
γ 2C
R2I γ 2C W
W
0
0.5
1
1.5
2
2.5
Sample size
3
3.5
10-4
0
0.5
1
1.5
AN US
10-4
CR IP T
of the inputs in the intermediate-importance group.
2
2.5
3
Sample size
× 104
3.5
× 104
Figure 4: Level E: Convergence of the estimator for CvM sensitivity in eq. (26). A logarithmic scale is used on the vertical axis.
For the peak dose, previous works (see Borgonovo et al. (2014), among others) identified the two major key-drivers of uncertainty (W and v1 ) with a computational cost of
M
about 1, 000-1, 500 model runs, with variance-based and density-based sensitivity measures as well. In Figure 4 we report the values of the Bernstein estimator from (42) for the Cramér–
ED
von Mises sensitivity measure β2 of (26), with the Bernstein polynomial degree fixed at m = 100 and the grid size at N = 20. Figure 4 shows that the estimates are convergent, and that the identification of the key-uncertainty drivers occurs at sample sizes in line with
PT
previous studies. The same occurs for the dose at t = 320, 000 years. Figure 5 reports the modified Kendall curves and rank cusunoro curves for the peak dose
CE
(graphs a and b) and the dose at t = 320.000 years (graphs c and d), respectively. Graphs a) and b) in Figure 5 show that all simulator inputs have a monotonic effect on the peak dose (either co- or contra-monotonic). Specifically v1 has a strong co-monotonic effect, while W
AC
has a strong contra-monotonic effect. Also, the dependence is close to rank-linear, given the symmetric parabola-shaped cusunoro curves. However, graphs c) and d) show that v1 has a strong non-linear and non-monotonic effect on the dose at t = 320, 000, with a contramonotonic effect for values of v1 lower than then its 70th quantile, and co-monotonic for
values higher than this quantile. The second most important input l1 has a contra-monotonic and non-linear effect. A comparison of the plots shows that the dose at t = 320, 000 is less linear as a function of X than the peak dose. Thus, these graphs unveil additional insights on the behavior of the two model outputs with respect to the key-drivers of uncertainty.
26
AN US
CR IP T
ACCEPTED MANUSCRIPT
8. Conclusions
M
Figure 5: Graphical sensitivity analysis of Level E, a),b) peak dose; c),d) annual dose at t = 320.000.
ED
This article has investigated the question of whether there is a connection between copula theory and probabilistic sensitivity analysis. We have shown that there is more than one layer of overlap, ranging from the goal, measuring statistical dependence, to the means,
PT
using transformations. As a result of our investigation, however, it seems safe to state that the connection is not complete but constructive. There are popular sensitivity measures, e.g.,
CE
variance-based, which cannot be expressed as copulas. However, the connection enlarges the rationale for interpreting and building probabilistic sensitivity measures and permits to introduce new estimators and visualization tools. The new estimators keep the computational
AC
cost at n model runs, where n is the sample size. The visualization tools have the potential to enrich the insights obtained from the simulation input-output sample at no additional cost. The approach works for simulators with a deterministic and a stochastic response as well. In fact, there is a variety of probabilistic sensitivity measures available to the analyst. Our
work can help in orienting the choice of which measure(s) to use. The emerging suggestion is to use a restricted ensemble of dependence measures, composed of indicators that look at alternative statistical properties (e.g., variance, density, cdf) and such that some of the indicators comply with Rényi’s postulate D. Two possible lines of future research emerge from this work. First, some of the sensitivity 27
ACCEPTED MANUSCRIPT
measures in Table 1 possess a value of information interpretation, while some others do not (Borgonovo et al., 2018a). Then, establishing the conditions under which a sensitivity measure is in fact a copula and, at the same time, possess the interpretation of information
CR IP T
value is an open research problem. Second, it is interesting to understand whether graphical methods based on copula theory can be competitive with tools used in machine learning, such as partial dependence functions (Friedman, 2001), in the quest towards interpretability.
Acknowledgments
The authors thank the editor, Professor Roman Slowinski, for the careful editorial atten-
AN US
tion. We also wish to thank the two anonymous reviewers for their careful reading of the
manuscript and for the several constructive suggestions. We thank Pietro Muliere, Fabrizio Durante, Juan Fernández-Sáchez and Tim Bedford for stimulating discussions.
Appendix A. Proofs
Proof of Proposition 2. The nonparametric regression curve of V conditional to U = u is Z
1
∂C (u, 1) − ∂u
M
E[V|U = u] =
vc(u, v) dv =
0
Z
1
0
∂C (u, v) dv = 1 − ∂u 1 2
ED
The mean and the variance of the uniformly distributed V are using (A.1)
[0,1]2
C(v|u) du dv =
Z η =3
CE
As
RR
PT
η∗ = 12 E[(E[V|U] − E[V])2 ] = 3
∗
0
R1 0
Z
1
1−2
0
(C(1, v) − C(0, v)) dv =
R1 0
Z
0
0
0
1 12 ,
1
C(v|u) dv. (A.1) 0
respectively. Hence,
!2 C(v|u) dv du.
v dv = 12 , this yields
!2 Z 1 Z 1 Z C(v|u)dv du = 12 1−4 C(v|u)dv + 4
1
1
and
Z
0
1
Z
0
1
!2
C(v|u) dv du − 3.
AC
Proof of Proposition 6. Ad 1.) The differences M − Π and Π − W are nonnegative, and if u + v ≤ 1, u(1 − v) if u ≥ v, uv (M −Π)(u, v) = (Π−W)(u, v) = (A.2) v(1 − u) if v ≥ u, (u − 1)(v − 1) if u + v ≥ 1.
A substitution u 7→ 1 − u or v 7→ 1 − v shows that both differences are mirror images on [0, 1]2 . Hence d p (M, Π) = d p (W, Π). Then (20) follows from integrating d pp (M, Π) = R1 2 p−1 )(1 − v) p dv (Nelsen, 2006). The maxima in (A.2) are attained at u = 21 , v = 12 . p+1 0 (1 − v Hence d∞ (M, Π) = d∞ (W, Π) =
1 2
· 12 .
Ad 2.) If C ≥ Π or C ≤ Π for all u, v ∈ [0, 1] (quadrant dependence) then either 0 ≤ C − Π ≤ 28
ACCEPTED MANUSCRIPT
M − Π or 0 ≤ Π − C ≤ Π − W which leads to d p (C, Π) ≤ d p (W, Π) = d p (M, Π). Now assume
that there exists a copula C which is not quadrant dependent. Then C − Π has zeros inside
the unit square and, because of continuity, multiple local extrema. Each of the associated because of bounded derivatives of C.
CR IP T
extreme value is absolutely smaller than the unique extreme value of M − Π (or Π − W), Ad 3.) The choice C = Π or C = M leads to γ∗p = 0 and γ∗p = 1, for C = W one uses 2.).
Proof of Proposition 8. Let C(v|u) = 1{v ≥ g(u)} be an indicator function for a measurable
function g : u 7→ v. As g : [0, 1] → [0, 1] maps a uniform distribution into a uniform
distribution, it is measure-preserving. Then = =
p2 +3p+2 2 p+2 2
ZZ
Z 1 0
[0,1]2
p
(C(v|u) − v) du dv =
Z
Z
1
g(u)
p
v dv +
Z
AN US
βp,p
g(u) p+1 + (1−g(u)) p+1 du =
Any other C(v|u) contributes less, hence
βp
0
p+2 2
Z 1 0
0
1
g(u)
p
!
(1 − v) dv du
g(u) p+1 + (1−g(u)) p+1 dg(u) = 1
∈ [0, 1]. Analogous results hold for β∗p (C).
Proof of Theorem 13. Ad 1.) This is the nullity-implies-independence property of distances Ad 2.) This follows from
M
between bivariate absolute continuous copulas.
ED
C(v|u) = v ⇐⇒ C(u, v) = C(0, v) +
Z
0
u
∂C(u, v) du = ∂u
Z
0
u
C(v|u)du =
Z
u
v du = uv
0
which holds almost everywhere. As C is assumed absolutely continuous, C = Π holds.
PT
Ad 3.) d(c(u, v), 1) = 0 is equivalent to c(u, v) = 1 a.e. Under absolute continuity, C = Π. Proof of Theorem 16. The estimators based on L p norms replace the integrals with Rieb conmann sums that are consistent by construction. The empirical Bernstein copula Bm [C]
CE
b (Nelsen, 2006). Evaluation of Bm [C] b is only verges uniformly to the empirical copula C sensible if N ≤ m, i.e., when using it for interpolation. Under the iterated logarithm condi-
AC
tion, the convergence to the true copula C is also uniform. As the copula is in C1 ([0, 1]2 ) by
construction, the convergence for niveau 1 sensitivity measures follows immediately. Higher niveaus require smoother copulas. Convergence properties of empirical conditional copula and copula density are found in Abegaz et al. (2011); Janssen et al. (2014). Proof of Proposition 18. As u = v holds for M, we have q = u and 4(min{u, v} − u2 ) = 4u(u − 1) which reaches a maximum at u = 21 . For W, v = 1 − u holds such that q = √ u(1 − u) ∈ [0, 21 ] and 4(max{u + v − 1, 0} − q2 ) = −4q2 .
29
ACCEPTED MANUSCRIPT
References Abbas, A. E. (2009). Multiattribute utility copulas. Operations Research, 57(6):1367– 1383.
CR IP T
Abbas, A. E. (2013). Utility copula functions matching all boundary assessments. Operations Research, 61(2):359–371.
Abbas, A. E. and Sun, Z. (2015). Multiattribute utility functions satisfying mutual preferential independence. Operations Research, 63(2):378–393.
Abegaz, F., Mandrekar, V., and Naik-Nimbalkar, U. (2011). On computing approximaStatistical Congress, Dublin.
AN US
tion of correlations using Bernstein copula and probabilistic tools. In Proc. 58th World
Baucells, M. and Borgonovo, E. (2013). Invariant probabilistic sensitivity analysis. Management Science, 59(11):2536–2549.
Bedford, T. and Cooke, R. M. (2002). Vines–a new graphical model for dependent random variables. The Annals of Statistics, 30(4):1031–1068.
M
Biller, B. (2009). Copula-based multivariate input models for stochastic simulation. Operations Research, 57(4):878–892.
Biller, B. and Corlu, C. G. (2011). Accounting for parameter uncertainty in large-scale
ED
stochastic simulations with correlated inputs. Operations Research, 59(3):661–673. Blum, J. R., Kiefer, J., and Rosenblatt, M. (1961). Distribution free tests of independence based on the sample distribution function. The Annals of Mathematical Statistics,
PT
32(2):485–498.
Borgonovo, E. (2007). A new uncertainty importance measure. Reliability Engineer-
CE
ing&System Safety, 92(6):771–784. Borgonovo, E., Hazen, G., Jose, V. R. R., and Plischke, E. (2018a). Scoring rules, value
AC
of information, and sensitivity analysis. Under review. Borgonovo, E., Hazen, G. B., and Plischke, E. (2016). A common rationale for global sensitivity measures and their estimation. Risk Analysis, 36(10):1871–1895.
Borgonovo, E. and Plischke, E. (2016). Sensitivity Analysis: A Review of Recent Advances. European Journal of Operational Research, 3(1):869–887. Borgonovo, E. and Tarantola, S. (2008). Moment independent and variance-based sensitivity analysis with correlations. International Journal of Chemical Kinetics, 40(11):687– 698. Borgonovo, E., Tarantola, S., Plischke, E., and Morris, M. D. (2014). Transformations and invariance in the sensitivity analysis of computer experiments. Journal of the Royal 30
ACCEPTED MANUSCRIPT
Statistical Society, Series B, 76:925–947. Borgonovo, E., Wendell, R., and Buzzard, G. (2018b). A Global Tolerance Approach to Sensitivity Analysis in Linear Programming. European Journal of Operational Research,
CR IP T
1(16):321–337. Campolongo, F. and Saltelli, A. (1997). Sensitivity analysis of an environmental model:
an application of different analysis methods. Reliability Engineering&System Safety, 57(1):49–69.
Clemen, R. T. and Reilly, T. (1999). Correlations and copulas for decision and risk analysis. Management Science, 45(2):208–224.
AN US
Conover, W. J. and Iman, R. L. (1981). Rank transformations as a bridge between parametric and nonparametric statistics. The American Statistician, 35:124–133. Critchfield, G. C. and Willard, K. E. (1986). Probabilistic analysis of decision trees using Monte Carlo simulation. Medical Decision Making, 6:85–92. Csiszár, I. (2008).
Axiomatic characterizations of information measures.
10:261–273.
Entropy,
M
Da Veiga, S. (2015). Global sensitivity analysis with dependence measures. Journal of Statistical Computation and Simulation, 85(7):1283–1305.
ED
Dette, H., Siburg, K. F., and Stoimenov, P. A. (2012). A copula-based non-parametric measure of regression dependence. Scandinavian Journal of Statistics, 40(1):21–41. Dick, J., Kuo, F. Y., and Sloan, I. H. (2013). High-dimensional integration: The quasi-
PT
Monte Carlo way. Acta Numerica, 22:133–288. Dick, J. and Pillichshammer, F. (2014). Discrepancy theory and quasi-Monte Carlo inte-
CE
gration. In A Panorama of Discrepancy Theory, pages 539–619. Springer Verlag, Cham. Durante, F. and Sempi, C. (2015). Principles of Copula Theory. CRC Press, Boca Raton. Ebrahimi, N., Jalali, N. Y., and Soofi, E. S. (2014). Comparison, utility, and partition of
AC
dependence under absolutely continuous and singular distributions. Journal of Multivariate Analysis, 131:32–50. Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5):1189–1232. Gamboa, F., Klein, T., and Lagnoux, A. (2015). Sensitivity analysis based on Cramér von Mises distance. arXiv:1506.04133v1. Genest, C. and Boies, J.-C. (2003). Detecting dependence with Kendall plots. The American Statistician, 57(4):275–284. Glasserman, P. and Li, J. (2005). Importance sampling for portfolio credit risk. Manage-
31
ACCEPTED MANUSCRIPT
ment Science, 51(11):1643–1656. Government Office for Science (2018). Computational modelling: Technological futures. Blackett review, Council for Science and Technology, London.
CR IP T
Gupton, G. M., Finger, C. C., and Bhatia, M. (1997). CreditmetricsT M — technical document. Technical report, J. P. Morgan & Co., New York, NY. Reprinted with errata, RiskMetrics Group, 2007.
Hammitt, J. K. and Zhang, Y. (2013). Combining experts’ judgments: Comparison of algorithmic methods using synthetic data. Risk Analysis, 33(1).
Helton, J., Johnson, J., Sallaberry, C., and Storlie, C. (2006). Survey of sampling-based 91(10-11):1175–1209.
AN US
methods for uncertainty and sensitivity analysis. Reliability Engineering & System Safety, Helton, J. C. (1993). Uncertainty and sensitivity analyses techniques for use in performance assessment for radioactive waste disposal. Reliability Engineering&System Safety, 42(2–3):327–367.
Hoeffding, W. (1948). A non-parametric test of independence. The Annals of Mathemat-
M
ical Statistics, 19(4):546–557.
Iman, R. L. and Hora, S. C. (1990). A robust measure of uncertainty importance for use
ED
in fault tree system analysis. Risk Analysis, 10(3):401–406. Janssen, P., Swanepoel, J., and Veraverbeke, N. (2014). A note on the asymptotic behavior of the Bernstein estimator of the copula density. Journal of Multivariate Analysis,
PT
124:480–487.
Jayech, S. (2016). The contagion channels of July-August-2011 stock market crash: A
CE
DAG-copula based approach. European Journal of Operational Research, 249(2):631– 646.
Joe, H. (1989). Relative entropy measures of multivariate dependence. Journal of the
AC
American Statistical Association, 84(405):157–164. Joe, H. (2014). Dependence Modeling with Copulas. CRC Press, Boca Raton. Jouini, M. N. and Clemen, R. T. (1996). Copula models for aggregating expert opinions. Operations Research, 44(3):444–457. Kakouris, I. and Rustem, B. (2014). Robust portfolio optimization with copulas. European Journal of Operational Research, 235(1):28–37. Kleijnen, J. P. C. and Helton, J. C. (1999). Statistical analyses of scatterplots to identify important factors in large-scale simulations, 1: Review and comparison of techniques. Reliability Engineering&System Safety, 65(2):147–185.
32
ACCEPTED MANUSCRIPT
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 2(1):79–86. Liebscher, E. (2014). Copula-based dependence measures. Dependent Modeling, 2:49–
CR IP T
64. Mastrandrea, M. D., Field, C. B., Stocker, T. F., Edenhofer, O., Ebi, K. L., Frame, D. J., Held, H., Kriegler, E., Mach, K. J., Matschoss, P. R., Plattner, G.-K., Yohe, G. W., and Zwiers, F. W. (2010). Guidance note for lead authors of the IPCC fifth assessment report
on consistent treatment of uncertainties. Technical report, Intergovernmental Panel on Climate Change, Geneva. Available at www.ipcc.ch.
AN US
Navarro, J., Pellerey, F., and Di Crescenzo, A. (2015). Orderings of coherent systems
with randomized dependent components. European Journal of Operational Research, 240(1):127–139.
Nelsen, R. B. (2006). An Introduction to Copulas. Springer Series in Statistics. Springer Verlag, 2nd edition.
Niederreiter, H. (1992). Random number generation and quasi-Monte Carlo methods,
M
volume 63 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia, PA.
ED
Nuclear Energy Agency (1989). PSACOIN level E intercomparison. An international code intercomparison exercise on a hypothetical safety assessment case study for radioactive waste disposal systems. Technical report, OECD, Paris.
PT
Palomo, J., Insua, D. R., and Ruggeri, F. (2007). Modeling external risks in project management. Risk Analysis, 27(4).
CE
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2:559–572. Pearson, K. (1905). On the General Theory of Skew Correlation and Non-linear Regres-
AC
sion, volume XIV of Mathematical Contributions to the Theory of Evolution, Drapers’ Company Research Memoirs. Dulau & Co., London. Plischke, E. (2012). An adaptive correlation ratio method using the cumulative sum of the reordered output. Reliability Engineering&System Safety, 107:149–156. Plischke, E., Borgonovo, E., and Smith, C. L. (2013). Global sensitivity measures from given data. European Journal of Operational Research, 226(3):536–550. Prautzsch, H., Boehm, W., and Paluszny, M. (2002). Bézier and B-Spline Techniques.
Springer Verlag, Berlin. Rahman, S. (2016). The f -sensitivity index. SIAM/ASA Journal of Uncertainty Quantifi-
33
ACCEPTED MANUSCRIPT
cation, 4:130–162. Rényi, A. (1959). On measures of dependence. Acta Mathematica Academiae Scientiarum Hungaricae, 10:441–451.
CR IP T
Rényi, A. (1961). On measures of entropy and information. In Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 547–561. University of California Press.
Sak, H., Hörmann, W., and Leydold, J. (2010). Efficient risk simulations for linear asset portfolios in the t-copula model. European Journal of Operational Research, 202(3):802– 809.
AN US
Saltelli, A. (2002). Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications, 145(2):280–297.
Saltelli, A. and Marivoet, J. (1990). Non-parametric statistics in sensitivity analysis for model output: A comparison of selected techniques. Reliability Engineering&System Safety, 28(2):229–253.
Saltelli, A. and Sobol’, I. M. (1995). About the use of rank transformation in the sensi-
M
tivity analysis of model output. Reliability Engineering&System Safety, 50(3):225–239. Saltelli, A. and Tarantola, S. (2002). On the relative importance of input factors in math-
ED
ematical models: Safety assessment for nuclear waste disposal. Journal of the American Statistical Association, 97(459):702–709. Saltelli, A., Tarantola, S., and Campolongo, F. (2000). Sensitivity analysis as an ingredi-
PT
ent of modelling. Statistical Science, 19(4):377–395. Sancetta, A. and Satchell, S. (2004). The Bernstein copula and its applications to mod-
CE
eling and approximations of multivariate distributions. Econometric Theory, 20(3):535– 562.
Scarsini, M. (1984). Strong measures of concordance and convergence in probability.
AC
Rivista di matematica per le scienze economiche e sociali, 7(l–2):39–44. Schmid, F., Schmidt, R., Blumentritt, T., Gaißer, S., and Ruppert, M. (2010). Copulabased measures of multivariate association. In Copula Theory and Its Applications, Lecture Notes in Statistics, pages 209–236. Springer Verlag, Berlin. Schreyer, M., Paulin, R., and Trutschnig, W. (2017). On the exact region determined by Kendall’s τ and Spearman’s ρ. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 79(2):613–633. Schweizer, B. and Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. The Annals of Statistics, 9(4):879–885.
34
ACCEPTED MANUSCRIPT
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8:229–231. Sobol’, I. M. (1993). Sensitivity estimates for nonlinear mathematical models. Mathe-
CR IP T
matical Modelling & Computational Experiments, 1:407–414. Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15:72–101. Strong, M. and Oakley, J. E. (2014).
When is a model good enough?
Deriving
the expected value of model improvement via specifying internal model discrepancies. SIAM/ASA Journal of Uncertainty Quantification, 2:106–125.
AN US
US EPA (2009). Guidance on the development, evaluation, and application of environmental models. EPA/100/K-09/003, Council for Regulatory Environmental Modeling. US NRC (2011). An approach for using probabilistic risk assessment in risk-informed decisions on plant-specific changes to the licensing basis. Regulatory Guide 1.174, Revision 2, Office of Nuclear Regulatory Research.
van Dorp, J. R. and Kotz, S. (2003). Generalized trapezoidal distributions. Metrika,
M
58:85–97.
Wagner, H. M. (1995). Global sensitivity analysis. Operations Research, 43(6):948–969.
ED
Wang, T., Dyer, J. S., and Butler, J. C. (2016). Modeling correlated discrete uncertainties in event trees with copulas. Risk Analysis, 36(2):396–410. Wei, P., Lu, Z., and Song, J. (2014). Moment-independent sensitivity analysis using
PT
copula. Risk Analysis, 34(2):210–222. Ye, W., Liu, X., and Miao, B. (2012). Measuring the subprime crisis contagion: Evidence
CE
of change point analysis of copula functions. European Journal of Operational Research, 222(1):96–103. Ye, W., Luo, K., and Liu, X. (2017). Time-varying quantile association regression model
AC
with applications to financial contagion and VaR. European Journal of Operational Research, 256(3):1015–1028. Yitzhaki, S. (2003). Gini’s mean difference: A superior measure of variability for nonnormal distributions. Metron, 61(2):285–316.
35