A Post-Contingency Power Flow Emulator for Generalized Probabilistic Risks Assessment of Power Grids

Journal Pre-proof

Roberto Rocchetta, Edoardo Patelli

PII: S0951-8320(19)30302-3
DOI: https://doi.org/10.1016/j.ress.2020.106817
Reference: RESS 106817

To appear in: Reliability Engineering and System Safety

Please cite this article as: Roberto Rocchetta, Edoardo Patelli, A Post-Contingency Power Flow Emulator for Generalized Probabilistic Risks Assessment of Power Grids, Reliability Engineering and System Safety (2020), doi: https://doi.org/10.1016/j.ress.2020.106817

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2020 Published by Elsevier Ltd.

Highlights

• An emulator for power grid overload risk assessment is proposed;
• The emulator is embedded within a generalized uncertainty quantification framework;
• Line-Outage Distribution Factors efficiently approximate post-contingency severity evaluations;
• Achieving a two-orders-of-magnitude time-cost reduction with a modest accuracy loss;
• Sensitivity analysis via pinching revealed the key inputs driving the imprecision in the risk score.


A Post-Contingency Power Flow Emulator for Generalized Probabilistic Risks Assessment of Power Grids

Roberto Rocchetta(a), Edoardo Patelli(b)

(a) Technical University of Eindhoven, Department of Mathematics and Computer Science, Security group, Eindhoven, The Netherlands
(b) University of Strathclyde, Department of Civil and Environmental Engineering, Glasgow, United Kingdom

Abstract

Risk-based power dispatch has been proposed as a viable alternative to security-constrained dispatch, to reduce power grid costs and to help better understand prominent hazards. In contrast to classical approaches, risk-based frameworks assign different weights to different contingencies, quantifying both their likelihood of occurrence and their severity. This leads to an economically profitable operational schedule by exploiting the trade-off between grid risks and costs. However, relevant sources of uncertainty are often neglected due to the computational cost of the analysis. In this work, we present an efficient risk assessment framework for power grids. The approach is based on Line-Outage Distribution Factors for the severity assessment of post-contingency scenarios. The proposed emulator is embedded within a generalized uncertainty quantification framework to quantify: (1) the effect of imprecision on the estimation of the risk index; (2) the effect of inherent variability (aleatory uncertainty) in environmental-operational variables. The computational cost and accuracy of the proposed risk model are discussed in comparison to traditional approaches. The applicability of the proposed framework to real-size grids is exemplified by several case studies.

Keywords: Power Grid, Risk Assessment, Line-Outage-Distribution-Factors, Probability Boxes, Sensitivity Analysis

1. Introduction

The goal of a power grid operator is to satisfy the customers' electric needs, which must be done safely and economically. To guarantee both profitability and safety, the grid operator should prescribe an optimal generation profile which minimizes production costs and, at the same time, fulfils a set of security-related constraints [1]. For instance, the scheduled power production must comply with the thermal limits of the transmission lines and with the maximum capacity of the generators, and prove to be safe and stable even in the unlikely occurrence of component outages catalogued in a so-called contingency list. This list can include all the N-1 failures, i.e. single component outages, the loss of two bulk transmission elements consecutively (N-1-1 contingencies) and a set of higher-order failures, i.e., the simultaneous loss of k components in the grid (N-k failures). In practice, the complete set of N-1 failures is often accounted for and grid operations are scheduled accordingly. On the other hand, safety compliance with respect to a complete set of N-1-1 and N-k failures (k > 1) is hard to guarantee, because the large number of high-order failures leads to an empty set of feasible operational solutions. In practice, only a subset of N-k or N-1-1 failures is (rarely) considered, and only for specific power grids believed to be at high risk.

Email addresses: [email protected] (Roberto Rocchetta), [email protected] (Edoardo Patelli)

Following the definition of a contingency list, Security-Constrained Optimal Power Flow (SCOPF) is used to schedule an optimal power production that guarantees safety and compliance for this list, see e.g. [2, 3, 4, 5]. SCOPF is a well-established approach, commonly used in practice as it provides solutions guaranteed to be safe even when one of the listed failures occurs. One of the key issues of SCOPF methods lies in the fact that grid safety and revenue are known to be conflicting objectives. As such, SCOPF often leads to sub-optimal profits [6]. This is due to the hard deterministic constraints imposed on the grid operations (the list of failures), which are weighted equally in the SCOPF optimization routine. However, some failures are less severe than others or less likely to occur, and SCOPF cannot fully exploit the existing trade-off between operational risks and costs [7]. Online risk assessment frameworks have been developed as a viable alternative to SCOPF, see e.g. [8, 9]. Compared to a traditional security-constrained solution, a risk-informed dispatch leads to a similar safety performance but a higher expected economic gain, thanks to the enhanced awareness of relevant operational threats [10]. However, risk assessment frameworks for power grids have several limitations. In particular, simplifying model assumptions are made to cope with stringent efficiency requirements and, in addition, relevant sources of uncertainty are often disregarded. Power grid production profiles, electric power demand and weather factors are influenced by uncontrollable parameters and are, therefore, generally affected by large uncertainty. Problems with the effect of those uncertainties are growing in the power grid sector due to sustainability targets and the inclusion of renewable sources and emerging energy technologies, which change the demand and generation profiles.

As a consequence, the available historical data are often out-of-date and not fully applicable to the new grid paradigm [11]. Uncertainty arises due to inherent variability (aleatory uncertainty) or a lack of data (epistemic uncertainty) [12, 13]. Epistemic uncertainty can be reduced by collecting more data or gathering additional sources of information (although this may be impractical or too expensive). Differently, aleatory uncertainty can only be characterised, as it is due to inherent stochasticity. It is important to differentiate and quantify the effect of the different types of uncertainty on power grid risks, to enhance the awareness of the risks and of the confidence level associated with significant gaps in the available information [14].

To overcome existing limitations of power grid risk assessment frameworks, we introduce in this work a computationally efficient risk assessment model. The risk model is embedded within a generalized Uncertainty Quantification (UQ) framework which allows quantifying the effect of mixed sources of uncertainty. An efficient risk model strategy based on linear approximations of sensitivity indices, known as Line Outage Distribution Factors (LODF), is used to reduce the computational cost of each risk evaluation. The proposed model replaces the many time-consuming post-contingency power flow evaluations which are commonly needed to estimate failure consequences and the associated power grid risk [10]. The proposed framework is exemplified on various power grid case studies. A modified version of the IEEE 24-bus reliability test system, the New England system (case39), the simplified American Electric Power systems (case57 and case118) and the large-size Polish grid (case2737sop) are the considered networks. The strengths and weaknesses of the proposed emulator are discussed by comparison with high-fidelity risk assessment models, i.e., based on AC and DC power flows. Various UQ tasks are performed and the model results compared and discussed. Specifically, mixed aleatory and epistemic uncertainty is considered affecting the weather conditions, the parameters of the probabilistic risk model and the power demanded. Probability boxes (P-boxes), introduced by S. Ferson [15] and already adopted to analyse power grids [11, 16], are also adopted in this work to model quantities affected by mixed uncertainty sources [12]. The slicing method is adopted as a rigorous procedure to propagate the input P-boxes to the output of a structural reliability model [17]. Furthermore, sensitivity analysis on P-boxes is also proposed to reveal the importance of imprecisely known input quantities to the power grid risk [18].

The key and novel contributions of the proposed emulator are summarized as follows:

• A computationally tractable risk assessment framework for power grids;
• Grid overload severity approximated via a single run of the power flow solver;
• Performing uncertainty quantification of probability boxes and sensitivity analysis via the pinching method;
• Separating the effects of inherent variability and lack of data;
• Applicability to large networks with moderate accuracy loss.

The rest of this paper is organized as follows: the severity model and the failure model are presented in Section 2, while the proposed efficient risk assessment emulator and the numerical implementation of the framework are presented in Section 3. Section 4 shows the accuracy and efficiency of the proposed model on different case studies and its applicability within a generalized UQ framework, exemplified on the IEEE 24-bus reliability test system. Section 5 closes the paper with a discussion of the results.

2. Power grid probabilistic risk assessment

Generally speaking, risk is defined as the product of the probability of occurrence of an event, i.e. a contingency, and the event consequences (severity). The risk of different events can be accounted for in an integral definition of risk:

R = Σ_{i=1}^{n_C} P(E_i) · Sev(E_i)    (1)

where P(E_i) is the probability of occurrence of the undesired event E_i, Sev(E_i) is the severity of the event when it occurs and n_C is the number of events considered, e.g. a list of failures. In the context of power grid risk assessment, the undesired events are unexpected losses of one (or more) elements of the system (e.g. distribution lines, transformers, generators) [8]. The consequences of failure are quantified by specifically defined severity functions that capture trends in physical indicators of network stress, e.g., voltage drops, frequency deviations, line flows. This work focuses on the severity of line overloading: failure consequences are measured in terms of the closeness of the surviving lines to their thermal limits [7, 10]. Common practice in power grid security evaluations is to select a list of relevant failures for the assessment, i.e., a contingency list. We limit our focus to the full list of N−1 line failures. For specific areas at high risk, the list can be extended to higher-order N−k failures. The composite risk index R(ζ) for a given environmental-operational condition ζ is defined as follows [7]:

R(ζ) = Σ_{i=1}^{n_C} P(C_i|ζ) Σ_{l=1}^{n_l} S_OL,l(C_i, ζ)    (2)

where n_C is the total number of failures in the contingency list, n_l is the total number of lines in the system, and ζ = {L, B, ℓ, w, λ} is the set of structural, environmental and operational factors, which includes, for instance: L = (L_1, ..., L_nb), the load demand vector for the n_b buses in the grid; B = (B_1, ..., B_l, ..., B_nl), the vector of line susceptances; w, the wind speed; ℓ = (ℓ_1, ..., ℓ_l, ..., ℓ_nl), the vector of line lengths; λ = (λ_1, ..., λ_l, ..., λ_nl), the line failure rates; and other power grid parameters. C_i is


the ith contingency listed, S_OL,l is the overload severity for line l due to the occurrence of failure C_i in the operational-environmental condition ζ, and P(C_i|ζ) is the probability of facing C_i given ζ. If the N−1 single-line contingencies are the only failures listed, n_C = n_l.

2.1. Line overload severity model

An overload severity function S_OL,l is specifically defined for each line in the power grid and quantifies the extent of a violation by evaluating the line's proximity to its thermal limit. The S_OL,l is defined as follows [10]:

S_OL,l(f_l) = { a · (f_l / f_em,l) + b    if f_l ≥ c · f_em,l
             { 0                         otherwise            (3)

where a, b, and c ∈ [0, 1] are parameters of the severity model, f_l is the active power in transmission line l, and f_em,l is its maximum admissible flow (emergency rating). The ratio f_l/f_em,l is the line percentage of rating, which measures the line's proximity to its deterministic failure threshold. The severity S_OL,l is zero in the safe region, i.e., for f_l/f_em,l < c, whilst it grows linearly with f_l in the near-violation region, c ≤ f_l/f_em,l < 1, and in the violation region, i.e., 1 ≤ f_l/f_em,l. The parameters of the severity model are set to c = 0.9, a = 10, and b = −9 as in [7, 10].

2.2. A probabilistic model for weather-induced failures

Probabilistic risk assessment requires the analysis of the impact of each failure and the estimation of the probability of such an event. The failure of a distribution line is modelled via a Poisson distribution [9]:

P(C_l|ζ) = (1 − e^{−ℓ_l·λ_l(ζ)}) · exp(− Σ_{j=1, j≠l}^{n_l} λ_j(ζ)·ℓ_j)    (4)

where P(C_l|ζ) is the probability of facing exactly one failure of line l and no failures of the remaining lines j ≠ l in the next time interval, given the set of factors ζ; λ_l(ζ) is the weather-dependent failure rate of distribution line l in the next time interval; and ℓ_l is the length of the lth line. A correlation between the failure occurrence rate of transmission lines and the wind speed intensity exists [8]: high wind speeds lead to a higher chance of component tripping and blackouts. Therefore, two environmental conditions are considered, 'normal weather' and 'severe weather' [19]. The model for normal weather conditions describes weather-independent events, e.g. aging, maintenance, targeted attacks or manufacturing errors in general. The model for severe weather captures failure events which are induced by high wind speed conditions, e.g., friction-induced fatigue on the transmission joints or tree branches falling in the proximity of the line. A high wind speed increases the total failure rate λ_l for a transmission line l as follows [19]:

λ_l = { λ_l,nom + λ_l,w(w)    if w ≥ w_cr
      { λ_l,nom               otherwise        (5)

where λ_l,nom is the failure rate in [occ/(km·h)] during normal weather conditions and λ_l,w is the contribution to the line failure rate due to high wind speed [19]:

λ_l,w(w) = λ_l,nom · α_w · ((w/w_cr)² − 1)    (6)

where w is the predicted wind speed, α_w is a regression parameter obtained in [8] and w_cr is the critical wind speed, assumed to be 8 [m/s]. The failure rate due to a wind event has a strong relation with the wind intensity, following a quadratic law, and for w ≤ w_cr the wind contribution to the overall line failure rate is null. The critical wind speed w_cr = 8 [m/s] was identified in [8] and indicates a statistical correlation between line failures and wind speeds w > w_cr. This result was found by running a regression analysis on historical data of failure occurrences and wind speeds at a specific geographical location. Hence, the result holds for a specific power grid and geographical area. Nonetheless, this model is adopted in this work as an example of a probabilistic model, as it does not affect the comparison between the proposed emulator and the high-fidelity risk models. Alternative failure models can be adopted when considered suitable; the interested reader is referred to [8] for a detailed discussion of this subject.

A wind speed prediction is considered available but affected by errors. The prediction error is modelled as normally distributed, with density function:

f_W(w) = (1/(√(2π)·σ_w)) · exp(−(w − µ_w)²/(2σ_w²))    (7)

where µ_w and σ_w are the mean and standard deviation of the prediction error. To simplify the analysis, but without loss of generality, we assume the sampled w to uniformly affect the whole system.

2.3. Direct Current Optimal Power Flow

The Direct Current (DC) power flow is a linear approximation of the Alternating Current (AC) power flow which accounts only for active power and neglects transmission losses and reactive power management [20]. It is often used to alleviate the computational cost of numerically intensive codes, and it always has a feasible solution. The DC power balance equations can be written as follows [20]:

P_k ≈ Σ_{i=1}^{n_b} B_i,k · θ_i,k    (8)

where n_b is the number of nodes in the network, P_k is the active power injected at node k, θ_i,k is the voltage angle difference between nodes i and k, and B_i,k is the susceptance of the line connecting nodes i and k. Equation (8) is obtained under three DC assumptions:

• Flat voltage profile, i.e., |V_i| = 1 per unit at each node i.
• Small voltage angle differences, i.e., sin(θ_i,k) ≈ θ_i,k.
• Negligible line resistances (R_l) compared to line reactances (X_l), i.e., R_l ≪ X_l.

The DC power flow equations can be used to select an optimal production profile which minimizes the production costs for the system. This is commonly referred to as a DC optimal power flow (DC-OPF), defined as follows:

min_{x ∈ [x̲, x̄]} {J(x) : g(x) = 0, h(x) ≤ 0}    (9)

where J(x) is the network cost function, which depends on the design variables x (the power generation profile); g(x) is a set of equality constraints (power balance equations); h(x) is a set of inequality constraints, e.g., maximum voltage phases; and x̲ and x̄ are the lower and upper bounds on the generator operating powers, respectively. In contrast to a standard AC-OPF, whose solution requires iterative methods (e.g. Newton-Raphson), program (9) can be solved efficiently using standard linear programming methods. It is worth remarking, however, that DC models are limited in accuracy when compared to higher-fidelity AC models. For instance, in [21] the authors show that the DC model fails to identify some key elements in the grid, despite correlating with AC models. Similarly, limitations of the DC model are pointed out in reference [22], where two types of DC models, the state-dependent (Hot-Start) and the state-independent (Cold-Start) models, are compared. It is pointed out that the accuracy of the state-independent model in different circumstances remains of great concern, due to the (generally wrong) assumption of a flat voltage profile.
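To make the DC model of Eq. (8) concrete, the sketch below solves a DC power flow for a small, purely illustrative 4-bus network (not one of the paper's case studies): the bus susceptance matrix is assembled, the slack-bus angle is fixed, and the line flows are recovered from the angle differences.

```python
import numpy as np

# Minimal DC power flow sketch (Eq. (8)): solve B' * theta = P with a
# slack bus, then recover line flows. All data are illustrative.

lines = [(0, 1, 10.0), (0, 2, 8.0), (1, 2, 5.0), (2, 3, 12.0)]  # (i, k, susceptance)
n_bus = 4
P = np.array([1.5, -0.5, -0.6, -0.4])   # net injections [p.u.], sum to zero

# Assemble the bus susceptance matrix
Bbus = np.zeros((n_bus, n_bus))
for i, k, b in lines:
    Bbus[i, i] += b; Bbus[k, k] += b
    Bbus[i, k] -= b; Bbus[k, i] -= b

# Fix the slack-bus angle (bus 0) and solve the reduced linear system
theta = np.zeros(n_bus)
theta[1:] = np.linalg.solve(Bbus[1:, 1:], P[1:])

# Line active flows: f = B_ik * (theta_i - theta_k)
flows = np.array([b * (theta[i] - theta[k]) for i, k, b in lines])
print(flows)
```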

The assessment of the consequences, i.e. the severity, of each contingency listed requires, for the estimation of the network risk in Eq. (2), the solution of n_C instances of Eq. (9). Clearly, the computational cost quickly becomes prohibitive for a large list of failures and for networks of considerable size.

3. The proposed framework

3.1. Line Outage Distribution Factors

Power Transfer Distribution Factors (PTDFs) are linear sensitivity measures obtained for a power grid under the DC assumption. PTDFs quantify incremental changes in the real power flowing on transmission lines due to a power transfer between two regions of the grid (i.e. two areas, two nodes, etc.). Line Outage Distribution Factors (LODFs) can be directly derived from the PTDFs to quantify the ratio of change in line active flows conditional on the removal of transmission lines one-at-a-time, i.e., for the N-1 line failures. Specifically, the LODF d_lk measures the impact on the active flow in line l due to the removal, e.g., failure, of transmission line k; see [23, 24, 25] for further details. Given the active flows in the undamaged network (the flow f_l for line l), the post-contingency flows can be computed directly [14]:

f_l^k = f_l + d_lk · f_k    (10)

where d_lk is the element in the lth row and kth column of the LODF matrix. Equation (10) is only an approximation of the true post-contingency flows, because the power system is only approximately linear. However, the accuracy of (10) is generally considered acceptable, and power system planners and operators use LODFs extensively in contingency analysis and also to address transmission switching and reconfiguration problems [26]. Thus, we argue that LODFs can be used as surrogates for post-contingency power flows to approximate, efficiently, the overload severity scores. The integrated overload risk in equation (2) can be rewritten for the N−1 line failures as follows:

R(ζ) = Σ_{k=1}^{n_C} P(C_k|ζ) Σ_{l=1}^{n_l} (a · f_l^k/f_em,l + b) · 1{A_k}    (11)

where 1{A_k} is the indicator function for the condition A_k := f_l^k ≥ c · f_em,l, i.e., the lth term in the sum is non-zero if the flow in line l, after the tripping of line k, exceeds a predefined emergency rating flow threshold.

The risk index is calculated using Eq. (11) with a single power flow analysis, required to compute f_l for the network in the pre-contingency state. Hence, it provides a great gain in computational efficiency when compared to classical risk assessments, which require the solution of n_C post-contingency power flows. Higher-order N−k failures are not considered in this paper, for simplicity in the calculation of LODFs. For an example of LODFs for higher-order contingencies, the reader is referred to [27].

3.2. Numerical implementation

An efficient risk assessment framework is proposed by exploiting the approximation capability of Line Outage Distribution Factors. Algorithm 1 summarizes the proposed procedure. First, a prediction of the future environmental-operational conditions (e.g. a forecast) and the grid data are provided in the set ζ. Then, a pre-contingency DC optimal power flow is solved to estimate the line active flows f_k in normal operational conditions. The PTDFs and LODFs are also obtained. After the pre-contingency power flow evaluation, a contingency analysis starts by updating the line failure rates according to the expected wind speed (as described in Section 2.2); the probability of failure is computed as in equation (4), and the post-contingency flows are obtained directly by using the Line-Outage Distribution Factors d_lk for the N-1 line failures as in equation (10). Finally, the overload severity for each line and each contingency is obtained and the composite overload risk index computed, as in equations (3) and (11), respectively.

The efficient risk assessment model is embedded within the generalized framework for uncertainty quantification presented in Appendix A. The framework is used to perform classical uncertainty quantification (e.g. aleatory uncertainty propagation, correlation analysis, sensitivity analysis, etc.) or generalized UQ via P-box modelling. The slicing algorithm is used to estimate the effect of mixed sources of uncertainty on the overall risk index R(ζ), e.g., imprecision and inherent variability affecting the input parameters ζ. As described in Appendix A, the minimum and maximum of the model output R(ζ) must be computed within a slice, i.e., equations (A.5) and (A.6). Genetic Algorithms (GA) are used to solve the min-max problems as they can tackle non-convex optimization problems: the objective function (the risk) is a non-linear function of ζ, and local gradient-based optimization methods are likely to get stuck in local optima. The optimization procedure starts by randomly sampling an initial population of individuals, i.e. a set of candidate solutions X = {x_j : x_j ∈ I_X(α), j = 1, ..., n_pop}, where n_pop is the number of candidates and x_j is a subset of uncertain factors in ζ. Then, the set of fittest individuals in X is identified based on the min-max risk score of the candidate solutions, and mutation and cross-over operations are applied to generate new candidates. The selection, mutation, cross-over and fitness evaluation operations are repeated until a predefined number of iterations n_ep is reached. The risk of the best candidate solution approximates the global minimum/maximum of R(ζ). Alternative schemes can be adopted to approximate the global min-max, e.g., the vertex method, sampling [11] or different global optimisation methods [12].
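For illustration, the inner min-max step of a single α-slice can be approximated with any global optimizer. The sketch below uses SciPy's differential evolution as a stand-in for the GA described above; the risk function and slice bounds are hypothetical placeholders, not quantities from the paper.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Sketch of the per-slice min-max step (Eqs. (A.5)-(A.6)). `risk` is a
# hypothetical stand-in for the LODF-based risk model R(zeta); `bounds`
# represents the hyper-rectangle I_X(alpha) of one slice.

def risk(x):
    # non-convex toy objective standing in for R(zeta)
    return 0.02 + 0.005 * np.sin(3 * x[0]) * np.cos(2 * x[1]) + 1e-3 * x[1] ** 2

bounds = [(0.9, 1.1), (5.4, 6.6)]    # e.g. a load factor and the wind mean

r_min = differential_evolution(risk, bounds, maxiter=100, seed=1).fun
r_max = -differential_evolution(lambda x: -risk(x), bounds, maxiter=100, seed=1).fun
print(f"risk interval for this slice: [{r_min:.4f}, {r_max:.4f}]")
```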

Algorithm 1 Risk Assessment Model
1: procedure R(ζ) (Risk Assessment based on LODFs)
2:   Input ζ = {L, B, ℓ, w, λ}
3:   Run pre-contingency DC-OPF, get f_k ∀ lines k
4:   Compute d_lk
5:   for each single-line contingency C_k do
6:     Update λ_k(w)
7:     Compute P(C_k|ζ)
8:     Compute post-contingency flows f_l^k = f_l + d_lk·f_k
9:     Compute severity score Σ_{l=1}^{n_l} S_OL,l(C_k, ζ)
10:  end for
11:  Compute R(ζ) = Σ_{k=1}^{n_C} P(C_k|ζ) Σ_{l=1}^{n_l} S_OL,l(C_k, ζ)
12: end procedure
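A compact Python sketch of Algorithm 1 follows. It assumes the pre-contingency flows, emergency ratings, contingency probabilities and the line-to-line PTDF matrix are available from the DC-OPF step; the conversion d_lk = φ_lk/(1 − φ_kk) is the standard PTDF-to-LODF identity, stated here as an assumption rather than quoted from the paper.

```python
import numpy as np

# Sketch of Algorithm 1 for the N-1 line outages. `phi` is the line-to-line
# PTDF matrix (n_l x n_l); `f`, `f_em` and `p` are the pre-contingency
# flows, emergency ratings and contingency probabilities of Eqs. (3)-(4).

def lodf_from_ptdf(phi):
    """Standard identity d_lk = phi_lk / (1 - phi_kk) (assumed, cf. [23-25])."""
    d = phi / (1.0 - np.diag(phi))[None, :]
    np.fill_diagonal(d, -1.0)        # the outaged line loses all of its flow
    return d

def overload_risk(f, f_em, p, phi, a=10.0, b=-9.0, c=0.9):
    """Composite overload risk index, Eqs. (10)-(11)."""
    d = lodf_from_ptdf(phi)
    risk = 0.0
    for k in range(len(f)):          # one term per N-1 contingency C_k
        f_post = f + d[:, k] * f[k]  # Eq. (10): post-contingency flows
        ratio = np.abs(f_post) / f_em
        sev = np.where(ratio >= c, a * ratio + b, 0.0)   # Eq. (3)
        sev[k] = 0.0                 # the tripped line carries no flow
        risk += p[k] * sev.sum()
    return risk
```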

P-boxes are propagated through the model using the slicing procedure (Algorithm 2), summarised as follows:

Initialization: Input the number n_s of α slices, the population size n_pop, the number of generations n_ep, and the number of input P-boxes n.

Slicing:
1. Randomly sample α = (α_1, α_2, ..., α_n), where α_i ∼ U(0, 1), i = 1, ..., n. Invert the input P-boxes and construct the hyper-rectangle I_X(α), see equation (A.4).
2. Solve the min-max problems in equations (A.5)-(A.6). Randomize the initial candidate solutions uniformly in I_X(α) and evaluate the population fitness score, R(ζ). Selection, crossover and mutation are performed to generate offspring chromosomes. Evaluate the offspring fitness function and repeat the evolutionary procedure until n_ep is reached.
3. Save the resulting min-max interval [R̲_j, R̄_j] for the jth slice and repeat steps 1, 2 and 3 until j = n_s slices have been evaluated.
4. The collected intervals are used to construct the bounds for the P-box of the grid risk, [F̲_R, F̄_R]. See equation (A.7) for further details.

Algorithm 2 P-box slicing
1: procedure Slicing Method
2:   Input n_s, n_pop, n_ep and [F̲_Xi, F̄_Xi], i = 1, ..., n
3:   for j = 1 to n_s do
4:     Randomize α = (α_1, α_2, ..., α_n), α_i ∼ U(0, 1) ∀ i
5:     for i = 1 to n do
6:       Invert bounds [F̄_Xi⁻¹(α_i), F̲_Xi⁻¹(α_i)]
7:     end for
8:     I_X(α) : [F̄_X1⁻¹(α_1), F̲_X1⁻¹(α_1)] × ... × [F̄_Xn⁻¹(α_n), F̲_Xn⁻¹(α_n)]
9:     Solve the constrained min-max problems:
10:      R̄_j = max_x {R(x) : x ∈ I_X(α)}
11:      R̲_j = min_x {R(x) : x ∈ I_X(α)}
12:   end for
13:   Compute CDF bounds:
14:     F̄_R(y) = (1/n_s) Σ_{j=1}^{n_s} 1{R̲_j ≤ y}
15:     F̲_R(y) = (1/n_s) Σ_{j=1}^{n_s} 1{R̄_j ≤ y}
16: end procedure
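A minimal sketch of the slicing loop is given below, assuming Gaussian input P-boxes with interval-valued means; for brevity, a random multistart search replaces the GA, and all numerical values are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Sketch of Algorithm 2 for two Gaussian P-boxes with interval means.
# The brute-force min-max (multistart sampling instead of a GA) and all
# numbers are illustrative choices.

rng = np.random.default_rng(0)
mu_lo, mu_hi = np.array([0.98, 5.4]), np.array([1.02, 6.6])  # interval means
sigma = np.array([0.05, 1.0])

def risk(x):                          # placeholder for the LODF risk model
    return 0.02 + 0.003 * x[0] + 5e-4 * (x[1] - 6.0) ** 2

n_s, R_lo, R_hi = 200, [], []
for _ in range(n_s):
    alpha = rng.uniform(size=2)
    # Invert both CDF bounds at alpha -> focal interval per input (Eq. (A.4))
    lo = norm.ppf(alpha, loc=mu_lo, scale=sigma)
    hi = norm.ppf(alpha, loc=mu_hi, scale=sigma)
    # Approximate min/max of R over the hyper-rectangle by multistart sampling
    X = rng.uniform(lo, hi, size=(64, 2))
    r = np.apply_along_axis(risk, 1, X)
    R_lo.append(r.min()); R_hi.append(r.max())

# Empirical CDF bounds of the risk P-box (cf. Eq. (A.7))
y = np.linspace(min(R_lo), max(R_hi), 100)
F_upper = [(np.array(R_lo) <= v).mean() for v in y]
F_lower = [(np.array(R_hi) <= v).mean() for v in y]
```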

To analyse the importance of mixed aleatory-epistemic uncertain inputs to the risk output, we use the generalized sensitivity analysis method proposed in [18]. 'Pinching' is used to evaluate the importance of input uncertainties characterized by P-boxes to the output of interest of a computational model. Sensitivity measures are obtained by reducing, i.e., pinching, the uncertainty of the input P-boxes one-at-a-time and quantifying the relative reduction in the output uncertainty. A P-box can be pinched in several ways: 1) it can be pinched to a specific CDF, i.e., no epistemic uncertainty is involved; 2) it can be pinched to a precise point-value, i.e., both epistemic and aleatory uncertainty are removed from the input; 3) it can be pinched to an interval with zero variance, i.e., only epistemic uncertainty will be affecting the input. The bounds of the output P-box obtained after the pinching has been performed are always included within the bounds obtained before the pinching, i.e., the area δ > δ_i. This holds independently of the pinching scheme adopted [18]. Figure 1 presents the P-box pinching procedure graphically. Two uncertain inputs x_1 and x_2 are modelled using P-boxes and their uncertainty is propagated to the output of a computational model y = f(x). The resulting P-box bounds are displayed by black solid lines (bottom plot). The pinching method is then applied to the input x_2, which is pinched to a specific CDF, thus cancelling the effect of epistemic uncertainty on this factor. As a consequence, the uncertainty (area) of the output P-box reduces, see the dotted green line. Similarly, x_1 in the top right plot is pinched to a crisp point-value, therefore cancelling the effect of both aleatory and epistemic uncertainties on the input. However, this uncertainty reduction has a weaker effect on the uncertainty in y, see the dotted marked red line. According to the pinching method, the output y results more sensitive to the uncertain input factor x_2, i.e. a reduction in the uncertainty in x_2 leads to the highest reduction in epistemic uncertainty in y.

Figure 1: Example of sensitivity analysis on the model y = f(x_1, x_2) by pinching the input P-boxes one-at-a-time. The epistemic uncertainty affecting the inputs is removed by pinching x_1 and x_2 to a crisp point-value and to a precise CDF, respectively. This induces a different reduction in area for the output P-box.

In this work, the area between the lower and upper risk CDFs is selected as the metric to quantify the P-box uncertainty [18]:

δ = ∫_{−∞}^{+∞} [F̄_R(y) − F̲_R(y)] dy    (12)

where δ is the area of the risk P-box obtained from Algorithm 2 and measures the level of (epistemic) uncertainty affecting the risk score without performing any uncertainty reduction on the input bounds [F̲_Xi, F̄_Xi]. The metric δ is then used to quantify the sensitivity of the output to a reduction in the uncertainty of the n inputs, as follows:

S_i = (δ − δ_i)/δ,   i = 1, ..., n    (13)

where δ_i is the area after pinching the ith input P-box. A high value of S_i implies a larger reduction in the P-box area and, thus, indicates a greater importance of input i. The interested reader is referred to Appendix A.2.2 for more details on this subject.
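Once the CDF bounds are available on a common grid, Eqs. (12)-(13) reduce to a few lines of code; in the sketch below the CDF-bound pairs are placeholders for the outputs of Algorithm 2 before and after each pinching.

```python
import numpy as np

# Sketch of the pinching sensitivity scores, Eqs. (12)-(13). `y` is a
# common evaluation grid; the CDF-bound pairs are assumed to come from
# runs of Algorithm 2 with and without pinching.

def pbox_area(y, F_lo, F_up):
    """Eq. (12): area between the lower and upper risk CDFs (trapezoid rule)."""
    gap = np.asarray(F_up) - np.asarray(F_lo)
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(y)))

def pinching_scores(y, full_bounds, pinched_bounds):
    """Eq. (13): S_i = (delta - delta_i) / delta, one score per input."""
    delta = pbox_area(y, *full_bounds)
    return np.array([(delta - pbox_area(y, Fl, Fu)) / delta
                     for Fl, Fu in pinched_bounds])
```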

4. Case studies

The applicability of the proposed framework is exemplified on popular power grid case studies. In Section 4.2, the computational cost and scalability of Algorithm 1 are discussed in comparison to traditional DC-OPF and AC-OPF methods. In Section 4.3, a classical uncertainty propagation analysis is presented for the New England network and the IEEE 24-bus Reliability Test System (RTS). The accuracy of the proposed method is discussed in comparison to the results of an AC-OPF risk model. The applicability of the generalized UQ framework is tested on a modified version of the IEEE 24-bus system in Section 4.4. The analyses ran on a low-spec machine equipped with an Intel(R) Core(TM) i5-3317U CPU 1.70 GHz processor and 4 GB RAM.

4.1. Systems overview

The New England system, case39; the simplified American Electric Power systems, case57 and case118; the large-size Polish grid, case2737sop; and a modified version of the IEEE 24-bus RTS, case24mod, are the selected case studies. The New England system consists of 39 buses, n_l = 46 lines and 10 generators. The case57 has 57 buses, 7 generators, n_l = 80 lines and 42 loads, whilst the case118 has 118 nodes, 19 generators, n_l = 186 lines and 91 loads. Both are simple approximations of the American Electric Power system as it was in the early 1960s. The fourth case study, case2737sop, is a model of the Polish power grid. It includes 2737 nodes and n_l = 3506 lines; the available data refer to the summer 2004 off-peak conditions. The modified version of the IEEE 24-bus RTS is used to test the generalized UQ capabilities of the proposed framework. Figure 2 depicts the network architecture, the 38 lines connecting the 24 nodes, the main generating units, the transformers and the locations of the 17 load nodes (the nodes with arrows). Table 1 presents the estimated values of the line failure rates λ_l, expressed in occurrences per hour. For synthesis' sake, additional details, e.g., the design peak loads and the values of the line susceptances and thermal ratings, are not reported here. For a detailed description of the power grids, the reader is referred to [28, 7, 29].

Figure 2: The IEEE 24-node reliability test system [29].

Table 1: Line failure rates, in frequency of occurrence per hour of operation.

ID   λl [occ/h]   ID   λl [occ/h]   ID   λl [occ/h]
1    0.0274       14   0.0023       27   0.0468
2    0.0582       15   0.0023       28   0.3801
3    0.0377       16   0.0023       29   0.0388
4    0.0445       17   0.0023       30   0.0365
5    0.0548       18   0.0457       31   0.0616
6    0.0434       19   0.0445       32   0.0400
7    0.0023       20   0.0457       33   0.0400
8    0.0411       21   0.0594       34   0.0434
9    0.0388       22   0.0559       35   0.0434
10   0.0377       23   0.0434       36   0.0388
11   0.0342       24   0.3801       37   0.0388
12   0.0502       25   0.0468       38   0.0514
13   0.0502       26   0.0468

4.2. Computational complexity

The computational time required by Algorithm 1 is compared with the computational cost of the DC-OPF and AC-OPF used to evaluate the severity of single-line contingency scenarios, see [16]: n_l power flows need to be solved to check the severity of the damaged network states. Table 2 shows the wall-clock time required for the power grids analysed. DC-OPF is computationally cheaper than AC-OPF thanks to the DC approximations and the faster solution via linear programming. However, both methods are quite expensive when applied to larger grids, e.g. 2.8 hours are required for the risk assessment of the Polish power grid. This is due to the many runs of the power flow model needed to evaluate the consequence of every single failure listed. On the other hand, the proposed approach based on LODFs solves the same Polish power grid in only 3.3 seconds.

Table 2: Computational cost of a single run of the proposed risk assessment method compared to risk assessments based on DC-OPF and AC-OPF. For the case study data, refer to [28].

Grid          n_l = n_C   LODF        DC-OPF      AC-OPF
case24        38          0.043 [s]   1.78 [s]    2.8 [s]
case39        46          0.046 [s]   2.1 [s]     4.4 [s]
case57        80          0.043 [s]   3.9 [s]     6.5 [s]
case118       186         0.047 [s]   8.3 [s]     24.3 [s]
case2737sop   3506        3.3 [s]     > 2.8 [h]   > 4.7 [h]

For the IEEE 24 system, each run of the AC-OPF model requires solving one AC-OPF (pre-contingency dispatch) and 38 AC power flows (post-contingency consequence evaluation). This results in an indicative wall-clock time of 3.8 seconds. Although this computational cost is relatively small for a single run of a deterministic model, the time to perform the analysis increases dramatically when the model is used to solve classical UQ tasks. Furthermore, the time needed to run a classical UQ analysis is generally much lower than the time needed to run a generalized UQ analysis (P-box sensitivity analysis and propagation). Differently, the proposed emulator only requires a single DC-OPF run (pre-contingency dispatch) to approximate the severity scores and R(ζ), and only takes 0.043 seconds.

4.3. Accuracy and classical uncertainty quantification

The accuracy of the risk estimator obtained with Algorithm 1 is examined by solving classical uncertainty propagation analyses on the New England network and the modified IEEE 24-bus RTS. The results of the AC-OPF risk model are used as the baseline solution and compared to the overload severity and risk resulting from Algorithm 1.

4.3.1. New England Network

Aleatory uncertainty is considered affecting the values of the power demands of the New England network, and a probabilistic model is provided. We assume the loads to be Gaussian distributed, with mean values equal to the nominal load values in [28] and standard deviations set to 0.05 times the means. Variability in the loads inevitably affects the severity scores and the expected consequences when a failure occurs. To quantify this uncertainty, 5000 realizations of the load profiles are randomly sampled and the severity scores computed for each one of the listed failures, i.e. the N-1 line failures.
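The load-uncertainty propagation just described can be sketched as follows; `emulator_risk` stands in for Algorithm 1, and the nominal loads are illustrative rather than the case39 data from [28].

```python
import numpy as np

# Sketch of the Monte Carlo load-uncertainty propagation: Gaussian load
# samples pushed through the risk emulator. `emulator_risk` is a
# placeholder for Algorithm 1 (DC-OPF + LODF severity); the nominal
# loads below are illustrative values.

rng = np.random.default_rng(42)
L_nom = np.full(21, 100.0)            # illustrative nominal bus loads [MW]

def emulator_risk(L):
    # stand-in for Algorithm 1; returns a scalar risk score
    return 1e-4 * np.maximum(L - 105.0, 0.0).sum()

samples = rng.normal(loc=L_nom, scale=0.05 * L_nom, size=(5000, L_nom.size))
risks = np.array([emulator_risk(L) for L in samples])
print(f"mean risk {risks.mean():.4g}, std {risks.std():.4g}")
```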

Figure 3 displays the results of the analysis. The top panel presents the 5000 realizations of the network overload severity, Σ_{l=1}^{n_l} S_OL,l(C_i, ζ), faced by the grid when one of the 46 lines in the system fails (line ID on the x-axis). The solutions obtained via AC-OPF and the results of the proposed method are displayed by red dashed lines and blue solid lines, respectively. The bottom panel presents the out-of-sample mean (solid lines) and ±3 standard deviations from the mean (dashed lines).

Figure 3: Overload severity results for the New England system. A comparison between the high-fidelity AC-OPF model and Algorithm 1.

It can be observed that the proposed method captures the global trend in the severity score, although approximation errors affect some of the failure events, e.g. the failure of lines 14 or 35, resulting in an underestimation/overestimation of the overload severity. This is possibly due to the limited accuracy of the method in scenarios of high stress for the grid, i.e., the DC assumptions hold weakly when the voltage angle differences between adjacent nodes are not small and/or when the voltage profile is non-flat. Nonetheless, the proposed method guarantees a good approximation of the true severity score for most of the contingencies, with the great advantage of substantially speeding up the analysis. In fact, the AC-OPF model required more than 6 hours to process the 5000 samples on a relatively small power grid whereas, differently, the proposed model only requires an average of approximately 4 minutes.

Figure 4: PDFs of R(ζ) using the proposed risk model (in blue) and the AC-OPF model (in red).


Figure 5: Pearson's correlation coefficients between R(ζ) and the loads, for the proposed risk emulator and the AC-OPF model.

4.3.2. IEEE 24-Bus RTS

Aleatory uncertainty is assumed to affect the loads and the wind speed of the IEEE 24 system. We assume the loads to be Gaussian distributed, with mean values equal to the nominal load values in [28] and standard deviations set to 0.5 times the values of the nominal demands. A prediction of the wind speed is considered and assumed to be normally distributed with mean µ_w = 6 and standard deviation σ_w = 1. Monte Carlo simulation is used to propagate the uncertainty in the loads and in the wind speed and to quantify their effect on the risk, the result of Algorithm 1. Figure 4 presents the PDFs of the risk, obtained from 5000 realizations of the aleatory uncertain variables. The proposed model and the high-fidelity model lead to similar risk results, with PDF moments µ = 0.0186 and σ = 3.487 × 10⁻⁴ for the proposed method, and µ = 0.0191 and σ = 3.503 × 10⁻⁴ for the AC-OPF model. However, the proposed method slightly underestimates the average risk and the extreme cases. This can be seen as a limitation of the proposed model, which is incapable of fully capturing all the relevant grid behaviours, possibly due to the limitations of the DC assumptions discussed in Section 2.3. Nonetheless, this result shows a good approximation capability of the risk model, with a 98% reduction in computational cost (i.e. the evaluation of the 1500 samples requires 483 seconds using the proposed model against the approximately 6 hours needed for the high-fidelity model).

A correlation analysis is also proposed to investigate the major differences between the proposed approach and the high-fidelity model. The Pearson's correlation coefficients linking R(ζ) to the 17 loads have been estimated and are graphically presented in Figure 5. The input-output correlations of the two models are qualitatively in good agreement. Both models identify a positive correlation between the overload risk and the loads in the southern part of the grid, i.e. nodes 1-10 (with a little discrepancy at node 9). A negative correlation has been observed with the load at node 18, which indicates a reduction in the post-contingency overload risk for increasing demand at node 18. This result is in good agreement with a previous analysis [30] and can be explained by looking at the generators' production profiles. The generators in the upper part of the grid (e.g. the generators at nodes 13, 15, 18) are associated with lower generation costs. This leads to the maximum exploitation of their production capacity, independently of the load profile realization. Consequently, when electrical power is demanded (and consumed) locally, less power will flow from the northern area to the southern area of the network, thus reducing the risk of facing severe post-contingency overflow scenarios.

4.4. Generalized uncertainty quantification analysis

The 17 loads, the 38 line failure rates λ and the wind speed w of the IEEE 24-bus RTS are assumed to be affected by mixed aleatory-epistemic uncertainties, and the generalized UQ framework is adopted to quantify their separate effects. Probability boxes are used to characterize the uncertainty affecting the loads and failure rates. In general, only a small amount of failure data (or no data at all) is available for reliable components. Thus, epistemic uncertainty is assumed to affect the statistical indicators of the line failure rates, whose true (unknown) values lie within intervals [λ̲_l, λ̄_l]. The lower and upper bounds are defined as ±2% of the values in Table 1. Differently, the loads are inherently variable (e.g. the demand changes from day to night and from winter to summer time) and, commonly, a large volume of historical time series is available to build probabilistic models. However, due to recent trends in the power grid sector (e.g. the allocation of renewable sources, distributed generators, etc.), the reliability of the data and of the probabilistic models describing the load volatility has diminished. Thus, the loads are assumed to be affected by both aleatory and epistemic uncertainty (e.g. due to a lack of knowledge on the effect of new technological developments), and P-boxes are used to model this uncertainty. The distribution family of the parent distribution is assumed to be Gaussian, and imprecision affects the mean and standard deviation. For each load, the mean value µ_Li of the load distribution is assumed to lie within ±2% of the design peak load presented in [29]. Similarly, the standard deviations are assumed to be imprecisely defined within the intervals [σ̲_Li, σ̄_Li] = [0.05µ̲_Li, 0.05µ̄_Li], i = 1, ..., 17. Imprecision is assumed to affect the mean of the wind speed µ_w, which lies within the error interval [5.4, 6.6].
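For illustration, a load P-box of the kind described above can be constructed as a pointwise CDF envelope over the interval-valued Gaussian parameters; the peak-load value below is illustrative, not a design value from [29].

```python
import numpy as np
from scipy.stats import norm

# Sketch of a Gaussian load P-box with interval mean (+/- 2% of the peak
# load) and interval standard deviation. The 100 MW peak is illustrative.

peak = 100.0
mu_lo, mu_hi = 0.98 * peak, 1.02 * peak
sd_lo, sd_hi = 0.05 * mu_lo, 0.05 * mu_hi

x = np.linspace(80, 120, 400)
# Pointwise envelope over the four parameter corners; for this
# location-scale family the extremes are attained at the corners.
corners = [norm.cdf(x, m, s) for m in (mu_lo, mu_hi) for s in (sd_lo, sd_hi)]
F_lower, F_upper = np.min(corners, axis=0), np.max(corners, axis=0)
```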

4.4.1. Results of the slicing method

The slicing method, Algorithm 2, propagates the input P-boxes to the output of the risk model. Two independent propagation analyses are performed using 20 and 100 slices, respectively. The GA parameters are selected empirically on the basis of earlier works [14]: a starting population of 10 individuals and 100 iterations generally lead to a risk index close to convergence. Hence, n_pop = 10 and n_ep = 100 are selected for the risk minimization and maximization routines. Figure 6 shows the P-boxes of the overload risk index obtained from two runs of Algorithm 2 with n_s = 20 slices (blue dotted lines) and n_s = 100 slices (red dashed lines). As expected, the P-box bounds include the classical result (black solid-line CDF), which neglects epistemic uncertainty.

Figure 6: Comparison between the CDF obtained considering only aleatory uncertainty (black solid line) and the P-boxes obtained for 20 slices (blue dotted lines) and 100 slices (red dashed lines).

This result shows a possible underestimation of the risk if imprecision is not accounted for. For instance, consider a case where the system operator imposes a risk requirement R(ζ) ≤ 0.023. The probability of meeting this requirement is P[R(ζ) ≤ 0.023] = 0.99 according to the classical analysis. This can be interpreted as a very good result; in fact, only 1% of the possible scenarios will have an unacceptably high risk. If imprecision is accounted for, however, the likelihood of compliance with this risk requirement is P[R(ζ) ≤ 0.023] ∈ [0.7, 1]. This means that in the worst case 30% of the possible scenarios will not comply with the risk requirement. The classical UQ result (1%) strongly underestimates the true worst-case scenario obtained by generalized UQ (30%). The area of the risk P-box obtained from n_s = 100 slices results in δ = 0.0057. To check the goodness of this result, we ran 10 independent runs of the slicing method using the same parameters, resulting in an average error on δ of approximately 4.7%. This relatively small error is due to an early truncation of the GA optimization (at 100 epochs) and to the low number of slices.

4.4.2. Results of the generalized sensitivity analysis

Sensitivity analysis on the epistemic space is performed by pinching the load P-boxes to their mean empirical CDFs, i.e., the average between the upper and lower P-box bounds. Similarly, the line failure rates have been pinched to a reduced interval (i.e. an improved estimator, from the initial ±2% to ±1% of the estimated λ_l). The reduced epistemic space has been propagated through the risk model by the slicing method, Algorithm 2, setting n_s = 50 slices and n = 55 inputs, i.e. the 17 loads L_i and the 38 failure rates λ_l, with the GA parameters n_pop = 10 and n_ep = 50.

It is worth remarking that P-box sensitivity analysis can be time consuming, i.e. its computational complexity is equal to n times the (high) cost of the slicing method. Thus, a high-performance computing strategy has been designed to forward each α-slice to a core of an 8-core computer cluster. Thanks to the efficient risk assessment framework, Algorithm 1, the overall computational time for the analysis was about 13.3 hours. The same analysis would have been infeasible using the high-fidelity model (indicatively, 20 days). Figure 7 shows the effect on the R(ζ) P-box bounds of pinching the loads, and Table 3 displays ranking results based on the sensitivity score S_i, i.e., the P-box area reduction as in equation (13). The sensitivity analysis points out that the uncertainty affecting the load at node 18 has the largest contribution to the epistemic uncertainty in the risk. This result agrees with a previous analysis [14]. Similarly, the epistemic uncertainty affecting the loads at nodes 19, 3 and 14 also has a sensible effect on the risk imprecision. A reduction in the uncertainty of the load at node 18 leads to a substantial reduction in the risk P-box area, approximately 25.8%. Similarly, the pinching of the loads at nodes 19, 3 and 14 leads to reductions in the risk uncertainty of, respectively, 20.2%, 13.9% and 13.6%. This is a piece of valuable information that can be used, for instance, to define better data collection strategies and uncertainty reduction schemes. For this case study, better monitoring of the loads at nodes 18 and 19 should be considered or, for instance, a dedicated control strategy prescribed to reduce their uncertainties, e.g., by peak shaving or by load curtailment. The pinching of the other factors, i.e., the other loads and the failure rates, leads to a reduction in the risk uncertainty of between 1% and 10%. Hence, it is not advisable to dedicate resources to reducing the epistemic uncertainty affecting these factors, e.g., by better monitoring and data collection. In this example, a reduction in the uncertainty of the other loads and failure rates does not offer an immediate gain in terms of a better specification of the network risk.

Figure 7: Risk probability bounds before and after pinching the 17 loads. For each α level, the input load is pinched to the mean between the upper and lower bounds.

Table 3: Ranks of the load imprecision with respect to S_i.

Load ID   S_i     rank      Load ID   S_i    rank
L18       25.8%   1         L16       8.5%   10
L19       20.2%   2         L5        7.9%   11
L3        13.9%   3         L7        7.1%   12
L14       13.6%   4         L10       6.4%   13
L15       11.8%   5         L4        6.1%   14
L20       11.7%   6         L6        6%     15
L13       10.8%   7         L9        5%     16
L1        8.6%    8         L8        1%     17
L2        8.6%    9

5. Conclusion

A generalized uncertainty quantification framework has been proposed to quantify the effect of mixed aleatory-epistemic sources of uncertainty affecting power grid risk. A computationally efficient approach has been developed and verified. The results from four realistic power grid cases have demonstrated the applicability and computational efficiency of the proposed approach. Specifically, speed-up factors of up to two orders of magnitude have been observed for the proposed method. The risks estimated by the proposed approach are comparable in accuracy with those obtained from the AC-OPF and DC-OPF models.

The generalized uncertainty quantification framework has been tested on the IEEE 24-bus reliability test system by propagation and sensitivity analysis of probability boxes. The generalized sensitivity analysis identified two of the loads as the key epistemic uncertain factors preventing a crisp estimation of the grid risks. The proposed framework proved to be a valuable tool to efficiently compute power grid overload risks and to identify key sources of epistemic (reducible) uncertainty. This is valuable information which can be used to: 1) design smarter data collection schemes; 2) prescribe an uncertainty reduction scheme, thus providing less uncertain risk scores; 3) update the risk assessment model and define better operational policies, based on a greater awareness of the associated risks and of the different sources of uncertainty, i.e., epistemic (thus reducible, in principle) and aleatory uncertainty.

Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Q. Chen, J. D. McCalley, Identifying high risk N-k contingencies for online security assessment, IEEE Transactions on Power Systems 20 (2005) 823–834.
[2] D. Gan, R. J. Thomas, R. D. Zimmerman, Stability-constrained optimal power flow, IEEE Transactions on Power Systems 15 (2000) 535–540.
[3] N. Amjady, H. Sharifzadeh, Security constrained optimal power flow considering detailed generator model by a new robust differential evolution algorithm, Electric Power Systems Research 81 (2011) 740–749.
[4] H. Harsan, N. Hadjsaid, P. Pruvot, Cyclic security analysis for security constrained optimal power flow, IEEE Transactions on Power Systems 12 (1997) 948–953.
[5] Z.-L. Gaing, R.-F. Chang, 'Security-constrained optimal power flow by mixed-integer', Proc. IEEE Power Engineering Society General Meeting (2006) 1–2.
[6] J. McCalley, S. Asgarpoor, L. Bertling, R. Billinion, H. Chao, J. Chen, J. Endrenyi, R. Fletcher, A. Ford, C. Grigg, G. Hamoud, D. Logan, A. Meliopoulos, M. Ni, N. Rau, L. Salvaderi, M. Schilling, Y. Schlumberger, A. Schneider, C. Singh, Probabilistic security assessment for power system operations, in: Power Engineering Society General Meeting, 2004, IEEE, 2004, pp. 212–220, Vol. 1. doi:10.1109/PES.2004.1372788.
[7] F. Xiao, J. McCalley, Power system risk assessment and control in a multiobjective framework, IEEE Transactions on Power Systems 24 (2009) 78–85.
[8] J. McCalley, F. Xiao, Y. Jiang, Q. Chen, Computation of contingency probabilities for electric transmission decision problems, in: Intelligent Systems Application to Power Systems, Proceedings of the 13th International Conference on, 2005, pp. 540–545. doi:10.1109/ISAP.2005.1599321.
[9] M. Ni, J. McCalley, V. Vittal, T. Tayyib, Online risk-based security assessment, IEEE Transactions on Power Systems 18 (2003) 258–265.
[10] R. Rocchetta, Y. Li, E. Zio, Risk assessment and risk-cost optimization of distributed power generation systems considering extreme weather conditions, Reliability Engineering & System Safety 136 (2015) 47–61.
[11] R. Rocchetta, M. Broggi, E. Patelli, Do we have enough data? Robust reliability via uncertainty quantification, Applied Mathematical Modelling 54 (2018) 710–721.
[12] M. Beer, S. Ferson, V. Kreinovich, Imprecise probabilities in engineering analyses, Mechanical Systems and Signal Processing 37 (2013) 4–29.
[13] E. Patelli, M. Broggi, M. Angelis, M. Beer, OpenCossan: An efficient open tool for dealing with epistemic and aleatory uncertainties, in: Vulnerability, Uncertainty, and Risk, American Society of Civil Engineers, 2014, pp. 2564–2573. doi:10.1061/9780784413609.258.
[14] R. Rocchetta, E. Patelli, Imprecise probabilistic framework for power grids risk assessment and sensitivity analysis, in: Risk, Reliability and Safety: Innovating Theory and Practice, 2016, pp. 2789–2796. doi:10.1201/9781315374987-424.
[15] S. Ferson, V. Kreinovich, L. Ginzburg, D. S. Myers, K. Sentz, Constructing probability boxes and Dempster-Shafer structures, volume 835, Sandia National Laboratories, 2002.
[16] R. Rocchetta, E. Patelli, Assessment of power grid vulnerabilities accounting for stochastic loads and model imprecision, International Journal of Electrical Power & Energy Systems 98 (2018) 219–232.
[17] R. Schöbi, B. Sudret, Structural reliability analysis for p-boxes using multi-level meta-models, Probabilistic Engineering Mechanics 48 (2017) 27–38.
[18] S. Ferson, W. T. Tucker, Sensitivity analysis using probability bounding, Reliability Engineering & System Safety 91 (2006) 1435–1442. The Fourth International Conference on Sensitivity Analysis of Model Output (SAMO 2004).
[19] R. Rocchetta, E. Zio, E. Patelli, A power-flow emulator approach for resilience assessment of repairable power grids subject to weather-induced failures and data deficiency, Applied Energy 210 (2018) 339–350.
[20] D. Van Hertem, J. Verboomen, K. Purchala, R. Belmans, W. Kling, Usefulness of DC power flow for active power flow analysis with flow controlling devices, in: AC and DC Power Transmission, 2006. ACDC 2006. The 8th IEE International Conference on, 2006, pp. 58–62.
[21] J. Li, L. Dueñas-Osorio, C. Chen, C. Shi, AC power flow importance measures considering multi-element failures, Reliability Engineering & System Safety 160 (2017) 89–97.
[22] S. M. Fatemi, S. Abedi, G. B. Gharehpetian, S. H. Hosseinian, M. Abedi, Introducing a novel DC power flow method with reactive power considerations, IEEE Transactions on Power Systems 30 (2015) 3012–3023.
[23] Y.-C. Chang, W.-T. Yang, C.-C. Liu, Improvements on the line outage distribution factor for power system security analysis, Electric Power Systems Research 26 (1993) 231–236.
[24] J. Guo, Y. Fu, Z. Li, M. Shahidehpour, Direct calculation of line outage distribution factors, IEEE Transactions on Power Systems 24 (2009) 1633–1634.
[25] G. Chen, Y. Dai, Z. Xu, Z. Dong, Y. Xue, A flexible framework of line power flow estimation for high-order contingency analysis, International Journal of Electrical Power & Energy Systems 70 (2015) 1–8.
[26] S. A. Sadat, M. Sahraei-Ardakani, Reducing the risk of cascading failures via transmission switching, 2018. arXiv:1810.00651.
[27] T. Guler, G. Gross, M. Liu, Generalized line outage distribution factors, IEEE Transactions on Power Systems 22 (2007) 879–881.
[28] R. D. Zimmerman, C. E. Murillo-Sanchez, R. J. Thomas, MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education, IEEE Transactions on Power Systems 26 (2011) 12–19.
[29] P. Wong, P. Albrecht, R. Allan, R. Billinton, Q. Chen, C. Fong, S. Haddad, W. Li, R. Mukerji, D. Patton, A. Schneider, M. Shahidehpour, C. Singh, The IEEE reliability test system-1996. A report prepared by the reliability test system task force of the application of probability methods subcommittee, IEEE Transactions on Power Systems 14 (1999) 1010–1020.
[30] R. Rocchetta, E. Patelli, B. Li, G. Sansavini, Effect of load-generation variability on power grid cascading failures, in: European Safety and Reliability Conference (ESREL 2018), 2018.
[31] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
[32] Y. Ben-Haim, Uncertainty, probability and information-gaps, Reliability Engineering & System Safety 85 (2004) 249–266.
[33] L. Utkin, S. Destercke, Computing expectations with continuous p-boxes: Univariate case, International Journal of Approximate Reasoning 50 (2009) 778–798.
[34] R. Zhong, Z. Zong, J. Niu, Q. Liu, P. Zheng, A multiscale finite element model validation method of composite cable-stayed bridge based on probability box theory, Journal of Sound and Vibration 370 (2016) 111–131.
[35] C. Simon, F. Bicking, Hybrid computation of uncertainty in reliability analysis with p-box and evidential networks, Reliability Engineering & System Safety 167 (2017) 629–638. Special Section: Applications of Probabilistic Graphical Models in Dependability, Diagnosis and Prognosis.
[36] T. Ali, H. Boruah, P. Dutta, Modeling uncertainty in risk assessment using double Monte Carlo method, International Journal of Engineering and Innovative Technology 1 (2012) 114–118.
[37] H. Zhang, Interval importance sampling method for finite element-based structural reliability assessment under parameter uncertainties, Structural Safety 38 (2012) 1–10.
[38] J. Sadeghi, M. De Angelis, E. Patelli, Analytic imprecise-probabilistic structural reliability analysis, 2018.

12

[39] D. A. Alvarez, J. E. Hurtado, J. Ramírez, Tighter bounds on the probability of failure than those provided by random set theory, Computers & Structures 189 (2017) 101–113.
[40] Z. Qiu, Y. Xia, J. Yang, The static displacement and the stress analysis of structures with bounded uncertainties using the vertex solution theorem, Computer Methods in Applied Mechanics and Engineering 196 (2007) 4965–4984.
[41] R. C. Williamson, T. Downs, Probabilistic arithmetic. I. Numerical methods for calculating convolutions and dependency bounds, International Journal of Approximate Reasoning 4 (1990) 89–158.
[42] E. Patelli, D. A. Alvarez, M. Broggi, M. de Angelis, An integrated and efficient numerical framework for uncertainty quantification: application to the NASA Langley multidisciplinary uncertainty quantification challenge, 16th AIAA Non-Deterministic Approaches Conference (SciTech 2014) (2014) 2014–1501.
[43] R. Schöbi, B. Sudret, Global sensitivity analysis in the context of imprecise probabilities (p-boxes) using sparse polynomial chaos expansions, Reliability Engineering & System Safety (2018).
[44] L. G. Crespo, S. P. Kenny, D. P. Giesy, The NASA Langley Multidisciplinary Uncertainty Quantification Challenge, 2014. URL: https://arc.aiaa.org/doi/abs/10.2514/6.2014-1347. doi:10.2514/6.2014-1347.
[45] C. Safta, K. Sargsyan, H. N. Najm, K. S. Chowdhary, B. Debusschere, L. P. Swiler, M. S. Eldred, Uncertainty quantification methods for model calibration, validation and risk analysis, 2013.
[46] K. Buren, F. Hemez, Robust decision making applied to the NASA multidisciplinary uncertainty quantification challenge problem, volume 12, 2014. doi:10.2514/6.2014-1350.
[47] N. Pedroni, E. Zio, Hybrid uncertainty and sensitivity analysis of the model of a twin-jet aircraft, Journal of Aerospace Information Systems 12 (2015) 73–96.
[48] M. Oberguggenberger, J. King, B. Schmelzer, Classical and imprecise probability methods for sensitivity analysis in engineering: A case study, International Journal of Approximate Reasoning 50 (2009) 680–693.
[49] J. M. Aughenbaugh, C. J. J. Paredis, Probability bounds analysis as a general approach to sensitivity analysis in decision making under uncertainty, in: SAE World Congress & Exhibition, SAE International, 2007. doi:10.4271/2007-01-1480.
[50] J. Hall, Uncertainty-based sensitivity indices for imprecise probability distributions, Reliability Engineering & System Safety 91 (2006) 1443–1451.
[51] S. Bi, M. Broggi, P. Wei, M. Beer, The Bhattacharyya distance: Enriching the p-box in stochastic sensitivity analysis, Mechanical Systems and Signal Processing 129 (2019) 265–281.

Appendix A. A generalized framework for UQ

Aleatory and epistemic are two different types of uncertainty, sometimes referred to as inherent variability and lack of knowledge, respectively. Epistemic uncertainty can be reduced by further data collection, at least in principle, and its effect becomes negligible when the available information is abundant and of good quality (although this is generally difficult to judge a priori [11]). If the information is abundant, classical probabilistic methods are suitable to quantify the uncertainty, e.g., by joint probability distribution functions (PDFs) and/or random processes, with the uncertainty propagated via plain Monte Carlo simulation. However, if the information is limited, both aleatory and epistemic uncertainty play a relevant role in the analysis. In these cases, the use of classical probabilistic approaches may lead to an incorrect quantification of the uncertainty, due to the subjective assumptions needed to model epistemic uncertainty (imprecision) in a purely probabilistic setting [11]. Examples are the subjective definition of a probability distribution family given just a few samples, or assumptions on the probabilistic behaviour of interval-valued quantities, e.g., assuming a uniform distribution. These modelling choices add information to the analysis, possibly leading to seemingly better, i.e., less uncertain, results; however, they reflect neither the current state of knowledge nor the true level of uncertainty in the results. Differently, generalized UQ methods have been introduced to deal with mixed types of uncertainty and generally require fewer artificial assumptions to model epistemic uncertainty. Among the most widely applied mathematical tools for generalized UQ are Dempster-Shafer structures [31], fuzzy variables [12], info-gap models [32] and P-boxes [15].

Appendix A.1. Probability boxes

A probability box (P-box) is a powerful mathematical tool often used to characterize uncertain factors affected by mixed sources of aleatory and epistemic uncertainty. P-boxes have grown in popularity thanks to their versatility and powerful uncertainty quantification capability [15, 33], and they are used in this work to quantify the distinct effects of inherent variability and lack of knowledge on the output of a power grid risk model. P-boxes have recently been applied in reliability analysis [34], [17], [12], in the context of evidential networks [35], and to quantify uncertainty in power grids [11], [16].

A P-box is defined as a pair of cumulative distribution functions (CDFs) bounding the probability from below and above. The underlying distribution family can be prescribed (distributional/parametric P-box), or only the bounds on the CDFs may be available (distribution-free/non-parametric P-box). Fig. A.8 presents an example of a parametric P-box modelling the uncertainty affecting a quantity x: the distribution family is known to be Gaussian, $N(\mu, \sigma)$, but, due to lack of knowledge, only intervals are available for the first two moments, i.e., the mean $\mu \in [\underline{\mu}, \overline{\mu}]$ and the standard deviation $\sigma \in [\underline{\sigma}, \overline{\sigma}]$ are imprecisely defined. P-boxes can be constructed from data in different ways; for more details refer to [15].

[Figure A.8: An example of a distributional P-box with Gaussian distribution family N and interval-valued mean, µ, and standard deviation, σ.]

Before defining a P-box mathematically, let us first recall the definition of a CDF: a CDF is a non-decreasing mapping from $\mathbb{R}$ to $[0, 1]$ such that, for a probability measure $P$ and each $x \in \mathbb{R}$, $F_X(x) = P((-\infty, x])$ holds. A P-box is a pair of CDFs, $[\underline{F}_X, \overline{F}_X]$, with $\underline{F}_X(x) \le \overline{F}_X(x)$ for every $x$, i.e., $\overline{F}_X$ dominates $\underline{F}_X$ probabilistically. A P-box can be viewed as a continuous form of random set and defined as follows [16]:

$[\underline{F}_X, \overline{F}_X] = \{P : \forall x \in \mathbb{R},\ \underline{F}_X(x) \le P((-\infty, x]) \le \overline{F}_X(x)\} \qquad (A.1)$

where equation (A.1) defines the credal set induced by the P-box $[\underline{F}_X, \overline{F}_X]$.
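To make the construction concrete, the following minimal sketch (not taken from the paper; the interval endpoints are illustrative assumptions) evaluates the bounding CDFs of a parametric Gaussian P-box such as the one in Fig. A.8. For a fixed x, the Gaussian CDF is monotone in µ, and monotone in σ on either side of x = µ, so the envelope over the (µ, σ) box is attained at its four corners.

```python
import numpy as np
from scipy import stats

# Hypothetical interval-valued moments (illustrative, not from the paper)
MU = (0.8, 1.2)       # mu    in [0.8, 1.2]
SIGMA = (0.2, 0.4)    # sigma in [0.2, 0.4]

def gaussian_pbox_cdf(x, mu_box=MU, sigma_box=SIGMA):
    """Bounding CDFs of a parametric Gaussian P-box at points x.

    For fixed x, F(x; mu, sigma) is monotone in mu, and monotone in sigma
    on either side of x = mu, so the envelope over the (mu, sigma) box is
    attained at its four corners.
    """
    corners = [stats.norm.cdf(x, loc=m, scale=s)
               for m in mu_box for s in sigma_box]
    return np.max(corners, axis=0), np.min(corners, axis=0)

x = np.linspace(-0.5, 2.5, 201)
F_upper, F_lower = gaussian_pbox_cdf(x)
# Every CDF in the credal set of Eq. (A.1) lies between F_lower and F_upper.
```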

Appendix A.2. Uncertainty propagation for P-boxes

Monte Carlo simulation is often used in classical UQ frameworks to quantify the uncertainty affecting the outputs of a computational model by propagating point-valued realizations of the inputs, i.e., using the well-known inverse transform method. Similarly, uncertainty can be propagated in a generalized UQ framework in which the inputs are modelled by P-boxes. Specifically, the input P-boxes are propagated to the output of a computational model $y = R(x)$, a risk assessment model in this work, which maps an input vector $x \in \mathbb{R}^n$ of length n to an output response of interest $y \in \mathbb{R}^{n_y}$. Propagation of P-boxes through $R(x)$ can be performed using different methods, e.g., double-loop Monte Carlo [36], importance sampling [37], the slicing method [11], multi-level meta-models [17] and, for special classes of problems, analytical methods [38]. The authors of [39] show that computing probability bounds by solving reliability-based design optimization problems can lead to tighter results than those obtained via random set theory. In this work, the slicing method is selected due to its versatility, effectiveness and straightforward applicability to a wide range of models [14].

Appendix A.2.1. The slicing method

The slicing method shares some similarity with plain Monte Carlo; differently, however, propagation through slicing uses interval-valued samples from the input P-boxes rather than point-valued samples. In the slicing procedure, $n_s$ slices $\{\alpha^{(1)}, \ldots, \alpha^{(n_s)}\}$ are randomly generated and used to invert the bounds of the input P-boxes. Each sample $\alpha$, i.e., an 'α-cut' or 'α-slice', is an n-dimensional vector of uniformly distributed random numbers $\alpha = (\alpha_1, \ldots, \alpha_n)$, where $\alpha_i \sim U(0, 1)$, $i = 1, \ldots, n$, are the probabilistic levels associated with the n input P-boxes. An α-slice is then used to obtain an interval-valued sample of the uncertain quantities x as follows:

$\overline{F}_{X_i}^{-1}(\alpha_i) = \{x \mid \overline{F}_{X_i}(x) = \alpha_i\}, \quad \alpha_i \in [0, 1] \qquad (A.2)$

$\underline{F}_{X_i}^{-1}(\alpha_i) = \{x \mid \underline{F}_{X_i}(x) = \alpha_i\}, \quad \alpha_i \in [0, 1] \qquad (A.3)$

where $\overline{F}_{X_i}^{-1}(\alpha_i)$ and $\underline{F}_{X_i}^{-1}(\alpha_i)$ are the inverse CDFs of the upper and lower bounds of the P-box characterizing the uncertain input $x_i$. The interval $[\overline{F}_{X_i}^{-1}(\alpha_i), \underline{F}_{X_i}^{-1}(\alpha_i)]$ is a focal element, the support for the i-th input. The Cartesian product of the n focal elements defines a hyper-rectangle, i.e., an n-orthotope:

$I_X(\alpha) = [\overline{F}_{X_1}^{-1}(\alpha_1), \underline{F}_{X_1}^{-1}(\alpha_1)] \times \ldots \times [\overline{F}_{X_n}^{-1}(\alpha_n), \underline{F}_{X_n}^{-1}(\alpha_n)] \qquad (A.4)$

where $I_X(\alpha)$ is the analogue of a Monte Carlo sample in the slicing method.
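The inversion of the P-box bounds in Eqs. (A.2)–(A.4) can be sketched as follows, again assuming parametric Gaussian input P-boxes with hypothetical interval moments; for a Gaussian family, the inverse of the CDF envelope is given by the extreme corner quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two hypothetical inputs, each a parametric Gaussian P-box given as
# (mu_low, mu_up, sigma_low, sigma_up); the numbers are illustrative.
PBOXES = [(0.8, 1.2, 0.2, 0.4),
          (4.0, 5.0, 0.5, 0.8)]

def focal_element(alpha_i, mu_l, mu_u, s_l, s_u):
    """Focal element [F_upper^{-1}(a_i), F_lower^{-1}(a_i)] of Eqs. (A.2)-(A.3).

    For a Gaussian family the CDF envelope is attained at the corners of the
    (mu, sigma) box, so the inverse bounds are the extreme corner quantiles.
    """
    q = [stats.norm.ppf(alpha_i, loc=m, scale=s)
         for m in (mu_l, mu_u) for s in (s_l, s_u)]
    return min(q), max(q)

def sample_slice(pboxes=PBOXES):
    """One alpha-slice: the hyper-rectangle I_X(alpha) of Eq. (A.4)."""
    alpha = rng.uniform(size=len(pboxes))
    return [focal_element(a, *pb) for a, pb in zip(alpha, pboxes)]

box = sample_slice()   # e.g. [(x1_low, x1_up), (x2_low, x2_up)]
```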

Once a sample $I_X(\alpha)$ is obtained, the interval-valued response of the output can be computed by solving two optimization problems constrained in $I_X(\alpha)$:

$\overline{y}(\alpha) = \max_{x} \{R(x) : x \in I_X(\alpha)\} \qquad (A.5)$

$\underline{y}(\alpha) = \min_{x} \{R(x) : x \in I_X(\alpha)\} \qquad (A.6)$

where $\overline{y}(\alpha)$ and $\underline{y}(\alpha)$ are, respectively, the maximum and minimum of the model output associated with the slice $\alpha$. The sampling procedure is repeated until the predefined number of hyper-rectangles and output intervals, $n_s$, is obtained. Then, the bounds of the output P-box are approximated by empirical CDFs computed as follows:

$\overline{F}_Y(y) = \frac{1}{n_s} \sum_{j=1}^{n_s} \mathbb{1}\{\underline{y}_j \le y\}, \qquad \underline{F}_Y(y) = \frac{1}{n_s} \sum_{j=1}^{n_s} \mathbb{1}\{\overline{y}_j \le y\} \qquad (A.7)$

where $\mathbb{1}\{A\}$ is the indicator function (equal to 1 when condition A is met, 0 otherwise) and $\overline{y}_j$ and $\underline{y}_j$ are, respectively, the upper and lower bounds associated with the j-th slice; to ease the notation, $\overline{y}_j = \overline{y}(\alpha^{(j)})$ and $\underline{y}_j = \underline{y}(\alpha^{(j)})$. Depending on the problem at hand, different techniques can be used to compute $[\underline{y}_j, \overline{y}_j]$, for instance, sampling methods [11], the vertex method [40], interval arithmetic [41] or global optimization strategies [42]. In this work, a Genetic Algorithm global optimization scheme is adopted to compute the minimum and maximum of the model response of interest. This method is selected due to the non-convexity of the risk output in the uncertainty space, which complicates the applicability of local gradient-based optimization methods. For further details refer to Section 3.1.
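Continuing the sketch above, the full slicing loop of Eqs. (A.5)–(A.7) might look as follows. The risk model R is replaced by a cheap toy function, and SciPy's differential evolution stands in for the paper's Genetic Algorithm; both substitutions are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import differential_evolution

def R(x):
    """Toy stand-in for the expensive risk model; mildly non-convex."""
    return x[0] ** 2 + x[1] * np.sin(3.0 * x[1])

def propagate(pboxes, n_s=100):
    """Slicing propagation, Eqs. (A.5)-(A.6): lower/upper output samples."""
    y_low, y_up = np.empty(n_s), np.empty(n_s)
    for j in range(n_s):
        bounds = sample_slice(pboxes)   # hyper-rectangle I_X(alpha), Eq. (A.4)
        # Global min/max of R over the box; differential evolution is used
        # here as a stand-in for the paper's Genetic Algorithm.
        y_low[j] = differential_evolution(R, bounds, seed=j).fun
        y_up[j] = -differential_evolution(lambda x: -R(x), bounds, seed=j).fun
    return np.sort(y_low), np.sort(y_up)

def pbox_bounds(y_grid, y_low, y_up):
    """Empirical bounding CDFs of Eq. (A.7) on a grid of output values."""
    F_up = np.searchsorted(y_low, y_grid, side='right') / len(y_low)
    F_low = np.searchsorted(y_up, y_grid, side='right') / len(y_up)
    return F_up, F_low

y_low, y_up = propagate(PBOXES)
grid = np.linspace(y_low[0], y_up[-1], 200)
F_up, F_low = pbox_bounds(grid, y_low, y_up)
```

Note that the upper CDF bound is built from the lower output bounds and vice versa, consistent with Eq. (A.7): smaller outputs shift probability mass to the left, producing the highest admissible CDF.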


Appendix A.2.2. Generalized sensitivity analysis

Sensitivity analysis quantifies the importance of the inputs of a computational model for its outputs. Variance-based sensitivity measures, e.g., Sobol' indices, have been used to identify the key factors driving the output uncertainty. These measures quantify the reduction in the aleatory uncertainty of the model outputs (uncertainty defined in terms of variance) due to a less uncertain specification of the inputs. Unfortunately, the application of a variance-based concept within a generalized UQ framework is not straightforward, because a suitable metric must be chosen (comparable to the variance in a classical setting) to quantify mixed aleatory-epistemic uncertainty.

A limited number of works have investigated the applicability of sensitivity analysis methods within generalized UQ frameworks. Schöbi and Sudret [43] investigated global sensitivity analysis using sparse polynomial chaos expansions to reduce the computational cost of the analysis. Ferson and Tucker [18] proposed a sensitivity analysis method based on P-box 'pinching' and convolution. Patelli et al. proposed an extension of global sensitivity analysis and Sobol' index estimation for mixed uncertainty [42]; the developed approach, together with a sensitivity analysis method based on a Hartley-like measure of non-specificity, was used to solve the NASA uncertainty quantification challenge problem [44]. Several other groups have attempted solutions to this challenging sensitivity analysis problem, see for instance [45, 46, 47]. Oberguggenberger et al. compared imprecise sensitivity analysis methods (random sets, fuzzy sets, interval bounds) to classical techniques [48]. Aughenbaugh and Paredis introduced a generalized sensitivity analysis method for probability bounds analysis in [49]. Hall investigated generalized UQ where the input uncertainty is expressed as closed convex sets of probability measures [50]. Recently, Bi et al. discussed the role of the Bhattacharyya distance as a measure of P-box uncertainty to be applied in sensitivity analysis [51].

Propagation of P-boxes and generalized sensitivity analysis are computationally intensive, and possibly intractable for an expensive model R. The high computational cost is mostly driven by the min-max mapping in Eqs. (A.5) and (A.6) and by the wall-clock time needed to run R. Thus, we propose a computationally cheap emulator of the time-costly high-fidelity model R; this is an essential step to reduce the computational burden of the analysis.
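As an illustration of the pinching idea of Ferson and Tucker [18], and continuing the sketches above, one can pinch a single input P-box to a precise distribution, re-propagate, and score that input by the resulting reduction in output imprecision. Here the imprecision is measured as the mean width of the slice output intervals, which coincides with the area between the empirical bounding CDFs of Eq. (A.7); the mid-interval pinching choice is an assumption made for illustration only.

```python
import numpy as np

def imprecision_area(pboxes, n_s=100):
    """Mean width of the slice output intervals; equals the area between
    the empirical bounding CDFs of Eq. (A.7)."""
    y_low, y_up = propagate(pboxes, n_s)
    return float(np.mean(y_up) - np.mean(y_low))

def pinch_input(pboxes, i, eps=1e-9):
    """Pinch the i-th Gaussian P-box to (almost) precise mid-interval
    moments; eps keeps the optimizer's box bounds well-posed."""
    mu_l, mu_u, s_l, s_u = pboxes[i]
    m, s = 0.5 * (mu_l + mu_u), 0.5 * (s_l + s_u)
    pinched = list(pboxes)
    pinched[i] = (m - eps, m + eps, s - eps, s + eps)
    return pinched

base = imprecision_area(PBOXES)
# Sensitivity score: relative reduction of output imprecision when input i
# is made precise; larger scores flag inputs driving the imprecision.
scores = [1.0 - imprecision_area(pinch_input(PBOXES, i)) / base
          for i in range(len(PBOXES))]
```

In practice one would reuse the same α-slices for the pinched and unpinched runs, so that the comparison is not blurred by Monte Carlo noise.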
