Electrical Power and Energy Systems 88 (2017) 1–12
Contents lists available at ScienceDirect
Electrical Power and Energy Systems journal homepage: www.elsevier.com/locate/ijepes
Probabilistic load flow calculation with quasi-Monte Carlo and multiple linear regression Xiaoyuan Xu ⇑, Zheng Yan Key Laboratory of Control of Power Transmission and Conversion, Ministry of Education, Department of Electrical Engineering, Shanghai Jiao Tong University, No. 800, Dongchuan Road, Shanghai, China
a r t i c l e
i n f o
Article history: Received 6 June 2016 Received in revised form 6 November 2016 Accepted 28 November 2016
Keywords: Correlation Multiple linear regression Non-normal distribution Probabilistic load flow Quasi-Monte Carlo Wind power
a b s t r a c t In this paper, quasi-Monte Carlo combined with multiple linear regression (QMC-MLR) is proposed to solve probabilistic load flow (PLF) calculation. A distinguishing feature of the paper is that PLF is approached by a low-dimensional problem with the concept of the effective dimension, and thus QMC based on low-discrepancy sequences is used to improve the sampling efficiency of the Monte Carlo simulation (MCS). Moreover, according to the relationship between linear correlation and linear regression, the MLR-based correlation control technique is developed to arrange the orders of samples in order to introduce prescribed dependences between variables. The proposed method is tested with the IEEE 118-bus system. Simulation results indicate that the MLR-based technique is robust and efficient in handling correlated non-normal variables and the proposed method shows better performances in PLF calculation compared with other MCS techniques, including simple random sampling (SRS), Latin hypercube sampling (LHS) and Latin supercube sampling (LSS). Ó 2016 Elsevier Ltd. All rights reserved.
1. Introduction Following the increased use of renewable energies, the uncertainties linked to their resources have had an increasing impact on the operation of power systems. Deterministic methods cannot fully depict the characteristics of systems and stochastic approaches are now attracting more attention. Probabilistic load flow (PLF) is an efficient stochastic tool to analyze the steady state of power systems considering various random variables [1–6]. Monte Carlo simulation (MCS) is an important approach to solve PLF, and simple random sampling (SRS) is usually used to check the accuracy of other methods. However, SRS requires large samples in order to obtain satisfactory results, thus various methods are proposed to improve the efficiency of MCS. Typical techniques include antithetic variables, control variables, conditional Monte Carlo, importance sampling (IS), stratified sampling, quasiMonte Carlo (QMC) and so on [7]. Abbreviations: ANOVA, analysis of variation; CD, Cholesky decomposition; GA, genetic algorithm; IS, importance sampling; LDS, low-discrepancy sequence; LHS, Latin hypercube sampling; LSS, Latin supercube sampling; MCS, Monte Carlo simulation; MLR, multiple linear regression; NORTA, normal to anything model; PLF, probabilistic load flow; QMC, quasi-Monte Carlo; RP, random permutation; SRS, simple random sampling. ⇑ Corresponding author. E-mail address:
[email protected] (X. Xu). http://dx.doi.org/10.1016/j.ijepes.2016.11.013 0142-0615/Ó 2016 Elsevier Ltd. All rights reserved.
IS is useful in the estimation of rare-event probabilities [8], and the cross-entry method, a classic IS technique, has been applied in reliability evaluation [9]. IS can accurately capture the means of system states but it is limited in analyzing probability distributions of output variables. Latin hypercube sampling (LHS) is a stratified sampling method and has been widely used in PLF. LHS consists of sampling and permutation [10]. Random permutation (RP), the basic permutation technique, may introduce undesired correlations between samples of independent variables. Hence Cholesky decomposition (CD) is adopted in [4] to diminish the spurious dependences and the resulting LHS-CD obtains more accurate results than LHS-RP under the same sample sizes. QMC, which is based on low-discrepancy sequences, yields a faster convergence rate than MCS using random sequences and has already been used in some stochastic problems such as probabilistic small signal stability analysis [11]. But its advantage degenerates for high dimensions and thus few researches discuss its performance in PLF. Recently, Latin supercube sampling (LSS), which combines QMC with LHS, has been used to solve PLF and shows better performances than SRS and LHS [12]. Besides the sampling strategy, introducing desired correlations between dependent variables is also important. Non-normally distributed variables are frequently used for the detailed modelling of uncertainties in power systems. However, inducing correlations between non-normal variables is more difficult and the most
2
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
popular method is the Normal to Anything (NORTA) model [13,14]. SRS-NORTA is used in [15] to analyze the impact of wind production on locational marginal prices. LHS-NORTA is developed in [16] to tackle PLF with dependent input variables. The main shortcoming of NORTA is that it becomes infeasible with the increase of correlation matrix dimensions [17]. Another important type of correlation control method is the intelligent optimization algorithm. In [18], a population-based algorithm, genetic algorithm (GA), is designed to control correlations by treating the arrangement of samples as a combinatorial optimization problem. In [19], a simulated annealing (SA) approach is proposed to introduce correlations between variables by exchanging the position of samples. Intelligent optimization algorithms can be robust and efficient for correlation control in small-sample MCS but suffer from a much heavier computational burden in large-scale problems. In addition, other approaches such as polynomial normal transformation [20] and unscented transformation [21] have also been developed to handle correlated variables in PLF. In this paper, MCS is improved in two aspects, including the sampling strategy and the correlation control. Firstly, the analysis of variation (ANOVA) decomposition and the concept of the effective dimension are described. Then the method to calculate the effective dimension of PLF is designed. Based on the consideration that PLF is a low-dimensional problem, Sobol sequences, which have a good uniformity in low-dimensional projections, are employed to obtain samples of variables. Moreover, according to the relationship between correlation and regression, an MLRbased technique is designed to introduce prescribed dependences among variables. Finally, the proposed PLF calculation method is tested with the IEEE 118-bus system. Simulation results indicate that this new method is more accurate and efficient than popular MCS methods. The rest of the paper is organized as follows. In Section 2 the theoretical analysis of QMC for solving PLF is given. Section 3 gives the relationship between linear correlation and linear regression and describes the design of the technique to control correlations between variables. The proposed PLF calculation method is given in Section 4, followed by the case studies in Section 5. Finally, the conclusion is presented in Section 6. 2. Quasi-Monte Carlo and probabilistic load flow In this section the theoretical basis of using QMC in PLF calculation is given. Firstly, the deficiency of QMC in solving highdimensional problems is described. Then, the ANOVA decomposition and the concept of the effective dimension are introduced. If the effective dimension is low, QMC can be efficient in handling the high-dimensional problem. Finally, the procedure of calculating the effective dimension of PLF is designed.
jQ Q n j 6 Vðf ÞDn
where V(f) is the variation of f in the sense of Hardy and Krause, and Dn⁄ is the discrepancy which reflects the geometric nonuniformity of points in the set. According to (3), the error of MCS estimation is bounded and dominated by Dn⁄ since V(f) is a constant as long as the function f is given. For random points uniformly distributed over Cs, it has been shown that [23]
1=2 Dn ¼ O ðlog log nÞ n1=2
The convergence rate, O(n1/2), is independent of the dimension of problems, which shows that MCS based on random points is very robust but not efficient. Instead of random points, deterministic low-discrepancy sequences (LDSs) are used in QMC and the discrepancy of LDSs is [24]
s Dn ¼ O ðlog nÞ n1
2.2. ANOVA decomposition and effective dimension The function f(x) defined in Cs can be represented in the following form
f ðxÞ ¼ f 0 þ
f i1 ik xi1 ; . . . ; xik
X X f i ð xi Þ þ f ij xi ; xj þ þ f 12s ðx1 ; x2 ; . . . ; xs Þ i
Z
ð1Þ
where f is an integrable function and Cs is the unit cube in s dimensions. Then the MCS estimation of (1) is n 1X Qn ¼ f ðxfig Þ n i¼1
s s X X k¼1 i1 <
MCS can be used to solve the following integration
Cs
ð5Þ
We see from the comparison of (4) and (5) that the convergence rate of QMC is much faster than MCS in low-dimensional problems. But the O((log n)s n1) error bound of QMC may not present any improvement over the O(n1/2) error bound of MCS in high dimensions and it should take very large samples before the O(n1) convergence rate can manifest [25]. Although the convergence rate decreases with higher dimensions, QMC has been seen to outperform MCS in some problems with large dimensions and moderate sample sizes. In [11], the probabilistic small signal stability is studied by a QMC-based technique, and the New England ten-generator 39-bus system is analyzed in the case study. There are 55 input variables and QMC obtains relatively accurate results with hundreds of samples. In [20], QMC is used to solve probabilistic optimal power flow. More than 100 input random variables are considered in the test system, and QMC gives better performances compared with SRS for 2000 samples. In [27], researchers at IBM studied the pricing of a fiveyear discount bond, comprising a 1439-dimensional statistical integral. They observed a QMC speedup of about 150 for an accuracy level of one basis point (i.e., a relative accuracy of 104) compared with random Monte Carlo. Some of these successful applications have been explained by the concept of the effective dimension. In the following, this concept is reviewed since it lays the foundation of using QMC in PLF.
f ðxÞ ¼ f 0 þ f ðxÞdx
ð4Þ
ð6Þ
where 1 6 i1 < < ik 6 s. Eq. (6) means that
2.1. Introduction of quasi-Monte Carlo
Q¼
ð3Þ
The total number of the summands in (6) is 2s. Eq. (6) is called the analysis of variation (ANOVA) decomposition of f(x) if
Z 0
ð2Þ
where x{i} are n independent and identically distributed random points drawn from the s-dimensional problem. The estimated error of (2) is given as follows [22]:
ð7Þ
i
1
f i1 ik xi1 ; . . . ; xik dxm ¼ 0 for m ¼ i1 ; . . . ; ik
ð8Þ
It follows from (8) that the members in (6) are orthogonal and can be expressed as integrals of f(x). Assume that f(x) is square integrable. Then all the fi1 ik in (6) are also square integrable. Squaring (6) and integrating over Cs we get
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
Z
2
Z s s X X
2
f ðxÞdx f 0 ¼
k¼1 i1 <
The constants
r2 ¼
Z
2
2
f dx f 0 ;
r2i1 ik ¼
Z
2 f i1 ik xi1 ; . . . ; xik dxi1 dxik
2
f i1 ik dxi1 dxik
ð9Þ
ð10Þ
s s X X k¼1 i1 <
r2i1 ik
ð11Þ
X
r2u
j j nj ¼ c1j ; . . . ; csj ; n0j ¼ csþ1 ; . . . ; c2s
ð18Þ
nj and n0 j are divided into
Eq. (11) can be written in a more compact notation:
r2 ¼
The sum in (17) is extended over all groups (i1, . . . , ik) where all the (i1, . . . , ik) belong to L. The variance of f(x) and the variance involved with y, respectively r2 and r2y, can be estimated by the following Monte Carlo method [28]: The sample size of the Monte Carlo method is n. For the jth trial j (j = 1, 2, . . . , n), 2s standard random numbers c1j ; . . . ; c2s are generated, and
are called variances and
r2 ¼
3
ð12Þ
nj ¼ gj ; fj ; n0j ¼ g0j ; f0j
ð19Þ
juj>0
where u # {1, . . . , s}. Readers can refer to [28] for the details of the ANOVA decomposition. Then the effective dimension can be defined as follows [29]: (1) The effective dimension in the superposition sense is the smallest integer sS such that
X
r2u P pr2
ð13Þ
0
(2) The effective dimension in the truncation sense is the smallest integer sT such that
X
r2u P pr2
ð14Þ
u # fi1 ;...;iS g T
where p is a selected threshold close to 1. sS shows if only lowdimensional interactions dominate the variation and sT is the number of leading dimensions accounting for the variation. For example, if p is 1, the function f(x) = x1 + x3 + x4 has the superposition dimension sS = 1 and the truncation dimension sT = 3. Hence, the ANOVA version of (3) can be described as [26]
X
jQ Q n j 6
V u ðf u ÞDn;u
ð15Þ
u # f1;...;sg;juj>0
where Vu(fu) is the variation of fu taken as a |u|-dimensional function and D⁄n,u is the discrepancy of the |u|-dimensional points obtained by projecting the point-set onto the coordinates in u. For subsets u that cause large variance ru in f (captured by Vu(fu)), if the discrepancy D⁄n,u is small, all the terms in the error bound are small, leading to a small error bound. From this point, if the effective dimension in one or both of the senses is low, it is possible to achieve a very low error by an LDS which has a good uniformity in low-dimensional projections. In the following the method to calculate the effective dimension of PLF is studied. 2.3. Method to calculate the effective dimension of probabilistic load flow Before assessing the effective dimension of PLF, the method to calculate the variances of functions are studied. For the ANOVA decomposition in (6), consider an arbitrary set of m variables, 1 6 m 6 s 1, that
y ¼ xl1 ; . . . ; xlm ; 1 6 l1 < < lm 6 s
ð16Þ
Let z be the set of s m complementary variables, thus x = (y, z). Let L = (l1, . . . , lm), and the variance corresponding to the subset y is
r2y ¼
m X
X
k¼1 ði1 <
r2i1 ik
ð17Þ
where gj and g0 j are samples of y, fj and f0 j are samples of z. The variances r2 and r2y can be estimated by
r
2
r
2 y
! n n 2 1X 1X 2 ¼ f gj ; fj f gj ; fj n j¼1 n j¼1
! n n 2 1X 1X 0 ¼ f gj ; fj f gj ; fj f gj ; fj n j¼1 n j¼1
ð20Þ
Then the sensitivity index Sy, which is the contribution of y to the total variance of f(x), is calculated as follows:
Sy ¼ r2y =r2
ð21Þ
Finally, the effective dimension can be obtained by analyzing the subset y which makes a significant influence on the total variance of f(x). As to PLF, its model can be simply expressed as follows:
u ¼ gðv Þ w ¼ hðv Þ
ð22Þ
where u is the vector of active and reactive power injections, v is the vector of bus voltage magnitudes and angles, w is the vector of active and reactive line flows, g and h are power injection functions and line flow functions. Input random variables of PLF are usually non-uniformly distributed and are sometimes correlated with each other. Therefore the input variables should be properly handled when the aforementioned Monte Carlo method is used to calculate the variances of functions. In the following, the procedure to calculate the effective dimension of PLF is designed. (1) Transform input random variables of PLF, namely u, to standard uniformly distributed variables x. For independent random variables, the cumulative distribution function transformation (CDFT) is used to obtain the uniform random variables [30]. As to correlated random variables, CDFT combined with correlation control techniques can be used. (2) Select a subset of the uniform random variables. (3) Generate the standard random samples of the uniform variables by (18). (4) Obtain the standard input samples of PLF by (19). (5) Transform standard samples to ones with specific probability distributions and correlations. (6) Perform PLF calculation with (22). (7) Calculate the total variance and the variance involved with the subset using (20). (8) Calculate the sensitivity index of the subset by (21). (9) Evaluate the effective dimension of PLF by (13) and (14). In the case study, the effective dimension of PLF is studied, and simulation results indicate that PLF is approached by a sum of one-
4
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
dimensional functions. The detailed discussions can be found in Section 5.
In this section, based on the relationship between correlation and regression, a technique is developed to introduce desired correlations between variables. 3.1. Linear regression and linear correlation For t correlated variables x1, x2, . . . , xt, it is common to describe dependences among variables using a correlation matrix. Another way is to use regression coefficients and regression error terms of the multiple linear regression in Table 1. Both correlation matrices and regression parameters can be estimated from the data. But in many cases, only the correlation matrix is known. In order to adopt linear regression to control the correlations, the regression parameters should be obtained. Based on the given correlation matrix, t 1 multiple linear regression equations can be established in turn. In the jth regression (j = 1, 2, . . . , t 1), x1, x2, . . . , xj are the regressors, xj+1 is the response variable and the regression equation is
xjþ1 ¼ aj;0 þ aj;1 x1 þ aj;2 x2 þ þ aj;j xj þ N½0; ej
ð23Þ
where aj, 0, aj, 1, . . . , aj, j are the regression coefficients, ej is the standard error and N[0,ej] is a normally distributed random term with zero mean and standard deviation of ej. The parameters of the jth regression are obtained as follows [31]: Let R be the correlation matrix of x1, x2, . . . , xj+1 and R is partitioned into submatrices R11, R12, R21 and a single coefficient 1 in the final row.
R11 j R12 R21 j 1
ð24Þ
The standardized regression weight vector b = [b1, b2, . . . , bj]T is calculated as
R11 b ¼ R12
ð25Þ
The regression coefficients aj, 0, aj, 1, . . . , aj, j are obtained by
aj;k ¼
bk rjþ1 =rk ;
k ¼ 1; 2; . . . ; j
mjþ1 m1 b1 rjþ1 =r1 mj bj rjþ1 =rj ;
k¼0
ð26Þ
where mk and rk are the mean and the standard deviation of xk. Finally, the coefficient of determination rj2 and the standard error ej are obtained as T
r 2j ¼ b R12 ;
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Xn 2 x2i x02i =ðn 2Þ i¼1
ð29Þ
where x0 2i = a0 + a1x1i. The coefficient of determination r2 is estimated as
3. Correlation control method
R¼
e¼
qffiffiffiffiffiffiffiffiffiffiffiffiffi
ej ¼ rjþ1 1 r2j
ð27Þ
Consider a simple linear regression model
x2 ¼ a0 þ a1 x1 þ N½0; e
ð28Þ
where x1 is the regressor and x2 is the response variable. Let x1i and x2i be n samples of x1 and x2. The standard error e of the linear regression model can be estimated as
r ¼1 2
n X
x2i
2 x02i
,
i¼1
n X ðx2i x2 Þ2
ð30Þ
i¼1
where x2 is the mean of x2i. Besides, for two variables, r2 is the square of the correlation coefficient q.
r2 ¼ q2 "
q¼
ð31Þ
n X ðx1i x1 Þðx2i x2 Þ i¼1
#,qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiqffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Xn Xn ðx1i x1 Þ2 ðx2i x2 Þ2 i¼1 i¼1 ð32Þ
According to (32), changing the orders of samples affects the numerator and does not change the denominator. Therefore, the correlation coefficient changes by arranging the orders of samples. Fig. 1 gives a simple example to explain this characteristic. Suppose that the samples of x1 and x2 are permutations of 1, 2, . . . , 6. The correlation coefficient q is initially 0.54, and then becomes 0.43 after swapping the orders of the two samples of x2. Moreover, based on (29), (30) and (31), arranging the orders of samples that increases the absolute value of q will increase r2 and decrease e. Likewise, changing the orders of samples that reduces the absolute value of q will decrease r2 and increase e. As q approaches its maximum or minimum value of 1 or 1, r2 approaches its maximum value of 1 and e approaches zero. Fig. 2 gives an intuitive description of the relationship between e, r2 and q. Suppose that x1 and x2 are standard normal variables. 5000 samples are randomly generated and the orders of samples are re-arranged to change the correlation coefficient q. Fig. 2 shows that e and r2 change with the variation of q. Hence correlation control techniques can be built based on the thought that changing e or r2 by re-ordering the samples also changes q. This consideration can be generalized to multiple linear regression, in which r2 is the composite correlation measure when one variable is regressed against several variables. 3.2. Correlation control based on multiple linear regression In Fig. 3, the procedure of generating samples of non-normal variables is described. Firstly, the independent non-normal samples are obtained, and then the correlations between non-normal samples are introduced. The detailed procedure is given in the following. For t correlated variables, the sample matrix X is initially obtained by sampling methods
0
x11 Bx B 21 X¼B @ xt1
x12 x22 xt2
x1n
1
x2n C C C A
ð33Þ
xtn
Table 1 Parameters of multiple linear regression. Regression coefficients a1,0 a2,0 at-1,0
Standard errors a1,1 a2,1 at-1,1
a2,2 at-1,2
– –
– – – at-1,t-1
e1 e2
et-1
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
5
Fig. 1. Illustration of changing the correlation coefficient by arranging the orders of samples.
(4) Arrange the orders of xj+1, 1, xj+1, 2, . . . , xj+1, n according to orders of x00 j+1, 1, x00 j+1, 2, . . . , x00 j+1, n. ^ based on the first j + 1 (5) Calculate the correlation matrix R rows of X. (6) Evaluate the difference between the desired correlations and the correlations of the re-ordered samples.
^ 21 k=j e ¼ kR21 R
ð36Þ
where || || is the norm of vectors. (7) If one of the following two criteria is satisfied, the permutation of the elements in the (j + 1)th row of X is stopped. Otherwise, proceed to Step 8. (a) The error of correlation e is smaller than the threshold. (b) The iteration number I reaches the maximum number. (8) Estimate the standard error of estimate e0 j by Fig. 2. Relationship between correlation coefficient q, coefficient of determination r2 and standard error e.
e0j ¼
rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 Xn xjþ1;i x0jþ1;i =ðn j 1Þ i¼1
ð37Þ
Adjust the standard error ^ej such that Elements in each row of X can describe the marginal distribution of each variable but the orders of elements which determine the correlations between different variables are not properly arranged in sampling methods. Thus the elements of X should be re-ordered to introduce the desired correlation matrix. Based on the permutation method in [32,33], a technique which utilizes multiple linear regression to control correlations is developed. The flowchart of the correlation control is given in Fig. 4. In the proposed technique, the first row of X is kept constant and elements in other rows are re-ordered row by row. When the (j + 1)th row (j = 1, 2, . . . , t 1) is handled, elements in the first j rows have already been arranged and the orders of elements xj+1, 1, xj+1, 2, . . . , xj+1, n, are permutated. The steps of the correlation control technique are as follows: (1) Obtain the parameters of the jth regression equation. (2) I ¼ 1 and ^ej ¼ ej . (3) Calculate x0 j+1,i and x00 j+1,i (i = 1, 2, . . . , n) by
x0jþ1;i ¼ aj;0 þ aj;1 x1i þ þ aj;j xji
ð34Þ
x00jþ1;i ¼ aj;0 þ aj;1 x1i þ þ aj;j xji þ N½0; ^ej
ð35Þ
^ej ¼
ej =e0j ej
ð38Þ
I = I + 1 and go to Step 3. 4. Probabilistic load flow calculation 4.1. Procedure of probabilistic load flow calculation The procedure of PLF calculation by the proposed quasi-Monte Carlo combined with multiple linear regression (QMC-MLR) is summarized as follows: (1) Deterministic data is given and s input variables are obtained. The sample size n is determined. (2) LDSs are generated and are transformed to samples with given marginal distribution functions. (3) If there are correlated random variables, the samples of correlated variables are re-ordered following the steps in Section 3.2 to introduce desired correlations. (4) The power flow Eq. (22) are solved for the ith column of the sample matrix and the results are stored (i = 1, 2, . . . , n).
Fig. 3. Procedure of generating correlated non-normal samples.
6
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
Fig. 4. Flowchart of the correlation control.
(5) Statistical indices and probability distributions of output random variables are calculated. Popular LDSs include Halton, Faure, Sobol, Niederreiter-Xing (NX) sequences and so on. It is reported in [11] that Halton sequences become unsatisfactory after dimension 14 and Faure sequences suffer degradation at approximately 25 dimensionalproblems. Hence, it is not suitable to apply Halton or Faure in PLF with large numbers of input variables. NX sequences have a much lower discrepancy than other LDSs, but there remain difficulties in implementing NX sequences for arbitrary dimensions [26]. As to Sobol sequences, F. Kuo has given the implementation in 21201 dimensions [34]. Moreover Sobol sequences have excellent one-dimensional projections, and many of the two-dimensional projections are also very uniform. That is to say, Sobol sequences are efficient to tackle high-dimensional problems with low effective dimensions. Therefore, Sobol sequences are used in our research. 4.2. Different methods for comparison In order to indicate the effectiveness of the proposed method, popular MCS-based PLF methods are tested for comparison. The sampling strategy and the correlation control are the key issues in PLF calculation. For the sampling strategy, QMC is compared with SRS, LHS-RP, LHS-CD and LSS. As to the correlation control, the MLR-based technique is compared with NORTA and GA. SRS which is the basic MCS technique uses random points. LHSRP and LHS-CD are LHS-based methods and the algorithms can be found in [4]. LSS combines (t, s)-sequences with LHS to overcome the deficiency of QMC in high dimensions. In our study 2N Sobol sequences are used as a (t, s)-net in LSS (N is a positive integer) [12]. NORTA is a widespread technique to tackle correlated nonnormal variables. In NORTA, x1, x2, . . . , xt with the correlation matrix Rx are transformed into intermediate standard normal variables z1, z2, . . . , zt with the correlation matrix Rz.
zi ¼ U1 ½F i ðxi Þ () xi ¼ F 1 i ½Uðzi Þ
ð39Þ
where U is the cumulative distribution function (CDF) of the standard normal variable and Fi is the ith marginal CDF of xi. Every entry qzizj of Rz is calculated by inversely solving the equality for the desired correlation qxixj.
qxi xj ¼
Z
1
1
Z
1 1
1
ri rj
ðxi mi Þ xj mj u2 zi ; zj ; qzi zj dzi dzj
ð40Þ
where u2(zi, zj, qzizj) is the bivariate standard normal probability distribution function with correlation qzizj.
In the use of NORTA, samples of correlated normal variables z1, z2, . . . , zt are firstly obtained and then transformed into samples with desired marginal distributions by (39). Readers can refer to [13] for the details of NORTA. GA treats the correlation control as a combinational problem and handles the permutation of samples by a genetic algorithm. The procedure and parameters of GA are given in [18]. 4.3. Measures for evaluation of methods For correlated input random variables, the following index is used to assess the accuracy of the sample correlations:
eR ¼
i;j¼t
X
^ ij
Rij R
, ððt 1Þt Þ
ð41Þ
i;j¼1
^ ij are the respective elements of the target correlawhere Rij and R tion matrix and the correlation matrix of the sample matrix. For output variables, means and standard deviations are usually involved, thus the following indices are used to evaluate the accuracy of the results:
emean ¼ jl l^ j=jlj 100% estd ¼ jr r^ j=jrj 100%
ð42Þ
where l and r are respectively the mean and the standard deviation of accurate output variables which are approximately obtained ^ and r ^ are respecby SRS with a large number of sample sizes, and l tively the mean and the standard deviation of the results obtained by methods with tested sample sizes. Because the output variables are usually non-normally distributed, means and standard deviations cannot fully describe the probability distributions. Therefore the Chi-squared method is used to evaluate the differences between the distributions on a bin-by-bin basis [12].
echi2 ¼
nb X ðP i M i Þ2 =M i
ð43Þ
i¼1
where Mi and Pi are the respective frequencies of the ith bin in the histogram of accurate and tested results. nb is the number of bins and is calculated by the Freedman–Diaconis method. 5. Case study The case studies consist of four parts. In the first part, the effective dimension of PLF is analyzed. In the second part, the MLRbased technique is adopted to obtain samples of correlated nonnormal variables. In the third and fourth parts, the proposed
7
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
method is used to solve PLF with independent and correlated input random variables, respectively. Programs are developed with MATLAB 8.1 on a PC with Inter Core i5 3.00-GHz CPU and 8 GB of RAM. MATPOWER is used to solve deterministic power flow [35].
5.1. Effective dimension of PLF Firstly, the effective dimension of PLF for the 30-bus system is studied. The 30-bus system is from MATPOWER, and input active and reactive power are modelled as normal variables with coefficients of variation equal to 0.1. For this 30-bus system, there are 40 input random variables and 133 output random variables, including voltage magnitudes and angles, as well as power flow through lines. In Fig. 5, the sensitivity index of each standard input variable with respect to the voltage magnitude of bus 12 is given. In Fig. 5a the load power in different nodes are independent and in Fig. 5b the load power are assumed to be correlated with correlation coefficients equal to 0.5. For the voltage magnitude of bus 12, only a small part of input variables has relatively large sensitivity indices. Moreover, the sum of these one-dimensional sensitivity indices is 0.9781 for independent variables and is 0.9810 for correlated variables. In Fig. 6, the sums of one-dimensional sensitivity indices with respect to other output variables are given. These values are very close to 1, and some of them are a bit larger than 1 due to estimation errors. According to the ANOVA decomposition, the sum of one-dimensional sensitivity indices equal to 1 means that
(a) For independent input variables
the problem is a sum of one-dimensional functions. Therefore, the effective dimension in the superposition sense sS can be approximately regarded as 1. Then, the effective dimensions of PLF for other cases are briefly analyzed. In Table 2, the average values of the sums of onedimensional sensitivity indices are given. Similar to the results obtained in the 30-bus system, the sums of one-dimensional sensitivity indices are close to 1. In Table 3, the approximate effective dimensions in the truncation sense sT are given. Compared with the total number of input variables, the effective dimension sT is much smaller. In other words, only a small part of the input variables largely determine the output variables. In summary, by analyzing the effective dimension in the superposition sense and in the truncation sense, PLF is treated as a lowdimensional problem, mainly a sum of one-dimensional functions, that can be efficiently solved by QMC.
Table 2 Average values of the sums of one-dimensional sensitivity indices. Case
Case 9 Case 14 Case 30 case 39 Case 57
Average values of the sums of one-dimensional sensitivity indices For independent input variables
For correlated input variables
0.9979 0.9888 0.9724 0.9528 0.9709
0.9966 0.9786 0.9594 0.9001 0.9591
(b) For correlated input variables
Fig. 5. Sensitivity index of each standard input variable with respect to the voltage magnitude of bus 12.
(a) For independent input variables
(b) For correlated input variables
Fig. 6. Sum of one-dimensional sensitivity indices with respect to each output variable.
8
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
Table 3 Average effective dimensions in the truncation sense.a Case
Case Case Case Case Case
9 14 30 39 57
Total number of input variables
Average ST for independent input variables
Average ST for correlated input variables
6 22 40 42 84
2.5 3.8 5.3 8.2 8.5
2.6 5.3 7.9 13.1 14.5
Table 5 Correlation error indices and computation time of MLR and NORTA for different dimensions of correlation matrices. Dimension of correlation matrix
eR (102) MLR
NORTA
MLR
NORTA
10 25 50 100
0.24 0.65 1.33 0.72
2.63 2.83 2.62 2.78
0.05 0.17 0.43 1.24
0.66 4.60 18.96 76.56
Computation time (s)
a In calculating the effective dimension in the truncation sense, PLF is approximated treated as a sum of one-dimensional functions and the threshold p in (14) is selected as 0.9.
Table 6 Number of correlation matrices successfully handled by MLR and NORTA.
Fig. 7. Correlation matrix errors of NORTA, GA and MLR.
Table 4 Computation time of NORTA, GA and MLR. Sample size
NORTA (s)
GA (s)
MLR (s)
50 250 500 1000
0.66 0.66 0.66 0.66
2.29 6.63 16.33 46.80
0.04 0.05 0.05 0.06
Dimension of correlation matrix
MLR
NORTA
3 5 7 9
1000 1000 1000 1000
930 771 587 286
correlation matrices. For a correlation matrix with the dimension of t t, the double integration (40) will be solved at most t (t 1)/2 times. Hence the computation time of NORTA obviously increases with the increase of correlation matrix dimensions. In Table 6 the influence of the correlation coefficients on correlation control is studied. For each fixed dimension, 1000 correlation matrices are randomly generated. MLR successfully handles all these matrices. However, the number of correlation matrices that can be tackled by NORTA significantly decreases as the dimension of the correlation matrices increases. For a positive definite matrix Rx, the intermediate correlation matrix Rz obtained in NORTA is likely to be negative definite with the increase of dimensions. Thus, NORTA may be defective in high-dimensional correlation matrices. There are also some limitations of the proposed method. The MLR technique only concerns linear correlations between variables, and the sample size should be determined before the correlations are obtained. In summary, compared with NORTA and GA, the MLR-based technique is more efficient and robust in obtaining samples from correlated non-normal variables.
5.2. Correlation 5.3. PLF with independent variables Suppose that there are ten Weibull variables. The scale parameter and the shape parameter of each variable are respectively defined as 9 m/s and 2. Moreover, the correlation coefficient between the Weibull variables is defined as 0.5. The proposed MLR-based technique is compared with NORTA and GA in obtaining samples from the designed Weibull variables. The errors of correlation matrices are given in Fig. 7 and the computation time is shown in Table 4. NORTA gives the largest errors but its computation time remains almost the same for different sample sizes. GA accurately captures the desired correlations but the computation time significantly increases as the sample size increases. This is because the main task of NORTA is to estimate correlations between intermediate normal variables by (40) and it is not affected by the changes of sample sizes. On the other hand, GA is an intelligent optimization algorithm that suffers from a heavy computational burden in largescale problems. Compared with NORTA and GA, MLR has the advantage of obtaining accurate results in a quite short time. Due to the popularity of NORTA, MLR is further compared with NORTA. In Table 5 the sample size is defined as 500 and the influences of the correlation matrix dimensions on the results are analyzed. The correlation errors eR of MLR are smaller in all cases and the computation time of MLR is much shorter for high-dimensional
In PLF calculation, the IEEE 118-bus system is used and its probabilistic data is from [5]. Four areas are distinguished covering nodes 1–33, 34–59, 60–79 and 80–118. Active and reactive load demands are modelled by normal distributions with different coefficients of variation in different areas. The outages of generators are described by binomial distributions. QMC is used to solve this PLF problem and the result is compared with those obtained by other MCS methods. In Fig. 8 the relationship between statistical indices of output variables and sample sizes is shown. For large sample sizes, QMC and SRS obtain the same results. For small sample sizes, the two methods show different performances. In this case, QMC only needs about 500 samples to obtain relatively steady results. But SRS cannot get reliable results until with about 3000 samples. In other words, QMC has a much faster convergence rate compared with SRS in solving PLF calculation. In Fig. 9 the performances of QMC, SRS, LHS and LSS with the same sample sizes are compared. The result obtained by SRS with the sample size of 50,000 is regarded as the benchmark. The average errors of the means and the standard deviations of the output variables with small sample sizes are shown. Under the same sample sizes, SRS gives the largest errors. Both LHS-RP and LHS-CD
9
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
(b) Standard deviations of the voltage magnitude of bus 9
(a) Means of the voltage magnitude of bus 9
Fig. 8. Relationship between the statistical indices of the voltage magnitude of bus 9 and sample sizes.
(a) Errors of the means of output variables
(b) Errors of the standard deviations of output variables
Fig. 9. Errors of the means and the standard deviations of output variables obtained by different methods.
Table 7 Distribution error indices of different methods. Error index
Output
SRS
LHS-RP
LHS-CD
LSS
QMC
echi2
V h Pij Qij
0.1515 0.2500 0.2746 0.2889
0.1405 0.2230 0.2463 0.2564
0.1387 0.2359 0.2518 0.2660
0.1413 0.2562 0.2528 0.2705
0.1294 0.2159 0.2259 0.2409
accurately estimate the means. As to the standard deviations, LHSCD does a better job since Cholesky decomposition is used to reduce unwanted correlations between samples. The performance of LSS is affected by the choice of the base number. In this paper, Sobol sequences with the base number of 2 are used, and LSS obtains similar results as LHS-RP. QMC has similar performances as LHS and LSS in Fig. 9a, and it is obviously more accurate in estimating the standard deviations of output random variables, as shown in Fig. 9b. In Table 7 the sample size is 512 and the distribution error indices echi2 of different methods are given. In accordance with the results of Fig. 9, the errors of QMC are the smallest among all the methods. In Fig. 10, probability densities of active and reactive power flow through lines are shown. Power flow through a line is obtained at the starting node and the flow direction from the starting node to the end node is defined as the positive direction. According to Fig. 10, QMC captures the multimodality of probability distributions and obtains relatively accurate results with much smaller sample sizes. Furthermore, by using constant loads or adding loads modelled with normal distributions, the number of input variables can be
increased or decreased. In Fig. 11, the performances of QMC in solving PLF with different numbers of input variables are given. Under the same sample sizes, the errors of the output random variables do not significantly increase with the increase of the numbers of input variables. Since Sobol sequences only have a good uniformity in low-dimensional projections, the results in Fig. 11 indicate that the effective dimension of PLF remains low even for large numbers of input variables. In summary, QMC using Sobol sequences shows satisfactory performances for large numbers of input random variables and obtains more accurate results compared with other MCS techniques for the same sample sizes. 5.4. PLF with correlated variables In this section the 118-bus system is modified to consider correlated input variables. Load active power is correlated with a correlation coefficient equal to 0.8 if they are placed within the same area, and equal to 0.5 otherwise. The load reactive power is modelled such that the power factor of each load is kept constant. Moreover, ten 100-MW wind farms are connected in pairs to nodes
10
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
(a) Active power flow through line 84–85
(b) Reactive power flow through line 105–106
Fig. 10. Probability densities of output variables obtained with different sample sizes.
(a) Errors of the means of output variables
(b) Errors of the standard deviations of output variables
Fig. 11. Errors of the means and the standard deviations of output variables for different numbers of input variables.
Table 8 Comparisons of error indices between different methods. Output
Error index
SRS-MLR
LHS-MLR
QMC-MLR
V
emean (%) estd (%) echi2 (102)
0.37 102 1.44 11.77
0.05 102 1.00 11.21
0.05 102 0.83 10.71
h
emean (%) estd (%) echi2 (102)
0.96 2.66 21.33
0.09 1.97 20.11
0.11 1.67 19.46
Pij
emean (%) estd (%) echi2 (102)
4.02 2.54 20.53
0.37 1.83 19.39
0.38 1.52 17.80
Qij
emean (%) estd (%) echi2 (102)
2.46 2.74 21.55
0.30 2.13 20.62
0.30 1.75 19.47
9, 23, 35, 86 and 117. For the wind speeds of the wind farms, the probability distribution models are obtained from Section 5.2 and the correlation coefficients are defined as 0.5. The power curve and the power factor of the wind farms are obtained from [16]. The result of SRS-MLR with the sample size of 50,000 is treated as the benchmark, and the errors of QMC-MLR, LHS-MLR and SRSMLR with the sample size of 512 are given in Table 8. The errors of SRS-MLR are the largest in all cases. Compared with LHS-MLR, QMC-MLR gives similar results for the means and obtains more accurate estimation for the standard deviations of the output ran-
dom variables. As to the distribution indices, QMC-MLR gives the smallest errors as well. The probability densities of the output random variables are described in Fig. 12. Compared with LHS-MLR and SRS-MLR, QMC-MLR produces more reliable results under the same sample sizes. The three methods use the same correlation control technique. Therefore the correlations between samples are the same and the properties of one-dimensional samples mainly determine the differences between methods. QMC-MLR gives the most accurate results, which indicate that the points of QMC are more uniform
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
(a) Active power flow through line 26–25
11
(b) Reactive power flow through line 40–42
Fig. 12. Probability densities of output variables obtained by different methods.
in one-dimensional projections than the points used in LHS and SRS. 6. Conclusion In this paper quasi-Monte Carlo combined with multiple linear regression is proposed to address PLF calculation. A distinguishing feature of this paper is that PLF is treated as a low-dimensional problem and Sobol sequences which have a good uniformity in low-dimensional projections are adopted to improve the sampling efficiency of MCS. Moreover, according to the relationship between correlation and regression, a technique based on multiple linear regression is designed to control the correlations between input random variables. The proposed method is tested with the IEEE 118-bus system. According to the simulation results, the developed correlation control technique is robust and efficient in handling correlated variables. The proposed approach shows satisfactory performances for large numbers of input random variables and obtains more accurate results compared with existing MCS methods in PLF calculation. In future work, we will apply the proposed method to actual grids and analyze the influences of uncertainties on the operation of power systems. Acknowledgments This work was supported by the Natural Science Foundation of China (51377103). References [1] Borkowska B. Probabilistic load flow. IEEE Trans Power App Sys Apr. 1974;PAS93:752–9. [2] Zhang P, Lee S. Probabilistic load flow computation using the method of combined cumulants and Gram-Charlier expansion. IEEE Trans Power Syst Feb. 2004;19(1):676–82. [3] Fan M, Vittal V, Heydt GT, et al. Probabilistic power flow studies for transmission systems with photovoltaic generation using cumulants. IEEE Trans Power Syst Nov. 2012;27(4):2251–61. [4] Yu H, Chung CY, Wong KP, et al. Probabilistic load flow evaluation with hybrid Latin hypercube sampling and Cholesky decomposition. IEEE Trans Power Syst May 2009;24(2):661–7. [5] Morales JM, Pérez-Ruiz J. Point estimate schemes to solve the probabilistic power flow. IEEE Trans Power Syst Nov. 2007;22(4):1594–601. [6] Tang J, Ni F, Ponci F, et al. Dimension-Adaptive Sparse Grid Interpolation for uncertainty quantification in modern power systems: probabilistic power flow. IEEE Trans Power Syst Mar. 2016;31(2):907–19. [7] Kroese DP, Taimre T, Botev ZI. Handbook of Monte Carlo methods. New York: John Wiley; 2013. p. 347–80. [8] Preece R, Milanovic JV. Efficient estimation of the probability of smalldisturbance instability of large uncertain power systems. IEEE Trans Power Syst Mar. 2016;31(2):1063–71.
[9] Leite da Silva AM, Fernández RAG, Singh C. Generating capacity reliability evaluation based on Monte Carlo Simulation and Cross-Entropy methods. IEEE Trans Power Syst Feb. 2010;25(1):129–37. [10] Mckay MD, Beckman RJ, Conover WJ. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics May 1979;21(2):239–45. [11] Huang H, Chung CY, Chan W, et al. Quasi-Monte Carlo based probabilistic small signal stability analysis for power systems with plug-in electric vehicle and wind power integration. IEEE Trans Power Syst Aug. 2013;28 (3):3335–43. [12] Hajian M, Rosehart WD, Zareipour H. Probabilistic power flow by Monte Carlo Simulation with Latin supercube sampling. IEEE Trans Power Syst May 2013;28(2):1550–9. [13] Liu PL, Der Kiureghian A. Multivariate distribution models with prescribed marginals and covariances. Probab Eng Mech 1986;1(2):105–12. [14] Li X, Zhang X, Wu L, et al. Transmission line overload risk assessment for power systems with wind and load-power generation correlation. IEEE Trans Smart Grid May 2015;6(3):1233–42. [15] Morales JM, Conejo AJ, Pérez-Ruiz J. Simulating the impact of wind production on locational marginal prices. IEEE Trans Power Syst May 2011;26(2):820–8. [16] Chen Y, Wen J, Cheng S. Probabilistic load flow method based on Nataf transformation and Latin hypercube sampling. IEEE Trans Sustain Energy Apr. 2013;4(2):294–301. [17] Ghosh S, Henderson SG. Behavior of the NORTA method for correlated random vector generation as the dimension increases. ACM Trans Model Comput Simul Jul. 2003;13(3):276–94. [18] Liefvendahl M, Stocki R. A study on algorithms for optimization of Latin hypercubes. J Statist Plan Infer Sep. 2006;136(9):3231–47. [19] Vorˇechovsky´ M, Novák D. Correlation control in small-sample Monte Carlo type simulations I: a simulated annealing approach. Probab Eng Mech Jul. 2009;24(3):452–62. [20] Zou B, Xiao Q. Solving probabilistic optimal power flow problem using quasi Monte Carlo method and ninth-order polynomial normal transformation. IEEE Trans Power Syst Jan. 2014;29(1):300–6. [21] Aien M, Fotuhi-Firuzabad M, Aminifar F. Probabilistic load flow in correlated uncertain environment using unscented transformation. IEEE Trans Power Syst Nov. 2012;27(4):2233–41. [22] Niederreiter H. Quasi-Monte Carlo methods and pseudo-random numbers. Bull Am Math Soc Nov. 1978;84(6):957–1041. [23] Kiefer J. On large deviations of the empirical d.f. of vector chance variables and a law of the iterated logarithm. Pacific J Math 1961;11(2):649–60. [24] Singhee A. Novel algorithms for fast statistical analysis of scaled circuits PhD dissertation. Pittsburgh, PA: Dept. Electr. Comput. Eng., Carnegie Mellon Univ.; 2007. [25] Morokoff W, Caflisch RE. Quasi-random sequences and their discrepancies. SIAM J Sci Stat Comput Nov. 1994;15(6):1251–79. [26] Singhee A, Rutenbar R. Why quasi-Monte Carlo is better than Monte Carlo or Latin hypercube sampling for statistical circuit analysis. IEEE Trans Comput Aided Des Integr Circuits Syst Nov. 2010;29(11):1763–76. [27] Ninomiya S, Tezuka S. Toward real-time pricing of complex financial derivatives. Appl Math Finance Mar. 1996;3(1):1–20. [28] Sobol IM. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math Comput Simul Feb. 2001;55(1–3):271–80. [29] Caflisch RE, Morokoff W, Owen A. Valuation of mortgage backed securities using Brownian bridges to reduce effective dimension. J Comp Finance 1997;1 (1):27–46. [30] Papaefthymiou G, Kurowicka D. Using copulas for modeling stochastic dependence in power system uncertainty analysis. IEEE Trans Power Syst Jul. 2009;24(1):40–9. [31] Cooley WW, Lohnes PR. Multivariate data analysis. New York: John Wiley; 1971 [Chapter 3].
12
X. Xu, Z. Yan / Electrical Power and Energy Systems 88 (2017) 1–12
[32] Xu X, Yan Z. Probabilistic load flow evaluation with hybrid Latin hypercube sampling and multiple linear regression. Presented at IEEE PES General Meeting, Denver, America; 2015. [33] Ilich N. A matching algorithm for generation of statistically dependent random variables with arbitrary marginal. Eur J Oper Res Jan. 2009;192(2):468–78.
[34] Joe S, Kuo FY. Constructing Sobol sequences with better two-dimensional projections. SIAM J Sci Comput 2008;30(5):2635–54. [35] Zimmerman RD, Murillo-Sánchez CE. MATPOWER, a MATLAB Power System Simulation Package. Ithaca, NY: Version 5.0. Power System Engineering Research Center (PSERC), Sch. Elect. Eng., Cornell Univ.; 2015.