Statistical testing for load models using measured data

Statistical testing for load models using measured data

Electric Power Systems Research 163 (2018) 66–72 Contents lists available at ScienceDirect Electric Power Systems Research journal homepage: www.els...

889KB Sizes 0 Downloads 58 Views

Electric Power Systems Research 163 (2018) 66–72

Contents lists available at ScienceDirect

Electric Power Systems Research journal homepage: www.elsevier.com/locate/epsr

Statistical testing for load models using measured data a,⁎

a

a

Jiaqing Lv , Mirosław Pawlak , U.D. Annakkage , Bagen Bagen a b

T

b

Department of Electrical & Computer Engineering, University of Manitoba, Canada Manitoba Hydro, Canada

A R T I C LE I N FO

A B S T R A C T

Keywords: Load modeling Statistical testing F-test Regression nested-models Parametric system identification

This paper applies the statistical testing theory to examine the validity of different load parametric models. Traditionally measurement-based static load modeling has been performed based on a single parametric model. Commonly utilized models include: ZIP (constant-impedance–constant-current–constant-power) model, exponential model, and frequency component adjusted ZIP/exponential models. It has been conjectured that the models making use of the frequency feature should be better compared to the ones purely based on the voltage component. However, there has not been any theoretical-based justification to confirm this claim. It is a goal of this paper to provide a formal method for verifying this claim by employing the theory of statistical testing for correct parametric model specification. In particular, a class of F-tests for checking the correctness of the specific load model is employed. Our methodology is verified on the real phasor measurement unit (PMU) data describing a radial load in the Manitoba Hydro power system. The obtained results confirm the usefulness of the frequency component based models.

Q = a4 V 2 + a5 V + a6 + ξ,

1. Introduction Load modeling plays an important role in stable operation of power systems. Load models are used in power flow and transient stability studies for planning and operation of power systems. It is well known [1] that the results of a transient stability analysis depend on the load models used. Therefore, it is important to model the loads as close as possible to the actual load. Modeling of loads is complicated because loads connected to a typical bus are composed of a large number of devices often with ever-changing characteristics. The composition of loads often has a complicated dependence on many factors including time, season, weather condition, and energy prices, just to name a few. Moreover, one should take into account the time-varying nature of the load systems as the load signals such as voltage, frequency, current are non-stationary processes. Typical approaches for load modeling can fall into two categories: component-based methods [1–3] and measurement-based methods. Furthermore, the measurement-based approach can be based on static modeling, dynamic modeling, or the combined static–dynamic modeling. The following is a popular polynomial static model describing the dependence between the voltage (V) signal and the powers: the real power – P and the reactive power – Q:

P = a1 V 2 + a2 V + a3 + ε,



(1a)

Corresponding author. E-mail address: [email protected] (J. Lv).

https://doi.org/10.1016/j.epsr.2018.05.026 Received 24 October 2017; Received in revised form 6 April 2018; Accepted 28 May 2018 0378-7796/ Crown Copyright © 2018 Published by Elsevier B.V. All rights reserved.

(1b)

where ε, ξ are modelling errors regarded to be random variables with zero mean and finite variance. Another commonly used load model is the exponential model of the following form:

P = a1 V a2 + ε,

(2a)

Q = a3 V a4 + ξ.

(2b)

The above models assume that the output variables, i.e., P, Q are only influenced by the voltage signal V. In more refined models P, Q may depend both on the voltage V and the frequency f. This leads to the following extended models corresponding to (1) and (2)

P = (a1 V 2 + a2 V + a3)(1 + a4 (f − f0 )) + ε ,

(3a)

Q = (a5 V 2 + a6 V + a7)(1 + a8 (f − f0 )) + ξ ,

(3b)

and

P = a1 V a2 (1 + a3 (f − f0 )) + ε ,

(4a)

Q = a4 V a5 (1 + a6 (f − f0 )) + ξ ,

(4b)

where f0 is the reference frequency being 60 Hz in the North American power system. The aforementioned models are defined up to the unknown coefficients {ai} that need to be determined from the available

Electric Power Systems Research 163 (2018) 66–72

J. Lv et al.

the introduced methodology to the real PMU data derived from the Manitoba Hydro power system. Concluding remarks are presented in Section 5.

data. It is also worth mentioning that the model in (1) is called the ZIP model, since the different terms of the formula in (1) correspond to constant impedance (Z), constant current (I) and constant power (P). Model (2) is commonly referred to as the exponential model. Models (3), (4) are the extended ZIP and exponential models, respectively. They have correction terms depending on the additional input variable being the frequency component. Note that if a4 = a8 = 0 then the model in (3) reduces to the simpler model in (1). An analogous reduction applies to the model in (4). The exponential model in (2) has been examined in [4,5], whereas in [6] the frequency-dependent models in (3), (4) have been evaluated. Besides the aforementioned models there have been other, less commonly used in practice, modelling approaches that can be regarded as some modifications of the above introduced popular models. For instance, the static model in [7] can be viewed as an intermediate model between models (1) and (3), while the EPRI LoadSyn program [8] uses a model with the real power part being a linear combination of (2a) and (4a), and the reactive part being a linear combination of the two models in (4b) with different parameters [8]. The latter approach has been also applied in [2,8]. Composite load models are the extension of the static models by allowing the dynamic part to be involved. Their static parts are as in (1)–(4), whereas the dynamic part is connected in a certain block-oriented structure. For instance, in [9], the models in (1), (2) are combined in the parallel structure with a dynamic part represented by an induction motor. These overall composite models are referred to as the ZIP-induction motor (GZIP-IM) load model and the exponentialinduction motor (Exp-IM) load model. Similarly, in [10–14] a combination of (1) and the motor dynamic part has been taken into consideration. On the other hand, in [11] a composite model with the combination of (3) and the motor dynamic part has been utilized. Besides, a dynamic model can be extended from the basic form of static model through other approaches. For instance, the so-called “exponential recovery model” in [15] provide such a means. Recently, artificial intelligence methods [16,17] as well as Hidden Markov models [18] have also been applied in load modeling. In this paper, the basic static load models defined in (1), (2), (3), and (4) are examined. The problem of fitting, i.e., estimating unknown parameters of these models has been addressed in the aforementioned references. In practice, however, one needs to decide which model gives the most significant fit to the observed data. In fact, every loading modeling problem should be equipped with the proper model check. Hence, the principal goal of this paper is to consider testing procedures for such hypotheses. A formal statistical test is designed that is able to choose either the simple models in (1), (2) or the more complex models in (3), (4). These are nested models and the version of the F-test to verify the posed hypotheses is applied. The F-test is the general strategy for verifying the hypothesis that the proposed restrictive model fits the data well compared to the unrestricted larger model [19,20]. The test is based on the relative distance in residual sum of squares between the restrictive model and the full model. The developed methodology is then applied to the real PMU data representing the observations of a radial load in the Manitoba Hydro power system. Our findings are that for this particular data the frequency component is essential for the load modeling problem, i.e., the complex models in (3) and (4) are more preferable than the simple ones. Due to the random nature of the observed data this conclusion should be interpreted in the statistical framework, i.e., with the high probability of acceptance one can conclude that the models with the frequency component better fit the given data set. It is also clear that the developed methodology can be used by power engineers to select the appropriate load model that is suitable for a particular application. The rest of the paper is organized as follows. Section 2 gives the description of the problem of parametric nested models specification and the corresponding test statistics. The problem of load models fitting under the hypotheses is examined in Section 3. Section 4 implements

2. Model specification and F statistics In the regression analysis one obtains the data set Dn = {(X1, Y1), …, (Xn, Yn)}, where the component Yi is the response variable that we try to explain from the values of the input variables {Xi}. The postulated model between Xi and Yi is of the parametric form

Yi = m (Xi; θ★) + εi,

i = 1, …, n,

where {εi} is the random non-observed noise process that is assumed to be a sequence of independent and identically distributed (i. i. d .) random variables with variance σ2. The nonlinearity m(x;θ) is the known function except for the p-dimensional parameter θ, where θ★ is the true value of θ. If the model m(x;θ) is correctly specified then one can estimate θ★ by minimizing the least-squares criterion, i.e., n

S (θ) =

∑ (Yi − m (Xi; θ))2.

(5)

i=1

The resulting least-squares estimate θˆ can converge to θ★ under very general regularity condition on the nonlinearity m(x;θ), see Chapter 12 in [20] for details. Moreover, if the noise {εi} is normally distributed then the θˆ is also the maximum-likelihood estimator that reveals some further efficiency properties [20]. In practice, one is confronting with several alternative regression models and would like to choose the model that explains the data the best. Hence, let

H0: Yi = m 0 (Xi; θ0★) + εi

i = 1, …, n,

represents the restricted model, where

H1: Yi =

m1 (Xi; θ1★)

+ εi

(6)

θ0★



R p1

. On the other hand

i = 1, …, n,

(7)

θ1★

Rp

∈ . These are nested describes the alternative full model, where models as p = p1 + p2, some p2 ≥ 0 and if p2 = 0 then θ1★ = θ0★ and consequently m 0 (x; θ0★) = m1 (x; θ1★) . Hence, for the nested models, the model under H0 is the special case of the model under H1. An important example is the following linear null hypothesis model

H0: Yi = a 0 + a1 X1i + ⋯+a p1 − 1 X p1 − 1, i + εi,

i = 1, …, n,

(8)

where Xi = (1, X1i , …, X p1 − 1, i )T is the p1-dimensional vector of the input variables. The alternative full linear regression model is

H1: Yi = a0 + a1 X1i + ⋯+a p1 − 1 X p1 − 1, i + ⋯+ap − 1 Xp − 1, i + εi,

(9)

for i = 1, …, n, where now Xi = (1, X1i, …, Xp−1,i)T is the p-dimensional input vector. In this case the choice between the two competitive models is equivalent to testing the null hypothesis H0 that the last p2 = p − p1 of the ai's are 0, i.e., ai = 0 for i = p1, …, p − 1. The general approach for testing the nested nonlinear regression hypotheses defined in (6) and (7) is based on the generalized likelihood ratio test [20,21] that requires the evaluation of the log-likelihood ratio

log

p (Y|H1) , p (Y|H0)

where p(Y|H0) is the joint density of the observed data Y = (Y1, …, Yn) when the null hypothesis model in (6) holds. This density should be conditioned on the input data (X1, …, Xn) if they reveal the random nature. This is the case in our load model specification problem that is examined in this paper. The same interpretation applies to p(Y|H1) for the alternative model in (7). The explicit evaluation of the likelihood ratio is generally difficult. If, however, one assumes that the noise process {εi} is Gaussian then the direct calculation, see [20,21], yields the so-called F-statistic

Fn = 67

(SS0 − SS1) n − p , SS1 p − p1

(10)

Electric Power Systems Research 163 (2018) 66–72

J. Lv et al.

residual process

where n

SSE0 =

eˆj = Yj − m1 (Xj; θˆ1),

∑ (Yi − m0 (Xi; θˆ0))2 i=1

n

∑ (Yi − m1 (Xi; θˆ1))2 i=1

is the residual sum of squares for the full alternative model. In the above θˆ0 is the least-squares estimate (see (5)) of the true parameter θ0★ for the null hypothesis model and θˆ1 is defined analogously for the alternative model parameter θ1★ . The test statistic in (10) has an appealing interpretation, i.e., the smaller the difference between SS0 and SS1, the more likely we will accept the null hypothesis. On the other hand, the larger the difference, the more likely we will reject the null hypothesis. Thus, the restrictive model is rejected if Fn is too large, i.e., that Fn > c, where c is a suitable chosen constant. In order to find the appropriate value of the control limit c, one needs to know the distribution of Fn under the null hypothesis H0. It is important to note that the exact distribution of Fn is known under the assumption that the restrictive m 0 (x; θ0★) and alternative m1 (x; θ1★) models are linear such as in (8) and (9) and the estimates θˆ0 , θˆ1 of θ0★, θ1★ are determined by the least-squares estimation method. For the nonlinear hypothesis problem in (6) and (7) the linearization strategy allows to show [20] that the test statistic Fn in (10) is approximately (for the large sample size) distributed (under the null hypothesis) as the F-distribution with p − p1 and n − p degrees of freedom. We denote this distribution as Fp − p1, n − p . The fact that the distribution of the test statistic Fn in (10) can be approximated by the Fp − p1, n − p distribution allows us to determine the asymptotic the control limit c(α). In fact, this is the solution of the following equation

P (Fn > c|H0) = α,

3. Identification methods In order to efficiently perform the nested models testing procedure discussed in Section 2, we need to identify the assumed models for the null and alternative hypotheses. Let us examine this issue in the context of the load models introduced in Section 1. Hence, we have given the data sets

(11)

where the left-hand-side of this equation is the false rejection rate of the null hypothesis, whereas the right-hand side is the assumed significant level α often referred to as the p-value. Thus, the asymptotic solution of (11) is given by c (α ) = Qp − p1, n − p (α ) – the (1 − α)-quantile of the Fp − p1, n − p distribution. This leads us to the following final testing procedure: reject the null hypothesis H0 if

Fn > Qp − p1, n − p (α ),

(13)

To verify the Gaussianity one can employ the so-called Q–Q plot (quantile–quantile plot) that is is defined as a curve formed by the pairs (Q1(p), Q2(p)) for 0 ≤ p ≤ 1, where Q1(p), Q2(p) are the quantile functions of the two different distributions. The fundamental property of the Q–Q plot is that becomes a diagonal line if the two distributions are identical. Any departure from this is the indication of the differences between the distributions. In our case we wish to discriminate between the normal distribution and the distribution of the observed residual process {eˆj} . This requires estimation of the quantile function of {eˆj} and this can be easily done by ordering {eˆ1, …, eˆn} from the smallest to the largest. In the resulting empirical Q–Q plot the horizontal coordinate will correspond to the theoretical quantiles of the Gaussian distribution, whereas the vertical coordinate will reflect the empirical quantiles of the residual process {eˆj} . In summary, if the empirical Q–Q plot is linear, then the residuals strictly follow the Gaussian distribution. The differences from the linear line can reflect whether the residual process has the distribution with heavier or lighter tails. It is also worth noting that Q–Q plots are able to detect differences in tails of two distributions, the task that would be difficult to observe based on any estimates of the distributions. The residual process in (13) can also provide the information about the variance σ2 of the noise process {εj}. In fact, it can be shown [20] n that s 2 = ∑ j = 1 eˆj2/(n − p) is a consistent estimate of σ2.

is the residual sum of squares for the simpler model and

SSE1 =

j = 1, …, n.

Dn = {(V1, f1 ; P1), …, (Vn, fn ; Pn )},

Dn = {(V1, f1 ; Q1), …, (Vn, fn ; Qn )} of the input and output observations, where the input variable is X = (V, f), i.e., voltage and frequency, and the output is either Y = P or Y = Q, i.e., the real power or the reactive power. The basic tool to identify the models in (1) and (3) is the theory of linear regression [20]. For the model in (1a) we can write

(12)

where Fn is the test statistic defined in (10). The flowchart for the aforementioned testing procedure is shown in Fig. 1. The above theory is relying on the assumption that the noise process {εi} is Gaussian. If the noise distribution is not normal then one should replace the least-squares criterion by the so-called concentrated loglikelihood function for θ and derive the corresponding likelihood-ratio test that reveals the asymptotic χ p2− p distribution when the null hy1 pothesis is true, see Chapter 5 in [20] for further details. It is worth mentioning, however, that the F-test is relatively robust to mild departure from the normality assumption [20,19]. Nevertheless, one can formally assess the normality assumption by examining the full model

⎡ P1⎤ Y = ⎢ ⋮ ⎥, ⎢ Pn ⎥ ⎣ ⎦

2 ⎡1 V1 V1 ⎤ Λ = ⎢⋮ ⋮ ⋮ ⎥, ⎢ 2⎥ ⎣1 Vn Vn ⎦

a ⎡ 3⎤ Θ = ⎢ a2 ⎥, ⎣ a1 ⎦

ε ⎡ 1⎤ ε = ⎢ ⋮ ⎥. ⎣ εn ⎦

Then the ZIP model (1a) is equivalent to the linear model that can be rewritten in terms of the above defined arrays as follows

Y = ΛΘ + ε.

(14)

The estimate of the parameter vector Θ is the classic least squares solution defined in (5), i.e.,

ˆ = (ΛT Λ)−1ΛT Y . Θ

(15)

An analogous solution can be written for the model in (1b) by replacing the output variable Pi by Qi. Let us now consider the model in (3a) corresponding to the active power. This is the two-input model that can be written as the linear model by writing

P = b1 + b2 V + b3 V 2 + b4 (f − f0 ) + b5 V (f − f0 ) + b6 V 2 (f − f0 ) + ε , (16) where b1 = a3, b2 = a2, b3 = a1, b4 = a3a4, b5 = a2a4, b6 = a1a4. For the observed data one can introduce the following arrays

Fig. 1. The flowchart for the applied F-test. 68

Electric Power Systems Research 163 (2018) 66–72

J. Lv et al.

2 2 ⎡1 V1 V1 (f1 − f0 ) V1 (f1 − f0 ) V1 (f1 − f0 ) ⎤ ⎥, ⎢ Λ= ⋮ ⋮ ⋮ ⋮ ⋮ ⎥ ⎢ 2 2 ⎣1 Vn Vn (fn − f0 ) Vn (fn − f0 ) Vn (fn − f0 ) ⎦

⎡ P1⎤ Y = ⎢ ⋮ ⎥, ⎢ Pn ⎥ ⎣ ⎦

⎡ b1⎤ ⎢ b2 ⎥ ⎢ b3 ⎥ Θ = ⎢ ⎥, ⎢ b4 ⎥ ⎢ b5 ⎥ ⎢ ⎥ ⎣ b6 ⎦

ε ⎡ 1⎤ ε = ⎢ ⋮ ⎥. ⎣ εn ⎦

Owing to this one can write (16) in the form as in (14), i.e., Y = ΛΘ + ε. This again allows us to find the least squares estimate in the form as in (15). It is worth noting that in (16) there are 6 parameters to identify, whereas in (3a) there are only 4. The transformation from (3a) to (16) is, however, very useful since we can use the classic linear least squares estimate of the unknown coefficients. As result, we can utilize the exact version of the F-test that has the well-established precision and the exact solution for finding the control limit. It is worth mentioning that if we want to come up with a solution in the form of (3a), then we need to further derive (a1, a2, a3, a4) in (3a) in terms of (b1, b2, …, b6) in (16). Let us consider the model in (2a). This is the nonlinear model that cannot be directly transformed in the desirable linear structure. To deal with the nonlinear model in (2a) we can consider a grid of values of the parameter a2, and then for the given value of a2 perform the linear regression analysis treating the term Va2 as the input variable. Hence, for the data set

Pi = a1 Via2 + εi,

Fig. 2. An illustration of the radial load measurement set-up where the PMU data are obtained.

4. Load models specification for the PMU data The real data set we used in our research comes from the PMU observations of a radial load in the Manitoba Hydro power system. In the Manitoba Hydro power system, the PMU observations are currently available in 48 stations. Within these stations, we only take consideration the ones that connect to the 115 KV lines, and find one of these stations happens to connect to a radial load. Measured by the phasor data concentrators, the instant voltage and frequency observations along this station are available, as well as the instant current observations along the radial load. Note that this load is illustrated in Fig. 2. It is worth noting that within the Manitoba Hydro power system, there is only one radial load within the 48 stations where PMU observations are available. In our research we have used the observations corresponding to this load. The differences of voltage angles and current angles have been calculated, together with the voltage magnitude and the current magnitude information yielding the instant active and reactive power consumptions for the selected load. The PMU observations corresponding to this load have been measured since July 18, 2016 and we collected them until March 3, 2017. The load is inherently exhibiting an ever time-changing nature, but for a very short time interval, the change is relatively slow and can be approximately treated as a time-invariant system. Therefore for field operators and researchers, they can continuously calculate the load characteristics according to the aforementioned model based on the data within the last 10–30 min interval. The time window should be neither too large nor too small. If it is too large then the load characteristics could change noticeably; if it is too small then there would be too little information for the identification. Then the obtained load characteristics can be used to represent the load for the future 10–30 min. In our research, we have obtained the data of the length of 10 min. We also down-sample the observations at the rate 1 Hz yielding the data length is n = 600. In our implementation studies we first compare model (1a) with model (3a), and next we compare model (1b) with model (3b). The identification algorithms are applied to the reduced model in (1) and the full model in (3). This allows us to obtain the corresponding residuals from which the quantities SSE0 and SSE1 are obtained. This in turn is needed for forming the test statistic Fn in (10). As we have discussed in Section 2, the statistic Fn has the F-distribution Fp − p1, n − p if the reduced model is good enough. This in our case leads to the distribution Fp − p1, n − p with p1 = 3, p = 6 and n = 600. As a result, in our testing procedure in (12) we should compare the test statistic Fn with the quantile value Q3,594(α). The values of Q3,594(α) for different α's are: Q3,594(0.1) = 2.0930, Q3,594(0.05) = 2.6169, Q3,594(0.025) = 3.1381, Q3,594(0.01) = 3.8147. This finally allows us (by comparing Fn with Q3,594(α)) to draw a statistical conclusion whether the reduced model (1) can be accepted comparing to the full model (3) for the active power P or for the reactive power Q. We should note that observation interval

i = 1, …, n

and for the given a2 we can estimate a1 via the classic least-squares method. In this case we can get the following explicit formula n

aˆ1 =

∑i = 1 Pi Via2 n

∑i = 1 Vi2a2

.

Then the estimate of a2 can be found by minimizing the least-squares error n

∑ (Pi − aˆ1 Via2)2 i=1

(17)

over the assumed grid values of a2. Concerning the model in (4a) we can write

P = b1 V a2 + b2 V a2 (f − f0 ) + ε , where b1 = a1 and b2 = a1a3. This under the fixed value of a2 can be again viewed as the linear regression model with the two-dimensional input variable (X1, X2) = (Va2, Va2(f − f0)). The ordinary least-squares solution can be easily derived as in (15). Then, an estimate of a2 can be found by minimizing the least-squares error over the assumed set of grid values. Yet another direct approach is based on the formal log transformation of the model in (2a). Hence, let us use the following notation: Z = log(P), V = log(V ) , b1 = log(a1), b2 = a2. Then, the formal log transformation applied to (2a) gives

Z = b1 + b2 V + η,

(18)

where η is the modelling that has the distribution structure different from the original noise ε. This seems to be an attractive approach as one can derive the exact linear model. Nevertheless, this can be a dangerous strategy if the noise structure in the original model (2a) reveals long tails such as the Gaussian distribution. Through examination of the noise distribution, the approach exhibited in (17) was adopted for estimating the models in (2a) and (4a). Let us finally note that the models in (1b), (2b), (3b), (4b) can be also identified in the analogous fashion. 69

Electric Power Systems Research 163 (2018) 66–72

J. Lv et al.

is just 10 min and to obtain an objective decision on the proper model choice we have randomly drawn data subsets of the length 10 min from the entire database representing the 8-months interval. Based on 1000 of such draws we have obtained a sequence of acceptance or rejection decisions of the postulated models. To implement the random selection of data subsets, we have drawn a sequence of uniformly distributed random numbers that are mapped into random positions of the 10 min data interval within the total 8-month time interval. It has been also important to detect and skip possible missing values since occasionally the PMU recordings were unavailable during some time intervals. In summary, the total of 1000 datasets were randomly selected to represent the variable load conditions. The statistical descriptors of such observed data sets are summarized in Table 1. Hence, in Table 1, mean value, standard deviation, maximum/minimum values are shown corresponding to the active power (P), the reactive power (Q), frequency (f), voltage (V), current (I), and power factor (pf). It is also worth mentioning that the error contribution of the dynamic component of the load was found to be less than 0.01%. Having such basic understanding of our data sets variability we next perform our main experiment concerning the testing for the model choice between the model in (1) and the model in (3). Table 2 gives the rejection rate of the simple model (1). This is obtained for various values of the significance level α. It is clear that for nearly 90% (at the level α = 0.05) of the total number of experiments our test rejects the simple model and favors the more complex model that includes the frequency component. The rejection rates are 89.5% and 87.9% for the simple model corresponding to the active power and the reactive power, respectively. Furthermore, we also show in Fig. 3 the boxplot of the values of test statistics Fn, for our test corresponding to the active and reactive power. In the boxplot, the upper and lower levels of the box correspond to the 75% and 25% quantiles of the test distribution. The median value is also indicated. The whiskers extend to the maximum and minimum of the distribution which are not considered as outliers. Moreover, the outliers are also depicted. Fig. 3 reveals that the 25% quantiles of the tests are 8.50 and 7.38, respectively for the active and reactive power parts. Since the F-distribution F3,594 is equal to 0.999985 at 8.50, and 0.999926 at 7.38, meaning that in testing the validity of the frequency component for the ZIP model (corresponding to the active power) in 75% of the total number of experiments, the model without the frequency component is rejected with more than the 99.9985% certainty. Analogous testing for the reactive power suggests that in 75% of the total number of experiments, the model without the frequency component is rejected with more than the 99.9926% certainty. Also in Fig. 3 the median value and the 75% quantile correspond to much larger pvalues, which further confirms the necessity of adopting the frequency component in the load model. In our next experiment we conduct identical studies for comparing exponential models, i.e., the model in (2) with the model in (4) for both the active and reactive powers. First we identify the load models corresponding to the reduced model (2) and the full model (4). From this we determine the corresponding residuals and the quantities SSE0 and SSE1 needed for forming the test statistic Fn in (10). The null hypothesis F-distribution Fp − p1, n − p of Fn is determined for p1 = 2,p = 3, and n = 600. Hence, the test statistic Fn must be compared with the quantile Q1,597(α). For different significance levels we find that: Q1,597(0.1) = 2.7140, Q1,597(0.05) = 3.8571, Q1,597(0.025) = 5.0493, Q1,597(0.01) = 6.6775. Then for the data sets measured over the 10 min intervals we form the decision process according to (12) and decide about the rejection of the reduced model (2). For all the 1000 measured data sets, the percentage of rejecting the reduced model (2) is shown in Table 3. On the other hand, the boxplot representing the values of the test statistic is depicted in Fig. 4. From Table 3, we can draw the conclusion that for 87% of the total

Table 1 Statistical properties of the 1000 sets of load PMU data.

Mean Std Min Max

P (MW)

Q (MVAR)

F (Hz)

V (kV)

I (A)

p.f.

13.608 4.7557 2.1496 25.286

3.2846 1.4406 -1.4458 6.9913

60.0001 0.0106 59.9666 60.0364

115.16 0.3477 114.20 116.25

71.2687 21.7035 22.5534 127.477

0.9430 0.0605 0.4788 1

Table 2 Percentage of rejection of the reduced “ZIP” model (1) compared to the full “ZIP-Frequency” model (3) for respectively active and reactive power averaged over all randomly selected data sets. Significance level α

0.1

0.05

0.025

0.01

Reject “ZIP” in favor of “ZIP-F” for P Reject “ZIP” in favor of “ZIP-F” for Q

90.7% 90.6%

89.5% 87.9%

88.4% 85.4%

86.3% 82.5%

Fig. 3. Boxplot for the values of test statistics Fn for comparing ZIP model (1) vs ZIP model with frequency component (3) for active and reactive power, respectively.

Table 3 Percentage of rejection of the reduced exponential model (2) compared to the full exponential-frequency model (4) for respectively active and reactive power averaged over all randomly selected data sets. Significance Level α

0.1

0.05

0.025

0.01

Reject “Exp” in favor of “Exp-F” for P Reject “Exp” in favor of “Exp-F” for Q

88.4% 89.4%

86.8% 86.7%

85.8% 84.6%

84.1% 81.8%

Fig. 4. Boxplot for the values of test statistics Fn for comparing exponential model (2) vs exponential model with frequency component (4) for active and reactive power respectively.

70

Electric Power Systems Research 163 (2018) 66–72

J. Lv et al.

Fig. 5. Q–Q plot for residuals of fitted model (3): (a) active power, (b) reactive power.

Fig. 6. Q–Q plot for residuals of fitted model (4): (a) active power, (b) reactive power.

deviation. Then, it is compared with the standard Gaussian distribution. Hence, the Q–Q plots corresponding averaged and standardized residuals are shown in Figs. 5 and 6. Figs. 5 and 6 reveal that the distribution of the residuals is not too far from the Gaussian distribution. The shape of the curves suggests that they have lighter tails comparing to the Gaussian distribution. Note also that the reactive power models exhibit the residual process distribution closer to the Gaussian distribution than the corresponding active power models. Therefore, we can conclude that the proposed F-test is properly reflecting the nature of the observed real PMU data.

experiments, the simple model (2) is rejected at the significance level α = 0.05. We note that in testing the validity of the frequency component for the exponential model (corresponding to the active power part) in 86.8% of the total number of experiments, the model without the frequency component is rejected with more than the 95% certainty. Analogous testing for the reactive power suggests that in 86.7% of the total number of experiments, the model without the frequency component is rejected with more than the 95% certainty. Fig. 4 reveals that the 25% quantile for the tests are 12.71 and 11.68, respectively for the active and reactive model. Since the F-distribution F1,597 is equal to 0.999608 at 12.71, and 0.999327 at 11.68, meaning that in testing the validity of the frequency component for the exponential model (corresponding to the active power) in 75% of the total number of experiments, the model without the frequency component is rejected with more than the 99.9608% certainty. Analogous testing for the reactive power suggests that in 75% of the total number of experiments, the model without the frequency component is rejected with more than the 99.9327% certainty. Also in Fig. 4 the median value and the 75% quantile correspond to much larger p-values, which further confirms the necessity of adopting the frequency component in the load exponential model. In our next experiment we investigate the issue whether the error process {εj} is normally distributed. In order to verify that, we derive the Q–Q plots of the residuals process versus the Gaussian distribution. The empirical quantiles of the residuals process are determined based on the 1000 number of data sets obtained from over 10 min time interval. Hence, the obtained Q–Q plots illustrate the average plot from the 1000 possible such plots. The Q–Q plots for the model in (3) and the model in (4) are shown in Figs. 5 and 6, respectively. It is worth noting that considering the variability of load characteristics demonstrated in Table 1, the residuals according to each 10 min data set would vary significantly in variance, therefore each set of residuals are first standardized by dividing its own standard

5. Concluding remarks In this paper, we have presented a formal statistical procedure for testing the validity of the frequency component in the static load models. We applied our testing methodology to the specific case study represented by the PMU data. The results reveal the strong evidence for incorporating the frequency component in the examined load models. The developed testing methodology can be applied by researchers and power engineering operators to select the proper load models appearing in power engineering systems. Acknowledgements This research has been supported by Manitoba Hydro under Grant #T320 as well as Mitacs Grant IT06521. References [1] P. Kundur, N.J. Balu, M.G. Lauby, Power System Stability and Control, McGrawHill, New York, 1994. [2] W.W. Price, K.A. Wirgau, A. Murdoch, J.V. Mitsche, E. Vaahedi, M. El-Kady, Load modeling for power flow and transient stability computer studies, IEEE Trans. Power Syst. 3 (1) (1988) 180–187. [3] F.L. Quilumba, W.-J. Lee, J. Jativa-Ibarra, Load models for flat-panel TVs, IEEE

71

Electric Power Systems Research 163 (2018) 66–72

J. Lv et al.

[12] S. Sabir, D. Lee, Dynamic load models derived from date acquired during system transients, IEEE Trans. Power Appar. Syst. (9) (1982) 3365–3372. [13] M. Jin, H. Renmu, D.J. Hill, Load modeling by finding support vectors of load data from field measurements, IEEE Trans. Power Syst. 21 (2) (2006) 726–735. [14] J. Ma, D. Han, R.-M. He, Z.-Y. Dong, D.J. Hill, Reducing identified parameters of measurement-based composite load model, IEEE Trans. Power Syst. 23 (1) (2008) 76–83. [15] R. Balanathan, N. Pahalawaththa, U. Annakkage, P. Sharp, Under voltage load shedding to avoid voltage instability, IEE Proc. Gener. Transm. Distrib. 145 (2) (1998) 175–181. [16] S. Yang, M. Wu, X. Yao, J. Jiang, Load modeling and identification based on ant colony algorithms for EV charging stations, IEEE Trans. Power Syst. 30 (4) (2015) 1997–2003. [17] I.F. Visconti, D.A. Lima, J.M.C. de Sousa Costa, N.R.de.B.C. Sobrinho, Measurementbased load modeling using transfer functions for dynamic simulations, IEEE Trans. Power Syst. 29 (1) (2014) 111–120. [18] Z. Guo, Z.J. Wang, A. Kashani, Home appliance load modeling from aggregated smart meter data, IEEE Trans. Power Syst. 30 (1) (2015) 254–262. [19] G. Casella, R.L. Berger, Statistical Inference, Duxbury Pacific Grove, CA, 2002. [20] G.A. Seber, C. Wild, Nonlinear Regression, John Wiley & Sons, 2003. [21] J. Fan, J. Jiang, Nonparametric inference with generalized likelihood ratio tests, Test 16 (2007) 409–444.

Trans. Ind. Appl. 50 (6) (2014) 4171–4178. [4] T. Ohyama, A. Watanabe, K. Nishimura, S. Tsuruta, Voltage dependence of composite loads in power systems, IEEE Trans. Power Appar. Syst. (11) (1985) 3064–3073. [5] K. Srinivasan, C. Lafond, Statistical analysis of load behavior parameters at four major loads, IEEE Trans. Power Syst. 10 (1) (1995) 387–392. [6] J. Ribeiro, F. Lange, A new aggregation method for determining composite load characteristics, IEEE Trans. Power Appar. Syst. (8) (1982) 2869–2875. [7] L. Pereira, D. Kosterev, P. Mackin, D. Davies, J. Undrill, W. Zhu, An interim dynamic induction motor model for stability studies in the WSCC, IEEE Trans. Power Syst. 17 (4) (2002) 1108–1115. [8] W. Price, H.-D. Chiang, H. Clark, C. Concordia, D. Lee, J. Hsu, S. Ihara, C. King, C. Lin, Y. Mansour, et al., Load representation for dynamic performance analysis, IEEE Trans. Power Syst. 8 (2) (1993) 472–482. [9] B.-K. Choi, H.-D. Chiang, Y. Li, Y.-T. Chen, D.-H. Huang, M.G. Lauby, Development of composite load models of power systems using on-line measurement data, Power Engineering Society General Meeting, 2006, IEEE, 2006, pp. 8–16. [10] H. Renmu, M. Jin, D.J. Hill, Composite load modeling via measurement approach, IEEE Trans. Power Syst. 21 (2) (2006) 663–672. [11] J.-C. Wang, H.-D. Chiang, C.-L. Chang, A.-H. Liu, C.-H. Huang, C.-Y. Huang, Development of a frequency-dependent composite load model using the measurement approach, IEEE Trans. Power Syst. 9 (3) (1994) 1546–1556.

72