Electric Power Systems Research 69 (2004) 161–167
Power system security evaluation using ANN: feature selection using divergence
K.R. Niazi a,∗, C.M. Arora b, S.L. Surana c
a Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan 302017, India
b Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan 302017, India
c Department of Electrical Engineering, S.K. Institute of Technology, Jaipur, Rajasthan 302017, India
Received 8 February 2003; received in revised form 28 July 2003; accepted 21 August 2003
Abstract
This paper presents an artificial neural network (ANN)-based method for on-line security evaluation of power systems. One of the important considerations in applying ANN is feature selection. A new divergence-based feature selection algorithm has been proposed and investigated. The method has been applied on an IEEE test system and the results demonstrate the suitability of the proposed method for on-line security evaluation of power systems even under changing topological conditions.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Power system security; ANN; Decision tree; Feature selection
∗ Corresponding author. E-mail address: [email protected] (K.R. Niazi).
0378-7796/$ – see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.epsr.2003.08.007

1. Introduction

Security evaluation is a major concern in the real-time operation of modern power systems. The present trend towards deregulation has forced modern electric utilities to operate their systems under stressed operating conditions, closer to their security limits. Under such fragile conditions, any disturbance could endanger system security and may lead to system collapse. Therefore, there is a pressing need to develop a fast on-line security monitoring method which can analyze the level of security and forewarn the system operators to take necessary preventive actions should the need arise.

A complete answer about power system security requires evaluation of the transient stability of the power system following some plausible contingencies. Several methods for fast transient stability evaluation have been proposed in the past, namely direct methods, pattern recognition (PR) techniques, decision tree (DT) methods and the artificial neural network (ANN) approach [1–6]. Direct methods aim to circumvent the computational drawback of the classical method by avoiding the exploration of the post-disturbance phase. Direct methods in the form of the transient energy function (TEF) have been successfully used for dynamic security assessment purposes [1,2]. However, this approach is applicable only under simplified modeling assumptions. The determination of the TEF is a cumbersome task, especially for modern power systems, which are large and complex.

The PR approach, despite its appealing principle, does not offer a satisfactory transient security evaluation tool [3]. Its main drawback is that it requires the generation of a quadratic or higher order security classifier, which is a function of the potential features (variables) of the power system. As the size of the power system grows, the number of classifier parameters becomes very large and may make the classifier unreliable unless a prohibitively large learning set is considered [4].

The DT approach falls in the broader category of the PR approach. Conceptually the two approaches proceed in a similar manner: on the basis of information gathered in an off-line phase, they aim at building an on-line classifier [4]. The DT approach pursues a much broader objective than mere classification. However, it suffers from the same drawback as the classical PR approach: as the size of the power system grows, the complexity of the DT classifier becomes very high, which requires a prohibitively large training set.

ANNs have shown great promise as a means of predicting the security of large electric power systems [5,6]. The research paper of Fidalgo et al. [7] highlights the superiority
of the ANN approach over the PR and DT approaches for predicting dynamic security of power systems.

This paper presents an ANN-based method for on-line security evaluation of power systems. One of the important considerations in applying neural networks to power system security evaluation is the proper selection of training features among the large number of features that may characterize a given power system. Many feature selection criteria are available in the literature, such as entropy minimization, function expansion, f-value maximization and divergence maximization. The basic problem encountered in all feature selection algorithms is the search for an optimal combination of features among a large number of possible solutions. Recently, Jensen et al. [5] proposed a Fisher discrimination function-based back-track feature selection algorithm to find an optimal subset of neural network training features for power system security assessment. However, this algorithm works well with linearly separable classes, because the Fisher discrimination function basically seeks an optimal linear discrimination function for separating the two classes. How well it performs on non-linearly separable classes is not established. Power system security evaluation is a complex non-linear problem, which may not have linear separability between the secure and insecure classes. Moreover, the back-track algorithm of [5] requires the optimal number of features to be defined in advance, which seems to be a difficult task. The present paper investigates the concept of divergence coupled with a backward sequential algorithm to find both an optimal number and an optimal combination of neural network training features.

The organization of this paper is as follows. In Section 2, the ANN methodology is described and a divergence-based feature selection algorithm is presented. In Section 3, the method is applied to the IEEE-57 bus power system and tested for its applicability and effectiveness. A comparison of different feature selection techniques is presented. Finally, conclusions are presented in Section 4.
2. The ANN methodology

The proposed method uses an ANN as a classifier, which classifies the operating state of a power system into secure and insecure classes under a predefined set of contingencies. The ANN classifier is trained using an off-line data set, which is generated by the most accurate power system solution methodology. The trained network is then used for on-line security evaluation purposes. The complete description of the ANN methodology is not within the scope of this paper. Nevertheless, it is important to discuss the basic design procedure, which involves the following steps.

(1) Feature selection.
(2) Data set generation for training.
(3) Architecture of ANN model and its training algorithm.
(4) Performance evaluation.

Feature selection is the process of selecting a small subset of features from the large number of features (variables) that may characterize a given power system. It involves dimensionality reduction to identify the most significant and useful subset of features that carries sufficient discriminating properties to perform the given classification task most accurately. If too small a feature subset is used to train the network, the neural network may not acquire the desired discriminating properties or may fail to converge even during the training phase. If the feature subset is too large, it may require a prohibitively large training set besides taking a long training time. In fact, the curse of dimensionality states that, as a rule of thumb, the required cardinality of the training set for accurate training increases exponentially with the input dimension (number of training features) [5]. Mansour et al. observed, in a study on B.C. Hydro and Hydro Quebec for dynamic contingency screening, that the accuracy of the ANN classifier increased when the number of neural training features was reduced [8]. Therefore, as regards ANN, feature selection is a process to identify an optimal combination of features which contributes most to the discriminating ability of the network, and to discard the rest.

In the present method, feature selection is carried out in two stages. In the first stage, an initial feature set is selected based on knowledge of the power system and the objective of the problem to be solved. The initial feature set is a general set of features, which is independent of the data set used for training. The idea behind the first stage is to eliminate the insensitive features a priori so as to avoid an exhaustive search in the second stage of the feature selection algorithm. The initial feature set should have the following properties [7]:

1. The feature set should adequately characterize an operating state of a power system from the security point of view. At the same time it should be small enough to avoid unnecessary computation.
2. Features in the feature set should be independent.
3. Features should be monitorable and controllable so that control action may be exercised, if need be.
4. As far as possible, features should be independent of network topology.

Arora and co-worker [9,10] have derived that if the power system is not optimally dispatched, the feature set consisting of pre-disturbance real and reactive power generations and real and reactive power demands at each system bus carries sufficient information about system security. Under certain justified assumptions, the load currents can be expressed as a function of the generator currents. The generator currents are directly related to the real and reactive powers of the generators. Therefore, the real and reactive power demands can be expressed as a function of the real and reactive powers of the generators. Thus, an attribute
set (feature set) consisting of only real and reactive power generations is capable of providing sufficient discriminating information about the class of system security (secure or insecure). This fact is also supported by the outcome of the research paper [5]. Therefore, the proposed initial feature set consists of the pre-disturbance real and reactive power generation of each generator, i.e., PGi and QGi, respectively, for i = 1, 2, ..., N, where N is the total number of generators in the system.

The second stage of feature selection makes use of the concept of divergence, which is calculated using the training set; the final feature subset is therefore specific to the power system and contingency set considered. Divergence is a measure of dissimilarity between two classes and can therefore be used in feature ranking and feature selection. The divergence $J_{ij}$ between two classes, say i and j, can be expressed as [11]

\[
J_{ij} = \tfrac{1}{2}\,\mathrm{tr}\!\left[(C_i - C_j)\,(C_j^{-1} - C_i^{-1})\right] + \tfrac{1}{2}\,\mathrm{tr}\!\left[(C_i^{-1} + C_j^{-1})\,(m_i - m_j)(m_i - m_j)^{t}\right] \tag{1}
\]

where tr is the trace of a matrix (equal to the sum of its eigenvalues), $C_i$ the covariance matrix of class i of size [n × n], $C_j$ the covariance matrix of class j of size [n × n], $m_i$ the mean vector of class i of size [n × 1], $m_j$ the mean vector of class j of size [n × 1], $(m_i - m_j)^{t}$ the transpose of $(m_i - m_j)$, and n the number of features.

Since divergence is a measure of dissimilarity between two classes, the features which give large divergence are more important. Any feature that makes the least contribution to the total divergence may be discarded. There are many ways to search for the best feature subset using the concept of divergence, such as the back-track method, the forward sequential method and the backward sequential method. The proposed feature selection algorithm makes use of the divergence in a backward sequential manner. The backward sequential search guarantees the optimal solution if the criterion function satisfies the monotonicity condition. The monotonicity condition requires that the values of the criterion function be non-decreasing when additional features are added. The divergence satisfies the condition of monotonicity [11]. The advantage of the backward sequential method is that the feature selection process can be stopped at any stage if it is found that further reduction in the size of the feature subset significantly affects its discriminating properties. Although the backward sequential method is computationally more demanding, feature selection and neural network training are carried out in the off-line phase, so their computation time does not affect the on-line classification time of the ANN classifier.

The size of the feature subset is an important ANN design consideration. So far, in the literature, the size of the feature set has been chosen using heuristics or trial and error. To solve this problem in a systematic manner, it is proposed to define and use two terms, namely “maximum
permissible percentage change in divergence”, $\Delta J_{\max}$, and “minimum number of features required by the classifier”, $n_{\min}$. The multilayer perceptron requires at least two inputs for its training. Therefore, the appropriate value of $n_{\min}$ is 2. The parameter $\Delta J_{\max}$ is a measure of the maximum permissible reduction in the discriminating properties of the feature subset. Therefore, the appropriate choice of $\Delta J_{\max}$ is problem dependent. Normally, the initial feature subset contains many correlated and redundant features. The aim of the feature selection algorithm is to remove these correlated and redundant features. It has been found that if these features are removed one by one from the feature subset, the change in divergence is in the range of 0–7%. Therefore, an appropriate value of $\Delta J_{\max}$ could be chosen from this range. The detailed feature selection algorithm is as follows:

1. Read the data of the initial feature subset and assume suitable values of $\Delta J_{\max}$ and $n_{\min}$.
2. Find the divergence $J_{ij}(n)$ of the feature subset a(n) having n features.
3. Remove one feature at a time from the feature subset a(n) to form n feature subsets having (n − 1) features each, and determine the corresponding divergences $J_{ij}^{k}(n-1)$, k = 1, 2, ..., n.
4. Determine the decrease in divergence due to each individual feature, i.e.

\[
\Delta J_{ij}^{k}(n) = J_{ij}(n) - J_{ij}^{k}(n-1), \qquad k = 1, 2, \ldots, n
\]

where n is the size of the current feature subset.
5. If $\Delta J_{ij}^{k}(n) \ge \Delta J_{\max}$ for all k = 1, 2, ..., n, go to step 7; else remove the feature which causes the minimum change in divergence.
6. Set n = n − 1; if n = $n_{\min}$, go to step 7, else go to step 2.
7. Output the desired feature subset a(n).

The second step in the design of the ANN classifier is data set generation for network training. The primary objective of data set generation is to obtain a sufficiently rich data base containing plausible operating states of the power system. To generate a data set, initially a large number of load samples are randomly generated in the typical range of 50–150% of their base case values. For each load sample (load combination), an optimal power flow (OPF) study is performed to obtain the steady operating state. A disturbance (fault) from a predefined contingency set is simulated for a specified duration. Using dynamic stability studies, the load angle trajectories of all generators are computed and plotted over a period long enough to ascertain system stability under the specified disturbance. Similarly, for each of the disturbances in the contingency set, dynamic simulation is performed to ascertain system stability under the corresponding disturbance. For carrying out dynamic simulation, numerical
integration technique is used as it has the flexibility to include all kinds of modeling sophistication and is thus able to provide the desired degree of accuracy. If a steady state operating point is found to be stable for all disturbances of the contingency set, the operating state is assigned the “secure (0)” class label; otherwise it is assigned the “insecure (1)” class label. Thus, for each operating state a labeled pattern is formed which contains the values of the selected features along with the associated security class label. During data set generation some operating states are also generated in the neighborhood of the optimal dispatch to ensure inclusion of all realistic operating states. Some frequent topological changes may also be considered during data generation. The data set is then normalized to suit the requirements of ANN training. The whole data set is suitably divided into a training set and a test set for training and performance evaluation purposes.

The ANN model selected for on-line security evaluation is a multi-layer perceptron (MLP), as shown in Fig. 1. It consists of an output layer with one neuron specifying the security class. The number of inputs to the network is equal to the number of training features. The number of hidden layers is one or more. The network is trained using the resilient error back propagation algorithm, commonly abbreviated as RPROP [12,13]. It is an adaptive weight learning algorithm, which adapts the weight step based on local gradient information. In this algorithm the sign of the partial derivative of the performance function with respect to a weight is used only to determine the direction of the weight update. The magnitude of the partial derivative has no effect on the weight update, unlike in other common adaptive learning algorithms. This overcomes the problem of slow convergence due to the unforeseeable behavior of the value of the partial derivative. The size of the weight update is determined by a separate update value. The complete description of RPROP is given in [12]. Owing to the nature of its weight update, the resilient back propagation algorithm converges much faster than the conventional error back propagation algorithm and is not very sensitive to the settings of the training parameters. During training, the network performance is closely monitored to prevent network memorization. The trained network is tested for its performance on a test set of unseen patterns.

Fig. 1. Architecture of an MLP with two hidden layers (input layer, first hidden layer, second hidden layer, and output layer producing the output signals).
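As an illustration of the sign-based weight update described above, the following minimal sketch applies a resilient-propagation style step to a toy quadratic objective. It is a simplified variant (no weight backtracking; the update is simply skipped in the iteration following a sign change) and should not be read as the exact procedure of [12] or of the toolbox in [13]. The function names, the growth and shrink factors (1.2 and 0.5), the step limits, the initial step and the toy objective are all assumptions made for illustration.

```python
import numpy as np


def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One sign-based update; returns (new_w, grad_to_store, new_step)."""
    agreement = grad * prev_grad
    # Same sign as in the previous iteration: enlarge the per-weight step.
    step = np.where(agreement > 0, np.minimum(step * eta_plus, step_max), step)
    # Sign flipped: the previous step jumped over a minimum, so shrink the step.
    step = np.where(agreement < 0, np.maximum(step * eta_minus, step_min), step)
    # After a flip, suppress this update by zeroing the stored gradient.
    grad = np.where(agreement < 0, 0.0, grad)
    # Only the sign of the gradient sets the direction; its magnitude is ignored.
    new_w = w - np.sign(grad) * step
    return new_w, grad, step


def rprop_minimize(grad_fn, w0, n_iter=200, step0=0.1):
    """Repeatedly apply rprop_step, starting from w0 (hypothetical helper)."""
    w = np.asarray(w0, dtype=float)
    prev_grad = np.zeros_like(w)
    step = np.full_like(w, step0)
    for _ in range(n_iter):
        w, prev_grad, step = rprop_step(w, grad_fn(w), prev_grad, step)
    return w


# Toy objective: sum(c * (w - target)**2) with badly scaled curvatures c.
target = np.array([3.0, -2.0])
curvature = np.array([1e-3, 1e3])
w_final = rprop_minimize(lambda w: 2.0 * curvature * (w - target), [0.0, 0.0])
print(w_final)
```

Because the gradient magnitude never enters the update, both coordinates of the toy problem progress at a similar rate even though their curvatures differ by six orders of magnitude, which is the property the text attributes to RPROP.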
3. Simulation and results

To investigate the effectiveness of the proposed method, a study was performed on the IEEE-57 bus system. The system consists of 7 generators, 57 buses, 67 transmission lines, 18 transformers and 42 loads. The diagram of the system is given in [14] and the data were taken from [14,15]. It is assumed that the contingency set contains only one disturbance, which is a three-phase fault on the 400 kV transmission line connecting buses 8 and 9, near bus 9. The duration of the disturbance is assumed to be 210 ms, and it is cleared by opening the line at both ends.

By varying the loads randomly between 50 and 150% of their base case values, two sets of data have been generated. The first data set consists of 1000 operating states with fixed system topology. The second data set, consisting of 2200 operating states, is generated under 12 different topological conditions. The changes in topology are spread throughout the system and include removal of the single 400 kV transmission lines 2, 5, 14, 15, 19 and 28 one at a time, and simultaneous removal of a set of three transmission lines at a time, such as (2, 5, 14), (15, 19, 28), etc. The second data set is then shuffled several times to thoroughly mix the data of different topologies.

To highlight the effectiveness of the proposed feature selection algorithm, a comparative study has been carried out. Three different feature selection criteria, namely the proposed method, the Fisher discrimination method [5] and the entropy minimization method [3,4], have been applied to both data sets to select neural training features. The application of the proposed feature selection algorithm to the fixed system topology data gave the feature set

F1f = [PG1, QG5, QG7]

while the multiple topology data set gave the feature set

F1m = [PG1, QG5]

Here, the value of $\Delta J_{\max}$ is taken as 5% of the total divergence before feature selection and the value of $n_{\min}$ is taken as 2.
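The divergence of Eq. (1) and the backward sequential procedure of Section 2, with the parameter choices just quoted (a threshold of 5% of the initial total divergence and a minimum of two features), can be sketched roughly as follows. This is only an illustrative sketch: the function names `divergence` and `backward_select`, the use of sample means and covariances, and the synthetic two-class data are assumptions; the data are not the IEEE-57 bus data used in the paper.

```python
import numpy as np


def divergence(X_i, X_j):
    """Divergence J_ij of Eq. (1); X_i, X_j are (samples x features) arrays."""
    m_i, m_j = X_i.mean(axis=0), X_j.mean(axis=0)
    C_i = np.atleast_2d(np.cov(X_i, rowvar=False))
    C_j = np.atleast_2d(np.cov(X_j, rowvar=False))
    Ci_inv, Cj_inv = np.linalg.inv(C_i), np.linalg.inv(C_j)
    d = (m_i - m_j).reshape(-1, 1)
    cov_term = 0.5 * np.trace((C_i - C_j) @ (Cj_inv - Ci_inv))
    mean_term = 0.5 * np.trace((Ci_inv + Cj_inv) @ d @ d.T)
    return float(cov_term + mean_term)


def backward_select(X_i, X_j, names, dj_max_pct=5.0, n_min=2):
    """Backward sequential selection following steps 1-7 of Section 2."""
    keep = list(range(X_i.shape[1]))
    # Threshold: a percentage of the total divergence before selection.
    dj_max = dj_max_pct / 100.0 * divergence(X_i, X_j)
    while len(keep) > n_min:
        j_full = divergence(X_i[:, keep], X_j[:, keep])
        # Decrease in divergence caused by dropping each feature in turn.
        drops = []
        for k in range(len(keep)):
            subset = keep[:k] + keep[k + 1:]
            drops.append(j_full - divergence(X_i[:, subset], X_j[:, subset]))
        if min(drops) >= dj_max:
            break  # every remaining feature matters (step 5)
        del keep[int(np.argmin(drops))]  # discard the least useful feature
    return [names[k] for k in keep]


# Synthetic illustration: only feature "f0" separates the two classes.
rng = np.random.default_rng(0)
secure = rng.normal(size=(200, 5))
insecure = rng.normal(size=(200, 5))
insecure[:, 0] += 3.0
names = ["f0", "f1", "f2", "f3", "f4"]
selected = backward_select(secure, insecure, names)
print(selected)
```

The loop stops either when every remaining feature contributes at least the threshold or when only `n_min` features are left, mirroring steps 5 and 6 of the algorithm.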
The application of the Fisher discrimination-based method to the fixed and multiple topology data gave the same feature set, consisting of PG1, QG1 and QG7, i.e.,

F2f = F2m = [PG1, QG1, QG7]

The application of the entropy-based method to the fixed system topology data gave the following feature set

F3f = [PG5, PG7, QG3]
while the multiple topology data set gave the following feature subset

F3m = [PG7, QG3, QG7]

Using the fixed topology data set and the feature subsets of fixed system topology, three MLPs (one for each feature subset selected using the three different feature selection criteria) have been trained using the resilient back propagation algorithm. The neural architecture used in this study consists of two hidden layers, h1 and h2, and an output layer, O. The output layer has one neuron, which specifies the security class. Three training and testing runs have been performed for the network in each case and their best results are shown in Tables 1–3. The summary of these results is shown in Table 4.

Table 1
Result of fixed topology data (using feature subset F1f)

No.       Network architecture          Percentage error on     Percentage error on
          (h1–h2–h3–O or h1–h2–O)       training set (500)      test set (500)
1         20–10–1                       1.2                     1.2
2         20–10–1                       1                       1.6
3         20–10–1                       1                       1.2
Average                                 1.06                    1.33

Table 2
Result of fixed topology data (using feature subset F2f)

No.       Network architecture          Percentage error on     Percentage error on
          (h1–h2–h3–O or h1–h2–O)       training set (500)      test set (500)
1         20–10–1                       0.4                     3.2
2         30–10–1                       0.4                     3.4
3         10–30–1                       0.4                     2.8
Average                                 0.4                     3.133

Table 3
Result of fixed topology data (using feature subset F3f)

No.       Network architecture          Percentage error on     Percentage error on
          (h1–h2–h3–O or h1–h2–O)       training set (500)      test set (500)
1         20–10–1                       0.2                     3.2
2         30–10–1                       0.2                     4.0
3         40–10–1                       0.4                     3.8
Average                                 0.267                   3.67

Table 4
Comparison of different feature selection techniques on fixed topology data

No.       Percentage error of the ANN classifiers on the test set when the
          networks are trained with features selected using
          Proposed method     Fisher method     Entropy method
1         1.2                 3.2               3.2
2         1.6                 3.4               4.0
3         1.2                 2.8               3.8
Average   1.33                3.133             3.67

Similarly, a second set of neural networks has been trained using the multiple topology data and the multiple topology feature subsets, namely F1m, F2m and F3m. The corresponding test results for the three best training runs are shown in Tables 5–7 and a comparison of these results is shown in Table 8.

Table 5
Results of multiple topology data (using feature subset F1m)

No.       Network architecture          Percentage error on     Percentage error on
          (h1–h2–h3–O or h1–h2–O)       training set (1000)     test set (1200)
1         20–10–1                       2                       1.5
2         20–10–1                       1.5                     1.41
3         20–10–1                       2.1                     1.41
Average                                 1.86                    1.44

Table 6
Results of multiple topology data (using feature subset F2m)

No.       Network architecture          Percentage error on     Percentage error on
          (h1–h2–h3–O or h1–h2–O)       training set (1000)     test set (1200)
1         30–10–1                       1.1                     3.25
2         20–10–1                       1.2                     3.75
3         30–10–1                       0.6                     4.0
Average                                 0.967                   3.67

Table 7
Results of multiple topology data (using feature subset F3m)

No.       Network architecture          Percentage error on     Percentage error on
          (h1–h2–h3–O or h1–h2–O)       training set (1000)     test set (1200)
1         20–10–1                       1.4                     3.08
2         20–10–1                       0.9                     3.416
3         20–10–1                       0.7                     3.416
Average                                 1.0                     3.304

Table 8
Comparison of different feature selection techniques on multiple topology data

No.       Percentage error of the ANN classifiers on the test set when the
          networks are trained with features selected using
          Proposed method     Fisher method     Entropy method
1         1.5                 3.25              3.08
2         1.41                3.75              3.416
3         1.41                4.00              3.416
Average   1.44                3.67              3.304

Tables 4 and 8 show that the neural networks trained with features selected using the proposed divergence-based feature selection algorithm give much better results than the others. Thus, the results support and verify the effectiveness of the
proposed feature selection algorithm to select neural network training features. The high accuracy rate of the test results shown in Tables 4 and 8 also highlights the effectiveness of the proposed method for predicting the security of power systems using a small subset of power system variables.

Modern power systems are prone to frequent changes in system topology due to many factors such as maintenance and repair. Therefore, special care needs to be taken to minimize the effect of topology changes on the performance of the neural network. One approach to deal with this problem is to train a different neural network for each possible system topology and to use the specific network that reflects the current topology of the system. This approach is only practical when there are a few possible changes in system topology. Another approach is to choose training features that are independent of changes in system topology. This allows a single neural network to tackle the security assessment problem of the power system under varying system topology. It also allows the single network to predict the security of the power system under unexpected topological changes. The proposed feature selection algorithm, when applied to a data set containing examples of as many different topologies as possible, gives features which are independent of changes in system topology and which, at the same time, carry sufficient discriminating properties about system security.

To investigate this capability of the proposed feature selection algorithm, two more neural networks have been trained using the same fixed topology data. The first neural network is trained using the features selected from the fixed topology data, i.e., PG1, QG5 and QG7, while the second neural network is trained using the features selected from the multiple topology data, i.e., PG1 and QG5. The trained networks have been tested on the same data set of multiple system topology. The test results of the two networks are shown in Table 9.
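As a quick arithmetic check on Table 9, the short sketch below averages the three reported test errors for each network and converts them to accuracy rates and approximate misclassification counts on the 2200-state test set. The error values are copied from Table 9; the variable names and the rounding to whole operating states are assumptions made for illustration.

```python
from statistics import mean

TEST_SET_SIZE = 2200  # multiple topology operating states used for testing

# Percentage test errors of the three runs, as reported in Table 9.
err_fixed_features = [2.86, 3.27, 3.63]     # network using PG1, QG5, QG7
err_multi_features = [1.363, 1.409, 1.409]  # network using PG1, QG5

avg_fixed = mean(err_fixed_features)
avg_multi = mean(err_multi_features)
acc_fixed = 100.0 - avg_fixed
acc_multi = 100.0 - avg_multi

# Approximate number of misclassified states behind each percentage.
miss_fixed = [round(e / 100.0 * TEST_SET_SIZE) for e in err_fixed_features]
miss_multi = [round(e / 100.0 * TEST_SET_SIZE) for e in err_multi_features]

print(f"fixed-topology features: error {avg_fixed:.3f}%, accuracy {acc_fixed:.2f}%")
print(f"multi-topology features: error {avg_multi:.3f}%, accuracy {acc_multi:.2f}%")
print(miss_fixed, miss_multi)
```

The averages agree with the "Average" row of Table 9, and dropping QG7 from the input features roughly halves the number of misclassified operating states.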
Table 9
Effect of topology on the performance of ANN classifier

          Features selected from single topology       Features selected from multiple topology
          data (PG1, QG5, QG7), error in percent on     data (PG1, QG5), error in percent on
No.       Training set (1000)    Test set (2200)        Training set (1000)    Test set (2200)
1         0.8                    2.86                   1.1                    1.363
2         0.9                    3.27                   1.3                    1.409
3         1.0                    3.63                   1.2                    1.409
Average   0.9                    3.253                  1.2                    1.393

It can be seen from this table that the neural network trained with features selected from the multiple topology data gives an accuracy rate of nearly 98.6%, which is much better than that obtained by the network trained using features selected from the fixed topology data. This is a remarkable result considering that both neural networks were trained on the same single topology data and tested on the same multiple topology data. The test results highlight the importance
of selecting features which are independent of changes in power system topology. This result also indicates that the feature QG7 is not independent of topological changes in the power system, since this feature alone accounts for the decrease in accuracy under varying topological conditions. Thus, the test results demonstrate the capability of the proposed feature selection algorithm to select an optimal subset of training features which are independent of topological changes.

4. Conclusion

An artificial neural network-based method for on-line security evaluation of power systems is presented. A divergence-based backward sequential algorithm is proposed to select an optimal set of neural training features. The proposed method has been applied to the IEEE-57 bus system under different operating conditions and the test results are promising, giving an accuracy rate of nearly 99%. A study has been carried out to compare different feature selection techniques for selecting ANN training features. The results of the comparison show that the proposed divergence-based method is superior to the other methods for security evaluation of power systems. The test results shown in Table 9 highlight the importance of selecting features which are independent of changes in power system topology. The test results also show that the proposed feature selection algorithm can be effectively used to select such features. A neural network trained using such features works well even under an unexpected change in topology. Thus, it removes the burden of designing a separate neural network classifier for each possible system topology and minimizes the need to keep abreast of every possible change in system topology. The study has been carried out on a 550 MHz Pentium III machine, and it has been found that the neural network classifier invariably takes less than 30 ms to classify the whole data set consisting of 2200 operating states.
References

[1] E. Vaahedi, Y. Mansour, E.K. Tse, A general purpose method for on-line dynamic security assessment, IEEE Trans. Power Syst. 13 (1) (1998) 243–250.
[2] M.J. Laufenberg, M.A. Pai, A new approach to dynamic security assessment using trajectory sensitivities, IEEE Trans. Power Syst. 13 (3) (1998) 953–958.
[3] L. Wehenkel, Th. Van Cutsem, M. Ribbens-Pavella, Inductive inference applied to on-line transient stability assessment of electric power systems, Automatica 25 (3) (1989) 445–451.
[4] L. Wehenkel, Th. Van Cutsem, M. Ribbens-Pavella, An artificial intelligence framework for on-line transient stability assessment of power systems, IEEE Trans. Power Syst. 4 (2) (1989) 789–800.
[5] C.A. Jensen, M.A. El-Sharkawi, R.J. Marks II, Power systems security assessment using neural networks: feature selection using Fisher discrimination, IEEE Trans. Power Syst. 16 (4) (2001) 757–763.
[6] K. Shanti Swarup, P. Britto Corthis, ANN approach assesses system security, IEEE Comput. Appl. Power July (2002) 31–38.
[7] J.N. Fidalgo, V. Miranda, J.A.P. Lopes, Neural networks applied to preventive control measures for the dynamic security of isolated power systems with renewables, IEEE Trans. Power Syst. 11 (4) (1996) 1811–1816.
[8] Y. Mansour, E. Vaahedi, M.A. El-Sharkawi, Large scale dynamic security screening and ranking using neural networks, IEEE Trans. Power Syst. 12 (2) (1997) 954–960.
[9] C.M. Arora, On-line transient security evaluation using pattern recognition technique, Ph.D. Thesis, Department of Electrical Engineering, JNV University, Jodhpur, India, 1991.
[10] C.M. Arora, S.L. Surana, Transient security evaluation and preventive control of power systems using PR techniques, IE (I) J.-EL 76 (1996) 199–203.
[11] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
[12] M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: the RPROP algorithm, IEEE Int. Conf. Neural Networks 1 (1993) 586–589.
[13] H. Demuth, M. Beale, Neural Network Toolbox for Use with MATLAB, User’s Guide, Version 4, The MathWorks Inc., 2001.
[14] http://www.ee.washington.edu/research/pstca.
[15] http://www.pserc.cornell.edu/matpower.