Automatica, Vol. 25, No. 3, pp. 445-451, 1989. Printed in Great Britain.
0005-1098/89 $3.00 + 0.00 Pergamon Press plc © 1989 International Federation of Automatic Control
Brief Paper
Inductive Inference Applied to On-line Transient Stability Assessment of Electric Power Systems*
L. WEHENKEL,†‡ TH. VAN CUTSEM†‡ and M. RIBBENS-PAVELLA†
Key Words--Artificial intelligence; inductive inference; knowledge based systems; on-line operation; power systems control; transient stability analysis and preventive control.
Abstract--A decision tree methodology is proposed for the purpose of predicting the robustness of a power system in the occurrence of severe disturbances, and of discovering appropriate control actions, whenever needed. The decision trees are built off-line and used on-line. Their building calls upon an inductive inference method, which automatically extracts the relevant information from large bodies of simulation data and expresses it in terms of the system controllable variables. The trees provide a clear understanding of the intricate mechanisms of transient stability, and means to control it. This paper focuses on fundamental issues relative to the inductive inference method. A real world power system comprising 14 generators, 112 branches and 92 nodes is used to illustrate it.
1. Introduction
IN THE AREA of power system security monitoring and control, transient stability is considered to be one of the most important and at the same time most problematic issues. Indeed, in a broad sense, transient stability is concerned with the system's capability to withstand severe disturbances, causing important electromechanical transients: this is a strongly non-linear problem; it is also highly dimensional, since power systems are by essence large scale. Ideally, transient stability assessment should encompass a twofold aspect: analysis, to appraise the system's robustness in the occurrence of a large disturbance, and control, to suggest remedial actions, i.e. means to enhance robustness, whenever needed. Moreover, in the context of real-time operation, both analysis and control should be performed on-line. The presently available methods are unable to solve this twofold problem; they can merely tackle analysis, and yet on an off-line basis only. Traditionally, analysis is carried out by simulating a series of plausible large disturbances (generally one at a time) and assessing how long such a disturbance may remain without causing the irrevocable loss of the system synchronism; this is currently computed by considering increasing disturbance durations and observing whether the corresponding system's post-disturbance configuration shows a tendency towards stabilization or not. Two broad classes of approaches are available nowadays for this kind of analysis: the time-domain, and the direct (Lyapunov like) ones. The former consist of integrating numerically the equations of motion of the system, first during the disturbance, then in the post-disturbance phase. Their main advantage is that they are flexible with respect to power system models, and thus able to provide the desired degree of accuracy; however, this is obtained at the expense of extremely (in many instances excessively) cumbersome runs, totally inappropriate for real-time purposes. Direct methods, on the other hand, aim to circumvent this drawback by avoiding the exploration of the post-disturbance phase: they merely assess whether the system entering this phase lies inside the domain of attraction of its post-disturbance stable equilibrium. Based on the construction of a Lyapunov function, direct methods have eventually overcome their initial difficulties (Ribbens-Pavella and Evans, 1985). Nowadays they are considered to be complementary to the time-domain approaches: they are less flexible (they can only handle very simplified system modelling), but much faster, yet not enough to meet real-time requirements. In brief, the above two classes are inappropriate for on-line transient stability analysis; besides, they are both of the "black-box" type, providing a "yes or no" answer, without qualitative insight into the system's behaviour, or explicit sensitivity information (EPRI Project, 1987). In addition to the above, we mention two other approaches: pattern recognition and the extended equal area criterion. Despite its appealing principle, the former still yields poor transient stability tools after more than 25 years of research effort (Prabhakara and Heydt, 1987). The latter exhibits interesting on-line and also control capabilities, but is restricted to very simplified power system modelling (Xue et al., 1988). The methodology developed in this paper is conceptually different from all previous ones. It is based on inductive inference and more specifically on ID3 by Quinlan (Michalski et al., 1984), which is a member of the TDIDT (top-down induction of decision trees) family. Most of the inductive inference methods infer decision rules from large bodies of preclassified data samples. The TDIDT aims to produce them in the form of decision trees, able to uncover the relationship between a phenomenon and the observable variables driving it. Adapted and applied to transient stability assessment, our method intends to uncover the intrinsically very intricate relationships between static, pre-disturbance conditions of a power system and their impact on its transient behaviour. The reasons for choosing a member of the TDIDT family and the modifications we propose in order to adapt it to the specifics of our problem are discussed below. Let us only mention here that our ultimate goal is to combine on-line and refined modelling capabilities, while meeting both analysis and control objectives. To reach this goal, we handle the problem in two phases. One, off-line, where decision
* Received 12 September 1987; revised 24 May 1988; revised 23 August 1988; received in final form 6 September 1988. The original version of this paper was presented at the 10th IFAC World Congress which was held in Munich, F.R.G., during July 1987. The Published Proceedings of this IFAC meeting may be ordered from: Pergamon Press plc, Headington Hill Hall, Oxford OX3 0BW, U.K. This paper was recommended for publication in revised form by Editor A. P. Sage.
† University of Liège, Institut Montefiore, Sart-Tilman, B28, B-4000 Liège, Belgium.
‡ FNRS.
trees are built up. The other, on-line, where these trees will be used. The building of decision trees is guided by the fact that their information content must assume a form appropriate to their on-line use for both analysis and control. This research project is admittedly quite broad and is still at an early stage of development. In this paper, we lay the foundations of our inductive inference approach, identify the various intermediate problems in both inductive inference and transient stability areas, solve some of them, discuss some others and illustrate interesting aspects by means of practical examples.
2. Inductive inference (Michalski et al., 1984; Pao et al., 1985; Rendell, 1986)
In a broad sense, inductive inference (II) is the process of deriving from specific observed data an assertion which explains them in some respect and which can be used to predict properties of new, unseen ones. The subarea of II we are concerned with is concept learning from examples. The inductive assertion is a decision rule which allows the assignment of any object to a class. The decision rules that we will be dealing with are decision trees of the TDIDT family. To make the paper self contained, we give hereafter the necessary definitions and a short description of the TDIDT family, in particular of ID3.
2.1. Classification. A binary classification,† called also in the sequel the goal partition on U, is defined by the partition $\{U_+; U_-\}$ where U is the universe of all possible objects. In what follows, we assume that the construction of a classifier, or classification rule, is based on a learning set L (called also training or sample set); this consists of N preclassified objects: $L = \{(v_1, c_1), \ldots, (v_N, c_N)\}$ where $c_k \in \{+, -\}$ and where: • an object $o_k$ is characterized by a given number (say n) of attributes $a_i(\cdot)$ (i = 1, 2, ..., n):
$o_k = [a_1 = v_{1k}] \cap [a_2 = v_{2k}] \cap \cdots \cap [a_n = v_{nk}]$;
• vector $v_k = (v_{1k}, v_{2k}, \ldots, v_{nk})$ corresponds to $o_k$; • $a_i(\cdot)$ is a function defined on U and having a known set of possible values. In the sequel, all vectors $v_k$ are assumed to be of fixed dimensionality n. Note that the set of possible values of $a_i(\cdot)$ may assume various structures such as: qualitative (the set is small and unordered), quantitative (the set is ordered, whether continuous (real) or discrete (e.g. integer)), or hierarchical (the set is structured into a hierarchy of subsets).
2.2. The top-down induction of decision trees (TDIDT) family.‡ The TDIDT methods possess two important characteristics. The first concerns the type of decision rules: they take on the form of decision trees (DTs). The second is related to the preference criterion and the search strategy: they aim at building a DT in a top-down fashion, so as to realize a compromise between its size and the number of learning states it classifies correctly. In what follows, we recall these general features and provide a heuristic, rather than theoretical, justification of our choice.
2.2.1. Decision trees. A DT is a tree structured upside down. It is composed of test and terminal nodes, starting at the top node (or root) and progressing down to the terminal ones. Each test node is associated with a test on the attribute values of the objects, to each possible outcome of which corresponds a successor node. The terminal nodes carry the information required to classify the objects. Such a DT is portrayed in Fig. 1, built for the example of Section 4.3.1 in the context of transient stability.
† To simplify, we restrict this presentation to the binary, or two-class case. Its generalization follows a similar pattern; it will be used in the context of our specific application, in Section 4.
‡ For example, see Breiman et al. (1984), and Quinlan (1986).
FIG. 1. DT for the one-machine-infinite-busbar system described in Section 4.3.1 (circled numbers: node numbers; boxed numbers: numbers of learning states; terminal nodes are labelled Unstable or Stable).
A convenient way to define a DT is to describe how it is used to classify an object on the basis of its known, easily observable attributes. This classification is achieved by applying sequentially the tests at the test nodes, beginning with the root of the tree, and systematically passing the object to the successor appropriate to the outcome of the test, until a terminal node is finally reached; the object is accordingly classified. There is a perfect correspondence between a hierarchical partition of U and a DT: the whole U corresponds to its root; the set corresponding to some intermediate node is the subset composed of all those states of U which are directed to that node by the above described classification procedure. The most refined partition of U is composed of the subsets corresponding to the terminal nodes of the considered DT. Similarly, for a given L there is a correspondence between each node of the DT and a subset of L. An important characteristic of a DT is the type of information stored at its terminal nodes: they would contain either a classname (+ or - in the two-class case), or a probability vector $(p_+, p_-)$.§ The latter type of information is richer than the former: it allows one to decide on the appropriate classification strategy, when the DT is used, and to assess the probability of misclassifying a given object. Another important characteristic of a DT is the type of attributes it can handle, and the way it handles them via tests at the intermediate nodes. Note that the large majority of methods allow only tests on a single attribute at a time.
2.2.2. Efficient induction of DTs. In the context of the TDIDT framework, the II problem is stated in the following terms: GIVEN a set L where the attributes are possibly of different types, and a class of candidate tests for each attribute type, BUILD a DT of optimal classification accuracy.
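The sequential classification procedure described in Section 2.2.1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names `Terminal` and `TestNode`, the function `classify`, and the probabilities stored at the left terminal node are hypothetical; only the root test P <= 0.532 p.u. and the fact that its "greater than" successor is an unstable leaf are taken from the example of Section 4.3.1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Terminal:
    """Terminal node storing the probability vector (p_plus, p_minus)."""
    p_plus: float
    p_minus: float


@dataclass
class TestNode:
    """Test node: a test on the attribute values, one successor per outcome."""
    test: Callable[[dict], object]
    successors: Dict[object, "Node"]


Node = Union[Terminal, TestNode]


def classify(node: Node, obj: dict) -> Terminal:
    """Apply the tests from the root down until a terminal node is reached."""
    while isinstance(node, TestNode):
        node = node.successors[node.test(obj)]
    return node


# Truncated tree: the root test comes from Section 4.3.1; the left terminal
# node and its probabilities are hypothetical placeholders (in the paper this
# successor is expanded further).
tree = TestNode(
    test=lambda o: o["P"] <= 0.532,
    successors={
        True: Terminal(p_plus=0.9, p_minus=0.1),   # hypothetical values
        False: Terminal(p_plus=0.0, p_minus=1.0),  # "unstable" leaf
    },
)

high_load = classify(tree, {"P": 0.60})
low_load = classify(tree, {"P": 0.40})
```

Only the attributes tested along the path from the root are read, which is why the on-line use of a DT is cheap.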
§ Note that many terminal nodes may correspond to the same class.
This objective is achieved in two conceptually separate steps: (i) build a DT of a minimal number of nodes, classifying L as well as possible; this is realized in a top-down fashion, beginning with the root of the tree and splitting nodes by selecting an appropriate test among the candidate tests, until all the nodes satisfy a stopping criterion; (ii) on the basis of an independent accuracy evaluation of this tree, which generally overfits the data in L, prune it by removing overspecific terminal nodes, in a bottom-up fashion, until optimal performance is achieved; at this step, the
information which has to be stored at the terminal nodes can be computed objectively.
2.2.3. ID3 (Quinlan, 1986). To simplify our presentation we merely describe the basic version of ID3. Originally, this method was designed to handle learning problems characterized by a very large and representative L composed of objects with qualitative attributes only. Its objective is to build a DT which classifies L correctly with as few nodes as possible. The tree is built by choosing at each intermediate node an attribute irrevocably, on the basis of the information theory criterion described below. At each intermediate node, the test consists of analysing exhaustively all possible values of a given attribute, thus creating as many successor nodes as there are particular values of it. The nodes are split until reaching terminal nodes containing objects of a single class. The information theory criterion used at a given intermediate node consists of maximizing the information I gained when evolving down from this node to its successors, or equivalently, of reducing the "entropy" H about the classification of the states of the set S corresponding to this node. To give a formal interpretation of this criterion, consider the ith attribute, $a_i(\cdot)$, characterizing the objects of the set S (corresponding to the node), and assuming the values $v_{i1}, \ldots, v_{ir_i}$. Moreover, consider the test
With respect to the top-down hillclimbing strategy: • on the basis of a postulated set of attributes, the method is able to select the relevant and discard the poor ones; this is performed at each test node, on the basis of local information; • the above selection strategy is not adapted to the handling of intricate multiple correlations between several attributes and the goal partition; • however, its great computational efficiency allows the use of very large sets of attributes; the user can define and add an arbitrary number of features which are likely to be more directly related to the goal partition; this provides a possibility to circumvent the preceding difficulty; • depending on the stopping criterion, outliers are isolated in terminal nodes containing a small number of learning states; this can help to detect misclassified samples or abnormal behaviour, not explained by the chosen attributes and thus provide guidelines for the choice of additional features and/or learning states.
3. On-line transient stability assessment (TSA)
where $n_+(S)$ [resp. $n_-(S)$] is the number of learning states of class $U_+$ [resp. $U_-$] in S, $n(S)$ is the total number of learning states in S, and $n_{ij}(S)$ is the total number of its learning states having the value $v_{ij}$ for $a_i(\cdot)$. The test (1) is repeatedly used for all attributes i = 1, 2, ..., n. That attribute, say $a_*(\cdot)$, which maximizes I expressed by (2) is chosen; it generates $r_*$ successor nodes with the partition $\{S_{*1}, \ldots, S_{*r_*}\}$ of S.
Transient stability assessment in general aims to appraise the power system robustness in the inception of a sudden, severe disturbance, and whenever necessary to suggest remedial actions. A measure of robustness is the critical clearing time (CCT); this is the maximum time a disturbance may act without causing the irrevocable loss of synchronism of the system machines. On-line TSA in particular, used for the purpose of real-time operation, aims to perform on-line analysis and preventive control. Indeed, because the transient stability phenomena evolve very quickly (in the range of very few seconds), a good way to face them efficiently is to prevent them. It is therefore suitable to attempt to express the information carried by the DTs in terms of the system static, pre-disturbance parameters. Moreover, to make the overall TSA tractable, we decompose it into elementary problems as indicated below. Note beforehand that some of the constraints imposed hereafter could, and eventually will, be relaxed. They are useful for approaching systematically and gradually this very intricate question. This section concentrates on the decomposition into elementary problems, and on the corresponding definition of the universe, the goal classification, and the attributes describing the objects. Section 4 is devoted to the development of an appropriate II approach.
3.1. Elementary TSA problems and their classification. For a given power system, we define an elementary problem in terms of a given disturbance and a given topology. In this case, U becomes the set of admissible pre-disturbance operating points (OPs) of the studied system. An OP is completely defined by a set of variables attached to the load and generator nodes. (More specifically, it is defined by the active and reactive power injections at the load nodes, and the voltages and active powers at the generator nodes.)
The value of the CCT relative to a postulated disturbance (and the scenario of its clearance) varies with the OP solely. The goal classification of U is defined with respect to the CCTs of the considered disturbance by a threshold value $\tau$ in the range of possible values of CCT; $U_+$ and $U_-$ are defined with respect to $\tau$ as follows:
2.2.4. Salient features. We classify interesting features of TDIDT according to their origin.
$U_+ \triangleq \{OP \in U \mid CCT(OP) > \tau\}$;
$a_i(o_k) = \,?$   (1)
the possible outcomes of which are $v_{ij}$ (j = 1, ..., $r_i$); it induces the partition $\{S_{i1}, \ldots, S_{ir_i}\}$ of S and generates $r_i$ successors. The information gained when passing from the considered node to its successors is, according to the method, expressed by

$I(S; a_i) \triangleq H(S) - H(S \mid a_i)$   (2)

where

$H(S) = -[p_+(S) \log p_+(S) + p_-(S) \log p_-(S)]$   (3)

$H(S \mid a_i) = \sum_{j=1}^{r_i} p_{ij}(S)\, H(S_{ij})$.   (4)

In the above expressions, $p_+(S)$ [resp. $p_-(S)$] denotes the conditional probability of the class $U_+$ [resp. $U_-$] given the objects belong to S; $p_{ij}(S)$ is the conditional probability of having the value $v_{ij}$ of $a_i(\cdot)$, given the objects belong to S; log is the logarithm in base 2. $H(S)$ [resp. $H(S_{ij})$] measures the entropy of the class distribution in S [resp. $S_{ij}$]. Note that since in general the exact probabilities are not known, they are replaced by their estimates

$p_+(S) = \frac{n_+(S)}{n(S)}$;  $p_-(S) = \frac{n_-(S)}{n(S)}$;  $p_{ij}(S) = \frac{n_{ij}(S)}{n(S)}$   (5)
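As an illustration of equations (2) to (5), the following minimal sketch estimates the entropy and the information gain of a qualitative attribute from class counts. The function names `entropy` and `information_gain` and the toy data are assumptions made for this example; they do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(classes):
    """H(S) of eq. (3), with probabilities estimated by frequencies, eq. (5)."""
    n = len(classes)
    return -sum((c / n) * log2(c / n) for c in Counter(classes).values())


def information_gain(values, classes):
    """I(S; a_i) = H(S) - H(S | a_i), eqs (2) and (4).

    values[k] is the value of the attribute for object k,
    classes[k] is its class ('+' or '-').
    """
    n = len(classes)
    subsets = {}  # value v_ij -> classes of the objects in S_ij
    for v, c in zip(values, classes):
        subsets.setdefault(v, []).append(c)
    h_cond = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(classes) - h_cond


# Toy data (hypothetical): one attribute separates the classes perfectly,
# the other carries no information about them.
classes = ["+", "+", "-", "-"]
gain_perfect = information_gain(["a", "a", "b", "b"], classes)
gain_useless = information_gain(["a", "b", "a", "b"], classes)
```

ID3 would evaluate this gain for every attribute at the node and retain the attribute with the largest value.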
With respect to decision trees: • the explicit, hierarchical form of the DTs gives an easy interpretation of the relationship between the relevant attributes and the classification; • non-linear and multi-modal problems (where a class is composed of several significantly different types of objects) can be handled; • the sequential classification process is very efficient and uses only the relevant attributes for classifying an object; it can provide detailed information about the class probabilities.
$U_- \triangleq \{OP \in U \mid CCT(OP) \le \tau\}$.
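The goal classification defined above amounts to comparing the CCT of an operating point with one threshold, or with several thresholds in the multi-class case. A minimal sketch follows; the function name `cct_class` is hypothetical, the threshold values 0.28 s, 0.32 s and 0.37 s are those quoted in Section 4.3.2, and the ordering of the class labels (taken from the legend of Fig. 4) by increasing CCT is an assumption.

```python
from bisect import bisect_left


def cct_class(cct, thresholds):
    """Return the class index of an operating point from its CCT (seconds).

    With a single threshold tau: index 0 is U_- (CCT <= tau) and
    index 1 is U_+ (CCT > tau). With several thresholds the same
    convention is applied at each boundary (an assumption for the
    multi-class case).
    """
    return bisect_left(sorted(thresholds), cct)


# Assumed ordering of the Fig. 4 labels by increasing CCT.
LABELS_4 = ["Unstable", "Fairly stable", "Stable", "Very stable"]

two_class = [cct_class(c, [0.32]) for c in (0.30, 0.32, 0.33)]
four_class = [LABELS_4[cct_class(c, [0.28, 0.32, 0.37])]
              for c in (0.25, 0.30, 0.35, 0.40)]
```

`bisect_left` places a CCT equal to a threshold in the lower class, which matches the "less than or equal" in the definition of U_-.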
Thus, in a restricted sense, $U_+$ may be viewed as the set of sufficiently robust OPs. Obviously, several decision rules may be built, corresponding to different $\tau$ values. The preclassification of L is obtained by means of a standard time-domain transient stability analysis method. Many programs may be used, with various degrees of dynamic modelling sophistication. Thus, from this point of view, the proposed approach needs no particular simplifying assumptions. Observe also that the expertise we transmit to the knowledge based system is acquired via a large number
of simulation results obtained by a deterministic system modelling and its numerically computed outputs. This is quite uncommon in the area of expert systems applications.
3.2. Attributes. We propose the following list of attributes which contain general, physically sound information, and also reflect experience gained while constructing plausible OPs. Attributes related to the system as a whole: total active power of the loads; total reactive power of the loads; mean, minimal, and maximal node voltage. Attributes related to different large regions: total active and reactive generated and load power; active power exchanges with the neighbouring regions; mean, minimal, and maximal node voltage in the regions. Attributes related to the generator nodes: active power and initial acceleration. The above list is not exhaustive. It can be augmented to encompass other numerical continuous attributes or any heuristic type of specific experience of power system experts, like operators or planning engineers. Note also that the computing cost for scanning a large number of attributes is almost negligible with respect to the overall computation involved by the construction of a DT (see Section 4.3.2).
3.3. On the construction of L. This key issue has partly been answered in Wehenkel et al. (1987a) where a method for constructing an initial L is proposed in the case of simplified power system models. The question of more refined models and that of expanding L during the building of a DT are currently under investigation (see Section 5).
4. Proposed methodology
We point out the main modifications we have brought to ID3 in order to meet the specifics of our application domain. A technical description of the resulting algorithms is given in Wehenkel et al. (1989).
4.1. Specifics of the TSA problem. Power system OPs are characterized by a large number of numerical continuous variables. The robustness of the system varies gradually, rather than crisply, with them.
Some have a very important, predominant effect on the stability, while others (a majority) have only a marginal one, or even no influence at all. However, the a priori identification of the relevant attributes is extremely hard. One of the major reasons for choosing a method of the TDIDT family is its suitability to properly select them. The other salient particularity of TSA is that the construction of a representative L is part of the design problem. This introduces an additional difficulty but also provides the freedom to choose L in an optimal fashion, tuned to the particular classification problem of concern. Our approach to this problem is an iterative one: an initial (reasonable but probably incomplete) L is built on the basis of a priori knowledge of the system. A first DT is then built on this basis in such a way as to enable the assessment of the information contained in L and also the validity and representativity of this information. This provides guidelines for generating, if necessary, additional learning states, and allows also the identification of the relevant attributes.
4.2. Adaptation of ID3.
4.2.1. Overall approach. For a given L, the building of a DT follows the procedure described below. • The learning states are analysed in order to select a test which provides a maximum amount of information about the classification of the objects; for each attribute several candidate tests may be considered, and the selection proceeds in two steps: (1) for each attribute the optimal test on its values is selected; (2) among the different attributes, the best one, along with its optimal test, is chosen for splitting the node. In our case of continuous attributes, step 1 consists in selecting an optimal dichotomic test of the type $t = [a_i(o_k) \le v_{i,\mathrm{th}}\,?]$
candidate thresholds are taken arbitrarily as the intermediate values of two successive $v_{ij}$ values; there are at most N - 1 different thresholds, each one of which defines a dichotomic test of type (6); the optimal threshold is chosen according to an information theory criterion adapted from ID3 (Wehenkel et al., 1989). • The selected test is applied at the root of the tree on the set L which is split into two subsets corresponding to the two successors of the root:
$L_1 \triangleq \{o \in L \mid a_*(o) \le v_{*,\mathrm{th}}\}$;  $L_2 \triangleq \{o \in L \mid a_*(o) > v_{*,\mathrm{th}}\}$.
• The successors are labelled terminal or not on the basis of the "stop splitting" rule described below. • For the non-terminal nodes, the overall procedure is called recursively in order to build the corresponding subtrees. • For the terminal nodes the class probabilities $p_+$ and $p_-$ are estimated on the basis of the corresponding subset of learning states stored there. The algorithm of the above recursive procedure is given in Wehenkel et al. (1989). Its skeleton is similar to ID3. The main differences lie in the "stop splitting" and "optimal splitting" rules.
4.2.2. Stop splitting rule. ID3 stops splitting at a node only if the corresponding subset of L is completely included in one of the classes of the goal partition. If L is truly representative of the universe, this produces a DT which also performs well with respect to unseen situations. Otherwise, as shown by our experience, this strategy tends to build overly complex DTs, the majority of terminal nodes of which contain only a very small and unrepresentative sample of learning states. They generally perform badly with respect to unseen situations and are unable to reliably indicate the inherent relationship between the attribute values and the goal classification. To circumvent the above difficulty we propose the following, more conservative, stop splitting rule: • if the subset of the learning states corresponding to a node is included in one of the classes $U_+$ or $U_-$ then the node is a leaf; • if it is impossible to find a way of splitting this subset and separate its states belonging to the classes $U_+$ and $U_-$ in a statistically significant way, then the node is of type deadend (the statistical significance test is based on a $\chi^2$ test (Wehenkel et al., 1989)); • otherwise, the node is nonterminal. Hence, the leaves are terminal nodes which identify the representative regions of L whereas deadends correspond to unrepresentative ones.
4.2.3. Optimal splitting rule.
Deadends must be avoided to the extent possible. To this purpose, an optimal splitting rule has been developed which chooses the test at the non-terminal nodes on the basis of information theory (Wehenkel et al., 1989).
4.3. Illustrative examples.
4.3.1. The one-machine-infinite-busbar system. This academic type example, sketched in Fig. 2, has been treated in Wehenkel et al. (1987b), by means of ID3 modified so as to accept continuous type attributes (see Section 4.2.1). The stop splitting criterion used is also similar to Quinlan's (Michalski et al., 1984); it consists of splitting the nodes until they contain objects of a single class, regardless of the representativity of the information contained in L. Note that this method has been used successfully in the case of this
(6)
in the following way: let $\{v_{i1}, \ldots, v_{iN}\}$ be the values taken by $a_i(\cdot)$, sorted by increasing order of magnitude; the
FIG. 2. One-line diagram of the one-machine-infinite-busbar system.
very small power system, where large enough learning and test sets could easily be generated. This extremely simple system has only three static variables, namely: P, the power delivered by the (lossless) machine to the system; $V_m$, the voltage at the machine external busbar; V, the voltage at the infinite busbar. One hundred prefault states (or OPs) are computed, where the above parameters assume values within the following ranges, dictated by physical considerations: [0.9...1.1] p.u. for $V_m$ and V; [0.3...0.7] p.u. for P. The considered disturbance is a three-phase short-circuit applied at the machine's busbar. The stability of the system in its different states is assessed by computing the corresponding CCT via the equal area criterion, well known to electrical engineers. The threshold clearing time is arbitrarily chosen to be $\tau = 0.32$ s. Out of the 100 states, 52 are found to be stable (their CCT is larger than $\tau$) and 48 unstable (CCT < $\tau$). L is composed of the above 100 preclassified OPs. Together with a list of candidate attributes, it is used to build a DT. The candidate attributes are chosen to be the three static variables P, $V_m$, V. Let us illustrate the building procedure, by describing the expansion of the top node of the tree. The entropy of L is 0.999 bits; it corresponds to the sharing of 52% stable vs 48% unstable cases. (It would have been 1.00 bits in the 50-50% sharing.) For each candidate attribute (P, $V_m$ and V), the best threshold value is computed. For example, Fig. 3 shows the a posteriori entropy $H(L \mid (P; P_s))$ as a (smoothed) function of the threshold value $P_s$ of P; one can see that the optimal threshold value is 0.532 p.u. and allows a reduction of 0.79 bits of the entropy of L. The same procedure was also applied to the attribute V (resp. $V_m$), yielding a reduction of 0.07 (resp. 0.02) bits of the entropy, corresponding to the threshold values of 0.995 (resp. 0.916) p.u.
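The figures quoted above can be checked, and the threshold search illustrated, with a short sketch. It assumes the plain ID3 information gain as the selection criterion (the paper's adapted criterion is described in Wehenkel et al. (1989) and is not reproduced here); the function names `entropy` and `best_threshold` and the five-point data set are hypothetical.

```python
from math import log2


def entropy(n_plus, n_minus):
    """Entropy (bits) of a two-class set given its class counts."""
    n = n_plus + n_minus
    h = 0.0
    for k in (n_plus, n_minus):
        if k:
            h -= (k / n) * log2(k / n)
    return h


def best_threshold(values, classes):
    """Scan the midpoints of successive sorted values of a continuous
    attribute and return (threshold, gain) maximizing the information gain.
    Assumption: plain ID3 gain, not the authors' adapted criterion."""
    pairs = sorted(zip(values, classes))
    n = len(pairs)
    tot_plus = sum(1 for _, c in pairs if c == "+")
    h_root = entropy(tot_plus, n - tot_plus)
    best = (None, 0.0)
    left_plus = 0
    for i in range(n - 1):
        if pairs[i][1] == "+":
            left_plus += 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold between equal values
        n_left, n_right = i + 1, n - (i + 1)
        right_plus = tot_plus - left_plus
        h_cond = (n_left / n) * entropy(left_plus, n_left - left_plus) \
            + (n_right / n) * entropy(right_plus, n_right - right_plus)
        gain = h_root - h_cond
        if gain > best[1]:
            best = ((pairs[i][0] + pairs[i + 1][0]) / 2, gain)
    return best


# Entropy of the learning set of the example: 52 stable vs 48 unstable.
h_L = entropy(52, 48)

# Hypothetical five-point data set: stable ('+') at low P, unstable at high P.
threshold, gain = best_threshold([0.3, 0.4, 0.5, 0.6, 0.7],
                                 ["+", "+", "+", "-", "-"])
```

Rounded to three decimals, `h_L` reproduces the 0.999 bits stated in the text; on the toy data the selected threshold is the midpoint between 0.5 and 0.6.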
Consequently, P appears to be the best attribute; the first node is accordingly expanded, yielding its successors 1 and 2 along with the corresponding subsets
$L_1 = \{o \in L \mid P(o) \le 0.532\}$;  $L_2 = \{o \in L \mid P(o) > 0.532\}$.
Since all the states in $L_2$ are unstable, 2 is an "unstable" terminal node (leaf). 1 is a non-terminal node, needing further expansion. The overall DT is sketched in Fig. 1. Observe that P appears to be the attribute which most affects the stability; this is in agreement with physical understanding. The reliability of this DT has been tested by means of a very large number of OPs, other than those of L. The total error rate is somewhat smaller than 2.7%.
4.3.2. The 14-machine system. This real world system corresponds to an old version of the high voltage reduced Greek network. It is composed of 14 equivalent machines, 92 busbars and 112 branches; the considered topology is given in Wehenkel et al. (1987a).
Its treatment is based on the II approach outlined in Section 4.2.1, incorporating the improved "stop splitting" rule of Section 4.2.2 and also the "optimal splitting" rule. This allows the building of a DT which reflects only the statistically significant information contained in L. This is necessary, since one can no longer guarantee that L is completely representative: the approach used in the one-machine case would be unsuccessful here, because prohibitively large learning and test sets would have to be generated. Two hundred and one OPs have been computed by using the procedure proposed in Wehenkel et al. (1987a). The considered disturbance is a three-phase short-circuit applied at the busbar of generator No. 34. The stability of the OPs is assessed by computing their CCT via a time-domain numerical program. Three different classification patterns have been tried, namely:
• a two-class case with threshold CCT value fixed at $\tau = 0.32$ s;
• a three-class case with threshold CCT values fixed at $\tau_1 = 0.28$ s, $\tau_2 = 0.32$ s;
• a four-class case with threshold CCT values fixed at $\tau_1 = 0.28$ s, $\tau_2 = 0.32$ s, $\tau_3 = 0.37$ s.
Ten candidate attributes have been proposed. Of them, only three have been selected and used in the DTs. These and related results are reported in Wehenkel et al. (1989). Figure 4 reproduces the DT corresponding to the four-class classification. As indicated in the figure, among the terminal nodes, several are of the type deadend. They correspond to subsets of L which are not representative enough to allow further splitting in a statistically significant way. These nodes contain a mixture of states of different classes, whereas the leaves contain states of a single class only, and correspond to the representative parts of L. Let us also indicate that the DTs obtained in the two- and three-class cases have 15 and 21 nodes, respectively.
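The distinction between leaves, deadends and non-terminal nodes (Section 4.2.2) can be sketched for the two-class case as follows. The exact significance test used by the authors is given in Wehenkel et al. (1989) and is not reproduced here; this sketch assumes a Pearson chi-square statistic on the 2x2 table (test outcome vs class) compared with 3.841, the 5% critical value for one degree of freedom. The function names `chi_square_2x2` and `node_type` are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic of the table [[a, b], [c, d]]
    (rows: the two outcomes of a dichotomic test; columns: classes + and -)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate split: one empty row or column
    return n * (a * d - b * c) ** 2 / denom


def node_type(n_plus, n_minus, candidate_splits, critical=3.841):
    """Label a node as 'leaf', 'deadend' or 'non-terminal'.

    candidate_splits lists, for each candidate test, the class counts
    (plus, minus) of the learning states sent to the first successor.
    Assumption: 3.841 is the 5% critical value with 1 degree of freedom.
    """
    if n_plus == 0 or n_minus == 0:
        return "leaf"  # subset included in a single class
    for left_plus, left_minus in candidate_splits:
        stat = chi_square_2x2(left_plus, left_minus,
                              n_plus - left_plus, n_minus - left_minus)
        if stat > critical:
            return "non-terminal"  # a statistically significant split exists
    return "deadend"


pure = node_type(10, 0, [])
separable = node_type(20, 20, [(20, 0)])
too_small = node_type(3, 3, [(2, 1)])
```

A node holding only a handful of mixed states, such as the last one, cannot be split significantly and is therefore kept as a deadend rather than expanded.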
Concerning the computing effort, the building of a DT has required a CPU time almost negligible with respect to that required for the construction of L (to fix ideas, approximately 3 s for the former vs 3 h for the latter, on a 1 MIPS DEC-20 computer).
4.3.3. Discussion. The above short description suggests some interesting observations (Wehenkel et al., 1989).
FIG. 3. A posteriori entropy as a smoothed function of Ps relative to node 0. [Plot not reproduced: vertical axis, a posteriori entropy in bits; horizontal axis, Ps in p.u.]
FIG. 4. A four-class DT built for the 14-machine power system (taken from Wehenkel et al. (1989)): P, active power at busbar No. 34 in MW; V1, voltage magnitude at busbar No. 34 in p.u.; V2, voltage magnitude at a neighbouring busbar in p.u.; *, identifies the deadends. [Tree diagram not reproduced: the classes shown are Unstable, Fairly stable, Stable and Very stable.]
• The proposed II method provides a natural way of understanding the TSA problem and establishes a clear hierarchy in the a priori intricate mechanism of transient stability and its relationship with the predominant static variables of the power system.
• The complexity of the DT structures is rather low; moreover, it increases quite slowly with the system size for a given number of classes, and with the number of classes for a given system. Thus, for a two-class problem, the DTs obtained above comprise 7 nodes and 4 levels in the one-machine system, vs 15 nodes and 6 levels in the 14-machine system; similarly, for the 14-machine system the number of nodes is respectively 15, 21 and 27 in the two-, three- and four-class problems.
• The building of DTs is extremely inexpensive; moreover, its cost is shown to increase only linearly with the number of tested attributes.
5. Extensions, justifications, further investigations
The following considerations are deliberately restricted to aspects concerning power system TSA.
5.1. Extensions to the II method. A number of rather straightforward extensions can be brought to the basic II algorithm. Below we list the most interesting ones, along with some brief comments.
Iterative enhancement of the DTs. In Section 3, we noted that a major problem inherent to the II approach is the building of a learning set that is representative yet of acceptable size. The iterative scheme proposed in Wehenkel et al. (1989) could alleviate this difficulty.
Qualitative and hierarchical attributes. Although at this early stage of our research the topology has been fixed to make the TSA problem easier, real life conditions require the consideration of many different topologies of a power system. Moreover, the topology can have a major influence on the degree of the system robustness. Also, due to the very large number of possible topologies, it would be unrealistic to approach this problem by building a different DT for each topology and disturbance. It will thus be necessary to handle TSA problems where the topology may vary, at least to some extent, in order to reduce the total number of DTs to a reasonable value. To handle such problems, the II method should be able to take into account the qualitative and possibly hierarchical attributes characterizing the topology. For each of these attributes, it would be sufficient to define a set of candidate splits of its set of possible values.
Detection of multiple correlations. So far, the II method considers only tests on the values of a single attribute at a time, and at a given node it chooses a test on the sole basis of the information that this test provides about the goal classification. This is equivalent to choosing a test on the basis of the direct successors it generates.
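The single-attribute selection just described can be illustrated by a minimal Python sketch. It assumes that the information provided by a test is measured as the reduction in Shannon entropy of the class labels between a node and its direct successors; the exact splitting and stopping rules of Section 4.2 are not reproduced, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information (bits) provided by the test `value < threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    posterior = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - posterior


def best_single_attribute_test(samples, labels):
    """Scan every attribute and candidate threshold; keep the best test.

    `samples` is a list of dicts mapping attribute names to numeric values.
    Returns (gain, attribute, threshold), or (0.0, None, None) if no test
    provides any information.
    """
    best = (0.0, None, None)
    for attr in samples[0]:
        values = [s[attr] for s in samples]
        distinct = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            gain = information_gain(values, labels, threshold)
            if gain > best[0]:
                best = (gain, attr, threshold)
    return best
```

Only the direct successors of the node enter the criterion. On a learning set where the class depends solely on a combination of two attributes (a balanced exclusive-or pattern, for instance), every single-attribute test yields zero gain under this measure, so the function returns no test at all.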
We already mentioned that this strategy may prevent the detection of intricate multiple correlations. In extreme cases this may result in overcomplicated and unreliable DTs. Several ways may be thought of to improve the II procedure from this point of view: (i) modify the search strategy in order to relax its irrevocable character; (ii) allow tests at non-terminal nodes on the values of combinations of attributes (e.g. for numerical attributes, linear combinations can be sought which would provide more information about the goal classification than any attribute taken alone). The above approaches are both combinatorial in nature. However, since the learning task is achieved off-line, and since in the context of TSA the substantial computational expense of building and preanalysing the learning set is unavoidable anyhow, it is possible to handle the multiple correlations, at least partly, without a significant increase of the off-line CPU time. Moreover, this can have a beneficial effect on the size of the learning and test sets. Indeed, smaller and better DTs can be discovered, with fewer terminal nodes and a lower rate of deadends.
Use of TSA specific knowledge. TSA specific knowledge can be used to complete the information provided by the learning set. This is particularly true if multiple correlations
have to be detected. Due to the very local nature of the TSA problem, the set of candidate combinations is likely to be restricted to a relatively small and tractable number.
5.2. Justifications. The fundamental characteristics of the proposed method reside in the fact that it is nonparametric and sequential, and that it produces decision rules in a hierarchical fashion. For the sake of simplicity, we will discuss the consequences of these features separately, although they are interrelated in many respects.
Nonparametric. This means that no special parametric family of decision functions is assumed a priori. Note that this is one of the main differences between the mathematical pattern recognition and the heuristic II approaches to the classification problem. This feature greatly facilitates the choice of an adequate representation of the objects, i.e. of the attributes. Indeed, the sole requirement on these is that they carry sufficient information for discrimination among objects of different classes; besides, no constraints are imposed on the way this information is shared among the various attributes. In the context of our problem, this allows us to use static power system variables, which carry a great deal of information, but whose intricate relationships with transient stability would prevent the stability boundary from fitting a particular, linear or non-linear, a priori known family of discrimination boundaries. A parametric approach would be extremely difficult to generalize: the strongly non-linear character of transient stability phenomena would prevent such an approach, developed within a particular case study, from fitting other cases.
Sequential. The sequential nature of both the learning (construction of DTs) and classification (use of DTs) processes provides the following important advantages.
During the learning phase. Only attributes useful for discrimination are used in the formulation of the DT.
Consequently, the addition of new attributes can only have a beneficial effect on the resulting rule: either they are more discriminating, and the II method will use them and build a better tree, or they are less discriminating and will never be selected. This is an advantage of the proposed method: globally redundant attributes cannot cause a significant degradation of the decision rules. An even more significant advantage arises when the information provided by a given attribute depends on the values taken by other attributes: the optimal subset of attributes cannot be defined globally for the overall universe. In such cases, the conventional feature selection methods will result in large sets of attributes, masking each other in the decision function. In the proposed method, on the contrary, different attributes would be used in the different regions of the universe.
During the decision phase. Only tests relevant to the analysed OP are applied sequentially. Each test provides some additional information, and the process stops only when the gathered information is deemed to be sufficient to decide about the class of the OP. Since the DT has been designed off-line, this process is straightforward and very efficient.
Hierarchical. The product of induction is explicitly represented by a hierarchy of tests which are dual to a hierarchical partition of the universe and correspondingly of L. These partitions are formulated explicitly in terms of the values of the relevant attributes, selected according to a local criterion attempting to separate, to the extent possible, the stable and unstable states at each non-terminal node of the tree. This feature allows the interpretation and improvement of individual subtrees.
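The decision phase can be sketched as a simple descent through the tree. The following Python fragment is a minimal illustration under stated assumptions: the `Node` structure, the `classify` function and the toy tree are hypothetical, the attribute names P and V1 are borrowed from Fig. 4, and the thresholds and class probabilities are invented for the example rather than taken from the DTs of Section 4.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    # Non-terminal node: the test is "op[attribute] < threshold".
    attribute: Optional[str] = None
    threshold: Optional[float] = None
    below: Optional["Node"] = None  # successor when the test holds
    above: Optional["Node"] = None  # successor otherwise
    # Terminal node (leaf or deadend): class probabilities.
    class_probs: Optional[Dict[str, float]] = None


def classify(op: Dict[str, float], root: Node) -> Tuple[Dict[str, float], List[str]]:
    """Apply tests sequentially until a terminal node is reached.

    Returns the class probabilities of that node and the list of
    attributes actually tested along the way.
    """
    node = root
    tested: List[str] = []
    while node.class_probs is None:
        tested.append(node.attribute)
        node = node.below if op[node.attribute] < node.threshold else node.above
    return node.class_probs, tested


# Hypothetical two-test tree; thresholds and probabilities are made up.
TOY_TREE = Node(
    attribute="P", threshold=300.0,
    below=Node(class_probs={"stable": 1.0}),
    above=Node(
        attribute="V1", threshold=1.0,
        below=Node(class_probs={"unstable": 1.0}),
        above=Node(class_probs={"stable": 0.6, "unstable": 0.4}),  # deadend
    ),
)
```

An OP whose P lies below the first threshold is classified after a single test, whatever the values of its other attributes; an OP reaching the deadend receives class probabilities rather than a single label.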
Furthermore, it provides a direct relation between the parameters of the DT (the tests and threshold values at non-terminal nodes and the class labels or probabilities at terminal nodes) and the data which allowed the inference of these parameters (the subset of L relative to the concerned node). This should allow one to assess the reliabilities of the individual subtrees, via appropriate statistical methods. It should also allow the evaluation and improvement of the representativity of L. In the context of TSA, where the construction of L is part of the design
problem, this is a more important capability than in many other application domains (such as medical diagnosis), where L is given.
5.3. Further investigations. Many interesting research directions have been indicated up to now. In the following we identify some near future research topics. A key concern is the building of learning and test sets. Indeed, we should explore our iterative scheme and its capability to converge towards sufficiently reliable DTs by means of learning sets of relatively restricted size. This problem, especially stringent in the context of power system TSA, is also the major motivation for enhancing the II method and for searching for a test methodology able to provide reliable estimates of the class probabilities on the basis of small test sets. It is also important to investigate the effect of allowing topology variations. Since, in its principle, the II approach is able to handle arbitrary modelling sophistication, it would be interesting to confront it with more refined models and large-scale power systems. Another open question concerns the use of DTs for on-line analysis and preventive control purposes: appropriate strategies are needed, able to extract and combine the elementary information provided by the different DTs. Some preliminary results relative to this problem are given in Wehenkel et al. (1988).
6. Conclusion
A new approach to on-line transient stability assessment of power systems has been proposed. The novelty lies in both the way the physical problem has been defined, and the technique used to solve it. The main concern of this paper has been the application of an inductive inference method in conjunction with analytic dynamic models and numerical simulations to the automatic building of DTs. The accent has mainly been put on the reasons for choosing this specific method. Its salient features have been identified, means to overcome difficulties explored, and adaptations to the transient stability problem suggested. Another aim of the paper was to illustrate the derived method's feasibility and intrinsic behaviour. Still in its infancy, the proposed approach has the ultimate objective of going beyond the existing ones, thanks not only to the on-line target of its analysis aspects but also, and even more, to its control capabilities. Finally, we note that the approach devised in this paper has an applicability domain substantially broader than transient stability assessment alone.
References
Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984). Classification and Regression Trees. Belmont, Wadsworth.
EPRI Project (1987). Dynamic security assessment for power systems: research plan. Project EL-4958, Aug.
Kononenko, I., I. Bratko and E. Roskar (1984). Experiments in automatic learning of medical diagnosis rules. Technical Report, Jozef Stefan Institute, Ljubljana, Yugoslavia.
Michalski, R. S., J. G. Carbonell and T. M. Mitchell (1984). Machine Learning. An Artificial Intelligence Approach. Springer, Berlin.
Pao, Y. H., T. E. Dy Liacco and I. Bozma (1985). Acquiring a qualitative understanding of system behaviour through AI inductive inference. Proc. IFAC Symp. on Electric Energy Systems, Rio de Janeiro, Brazil, pp. 35-41.
Prabhakara, F. S. and G. T. Heydt (1987). Review of pattern recognition methods for rapid analysis of transient stability. In Rapid Analysis of Transient Stability (System Dynamic Performance Subcommittee of the IEEE PES; W. W. Price, Chairman of the Task Force), Publication No. 87TH0169-3-PWR.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
Rendell, D. (1986). A general framework for induction and a study of selective induction. Machine Learning, 1, 177-220.
Ribbens-Pavella, M. and F. J. Evans (1985). Direct methods for studying dynamics of large-scale electric power systems - a survey. Automatica, 21, 1-21.
Wehenkel, L., Y. Xue, Th. Van Cutsem and M. Ribbens-Pavella (1987a). Machine learning applied to power systems transient security functions. Proc. IMACS Symp. on AI, Expert Systems and Languages in Modelling and Simulation, Barcelona, Spain, June, pp. 243-248.
Wehenkel, L., Th. Van Cutsem and M. Ribbens-Pavella (1987b). Artificial intelligence applied to on-line transient stability assessment of electric power systems. Proc. 10th IFAC World Congress, Munich, Germany, July, pp. 308-313.
Wehenkel, L., Th. Van Cutsem and M. Ribbens-Pavella (1988). Decision trees applied to on-line transient stability assessment of power systems. Proc. IEEE Int. Symp. on CAS, Helsinki, Finland, June, Vol. 2, pp. 1887-1890.
Wehenkel, L., Th. Van Cutsem and M. Ribbens-Pavella (1989). An artificial intelligence framework for on-line transient stability assessment of power systems. IEEE Trans. Pwr Systems, PWRS-4(2), 789-800.
Xue, Y., Th. Van Cutsem and M. Ribbens-Pavella (1988). A simple direct method for fast transient stability assessment of large power systems. IEEE Trans. Pwr Systems, PWRS-3(2), 400-412.