Copyright © IFAC Control of Power Systems and Power Plants, Beijing, China, 1997
HYBRID ADAPTIVE NEAREST NEIGHBOR APPROACHES TO DYNAMIC SECURITY ASSESSMENT
Isabelle Houben, Louis Wehenkel*, Mania Pavella
University of Liege - Sart-Tilman, B28, B-4000 Liege, Belgium
* Research Associate, F.N.R.S.
Abstract: We develop a general hybrid k Nearest Neighbors (kNN) approach, where kNNs take advantage of problem-specific information provided by decision trees and of general-purpose optimization provided by genetic algorithms. This general methodology is then adapted to two concerns of power system dynamic security that kNNs are conceptually well suited to handle. One question of paramount importance is how to detect outliers; these are cases "too far away" from the preanalyzed cases of the data base used to train kNNs. The other question is how to avoid dangerous diagnostics which could arise from an erroneous identification of the relevant majority class of neighbors. In this paper, these two questions are tackled in the context of transient stability and illustrated on the Hydro-Quebec power system. Copyright © 1998 IFAC
Keywords: Nearest neighbor techniques; decision trees; genetic algorithms; detection of outliers; dynamic security assessment.
1. INTRODUCTION

kNN techniques belong to the class of statistical pattern recognition which, together with machine learning and artificial neural networks, forms the general framework of automatic learning methods. All these methods consist of learning from examples, i.e. extracting information from already known situations gathered in the learning set. But, unlike other automatic learning methods which compress the case-by-case information into synthetic rules, kNNs learn in a case-by-case fashion; they are thus able to complement them. Among such complementary possibilities, we mention: detection of outliers, i.e. of cases not well enough represented in a data base and hence difficult to assess properly via this set; flexible procedures designed to favour a class over the other(s) so as to avoid dangerous diagnostics.

Despite these attractive features, the development of kNNs has been quite slow as compared to the outburst of other automatic learning techniques, such as artificial neural networks. This may be due to the fact that, despite their conceptual simplicity (assess an unknown case by comparing it with its known nearest neighbors), kNNs encounter modelling difficulties. Indeed, the kNN design relies on the choice of three essential sets of parameters, namely: the choice of k, the number of nearest neighbors; the choice of the attributes and of their weights, i.e. of the distance of a given case to its k nearest neighbors. Note that kNNs are very sensitive to these choices; in particular, their performance deteriorates very rapidly when the number of non-discriminant or redundant attributes increases. Obviously, handling the intricate problem of power system dynamic security assessment (DSA) requires proper means of optimization. Decision trees (DTs) and genetic algorithms (GAs) are such means, already proposed in various contexts (Aha and Kibler, 1987; Kelly and Davis, 1991; Goodman and Punch, 1993; Cardie, 1994).

In the particular context of transient stability, attempts were made to use a combination of DTs and GAs (Houben et al., 1995; Wehenkel et al., 1995). Since then, this kNN-DT-GA method has gained in maturity and robustness (Houben et al., 1997). Nowadays the method appears to be ready for use. This paper essentially aims at exploring its potential in areas where kNNs promise to be superior to other methods.

The paper is organized as follows. Section 2 briefly describes the kNN-DT-GA method, lists interesting variants and assesses their performances in the context of transient stability assessment of the Hydro-Quebec power system. Section 3 applies the resulting kNN method to the detection of outliers. In Section 4 we show how the method can be adapted to remove dangerous diagnostics.
2. THE kNN-DT-GA METHOD IN SHORT
2.1 General description (Houben et al., 1997)
The design of kNN models requires the choice of the following sets of parameters: (i) the attributes in terms of which a case (i.e. an operating state) may be suitably described; (ii) the attribute weights, which reflect the attribute contributions to the phenomenon of concern (here, transient stability); (iii) the number k of nearest neighbors which statistically are the most similar to the case under investigation. Note that the k nearest neighbors depend on the distance measure in the attribute space, computed in terms of the attributes and their weights. For example, such a measure is provided by the Euclidean distance expressed by
d(o_1, o_2) = \sqrt{ \sum_{i=1}^{n} w_i \, [a_i(o_1) - a_i(o_2)]^2 }        (1)
where a_i(o) denotes the value of attribute i for object o, w_i its weight, and n the total number of attributes. Admittedly, "candidate" attributes may be suggested by experience. But in order to choose the most relevant of them and their weights, one has to resort to a systematic and objective selection. Such a selection is provided by DTs, which carry problem-dependent information, and/or GAs, which are problem-independent optimization techniques. Actually, Houben et al. (1997) have shown that optimal solutions are provided by the sequential combination of: (i) DTs, which identify the most relevant attributes and appraise their information quantity (used as initial guess of the corresponding weights; see §2.2 below); (ii) GAs, which optimize these two sets of parameters. Such combinations are schematically described by the tree-like structure of Fig. 1: the rightmost path of this figure yields the simplest - and least efficient - kNN models, the leftmost path the most sophisticated and efficient ones. The numbers in parentheses refer to the rows of Table 1 (see below). The resulting kNN-DT-GA algorithms may further give rise to many variants, depending upon whether one considers "global" or "local" optimizations; global optimization is meant to be distance optimization on the entire learning set, whereas local optimization refers to distance optimization carried out on parts of the learning set, the partition being defined by the decision tree test nodes.
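To make the above concrete, the following minimal Python sketch (an illustration with placeholder data, not the authors' implementation) classifies a state by a majority vote among its k nearest neighbors under the weighted Euclidean distance of Eq. (1).

import numpy as np

def weighted_knn_classify(x, LS_X, LS_y, weights, k):
    """Classify state x by majority vote among its k nearest neighbors
    in the learning set, using the weighted Euclidean distance of Eq. (1)."""
    d = np.sqrt(((LS_X - x) ** 2 * weights).sum(axis=1))   # distance to every learning state
    nearest = np.argsort(d)[:k]                            # indices of the k nearest neighbors
    values, counts = np.unique(LS_y[nearest], return_counts=True)
    return values[np.argmax(counts)]                       # majority class

# toy example (placeholder data: 2 attributes, classes 0 = stable, 1 = unstable)
LS_X = np.array([[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.8, 0.9]])
LS_y = np.array([0, 0, 1, 1])
w = np.array([1.0, 0.5])       # attribute weights (e.g. DT information quantities)
print(weighted_knn_classify(np.array([0.85, 0.7]), LS_X, LS_y, w, k=3))   # -> 1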
Fig. 1. Variants of the kNN method (design choices: DT-based attribute selection, attribute weighting, GA optimization with cross-validation, global or local distance optimization; resulting variants: pure kNN, kNN-GA with weights, kNN-DT without and with IQ weights, kNN-DT-GA without and with weights; the numbers refer to the rows of Table 1).
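As a rough illustration of the DT-then-GA pipeline of §2.1 (a sketch under simplifying assumptions, not the exact algorithm of Houben et al. (1997)), a genetic algorithm can evolve the attribute-weight vector, seeded near the DT information quantities, using leave-one-out kNN accuracy on the learning set as fitness:

import numpy as np

rng = np.random.default_rng(0)

def loo_accuracy(weights, X, y, k):
    """Leave-one-out kNN accuracy on the learning set for a given weight vector."""
    hits = 0
    for i in range(len(X)):
        d = np.sqrt(((X - X[i]) ** 2 * weights).sum(axis=1))
        d[i] = np.inf                              # a state cannot be its own neighbor
        nearest = np.argsort(d)[:k]
        values, counts = np.unique(y[nearest], return_counts=True)
        hits += values[np.argmax(counts)] == y[i]
    return hits / len(X)

def ga_optimize_weights(X, y, iq_weights, k=3, pop_size=20, generations=30):
    """Toy GA: weight vectors seeded near the DT information quantities,
    tournament selection, uniform crossover, Gaussian mutation, elitism."""
    n = len(iq_weights)
    pop = np.abs(iq_weights + 0.1 * rng.standard_normal((pop_size, n)))
    for _ in range(generations):
        fit = np.array([loo_accuracy(w, X, y, k) for w in pop])
        new_pop = [pop[np.argmax(fit)]]            # keep the best individual
        while len(new_pop) < pop_size:
            parents = []
            for _ in range(2):                     # two tournaments of size 2
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fit[i] >= fit[j] else pop[j])
            mask = rng.random(n) < 0.5             # uniform crossover
            child = np.where(mask, parents[0], parents[1])
            child = np.abs(child + 0.05 * rng.standard_normal(n))   # mutation
            new_pop.append(child)
        pop = np.array(new_pop)
    fit = np.array([loo_accuracy(w, X, y, k) for w in pop])
    return pop[np.argmax(fit)], fit.max()

# toy usage with placeholder data (3 attributes; iq_weights mimic DT information quantities)
X = rng.random((40, 3))
y = (X[:, 0] + 0.2 * X[:, 1] > 0.6).astype(int)
best_w, best_acc = ga_optimize_weights(X, y, iq_weights=np.array([0.6, 0.2, 0.2]))
print(best_w, best_acc)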
2.2 Illustration on a particular stability problem

To fix ideas, we consider a transient stability problem of the Hydro-Quebec power system concerning its 22-North configuration (Wehenkel et al., 1995). A data base was generated for this problem; it contains 3,403 states, classified into two classes, stable and unstable (respectively 41 % and 59 % of the cases (1)).

Out of the 3,403 states, 2,746 were used in the learning set (LS) to train the tree, the remaining 657 in the test set (TS) to assess its accuracy. The resulting tree exhibits the following main features (see Fig. 2):
• number of candidate attributes: 74
• number of test attributes: 11
• information quantity carried by the 11 test attributes: 91.31 % of the total information quantity contained in the data base.
The obtained tree is portrayed in Fig. 2. This figure also indicates that the tree classifies correctly 96.5 % of the test states. It further indicates the information quantity (IQ) found to be carried by each test attribute. This IQ actually expresses the relative contribution of a test attribute to the stability problem of concern (2). For example, the attribute "Trbj+b*Nb_Comp" conveys about 61 % of the total information quantity of all 11 test attributes. Based on the above tree and its features (test attributes and their IQs), we have developed 8 different variants, in addition to the pure kNN version. They are summarized in Table 1 and briefly commented below.

(1) A case is considered to be stable if it is stable with respect to all the considered contingencies, unstable otherwise. This explains the high proportion of unstable cases contained in the data base.
(2) Note that whenever an attribute is used more than once, its total IQ is the sum of the partial IQs at the various test nodes where it appears.
(3) Each attribute is normalized by its standard deviation computed in the learning set.
Table 1. Simulation results

Row  Method      k        Pe_LS (%)  Pe_TS (%)  # Attr.
1    Pure kNN    11       13.6       12.0       74
2    kNN-GA      15       9.2        9.3        74
3    kNN-DT      7        5.8        5.5        11
4    kNN-DT+IQ   15       3.0        2.7        11
5    kNN-DT-GA   1 to 15  1.5        2.7        11
6    kNN-DT-GA   1 to 15  1.2        2.1        11
7    kNN-DT-GA   1 to 15  1.9        2.4        74
8    kNN-DT-GA   1 to 15  1.3        3.8        74
9    kNN-FSS     5        3.4        3.4        7
Fig. 2. Decision tree built from the 22-North data base (2,746 learning states: 1,630 unstable and 1,116 stable; test set misclassification rate Pe = 3.5 %). Information quantities of the main test attributes: Trbj+b*Nb_Comp 60.9 %, Tr7062 17.3 %, Nb_LiSo 1.9 %, Trbjo+b*Trbje 1.5 %, L7090 1.2 %, Trbje 0.6 %, Nb_Comp 0.6 %, Plg3 0.6 %, Tr7045 0.5 %.

Row 1, Pure kNN: uses all 74 candidate attributes, uniformly weighted (3).

Row 2, kNN-GA: uses all 74 candidate attributes with weights optimized by a GA. It is reported for the sake of comparison rather than for its own interest.

Row 3, kNN-DT: uses the tree test attributes, uniformly weighted.

Row 4, kNN-DT+IQ: same as in row 3, but the test attributes are weighted by their respective IQs.

Rows 5, 6, kNN-DT-GA: use the 11 test attributes; the GAs optimize their weights, starting with the attributes' IQs. The difference between the two approaches is the way the coded weights are distributed (Houben et al., 1997).

Rows 7, 8, kNN-DT-GA: use all 74 candidate attributes. The GA process is initialized with the IQ values for the 11 test attributes and with zero weight for the remaining ones; the GA optimization thus furnishes the weights of all 74 attributes. Note that variants 5 to 8 refer to local optimizations, because in this stability problem they provide better results.

Row 9, kNN-FSS: uses the 7 attributes selected by a conventional selection method, forward sequential selection (Devijver and Kittler, 1982).

2.3 Accuracy assessment

The criterion for training kNNs is classification accuracy. The distance optimization is performed on the LS and assessed on the TS. More precisely, the LS is used to optimize the parameters of each model (k, the attributes and their weights), the TS to assess their classification ability (or, conversely, the test set misclassification rate).

Table 1 summarizes the obtained results. The meaning of its various columns is quite obvious. The LS error rate (Pe_LS) is assessed using the leave-one-out method (4); Pe_TS is computed by classifying each test state by searching its k nearest neighbors in the LS. Note also that the last column lists the number of attributes proposed to the design of the method (# Attr.).

Obviously, variants 6 and 7 exhibit the best accuracy performances. The "champion" is the kNN-DT-GA method of row 6 (11 weights non-uniformly distributed in the GA process). Note, however, that except for the 3.8 %, the error rates of the other variants do not differ in a statistically significant way (5).

(4) I.e. by classifying the learning states to the majority class among their k nearest neighbors in the LS.
(5) We recall that the test set error rates are estimated with a standard error \sigma_{P_e} = \sqrt{ P_e (1 - P_e) / M }. For Pe = 2 % and M = 657, this gives \sigma_{P_e} = 0.5 %.

2.4 Computing times

Table 2 summarizes the computing times of the several variants. Obviously, the tree building is particularly inexpensive. This, together with its significant contribution to the kNN quality, fully justifies its use for the hybridized kNN designs.

The large CPU times for learning the kNN-DT-GA method come from the number of distance optimizations (one for each node in the tree) and the number of distance evaluations in the GA process (fixed at 500 in our simulations). Note that in the kNN-GA algorithm, only one distance is optimized since no DT information is used. Tables 1 and 2 also show that building first a tree in order to supply information to the GA substantially enhances accuracy with negligible computational overhead. Note also that the number of attributes has a considerable impact on the nearest neighbor search time.

As concerns use, the various algorithms prove to be very efficient, indeed able to comply with real-time requirements of power system security, even though kNNs are much slower than DTs.
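To illustrate footnote (5) (a small sketch; the numbers simply reproduce the footnote's own example), the standard error of a test set error rate can be computed as follows.

import math

def test_error_std(pe, m):
    """Standard error of an error rate pe estimated on a test set of m states."""
    return math.sqrt(pe * (1.0 - pe) / m)

# Example from footnote (5): Pe = 2 %, M = 657 test states
sigma = test_error_std(0.02, 657)
print(f"sigma_Pe = {100 * sigma:.2f} %")   # about 0.5 %: differences of ~1 % are within noise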
Table 2. CPU times

Method         Off-line (h) (1)   On-line (ms) (2)
DT             0.056              0.035
Pure kNN       0.157              210
kNN-DT         CPU(DT) + 0.035    55
kNN-GA         20                 210
kNN-DT-GA (3)  23                 55
kNN-DT-GA (4)  60                 55

(1) CPU on Ultra Sparc (200 MHz) for design, in hours.
(2) CPU on Ultra Sparc for use, in ms/case.
(3) 11 attribute weight optimizations.
(4) 74 attribute weight optimizations.

3. DETECTING OUTLIERS

3.1 Problem statement

A possible cause of automatic learning methods' failure to correctly assess unseen cases is the existence of outliers. These are cases which go beyond the generalization capabilities of a method, because they are not "well enough" covered in the data base used to train it. One reason for this insufficient coverage is that some driving parameters have been overlooked and kept constant while generating the data base; it is then hazardous to establish valid similarities between the investigated case and the cases of this data base. Another reason is that the investigated case is "far away" from all others, i.e. it has no "close enough" neighbors. The former reason comes from insufficient expertise about the phenomena; the latter from insufficient scanning of the attribute space. Our aim is to focus on this latter cause and detect the corresponding outliers. To this end, we first observe that in the general, multidimensional attribute space, it is difficult to visualize a case directly on a two-dimensional plot without prior processing. We therefore propose to determine a distance upper bound in this space and declare "outlier" any case whose distance to its first nearest neighbor is larger than it.

3.2 Detection procedure

More specifically, for a given data base (DB), in order to decide whether a new (unknown) case is a "normal" one (i.e. well enough represented in the DB) or whether it should be labeled as an outlier, we propose the following procedure.
(i) Consider the multidimensional space defined by the entire set of candidate attributes; compute the distance of the DB states to their respective nearest neighbor. Let dmax denote the maximum distance thus found.
(ii) Proceed similarly with the given new state, i.e. compute its distance, du, to its nearest neighbor in the above multidimensional space. If du < dmax, consider it to "belong" to the DB, i.e. to be a "normal" case. Otherwise, declare it to be an outlier.

Remarks

1. The above step (i) considers the entire set of candidate attributes and computes the distance with uniform weights for these attributes. This is because the selected attributes and their weights have been determined in the absence of the cases that one precisely wants to test in order to decide whether they belong to the same "problem" (i.e. DB). Their use would therefore bias this test.
2. The computation of dmax in step (i) has to be performed once and for all, whereas the computation of du in step (ii) has to be repeated with each new case.
3. The above procedure is DB-dependent but independent of the automatic learning method subsequently used to assess (classify) the tested case if it is found to be "normal".
4. Similarly, given new preanalyzed cases, the procedure may be used to test the validity of the DB itself, i.e. to assess whether the latter is "rich enough" and/or whether it is becoming obsolete and should be refreshed.
5. The assessment of an outlier by an automatic learning method trained with the DB of concern is not necessarily wrong; simply, one cannot guarantee its correctness.

3.3 Illustration of the procedure

We illustrate the above procedure on the stability problem of the 22-North configuration of the Hydro-Quebec system described in §2.2. To this purpose, we consider two DBs: (i) the "normal" DB built for this configuration (denoted for short 22N); (ii) an "abnormal" DB, labeled "31-North" (31N for short), built for another stability subproblem of the same power system (Wehenkel et al., 1995); to simplify, we denote this latter DB the set TS2. TS2 is composed of 1,450 states (72 % unstable and 28 % stable ones).

A first series of simulations consists of classifying the cases of TS2 by the DT, the pure kNN and the kNN-DT+IQ methods (see Fig. 1 and rows 1 and 4 of Table 1). This amounts to classifying the 31N cases with methods trained on the 22N learning set. The results are reported in Table 3. The corresponding error rates show that, obviously, the TS2 cases cannot be correctly classified by the 22N learning set. (To ease comparisons, the last column of the table reproduces the error rates obtained with the 22N test set.)

A second series of simulations uses the detection procedure of §3.2 applied in the following way: (i) consider the 22N learning set in the 74-dimensional space of all 74 candidate attributes, with uniform weights; (ii) compute in the above space the distances of all 22N test states to their respective nearest neighbor; (iii) compute in the same space the distances of all 31N (i.e. TS2) states. The bar diagrams of steps (ii) and (iii) are plotted in Fig. 3. They obviously show that the two distributions are well separated. Indeed, choosing dmax = 0.78 will discard all TS2 cases as being "outliers" except for 20. Note, however, that among the latter 20 cases which lie below dmax, only one is misclassified by the methods. More specifically:
• the DT misclassifies one case whose distance is 0.75;
• the pure kNN and kNN-DT+IQ misclassify one case (the same for both methods) whose distance is 0.76.
Actually, the above observations suggest that these 20 cases are not outliers, and corroborate the proposed procedure.
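The detection procedure of §3.2 amounts to a threshold test on first-nearest-neighbor distances; a minimal sketch (with placeholder data, using uniform weights over all candidate attributes as prescribed by remark 1) could look as follows.

import numpy as np

def first_nn_distances(X):
    """Distance of every DB state to its nearest neighbor in the DB (step (i))."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)                # a state is not its own neighbor
    return d.min(axis=1)

def is_outlier(x, X, d_max):
    """Step (ii): the new state x is an outlier if its nearest DB state lies beyond d_max."""
    du = np.sqrt(((X - x) ** 2).sum(axis=1)).min()
    return du >= d_max

# placeholder DB: 200 states described by 74 normalized candidate attributes
rng = np.random.default_rng(1)
DB = rng.normal(size=(200, 74))
d_max = first_nn_distances(DB).max()           # computed once and for all (remark 2)
print(is_outlier(rng.normal(size=74), DB, d_max))          # "normal"-looking state
print(is_outlier(rng.normal(size=74) + 5.0, DB, d_max))    # shifted state -> likely outlier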
Table 3. Simulation results: 31N states classified with models trained with 22N states

Method      Pe_TS2 (%)   Pe_TS (%)
DT          22.4         3.5
Pure kNN    33.5         12.0
kNN-DT+IQ   20.8         2.7

Fig. 3. Distance distribution of the 31N and 22N test states in the candidate attribute space.

Note that a distance using only the test attributes, i.e. the attributes selected by a decision tree built with the 22N learning set, cannot identify the outliers: the two distributions overlap completely, as shown in Figs. 4 and 5. This corroborates remark 1 of §3.2. As a complementary exercise, we have interchanged the two data bases, viz. we have trained the kNN models with a learning set composed of the 31N states and tested the 22N test states; the conclusion is symmetric: here, the outliers are found to be the 22N states (see Fig. 6).

Fig. 4. Distance distribution of the 22N test states in the test attribute space.

Fig. 5. Distance distribution of the 31N test states in the test attribute space.
4. REMOVING DANGEROUS ERRORS

However accurate, an automatic learning approach will never be totally free from misclassification errors. The errors may be of two types: false alarms (FA) and non-detections (ND). A FA corresponds to a stable case classified unstable by the method. Conversely, a ND is an unstable case classified stable by the method; in particular, a very unstable case classified stable is a dangerous error (DE) (Wehenkel et al., 1995).

A condition for the effective application of an automatic learning method to DSA is the avoidance of DEs. kNN methods offer a large variety of possibilities to remove them whenever they appear. Hereafter, we investigate two such kNN approaches.

1. Biased kNNs (α-kNNs): they classify a state after artificially bringing the unstable states closer to this state, by multiplying their distance by a factor α. This bias coefficient may vary between 1 and 0 (it is fixed at 0.8 in our simulations).

2. Weighted kNNs (w-kNNs): the contribution of each neighbor of the state of concern is considered to be inversely proportional to its distance (whatever the definition of the latter) to this state (6).

Note that the kNN-DTs and kNN-DT-GAs can also be biased by α and/or weighted. Table 4 shows the results of some variants. For the kNN-DT-GA methods, we have chosen the two "champions" of Table 1 (i.e. variants 6 and 7). Note that here the optimal value of k is considered to be the one which minimizes the number of non-detections, computed with a leave-one-out method applied on the learning set. Further, in the kNN-DT-GA methods, the optimal value of k is the same for all local distances.

Besides the test set error rate, Pe_TS, the table gives the number of dangerous errors (# DE), of false alarms (# FA) and of non-detections (# ND), the latter including the DEs. The table shows that the biased methods are able to reduce, and even remove, the dangerous errors. However, this reduction often implies an increase in false alarms, reflected in the test set error rate. The weighted methods seem to be less interesting. But their combination with the biased methods allows removing the dangerous errors while restricting the number of false alarms.

(6) Note that this is essentially a non-biased approach, able to reduce both FAs and NDs.
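As a sketch of the two devices just described (an illustration under the stated assumptions - α = 0.8 and inverse-distance weighting - not the authors' code), the vote of the k nearest neighbors can be biased toward the unstable class and/or weighted by inverse distance as follows.

import numpy as np

UNSTABLE, STABLE = 1, 0

def knn_vote(x, LS_X, LS_y, weights, k, alpha=1.0, inverse_distance=False):
    """k nearest neighbor vote with optional bias (alpha < 1 pulls unstable
    states closer) and optional inverse-distance weighting of the neighbors."""
    d = np.sqrt(((LS_X - x) ** 2 * weights).sum(axis=1))
    d = np.where(LS_y == UNSTABLE, alpha * d, d)     # alpha-bias: favor the unstable class
    nearest = np.argsort(d)[:k]
    vote = np.ones(k) if not inverse_distance else 1.0 / (d[nearest] + 1e-12)
    score_unstable = vote[LS_y[nearest] == UNSTABLE].sum()
    score_stable = vote[LS_y[nearest] == STABLE].sum()
    return UNSTABLE if score_unstable >= score_stable else STABLE

# toy usage (placeholder data): an alpha-w-kNN as in the last rows of Table 4
LS_X = np.random.default_rng(2).random((100, 5))
LS_y = (LS_X[:, 0] > 0.5).astype(int)
x = np.array([0.52, 0.3, 0.3, 0.3, 0.3])
print(knn_vote(x, LS_X, LS_y, np.ones(5), k=11, alpha=0.8, inverse_distance=True))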
Fig. 6. Distance distribution of the 31N and 22N test states in the candidate attribute space (learning set composed of the 31N states).

Table 4. Means to reduce dangerous errors

Method           Pe_TS (%)   # DE   # FA + # ND
Pure kNN         12.0        44     23 + 56
kNN-DT           6.4         17     12 + 30
kNN-DT+IQ        2.7         6      2 + 16
kNN-DT-GA        3.5         5      8 + 15
kNN-DT-GA        2.1         3      4 + 10
α-kNN            22.2        2      143 + 3
α-kNN-DT         6.5         1      37 + 6
α-kNN-DT+IQ      4.7         1      24 + 7
α-kNN-DT-GA      3.7         0      23 + 1
α-kNN-DT-GA      3.8         0      19 + 6
w-kNN            9.9         33     20 + 45
w-kNN-DT         4.0         8      6 + 20
w-kNN-DT+IQ      2.6         3      3 + 14
w-kNN-DT-GA      3.2         4      7 + 14
w-kNN-DT-GA      2.6         4      4 + 13
α-w-kNN          19.3        2      123 + 4
α-w-kNN-DT       5.6         0      30 + 7
α-w-kNN-DT+IQ    4.0         1      20 + 6
α-w-kNN-DT-GA    3.7         0      23 + 1
α-w-kNN-DT-GA    3.7         0      19 + 5

(The optimal k of each variant, chosen as explained above, lies between 3 and 15; k = 11 for the pure kNN.)

5. CONCLUSION

This paper has proposed improvements of the hybrid kNN-DT-GA method, then assessed it on a transient stability problem of the Hydro-Quebec power system. The performances of the method have been found to be very satisfactory with respect to both classification accuracy and real-time computing capabilities.

The paper has then identified two of the method's intrinsic potentials and focused on ways to take advantage of them. One is the identification of outliers, i.e. of cases which are beyond the generalization capabilities of an automatic learning method because they are not well enough represented in the data base used to train the method. Actually, identifying outliers is a question of great concern for all automatic learning methods. The paper has proposed an identification procedure and shown that kNNs may indeed solve this intricate question properly.

The other interesting property of kNNs also exploited in the paper concerns means to remove dangerous classification errors while restricting the number of false alarms. The obtained results are quite satisfactory.
6. REFERENCES
Aha, D.W. and D. Kibler (1987). Learning Representative Exemplars of Concepts: An Initial Case Study. In Proc. of the Fourth International Conference on Machine Learning, pages 24-30.
Cardie, C. (1994). Domain Specific Knowledge Acquisition for Conceptual Sentence Analysis. PhD thesis, University of Massachusetts Amherst.
Devijver, P.A. and J. Kittler (1982). Pattern Recognition: A Statistical Approach. Prentice-Hall International.
Goodman, E.D. and W.F. Punch (1993). Further Research on Feature Selection and Classification Using Genetic Algorithms. In Proceedings of the 5th ICGA93, Champaign, Ill., pages 557-564.
Houben, I., L. Wehenkel and M. Pavella (1995). Coupling of k-NN with decision trees for power system transient stability assessment. Proc. of the IEEE CCA'95, IEEE Conf. on Control Applications, Albany, Sept. 28-29, pp. 825-832.
Houben, I., L. Wehenkel and M. Pavella (1997). Genetic algorithm based k Nearest Neighbors. Proc. of CIS-97, Conf. on Control of Industrial Systems, May 20-22, Belfort, France (to appear).
Kelly, J.D. and L. Davis (1991). Hybridizing the Genetic Algorithm and the K Nearest Neighbors Classification Algorithm. In Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann Publishers, pages 377-383.
Wehenkel, L., I. Houben and M. Pavella (1995). Automatic learning approaches for on-line transient stability preventive control of the Hydro-Quebec system. Part II. A toolbox combining decision trees with neural nets and nearest neighbor classifiers optimized by genetic algorithms. Proc. of SIPOWER'95, 2nd IFAC Symp. on Control of Power Plants and Power Systems, Cancun, Mexico, Dec., pp. 237-242.