Intelligent online catastrophe assessment and preventive control via a stacked denoising autoencoder

ARTICLE IN PRESS JID: NEUCOM [m5G;November 26, 2019;10:17] Neurocomputing xxx (xxxx) xxx Contents lists available at ScienceDirect Neurocomputing...



Neurocomputing journal homepage: www.elsevier.com/locate/neucom

Yangyang Liu a, Mingyu Zhai b, Jiahui Jin a, Aibo Song a,∗, Jikeng Lin c, Zhiang Wu d, Yixin Zhao a

a School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
b NARI Group Corporation/State Grid Electric Power Research Institute, Nanjing 211106, China
c School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
d Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing 210023, China

Article info

Article history:
Received 24 July 2019
Revised 10 October 2019
Accepted 20 October 2019
Available online xxx
Communicated by Zechao Li

Keywords: Catastrophe assessment; Preventive control; Power system; Transient stability assessment; Stacked denoising autoencoders; Cost sensitivity

Abstract

In all areas of engineering, catastrophe assessment is an essential prerequisite for remedial action schemes. Modelers constantly push for more accurate models, and often meet their goals by using increasingly complex, data mining-based blackbox models. However, system operators tend to favor interpretable models for after-the-fact preventive control (PC). Switching from blackbox to interpretable solutions entails a tradeoff between accuracy and interpretability. To avoid this tradeoff, we develop an intelligent framework for online catastrophe assessment and PC via a blackbox stacked denoising autoencoder (SDAE) that offers both accuracy and the ability to derive a PC scheme. Specifically, we implement a transient stability cost-sensitive assessment (TSCA) and PC case in the context of a power grid. First, using only controllable variables, we build the TSCA model by adding a sigmoid unit on top of the SDAE. Considering the conservatism of power systems, we explore a novel training criterion for the TSCA model that determines the degrees of stability of operation conditions and divides them into three classes: stable, unstable, and boundary. Second, given an operation condition identified as unstable or boundary by the TSCA model and its desired degree of stability, the PC model (the reverse of the TSCA model's mapping) consists of the top sigmoid's backward mapping and the stack of denoising decoders from the trained SDAE. The former is formalized as an optimization problem that pushes the desired degree of stability back to a desired highest-level abstraction of the SDAE. The latter decodes this desired highest-level abstraction back to a desired operation condition (essentially, a PC scheme nearest to the controlled operation condition in the coordinates along the underlying causes that generate the observed data). This approach resembles operators' tendency to adjust and stabilize unstable conditions (in terms of underlying causes) with the fewest control actions. A simulation study on the IEEE New England 39-bus system shows that, as a blackbox technology, our framework not only provides superior online situational awareness, but also finds a viable PC scheme, thereby justifying its practicability in engineering.

© 2019 Elsevier B.V. All rights reserved.

∗ Corresponding author. E-mail addresses: [email protected] (Y. Liu), [email protected] (M. Zhai), [email protected] (J. Jin), [email protected] (A. Song), [email protected] (J. Lin), [email protected] (Z. Wu), [email protected] (Y. Zhao).

https://doi.org/10.1016/j.neucom.2019.10.090 0925-2312/© 2019 Elsevier B.V. All rights reserved.

Please cite this article as: Y. Liu, M. Zhai and J. Jin et al., Intelligent online catastrophe assessment and preventive control via a stacked denoising autoencoder, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.10.090

1. Introduction

With increasing demands on production quality, economic operation, and system performance, modern industrial systems have gained complexity and require higher system reliability and safety. This calls for a sophisticated catastrophe assessment model, which is essential for helping the system operator decide on preventive control (PC) actions [1]. With the wide application of sensors, it is easy to collect process data that reflects a system's operation status. Thus, researchers are identifying data mining techniques as a promising direction for rapid and informative catastrophe assessment. Such an assessment learns offline the mapping from operation conditions (input) to the corresponding system status (secure or insecure) using predefined datasets. When applied online, it determines the system status as soon as the input is available [2,3]. In a data-centric industrial system, converting data to models and then to information and controls is a fundamental step, requiring new predictive analytical tools to help assess the onset of catastrophes following disturbances, and to develop control strategies for modifying the operation condition from an identified insecure state to a secure one [4].

In this context, we witness two competing trends in the development of predictive models: a fuzzy logic rules-based approach that offers transparency and interpretability [5,6], and a machine learning (ML)-based blackbox approach that relies on several complementary tools from statistical learning: neural networks (NN) [7,8], support vector machines (SVM) [9,10], and decision trees (DT) [11,12]. However, each one is weak in that it extracts features and predicts status separately, which keeps feature extraction from being effectively task-specific. A good feature is one that makes a subsequent learning task easier, and feature extraction should usually depend on the choice of the subsequent learning task. Moreover, complex systems contain a high level of nonlinearity and uncertainty and a huge number of variables, making it difficult to capture system characteristics well.

As an alternative ML technique, deep learning (DL) was introduced to deal with the efficient training of deep neural networks (DNN), with algorithms inspired by the greedy, unsupervised, layer-wise pretraining scheme [13]. Compared with shallow ML methods, whose limited calculation units restrict their capability to explore highly complex industrial systems, a DNN can represent highly complex mappings by composing nonlinear units [14]. To explore what forms a decent, useful feature representation, Vincent et al. [15] presented stacked denoising autoencoders (SDAE), which learn the latent factors of variation in data as the original input's highest-level abstraction (often termed the latent representation) and thereby automatically extract task-specific feature representations of raw data. For final classification and regression tasks, this highest-level abstraction is fed to a final learning algorithm added atop the SDAE, which yields a DNN amenable to the specific task and shows state-of-the-art results on many challenging learning tasks [16–18].
We believe SDAE is especially promising for obtaining a superior, high-complexity catastrophe assessment model by adding a catastrophe assessment algorithm over the SDAE. However, directly applying the previously proposed SDAE technique is ill-advised in this context, because we must consider the after-the-fact PC for potentially insecure operation conditions along with catastrophe assessment. As [4] notes, fuzzy logic rule-based methods make the obtained rules easier to interpret, so that system operation conditions can be modified and insecure contingencies preventively controlled, but they have lower prediction accuracy. Blackbox ML-based approaches offer better performance, but are implicit, are used only as classifiers, and fail to extract rules for PC. So, a tradeoff exists between accuracy and interpretability for data mining-based models: it is hard to choose a data mining-based model with both high performance and interpretability, that is, one that favors assessment accuracy and the ability to derive PC schemes simultaneously. This is our main incentive: to overcome the limitation that blackbox ML-based models cannot be used for deriving PC schemes, thereby avoiding the tradeoff.

With SDAE's state-of-the-art performance in mind, we focus on adopting SDAE to facilitate the catastrophe assessment task, exploring how to use the blackbox SDAE for deriving PC schemes, and finding an SDAE-based systematic integration framework for catastrophe assessment and PC. In particular, SDAE contributes to catastrophe assessment by adding a catastrophe assessment algorithm over it. For PC, we are inspired by a decoder's reconstruction mapping: the analysis in [15] shows that the stack of denoising decoders (SDD) from a trained SDAE captures interesting characteristics of the input and performs top-down decoding with the least possible distortion. The SDD is the blackbox decoder of the SDAE's blackbox encoder. Motivated by this property, we think that, although the SDAE is a blackbox, the SDD from a trained SDAE can help design PC schemes. With this, the SDAE-based systematic integration framework favors accuracy and the capability to derive PC schemes simultaneously.

Specifically, we add industrial systems' catastrophe assessment algorithms atop SDAE, and adopt a novel methodology that trains offline a mapping (from an operation condition to its security status) as a catastrophe assessment model, and its backward mapping (from a security status to a corresponding operation condition) to develop the PC model (see Fig. 1). In particular, the backward mapping is the reverse of the catastrophe assessment model. Given an operation condition that is identified as insecure by the catastrophe assessment model and the desired security status we want to achieve, we reverse the top catastrophe assessment algorithm to obtain the desired latent representation of the SDAE. Then we use it as the SDD's input to decode the corresponding desired operation condition. The key here is that we modify the insecure operation condition with reference to the desired operation condition; thus the given operation conditions must be controllable system variables. Clearly, the catastrophe assessment model's backward mapping consists of the top catastrophe assessment algorithm's backward mapping and the SDD. We believe this framework can assess and preventively control catastrophes of various complex industrial systems.

Transient stability is a power system's ability to remain stable when subjected to a severe transient disturbance [1]. As a specific case of modern industrial systems, a power system transient stability assessment (TSA) model that favors accuracy and the ability to derive a PC scheme is desirable. Hence, we aim to embody our framework in the power system domain. As power systems gain in complexity, there may be a boundary region between the stable and unstable classes where stable and unstable samples criss-cross; assessments in this region are deemed to contain a large potential error and are therefore unreliable.
Although the training results could be accurate enough, misclassification is inevitable for non-training samples within this area. In particular, classifying insecure operation conditions as secure is called incorrect-acceptance misclassification, while classifying secure operation conditions as insecure is called incorrect-rejection misclassification. In general, incorrect acceptance and incorrect rejection are treated equivalently, and accuracy (decreasing the total number of incorrect-acceptance and incorrect-rejection samples) is a common criterion used for assessment [1,3,8,11,16,19–21]. However, a power system is conservative: secure operation is its primary requirement, and incorrect acceptance is more severe than incorrect rejection, so more attention should be paid to reducing the number of incorrect-acceptance operation conditions. Consequently, we need a way to seek boundary regions and to consider misclassification costs in two situations, namely the incorrect-acceptance costs and the incorrect-rejection costs.

Thus, here we develop an SDAE-based transient stability cost-sensitive assessment (TSCA) and PC system. Its offline construction includes two main components: the TSCA model (a mapping from an operation condition to its degree of stability), and the TSCA's backward mapping (from a desired stability status to the corresponding operation condition) for constructing a PC model (thereby favoring accuracy and the capability to derive PC schemes simultaneously). For the TSCA mapping, we add a sigmoid unit over the SDAE to constitute an SDAE+Sigmoid DNN. Fed with controllable variables, the SDAE is used to learn the latent representations of the inputs.
To realize TSCA, we consider misclassification costs in two situations, and design a novel TSCA training criterion to train the SDAE+Sigmoid DNN so that it maps operation conditions to probability outcomes. These outcomes quantify the operation conditions' degrees of stability and are used, with two thresholds, to assign operation conditions to the stable, boundary, and unstable classes. The operation conditions in the boundary class will be controlled preventively like unstable operation conditions, to decrease misclassification. The PC model consists of the top sigmoid unit's backward mapping and the SDD, and we formalize the top sigmoid unit's backward mapping as an optimization problem.

Fig. 1. Catastrophe assessment and preventive control framework.

The remainder of this paper is organized as follows: Section 2 overviews the related work. We introduce TSA and SDAE in Section 3. Section 4 elaborates on the proposed SDAE-based TSCA and PC system. Section 5 validates our approach on the New England 39-bus system while comparing and summarizing the different technologies. Finally, Section 6 concludes the paper.

2. Related work

Today's power systems face severe challenges within the power system interconnection and commercialization environment. To avoid huge economic losses, TSA plays a crucial role in analytically judging the system's dynamic behavior [20]. Time domain simulations (TDS) [22], direct methods [23], and extended equal area criteria [24] are mainstream methods for TSA. If the TSA detects that the power system is vulnerable to an anticipated contingency, transient stability PC schemes should be applied to drive the system to a stable state. Transient stability PC refers to modifying the power system's controllable variables (such as the active powers of generators) to withstand credible contingencies that cause transient stability catastrophes. The predominant methods of PC are conventional analytical approaches [25,26]. However, with the increasing development of phasor measurement units (PMU) in power systems, the heavy computational burden of these approaches and the extreme difficulty of obtaining the physical models hinder their use for online calculations with massive PMU data [10].

In the last few decades, researchers identified the data mining-based approach as a promising direction to carry out online TSA and PC. In [8,27], NN techniques are proposed for TSA with different input features. In [28], SVM is presented for online TSA using the rotor angles, voltage magnitudes, and speeds as input.
In [10], a core vector machine (CVM) is proposed to assess transient stability based on PMU big data, and it has higher precision than other SVMs. DT and fuzzy logic are also available for TSA [11,29]. In [30], the authors proposed a fuzzy DT technique that statistically handled power system security assessment. In [31], the authors applied an inductive inference method to a power system's online steady-state security assessment. For all of these methods, input feature selection is crucial for satisfactory performance. As power systems gain in complexity, misclassification is inevitable, and all these methods are difficult to improve [8].

Recently, DNNs have been proposed to overcome a number of challenging problems, ranging from natural language processing [32] and audio processing [33] to image retrieval [34,35]. But the application of DL to power system TSA is still in its infancy, and only a few research results have been reported. In [16], a stacked autoencoder is used as a feature extraction tool, which effectively improves the accuracy of TSA. The approach in [36] trains a stacked sparse autoencoder on an offline dataset, and then uses it online to predict transient instability. These studies indicate that applying DL is a promising approach for solving the TSA of complex power systems and discovering intricate structures in datasets.

However, there is still one major research gap in data mining-based TSA methods. As Section 1 describes, we can categorize data mining-based TSA methods into fuzzy logic rules-based approaches and ML-based blackbox approaches. The former has the advantage of interpretability, in that its internal structures and parameters have physical meanings, while the latter generally offers better performance. A large accuracy gap exists between the highest-performance blackbox model and the most transparent fuzzy logic-based model. Kamwa et al. [4] analyze when to favor one goal over the other. High predictive performance may be sufficient for TSA, but PC may require a transparent model. It is hard to choose a data mining-based model equipped with both high performance and interpretability. Besides, most existing works apply data mining models mainly for TSA, whose inputs must contain many measurable variables to sufficiently reveal system behavior. However, models focusing on PC must adopt only controllable variables as inputs, which contain less system behavior information than the inputs used for security assessment. Naturally, it is also difficult to select inputs that are suitable for both security assessment and PC. The works in [37,38] offer systematic approaches consisting of two models, one for TSA and one for PC. These two models are trained separately: a blackbox model using measurable variables is trained for online TSA, and an interpretable model using controllable variables is trained to provide PC strategies. As a result, the theoretical connection between the TSA and PC models is neglected.
Our work differs from previous studies in four respects. First, it is designed to be a systematic integration of online TSCA and PC. Second, only controllable variables are used as inputs; to extract sufficient information for TSCA, SDAE is adopted as a good feature extractor to capture the important factors of variation in data. Third, we design a TSCA training criterion that considers the misclassification costs in two situations and treats TSA as a three-class classification problem; the operation conditions in the boundary class will be controlled preventively like the operation conditions in the unstable class, to minimize misclassification. Finally, even though SDAE is a blackbox, the TSCA's backward mapping is derived to find PC schemes, which ensures the theoretical connection between the TSA and PC models.

3. Background

Before detailing our research, here we provide background information on TSA and SDAE.


Fig. 2. Synchronous generator rotor angle dynamics after a system fault in the New England 10-machine system [1].

3.1. Transient stability assessment

Fig. 2 depicts the generator rotor angle dynamics of the New England 10-machine system, where bus 16 experiences a three-phase short circuit and the fault clears after 0.25 seconds. In Fig. 2, the number after "G" in the legend indicates the corresponding generator's index. Each line in the plot represents one generator's rotor angle dynamics. Generators' rotor angles are direct indicators of the transient stability status of a power system, because the energy imbalance that a perturbation causes in a generator leads to rotor angle variations. Thus, here we consider the criterion for transient stability as

$$\delta_{ij}(t) = \delta_i(t) - \delta_j(t) \le \varepsilon, \tag{1}$$

where $\delta_i$ and $\delta_j$ are respectively the ith and jth generators' rotor angles, and $\delta_{ij}$ is the relative rotor angle between generators i and j. t is the time step, and $\varepsilon$ is the predefined threshold. If the relative rotor angle of any two generators crosses $\varepsilon$, we deem the system unstable; otherwise, it is stable. Here, $\varepsilon$ is taken as 180°. TDS is an effective approach to compute post-disturbance generators' rotor angles, and is usually used to generate labeled datasets in simulation studies [8]. In such transient situations, it is better to observe the system behaviors, i.e., the system variables that can be measured in real time (such as generators' active/reactive powers, generators' rotor angles and bus voltages, and transmission lines' active/reactive powers), to predict the system's future stability status as soon as possible. If the system loses stability, PC actions are taken to maintain stability. TSA is the prediction process used [1].

3.2. Stacked denoising autoencoders

The classical autoencoder (AE), a three-layer symmetrical neural network, restricts the output to be equal to the input. It consists of an encoder and a decoder.

Encoder: A nonlinear mapping f converts an input x into a hidden representation y. Its typical form is $y = f(Wx + b)$. $(W, b)$ is a parameter set, where W is a $d' \times d$ weight matrix and b is a $d'$-dimensional offset vector.

Decoder: The decoder is a mapping $z = g(W'y + b')$ that transforms the hidden representation y back to a reconstructed vector z with d dimensions in input space. $(W', b')$ are properly sized parameters.

Let $x \approx z$, which yields a reconstruction error $L(x, z)$. We can impose a further tied-weights constraint $W' = W^T$. We optimize $W, b, W', b'$ by minimizing $L(x, z)$. We map x to y via the learned W, b; y is the final representation of x, and captures important information from x.

To yield a better representation of the input, denoising autoencoders (DAEs) were developed; they are an effective variant of classic AEs. A DAE is trained to reconstruct the input data from a corrupted version, automatically denoising the input data. The corrupted version is obtained by corrupting the initial input x into $\tilde{x}$ with a certain probability $\lambda_{noise}$ through a stochastic mapping $\tilde{x} \sim D(\tilde{x} \mid x, \lambda_{noise})$. D is a type of distribution determined by the original distribution of x and the type of random noise added to x.

Recent studies in ML demonstrated that a deep or hierarchical architecture is useful for finding highly nonlinear and complex patterns in data [39]. SDAE is a set of stacked DAEs, in which the corrupted input layer is used for discerning helpful features, and the upper-layer DAE always uses the hidden representation of the lower-layer DAE as its input to learn a better representation. Its highest-level representation, namely the latent representation, is well-suited to capture important variation factors in data, and works much better than the original input [15]. A greedy layer-wise pretraining algorithm [14] (see the left of Fig. 3) trains such an architecture by handling one layer at a time, with each layer trained as a DAE. This greedy layer-wise training is a task-free process. Once the algorithm has pretrained all layers, we can add a standalone learning algorithm (e.g., SVM or logistic regression) atop the SDAE, which yields a DNN amenable to the final classification or regression task (e.g., TSA in this paper). We can use the pretrained SDAE to initialize all except the top layer of this DNN, so that all the layers' parameters are then optimized simultaneously with respect to a task-specific learning criterion of the top learning algorithm, which is called "fine-tuning". As a result, SDAE outputs task-specific feature representations. Symmetrically, the SDD (see the right of Fig. 3) captures interesting characteristics of the input, and reconstructs a good-quality input at the bottom layer from a given top-layer representation.
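The encoder/decoder pair, the corruption step, and the greedy stacking described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the class and function names (`TiedDAE`, `pretrain_stack`, `sdae_encode`, `sdd_decode`) are hypothetical, and several choices are assumptions that the text does not fix, namely sigmoid activations for both f and g, a squared reconstruction error, masking noise (each input entry zeroed with probability `noise`), full-batch gradient descent in place of SGD, and inputs scaled to [0, 1].

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class TiedDAE:
    """One denoising autoencoder with tied weights (decoder uses W transposed)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_hidden, n_in))  # d' x d
        self.b = np.zeros(n_hidden)   # encoder offset b
        self.b_dec = np.zeros(n_in)   # decoder offset b'

    def encode(self, X):
        # y = f(W x + b), applied row-wise to a batch X of shape (m, d)
        return sigmoid(X @ self.W.T + self.b)

    def decode(self, Y):
        # z = g(W^T y + b'), applied row-wise to a batch Y of shape (m, d')
        return sigmoid(Y @ self.W + self.b_dec)

    def fit(self, X, rng, noise=0.2, lr=0.5, epochs=200):
        m = X.shape[0]
        for _ in range(epochs):
            # Assumed corruption: masking noise, fresh at every epoch.
            keep = rng.random(X.shape) >= noise
            X_tilde = X * keep
            Y = self.encode(X_tilde)
            Z = self.decode(Y)
            # Gradients of 0.5 * ||Z - X||^2 (target is the clean X).
            dZ = (Z - X) * Z * (1.0 - Z)
            dY = (dZ @ self.W.T) * Y * (1.0 - Y)
            grad_W = (Y.T @ dZ + dY.T @ X_tilde) / m  # decoder + encoder paths
            self.W -= lr * grad_W
            self.b -= lr * dY.mean(axis=0)
            self.b_dec -= lr * dZ.mean(axis=0)
        return self


def pretrain_stack(X, layer_sizes, rng, **fit_kwargs):
    """Greedy layer-wise pretraining: each DAE is trained on the clean
    hidden representation produced by the DAE below it."""
    daes, H = [], X
    for n_hidden in layer_sizes:
        dae = TiedDAE(H.shape[1], n_hidden, rng).fit(H, rng, **fit_kwargs)
        daes.append(dae)
        H = dae.encode(H)
    return daes


def sdae_encode(daes, X):
    """Stacked encoders, bottom-up: input -> latent representation."""
    for dae in daes:
        X = dae.encode(X)
    return X


def sdd_decode(daes, Y):
    """Stacked decoders (SDD), top-down: latent representation -> input space."""
    for dae in reversed(daes):
        Y = dae.decode(Y)
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((64, 10))           # synthetic stand-in data in [0, 1]
    daes = pretrain_stack(X, [8, 4], rng)
    latent = sdae_encode(daes, X)
    print(latent.shape, sdd_decode(daes, latent).shape)
```

Keeping the decoders alongside the encoders in each `TiedDAE` mirrors the point made in the text: the same pretrained layers provide both the bottom-up feature extractor and the top-down decoding path.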
In this paper, we focus on adopting SDAE as a good feature extractor for facilitating TSCA tasks by adding a TSCA learning algorithm over SDAE, and on using the SDD's top-down decoding to develop a PC model.

Fig. 3. Greedy layer-wise pre-training. After training a first-level DAE, its learned encoding function $f_1$ is used on clean input. The resulting representation $y_1$ is used to train a second-level encoding function $f_2$. From there, the procedure repeats. After pretraining, the DAEs' encoders $f_i$ are "unrolled" bottom-up to create an SDAE (on the left). The DAEs' decoders $g_i$ are "unrolled" top-down to create an SDD (on the right).

4. Proposed approach

Next, we elaborate on the design of the proposed SDAE-based TSCA and PC system. We illustrate the proposed system in Fig. 4, which has two parts: offline training and online application. The offline process consists of the following components:

1) Designing a TSCA DNN model by adding a TSCA learning algorithm atop SDAE (see Section 4.1).
2) Pretraining SDAE, where trained encoders are stacked to learn latent representations and initialize all except the top layer of the TSCA DNN, and trained decoders are stacked for a PC model (see Section 4.1.1).
3) For the top learning algorithm, exploring a TSCA training criterion amenable to a TSCA task (see Section 4.1.2).
4) Fine-tuning the DNN with the explored training criterion, to realize TSCA (see Section 4.1.2).
5) Deriving the TSCA's backward mapping for a PC model (see Section 4.2).

When trained models are applied online, once the real-time operation condition is fed in, the TSCA model determines the power system status immediately. If it is boundary or unstable, available PC schemes can be obtained from the PC model.

Fig. 4. Procedure of the proposed TSCA and PC system. (Flowchart not reproduced. Box labels: Power system; Offline stage; Online stage; Offline operation conditions database; Online real-time operation conditions; Designing a DNN amenable to TSCA by adding a TSCA learning algorithm atop SDAE; SDAE pretraining; Exploring a training criterion amenable to TSCA task; Fine-tuning using explored training criterion; TSCA model; SDD; Backward mapping of TSCA; PC model; Stable class; Boundary class; Unstable class; Power system operators.)

4.1. TSCA model's offline construction based on SDAE

Let $D_N = \{(x_1, l_1), (x_2, l_2), \ldots, (x_N, l_N)\}$ be a training set of N (input, S/U) pairs. As the proposed framework's input must be controllable, and adjusting the active powers of generators is a common strategy for PC, here $x_i = [x_{i1}, x_{i2}, \cdots, x_{i n_0}]^T$ is a vector of the $n_0$ generators' active powers in the power system, representing an operation condition before the disturbance. It serves as the input features, and $n_0$ is the number of generators in the power system. $l_i$ is the TDS result after the disturbance, which serves as the label: S if the system is stable under $x_i$, and U otherwise.

To seek the boundary region, an effective way is to partition operation conditions into three classes (stable S, unstable U, and boundary B) rather than the two classes S and U. The operation conditions in the boundary class are said to be potentially unstable, and should be controlled preventively like operation conditions in the unstable class to eliminate potentially inaccurate assessment results. With this in mind, we learn the mapping from a given input $x \in \mathbb{R}^{n_0}$ to its degree of stability $p \in [0, 1]$, which is used to define the three classes S, U, and B with two thresholds $t_1$ and $t_2$. Here, p is a direct estimate of stability; the larger the value, the higher the stability. The boundary region, in particular, includes uncertain operation conditions whose degrees of stability are neither high enough to be stable, nor low enough to be unstable. To make our assessment model cost-sensitive, we restrain incorrect acceptance with a Bayesian decision procedure [40] that takes into account misclassification costs in two situations, systematically adopting incorrect-acceptance costs and incorrect-rejection costs to define $t_1$ and $t_2$ and the three regions S, U, and B.
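The three-way partition can be illustrated with a small Python sketch. It is not taken from the paper: this part of the text does not state which of $t_1$ and $t_2$ is the upper threshold or how ties are handled, so the sketch uses the hypothetical names `t_stable` (upper) and `t_unstable` (lower) and assumes closed inequalities at both thresholds. The helper `needs_preventive_control` is likewise hypothetical and only encodes the statement that boundary conditions are controlled like unstable ones.

```python
def assign_class(p, t_stable, t_unstable):
    """Map a degree of stability p in [0, 1] to 'S', 'B', or 'U'.

    Assumption: t_unstable <= t_stable, p >= t_stable means stable,
    p <= t_unstable means unstable, anything in between is boundary.
    """
    if not 0.0 <= t_unstable <= t_stable <= 1.0:
        raise ValueError("expected 0 <= t_unstable <= t_stable <= 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p >= t_stable:
        return "S"
    if p <= t_unstable:
        return "U"
    return "B"


def needs_preventive_control(label):
    """Boundary conditions are treated like unstable ones."""
    return label in ("B", "U")


if __name__ == "__main__":
    for p in (0.95, 0.80, 0.40):
        label = assign_class(p, t_stable=0.9, t_unstable=0.7)
        print(p, label, needs_preventive_control(label))
```

Widening the gap between the two thresholds enlarges the boundary class, which trades more preventive control actions for fewer accepted-but-unstable conditions.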


Fig. 5. Flowchart showing the proposed TSCA.

To achieve the aforementioned TSCA functions, this paper explores a TSCA DNN consisting of SDAE and a top sigmoid neuron, denoted as SDAE+Sigmoid. SDAE learns the latent representation of x, which we use as the input of the sigmoid neuron in Eq. (2), whose output is a value between 0 and 1.

$$f(y) = \frac{1}{1 + e^{-y}}, \tag{2}$$

where y is the input of the sigmoid activation function. Let L be the number of hidden layers in SDAE and $n = (n_0, \ldots, n_i, \ldots, n_t)$, where $n_0$ is the size of the input x, $n_i$ is the number of hidden units in layer i ($1 \le i \le L$), which depends on the dataset used, and $n_t = 1$ is the number of units in the top layer. The SDAE+Sigmoid DNN implements the aforementioned function in two steps: the SDAE's greedy layer-wise pretraining, and the SDAE+Sigmoid DNN's fine-tuning (implementing the final TSCA).

4.1.1. Greedy layer-wise pretraining

Let $(W_i, b_i)$ and $(W_i^d = W_i^T, b_i^d)$ be the weight matrix and bias vector of the encoder and decoder of the ith DAE, respectively. $W_i$ has dimensionality $n_i \times n_{i-1}$, $b_i$ has dimensionality $n_i$, and $b_i^d$ has dimensionality $n_{i-1}$. Let $f_i$ and $g_i$ be the encoder and decoder functions of the ith DAE, $y_i = f_i(W_i y_{i-1} + b_i)$ the hidden layer output of the ith DAE, and $z_i = g_i(W_i^d y_i + b_i^d)$ the corresponding reconstruction, $1 \le i \le L$. Using stochastic gradient descent (SGD), we pretrain the SDAE greedily, layer by layer, with respect to $L(y_{i-1}, z_i)$ to learn multiple encoder-decoder structures from $D_N$. We stack the encoders $f_i(W_i, b_i)$ ($1 \le i \le L$) as the learned SDAE, to provide latent representations capturing the coordinates along the main factors of variation in the data for efficient TSCA, and to participate in fine-tuning. The trained decoders $g_i(W_i^d, b_i^d)$ ($1 \le i \le L$) are stacked as the SDD to construct a PC model, described in Section 4.2. Note that the parameters in the SDD are not changed by fine-tuning.

4.1.2. Fine-tuning

Once all layers are pretrained, the SDAE+Sigmoid DNN goes through a final fine-tuning dedicated to implementing the TSCA function. Let $W_t$, of dimensionality $1 \times n_L$, and $b_t$, of dimensionality 1, be the weight matrix and bias vector of the top sigmoid.
In fine-tuning, the pretrained $W_i, b_i$ ($1 \le i \le L$) are used as the SDAE's initialization. We expect $W_i, b_i$ ($1 \le i \le L$), $W_t$, and $b_t$ to be optimized with respect to a learning criterion so that, given an x, we can obtain a degree of stability p and return the class S, U, or B, as Fig. 5 shows. When a sigmoid is used for classification, incorrect acceptance and incorrect rejection are traditionally treated equivalently, and maximum likelihood is used as the learning criterion to decrease the total number of incorrect-acceptance and incorrect-rejection samples. In a power system, incorrect acceptance is more severe than incorrect rejection. So, we now explore a new TSCA learning criterion that considers the misclassification costs in two situations and finds a boundary region, so as to decrease misclassification.
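For orientation, the forward mapping that fine-tuning adjusts can be written out by composing the definitions of Sections 4.1 and 4.1.1. This is only a restatement under one assumption that the text leaves implicit: the bottom-layer input is taken to be $y_0 = x$, and the sigmoid of Eq. (2) is applied to the affine transform of the latent representation $y_L$.

```latex
\begin{aligned}
y_0 &= x, \\
y_i &= f_i(W_i\, y_{i-1} + b_i), \qquad 1 \le i \le L, \\
p   &= f(W_t\, y_L + b_t) = \frac{1}{1 + e^{-(W_t y_L + b_t)}} .
\end{aligned}
```

Since $W_t$ is $1 \times n_L$ and $b_t$ is a scalar, p is a single number strictly between 0 and 1, which is what allows it to be read as the degree of stability.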

According to our Bayesian decision procedure, DN is a simple two-label dataset. In this case, we have a set of two states and a set of two actions for each state. The set of states is given by Ω = {S, U}, indicating that observation x is stable or unstable, respectively. Let P(S|x) = p be the probability that the true state is stable given x; then P(U|x) = 1 − p is the probability that the true state for x is unstable. The set of two actions is given by A = {aS, aU}, where aS and aU represent the actions of deciding that x is stable or unstable, respectively. Let λij be the cost incurred by deciding that x belongs to class i when it actually belongs to class j, with i, j ∈ {S, U}. Accordingly, the incorrect-acceptance cost is λSU, and the incorrect-rejection cost is λUS. For a power system, it is reasonable to impose the constraint (c0): λSS < λUS, λUU < λSU, λUS < λSU, where λUS < λSU means that incorrect acceptance is more severe than incorrect rejection. This discourages incorrect acceptance and makes our TSCA model cost-sensitive. Given x, we can compute the risks of the two actions aS, aU by

$$R(a_S \mid x) = \lambda_{SS} P(S \mid x) + \lambda_{SU} P(U \mid x), \tag{3}$$

$$R(a_U \mid x) = \lambda_{US} P(S \mid x) + \lambda_{UU} P(U \mid x). \tag{4}$$
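As a small numerical illustration of Eqs. (3) and (4), the sketch below evaluates both risks for a given p = P(S|x). The function name `action_risks` is hypothetical; the cost values in the usage note are the dataset 1 settings reported later in Table 2.

```python
def action_risks(p, lam_SS, lam_SU, lam_US, lam_UU):
    """Return (R(a_S|x), R(a_U|x)) for p = P(S|x), following Eqs. (3) and (4)."""
    r_S = lam_SS * p + lam_SU * (1.0 - p)   # risk of declaring x stable
    r_U = lam_US * p + lam_UU * (1.0 - p)   # risk of declaring x unstable
    return r_S, r_U
```

With λSS = λUU = 0, λSU = 1, and λUS = 0.9, a sample with p = 0.5 gives risks of 0.5 and 0.45, so declaring it unstable is the lower-risk action; this is the cost-sensitive asymmetry that (c0) is meant to produce.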

The Bayesian decision procedure suggests the following minimum-risk decision rules: (P) For x ∈ POS(S), R(aS |x) ≤ R(aU |x). (N) For x ∈ NEG(S), R(aU |x) ≤ R(aS |x). Here, x ∈ POS(S) and x ∈ NEG(S) indicate that in DN , the label of x is S and U, respectively. By (c0 ) and the fact that P (S | x ) + P (U | x ) = 1, we derive the rule for x ∈ POS(S):

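$$\begin{aligned}
R(a_S \mid x) \le R(a_U \mid x) &\Leftrightarrow R(a_S \mid x) - R(a_U \mid x) \le -\sigma_1 \\
&\Leftrightarrow \lambda_{SS} P(S \mid x) + \lambda_{SU}\bigl(1 - P(S \mid x)\bigr) - \lambda_{US} P(S \mid x) - \lambda_{UU}\bigl(1 - P(S \mid x)\bigr) \le -\sigma_1 \\
&\Leftrightarrow P(S \mid x) \ge \frac{\sigma_1 + \lambda_{SU} - \lambda_{UU}}{\lambda_{US} - \lambda_{UU} - \lambda_{SS} + \lambda_{SU}}.
\end{aligned} \tag{5}$$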

We compute the rule for x ∈ NEG(S) similarly:

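$$\begin{aligned}
R(a_U \mid x) \le R(a_S \mid x) &\Leftrightarrow R(a_U \mid x) - R(a_S \mid x) \le -\sigma_2 \\
&\Leftrightarrow \lambda_{US} P(S \mid x) + \lambda_{UU}\bigl(1 - P(S \mid x)\bigr) - \lambda_{SS} P(S \mid x) - \lambda_{SU}\bigl(1 - P(S \mid x)\bigr) \le -\sigma_2 \\
&\Leftrightarrow P(S \mid x) \le \frac{-\sigma_2 + \lambda_{SU} - \lambda_{UU}}{\lambda_{US} - \lambda_{UU} - \lambda_{SS} + \lambda_{SU}}.
\end{aligned} \tag{6}$$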

Here, σ 1 , σ 2 ∈ (0, 1] are tie-breaking criteria that ensure the boundary region is nonempty, and indicate the boundary tolerances for two misclassiﬁcations. We restrict σ 1 ≤ σ 2 similar to the effect of λUS < λSU . Thus, based on Eqs. (5) and (6), we deﬁne two thresholds:

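$$t_1 = \frac{\sigma_1 + \lambda_{SU} - \lambda_{UU}}{\lambda_{US} - \lambda_{UU} - \lambda_{SS} + \lambda_{SU}}, \tag{7}$$

$$t_2 = \frac{-\sigma_2 + \lambda_{SU} - \lambda_{UU}}{\lambda_{US} - \lambda_{UU} - \lambda_{SS} + \lambda_{SU}}. \tag{8}$$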

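Clearly, t1 > t2. Because P(S|x) ∈ [0, 1], we require t1, t2 ∈ [0, 1]. For t1, this gives

$$(\sigma_1 + \lambda_{SU} - \lambda_{UU}) \le (\lambda_{US} - \lambda_{UU} - \lambda_{SS} + \lambda_{SU}) \Leftrightarrow \sigma_1 \le \lambda_{US} - \lambda_{SS}. \tag{9}$$

As for t2,

$$(-\sigma_2 + \lambda_{SU} - \lambda_{UU}) \le (\lambda_{US} - \lambda_{UU} - \lambda_{SS} + \lambda_{SU}) \Leftrightarrow \sigma_2 \ge \lambda_{SS} - \lambda_{US}. \tag{10}$$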
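The two thresholds of Eqs. (7) and (8), together with the feasibility condition of Eq. (9), can be sketched as below. The function name `thresholds` and the choice to raise an error when Eq. (9) is violated are assumptions made for illustration.

```python
def thresholds(lam_SS, lam_SU, lam_US, lam_UU, sigma1, sigma2):
    """Compute (t1, t2) from Eqs. (7) and (8), enforcing Eq. (9) so that t1 <= 1."""
    if sigma1 > lam_US - lam_SS:
        raise ValueError("sigma1 violates Eq. (9): t1 would exceed 1")
    denom = lam_US - lam_UU - lam_SS + lam_SU
    t1 = (sigma1 + lam_SU - lam_UU) / denom
    t2 = (-sigma2 + lam_SU - lam_UU) / denom
    return t1, t2
```

For example, with the dataset 1 settings of Table 2 (λSS = λUU = 0, λSU = 1, λUS = 0.9, σ1 = 0.08, σ2 = 0.15), this gives t1 = 1.08/1.9 and t2 = 0.85/1.9, which round to the values 0.57 and 0.45 listed in Table 3.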


Here, σ2 ∈ (0, 1] and λSS < λUS; thus Eq. (10) always holds, but note that σ1 must satisfy Eq. (9). Hence, we can rewrite rules (P) and (N) as
(P) For any x ∈ POS(S), P(S|x) ≥ t1.
(N) For any x ∈ NEG(S), P(S|x) ≤ t2.
Assume NS and NU are the numbers of stable and unstable samples in DN, respectively. Then for any xi ∈ POS(S) and xj ∈ NEG(S), i = 1, 2, ..., NS, j = 1, 2, ..., NU, we have P(S|xi) > P(S|xj), which conforms to the objective fact that the degrees of stability of stable operation conditions are greater than those of all unstable operation conditions in the power system. Given x, the SDAE+Sigmoid should therefore output a degree of stability p = P(S|x) that satisfies the aforementioned (P) and (N) targets. Using the exponential function e^(−x), whose slope becomes monotonically steeper as x decreases, we design the TSCA training criterion C_TSCA as

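$$C_{TSCA} = \frac{1}{N_S + N_U}\left(\sum_{i=1}^{N_S} e^{-K\left(-R(a_S \mid x_i) + R(a_U \mid x_i) - \sigma_1\right)} + \sum_{j=1}^{N_U} e^{-K\left(R(a_S \mid x_j) - R(a_U \mid x_j) - \sigma_2\right)}\right), \tag{11}$$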

where K is a positive penalty factor. Minimizing Eq. (11) by SGD optimizes W^i, b^i (1 ≤ i ≤ L), W^t, and b^t. During learning, if the SDAE+Sigmoid DNN outputs a p that does not satisfy the conditions in (P) and (N), the model generates a large cost value, so the corresponding output is penalized; the larger the chosen value of K, the more severe the penalty. The conditions in (P) and (N) then tend to be satisfied in subsequent learning. Once training stops, the prediction for x is a continuous output p between 0 and 1. If DN is completely separable, learning stops once the p values of all x satisfy the conditions in (P) and (N); if DN is not completely separable, a boundary region exists between thresholds t1 and t2. Using t1 and t2, we obtain the following decision rules:
(P1) For x, if p = P(S|x) ≥ t1, then x belongs to the S class.
(B1) For x, if t2 < p = P(S|x) < t1, then x belongs to the B class.
(N1) For x, if p = P(S|x) ≤ t2, then x belongs to the U class.
Note that estimating the aforementioned hyper-parameters λ, σ, and K is mostly domain-dependent and requires serious investigation based on the particular application domain. Additionally, the degree of stability obtained does not represent the actual degree of stability in engineering; it only provides a reference for defining the three classes and enabling subsequent preventive controls to ensure the power system's safe operation. Algorithm 1 shows TSCA's pseudocode.

Algorithm 1 Transient stability cost-sensitive assessment (TSCA).
Input: DN, λSU, λUU, λUS, λSS, σ1, σ2, K, learning rates η_C, η_TSCA for the stochastic gradient descent on costs C and C_TSCA, L, n, f_i, g_i, 1 ≤ i ≤ L.
Output: "S", "B", and "U" regions.
Initialize the input parameters, and y^0 = x, W^i, b^i = 0, b_d^i = 0, 1 ≤ i ≤ L, W^t, b^t = 0.
for i = 1 to L do
    for j = 1 to i − 1 do
        y^j = f_j(b^j + W^j y^(j−1)).
    end for
    y^i = f_i(b^i + W^i y^(i−1)), z^i = g_i(b_d^i + (W^i)^T y^i), C = L(y^(i−1), z^i).
    while C not minimum do
        Update ω ← ω − η_C ∂C/∂ω, ω = (W^i, b^i, b_d^i).
        y^i = f_i(b^i + W^i y^(i−1)), z^i = g_i(b_d^i + (W^i)^T y^i), C = L(y^(i−1), z^i).
    end while
end for
W_d^i = (W^i)^T, 1 ≤ i ≤ L.
Define SDAE ← f_i(W^i, b^i), SDD ← g_i(W_d^i, b_d^i), 1 ≤ i ≤ L.
p = sigmoid(W^t y^L + b^t).
Compute C_TSCA based on Eq. (11).
while C_TSCA not minimum do
    Update ω ← ω − η_TSCA ∂C_TSCA/∂ω, ω = (W^i, b^i, W^t, b^t).
    for j = 1 to L do
        y^j = f_j(b^j + W^j y^(j−1)).
    end for
    p = sigmoid(W^t y^L + b^t).
    Compute C_TSCA based on Eq. (11).
end while
Compute t1, t2 using Eqs. (7) and (8).
Partition DN by decision rules (P1), (B1), and (N1).
return the S, B, and U classes.
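The training criterion of Eq. (11) and the decision rules (P1), (B1), and (N1) used at the end of Algorithm 1 can be sketched as follows. This is an illustrative reading under stated assumptions: the 1/(NS + NU) factor is taken to scale both sums, the costs are passed as a dictionary, and the names `tsca_cost` and `classify` are hypothetical.

```python
import numpy as np

def tsca_cost(p_stable, p_unstable, lam, sigma1, sigma2, K):
    """Evaluate Eq. (11) for predicted p = P(S|x) on stable and unstable samples.

    `lam` is a dict with keys "SS", "SU", "US", "UU" (an assumed layout).
    """
    def risk_gap(p):
        # R(a_S|x) - R(a_U|x), from Eqs. (3) and (4) with P(U|x) = 1 - p
        r_S = lam["SS"] * p + lam["SU"] * (1.0 - p)
        r_U = lam["US"] * p + lam["UU"] * (1.0 - p)
        return r_S - r_U

    gap_s = risk_gap(np.asarray(p_stable, dtype=float))
    gap_u = risk_gap(np.asarray(p_unstable, dtype=float))
    stable_term = np.exp(-K * (-gap_s - sigma1)).sum()
    unstable_term = np.exp(-K * (gap_u - sigma2)).sum()
    return (stable_term + unstable_term) / (gap_s.size + gap_u.size)

def classify(p, t1, t2):
    """Decision rules (P1), (B1), (N1): return "S", "B", or "U"."""
    if p >= t1:
        return "S"
    if p <= t2:
        return "U"
    return "B"
```

A stable sample whose p sits exactly at t1 contributes e^0 = 1 to the first sum; samples further inside the correct region contribute less, and samples on the wrong side contribute exponentially more, which is how a larger K sharpens the penalty.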

4.2. The PC model's offline construction

Now that we have the SDAE-based TSCA in place, any operation condition that violates (P1) gives us situational awareness of insecurity. Here we discuss how to develop the PC model that resecures the power system for operation conditions in the U/B class. To modify U/B operation conditions, we must know the SDAE-based TSCA's mapping rules from operation conditions to degrees of stability p; however, the SDAE+Sigmoid DNN is a blackbox. Alternatively, given an operation condition x that requires adjustment and the degree of stability that we desire to achieve (namely, a desired degree of stability pe > t1), we hope to derive from x the desired operation condition xe whose degree of stability is pe. In doing so, we can adjust x by referring to xe to arrive at the desired degree of stability pe. This means that we must derive the mapping from degrees of stability pe to operation conditions xe based on x, which is the opposite of the TSCA's mapping (from an operation condition to its stability status) and is denoted as TSCA's backward mapping. As Fig. 1 shows, because the TSCA model is an SDAE+Sigmoid DNN, its backward mapping involves the top sigmoid unit's backward mapping along with SDD, so we present the derivation details of the top sigmoid unit's backward mapping.

First, for a trained SDAE+Sigmoid DNN, the top sigmoid maps a latent representation to its degree of stability. We invert this map by solving for a latent representation in terms of a given degree of stability. Therefore, we derive from x the desired latent representation with respect to pe, denoted as ze. Obviously,

$$p_e = \frac{1}{1 + e^{-(W^t z_e + b^t)}} \Leftrightarrow \frac{1}{1 + e^{-(W^t z_e + b^t)}} - p_e = 0 \Leftrightarrow W^t z_e = -\log\left(\frac{1}{p_e} - 1\right) - b^t. \tag{12}$$

It is worth mentioning that the ith element’s value interval in z and ze is that of the ith neuron’s activation function in SDAE’s highest layer, and also the top sigmoid’s input, denoted as [ai , bi ], (i = 1, 2, . . . , nL ). Thus, once TSCA is established, the top sigmoid’s output p theoretically has a value range [ pmin , pmax ] computed using Eq. (2), and learned Wt , bt . Hence, usually we set pe as maximum degree of stability pmax , which denotes that we want to modify x to the most stable scenario. Because pe is a constant, −log(1/pe − 1 ) − bt is also a constant. Let c = −log(1/pe − 1 ) − bt , then

$$W^t z_e - c = 0 \Leftrightarrow (W^t z_e - c)^2 = 0. \tag{13}$$


Fig. 6. Flowchart of the PC process.

Thus we compute ze by minimizing the left of Eq. (13). That is,

$$\min_{z_e} \; (W^t z_e - c)^2 \quad \text{s.t.} \quad a_i \le (z_e)_i \le b_i, \; i = 1, \ldots, n_L. \tag{14}$$

Let s = (W^t ze − c)^2. Because we are looking for a PC scheme for the given x, we must derive ze from x. Assume z is the latent representation learned from x by the SDAE; we take the initial guess ze(0) = z, so that initially s = (W^t z − c)^2. To bring s nearer to 0, we can derive a better solution ze(1) from the tangent line to the curve s = (W^t ze − c)^2 at the point ze = z. The tangent line is

$$s'(z)(z_e - z) + s(z), \tag{15}$$

where s' denotes the derivative of the function s. The value of ze at which Eq. (15) equals 0 will typically be a better solution; that is,

$$z_e(1) = z - s(z)/s'(z). \tag{16}$$

In the same way, we can find a better solution than ze(1). As a result, we find the optimal ze by iterating this procedure until it converges. Solving for ze(m + 1) gives

$$z_e(m + 1) = z_e(m) - \frac{(W^t z_e(m) - c)^2}{2\,(W^t z_e(m) - c)\,W^t}, \tag{17}$$

where m is the iteration index, and we obtain the ze nearest to z, which contributes to generating an xe related to x in the sequel (see Remark 1). Thus, ze is the latent representation of our PC scheme xe. Next, because SDD can perform top-down decoding with the least possible distortion, SDD takes the output ze of the previous step as input and outputs the corresponding xe as x's PC scheme. However, SDD's decoding is not entirely nondestructive, so we must reuse xe as TSCA's input to compute the adjusted degree of stability pe. If pe < pmax, the aforementioned backward-mapping process must be iterated until pe = pmax. Defining the iteration counter k with initial value 0, we outline how to determine x's PC schemes (see Fig. 6) as follows:
Step 1: Given an operation condition x that requires control and pe, derive ze by using Eq. (14).
Step 2: The decoders g_i(W_d^i, b_d^i) (1 ≤ i ≤ L) from the pretrained SDAE, without fine-tuning, are stacked as SDD. Let ze be SDD's input to output the corresponding desired operation condition, denoted as xe.
Step 3: Reuse xe as the SDAE+Sigmoid's input to compute the adjusted degree of stability pe.
Step 4: If pe ≠ pmax, then x must be adjusted again: let x = xe, k = k + 1, and conduct Steps 1 through 3 again. If pe = pmax, then xe is the desired operation condition.

Step 5: Define xe as the ultimate PC scheme, which provides a reference for dispatchers to modify x and ensure safe system operation.

We describe our PC procedures in Algorithm 2.

Algorithm 2 Preventive control (PC).
Input: U and B condition x, p, pe, trained SDAE f_i(W^i, b^i), SDD g_i(W_d^i, b_d^i), 1 ≤ i ≤ L, W^t, b^t.
Output: Adjusted active powers of generators xe.
Initialize the input parameters.
y^0 = x.
for j = 1 to L do
    y^j = f_j(b^j + W^j y^(j−1)).
end for
p = sigmoid(W^t y^L + b^t), pe = p, k = 0.
while pe < t1 do
    y = y^L, c = −log(1/pe − 1) − b^t, ze^0 = y^L, g = (W^t ze^0 − c)^2, i = 0.
    while g ≠ 0 do
        Update ze^(i+1) ← ze^i − g(ze^i)/g'(ze^i).
        g = (W^t ze^(i+1) − c)^2, i = i + 1.
    end while
    ze = ze^i, z^(L+1) = ze.
    for j = L to 1 do
        z^j = g_j(b_d^j + W_d^j z^(j+1)).
    end for
    xe = z^1, y^0 = xe.
    for j = 1 to L do
        y^j = f_j(b^j + W^j y^(j−1)).
    end for
    pe = sigmoid(W^t y^L + b^t).
    if pe < t1 then
        k = k + 1.
    else
        break
    end if
end while
return xe.
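The top sigmoid's backward mapping (Eqs. (12) to (14)) can be sketched as below. Note that this is not a literal transcription of Eq. (17): as an assumption made for the example, each update is a minimum-norm step onto the hyperplane W^t ze = c, followed by clipping to the box constraints of Eq. (14), which keeps the result close to the starting point z. The function name `invert_top_sigmoid` and the tolerance settings are hypothetical.

```python
import numpy as np

def invert_top_sigmoid(z, W_t, b_t, p_e, lo, hi, tol=1e-8, max_iter=1000):
    """Find z_e in [lo, hi] with sigmoid(W_t @ z_e + b_t) close to p_e, starting from z."""
    W_t = np.asarray(W_t, dtype=float)
    c = -np.log(1.0 / p_e - 1.0) - b_t            # constant c of Eq. (13)
    z_e = np.clip(np.asarray(z, dtype=float), lo, hi)
    for _ in range(max_iter):
        r = W_t @ z_e - c                          # residual of W_t z_e = c
        if abs(r) < tol:
            break
        z_e = z_e - r * W_t / (W_t @ W_t)          # nearest point on the hyperplane
        z_e = np.clip(z_e, lo, hi)                 # enforce the bounds of Eq. (14)
    return z_e
```

As a usage example, the dataset 1 values quoted in Section 5.2.3 (W^t = (-1.321, -0.915), b^t = -0.312, z = (0.464, 0.999), pe = 0.87) with tanh bounds [-1, 1] yield a ze whose degree of stability is 0.87 to within numerical tolerance. Because the hyperplane contains many feasible points, the ze returned by this sketch need not coincide with the one reported in the paper.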

Remark 1. The aforementioned PC model always seeks an xe nearest to x in the coordinates along the underlying causes that generate the observed data. Specifically, we view the top sigmoid's backward mapping as an optimization problem, and seek an optimal solution ze that starts with the initial guess ze(0) = z, where z is x's latent representation. We can think of ze as an optimal solution nearest to z; namely, we obtain a latent representation ze that has a degree of stability pe and is nearest to the latent representation z of x. Then, we use ze as SDD's input to decode the operation condition xe. Note that we use the SDAE to find the directions of greatest variance in the dataset and represent each data point by its coordinates along each of these directions. Although ze is nearest to z in the extracted coordinates, xe is not always nearest to x in the original input coordinates. Consider the dataset in Fig. 7(a), and suppose we pick u to correspond to the extracted coordinate shown in Fig. 7(b). The red dots denote the projections of the original data onto this coordinate. The sample closest to sample a is b in the original coordinates, but c in the extracted coordinates. The empirical results in face recognition [41] show that measuring the distances between faces i and j in the extracted coordinates gives rise to a surprisingly good face-matching and retrieval algorithm. We can thus say that the distance between two samples should be measured in the extracted coordinates. Our PC technology computes xe from the ze nearest to z, which amounts to xe being a desired operation condition nearest to x in the coordinates along the underlying causes that generate the observed data. This closely corresponds to system operators' requirements: they tend to adjust unstable conditions to stable ones in terms of the underlying causes, using the fewest control actions.

Fig. 7. The distance between two samples.

Remark 2. Because the SDAE's latent representations are explicit, the proposed TSCA approach actually learns the mapping from operation conditions to latent representations, and then further to the power system's degree of stability. In particular, the mapping from operation conditions to latent representations and the mapping from latent representations to system status are both blackbox. We can express TSCA's mapping as "if given x, then the latent representation of x is y, then the power system's degree of stability is p" rules automatically generated from data. As intermediate variables, latent representations make TSCA's original blackbox mapping transparent, and interpretable in a sense, but there is generally semantic loss. Similarly, the PC model actually learns "if given the power system's desired degree of stability pe, then the latent representation with pe is ye, then the operation condition with pe is xe" rules from data.

5. Experimental results

In this study, we use the New England 39-bus test power system [10] (see Fig. 8) to validate our TSCA and PC system. This test system involves 39 buses, 10 generators, 19 loads, and 46 transmission lines. The reference power is 100 megavolt amps (MVA). To develop the simulations, we use Python 3.5 on a computer with a four-core Intel Core i5-8600K CPU, one GTX 1070 Ti GPU, and 8 gigabytes of RAM.

Fig. 8. IEEE 39-bus test system.

5.1. Generating simulation data

Because transient stability catastrophes are small-probability events in power systems, there are no real-world datasets for TSA. Compared with real-world measurement data, simulation data are much more widely used in power system TSA because they are readily accessed, structurally integrated, and sufficiently representative. An offline TDS (performed with the Power System Analysis Toolbox in Matlab R2010a) generates the simulation data. The generator is a classical model, and the load is a constant impedance model. To generate convincing datasets, the training set includes multiple scenarios that approximate system behavior under different operation conditions. We randomly change predisturbance loads around the base case, and change generator output correspondingly. For the ith operating point, we compute the active power P_L^(i)(s) and reactive power Q_L^(i)(s) at load s, and the active power P_G^(i)(r) at generator r as [42]:

$$\begin{aligned}
P_L^{(i)}(s) &= P_{L\,\mathrm{base}}^{(i)}(s)\left[1 + \Delta P_L^{(i)}\left(1 - 2 \times \varepsilon_{PL}^{(i)}(s)\right)\right] \\
Q_L^{(i)}(s) &= Q_{L\,\mathrm{base}}^{(i)}(s)\left[1 + \Delta Q_L^{(i)}\left(1 - 2 \times \varepsilon_{QL}^{(i)}(s)\right)\right] \\
P_G^{(i)}(r) &= P_{G\,\mathrm{base}}^{(i)}(r)\left[1 + \Delta P_G^{(i)}\left(1 - 2 \times \varepsilon_{PG}^{(i)}(r)\right)\right]
\end{aligned} \tag{18}$$

where P_Lbase^(i)(s), Q_Lbase^(i)(s), and P_Gbase^(i)(r) are, respectively, the active power and reactive power at the sth load and the active power of the rth generator in the base case. Similarly, ΔP_L^(i), ΔQ_L^(i), and ΔP_G^(i) are the corresponding maximum random changes, and ε_PL^(i)(s), ε_QL^(i)(s), and ε_PG^(i)(r) denote the corresponding independent uniform random variables between 0 and 1. The contingencies here are three-phase-to-ground faults on a series of buses. We assume that faults occur at 0.2 seconds (s) and clear after 0.1 s by removing the faulted line connected to the bus. In a simulation, we first solve the power flow with a new operation condition. If the power flow converges, pre-established faults are set to perform TDS for determining all generators' rotor angle dynamics and the system status after disturbance according to Eq. (1). The generators' active powers at 0 s are the input data, and each sample is assigned a corresponding S or U status. Table 1 describes the generated datasets; we randomly select cases from the first three datasets to construct a multifault dataset.
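The random perturbation of Eq. (18) can be sketched as below. The function name `sample_operating_point` and the base-case values in the usage note are hypothetical; only the formula itself comes from the text.

```python
import numpy as np

def sample_operating_point(base, max_change, rng):
    """Perturb base-case values as in Eq. (18): base * [1 + max_change * (1 - 2 * eps)].

    `eps` holds independent uniform draws in [0, 1), one per load or generator.
    """
    base = np.asarray(base, dtype=float)
    eps = rng.random(base.shape)
    return base * (1.0 + max_change * (1.0 - 2.0 * eps))
```

For instance, `sample_operating_point([2.5, 6.5, 10.0], 0.3, np.random.default_rng(1))` returns values within 30% of each (made-up) base value, which corresponds to a maximum random change of 0.3.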


Table 1
Data description.

| Dataset | Fault bus | Clearing line | ΔPL(i) / ΔQL(i) | ΔPG(i) | Training: stable | Training: unstable | Testing: stable | Testing: unstable |
|---|---|---|---|---|---|---|---|---|
| 1 | Bus 4 | Line 4_3 | 0.3 | 0.6 | 13,363 | 3647 | 6572 | 1807 |
| 2 | Bus 18 | Line 18_3 | 0.3 | 0.3 | 1669 | 1681 | 831 | 819 |
| 3 | Bus 15 | Line 15_14 | 0.3 | 0.3 | 1667 | 1683 | 833 | 817 |
| Multifault | - | - | - | - | 5013 | 5037 | 2487 | 2463 |

Table 2
TSCA deep structures and hyper-parameters of four models.

| Dataset | n (a) | f_i (b) | λSS | λUS | λUU | λSU | σ1 | σ2 | K |
|---|---|---|---|---|---|---|---|---|---|
| 1 | (10, 7, 4, 2, 1) | sigmoid-tanh-tanh | 0 | 0.9 | 0 | 1 | 0.08 | 0.15 | 10 |
| 2 | (10, 6, 3, 1) | sigmoid-tanh | 0 | 0.8 | 0 | 1 | 0.1 | 0.25 | 10 |
| 3 | (10, 8, 6, 4, 2, 1) | sigmoid-tanh-tanh-tanh | 0 | 0.8 | 0 | 1 | 0.1 | 0.25 | 10 |
| Multifault | (10, 7, 4, 3, 1) | sigmoid-tanh-tanh | 0 | 0.8 | 0 | 1 | 0.25 | 0.3 | 10 |

(a) n = (n0, ..., nL, nt).
(b) Activation function f_i (1 ≤ i ≤ L) in SDAE.

5.2. TSCA and PC system simulations

We test performance on Table 1's datasets to validate the effectiveness of the TSCA and PC approaches under not only a single prefault, but also multiple prefaults. Each dimension in the datasets is normalized to [0, 1] by

$$x_i = \frac{x_{ig} - x_{ig}^{\min}}{x_{ig}^{\max} - x_{ig}^{\min}}, \tag{19}$$

where xi is the ith normalized feature, xig denotes the ith generator's active power generated by TDS, and xig^max and xig^min indicate, respectively, the maximum and minimum values of the ith generator's active power set. Because we train the model on a normalized dataset, the subsequent PC should offer preventive control schemes in which each dimension lies within [0, 1], so that we can obtain the generators' desired active powers by inverse normalization. As a result, the activation functions of the SDAE's first hidden layer (namely, SDD's output layer) are constrained to be sigmoid, with a [0, 1] value range, in all models.

We design and train four different TSCA and PC systems with respect to the four training sets. The detailed configurations comprise the numbers of layers, the hidden units per layer, and the hyper-parameters of the four systems (see Table 2). The noise level added to the input and to each layer in all models is uniformly λnoise = 0.4. We impose the constraint f_i = g_i (1 ≤ i ≤ L). For each system, we first pretrain the SDAE based on the mean square error, and then fine-tune the SDAE+Sigmoid DNN with respect to the proposed TSCA training criterion by SGD. Hence, (W^t, b^t) and (W^i, b^i) (1 ≤ i ≤ L) are trained. We stack the decoders g_i (1 ≤ i ≤ L) as SDD to construct the PC model. Finally, we apply the four trained systems to the four test sets, and the evaluation results are threefold.

5.2.1. SDAE's representation learning

After pretraining, the SDAE can learn latent representations. In contrast to most previous TSA methods [1,3,8,10,16,20,43], we abandon uncontrollable system descriptions and use only the system's controllable variables as input. Here we investigate whether the SDAE can capture sufficient system information using only controllable variables. For dataset 1, we compare the performances of various basic ML classifiers with three types of input (see Fig. 9). The ML classifiers include DT, SVM, random forest (RF), NN, and logistic regression (LR). Given the three types of input, each classifier is trained three times to obtain three models. The NN-based predictors have a feed-forward structure with five hidden layers. The SVMs use a radial basis kernel. The DTs are trained with the following settings: the function to measure a split's quality is information gain, the minimum number of samples required to split an internal node is 2, and the minimum number of samples required to be at a leaf node is 1. For the RF models, we observed that generalization errors started to stabilize at around 50, 15, and 15 trees for the three models using feature sets 1, 2, and 3, respectively. The results suggest that all classifiers' accuracies are enhanced markedly using feature set 3, and are close to feature set 2's effect. Thus, although we adopt only controllable variables, the SDAE captures latent representations that work even better than feature set 2 with more system variables, and notably promote TSA. Additionally, the data's dimensions increase quickly as the system's size increases, thereby seriously affecting computational efficiency. This paper, however, uses only the system's controllable variables as input, and reduces the dimension by the SDAE to overcome the curse of dimensionality.

Fig. 9. Classification accuracy using three different feature sets. Feature set 1: generators' active powers; feature set 2: generators' active/reactive powers, generators' rotor angles and bus voltages, transmission lines' active/reactive powers; feature set 3: two latent representations of feature set 1 learned by SDAE.

5.2.2. TSCA

Table 3
Performance and partition results of TSCA on four test datasets.

| Dataset | FD | FA | AC | t1 | t2 | Stable | Unstable | Boundary | pmax (a) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 100% | 0.57 | 0.45 | 5546 | 1142 | 1691 | 0.87 |
| 2 | 0 | 0 | 100% | 0.61 | 0.42 | 635 | 633 | 382 | 0.96 |
| 3 | 0 | 0 | 100% | 0.61 | 0.42 | 567 | 648 | 435 | 0.85 |
| Multifault | 0 | 0 | 100% | 0.69 | 0.39 | 793 | 1366 | 2791 | 0.79 |

(a) Maximum degree of stability in a dataset.

Fig. 10. TSCA partition results on four test datasets. The dot indicates the original stable operation conditions and the cross indicates the original unstable operation conditions in given datasets. Green indicates a stable region partitioned by TSCA, red indicates an unstable region partitioned by TSCA, and yellow indicates a boundary region.

To evaluate the four trained TSCAs' performances, assume N is the number of samples in the test dataset, and NS and NU are the numbers of stable and unstable samples in the test dataset, respectively. Let Nij be the number of samples predicted as class i that actually belong to class j. Because we do not know whether the operation conditions within the boundary are stable or unstable, and must adjust them to stable in the PC phase, we regard the boundary operation conditions as correctly classified samples. Consequently, accuracy (AC), false dismissal (FD), and false alarm (FA) are the three indexes used for evaluating TSCA [10]:

$$AC = \frac{N_{SS} + N_{UU} + N_{BS} + N_{BU}}{N}, \qquad FD = \frac{N_{SU}}{N_U}, \qquad FA = \frac{N_{US}}{N_S}. \tag{20}$$
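The three indexes of Eq. (20) can be computed from the confusion counts as sketched below. The function name `tsca_metrics` is hypothetical, and the counts in the usage note are made-up numbers rather than results from the paper.

```python
def tsca_metrics(n_SS, n_UU, n_BS, n_BU, n_SU, n_US):
    """Return (AC, FD, FA) as in Eq. (20); n_ij counts samples predicted i, actually j."""
    n_S = n_SS + n_BS + n_US      # samples that are actually stable
    n_U = n_UU + n_BU + n_SU      # samples that are actually unstable
    n = n_S + n_U
    ac = (n_SS + n_UU + n_BS + n_BU) / n   # boundary samples count as correct
    fd = n_SU / n_U                         # unstable samples accepted as stable
    fa = n_US / n_S                         # stable samples rejected as unstable
    return ac, fd, fa
```

For example, `tsca_metrics(80, 70, 10, 20, 5, 15)` gives AC = 180/200 = 0.9, FD = 5/95, and FA = 15/105.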

Using Table 2’s parameters, we compute t1 and t2 based on Eqs. (7) and (8). Table 3 lists TSCA’s performance and partition results on four test sets. The results suggest that our TSCA offers superior prediction accuracy while minimizing misclassiﬁcations. Fig. 10 clearly shows the three regions obtained from each test set. Obviously, in boundary regions, original stable and unstable samples criss-cross. To see TSCA’s performance improvement over existing TSA techniques, we compare TSCA’s PC and accuracy against three methodologies that offered satisfactory performance for TSA in prior work (see Fig. 11). The three methodologies compared are as follows: the shallow ML method CVM that obtains a high-precision TSA compared to other SVMs in previous work [10]; the interpretable fuzzy_ID3 [6], because it combines DT’s and fuzzy logic’s strength to form a fuzzy decision tree, making it the most accurate model of the three (DT, Fuzzy_DT, Fuzzy_ID3) [4]; and the SDAE-based system belonging to DNN models.

We separately evaluated the different TSA models on the four datasets in Table 1. Fuzzy_ID3 is developed in Matlab. A node becomes a leaf node and is assigned its class name if any of the following conditions holds: the ratio of the sum of the data's membership values in a class to the total sum of membership values is greater than or equal to a threshold of 0.9995; the sum of all data's membership values is less than a threshold of 1; or no attributes remain for further classification. The SDAE-based system consists of the SDAE and a top sigmoid unit and uses Table 2's network configurations, in line with the proposed TSCA. The difference is that the TSCA model's fine-tuning uses the designed TSCA training criterion, while the SDAE-based system fine-tunes with respect to a traditional training criterion based on maximum likelihood, so its top learning algorithm is essentially an LR classifier. This lets us compare the two training criteria more directly. From these comparisons, it is apparent that blackbox models (CVM, SDAE-based system) perform much better than the interpretable model (Fuzzy_ID3), yet they fail to provide subsequent control actions against unstable situations. The results therefore support the existence of an accuracy-versus-comprehensibility tradeoff when switching from a blackbox solution to an interpretable model. Although TSCA is a blackbox model, it nevertheless works well at preventive control, so it overcomes the tradeoff between accuracy and interpretability. Additionally, SDAE-based deep architecture models (the SDAE-based system and the proposed TSCA) outperform the shallow ML method (CVM); this could be because of DL's highly complex mapping function and the SDAE's representation learning. Comparing TSCA and the SDAE-based system, as we expected, although they use the same network structure, the proposed TSCA offers better performance. This is compelling evidence that TSCA's training criterion is effective. The LR model using feature set 1 in Fig. 9 and the SDAE-based system in Fig. 11(a) are both trained on dataset 1 and use the same input (generators' active powers). Interestingly, the LR model and the top unit of the SDAE-based system both use a sigmoid function and a training objective based on maximum likelihood, but the SDAE-based system performs better because of the SDAE's representation learning. Similarly, the LR model using feature set 3 in Fig. 9 performs worse than the SDAE-based system in Fig. 11(a). This LR model and the top sigmoid unit of the SDAE-based system are both fed with the SDAE's output; however, the SDAE-based system fine-tunes the SDAE and the top sigmoid unit. Thus, we discern that alongside the SDAE's pretraining, the fine-tuning process is also significant, because it modifies the SDAE to learn task-specific feature representations of the raw data. As a result, there are two reasons for our TSCA model's superior performance compared to existing TSA methods: the SDAE's representation learning and the proposed TSCA training criterion.

Fig. 11. Comparison with previous TSA methodologies.

5.2.3. Preventive control for operation conditions with low stability

For B and U operation conditions, we activate PC immediately. To examine the PC models, we conduct PC on four operation conditions in a boundary or unstable class (see Table 4). Of these, operation conditions 27, 2, and 1 are randomly selected from the single-prefault datasets, and the last is selected from the multifault dataset. In Table 4, G30 ~ G39 correspond to the input features xi (here 1 ≤ i ≤ 10). Because G31 is a balancing machine, it is not considered an adjustable generator for PC.

Table 4
Operation conditions before control.

| Datasets | OCs | G30 | G32 | G33 | G34 | G35 | G36 | G37 | G38 | G39 | p (a) | l (b) | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | x27 | 2.36 | 7.0 | 5.73 | 4.75 | 6.26 | 5.23 | 4.88 | 7.76 | 9.71 | 0.14 | U | U |
| 2 | x2 | 2.02 | 5.08 | 4.48 | 5.70 | 8.16 | 7.01 | 4.82 | 6.31 | 11.39 | 0.55 | S | B |
| 3 | x1 | 2.47 | 6.03 | 7.18 | 4.84 | 8.05 | 5.93 | 4.37 | 6.55 | 8.01 | 0.24 | U | U |
| Multifault | x15 | 2.07 | 5.47 | 6.13 | 4.09 | 7.47 | 6.83 | 6.11 | 5.95 | 8.93 | 0.02 | U | U |

(a) The degree of stability.
(b) Original status in a dataset.

For simplicity, we describe only x27's PC process. The normalized x27 is x = (0.37, 0.8, 0.91, 0.0051, 0.22, 0.298, 0.19, 0.07, 0.05, 0.45). First, let k = 1. Based on the TSCA deep structure learned on dataset 1, W^t = (-1.321, -0.915), b^t = -0.312, and z = (0.464, 0.999). Table 3 shows that the maximum degree of stability in dataset 1 is pmax = 0.87. Letting pe = pmax, we deduce ze as (-0.997, -0.999) from Step 1 of Section 4.2. Second, let ze be the trained SDD's input. We derive xe = (0.614, 0.151, 0.997, 0.374, 0.997, 0.485, 0.599, 0.997, 0.994, 0.505). Further, re-inputting xe into the trained TSCA yields pe = 0.87. Based on the parameters listed in Table 3, pe = pmax = 0.87 > t1, which verifies that SDD captures the most stable scenarios and resynthesizes them. Because we normalized the data before training, we must apply the reverse normalization to xe to obtain the generators' actual active powers. Table 5 then shows the generators' adjusted active powers for all operation conditions in Table 4. To validate our PC model's effectiveness, we perform TDS to obtain the generators' relative rotor angle trajectories of each condition before and after control (see Figs. 12 and 13). Because x15 may be perturbed by three-phase-to-ground faults on buses 4, 18, and 15, we conduct TDS for x15 with the system perturbed by each of the three faults in turn.
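The reverse normalization mentioned above simply inverts Eq. (19). A minimal sketch follows; the helper names `fit_minmax`, `normalize`, and `denormalize` are hypothetical, and the per-generator minima and maximum are taken here from the samples passed in.

```python
import numpy as np

def fit_minmax(X):
    """Per-feature minimum and maximum over the rows of X (one row per sample)."""
    return X.min(axis=0), X.max(axis=0)

def normalize(X, x_min, x_max):
    """Eq. (19): map each feature to [0, 1]."""
    return (X - x_min) / (x_max - x_min)

def denormalize(X_norm, x_min, x_max):
    """Inverse of Eq. (19): recover active powers from normalized values."""
    return X_norm * (x_max - x_min) + x_min
```

In use, `fit_minmax` would be applied once to the training set, and `denormalize` would then map a PC scheme xe produced by SDD back to generator active powers.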

Please cite this article as: Y. Liu, M. Zhai and J. Jin et al., Intelligent online catastrophe assessment and preventive control via a stacked denoising autoencoder, Neurocomputing, https://doi.org/10.1016/j.neucom.2019.10.090


Table 5. Operation conditions after control.

OCs   G30    G32    G33    G34    G35    G36    G37    G38    G39     Adjusted p (a)   k (b)
x27   2.64   7.1    6.17   5.68   6.48   5.72   6.0    8.89   9.81    0.87             1
x2    2.12   6.48   5.51   5.22   6.84   5.91   4.55   8.53   13.48   0.96             1
x1    2.64   6.31   6.40   4.93   6.99   5.79   5.54   7.66   13.51   0.85             1
x15   2.63   7.60   5.37   4.66   6.56   4.0    6.58   6.61   12.33   0.79             1

(a) Adjusted degree of stability.
(b) Iterations in PC model.

Fig. 12. Generators’ rotor angle trajectories for x27 , x2 , and x1 .

Fig. 13. Generators’ rotor angle trajectories for x15 .


In Figs. 12 and 13, each line in the plot represents the dynamics of one generator’s rotor angle δ. According to Eq. (1), Figs. 12 and 13 illustrate that TSCA labels x27 and x1, which are unstable in the dataset, as U, and that the PC model adjusts them to stable after control. TSCA labels x2, which is stable in the dataset, as B. Interestingly, for x2, the rotor angle oscillation range is reduced from [-4, 3] to [-2.5, 2] after control. These results reveal that our control method modifies operation conditions to increase stability and alleviate unstable situations. Notably, x15 is unstable under all three faults, and the proposed control approach stabilizes x15 under each fault. Thus, our PC model stabilizes conditions even in an environment where various prefaults may occur, and alleviates unstable situations under each fault.

Comparing the “Adjusted p” column in Table 5 with the last column of Table 3, we see that our method adjusts operation conditions to maximum degrees of stability. More specifically, we conducted PC on every operation condition in the four test datasets and found that the PC process needed only one iteration (k = 1); the degrees of stability of all unstable and boundary conditions were adjusted to the maximum degree of stability, and the method modified all conditions into the dataset’s best stability scenarios. This further verifies SDD’s ability to capture interesting patterns in the data and resynthesize good-quality input.

6. Conclusion

To avoid the tradeoff between accuracy and transparency in data mining-based models, we investigated an intelligent integrated framework. It consists of an SDAE-based online catastrophe assessment model for identifying potentially insecure conditions, and a backward mapping model of the catastrophe assessment for providing PC schemes for the identified insecure conditions. We presented the design, implementation, and evaluation of the TSCA and PC system, which embodies our framework in the power system domain.
An evaluation using comprehensive simulation experiments shows that three aspects of this system are particularly important. First, fed with only controllable variables, SDAE captures intricate structures in data well, thereby facilitating superior online situational awareness. Second, the top sigmoid’s backward mapping and SDD are able to provide good-quality stable scenarios (which are nearest to the controlled operation conditions in the coordinates along the underlying causes that generate the observed data) for dispatchers. Third, the explored TSCA learning criterion not only considers the misclassification costs in two situations, but also finds a boundary region so as to decrease misclassification. This framework therefore holds much promise as a catastrophe assessment and PC framework in engineering. The TSCA model, consisting of an SDAE and a sigmoid unit, is simply one instance of a more general framework, applied here to a power system. The SDAE-based catastrophe assessment and corresponding PC framework can be applied readily to many other industrial systems, so long as we add a task-specific learning algorithm at SDAE’s top and derive the backward mapping of the catastrophe assessment.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

We would like to thank the anonymous referees for helping us improve this paper with their suggestions. We also thank the institutions supporting our work in the form of

projects and grants. In particular, this work is supported by the National Key R&D Program of China (Grant Nos. 2018YFB0803400 and 2017YFB1003000), Research and Application of Intelligent Technology for Real-time Dispatching of Power Grid Based on Artificial Intelligence (52460817A029), and partially supported by Collaborative Innovation Center of Novel Software Technology and Industrialization and Collaborative Innovation Center of Wireless Communications Technology.

Yangyang Liu received the B.S. and M.S. degrees from the School of Computer Science, Henan Normal University, Xinxiang, China, in 2014 and 2016, respectively. She is currently pursuing the Ph.D. degree in the School of Computer Science and Engineering, Southeast University, Nanjing, China. Her research interests include power system stability and control, and deep learning.


Jiahui Jin is an associate professor in the School of Computer Science and Engineering, Southeast University, Nanjing, China. He received his Ph.D. degree in computer science from Southeast University in 2015. He was a visiting Ph.D. student at the University of Massachusetts, Amherst, U.S., from August 2012 to August 2014. His current research covers large-scale data processing, distributed systems, and parallel task scheduling.

Aibo Song received the M.S. degree from Shandong University of Science and Technology, and the Ph.D. degree from Southeast University, Nanjing, China, in 1996 and 2003 respectively. He is currently a professor in the School of Computer Science and Engineering, Southeast University. His current research interests include big data processing, and cloud computing.

Jikeng Lin received the M.S. and Ph.D. degrees from the School of Electrical and Information Engineering, Tianjin University, Tianjin, China, in 1993 and 1998, respectively. He is currently a professor in the School of Electronics and Information Engineering, Tongji University. His current research interests include stability analysis and control of power systems, distribution automation, EMS, and smart grid.

Zhiang Wu received his Ph.D. degree in Computer Science from Southeast University, Nanjing, China, in 2009. He is currently a full professor in the School of Information Engineering at Nanjing University of Finance and Economics. He is also the director of Jiangsu Provincial Key Laboratory of E-Business. His recent research focuses on distributed computing, data mining, e-commerce intelligence, and social network analysis. He is a member of the ACM and IEEE, and a senior member of CCF.

Yixin Zhao received the M.S. degree from Stockholm University and Ph.D. degree from Linkoping University, Sweden in 2010 and 2016 respectively. She is currently a postdoc in the School of Computer Science and Engineering, Southeast University. Her current research interests include combinatorial optimization and big data processing.

Mingyu Zhai received his Ph.D. degree from Southeast University, China, in 2002. He is currently a professor-level engineer at NARI Technology, Nanjing, China. His current research interests include smart power grid dispatching automation, big data processing, and cloud computing.
