Dual heuristic programming based nonlinear optimal control for a synchronous generator


Engineering Applications of Artificial Intelligence 21 (2008) 97–105 www.elsevier.com/locate/engappai

Dual heuristic programming based nonlinear optimal control for a synchronous generator

Jung-Wook Park, Ronald G. Harley, Ganesh K. Venayagamoorthy, Gilsoo Jang

School of Electrical and Electronic Engineering, C735, Yonsei University, 134 Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea
School of Electrical and Computer Engineering, Georgia Institute of Technology, 176 Van Leer Building, GA 30332-0250, USA
Department of Electrical and Computer Engineering, University of Missouri-Rolla, 132 Emerson Electric Co. Hall, 1870 Miner Circle, MO 65409-0249, USA
School of Electrical Engineering, Korea University, Anam-dong Seongbuk-Gu, Seoul 136-701, South Korea

Received 9 March 2006; received in revised form 27 February 2007; accepted 9 March 2007. Available online 24 April 2007.

Abstract

This paper presents the design of an infinite-horizon nonlinear optimal neurocontroller that replaces the conventional automatic voltage regulator and turbine governor (CONVC) for the control of a synchronous generator connected to an electric power grid. The neurocontroller design uses a neuro-dynamic programming optimization algorithm based on dual heuristic programming (DHP), which has the most robust control capability among the adaptive critic designs family. A radial basis function neural network (RBFNN) is used as the function approximator to implement the DHP technique. The DHP-based optimal neurocontroller (DHPNC) using the RBFNN shows improved dynamic damping compared to the CONVC, even when a power system stabilizer is added. The DHPNC also provides a robust feedback loop in real-time operation without the need for continual on-line training, thus reducing the risk of instability associated with neural-network-based controllers.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Adaptive critic designs; Dual heuristic programming; Optimal control; Power system stabilizer; Radial basis function neural network; Synchronous generator

1. Introduction

Synchronous generators in a power system are nonlinear, fast-acting, multivariable systems with dynamic characteristics over a wide range of operating conditions (Adkins and Harley, 1975; Anderson and Fouad, 1994; Venayagamoorthy and Harley, 2001). Conventional linear controllers (CONVC) for the synchronous generator consist of the automatic voltage regulator (AVR), which maintains constant terminal voltage, and the turbine governor, which maintains constant speed and power at some set point. They are designed to control the generator, in some optimal fashion, around one particular

Corresponding author. Tel.: +82 2 2123 5867; fax: +82 2 313 2879.

E-mail addresses: [email protected] (J.-W. Park), [email protected] (R.G. Harley), [email protected] (G.K. Venayagamoorthy), [email protected] (G. Jang).

doi:10.1016/j.engappai.2007.03.001

operating point; at any other operating point, the generator's damping performance is degraded. As a result, sufficient margins of safety are included in the generator's maximum performance envelope to allow for degraded damping when transients occur. As an alternative to the CONVC, artificial neural networks (ANNs) have been used to design nonlinear adaptive controllers with various control objectives in electrical power engineering, especially for synchronous generator excitation control (He and Malik, 1997; Kobayashi and Yokoyama, 1996). Among the ANN family, the radial basis function neural network (RBFNN) has attracted particular attention for the control of synchronous generators (Segal et al., 2000; Swidenbank et al., 1999) and flexible AC transmission systems (FACTS) devices (Dash et al., 2000), due to its powerful capability for damping low-frequency oscillations and its fast convergence when identifying the plant


(Park et al., 2002). However, an adaptive control approach using an ANN cannot avoid the possibility of instability when moving between very different operating conditions while the neural-network-based controller is trained on-line using a gradient descent algorithm (Werbos, 1998). The adaptive critic designs (ACDs) technique (Miller et al., 1990; Prokhorov and Wunsch, 1997; Si and Wang, 2001; Werbos, 1992) for implementing optimal control (Bertsekas, 2001; Gregory and Kin, 1992; White and Jordan, 1992) with ANNs overcomes this instability issue and provides robustness for the controller based on an infinite-horizon solution. An ACD-based controller can use ANNs to both identify and control the process. Park et al. (2003) compared the performance of the multilayer perceptron neural network (MLPNN) and the RBFNN for the control of a turbogenerator using the heuristic dynamic programming (HDP) technique, the simplest member of the ACDs family. The new work in this paper describes the implementation of dual heuristic programming (DHP), which has the most robust control capability among the ACDs family, for nonlinear optimal neurocontrol of a synchronous generator using the RBFNN. Although the RBFNN is well known for its function approximation capability in providing best fits over input data spaces, to the knowledge of the authors there has been no prior work using the RBFNN as a function approximator to implement the DHP algorithm. This paper describes the background of the ACDs in relation to optimal control theory in Section 2. The DHP-based optimal nonlinear neurocontroller (DHPNC) using the RBFNN is designed for the control of a synchronous generator connected to an infinite bus in Section 3; the DHPNC is trained off-line prior to commissioning. The transient and dynamic performances of the proposed DHPNC are evaluated through time-domain simulation case studies in Section 4. Finally, the conclusions are given in Section 5.

2. Adaptive critic designs

The ACD technique, proposed by Werbos (Miller et al., 1990; Werbos, 1992, 1998), is an optimization and control algorithm that handles the classical optimal control problem by combining concepts of reinforcement learning and approximate dynamic programming (ADP). The ACD technique allows the design of an optimal adaptive nonlinear controller. The sequence for the user-defined cost-to-go function J in (1) runs over an infinite time (the so-called infinite-horizon problem), and the Bellman equation of classical dynamic programming (DP) in (2), used to minimize/maximize J, suffers from the well-known curse of dimensionality: DP prescribes a search that tracks backward from the final step, retaining in memory all suboptimal paths from any given point to the finish, until the

starting point is reached (Kim, 2006).

J_π(x_0) = Σ_{k=0}^{∞} γ^k g(x(k), u(k)),  (1)

J_{k+1}(x) = min_{u∈U} [ g(x, u) + γ J_k(f(x, u)) ],  k = 0, 1, ...,
J_0(x) = 0 for all x,  (2)

where k is the discrete time index at each step, J_π(x_0) denotes the cost associated with an initial state x_0 and a control policy π = {u_0, u_1, ...}, and γ is the discount factor (0 < γ < 1). The value iteration of the function J in (2), based on two "critic" networks in the HDP algorithm (the simplest algorithm among the ACD family), was shown in Park et al. (2003) to give good neurocontroller performance. ADP is now applied to the value iteration of the derivatives of J with respect to the state, using the DHP algorithm. Generally, two critic networks in the ACDs, each of which outputs an estimate of the total future value of a utility function U (a user-defined cost function) at a different moment in time, are used to estimate the approximate heuristic cost-to-go function J(k):

J(k) = Σ_{p=0}^{∞} γ^p U(k+p).  (3)
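As a concrete illustration of the value iteration in (2), the following sketch runs the Bellman recursion on a toy two-state, two-action problem. The stage costs and transitions are invented for the example (they are not from the paper); γ = 0.5 matches the discount factor used later in the paper.

```python
# Illustrative sketch of value iteration per (2): J_{k+1}(x) = min_u [g(x,u) + gamma*J_k(f(x,u))],
# starting from J_0(x) = 0 for all x. States, costs, and transitions are invented.

GAMMA = 0.5
STATES = [0, 1]
ACTIONS = [0, 1]

def g(x, u):
    # stage cost g(x(k), u(k)); arbitrary illustrative values
    return [[1.0, 2.0], [4.0, 0.5]][x][u]

def f(x, u):
    # deterministic state transition x(k+1) = f(x(k), u(k))
    return (x + u) % 2

def value_iteration(tol=1e-9):
    J = {x: 0.0 for x in STATES}  # J_0(x) = 0 for all x
    while True:
        J_next = {x: min(g(x, u) + GAMMA * J[f(x, u)] for u in ACTIONS)
                  for x in STATES}
        if max(abs(J_next[x] - J[x]) for x in STATES) < tol:
            return J_next
        J = J_next
```

The recursion converges because γ < 1 makes the Bellman operator a contraction; the fixed point satisfies (2) with equality.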

After minimizing J in (3) by the critic networks, the control network (the so-called "action" network or "actor" in the ACD literature) is trained with the estimated output backpropagated from the critic network to obtain the converged ANN weights for the optimal control u. The backpropagation calculates the future derivatives of U with respect to present actions, through the critic and model networks. In other words, from the viewpoint of classical optimal control theory, the backpropagation plays a role similar to that of the co-state and adjoint equations in Pontryagin's minimum principle. A more detailed explanation is given in Park et al. (2003). An illustration relating optimal control theory to the ACDs is shown in Fig. 1. The DHP algorithm described in this paper uses three different RBFNNs, namely the critic, model (identifier), and action networks. The critic network in DHP approximates the derivatives of the function J with respect to the states of the plant. Note that the optimal controller can be decomposed into two parts: the estimator (the critic network in the ACD algorithms) and the actuator (the action network). The estimator portion of the optimal controller is an optimal solution to the problem of estimating the value of the cost function assuming no control takes place, while the actuator portion is an optimal solution to the control problem assuming perfect state information prevails. In other words, the two portions of the optimal controller can be designed independently as optimal solutions of estimation and control problems. All steps taken in both optimal controller designs for

Fig. 1. Optimal controller designs for infinite horizon time solution: optimal control theory versus adaptive critic designs (ACDs).

generator, V_fd is the exciter field voltage, V_b is the infinite bus voltage, Δω is the speed deviation, ΔV_t is the terminal voltage deviation, V_t is the terminal voltage, ΔV_ref is the reference voltage deviation, V_ref is the reference voltage, ΔP_in is the input power deviation, and P_in is the turbine input power. The position of the switches S1 and S2 in Fig. 2 determines whether the DHPNC or the CONVC (the governor/turbine and AVR/exciter combinations) controls the plant. Block diagrams for the CONVC appear in Figs. 3 and 4; the time constants and gains for the AVR/exciter and turbine/governor systems are given in Park et al. (2003). The power system stabilizer (PSS) with output V_PSS in Fig. 5 is added to the exciter of Fig. 3, and the combination of the CONVC plus the PSS is also compared with the DHPNC. The procedure in Kundur et al. (1989) is used to select the PSS parameters, which appear in Table 1. The stabilizer output limits in Fig. 5 (0.2 pu for V_PSSmax and −0.1 pu for V_PSSmin) are imposed to restrict the level of generator terminal voltage fluctuation during transient conditions. The parameters of the CONVC in Park et al. (2003) were likewise selected by investigating the desired steady-state and transient performances for their respective objectives.

3.2. Radial basis function neural network

Fig. 2. Plant model used for the control of a synchronous generator.

an infinite horizon time solution in Fig. 1 are carried out off-line.

3. DHP neurocontroller design using RBFNN

3.1. Plant modelling

The synchronous generator, turbine, exciter, and transmission system connected to an infinite bus in Fig. 2 form the plant (dashed block in Fig. 2) that is to be controlled. The generator (G) with its damper windings is described by the seventh-order d–q axis set of equations, with the generator current, speed, and rotor angle as the state variables (Anderson and Fouad, 1994). In the plant, P_t and Q_t are the real and reactive power at the generator terminal, respectively, Z_e is the transmission line impedance, P_m is the mechanical input power to the

The RBFNN (see the structure of the RBFNN in Park et al., 2003) consists of three layers (input, hidden, and output). The input values are each assigned to a node in the input layer and passed directly to the hidden layer without weights. The hidden-layer nodes are called RBF units, each determined by a vector called the center and a scalar called the width. The Gaussian density function is used as the activation function for the hidden neurons, and linear output weights connect the hidden and output layers. The overall input–output mapping of the RBFNN is

y_i = b_i + Σ_{j=1}^{h} v_{ji} exp( −‖X − C_j‖² / β_j² ),  (4)

where X is the input vector, C_j ∈ R^n is the center of the jth RBF unit in the hidden layer, h is the number of RBF units, b_i and v_{ji} are the bias term and the weight, respectively, between the hidden and output layers, and y_i is the ith output in the m-dimensional output space. Once the optimal RBF centers are established over a wide range of operating points and conditions of the plant, the width of the ith center in the hidden layer is calculated as

β_i = [ (1/h) Σ_{j=1}^{h} Σ_{k=1}^{n} ‖c_{ki} − c_{kj}‖ ]^{1/2},  (5)

where c_{ki} and c_{kj} are the kth components of the centers of the ith and jth RBF units, respectively. In (4) and (5), ‖·‖
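A minimal sketch of the mapping (4) and the width rule (5); the centers and weights below are illustrative, not the trained values used in the paper.

```python
# Sketch of the RBFNN forward mapping (4) and width calculation (5).
import math

def rbf_widths(centers):
    # beta_i = sqrt( (1/h) * sum_j sum_k |c_ki - c_kj| ), per (5)
    h = len(centers)
    n = len(centers[0])
    widths = []
    for i in range(h):
        s = sum(abs(centers[i][k] - centers[j][k])
                for j in range(h) for k in range(n))
        widths.append(math.sqrt(s / h))
    return widths

def rbf_forward(x, centers, widths, v, b):
    # y = b + sum_j v_j * exp(-||x - C_j||^2 / beta_j^2), per (4)
    y = b
    for cj, bj, vj in zip(centers, widths, v):
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, cj))
        y += vj * math.exp(-dist2 / bj ** 2)
    return y
```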


Fig. 3. Block diagram of the AVR/exciter combination.

Fig. 4. Block diagram of the turbine/governor combination.

Fig. 5. Block diagram of the power system stabilizer.

Table 1. PSS time constants and gain

T_W (s)   T_1 (s)   T_2 (s)   T_3 (s)   T_4 (s)   K_STAB
3         0.2       0.2       0.045     0.045     50
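The PSS of Fig. 5 with the Table 1 values can be sketched in discrete time, assuming the standard structure selected per Kundur et al. (1989): a gain K_STAB, a washout sT_W/(1+sT_W), and two lead-lag stages (1+sT_1)/(1+sT_2) and (1+sT_3)/(1+sT_4), followed by the output limits (0.2 pu upper, −0.1 pu lower). The backward-Euler discretization and the step size dt are our assumptions, not details given in the paper.

```python
# Illustrative discrete-time sketch of the PSS in Fig. 5 (assumed standard
# washout + two lead-lag structure), using the Table 1 parameters.
T_W, T1, T2, T3, T4, K_STAB = 3.0, 0.2, 0.2, 0.045, 0.045, 50.0
V_MAX, V_MIN = 0.2, -0.1  # stabilizer output limits (pu)

def make_pss(dt=0.01):
    # previous-step inputs/outputs of each first-order block
    state = {"u_w": 0.0, "y_w": 0.0, "u_a": 0.0, "y_a": 0.0,
             "u_b": 0.0, "y_b": 0.0}

    def step(d_omega):
        u = K_STAB * d_omega
        # washout sT_W/(1+sT_W): y + T_W*dy/dt = T_W*du/dt (backward Euler)
        y_w = T_W * (u - state["u_w"] + state["y_w"]) / (dt + T_W)
        state["u_w"], state["y_w"] = u, y_w
        # first lead-lag (1+sT1)/(1+sT2): y + T2*dy/dt = u + T1*du/dt
        y_a = ((dt + T1) * y_w - T1 * state["u_a"] + T2 * state["y_a"]) / (dt + T2)
        state["u_a"], state["y_a"] = y_w, y_a
        # second lead-lag (1+sT3)/(1+sT4)
        y_b = ((dt + T3) * y_a - T3 * state["u_b"] + T4 * state["y_b"]) / (dt + T4)
        state["u_b"], state["y_b"] = y_a, y_b
        return min(V_MAX, max(V_MIN, y_b))  # clamp to stabilizer limits

    return step
```

For a constant speed deviation the washout drives V_PSS back to zero, which is the intended behavior: the PSS acts only on changes in speed.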

X(t) of the model network = [Δω(k), ΔV_t(k), ΔP_in(k), ΔV_ref(k) | k = t−1, t−2, t−3].
Ŷ(t) of the model network = [Δω̂(t), ΔV̂_t(t)].

The critic network estimates the derivatives of the function J with respect to the vector of observables ΔŶ(t) = [Δω̂(t), ΔV̂_t(t)] (the input vector of the critic network) of the plant identified by the model network, and it learns to minimize the following error measure over time:

‖E_C‖ = Σ_t e_C^T(t) e_C(t),  (6)

represents the Euclidean norm. To avoid extensive computational complexity during training, the batch-mode k-means clustering algorithm is first used to calculate the centers of the RBF units; thereafter, the pattern-mode least-mean-square (LMS) algorithm is used to update the output linear weights. By trial and error, 12 neurons are used in the hidden layer of the RBFNN for the model network, and 6 neurons each for the critic and action networks.

3.3. DHP neurocontroller

The configuration for the critic network training in the DHP is shown in Fig. 6. The input reference vector Y_ref into the plant and the output vector Y from the plant are
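The two-stage training just described (batch k-means to place the centers, then pattern-mode LMS on the output linear weights) can be sketched as follows; the data, learning rate, and sizes are illustrative assumptions, not the values used in the paper.

```python
# Sketch of the RBFNN training pipeline: batch k-means for centers,
# then pattern-mode LMS for the linear output weights and bias.
import math, random

def kmeans(data, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
            clusters[j].append(x)
        for j, cl in enumerate(clusters):
            if cl:  # recompute each center as its cluster mean
                centers[j] = [sum(col) / len(cl) for col in zip(*cl)]
    return centers

def lms_train(data, targets, centers, widths, eta=0.1, epochs=200):
    v = [0.0] * len(centers)
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(data, targets):
            # hidden-layer Gaussian activations, as in (4)
            phi = [math.exp(-sum((a - c) ** 2 for a, c in zip(x, cj)) / bj ** 2)
                   for cj, bj in zip(centers, widths)]
            y = b + sum(vj * pj for vj, pj in zip(v, phi))
            e = t - y
            v = [vj + eta * e * pj for vj, pj in zip(v, phi)]  # LMS update
            b += eta * e
    return v, b
```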

- Y_ref(t), input reference vector to the plant = [P_in(t), V_ref(t)].
- Y(t), output vector from the plant = [Δω(t), ΔV_t(t)].

The input vector X and output vector Y for the action and model networks are

- X(t) of the action network = [Δω(k), ΔV_t(k) | k = t−1, t−2, t−3].
- Y(t) of the action network = A(t) = [ΔP_in(t), ΔV_ref(t)].
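Assembling these time-lagged input vectors can be sketched as follows; the signal histories below are illustrative lists indexed by time step, not recorded plant data.

```python
# Sketch of building the lagged input vector X(t) listed above. For the
# model network, histories = [dw, dVt, dPin, dVref]; for the action
# network, histories = [dw, dVt]; three lags in both cases.

def lagged_input(histories, t, lags=3):
    """Concatenate each signal's values at t-1, t-2, ..., t-lags."""
    x = []
    for k in range(1, lags + 1):
        for h in histories:
            x.append(h[t - k])
    return x
```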

e_C(t) = ∂J[ΔŶ(t)]/∂ΔŶ(t) − γ ∂J[ΔŶ(t+1)]/∂ΔY(t) − ∂U[ΔY(t)]/∂ΔY(t).  (7)

After exploiting all relevant pathways of backpropagation shown in Fig. 6, where the paths of derivatives and adaptation of the critic network are depicted by dotted and dash–dot lines, the error signal e_C(t) is used to update the output linear weights of the critic network. The closed-form backpropagation solution of the RBFNN, which accounts for the estimated input vector X from the future output vector Y based on the chain rule of derivatives, is

∂Y/∂X = (∂Y/∂t)(∂t/∂p_L)(∂p_L/∂q_L)(∂q_L/∂p_l)(∂p_l/∂q_l)(∂q_l/∂X)
      = Σ_{k=1}^{m_l} [ 2(C_k − q_l)/β_k² ] f(q_l) Σ_{j=1}^{m_l} V_{L,j},  (8)

where t is the target value, m_l is the number of neurons in the hidden layer, p is the output of the activation function of a neuron, q is the regression vector (the activity of a neuron), L and l denote the output and hidden layers, respectively, C and β are the center and width of the RBFNN, respectively, and f is the Gaussian density function defined on the right-hand side of (4) as


Fig. 6. Critic network adaptation in DHP. This diagram shows the implementation of (10). The same critic network is shown for two consecutive times, t and t+1. The discount factor γ is chosen to be 0.5. Backpropagation paths are shown by dotted and dash–dot lines. The output of the critic network λ(t+1) is backpropagated through the model network from its outputs to its inputs, yielding the first term of (9) and ∂J(t+1)/∂A(t). The latter is backpropagated through the action network from its outputs to its inputs, forming the second term of (9). Backpropagation of the vector ∂U(t)/∂A(t) through the action network results in a vector whose components form the last term of (10). The summation of all these signals produces the error vector e_C(t) used for training the critic network.

Fig. 7. Action network adaptation in DHP. The discount factor γ is chosen to be 0.5. Backpropagation paths are shown by dotted lines. The output of the critic network λ(t+1) at time t+1 is backpropagated through the model network from its outputs to its inputs (the outputs of the action network), and the resulting vector is multiplied by the discount factor (γ = 0.5) and added to ∂U(t)/∂A(t). An incremental adaptation of the action network is then carried out by (12) and (13).

an exponential form, and V is the vector of output linear weights of the RBFNN. The jth component of the second term in (7) can be expressed in terms of the output of the critic network at time t+1, λ̂_i(t+1) = ∂J[ΔŶ(t+1)]/∂ΔŶ_i(t+1), as follows:

∂J[ΔŶ(t+1)]/∂ΔY_j(t) = Σ_{i=1}^{n} λ̂_i(t+1) ∂ΔŶ_i(t+1)/∂ΔY_j(t)
                      + Σ_{k=1}^{m} Σ_{i=1}^{n} λ̂_i(t+1) [∂ΔŶ_i(t+1)/∂A_k(t)] [∂A_k(t)/∂ΔY_j(t)],  (9)

where n and m are the numbers of outputs of the model and action networks, respectively. By using (9), each component of the vector e_C(t) in (7) is determined by

e_Cj(t) = ∂J[ΔŶ(t)]/∂ΔŶ_j(t) − γ ∂J[ΔŶ(t+1)]/∂ΔY_j(t)
        − ∂U[ΔY(t)]/∂ΔY_j(t) − Σ_{k=1}^{m} [∂U[ΔY(t)]/∂A_k(t)] [∂A_k(t)/∂ΔY_j(t)].  (10)
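A numerical sketch of assembling the critic error (10) from its pieces follows. The derivative arrays passed in are illustrative placeholders for quantities that, in the actual design, are obtained by backpropagation through the critic, model, and action networks; γ = 0.5 as in the paper.

```python
# Sketch of the DHP critic error (10) for one time step, given the
# required partial derivatives as plain arrays.
GAMMA = 0.5

def critic_error(dJ_t, dJ_tp1_dY, dU_dY, dU_dA, dA_dY):
    """
    dJ_t[j]      : dJ[dY_hat(t)]/d dY_hat_j(t)   (critic output at time t)
    dJ_tp1_dY[j] : dJ[dY_hat(t+1)]/d dY_j(t)     (from (9), via backprop)
    dU_dY[j]     : dU[dY(t)]/d dY_j(t)
    dU_dA[k]     : dU[dY(t)]/d A_k(t)
    dA_dY[k][j]  : dA_k(t)/d dY_j(t)
    """
    n = len(dJ_t)
    e = []
    for j in range(n):
        cross = sum(dU_dA[k] * dA_dY[k][j] for k in range(len(dU_dA)))
        e.append(dJ_t[j] - GAMMA * dJ_tp1_dY[j] - dU_dY[j] - cross)
    return e
```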


Table 2. Operating conditions used for off-line training of the model, critic, and action networks

Using (10), the expression for the weight update of the critic network is

ΔV_C(t) = η_C e_C^T(t) ∂e_C(t)/∂V_C(t),  (11)

where η_C is a positive learning rate and V_C contains the weights of the DHP critic network. The adaptation of the action network in Fig. 6 is illustrated in Fig. 7, which propagates λ̂(t+1) back through the model network to the action network. The goal of this adaptation is expressed in (12), and the weights of the action network are updated by (13):

∂U[ΔY(t)]/∂A(t) + γ ∂J[ΔŶ(t+1)]/∂A(t) = 0  for all t,  (12)

ΔV_A(t) = η_A [ ∂U[ΔY(t)]/∂A(t) + γ ∂J[ΔŶ(t+1)]/∂A(t) ]^T ∂A(t)/∂V_A(t),  (13)

Operating condition   P_t (pu)   Q_t (pu)   P_in (pu)   δ (rad)   Z_e (pu)
A                     0.2        0.0853     0.2001      0.3877    0.02 + j0.4
B                     0.4        0.0983     0.4003      0.7167    0.02 + j0.4
C                     0.6        0.1270     0.6007      0.9678    0.02 + j0.4
D                     0.8        0.1720     0.8013      1.1578    0.02 + j0.4
E                     0.6        0.1273     0.6007      1.0270    0.025 + j0.5
F                     0.6        0.1211     0.6007      1.0312    0.03 + j0.5
G                     0.6        0.1339     0.6007      1.0823    0.03 + j0.6

Fig. 8. Training of the model network using the backpropagation algorithm.

Fig. 9. Operating points for training and testing of the DHPNC in the first quadrant of the P–Q plane. Points marked X (region [A]) are the operating conditions used for training of the DHPNC in Section 3; points marked O ([B-1] and [B-2]) are the operating conditions used for testing of the DHPNC in Section 4. The plot also shows the approximated training boundary and the steady-state stability, turbine, armature current, and field current limits; the axes are reactive power Q (pu), from underexcited (lead) to overexcited (lag), and real power P (pu).

where η_A is a positive learning rate and V_A contains the weights of the DHP action network. The word "dual" describes the fact that the target outputs λ in Fig. 6 are calculated for the DHP critic adaptation using backpropagation in a generalized sense. More precisely, DHP uses dual subroutines (states and co-states) to backpropagate derivatives through the model and action networks, as shown in Figs. 6 and 7. The dual subroutines and their explanations are found in Prokhorov and Wunsch (1997). Fig. 8 illustrates how the model network (identifier) is trained to identify the dynamics of the plant. The input vector A(t) = [ΔP_in(t), ΔV_ref(t)] is fed into the plant together with the vector Y_ref(t) = [P_in(t), V_ref(t)]. In the off-line training mode, the input signals of A(t) are generated with a sampling period of 200 ms as small pseudo-random binary signals (PRBSs) within 10% of their nominal values. The output vector of the plant is Y(t) = [Δω(t), ΔV_t(t)], and the model network output is ΔŶ(t) = f̂(X(t)) = [Δω̂(t), ΔV̂_t(t)], where X(t) is the input vector to the model network, consisting of three time lags of the system input A(t) and output Y(t). The residual vector e_M(t) given in (14) is used for updating the model network's weights V_M in (15) during adaptation by the backpropagation algorithm.

e_M(t) = Y(t) − ΔŶ(t) = [Δω(t) − Δω̂(t), ΔV_t(t) − ΔV̂_t(t)],  (14)

ΔV_M(t) = η_M e_M^T(t) ∂e_M(t)/∂V_M(t),  (15)

where η_M is a positive learning rate and V_M contains the weights of the DHP model network.

3.4. Training procedure for the DHP

The general training procedure for adapting the DHP in off-line mode is explained in Prokhorov and Wunsch (1997). It consists of two training cycles: one for the critic network and the other for the action network. Before the training of the action and critic networks, the model network is trained under the wide range of operating conditions shown in Table 2 and Fig. 9, and its weights V_M (which have a global convergence property(1)) are fixed. Then, the critic and action networks are also trained under these operating

(1) Park et al. (2002) reported on the global convergence property of the RBFNN for on-line identification of a synchronous generator in a power system under various operating points and conditions when the inputs and outputs of the RBFNN are deviation signals.


conditions (in region [A]), which are within the generator capability limits in the first quadrant (overexcited generator operation, lagging power factor) of the P–Q plane shown in Fig. 9. In Section 4, the dynamic performance of the DHPNC is tested at different operating conditions (points [B-1] and [B-2] in Fig. 9) that were NOT used during the training phase of the neural networks. The critic network's training is carried out first and alternated with the action network's training until an acceptable control performance is achieved. The weights V_C are initialized with small random values. In the critic network's training cycle, the incremental optimization is carried out by (10) and (11); in the action network's training cycle, the incremental learning is carried out by (12) and (13). It is important that the whole system consisting of the DHP and the plant remain stable while both the critic and action networks undergo adaptation; the initial weights of the action network are therefore chosen to ensure stabilizing control at an operating point. Each training cycle is continued until the respective network's weights converge. Convergence of the action network's weights means that the training procedure has found weights that yield the optimal control u for the plant under consideration. The discount factor γ = 0.5 is used in (10) and (12), and the utility function U(t) in (16) is used in the heuristic cost-to-go function (3) during training of the critic and action networks.
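Assuming the weighted-history form given in (16), evaluating the utility U(t) from short histories of the speed and terminal-voltage deviations can be sketched as follows.

```python
# Sketch of the utility function U(t) in (16), computed from the three
# most recent samples of each deviation signal.

def utility(dw, dv):
    """dw, dv: sequences with dw[-1] = sample at t, dw[-2] = t-1, dw[-3] = t-2."""
    term_w = 0.4 * dw[-1] + 0.4 * dw[-2] + 0.16 * dw[-3]
    term_v = 4.0 * dv[-1] + 4.0 * dv[-2] + 16.0 * dv[-3]
    return term_w ** 2 + term_v ** 2
```

Both terms are squared, so U(t) is non-negative and penalizes sustained deviations of either signal.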

U(t) = [0.4Δω(t) + 0.4Δω(t−1) + 0.16Δω(t−2)]²
     + [4ΔV_t(t) + 4ΔV_t(t−1) + 16ΔV_t(t−2)]².  (16)

A detailed explanation of the derivation of the utility function in (16) is presented in Venayagamoorthy and Harley (2001). This utility function is formed such that it ensures a positive quadratic cost function when the derivatives exist.

3.5. Real-time implementation and data collection

The tests described in the next section are carried out by computer simulation. On a physical system, the training of the neural networks could occur in either of the following two ways:

- Data recorded from the physical system over a period of time could be used off-line for the training of the neuroidentifier and neurocontroller.
- The neuroidentifier and neurocontroller could be trained on-line as the physical system moves from one operating condition to the next over periods of minutes or even hours.

In both the off-line and on-line training cases, the neuroidentifier is not allowed to interact with the neurocontroller until the neuroidentifier's training has converged. The neural networks can be trained off-line (and then updated on-line at some point in time), but the problem is how to obtain data that represent the entire field of operation. A limited amount of data could be recorded on a physical generator plant over a long period of time, or larger amounts of data could be obtained by time-domain simulation of an exact model of the real plant. In the simulation study carried out in this paper, it was convenient to change operating conditions at much shorter intervals of time in order to speed up the simulation process. It was also possible to obtain enough data to model the plant over the entire range of operating conditions, and therefore to train the neural networks. This simulation process can easily be extended to learning (by the neuroidentifier and neurocontroller) of new facilities as well as new operating conditions of the existing plant, and then applied in a real-time implementation.

4. Case studies

After training the critic and action networks off-line, the DHPNC with fixed weights is ready to control the plant. The performance of the DHPNC is compared with that of the combination of the CONVC plus a PSS (CONVC+PSS) as well as the CONVC alone. Two separate disturbances are applied to the system in Fig. 2, namely a 5% step change in the reference voltage of the exciter and a three-phase short circuit at the infinite bus, and the improvement in system damping is evaluated.

4.1. 5% step change in the reference voltage of the exciter

The synchronous generator of the plant in Fig. 2 is operating at a steady-state condition (P_t = 1 pu, Q_t = 0.234 pu, and Z_e = 0.02 + j0.4 pu), which is point [B-1] in Fig. 9. At t = 1 s, a 5% step increase (ΔV_ref) in the

Fig. 10. Rotor angle response for a 5% step change in the reference voltage of the exciter.


Fig. 11. Terminal voltage response for a 5% step change in the reference voltage of the exciter.

Fig. 12. Rotor angle response for a three-phase short circuit test.

Fig. 13. Speed response for a three-phase short circuit test.

reference voltage of the exciter is applied. At t = 12 s, the 5% step increase is removed, and the system returns to its initial operating point. The results in Figs. 10 and 11 show that both the CONVC and the CONVC+PSS have faster (underdamped) responses than the DHPNC, but exhibit an oscillatory response, especially during the first and second swings. From the viewpoint of stability, the DHPNC has the better damping.

4.2. Three-phase short circuit test to represent a severe impulse-type disturbance

A test is now carried out to evaluate the performance of the controllers during a severe disturbance. A temporary three-phase short circuit is applied at the infinite bus for 100 ms, from t = 0.3 to 0.4 s, while the plant is operating at the same steady-state condition ([B-1] in Fig. 9) as in the previous test. Figs. 12 and 13 show that the DHPNC has the best damping performance of low-

Fig. 14. Rotor angle response for a three-phase short circuit test at the P_t = 1.1 pu and Q_t = 0.19 pu operating point.

frequency oscillations of the rotor angle (δ) and speed (ω) when compared to the other controllers.

4.3. Three-phase short circuit test close to the stability limit

In order to test the robustness of the DHPNC, the plant's pre-fault operating point is now changed to a steady-state condition ([B-2] in Fig. 9) different from that of the previous tests. The active power from the generator is increased by 10% to P_t = 1.1 pu, with Q_t = 0.19 pu, which is closer to the stability limit of the generator. At t = 0.3 s, the same 100 ms three-phase short circuit is again applied at the infinite bus. The parameters of all the controllers are the same as in the previous tests. A comparison of the damping performance of the CONVC, CONVC+PSS, and DHPNC appears in Figs. 14 and 15, which show that the synchronous generator controlled by the CONVC goes unstable and loses synchronism after the disturbance. In contrast, the other

Fig. 15. Speed response for a three-phase short circuit test at the P_t = 1.1 pu and Q_t = 0.19 pu operating point.

controllers damp out the oscillations and restore the generator to a stable mode, with the DHPNC providing the best damping.

5. Conclusions

This paper has made new contributions by applying the radial basis function neural network (RBFNN) to implement the dual heuristic programming (DHP) form of nonlinear optimal neurocontrol. The RBFNN-based DHP neurocontroller (DHPNC) has been designed for the control of a synchronous generator connected to an infinite bus in a power grid. The results show that the DHPNC improves the system damping more effectively than the CONVC+PSS or the CONVC, both for a large disturbance such as a three-phase short circuit and for a small 5% step change in the reference voltage of the exciter. The improved damping provided by the DHPNC allows the generator to be operated closer to its steady-state stability limit and still remain stable after a severe disturbance. The DHPNC has fixed parameters for its model, action, and critic neural networks, which are trained off-line based on the infinite-horizon optimal control approach; there are therefore no adaptive parameters in real-time operation. The results show that the DHPNC provides robust feedback with good dynamic control capability even under operating conditions for which it was NOT trained. The DHPNC therefore reduces the risk of instability associated with artificial neural network (ANN) based controllers whose parameters vary on-line.

Acknowledgment

This work was supported by the MOCIE through the EIRC program with APSRC at Korea University, Seoul, South Korea.

References

Adkins, B., Harley, R.G., 1975. The General Theory of Alternating Current Machines. Chapman & Hall, London.
Anderson, P.M., Fouad, A.A., 1994. Power System Control and Stability. IEEE Press, New York.
Bertsekas, D.P., 2001. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, MA.
Dash, P., Mishra, S., Panda, G., 2000. A radial basis function neural network controller for UPFC. IEEE Transactions on Power Systems 15 (4), 1293–1299.
Gregory, J., Kin, C., 1992. Constrained Optimization in the Calculus of Variations and Optimal Control Theory. Van Nostrand Reinhold, New York.
He, J., Malik, O.P., 1997. An adaptive power system stabilizer based on recurrent neural networks. IEEE Transactions on Energy Conversion 12 (4), 413–418.
Kim, B.H., 2006. A study on the convergency property of the auxiliary problem principle. Journal of Electrical Engineering & Technology 1 (4), 455–460.
Kobayashi, T., Yokoyama, A., 1996. An adaptive neuro-control system of synchronous generator for power system stabilization. IEEE Transactions on Energy Conversion 11 (3), 621–630.
Kundur, P., Klein, M., Rogers, G.J., Zywno, M.S., 1989. Application of power system stabilizers for enhancement of overall system stability. IEEE Transactions on Power Systems 4 (2), 614–626.
Miller, W.T., Sutton, R.S., Werbos, P.J., 1990. Neural Networks for Control. MIT Press, Cambridge, MA.
Park, J.-W., Venayagamoorthy, G.K., Harley, R.G., 2002. Comparison of MLP and RBF neural networks using deviation signals for on-line identification of a synchronous generator. In: Proceedings of the IEEE PES Winter Meeting, New York, pp. 274–279.
Park, J.-W., Harley, R.G., Venayagamoorthy, G.K., 2003. Adaptive-critic-based optimal neurocontrol for synchronous generators in a power system using MLP/RBF neural networks. IEEE Transactions on Industry Applications 39 (5), 1529–1540.
Prokhorov, D.V., Wunsch, D.C., 1997. Adaptive critic designs. IEEE Transactions on Neural Networks 8 (5), 997–1007.
Segal, R., Kothari, M.L., Madnani, S., 2000. Radial basis function (RBF) network adaptive power system stabilizer. IEEE Transactions on Power Systems 15 (2), 722–727.
Si, J., Wang, Y.T., 2001. On-line learning control by association and reinforcement. IEEE Transactions on Neural Networks 12 (2), 264–276.
Swidenbank, E., McLoone, S., Flynn, D., Irwin, G.W., Brown, M.D., Hogg, B.W., 1999. Neural network based control for synchronous generators. IEEE Transactions on Energy Conversion 14 (4), 1673–1678.
Venayagamoorthy, G.K., Harley, R.G., 2001. A continuously online trained neurocontroller for excitation and turbine control of a turbogenerator. IEEE Transactions on Energy Conversion 16 (3), 261–269.
Werbos, P.J., 1992. Approximate dynamic programming for real-time control and neural modeling. In: Handbook of Intelligent Control. Van Nostrand Reinhold, New York, pp. 493–526.
Werbos, P.J., 1998. Stable adaptive control using new critic designs. arXiv: adap-org/9810001.
White, D.A., Jordan, M.I., 1992. Optimal control: a foundation for intelligent control. In: White, D., Sofge, D. (Eds.), Handbook of Intelligent Control. Van Nostrand Reinhold, New York, pp. 185–214.