Combining a neural network and a rule-based expert system for short-term load forecasting


Computers ind. Engng Vol. 32, No. 4, pp. 787-797, 1997
© 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain
PII: S0360-8352(97)00009-0    0360-8352/97 $17.00 + 0.00

COMBINING A NEURAL NETWORK AND A RULE-BASED EXPERT SYSTEM FOR SHORT-TERM LOAD FORECASTING

CHIH-CHOU CHIU¹, DEBORAH F. COOK², JEN-LUNG KAO¹ and YU-CHAO CHOU¹

¹Department of Business Administration, Fu-Jen Catholic University, Hsin-Chun, Taiwan and ²Department of Management Science and Information Technology, Virginia Tech, Blacksburg, VA 24061-0235, U.S.A.

(Received 14 March 1997)

Abstract: A backpropagation neural network that used the output provided by a rule-based expert system was designed for short-term load forecasting. Extensive studies were performed on the effect of various factors such as learning rate and the number of hidden nodes. Load forecasting was performed on the Taiwan power system to demonstrate that the inclusion of the prediction from a rule-based expert system developed for a power system would improve the predictive capability of the neural network. The hourly power load for two typical days was evaluated, and for both days the inclusion of the rule-based expert system prediction as a network input significantly improved the neural network's prediction of power load. The predictive capability of the network was compared to that of the expert system as well as to a previously developed neural network. The proposed neural network provided improved predictive capability. In addition, the proposed combined approach converges much faster than both the conventional neural network and the rule-based expert system method. © 1997 Elsevier Science Ltd

1. INTRODUCTION

It has been recognized that accurate short-term load forecasts represent a great savings potential for electric utility corporations. These savings are realized when the load forecast is used to control operation decisions such as dispatch, unit commitment, fuel allocation, and maintenance [1, 2]. Various techniques for power system load forecasting have been proposed in the past few decades in attempts to improve load forecasting accuracy. Some of these methods use no weather information and represent the load as a time sequence [3-5]. In terms of time functions and extrapolation, the load behavior can be successfully described by Fourier series or trend curves [3]. State variable models [4] and auto-regressive moving average (ARMA) models [5] were also developed to depict the load trend. Other load forecasting techniques have included the effects of weather variables on the power system load [6-8]. By decomposing the load into weather-sensitive and non-sensitive components, forecasting models can be built [6-8]. Most weather-sensitive loads are predicted by correlation techniques and the non-weather-sensitive loads are represented by ARMA models. Recently, researchers have found an alternative to the classical statistical and adaptive forecasting models in the current developments in artificial intelligence (AI). An expert system approach to short-term load forecasts has been successfully developed [9]. The objective of the expert system approach was to use the knowledge, experience and analytical thinking of experienced system operators. Some authors developed a new method of adaptively identifying the load model, which reflects the stochastic behavior of the load with the aid of weather variables [10]. However, identifying changes in the load pattern over time is difficult for a human operator. To overcome the drawback of relying on human experience, an alternative approach in AI, that of neural networks, was proposed [11-13]. These authors applied the backpropagation learning algorithm [14] to train artificial neural networks (ANN) to forecast power loads. Backpropagation is a traditional neural network technique and has been the most often utilized paradigm to date. Several impressive successes in forecasting results using a backpropagation neural network can be found [11-13]; however, techniques to further improve the predictive capability of neural network models are sought.

A common problem in load forecasting model development using neural networks is the lack of sufficient and accurate training data. Collection of accurate training data is a time-consuming task, as multiple examples of all ranges of operating conditions are sought. Consequently, techniques that provide additional information for network training without requiring additional data collection are very valuable. The objective of this research effort was to improve the predictive capability of neural network models used in load forecasts by extracting additional information from existing data. Specifically, rule-based expert system and neural network techniques are combined to determine if improved neural network models can be developed. The output of a rule-based expert system for power load forecasting was used as input for a backpropagation neural network to provide additional information to the neural network model. To demonstrate the effectiveness of the proposed combined approach, short-term load forecasting is performed on the Taiwan power system. The backpropagation learning technique with various learning rates is extensively studied to determine the connection weights between neurons. In addition, the number of hidden neurons is also varied to see the effect on the convergence rate. Results from this study indicate that the proposed combined approach provides more accurate predictions and converges much faster than either the conventional neural network approach or the rule-based expert system method.

2. NEURAL NETWORKS

A neural network is a massively parallel system comprised of highly interconnected, interacting processing elements, or units, that are based on neurobiological models [15]. Neural networks process information through the interactions of a large number of simple processing elements or units, also known as neurons. Knowledge is not stored within individual processing units, but is represented by the strength of the connections between units [16]. Each piece of knowledge is a pattern of activity spread among many processing elements, and each processing element can be involved in the partial representation of many pieces of information. Neural networks can be classified into two different categories, feedforward networks and feedback networks [17]. Feedback networks contain neurons that are connected to themselves, enabling a neuron to influence other neurons and itself. Examples of this type of network are the Kohonen self-organizing network and the Hopfield network. Neurons in feedforward networks (as shown in Fig. 1) take inputs only from the previous layer and send outputs only to the next layer. The ADALINE and MLP are two typical examples of this kind of network.

Fig. 1. A multilayer feedforward neural network.

As shown in Fig. 1, the neural net consists of a number of nodes or neurons connected by links. The nodes in the neural network can be divided into three layers: the input layer, the output layer, and one or more hidden layers. The nodes in the input layer receive input signals from an external source and the nodes in the output layer provide the target output signals. The output of each neuron in the input layer is the same as the input to that neuron. For each neuron j in the hidden layer and neuron k in the output layer, the net inputs are given by

net_j = \sum_i w_{ji} o_i    (1)

and

net_k = \sum_j w_{kj} o_j    (2)

where i (j) is a neuron in the previous layer, o_i (o_j) is the output of node i (j) and w_{ji} (w_{kj}) is the connection weight from neuron i (j) to neuron j (k). The neuron outputs are given by

o_i = net_i    (3)

o_j = \frac{1}{1 + \exp[-(net_j + \theta_j)]} = f(net_j, \theta_j)    (4)

o_k = \frac{1}{1 + \exp[-(net_k + \theta_k)]} = f(net_k, \theta_k)    (5)

where net_i is the input signal from the external source to node i in the input layer and \theta_j (\theta_k) is a bias. The generalized delta rule is the conventional technique used to derive the connection weights of the feedforward network [18]. Initially, a set of random numbers is assigned to the connection weights. Then, for a presentation of a pattern p with target output vector t_p = [t_{p1}, t_{p2}, \ldots, t_{pM}]^T, the sum of the squared error to be minimized is given by

E_p = \frac{1}{2} \sum_{j=1}^{M} (t_{pj} - o_{pj})^2    (6)

where M is the number of output nodes. By minimizing the error E_p using the technique of gradient descent, the connection weights can be updated by the following equations [18]:

\Delta w_{ji}(p) = \eta \, \delta_{pj} \, o_{pi} + \alpha \, \Delta w_{ji}(p - 1)    (7)

where for output nodes

\delta_{pj} = (t_{pj} - o_{pj}) \, o_{pj} (1 - o_{pj})    (8)

and for other nodes

\delta_{pj} = \Big( \sum_k \delta_{pk} w_{kj} \Big) \, o_{pj} (1 - o_{pj})    (9)

Note that the learning rate \eta affects the network's generalization and the learning speed to a great extent. The overall training (learning) process for the network using the gradient descent technique is summarized in Fig. 2.


Initialize the weights between layers, w_ji and w_kj;
for m = 1 to output layer iteration number or error criterion do
    decrease the adjusting rate for w_ji and w_kj gradually;
    for n = 1 to training sample size do
        calculate the output for each hidden node;
        calculate the output for each output node;
        accumulate the difference between actual and target outputs;
        calculate the modified gradient for w_kj;
        calculate the modified gradient for w_ji;
        modify the w_kj;
        modify the w_ji;
    end
end

Fig. 2. Training process for networks.
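To make the procedure of Fig. 2 and equations (1)-(9) concrete, the following Python/NumPy sketch trains a one-hidden-layer network with the generalized delta rule, including the momentum term of equation (7) and a gradually decreasing learning rate. It is an illustrative reconstruction rather than the NETS code used in the study; the weight initialization range, momentum value and decay factor are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, n_hidden=3, eta=0.75, alpha=0.5, epochs=2000, decay=0.999, seed=0):
    """Generalized delta rule training of a one-hidden-layer feedforward network.

    X: (N, n_in) input patterns, T: (N, n_out) targets, both scaled to [0, 1].
    eta is the learning rate of eq. (7); alpha (momentum) and decay (gradual
    decrease of the adjusting rate) are assumed values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    w_ji = rng.uniform(-0.5, 0.5, (n_hidden, n_in))   # input-to-hidden weights
    w_kj = rng.uniform(-0.5, 0.5, (n_out, n_hidden))  # hidden-to-output weights
    th_j = np.zeros(n_hidden)                         # hidden biases (theta_j)
    th_k = np.zeros(n_out)                            # output biases (theta_k)
    dw_ji = np.zeros_like(w_ji)                       # previous updates, for momentum
    dw_kj = np.zeros_like(w_kj)

    for _ in range(epochs):
        sq_err = 0.0
        for o_i, t_k in zip(X, T):                    # present one pattern at a time
            o_j = sigmoid(w_ji @ o_i + th_j)          # eqs (1) and (4): hidden outputs
            o_k = sigmoid(w_kj @ o_j + th_k)          # eqs (2) and (5): network outputs
            sq_err += 0.5 * np.sum((t_k - o_k) ** 2)  # eq (6)
            d_k = (t_k - o_k) * o_k * (1.0 - o_k)     # eq (8): output-node deltas
            d_j = (w_kj.T @ d_k) * o_j * (1.0 - o_j)  # eq (9): hidden-node deltas
            dw_kj = eta * np.outer(d_k, o_j) + alpha * dw_kj   # eq (7)
            dw_ji = eta * np.outer(d_j, o_i) + alpha * dw_ji
            w_kj += dw_kj
            w_ji += dw_ji
            th_k += eta * d_k
            th_j += eta * d_j
        eta *= decay                                  # decrease the adjusting rate gradually
        if sq_err / len(X) <= 0.001:                  # cf. the convergence criterion in Section 4
            break
    return w_ji, th_j, w_kj, th_k

def predict(X, w_ji, th_j, w_kj, th_k):
    o_j = sigmoid(X @ w_ji.T + th_j)
    return sigmoid(o_j @ w_kj.T + th_k)
```

With two input nodes, three hidden nodes and one output node, this corresponds to the 2-3-1 topology that Section 4 identifies as the best performer.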

3. LOAD FORECASTING USING RULE-BASED EXPERT SYSTEM AND BACKPROPAGATION ALGORITHM

A rule-based expert system and an ANN are combined to forecast short-term power loads. This method is a dynamic approach in the sense that the hourly load is predicted sequentially using the previous value of the load along with the load value predicted by a rule-based expert system for the next time interval. The network investigated in this paper is illustrated in Fig. 3. The network model consists of three layers. The input layer has two elements or nodes, one representing the power load at the previous time interval and one representing the rule-based expert system prediction of the power load for the next time interval. The hidden layer consists of a number of nodes used for computation purposes. The number of hidden nodes must be determined experimentally and this determination is discussed later. The output layer consists of a single node representing the neural network's prediction of the power load at the next time interval. The terms in Fig. 3 are defined as:
• X̂(t) = the predicted result at time t provided by the rule-based expert system;
• X(t-1) = the actual power load at time t-1;
• X̂′(t) = the predicted result at time t provided by the combined neural network and rule-based expert system technique.

Fig. 3. The utilized neural network topology.

In this research, the forecasting results made by Ho's rule-based expert system [19] were utilized as an input, X̂(t), for the neural network. In the development of this expert system, Ho employed the operator's experience and heuristic rules in order to minimize the impact caused by special events. The objective of this approach is to use knowledge and experience to find the relationship between the historical load data and temperature (weather variable). In addition to the development of the expert system, Ho et al. [19] applied the Box-Jenkins method to build a forecasting model for the weekday load data from September 1, 1986 to August 31, 1987. The utilized general ARIMA model is shown as follows [19]:

\left(1 - \sum_{k=1}^{P} p_k B^k\right)\left(1 - \sum_{h=1}^{SP} q_h B^{hS}\right)(1 - B)(1 - B^S)\, Z(t) = \left(1 - \sum_{m=1}^{Q} r_m B^m\right)\left(1 - \sum_{n=1}^{SQ} w_n B^{nS}\right) A(t)    (10)

where
Z(t) = normalized load time series;
A(t) = white noise time series;
B = backward shift operator;
S = seasonal period (24 h);
P = order of the non-seasonal auto-regressive polynomial;
SP = order of the seasonal auto-regressive polynomial;
Q = order of the non-seasonal moving average polynomial;
SQ = order of the seasonal moving average polynomial;
p_k, q_h, r_m, w_n = model parameters.

The forecast errors made by five different ARIMA models identified by Ho are reproduced in Table 1. It is noted that the model (SQ = 1, P = 1) is superior to all other models and is therefore employed by Ho in the comparison with the expert system forecasting results.

Table 1. Percentage forecast errors of the Box-Jenkins method (weekdays)

Date                 SQ=1, Q=1   SQ=2, P=1   SQ=1, P=1   SP=1, P=1   SP=1, Q=1
09/01/86-11/30/86    2.866       2.731       2.882       2.847       2.839
12/01/86-02/28/87    2.933       2.839       2.862       2.899       2.896
03/01/87-05/31/87    2.754       2.859       2.896       2.931       2.931
06/01/87-08/31/87    3.147       3.244       3.132       3.591       3.600
Total average        2.880       2.921       2.835       3.080       3.080

The final yearly mean absolute errors (MAE) and root mean square errors (RMSE) obtained by both approaches are reproduced in Table 2. The results reveal that Ho's expert system outperforms the Box-Jenkins method. Based on this conclusion, a comparative study between the neural network model developed in this research and a time series model was not conducted. A detailed description of the expert system and the Box-Jenkins method can be found in Ref. [19].

Table 2. Comparison of the Ho expert system approach and the Box-Jenkins method

                        MAE      RMSE
Expert system           2.52%    2.91%
Box-Jenkins method      3.86%    4.24%

Since there are only two input nodes and one output node in the neural network model, the initial number of hidden nodes to test was chosen to be 2, 3, 4, 5, and 6 for this example. There is not a commonly accepted method for determining the number of hidden nodes to use in a backpropagation neural network model. Consequently, experimentation and rules of thumb were used to arrive at the numbers to use. Too few hidden nodes limit network generalization capabilities, while too many hidden nodes can result in overtraining or memorization by the network. An initial evaluation of the learning rate was also conducted. Learning rates of 0.25, 0.50, 0.75, and 1.0 were used with the networks. Large step sizes in the learning process can cause the network to oscillate and not accomplish the required minimization of the error term. The predicted power load RMSE was recorded every 5 epochs or training iterations. Each network was run for 2000 epochs. The minimum root mean square error (RMSE) of the Taiwan power system data set was used as the learning rate selection factor.
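For readers who want to reproduce a model of the form of equation (10), a seasonal ARIMA of the same family can be fitted with the statsmodels SARIMAX class, as sketched below. This is not the identification procedure used by Ho et al. [19]; the series hourly_load is a fabricated placeholder, and the orders simply mirror the (SQ = 1, P = 1) specification with first and seasonal (S = 24) differencing from Table 1.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# hourly_load is assumed to be a series of normalized hourly loads indexed by
# timestamp; the placeholder below fabricates roughly two months of data.
rng = np.random.default_rng(0)
idx = pd.date_range("1986-09-01", periods=24 * 60, freq="H")
hourly_load = pd.Series(
    0.7
    + 0.2 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
    + 0.02 * rng.standard_normal(len(idx)),
    index=idx,
)

# Non-seasonal (P, d, Q) = (1, 1, 0) and seasonal (SP, D, SQ) = (0, 1, 1) with S = 24,
# i.e. the (SQ = 1, P = 1) specification that Table 1 identifies as the best model.
model = SARIMAX(hourly_load, order=(1, 1, 0), seasonal_order=(0, 1, 1, 24))
result = model.fit(disp=False)

next_day = result.forecast(steps=24)   # forecasts for the next 24 hours
print(next_day.head())
```

In practice the orders would be identified from the autocorrelation structure of the actual load series, as in the Box-Jenkins procedure.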

Fig. 4. Actual power load on May 7, 1988.

4. NUMERICAL EXAMPLES

To demonstrate the effectiveness of the proposed combination technique, short-term load forecasting is performed on the Taiwan power system data using the neural network that includes the rule-based expert system output. The neural network simulator NETS, developed by NASA [20], was used to develop the load forecasting networks. NETS was implemented on a PC with a Pentium 75 MHz CPU. NETS is a C-based simulator that provides a system for developing various neural network configurations using the generalized delta backpropagation learning algorithm. The topology of the network is dependent on the problem to be solved. In this network, the output node corresponds to a one-step-ahead prediction.

Fig. 5. Actual power load on August 19, 1988.

Table 3. The forecasting results for May 7, 1988

Number of hidden nodes   Learning rate (η)   Test results RMSE
2                        0.25                0.030409
2                        0.50                0.018566
2                        0.75                0.020024
2                        1.00                0.021342
3                        0.25                0.035294
3                        0.50                0.028118
3                        0.75                0.012372
3                        1.00                0.019955
4                        0.25                0.029895
4                        0.50                0.019719
4                        0.75                0.018029
4                        1.00                0.018836
5                        0.25                0.025832
5                        0.50                0.019475
5                        0.75                0.014110
5                        1.00                0.015463
6                        0.25                0.026506
6                        0.50                0.023117
6                        0.75                0.021116
6                        1.00                0.033557

Each training pattern consists of one previous actual hourly load, the current predicted result made by the rule-based expert system, and the actual current hourly load. That is, the training patterns can be expressed as {X̂(2), X(1), X̂′(2)}, {X̂(3), X(2), X̂′(3)}, ..., and {X̂(t), X(t-1), X̂′(t)}, where t = 2, ..., N. The predictions of the combined approach are presented below as tables and as graphs of predicted vs. actual data. A calculation of RMSE is also included. The normalized outputs of the Taiwan power system for two typical days (May 7, 1988 and August 19, 1988) are shown in Figs 4 and 5 for values of t from 1 to 24. Of the 24 values shown in each figure, the first 17 values were used to train the network and the remaining values were used for testing. The convergence criterion used for training is a mean square error less than or equal to 0.001 or a maximum of 2000 iterations.

The accuracy of the neural network using the predicted output provided by the rule-based expert system with different combinations of hidden node numbers and learning rates is compared in Tables 3 and 4. It is observed that the 2-3-1 network with a learning rate of 0.75 provides the best forecasting RMSE for power loads on both days. In other words, the 2-3-1 network is the best forecasting neural network. To examine the convergence characteristics of the proposed combined approach, the RMSE in the learning process for the 2-3-1 network with a learning rate of 0.75 is depicted in Figs 6 and 7. The excellent convergence characteristic of the proposed approach can be easily observed.

Table 4. The forecasting results for August 19, 1988

Number of hidden nodes   Learning rate (η)   Test results RMSE
2                        0.25                0.021693
2                        0.50                0.019090
2                        0.75                0.019912
2                        1.00                0.061798
3                        0.25                0.021291
3                        0.50                0.015842
3                        0.75                0.013327
3                        1.00                0.063817
4                        0.25                0.036166
4                        0.50                0.018114
4                        0.75                0.017292
4                        1.00                0.036216
5                        0.25                0.021664
5                        0.50                0.047948
5                        0.75                0.047630
5                        1.00                0.093851
6                        0.25                0.028543
6                        0.50                0.021197
6                        0.75                0.020800
6                        1.00                0.028896
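The sweep behind Tables 3 and 4 can be organised as a simple grid search. The sketch below assumes the train_backprop and predict helpers from the sketch after Fig. 2 are in scope; the expert_pred and actual_load arrays are fabricated stand-ins for one day of normalized expert-system predictions and actual loads, and the pattern construction and train/test split follow the description above.

```python
import numpy as np

# Fabricated stand-ins for one day: 24 expert-system predictions and 24 actual
# hourly loads, both already scaled to [0, 1]. In the study these values come
# from Ho's rule-based expert system [19] and from Taiwan power system records.
hours = np.arange(24)
expert_pred = 0.6 + 0.1 * np.sin(2 * np.pi * hours / 24)
actual_load = expert_pred + 0.02 * np.random.default_rng(1).standard_normal(24)

# Training patterns {X^(t), X(t-1)} -> X(t) for t = 2, ..., 24.
X = np.column_stack([expert_pred[1:], actual_load[:-1]])
T = actual_load[1:].reshape(-1, 1)

# Early part of the day for training, the remaining hours held out for testing,
# following the split described in the text.
X_tr, T_tr = X[:16], T[:16]
X_te, T_te = X[16:], T[16:]

results = {}
for n_hidden in (2, 3, 4, 5, 6):
    for eta in (0.25, 0.50, 0.75, 1.00):
        # train_backprop and predict are the helpers from the sketch after Fig. 2.
        weights = train_backprop(X_tr, T_tr, n_hidden=n_hidden, eta=eta)
        pred = predict(X_te, *weights)
        rmse = float(np.sqrt(np.mean((pred - T_te) ** 2)))
        results[(n_hidden, eta)] = rmse

best = min(results, key=results.get)
print("best (hidden nodes, learning rate):", best, "with test RMSE", results[best])
```

The minimum test RMSE over the grid then plays the role of the selection factor described at the end of Section 3.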

Fig. 6. The RMSE in the learning process for May 7, 1988.

For comparison, the predicted results for the two typical days using the rule-based expert system, the adaptive algorithm proposed by Ho et al. [11], and the combined approach described in this research are summarized in Tables 5 and 6 and Figs 8 and 9. In addition, Tables 7 and 8 are given to illustrate the difference between Ho's neural network model and the network proposed in this paper. In Ho's network, 46 input nodes, 60 hidden nodes, and one output node are included. The inputs contain the forecasted high (low) temperatures in three areas of Taiwan for the day the load forecast is being conducted, and the recorded area high (low) temperatures and peak (valley) loads for the past 10 days with the same load pattern as the forecast day. A detailed description of the network can be found in Ref. [11]. It is observed that the neural network developed using the output provided by the rule-based expert system performed better in most cases than Ho's neural network model and the rule-based expert system. In addition, the proposed combined network required only two input nodes and five hidden nodes to produce a much better prediction than Ho's network. Ho's network required 46 input nodes and 60 hidden nodes, consequently requiring much more computation than the proposed network.
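For completeness, the summary error measures used in Tables 2, 7 and 8 are easy to compute once hourly forecasts are in hand. The sketch below expresses MAE and RMSE as percentages of the actual load, in the spirit of Table 2; the numerical values are placeholders, not figures from Tables 5 or 6.

```python
import numpy as np

def percentage_errors(actual, predicted):
    """Return (MAE %, RMSE %) of a forecast relative to the actual load."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rel = (predicted - actual) / actual
    mae = 100.0 * np.mean(np.abs(rel))
    rmse = 100.0 * np.sqrt(np.mean(rel ** 2))
    return mae, rmse

# Placeholder values for six test hours (MW); not taken from Tables 5 or 6.
actual = [8200.0, 8050.0, 7900.0, 7600.0, 7100.0, 6700.0]
combined_pred = [8150.0, 8010.0, 7950.0, 7650.0, 7060.0, 6740.0]
expert_pred = [8320.0, 8160.0, 7780.0, 7500.0, 7230.0, 6580.0]

print("combined approach:", percentage_errors(actual, combined_pred))
print("expert system    :", percentage_errors(actual, expert_pred))
```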

Fig. 7. The RMSE in the learning process for August 19, 1988.

Table 5. Load forecast results for May 7, 1988: actual load and predictions by the expert system, Ho's neural network, and the proposed combined approach.

Table 6. Load forecast results for August 19, 1988: actual load and predictions by the expert system, Ho's neural network, and the proposed combined approach.

Fig. 8. Load forecast results for May 7, 1988 (actual vs. predictions by the expert system, Ho's neural network, and the new technique).

Fig. 9. Load forecast results for August 19, 1988 (actual vs. predictions by the expert system, Ho's neural network, and the new technique).

Table 7. Comparison between Ho's model and the proposed methodology (May 7, 1988)

                      Ho's neural network                   Combined E.S. & neural network
Neural topology       46-60-1                               2-5-1
Used training time    more than 600 s (SUN workstation)     26 s (PC with Pentium 75 MHz CPU)
RMSE                  132.4107                              52.43551

Table 8. Comparison between Ho's model and the proposed methodology (August 19, 1988)

                      Ho's neural network                   Combined E.S. & neural network
Neural topology       46-60-1                               2-5-1
Used training time    more than 600 s (SUN workstation)     26 s (PC with Pentium 75 MHz CPU)
RMSE                  109.8887                              70.97881


5. CONCLUSIONS

The hourly power loads for two typical days of the Taiwan power system were analysed using a combination of neural network and rule-based expert system techniques. Analyses of these data sets were conducted to determine if the inclusion of the prediction from a rule-based expert system in the input data set of a neural network would improve the predictive capability of the network and, thus, eventually help the operational planning of the power system. The effect of the learning rate and the number of hidden nodes on the efficiency of the neural network learning algorithm was extensively studied to identify the learning rate and the number of hidden nodes that resulted in the best predictions of power load. The neural network models developed for the data sets representing two typical days of power usage show that improved predictions of the power load are obtained by including the rule-based expert system prediction in the neural network training data. The inclusion of additional information into the neural network training data set allowed the network to develop an improved representation of hourly power loads without the collection of additional data. Additional data collection is typically expensive and often not possible; consequently, techniques such as the combination technique described, which allow the extraction of additional information from existing data, are valuable to network developers.

REFERENCES

1. Pang, C. K., Sheble, G. B. and Albuyeh, F., Evaluation of dynamic programming based methods and multiple area representation for thermal unit commitments. IEEE Trans. Power Apparatus and Systems, 1981, PAS-100, 1212-1218.
2. Hara, K., Kimura, M. and Honda, N., A method for planning economic unit commitment and maintenance of thermal power systems. IEEE Trans. Power Apparatus and Systems, 1966, PAS-85, 427-436.
3. Christiaanse, W. R., Short-term load forecasting using general exponential smoothing. IEEE Trans. Power Apparatus and Systems, 1971, PAS-90, 900-910.
4. Sharma, P. and Mahalanabis, A. K., Recursive short-term load forecasting algorithm. Proc. IEE, 1974, 121, 59-62.
5. Vemuri, S., Huang, W. L. and Nelson, D. J., On-line algorithms for forecasting hourly loads of an electric utility. IEEE Trans. Power Apparatus and Systems, 1981, PAS-100, 3755-3784.
6. Nakamura, S., Short term load forecasting using weekday load models and bias models. Proc. PICA Conf., 1984, pp. 37-42.
7. Lijesen, L. and Rosing, J., Adaptive forecasting of hourly loads based on load measurements and weather information. IEEE Trans. Power Apparatus and Systems, 1971, PAS-90, 1757-1767.
8. Van Meeteren, H. and Van Son, P., Short-term load prediction with a combination of different methods. Proc. IEEE Power Industry Computer Applications Conf., 1979, pp. 192-197.
9. Rahman, S. and Bhatnagar, R., An expert system based algorithm for short-term load forecast. IEEE Trans. Power Systems, 1988, 3, 392-399.
10. Park, Y. M. and Lee, K. Y., Composite modeling for adaptive short term load forecasting. Proc. IEEE PES Summer Meeting, Paper No. 90 SM 378-0 PWRS, July 1990.
11. Ho, K. L., Hsu, Y. Y. and Yuan, C. C., Short term load forecasting using a multilayer neural network with an adaptive learning algorithm. IEEE Trans. Power Systems, 1992, 7, 141-149.
12. Campo, R. and Ruiz, P., Adaptive weather-sensitive short term load forecast. IEEE Trans. Power Systems, 1987, PWRS-2, 592-600.
13. BaFail, A. and Hubele, N., Electric load Bayesian forecasting model. Proc. Winter Conf. of the American Statistical Association, San Diego, January 1989.
14. Lee, K. Y., Cha, Y. T. and Park, J. H., Artificial neural network methodology for short term load forecasting. NSF Workshop on Artificial Neural Network Methodology in Power System Engineering, Clemson University, SC, April 1990.
15. BrainMaker, Vers. 2.3. California Scientific Software, Sierra Madre, CA, 1989.
16. Bauer, B., Parallel distributed processing models using the back-propagation rule for studying analytic and holistic modes of processing in category learning. Master's Thesis, Texas A&M University, College Station, 1988.
17. Bolte, P., Applications of neural networks in agriculture. ASAE Paper Number 897591, ASAE, St. Joseph, MI, 1989.
18. Rumelhart, D. E., Hinton, G. E. and Williams, R. J., Learning internal representations by error propagation. In Parallel Distributed Processing, Vol. 1. MIT Press, Cambridge, MA, 1986, pp. 318-362.
19. Ho, K. L., Hsu, Y. Y., Chen, C. F., Lee, T. E., Liang, C. C., Lai, T. S. and Chen, K. K., Short term load forecasting of Taiwan power system using a knowledge-based expert system. IEEE Trans. Power Systems, 1990, 5, 1214-1221.
20. Baffes, P. T., NETS User's Guide. Software Technology Branch, Lyndon B. Johnson Space Center, Clear Lake, TX, 1989.