Medium term system load forecasting with a dynamic artificial neural network model

M. Ghiassi a,∗, David K. Zimbra b, H. Saidane c

a Operations & Management Information Systems, Santa Clara University, Santa Clara, CA 95053, USA
b Ernst & Young LLP, San Jose, CA 95110, USA
c Data Mining Consultant, San Diego, CA 92128, USA
∗ Corresponding author. Tel.: +1 408 554 4687; fax: +1 408 554 5157.

Electric Power Systems Research 76 (2006) 302–316
Received 2 August 2004; accepted 23 June 2005; available online 6 October 2005
doi:10.1016/j.epsr.2005.06.010

Abstract This paper presents the development of a dynamic artificial neural network model (DAN2) for medium term electrical load forecasting (MTLF). Accurate MTLF provides utilities information to better plan power generation expansion (or purchase), schedule maintenance activities, perform system improvements, negotiate forward contracts and develop cost efficient fuel purchasing strategies. We present a yearly model that uses past monthly system loads to forecast future electrical demands. We also show that the inclusion of weather information improves load forecasting accuracy. Such models, however, require accurate weather forecasts, which are often difficult to obtain. Therefore, we have developed an alternative: seasonal models that provide excellent fit and forecasts without reliance upon weather variables. All models are validated using actual system load data from the Taiwan Power Company. Both the yearly and seasonal models produce mean absolute percent error (MAPE) values below 1%, demonstrating the effectiveness of DAN2 in forecasting medium term loads. Finally, we compare our results with those of multiple linear regressions (MLR), ARIMA and a traditional neural network model. © 2005 Elsevier B.V. All rights reserved. Keywords: Medium term load forecasting; Artificial neural networks; Dynamic neural networks; Neural network applications

1. Introduction Many researchers have studied power system load forecasting. The majority of this research has primarily focused on short term load forecasting (STLF). The goal in STLF is to predict the future hourly and daily loads for an electrical power generation and distribution system. Some studies extend the forecast horizon to as long as a week. Approaches to STLF have ranged from linear regression models [1], to various ARIMA configurations [2] and artificial neural networks (ANN) [3–8]. The ANN modeling approach, in particular, has seen wide application and acceptance in the past decade. The popularity of this approach can be attributed to the non-linear nature of electrical load over time, and the availability of adequate data in this field of study. Authors have used a number of different




ANN architectures to solve STLF problems with feed forward back propagation (FFBP) algorithms. Regardless of the choice of modeling approach, the overall objective in such studies is accurate load forecasting, to determine the optimal utilization of generators and power stations [5]. See ref. [6] for a recent survey of these applications. Medium term load forecasting (MTLF: 1 month to 1 or 2 years) provides useful information for power system planning and operations, and offers significant benefits for firms operating in a regulated or deregulated energy industry. An example of a regulated firm that could benefit from MTLF is the Taiwan Power Company (Taipower), a state-owned, integrated power generation, transmission and distribution company [9]. For Taipower and similar firms, MTLF information can provide an index for regional and national energy consumption and growth [10], and assist in medium and long term energy planning [11]. Medium term load forecasts can also be used to schedule and coordinate maintenance across an integrated system, effectively negotiate fuel purchases for power generation, maximize the utilization of intermittent resources such as wind energy and coordinate


production across a network of generators. Typically, major decisions regarding long term power system development require longer term forecasts, such as the construction of a power plant which requires 2 or more years to complete [12]. However, MTLF can provide pertinent information to guide the development of other infrastructure elements that require a shorter timeframe for completion, such as the improvement of the transmission grid. Transmission grid congestion is a significant issue for distribution systems, and can greatly impact the overall system efficiency and cost of energy for the consumer [13]. Generally, in a regulated industry MTLF can be used to optimize energy production and transmission, and improve overall system reliability. Many of the benefits gained in a regulated energy industry through accurate MTLF can also be made in a deregulated energy industry. For example, the issue of transmission congestion impacts all energy distribution systems regardless of regulation, and in a deregulated energy industry transmission and distribution firms are similarly affected. MTLF information can be effectively used by these deregulated firms to guide the improvement of their transmission grid to better serve their customers. The State of California provides an example of a deregulated energy industry with firms that could benefit significantly from MTLF. In competitive markets like California, where energy is traded, the accurate forecast of monthly, quarterly and yearly energy demands can provide an advantage in negotiations, and assist in the development of medium term generation, transmission and distribution contracts. The authors of ref. [14] found that power generators exercising wholesale market power contributed significantly to the problems following California’s energy market deregulation. The risk of such a scenario occurring could be significantly mitigated through the use of medium term contracts between firms, such as the contract between Duke Energy (a generator) and Pacific Gas & Electric (a distributor) [14]. These forward contracts are negotiated more effectively with accurate MTLF by generators, transmitters and distributors. Forward contracting also provides additional benefits to deregulated energy industry firms. The guaranteed revenue stream provided by a medium term contract eliminates the fiscal uncertainty that accompanies daily or weekly power negotiations, allowing generators and distributors the funding for their respective system improvements and operations. A predetermined production agreement also provides generators with the ability to advantageously negotiate and schedule fuel purchases, production cycles and maintenance activities. Distributors also benefit from medium term contracts through improved coordination and management of their energy supply base and transmission network. The economic impact of accurate load forecasting is significant, and more pronounced in a deregulated energy market or an environment of high demand growth, like those of developing nations or rapidly expanding economies [15,16]. Inaccurate forecasts may result in either inadequate supply that could negatively impact the economic growth of a developing region, or oversupply that would result in utility cost overruns that might ultimately be transferred to consumers. Accurate medium and


long term load forecasting techniques are required in order to avoid such costly errors [15]. The accuracy of existing STLF and MTLF methods is often measured by the mean absolute percent error (MAPE) metric, with results typically ranging from 3 to 12% [1,8,16–18]. Clearly, the economic significance of improved accuracy is scenario dependent. For example, ref. [18] reports that a 1% reduction of error in load forecasting can save the British power system up to 10 million pounds per year (in 1984). The deregulation of energy markets and subsequent increase in the trade of electricity as a commodity has further emphasized the importance of accuracy in prediction. Medium and long term load forecasting has not been studied as extensively as STLF. Literature on MTLF categorizes the methodologies for modeling MTLF into two general groups. The first group focuses on economic analysis, management and long term planning and forecasting of energy load and energy policies (referred to as the conditional modeling approach). Researchers in this group have noticed that the socioeconomic condition of some regions may rapidly change, and thus impact energy demands. The fast growing economies, migration or religious events such as pilgrimage periods are examples of such scenarios. Medium term load forecasting of such regions require inclusion of additional variables to represent these changes. Researchers often use economic indicators (GNP, CPI, exchange rates, average wages, etc.) and/or electrical infrastructure measures (number of connections, appliance saturation measure, etc.) in addition to information on historical load data and weather related variables to forecast future energy demands. The second approach (referred to as the autonomous modeling approach), primarily uses past monthly loads and weather information to forecast future electricity demand. This modeling approach is more suited for stable economies. However, the choice of weather variables (temperature measures, humidity factors, cooling degree days, duration of bright sunshine, wind speed, etc.) still depends on the regional weather characteristics [15–17,19–23]. Some of the researchers from group one [17,21] have developed models to define a set of optimal economic conditions for energy production. The authors in ref. [21], for example, have developed linear and mixed integer programming models to minimize total production costs of power generation for a region while satisfying a set of economic, physical and environmental constraints. Their solution defines monthly generation requirements and a schedule for self-production, exchange contracts and spot market business for participating power plants. The authors noted that a complete problem formulation resulted in a non-linear model that impeded solvability. Certain assumptions were required to transform the problem and facilitate its solution through linear and mixed integer programming. Other researchers in this group have developed models to forecast energy demand of fast growing regions [15,16,20,23]. They have incorporated the “number of electrical connections at the end of each month” or “summer air-conditioning demand” variables as a measure of economic development for capturing the load demand of such regions. Finally, the authors in ref. [20] have included macroeconomic indicators, such as the consumer price


index (CPI), the average salary earning (ASE) and the currency exchange rate (CER), in their MTLF analysis. The second group of researchers has used the autonomous approach for modeling MTLF. Researchers in this group have included weather and certain cyclic events in the modeling and forecasting of medium term loads [19,22]. In ref. [19], the authors present a MTLF model that incorporates various weather related parameters (temperature and humidity factors) in addition to historical load profiles. Various modeling tools (ARIMA, ANN and FNN) are employed to forecast future load demand. These studies have used the historical load profile for the region to forecast future load demands. They have noticed that the load profile was not stationary and statistical pre-processing (autocorrelation and partial autocorrelation analysis) was needed for further analysis. For some studies [19], an additional variable (a time index measure that assigns higher value to the most recent observation) had to be introduced to further filter the load profile data. In ref. [22], the authors use a neural net model that includes temperature profiles to estimate medium term power demand. The 1-year forecasts in this model were leveraged in the development of management strategies and infrastructure adjustments. The authors have shown that their neural net formulation outperformed existing models that were based on linear and non-linear regressions. Finally, the authors in refs. [15,20] study the difficulties of applying standard MTLF methods to fast developing regions. Both studies conclude that special models are needed to address the specific circumstances of such regions. In ref. [15], the authors have developed a knowledgebased expert system (ES) that uses a set of decision rules for forecasting demand of fast growing utilities. This ES can identify key variables (electrical and non-electrical) as well as the most suitable load-forecasting model. Almost all methods described above have used weather information in their formulations for medium term load analysis and forecasting. The reliance on accurate weather information is a major challenge in all MTLF approaches. In comparison, when STLF modelers include weather information in their analysis they only need forecasted weather values for the next few hours, or at most next few days. MTLF models that rely on weather information require forecasted weather values for up to 1 or 2 years ahead, which is a significantly more challenging problem. Most modelers, therefore, use forecasted weather values that are provided by weather bureaus. Similarly, when using a conditional modeling approach with macroeconomics parameters, modelers have to obtain accurate forecasted values of these exogenous variables. When considering inclusion of either weather or economic factors in the model, researchers have used statistical measures (often correlation coefficients) and personal experience and intuition to assess the validity, effectiveness and contribution of such variables to energy and load forecasting. In MTLF, weather can play a large role. Medium term load forecasters often rely upon weather forecasters to obtain future weather information. Values provided by weather forecasters for such periods, still may lack accuracy. We present results that signify improved accuracy when weather information is included in MTLF modeling. We also present a model that does

not require explicit inclusion of weather variables, but produces very accurate results. We have developed a dynamic artificial neural network architecture for medium term load forecasting. The model is validated using true real-life data. Its effectiveness when compared to alternate and traditional approaches is also presented. We have used historical system information from the Taiwan Power Company to demonstrate the validity and accuracy of our forecasting method. The remaining sections of this paper are organized as follows. Section 2 presents an overview of our approach to neural network modeling. Section 3 discusses MTLF and applies our model to actual data from the Taiwan Power Company. We also evaluate the performance of our model relative to other approaches and present a comparative analysis in this section. Finally, Section 4 is a summary of our conclusions. 2. The DAN2 model for load forecasting The traditional feed forward back propagation (FFBP) neural network architecture is comprised of an input layer, one or more hidden layers and an output layer. The choice of architecture is problem dependent and often requires experimentation before a final architecture is selected for training and ultimately forecasting. We have developed a neural network model (DAN2: a dynamic architecture for artificial neural networks) that employs a different architecture than the traditional FFBP model. A detailed description of the architecture and its properties are fully presented in ref. [24]. Here, we first briefly present this new architecture and then identify its major differences from the FFBP model. The general philosophy of the DAN2 model is based upon the principle of learning and accumulating knowledge at each layer, propagating and adjusting this knowledge forward to the next layer, and repeating these steps until the desired network performance criteria are reached. Therefore, we classify DAN2 as a purely feed-forward model. As in classical neural networks, the DAN2 architecture is comprised of an input layer, hidden layers and an output layer. The input layer accepts external data to the model. Once the input nodes have been identified, all observations are used simultaneously to train the network. To guard against the problem of differing dimensional metrics, we scale the data using a normalized approach. The overall DAN2 architecture is depicted in Fig. 1. The next modeling decision in the selection of the model’s architecture is the choice of the number of hidden layers and hidden nodes. In DAN2, unlike classical neural nets, the number of hidden layers is not fixed a priori. They are sequentially and dynamically generated until a level of performance accuracy is reached. Additionally, the proposed approach uses a fixed number of hidden nodes (four) in each hidden layer. This structure is not arbitrary, but justified by the estimation approach. At each hidden layer, the network is trained using all observations in the training set simultaneously, so as to minimize a stated training accuracy measure such as a MAPE or MSE value. As shown in Fig. 1, each hidden layer is comprised of four nodes. The


Fig. 1. The DAN2 network architecture.

first node is the bias or constant (e.g. 1) input node, referred to as the C node. The second node is a function that encapsulates the “current accumulated knowledge element” during the previous training step, and is referred to as a “CAKE” node. The third and fourth nodes represent the current residual (remaining) non-linear component of the process via a transfer function of a weighted and normalized sum of the input variables. Such nodes represent the “current residual non-linear element” and will be referred to as the “CURNOLE” nodes. In Fig. 1, the “I” node represents the input, the “C” nodes are the constant nodes, the “Gk ” and “Hk ” nodes represent CURNOLE nodes and the “Fk ” nodes are the CAKE nodes. The final CAKE node represents the dependent variable or the output. The DAN2 architecture uses a connected network that follows the many-to-one principle. At each layer, the previous four nodes (C, Gk , Hk and Fk−1 ) are used as the input to produce the next output value (Fk ). The parameters on the arcs leading to the output nodes (ak , bk , ck , dk ), represent the weights of each input in the computation of the output for the next layer. The parameter connecting the CURNOLE nodes, µk , is used as part of the argument for the CURNOLE nodes and reflects the relative contribution of each input vector to the final output values at each layer. Determination of the final model constitutes the training process and requires computation of these parameters at each layer in order to reach the stated accuracy measure(s). The training process begins with a special layer where the CAKE node captures the linear component of the input data. Thus, its input (content) is a linear combination (weighted sum) of the input variables and a constant input node. These weights are easily obtainable through classical linear regression. If the desired level of accuracy is reached, we can conclude that the relationship is linear and the training process stops. For nonlinear relations additional hidden layers are required. At each subsequent layer the input to the CAKE node is a weighted sum (linear combination) of the previous layer’s CAKE, CURNOLE and C nodes. For each hidden layer, the parameters in the CURNOLE nodes and their respective weights are determined

to capture as much of the remaining (non-linear) part of the underlying process as possible. Throughout training, the CAKE nodes carry an adequate portion of the learning achieved in previous layers forward. This process ensures that the performance or knowledge gained so far is adjusted and improved but not lost. This property of DAN2 introduces knowledge memorization to the model. In ref. [24], we show that the DAN2 algorithm ensures that during network training the residual error is reduced in every iteration and the accumulated knowledge is monotonically increased. The learning process improves model fit and stops when one of the stopping criteria, defined in the following sections, is reached. Although the CURNOLE node parameters are determined through a non-linear optimization, the weights are again easily obtained through simple linear regression using three independent (one CAKE and two CURNOLE) variables. The training process includes a linear and a non-linear component. The linear component of the input data is captured in the first CAKE node using OLS; the algorithm then transforms the input data set to model the non-linearity of the process in subsequent iterations. DAN2 uses a vector projection approach to perform this data transformation. The transformation process defines a reference vector R = {rj; j = 1, 2, . . ., m}, where m represents the number of attributes of the observation records, and projects each observation record onto this vector to normalize the data as discussed in ref. [24]. This normalization defines an angle, αi, between record i and the reference vector R (Fig. 2). DAN2 uses the set of αi's to train the network, and updates their values in every iteration. In ref. [24], we show that this normalization can be represented by the trigonometric function cos(µk αi + θk). In every hidden layer k of the architecture, we vary (µk αi + θk) and measure the impact of this change on the output value. The modification of the angle (µk αi + θk) is equivalent to rotating, by µk, and shifting, by θk, the reference vector, thus changing the impact of the projected input vectors and their contribution to the output for that iteration (Fig. 2).
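For concreteness, a minimal sketch of this projection step is given below. The choice of the reference vector R here (the mean record) is an illustrative assumption only; ref. [24] specifies how R is actually defined.

```python
import numpy as np

def observation_angles(X, R=None):
    """Angle alpha_i between each scaled observation record X[i] and a reference vector R."""
    X = np.asarray(X, dtype=float)
    if R is None:
        R = X.mean(axis=0)  # assumed choice of reference vector, for illustration only
    # cos(alpha_i) = <X_i, R> / (||X_i|| * ||R||), then alpha_i = arccos(.)
    cos_a = (X @ R) / (np.linalg.norm(X, axis=1) * np.linalg.norm(R))
    return np.arccos(np.clip(cos_a, -1.0, 1.0))  # clip guards against rounding outside [-1, 1]
```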


Modifying the angle between the reference vector R and the observation vectors Xi, and thereby the contribution of the observation vectors to network training, is a directed search in a non-linear space. In ref. [24], we offer several optimization algorithms that are used to determine the parameters of this non-linear problem in order to reduce residual error and thus improve network training in every iteration. In DAN2, using a trigonometric transfer function to capture a non-linear process is analogous to using a "generalized" Fourier series for function estimation, which has been shown to be an effective kernel for approximating non-linear functions [25]. The generalized form of the Fourier series uses many terms (similar to harmonics). The number of terms in the series depends on the complexity and the degree of non-linearity of the process. Similar to the Fourier series, in every layer DAN2 introduces a cos(µk αi + θk) function to capture the remaining non-linearity. In this function, αi represents the transformed and normalized input variable, µk represents a constant multiplier and θk represents a constant shift. This formulation uses two (non-linear) parameters, µk and θk. The use of the latter can be avoided through the expansion of the cosine function in the form A cos(µk αi) + B sin(µk αi). We use this functional form as the transfer function in our model. The two CURNOLE nodes in Fig. 1 represent this formulation, and are used to reduce the number of non-linear parameters from two to one, making the estimation of the resulting non-linear formulation easier. At any given hidden layer k, if the cos(µk αi + θk) terms captured in previous layers do not adequately express the non-linear behavior of the process, a new layer with an additional set of nodes is automatically generated, including a new cos(µk αi + θk) term. This process is analogous to how the Fourier series adds new terms to improve function approximation. Therefore, the number of layers in DAN2's architecture is dynamically defined and depends upon the complexity of the underlying process and the desired level of accuracy.

Fig. 2. The observation and reference vectors.

Thus, the output of this model is represented by the linear combination of the constant, CAKE and CURNOLE nodes. Eq. (1) represents the functional form of this relationship at iteration (layer) k:

Fk(Xi) = ak + bk Fk−1(Xi) + ck Gk(Xi) + dk Hk(Xi)    (1)

where Xi represents the n independent input records, Fk (Xi ) represents the output value at layer k, Gk (Xi ) = cos(µk αi ) and Hk (Xi ) = sin(µk αi ) represent the transferred non-linear components and ak , bk , ck , dk and µk are parameter values at iteration k. The training process in the FFBP model systematically updates the parameters until the network is “fully” trained: a set of stopping conditions is met. The DAN2 model also follows the same principle by updating its five parameters in every iteration to satisfy some pre-defined stopping criteria. DAN2, however, uses the entire data set to estimate the weights and the network’s parameters in every iteration. The entire algorithm, the process and its properties are fully discussed in ref. [24]. We next highlight DAN2’s major differences from the traditional FFBP models. This architecture differs from the traditional multi-layer ANN model in a number of ways. The first difference is in the utilization of the input records. Traditionally, input records are processed one at a time. DAN2 uses the entire set of records simultaneously and repeatedly at every layer. This use of the entire input data set allows each input to effect the model individually as well as collectively. The impact of the interaction among input elements is, therefore, better represented in DAN2. This global view of the process provides a training environment that ensures monotonically increasing learning. The second difference is in the choice of the transfer function. DAN2 uses the trigonometric cosine function instead of the traditional sigmoid function to capture the non-linearity of the process under consideration. Estimation of the non-linear component is partitioned into successive layers. At each layer the transfer function introduces one non-linear parameter only, which is optimally or experimentally determined. This partitioning reduces solution complexity for each layer, while maintaining the acquired knowledge from previous layers. The third difference is in the structure and number of hidden layers. Traditional ANN models use a multilayer network with one or more hidden layers. The number of hidden layers and hidden nodes (in each layer) are problem dependent and often require extensive trial and error to find a suitable model. Architecture selection in FFBP has been the subject of many studies and has been extensively reported in the literature [26]. Although, there are guidelines on the number of hidden layers for FFBP architectures, there are still few guidelines on the number of hidden nodes. The lack of a clear set of guidelines for architecture selection makes final model selection an experimental task that often becomes time consuming and inaccurate for problems with smaller input data sets. In our model, the number of hidden nodes is fixed at four. The model determines the number of hidden layers dynamically to achieve specified performance criteria.
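As an illustration of how a single DAN2 layer of the form (1) can be fitted, the sketch below anticipates the two-stage estimation described in Section 2.1: for a fixed µk the weights (ak, bk, ck, dk) are obtained by ordinary least squares, and µk itself is then chosen to minimize the SSE. This is a simplified sketch that uses a plain grid search over µk (the search range is an assumption), not the optimization procedures of ref. [24].

```python
import numpy as np

def fit_dan2_layer(alpha, F_prev, target, mu_grid=None):
    """Fit one DAN2 layer F_k = a + b*F_prev + c*cos(mu*alpha) + d*sin(mu*alpha).

    For a fixed mu the weights (a, b, c, d) enter linearly and are solved by OLS;
    the single non-linear parameter mu is found here by a simple grid search.
    Fitting separate cos and sin terms absorbs the phase shift theta, since
    cos(mu*alpha + theta) = cos(theta)*cos(mu*alpha) - sin(theta)*sin(mu*alpha).
    """
    alpha, F_prev, target = (np.asarray(v, dtype=float) for v in (alpha, F_prev, target))
    mu_grid = np.linspace(0.01, 20.0, 2000) if mu_grid is None else mu_grid  # assumed range
    best = None
    for mu in mu_grid:
        A = np.column_stack([np.ones_like(alpha), F_prev,
                             np.cos(mu * alpha), np.sin(mu * alpha)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)   # OLS stage
        sse = float(np.sum((target - A @ coef) ** 2))
        if best is None or sse < best["sse"]:
            best = {"sse": sse, "mu": mu, "coef": coef, "F_k": A @ coef}
    return best
```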


Traditional methods require the modeler to define and determine the right input variables, the correct number of layers and the optimal number of hidden nodes in each layer. Generation and selection of the best model, when there are no theoretical models, has always been a challenge. Experimentation often becomes the preferred method for model selection. DAN2 reduces the model parameter selection to only one decision, the input set, which allows researchers the flexibility to examine a larger set of input combinations while allowing the model to configure the remaining parameters optimally. In ref. [27], we show that even for established benchmarks this feature of DAN2 allowed us to identify new models that ultimately performed better than the best model found in the literature. Once inputs are selected, DAN2 minimizes a measure of network performance (such as SSE, MSE or MAPE) by introducing hidden layers dynamically. Additionally, final model selection in DAN2 also uses the stopping rules discussed later in this section, which are applied to avoid model under or over-fitting. We believe, and our experiments show [27], that this attribute of our model offers a marked advantage when the data set lacks an established theoretical model that could guide researchers in selecting the right model. Finally, network connectivity in traditional ANNs uses many-to-many relationships among interior nodes. This connectivity requires many arcs, which can result in a complex architecture. DAN2's architecture reduces this complexity by allowing many-to-one relationships only. Since each layer uses four nodes and has fewer arcs, the computational requirements at each layer are reduced. In ref. [27], we have compared DAN2 with traditional FFBP and recurrent neural network (RNN) models. The comparison spans both theoretical and computational perspectives using six established benchmark data sets from the literature. Performance of DAN2 against these models as well as non-neural net alternatives such as MLR and ARIMA is also presented. Our study shows that DAN2 outperforms all other alternatives and produces more accurate training and forecasting results in every case [27].

2.1. DAN2 parameter estimation

The initial step in training the DAN2 model captures the linear component of the process. This step uses OLS to estimate the parameters of the linear model. If the desired level of accuracy is reached, the training terminates. Otherwise, the model generates additional layers to capture the non-linear component of the process. Similar to traditional neural network models, DAN2 at each subsequent layer attempts to capture the remaining non-linearity by minimizing a measure of total error as represented below:

SSEk = Σi [Fk(Xi) − F̂(Xi)]²

Substituting Fk(Xi) from (1) results in:

SSEk = Σi [ak + bk Fk−1(Xi) + ck cos(µk αi) + dk sin(µk αi) − F̂(Xi)]²    (2)


where F̂(Xi) are the observed output values. Minimizing (2) requires the estimation of five parameters. This formulation is linear in the parameter set Ak, where Ak = {ak, bk, ck, dk}, and non-linear in the parameter µk. We use a layer-centered training process, and decompose the solution into a linear and a non-linear stage to increase the efficiency of solving (minimizing) the highly non-linear equation (2). The entire training data set is collectively used in both the linear and non-linear stages. In the linear stage, a simple ordinary least squares (OLS) is used to estimate the parameter set Ak for a given value of µk. The second stage uses the estimated values of Ak and searches in the non-linear space of µk for the optimal value of µk that minimizes SSEk. The optimal value of µk alters the product (µk αi) and changes the contribution of input vector Xi to the output through the cos(µk αi) and sin(µk αi) components. Once the best value of µk for layer k is determined, stopping rules are examined. If the stopping criteria are met, training terminates; otherwise, a new layer is introduced and the process repeats. In ref. [24], we present several non-linear optimization strategies to estimate the non-linear parameter µk. We also show that following this approach, at each layer the knowledge gained is monotonically increased, total error is reduced and the network training improves. Finally, the process continues until a user specified accuracy measure is reached. DAN2 introduces additional stopping criteria to ensure that model under or over-fitting is avoided.

2.2. DAN2 model selection, training and stopping criteria

An important challenge in any ANN modeling approach is the selection of a suitable model for process representation. Modelers preliminarily examine the process and its data, and use available theoretical information to hypothesize one or more models to capture the process behavior. When established theoretical relationships exist, model selection becomes a guided search. However, when no validated theoretical model is present, modelers often hypothesize a series of ANN models with different numbers of hidden layers and/or hidden nodes, using the in-sample (training) data for model selection and validation. Several approaches have been presented to evaluate model selection. These include regularization, pruning and early stopping [28,29]. Recently, statistical strategies have been introduced for model selection [30,31]. These strategies include hypothesis testing, use of various information criteria metrics, cross validation and heuristic tools. Model selection is often experimental in nature. There are two basic approaches to ANN model building and selection. The "additive" or "constructive" approach begins with a minimal network with a single hidden layer and iteratively adds hidden neurons, using measurement tools to assess the effectiveness of the resulting model. Alternatively, model selection can start with a very large model and use pruning algorithms to reduce model size; neural network model selection based on this approach is labeled "subtractive". The additive or constructive approach has recently been favored by refs. [31,32] and is the approach used in the DAN2 architecture as well. In traditional constructive ANN modeling, a single hidden neuron is added at a time and the effect is assessed. In DAN2, we add


one hidden layer at a time and assess its effectiveness. One of the hidden nodes of the hidden layer in DAN2 carries the previous layer’s knowledge forward. This property adds knowledge memorization to the DAN2 model selection process. The strategy of progressively adding one hidden node/layer and assessing its effect is shown to minimize the chance of the network over fitting [31]. The assessment process of model selection has been a controversial topic for researchers. The authors in ref. [30] propose three fundamental strategies: sequential hypothesis testing, various information criteria testing and cross validation evaluation. The authors have used a Monte Carlo simulation approach to evaluate each of these strategies. They use the MSE statistic as a measure of model selection and conclude that for their set of problems, they favor the “sequential hypothesis testing” over the information criteria approach, with cross validation also producing good results for some examples. Some researchers [33–35], have used cross validation for their model selection and evaluation while others have shown that cross validation has poor performance for model selection [36]. Finally, the early stopping criteria is also favored and disputed by researchers. This rule monitors network error of the validation data set to fit the most prominent features of the data set first. The challenge is the identification of the “prominent feature” list and how to avoid over fitting. No clear guideline for model selection has yet emerged in literature that applies to all cases. The model selection strategy, thus, continues to be challenging, problem specific and requiring extensive validation by the process data. Traditionally, available process data is divided into two partitions: the in-sample (training) and out-of-sample (testing) data sets. During the model selection process, the in-sample data is used only, and may be further partitioned into two sets: the training and validation data sets. The training and validation data sets are used for model fitting and to assess the validity of the various candidate models. Model performance criteria are used to calibrate and compare the various hypothesized models using both the training and validation data sets to select the best model. Similar to other ANN modeling approaches, model selection in DAN2 benefits from experimentation. Trials are often designed to facilitate and evaluate input set selection, choice of model architecture and final model training and calibration. These efforts can be divided into two groups: input selection and architecture selection. 2.2.1. Input selection This process is common among all modeling approaches. We use statistical data pre-processing tools and measures such as data plotting, autocorrelation and partial autocorrelation to assess validity of including an input variable. Some data transformation may also be used to pre-process the data for introduction to the model. Guidelines for statistical data pre-processing are discussed in refs. [37,38]. 2.2.2. Architecture selection Model selection in DAN2 is a “constructive” or “additive” process in which a hidden layer is progressively added until either a pre-defined measure of accuracy, an over fitting metric

or a maximum number of iterations is reached. The constructive algorithm of DAN2 satisfies the two requirements that are necessary for constructive models to be effective: universal approximation and convergence principles [32]. DAN2 uses the Fourier family of kernel functions to approximate and capture residual errors in every iteration. This function has been shown to be an effective universal approximator [25]. In ref. [24], we have shown DAN2's convergence properties. The challenge in DAN2, however, as with all ANN modeling approaches, is to train the model while avoiding under/over fitting. Modelers therefore need to determine when to stop the model training process. Since DAN2 attempts to reach user specified accuracy measures, additional stopping criteria must be employed to address under or over fitting concerns. Under fitting often occurs when sufficient data for model training is lacking and thus the training process is prematurely shortened. Under fitting can be detected by applying the under fitting stopping rule (ε1) described below. Over fitting occurs when a neural network model introduces too many terms (layers or nodes) to describe the underlying process. To avoid over fitting during the model selection process, we use a simplified "cross validation" approach. We first use the plot of the data to identify any seasonality that might be present. We then use a representative fraction of the in-sample data (10–20%) for validation and to compute the over fitting metric ε2 (defined below) after every iteration. Various strategies can be used to select the validation set. These may range from selecting the last 10–20% of the in-sample data set, or, for cyclic processes, at least one full cycle of the data, to a randomly selected set of records from the in-sample data set. If model training stops prematurely, the network is considered to be "under-trained" or "under-fitted." An under-trained model often has high SSE values for either or both the training and validation data sets. Under-training often occurs when there are insufficient data for model fitting. We use ε1 = (SSEk−1 − SSEk)/SSEk−1 ≤ ε1* to assess the existence of under-training in our models at iteration k. A common requirement of all ANN models is the availability of adequate data in order to avoid the under fitting problem. Over-training or over-fitting is a more common problem in neural net modeling. A neural net model is considered over-fitted (over-trained) when the network fits the in-sample data well but produces poor out-of-sample forecasts. To avoid over-fitting, at each iteration k (k > 1), we compute MSE values for both the training (MSE_T) and validation (MSE_V) data sets. We use ε2 = |MSE_T − MSE_V|/MSE_T ≤ ε2* to guard against over fitting. The model is considered fully trained when the user specified accuracy criteria and the over fitting constraint are both satisfied. When a user specifies a relatively small value for the accuracy measure, it may be possible to reach the over fitting criterion ε2 before reaching the desired level of accuracy. The modeler then needs to re-examine and revise the desired level of accuracy in order to avoid over fitting. Other researchers [30,31] have also used variations of the over-fitting metric for model selection and evaluation. For a class of problems, such as in electrical load forecasting, a desired level of accuracy (often a MAPE value) can be pre-


specified. Achieving such targets, for training and/or forecasting, can also be used as a stopping rule. The accuracy levels ε∗1 and ε∗2 are problem dependent and are often determined experimentally. Finally, DAN2 offers the total number of iterations as a stopping rule for cases that do not follow the previous scenarios. For most applications a DAN2 model only requires a few minutes on a personal computer to train and calibrate. This efficiency of computation allows for fast evaluation of the candidate models. 3. Monthly forecasting models for Taipower We have used DAN2 to model and forecast monthly, quarterly and yearly load (medium term) for the Taiwan Power Company (Taipower). The power company has an installed capacity of 32,000 MW, a peak load average of 27,000 MW and an average load of 19,000 MW [9]. The power production includes hydro (14.1%), nuclear (16.1%) and thermal (69.8%) units. The sample data used in this study included the monthly values of total power sales (MWH) for the company from 1982 to 1997. We use data from 1982 to 1996 (180 data points) to develop and train the model and then forecast the next 12 months to test its effectiveness. As stated earlier (Section 1) the number of observations in a MTLF analysis is often limited. However, the development and training of ANN models is always data intensive. Some researchers in MTLF have only used available monthly data over a few years, while others have introduced exogenous variables (mostly economic indicators), or selective days of the month to develop their models [20]. Approaches that increase the number of variables, and subsequently the number of weights in the model, will inevitably add to model complexity. When sufficient data is lacking, these approaches will reduce model stability and reliability. The challenge in MTLF, thus, is to select the “prominent features” of the model that capture process characteristics and variability. This search often relies on trial and error to select both the input set and the final model architecture. We have used these guidelines in our model selection strategy; our goal was to select a model with the fewest, but most relevant, number of input variables that can achieve a high accuracy (low error as measured by MAPE) value for both training and testing data. We, therefore, first examined modeling methodologies (autonomous versus conditional modeling approaches), and then used statistical pre-processing for the final input selection. We examined Taiwan economic development indicators and have found them to be stable. Therefore, the addition of an economic indicator to the load model would not add significant value. We, then, followed the autonomous approach to develop a medium term load forecasting model. The input variables used in this study include monthly load demands and weather information. Researchers [15,16,19,20] have considered a number of weather factors, including various temperature, humidity and wind measures. The final choice of the relevant weather factors for any model depends on weather characteristics of the region, while also considering seasonality factors, which for some regions have shown to result in a large difference between winter and summer demands [8].
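Before presenting the models, a brief illustration of the statistical pre-processing used for input selection (Sections 2.2.1 and 3): sample autocorrelations of the monthly series can be used to short-list candidate lags. This is a generic sketch, not the authors' exact procedure, and the data file name is hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag=24):
    """Sample autocorrelation r_k of a series x for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

# Example: flag lags whose autocorrelation exceeds an approximate 95% band.
# loads = np.loadtxt("monthly_loads.txt")          # hypothetical data file
# r = sample_acf(loads, max_lag=24)
# band = 1.96 / np.sqrt(len(loads))
# candidate_lags = [k + 1 for k, v in enumerate(r) if abs(v) > band]
```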


Using weather variables to analyze power systems clearly adds to the complexity of any modeling technique employed. This complexity is three-fold: first, the inclusion of exogenous variables always adds complication to any model. Most ANN models are sensitive to the number of input variables, so inclusion of weather variables exacerbates the problem by increasing the number of parameters in the model. The second factor is the existence of a non-linear relationship between power consumption and weather related variables [8]. This non-linearity makes ANN methods particularly appealing since the literature has established them as an effective means of modeling non-linear relationships [26]. Finally, the third challenge in using weather related variables in load forecasting is the reliance upon accurate weather forecasting. Although medium term weather forecasting has improved, it is insufficient in terms of accuracy and availability. Thus, forecasting weather for timely inclusion in power forecasting models becomes an additional challenge. Researchers have used a variety of approaches to overcome this problem [2,5,8]. In ref. [8], the authors show that a weather ensemble prediction approach can impact medium term load forecasting. Our models validate that direct or indirect inclusion of weather information improves modeling accuracy in both the training (in-sample) and the forecasting (out-of-sample) data sets. In most regions, dedicated centers exist for the purpose of weather prediction. The U.S. National Center for Environmental Prediction (NCEP) and the European Center for Medium-Range Weather Forecasts (ECMWF) are two examples. In the training portion of the load forecasting models, known historic weather data is provided. For the forecasting periods, on the other hand, predicted values are required. We have studied the Taiwan weather information and have tested the impact of temperature, humidity and wind factors on power system load by using correlation coefficient analysis to evaluate each factor's impact. Our approach favors effective models that require fewer input parameters. Some studies, on the other hand [20], have introduced multiple factors (up to 50 input variables) in their MTLF models. Training and validation of such models will require large data sets that are typically not available. We have found, and our tests verify, that inclusion of a sole weather variable (such as monthly cooling degree days) is sufficient for capturing and representing the impact of weather in forecasting Taiwan load demand. In our model, the monthly cooling degree days (CDD) value was used as an exogenous variable to represent weather impact. Similarly, CDD data from 1982 to 1996 were used in training the models. When forecasting the load with weather variables, the actual CDD data for 1997 were utilized. The monthly CDD values (in Celsius) are defined as:

monthly CDDj = Σi Σh (|Th − 28 °C|)    (3)

where h is the hour of the day, Th the temperature at hour h, i the day of the month and j = 1, 2, . . ., 12.
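A minimal sketch of Eq. (3), assuming hourly temperatures (in Celsius) are available as a pandas Series with a DatetimeIndex; it follows the absolute-deviation form of Eq. (3) exactly as stated:

```python
import pandas as pd

def monthly_cdd(hourly_temp: pd.Series, base_c: float = 28.0) -> pd.Series:
    """Monthly CDD per Eq. (3): for each month j, sum |T_h - base_c| over every hour h of every day i."""
    return (hourly_temp - base_c).abs().resample("MS").sum()  # "MS" groups the hourly values by calendar month
```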

Table 1. Results for DAN2 yearly models and MLR

Model number | Lags load     | Lags CDD      | Train MAPE | Fore-1 MAPE | Fore-3 MAPE | Fore-6 MAPE | Fore-12 MAPE
1            | 1, . . ., 12  | 1, . . ., 12  | 2.855      | 2.670       | 3.098       | 3.277       | 2.366
2            | 1, . . ., 12  | 1, . . ., 12  | 0.998      | 0.518       | 0.897       | 1.028       | 0.982
3            | 1, . . ., 12  | NA            | 0.800      | 0.702       | 1.584       | 1.410       | 1.378
4            | 1, 11, 12     | 1, 11, 12     | 1.139      | 0.343       | 1.722       | 1.226       | 0.951
5            | 1, 11, 12     | NA            | 1.022      | 0.725       | 2.238       | 1.343       | 1.220
6            | 1, 7, 12      | 1, 7, 12      | 1.230      | 0.074       | 0.149       | 0.728       | 0.809
7            | 1, 7, 12      | NA            | 0.967      | 0.118       | 0.463       | 1.031       | 1.021
8            | 1, 2, 12      | 1, 7, 12      | 1.439      | 0.773       | 0.601       | 0.708       | 0.958
9            | 1, 2, 11, 12  | 1, 7, 12      | 1.198      | 1.202       | 0.694       | 0.723       | 0.894
10           | 1, 7, 11, 12  | 1, 7, 11, 12  | 0.830      | 2.235       | 1.271       | 1.138       | 0.822
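For reference, the MAPE statistic reported in Tables 1–4 can be computed as follows (a standard definition, shown only to fix notation):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percent error, in percent."""
    actual, forecast = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
```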

The Taiwan Power Company is currently using an MLR model for medium term load forecasting. This model uses both load data and weather information. Load data measures total electrical consumption for Taiwan. The model also uses monthly CDD as an exogenous variable to account for weather variability. Thus, the MLR model is expressed through 12 monthly load and 12 monthly CDD variables. To establish a benchmark for measuring the accuracy of the various DAN2 models, we have used the SAS MLR to fit a linear model to this data [39]. The results show a linear fit with R² = 0.987. We have used this fitted model to forecast monthly loads for the next 12 months. The MAPE values for the training period and the 3-, 6- and 12-month forecasts are also computed (Table 1, model number 1). We will use these results as a baseline to evaluate DAN2's performance. We will next discuss how DAN2 is used to model the same data, forecast and measure its accuracy. To determine the neural net models, we first plotted the load and CDD data separately. The plotted data suggested the existence of an autoregressive pattern for both sets of data. The data also showed non-stationarity in both the mean and the variance. To analyze this data, similar to the ARIMA process, a number of different models are hypothesized and validated in order to select a set of lagged variables that best represent the existing, time-dependent patterns. Similar to other studies [16,19], we used the autocorrelation and partial autocorrelation values, for both the load and CDD, to assess the impact of including lagged variables in the models. This analysis identified a subset of monthly load and CDD variables that best represent the nature of each trend. Finally, we used this knowledge to hypothesize several alternative models and measure their effectiveness for forecasting load, using the MAPE statistic as the selection criterion. The general form of these models presents load as a function of previous loads and CDD values as detailed in (4):

Lt = f(Lt−1, Lt−2, . . ., Lt−k, CDDt−1, CDDt−2, . . ., CDDt−m)    (4)

where Lt represents the load for month t and CDDt represents the monthly cooling degree-days. In ref. [40], the authors emphasize that an important factor in the success of forecasting is “the skill of the analyst in the selection of a suitable ANN model.” Traditional ANN model selection requires the determination of three parameters: the input set, the number of hidden layers and the number of hidden nodes for each hidden layer. The DAN2 method reduces this search to one dimension: the input set. This improvement has allowed us to

experiment with multiple sets of input parameters that include both load and the CDD values. We next present these models and compare each with the traditional MLR model. In Section 3.4, we compare the performance of DAN2 to that of traditional FFBP models. In Section 3.5, comparisons with ARIMA are presented. 3.1. DAN2 model selection for load forecasting The first DAN2 model uses 24 variables (12 monthly loads and the 12 associated monthly CDD values) to compare directly with MLR. Results show that DAN2 performs noticeably better. The MLR model MAPE values for training, 3-, 6- and 12-month forecasts were 2.855, 3.098, 3.277 and 2.366, respectively (Table 1, model number 1). The corresponding results for the DAN2 24-variable model were 0.998, 0.897, 1.028 and 0.982 (Table 1, model number 2), an improvement of 65.1, 71.1, 68.6 and 58.5% for the training and forecasting periods. Although these results are very good and show that DAN2 outperforms the comparable MLR model, we introduce alternative DAN2 models that improve accuracy even further. Tables 1–4 also report forecasted MAPE values for 1 month. The performance of DAN2 for this single month is often very good; however, we have not included these values in our analysis since comparison based on a single observation is not statistically reliable. The remaining models are introduced to evaluate the role of the weather information in MTLF, and choose the best lagged parameters for this set of data. 3.2. The yearly load forecasting model To analyze the role of weather information in MTLF, we first applied DAN2 to load data alone (12 variables) and forecasted monthly loads for the 12 months of 1997. The results of this model are presented in Table 1 (model number 3). Although the training MAPE value is lower than the corresponding model 2, all forecasting MAPE values are worse, validating that the inclusion of weather related information improves forecasting accuracy. Including the entire 24 variable set (12 load and 12 CDD variables) in the forecasting model does not necessarily produce the best results. Selective inclusion of an appropriate set of input variables can improve performance while reducing model noise

and complexity. Therefore, we used the autocorrelation and partial autocorrelation information from each of the load and CDD data sets to identify candidates for the lagged input values. Two sets of lagged values were found to be highly representative of the trend. For the load and CDD values, the two lag sets were (1, 7, 12) and (1, 11, 12). We then used these sets of lagged values to construct our models. We first trained four different models to examine if the inclusion of CDD values had a positive impact on model performance. Two of the models included six inputs (three load and three CDD lagged values) and the other two used only the three load inputs. Results are presented in Table 1 (models numbered 4–7). All four models produce good results. The results from experimentation with these models verify that the inclusion of CDD information improves forecasting accuracy. For example, model number 4 (Table 1) uses lagged values of (1, 11 and 12) for both the load and CDD variables. The results from this model show forecasted values that are better than the corresponding values from model number 5 (Table 1), which excludes weather information. The (1, 7, 12) set of lagged values also produced excellent results, and led to the same conclusion. Again, we developed two models for this set of lags, one with and one without the CDD parameters. The results for these models are very good, and are presented in Table 1 (models 6 and 7). Model number 6, which includes both load and CDD parameters, clearly outperforms model 7, which does not include CDD data. The best model thus far is this model (model number 6), with MAPE values of 1.230, 0.149, 0.728 and 0.809 for the training, 3-, 6- and 12-month forecasts, respectively. This is an excellent fit that has produced very good forecast values. The improvements over the MLR model are 56.9, 95.2, 77.8 and 65.8% for the training, 3-, 6- and 12-month forecasts, respectively. Fig. 3 shows the actual versus forecasted values for the entire period. Obtaining forecasting MAPE values of less than 1% for medium term electrical load forecasting shows the effectiveness of DAN2 modeling capabilities for load forecasting. We experimented with other models, introducing additional relevant variables, that produced results comparable with model number 6. These three models (numbered 8–10, Table 1) have also produced MAPE values below 1%. Model number 8 uses a different set of lagged load values (1, 2 and 12) than lagged CDD values (1, 7 and 12). Although the training MAPE value for this model is 1.439, the forecasted values are all below 1%. We then increased the number of input parameters from 6 to 7 and 8, for model numbers 9 and 10. Model number 9 uses lagged load parameters (1, 2, 11 and 12) and lagged CDD parameters (1, 7 and 12). This model produced very good MAPE values for all periods. Model number 10 uses lagged values (1, 7, 11 and 12) for both load and CDD. This model produced very low MAPE values for both the training (0.830) and yearly (0.822) periods, but somewhat weaker MAPE values for the 3- and 6-month periods. Table 1 shows the results for these models, which are very close to our best model (model number 6). Overall, we prefer model number 6 since it uses fewer input parameters.

Table 2. Results for DAN2 seasonal models

Model number | Lags load    | Lags CDD    | Train MAPE | Fore-1 MAPE | Fore-3 MAPE | Fore-6 MAPE | Fore-12 MAPE
1            | 1, . . ., 6  | 1, . . ., 6 | 0.478      | 0.158       | 0.128       | 0.547       | NA
2            | 1, . . ., 6  | NA          | 0.824      | 0.765       | 0.541       | 0.559       | NA
3            | 1, . . ., 6  | 1, . . ., 6 | 0.915      | 0.002       | 0.169       | 0.612       | NA
4            | 1, . . ., 6  | NA          | 0.626      | 0.003       | 1.089       | 0.626       | NA
5            | 1, . . ., 12 | NA          | 0.725      | 0.764       | 0.540       | 0.847       | 0.592

Table 3. Results for Clementine models

Model number | Lags load   | Lags CDD | Train MAPE | Fore-1 MAPE | Fore-3 MAPE | Fore-6 MAPE | Fore-12 MAPE | Clem options
Yearly       | 1, 7, 12    | 1, 7, 12 | 2.873      | 6.376       | 4.851       | 4.355       | 3.978        | Prune
Low-season   | 1, . . ., 6 | NA       | 3.298      | 2.406       | 1.846       | 1.476       | NA           | Prune
Hi-season    | 1, . . ., 6 | NA       | 2.885      | 0.617       | 1.364       | 2.311       | NA           | Prune
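To illustrate how an input set such as that of model 6 in Table 1 (load and CDD lags 1, 7 and 12) can be assembled from the monthly series, a small sketch is given below, assuming two pandas Series, load and cdd, aligned on a monthly DatetimeIndex (the variable names are hypothetical):

```python
import pandas as pd

def lagged_design(load: pd.Series, cdd: pd.Series,
                  load_lags=(1, 7, 12), cdd_lags=(1, 7, 12)):
    """Build the lagged inputs and target for a model of the form of Eq. (4)."""
    cols = {f"load_lag_{k}": load.shift(k) for k in load_lags}
    cols.update({f"cdd_lag_{m}": cdd.shift(m) for m in cdd_lags})
    data = pd.DataFrame(cols).assign(target=load).dropna()  # drop rows lost to shifting
    return data.drop(columns="target"), data["target"]
```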

3.3. The seasonal load forecasting model

Many electrical distribution regions face seasonal demand variations, and we observed this behavior in the Taiwan Power system as well. The load data shows two distinct seasons, which we labeled the high and low seasons. The high season includes the months of May–October, and the low season is represented by the months of November through April. Each season is characterized by distinct statistical behavior.
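For concreteness, this seasonal split amounts to a simple month-based labeling of the monthly observations. The following sketch reflects our reading of the split (file and column names are hypothetical):

import pandas as pd

def season_label(ts: pd.Timestamp) -> str:
    """High season: May through October; low season: November through April."""
    return "high" if 5 <= ts.month <= 10 else "low"

# Hypothetical usage with monthly system loads indexed by date:
# monthly = pd.read_csv("monthly_load.csv", index_col=0, parse_dates=True)
# monthly["season"] = monthly.index.map(season_label)
# low_data = monthly[monthly["season"] == "low"]
# high_data = monthly[monthly["season"] == "high"]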

Table 4
Results for ARIMA models

Model        ARIMA model                  Train MAPE   Fore-1 MAPE   Fore-3 MAPE   Fore-6 MAPE   Fore-12 MAPE
Yearly       ARIMA (0, 1, 1)(0, 1, 1)     3.37         2.91          2.55          1.80          1.88
Low-season   ARIMA (5, 1, 0)              4.03         2.62          1.87          2.53          NA
Hi-season    ARIMA (5, 1, 0)              2.89         1.48          1.31          1.08          NA


Fig. 3. Actual vs. forecasted loads for the yearly model.

We have, therefore, developed two separate neural network models, one for each season. It can be postulated that the role of weather is now implicitly and indirectly modeled within each season. We will show that the inclusion of CDD data in the seasonal models does not significantly improve forecasting accuracy. This allows us to remove the CDD variables from the model, reducing model complexity and, more importantly, reliance on forecasted weather information. Once again, for each season, we developed two models: one with and one without the CDD information. Finally, for each season we only need to forecast 6 months ahead; the results of the two models are then combined to obtain the yearly forecast.

Table 2 presents the results of the four seasonal models. Models number 1 and 2 represent the low season, and models number 3 and 4 the high season. Models number 1 and 3 include the CDD information, while models number 2 and 4 are based solely on the load data. The results show an excellent fit for each model, surpassing the best DAN2 yearly model. These results also show that the inclusion of CDD information does not significantly improve the longer term (6-month) forecasting accuracy (as measured by the MAPE values). Although the results show a slight difference in the training and 3-month MAPE values, they are all below 1%. This minor improvement does not warrant the added model complexity that results from the inclusion of weather information. Finally, this observation eliminates the need to obtain forecasted values for CDD or other weather related data, which is especially significant since studies have shown that weather forecast errors can seriously impact load-forecasting accuracy [2,41]. Figs. 4 and 5 depict the actual and forecasted load values and demonstrate the strength of the seasonal models. The forecasted MAPE values (for 6 months, without CDD) of 0.559 and 0.626 for the low and high seasons, respectively, are lower than those of the best yearly model. Thus, we recommend the development of seasonal models for medium term load forecasting.

Finally, in order to compare the seasonal models with the yearly models, we have integrated the two seasonal models and computed the MAPE based on the actual and corresponding forecasted values. The MAPE values for the integrated seasonal model are 0.725, 0.540, 0.847 and 0.592 for the training, 3-, 6- and 12-month forecasts, respectively. Table 2 (model number 5) presents these results, with corresponding improvements of 74.6, 82.6, 74.2 and 75.0% over the base yearly model (the MLR model). The integrated model's MAPE values show a good and consistent fit throughout the training and forecasting periods. In evaluating neural network model fitness, such consistency implies a proper fit, without over- or under-fitting. We have applied the same principle to avoid under/over-fitting in all of the DAN2 models. Fig. 6 shows the plot of the actual versus forecasted values for the integrated seasonal model.

3.4. Comparing DAN2 models with a traditional ANN model

As mentioned earlier, the authors have compared and contrasted the effectiveness of DAN2 against a traditional, FFBP-based, neural network system for time series events [27]. Here, we compare DAN2's performance in forecasting medium term electrical loads to that of the Clementine neural network system from SPSS [42]. We applied the same input values used for our best yearly model (Table 1, model number 6) and the two seasonal models (Table 2, models number 2 and 4) to the Clementine neural network system. The Clementine software offers a number of configurations for model fitting. We experimented with various configurations and found the "pruning" option of the system to produce the best results. This option uses pruning algorithms to optimize network architecture and model fit. Table 3 presents these results.
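The MAPE statistic and the improvement percentages quoted in these comparisons can be reproduced along the following lines. The quoted figures are consistent with an improvement taken relative to the DAN2 MAPE, i.e. (MAPE_other - MAPE_DAN2) / MAPE_DAN2 * 100; this is a sketch with hypothetical variable names, not code from the original study:

import numpy as np
import pandas as pd

def mape(actual, forecast) -> float:
    """Mean absolute percent error, expressed in percent."""
    actual, forecast = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def improvement(mape_other: float, mape_dan2: float) -> float:
    """Relative improvement of DAN2 over a competing model, measured against the DAN2 MAPE."""
    return (mape_other - mape_dan2) / mape_dan2 * 100

# Hypothetical integration of the two 6-month seasonal forecasts into a yearly forecast:
# yearly_forecast = pd.concat([low_forecast, high_forecast]).sort_index()
# print(mape(actual_loads, yearly_forecast))

# Reproducing one quoted figure: yearly training MAPE, Clementine 2.873 vs. DAN2 1.230
print(round(improvement(2.873, 1.230)))   # -> 134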


Fig. 4. Actual vs. forecasted loads for the seasonal model (low).

Results from the yearly model, with three load and three CDD input variables (lag variables of 1, 7 and 12), show reasonably consistent MAPE values for both the training (2.873) and 12-month forecasting (3.978) periods. Against the Clementine values, DAN2's yearly model (Table 1, model number 6) shows improvements of 134% for training and over 391% for 12-month forecasting. Results from the seasonal models using only load variables (six inputs for each of the low and high seasonal models) also show significant improvements. The low (high) season 3- and 6-month forecasted MAPE values are 1.846 (1.364) and 1.476 (2.311), respectively (Table 3). The equivalent DAN2 models' forecasted MAPE values for 3 and 6 months are 0.541 (1.089) and 0.559 (0.626), respectively (Table 2, model numbers 2 (low) and 4 (high)). These values show improvements of 241% (25%) and 164% (269%) for the 3- and 6-month forecasted MAPE values for the low (high) seasons, respectively. These results show that DAN2 clearly outperforms Clementine for both the yearly and seasonal models.

3.5. Comparing DAN2 models with traditional ARIMA models

The ARIMA modeling approach has been used extensively to analyze time series events in general and load forecasting in particular. We have compared and contrasted the performance of DAN2 with ARIMA for a number of classical benchmark data sets in ref. [27].

Fig. 5. Actual vs. forecasted loads for the seasonal model (high).


Fig. 6. Actual vs. forecasted loads for the integrated seasonal model.

Our results show that DAN2 consistently outperformed ARIMA for these data sets. Here, we compare DAN2's modeling and forecasting capabilities against the SPSS Trends ARIMA software package [43], evaluating the two methods on the exact same load data set from the Taiwan Power Company. We present results for the ARIMA models that correspond to the best DAN2 yearly model (Table 1, model number 6) and the two seasonal models (Table 2, models number 2 and 4). For the yearly model, we used the same set of lagged variables (1, 7 and 12) for the ARIMA model. We note that ARIMA modeling does not allow the inclusion of exogenous variables in its standard form, and therefore we did not include the CDD data in the ARIMA models. The best ARIMA solution for the yearly model was ARIMA (0, 1, 1)(0, 1, 1). Table 4 presents the results for this model. The training, 3-, 6- and 12-month forecast MAPE values are 3.37, 2.55, 1.80 and 1.88, respectively. Against these ARIMA values, DAN2's best yearly model (Table 1, model number 6) shows improvements of 174% for training, and 147 and 132% for 6- and 12-month forecasting, respectively. Even if we compare ARIMA with the DAN2 model that uses only the load lag variables 1, 7 and 12 (Table 1, model number 7), the DAN2 results still show improvements of 248% for training, and 75 and 84% for the 6- and 12-month forecasting MAPE values, respectively. Table 4 also presents the results of the ARIMA models for the corresponding seasonal models. We again used the exact same data sets in order to directly compare the DAN2 results with those of ARIMA. For the seasonal models, the best ARIMA models were found to be ARIMA (5, 1, 0); the MAPE values for these models are reported in Table 4. For the "low-season" model, the MAPE values for the training, 3- and 6-month forecasts are 4.03, 1.87 and 2.53%, respectively. The corresponding DAN2 model (Table 2, model number 2) once again outperforms these values, by 389% for training, and 246 and 353% for 3- and 6-month forecasting, respectively.
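For readers who wish to reproduce a comparable ARIMA baseline, the reported model structures can be fitted with open-source tools. The sketch below uses statsmodels as a stand-in for the SPSS Trends package and assumes a monthly load series (the data loading and names are hypothetical):

import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical monthly load series with a DatetimeIndex:
# load = pd.read_csv("monthly_load.csv", index_col=0, parse_dates=True)["load"]

def fit_yearly_arima(load: pd.Series):
    """ARIMA (0, 1, 1)(0, 1, 1) with a 12-month season, the structure reported for the yearly model."""
    return SARIMAX(load, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)).fit(disp=False)

def fit_seasonal_arima(season_load: pd.Series):
    """ARIMA (5, 1, 0), the structure reported for the low- and high-season models."""
    return SARIMAX(season_load, order=(5, 1, 0)).fit(disp=False)

# result = fit_yearly_arima(load)
# print(result.forecast(steps=12))   # 12-month-ahead forecast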

Similarly, the ARIMA (5, 1, 0) MAPE values for the "high-season" model are 2.89% for training, and 1.31 and 1.08% for 3- and 6-month forecasting, respectively (Table 4). The corresponding DAN2 model (Table 2, model number 4) MAPE values are 0.626 for training, and 1.089 and 0.626 for 3- and 6-month forecasting, respectively. DAN2 outperforms the ARIMA (5, 1, 0) model by 362% for training, and 20 and 73% for 3- and 6-month forecasting. In general, this comparison is consistent with the earlier comparative study of DAN2 and ARIMA reported in ref. [27]: DAN2 consistently produced a better fit for the training data and more accurate forecast values as measured by MAPE (and/or other) statistics.

4. Conclusions

Accurate medium term load forecasting allows utilities to plan power generation expansion (or purchase), schedule maintenance activities, negotiate forward contracts and develop cost efficient fuel purchasing strategies. We presented a dynamic artificial neural network (DAN2) system for medium term load modeling and forecasting. This model offers an alternative architecture to existing ANNs and is shown to be more effective in load forecasting. We found that when yearly models are used, the inclusion of weather information improves forecasting accuracy. However, this improvement relies upon weather forecasts that may not be accurate or available. To avoid model reliance on such forecasts, we developed seasonal models that do not require weather information. We validated all models with actual data from the Taiwan Power Company. The results show that the DAN2 models outperform MLR, ARIMA and an FFBP-based neural network system. Both the yearly and seasonal DAN2 models provided forecasting accuracies above 99% (MAPE values below 1%).


We introduced seasonal models that do not require explicit weather information, yet produced excellent fit and forecasts. Finally, the seasonal models outperformed all yearly models.

Acknowledgements

We would like to thank the Taiwan Power Company for providing the monthly electric load data and weather information for this research. We would also like to thank Michael T. Byer for his contributions to this paper.

References

[1] R. Ramanathan, R. Engle, C.W.J. Granger, F. Vahid-Araghi, C. Brace, Short-run forecasts of electricity loads and peaks, Int. J. Forecasting 13 (1997) 161–174.
[2] H.T. Yang, C.M. Haung, C.L. Huang, Identification of ARIMAX model for short term load forecasting: an evolutionary programming approach, IEEE Trans. Power Syst. 11 (1) (1996) 403–408.
[3] A.G. Bakirtzis, V. Petridis, S.J. Klartzis, M.C. Alexiadis, A neural network short-term load forecasting model for the Greek power system, IEEE Trans. Power Syst. 11 (2) (1996) 858–863.
[4] S.T. Chen, D.C. Yu, A.R. Moghaddamjo, Weather sensitive short-term load forecasting using nonfully connected artificial neural network, IEEE Trans. Power Syst. 7 (3) (1992) 1098–1105.
[5] T.W.S. Chow, C.T. Leung, Neural network based short-term load forecasting using weather compensation, IEEE Trans. Power Syst. 11 (4) (1996) 1736–1742.
[6] H.S. Hippert, C.E. Pedreira, R.C. Souza, Neural networks for short-term load forecasting: a review and evaluation, IEEE Trans. Power Syst. 16 (1) (2001) 44–55.
[7] K.L. Ho, Y.Y. Hsu, C.C. Yang, Short term load forecasting using a multilayer neural network with an adaptive learning algorithm, IEEE Trans. Power Syst. 7 (1) (1992) 141–149.
[8] J.W. Taylor, R. Buizza, Neural network load forecasting with weather ensemble predictions, IEEE Trans. Power Syst. 17 (3) (2002) 626–632.
[9] Taiwan Power Company, website: http://www.taipower.com.tw.
[10] P.F. Pai, W.C. Hong, Forecasting regional electricity load based on recurrent support vector machines with genetic algorithms, Electric Power Syst. Res. 74 (2005) 417–425.
[11] H.M. Al-Hamadi, S.A. Soliman, Long-term/mid-term electric load forecasting based on short-term correlation and annual growth, Electric Power Syst. Res. 74 (2005) 353–361.
[12] H.R. Varian, California Must Control Demand For Power While Supply Slowly Catches Up, New York Times, January 11, 2001.
[13] W.W. Hogan, Restructuring The Electricity Market: Institutions for Network Systems, Harvard University, April 1999.
[14] P.L. Joskow, E. Kahn, A quantitative analysis of pricing behavior in California's wholesale electricity market during summer 2000, Energy J. 23 (4) (2000) 1–35.
[15] M.S. Kandil, S.M. El-Debeiky, N.E. Hasanien, Long-term load forecasting for fast developing utility using a knowledge-based expert system, IEEE Trans. Power Syst. 17 (2) (2002) 491–496.
[16] S.M. Islam, M. Al-Alawi, K.A. Ellithy, Forecasting monthly electric load and energy for a fast growing utility using an artificial neural network, Electric Power Syst. Res. 34 (1995) 1–9.
[17] D.K. Ranaweera, G.G. Karady, R.G. Farmer, Economic impact analysis of load forecasting, IEEE Trans. Power Syst. 12 (3) (1997) 1388–1392.
[18] D.W. Bunn, E.D. Fanner, Review of Short-Term Forecasting Methods in the Electric Power Industry, Comparative Models in Electrical Load Forecasting, John Wiley & Sons, New York, 1985, pp. 13–30.
[19] M.M. Elkateb, K. Solaiman, Y. Al-Turki, A comparative study of medium-weather-dependent load forecasting using enhanced artificial/fuzzy neural network and statistical techniques, Neurocomputing 23 (1998) 3–13.


[20] M. Gavrilas, I. Ciutea, C. Tanasa, Medium-term load forecasting with artificial neural network models, IEEE Conf. Elec. Dist. Pub. 482 (2001) 383.
[21] A. Bart, M. Benahmad, R. Cherkaoui, G. Pitteloud, A. Germond, Long-term energy management optimization according to different types of transactions, IEEE Trans. Power Syst. 13 (3) (1998) 804–809.
[22] E. Doveh, P. Feigin, L. Hyams, Experience with FNN models for medium term power demand predictions, IEEE Trans. Power Syst. 14 (2) (1999) 538–546.
[23] E.H. Barakat, S.A. Al-Rashed, Long range peak demand forecasting under condition of high growth, IEEE Trans. Power Syst. 7 (4) (1992) 1483–1486.
[24] M. Ghiassi, H. Saidane, A dynamic architecture for artificial neural networks, Neurocomputing 63 (2005) 397–413.
[25] P. Bloomfield, Fourier Analysis of Time Series: An Introduction, John Wiley & Sons, New York, 1976.
[26] G. Zhang, E.B. Patuwo, M.Y. Hu, Forecasting with artificial neural network: the state of the art, Int. J. Forecasting 14 (1998) 35–62.
[27] M. Ghiassi, H. Saidane, D.K. Zimbra, A dynamic artificial neural network model for forecasting time series events, Int. J. Forecasting 21 (2005) 341–362.
[28] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[29] R. Reed, Pruning algorithms—a review, IEEE Trans. Neural Netw. 4 (1993) 740–747.
[30] U. Anders, O. Korn, Model selection in neural networks, Neural Netw. 12 (1999) 309–323.
[31] I. Rivals, L. Personnaz, Neural-network construction and selection in nonlinear modeling, IEEE Trans. Neural Netw. 14 (4) (2003) 804–819.
[32] T.Y. Kwok, D.Y. Yeung, Constructive algorithms for structure learning in feedforward neural networks for regression problems, IEEE Trans. Neural Netw. 8 (3) (1997) 630–645.
[33] S.I. Amari, N. Murata, K.R. Muller, M. Finke, H.H. Yang, Asymptotic statistical theory of overtraining and cross-validation, IEEE Trans. Neural Netw. 8 (5) (1997) 985–996.
[34] M.Y. Hu, G.P. Zhang, C.X. Jiang, B.E. Patuwo, A cross-validation analysis of neural network out-of-sample performance in exchange rate forecasting, Decis. Sci. 30 (1) (1999) 197–216.
[35] R.J. Thieme, M. Song, R. Calantone, Artificial neural network decision support systems for new product development project selection, J. Marketing Res. 37 (2000) 499–507.
[36] I. Rivals, L. Personnaz, On cross validation for model selection, Neural Comput. 11 (1999) 863–870.
[37] J.S. Armstrong, Principles of Forecasting: A Handbook for Researchers and Practitioners, Kluwer Academic Publishers, 2001.
[38] S.G. Makridakis, S.C. Wheelwright, R.J. Hyndman, Forecasting: Methods and Applications, third ed., John Wiley & Sons, New York, 1998.
[39] The SAS System 8e for MS-Windows, Release 8.02, SAS Institute Inc., 2001.
[40] J. Faraway, C. Chatfield, Time series forecasting with neural networks: a comparative study using the airline data, Appl. Stat. 47 (2) (1998) 231–250.
[41] A.P. Douglas, A.M. Breipohl, F.N. Lee, R. Adapa, The impact of temperature forecast uncertainty on Bayesian load forecasting, IEEE Trans. Power Syst. 13 (4) (1998) 1507–1513.
[42] The Clementine Data Mining System, SPSS Inc., 1998.
[43] SPSS Trends, v13.0, SPSS Inc., 2004.

Manoochehr Ghiassi is the MSIS director, Breetwor fellow and professor of information systems in the Operations and Management Information Systems Department at Santa Clara University. He holds a Ph.D. in industrial engineering and an M.S. in computer engineering from the University of Illinois at Urbana-Champaign, an M.S.
in economics from Southern Illinois University at Carbondale and a B.S. in mathematical economics from Tehran University. His current research interests include artificial neural networks, software engineering, software testing and simulation modeling. He is a member of IEEE and ACM.


David K. Zimbra is currently a member of the Technology and Security Risk Services staff at Ernst & Young LLP. He holds a B.S. in operations and management information systems from Santa Clara University. His current research interests are in artificial neural network theory and application.

Hassine Saidane is currently an independent data mining consultant, researcher and adjunct faculty member at National University in San Diego, CA.

He has over 20 years of business experience in forecasting and quantitative data analysis for decision support at AT&T, Lucent Technologies and NCR's Data Mining Lab. He holds a B.S. in physics from the University of Tunis, Tunisia, and an M.S. in managerial accounting and a Ph.D. in industrial engineering, both from the University of Illinois at Urbana-Champaign. His current research interests are in machine learning, forecasting and data mining algorithms.