Energy Conversion and Management 49 (2008) 2165–2177
Improved estimation of electricity demand function by integration of fuzzy system and data mining approach

A. Azadeh a,*, M. Saberi a,b, S.F. Ghaderi a, A. Gitiforouz a, V. Ebrahimipour a

a Department of Industrial Engineering, Center of Excellence for Intelligent Based Experimental Mechanics, Research Institute of Energy Management and Planning, College of Engineering, University of Tehran, P.O. Box 11365-4563, Iran
b Department of Industrial Engineering, Tafresh University of Technology, Tafresh, Iran
Article info

Article history: Received 14 August 2006; received in revised form 28 June 2007; accepted 20 February 2008; available online 12 May 2008

Keywords: Fuzzy system; Data mining; Forecasting; Preprocessing; Time series; Electricity consumption
Abstract

This study presents an integrated fuzzy system, data mining and time series framework to estimate and predict electricity demand under seasonal and monthly changes in electricity consumption, especially in developing countries such as China and Iran with non-stationary data. It is difficult to model the uncertain behavior of energy consumption with only a conventional fuzzy system or time series, and the integrated algorithm could be an ideal substitute in such cases. To construct a fuzzy system, a rule base is needed. Because a rule base is not available for the demand function, the look-up table method, which is one of the rule extraction methods, is used to extract the rule base. This system is defined as FLT. The decision tree method, which is a data mining approach, is similarly used to extract the rule base. This system is defined as FDM. The preferred time series model is selected from linear (ARMA) and nonlinear models. For this purpose, after the preferred ARMA model is selected, the McLeod–Li test is applied to check the nonlinearity condition. When the nonlinearity condition is satisfied, the preferred nonlinear model is selected and compared with the preferred ARMA model, and one of them is chosen as the time series model. Finally, ANOVA is used to select the preferred model from the fuzzy models and the time series model. The algorithm also considers the impact of data preprocessing and postprocessing on fuzzy system performance. Another unique feature of the proposed algorithm is the use of the autocorrelation function (ACF) to define input variables, whereas conventional methods use trial and error. Monthly electricity consumption of Iran from 1995 to 2005 is considered as the case of this study. The MAPE estimation of genetic algorithm (GA), artificial neural network (ANN) versus the proposed algorithm shows the appropriateness of the proposed algorithm.

© 2008 Elsevier Ltd. All rights reserved.
1. Introduction

Electricity, as an energy resource with an ever-growing role in the world economy and multi-purpose applications in production and consumption, has gained special attention. With the development of societies and the growth of economic activities, electricity has an increasing effect on corporations and their services. Corporations use electricity as a production factor, and families rely on it directly or indirectly. Thus, energy consumption determines their economic welfare and that of society. Recently, intelligent methods have been widely used in the energy field (footnote 1) [1–16]. The models in these and other studies fall into two main groups: regression-based models [1,2,4–6,8,9,11,15,16,62–64] and time-series-based models [3,7,10,14,29–46,65–67,71]. The present study uses a time-series-based model to study the impact of preprocessing on fuzzy system performance.

Fuzzy logic is one of the intelligent methods. A major topic in this context is the estimation of electricity consumption, which reveals the consumption growth in the forthcoming years. Because observations and data in this area are usually imprecise, ambiguous or insufficient, a qualitative method capable of handling these issues must be incorporated [17]. Vagueness and ambiguity are among the problems found in the relationships between the inputs and outputs of real-world systems, and they can be tackled effectively with fuzzy logic [18]. Fuzzy logic is a discipline that has been successful in the automated reasoning of expert systems [19,47]. Since Zadeh introduced fuzzy set theory in 1965 [20], there has been a growing interest in applying fuzzy set theory to several scientific and technological areas such as control, decision making, economics, robotics, manufacturing and forecasting [18,21–23]. Of utmost importance in this regard are the study and the formation of fuzzy systems. Several viewpoints for understanding fuzzy systems have been proposed. Zeng and Singh [24,25] dealt with fuzzy systems (a special case of them) as function approximators and proved that they can approximate any real function with any desired accuracy using fuzzy basis functions [26,27]. Although fuzzy systems belong to the class of model-free estimators [28] (like the neural networks that came before them), they give a clear view of the real system or process that they simulate, and they do so in linguistic terms.

The proposed algorithm integrates conventional time series with fuzzy systems and data mining tools for forecasting with non-stationary data. Moreover, it provides good estimates of electricity consumption. Preprocessing and postprocessing approaches are used to achieve the main objective. The literature shows that combining traditional concepts with fuzzy systems that use preprocessing and postprocessing to model time series is a new approach [29–39]. Although data preprocessing is considered in some studies, the concept of covariance stationarity in data preprocessing is ignored [6,40–46]. Therefore, an integrated algorithm is needed to overcome the shortcomings of previous studies. The integrated model is discussed in the next section. The algorithm uses four fuzzy models and two time series models. The most suitable fuzzy model is selected with the aid of ANOVA and is then compared with the time series model, again with the aid of the ANOVA test. The mean absolute percentage error (MAPE) is also used to compare the models with actual data and to estimate the relative errors of the models.

* Corresponding author. E-mail address: [email protected] (A. Azadeh).
Footnote 1: As there are numerous papers in the mentioned field, only Energy Conversion and Management (ECM) papers are cited.
0196-8904/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.enconman.2008.02.021
2. The integrated algorithm

The algorithm is shown in Fig. 1. A fuzzy system with preprocessed data (FW) and a fuzzy system without preprocessed data (FWO) are considered in order to determine the impact of preprocessing on the fuzzy system. FLTW and FDMW belong to the FW model, and FLTWO and FDMWO belong to the FWO model. The preferred time series model for the data set is also identified. This model is an ARMA model if the McLeod–Li test does not indicate nonlinearity; otherwise it is a nonlinear model. Finally, the estimation model is selected from the preferred fuzzy system model and the preferred conventional time series model.

[Fig. 1. The integrated fuzzy algorithm with non-stationary data. Flowchart nodes: Step 1, collect the data set in all available previous periods; decision "Is the process stationary?"; determine the most suitable preprocessing method for the actual data. Step 2, divide the data into training and test data sets. Step 3, run the FW and FWO models and the appropriate ARMA models. Step 4, post-process the results of the models. Step 5, select the preferred ARMA model by the Box–Pierce Q-statistic test, AIC and SBC and define it as L1; McLeod–Li test; select the preferred nonlinear model, call it NL1 and define it as Time Series, or set L1 as the preferred time series model and call it Time Series; run ANOVA to compare the Fuzzy and Time Series models with the actual data; decision "Is the null hypothesis rejected in the ANOVA test?"; Duncan test; use MAPE errors; select the preferred model from Fuzzy and Time Series.]

This algorithm has the following general basic steps:

Step 1: Collect the data set in all available previous periods. Then examine the stationarity assumption for the data. If the data are not covariance stationary, the most suitable preprocessing method should be selected and applied.

Step 2: Divide the data into two sets: one for estimating the models, called the training data set, and the other for evaluating the validity of the estimated models, called the test data set. Usually the training data set contains 70–90% of all data and the remaining data are used as the test data set [29].

Step 3: Run and estimate all models. Input variables for the fuzzy system models can be selected using the autocorrelation function (ACF). In most heuristic methods, by contrast, input variables are selected experimentally or by trial and error [30–37,40–42]. As there is no established method for selecting input variables in fuzzy systems, the ACF approach is proposed to select input variables for both fuzzy system models. The importance of this approach becomes clear when the difficulty and imprecision of the trial and error method are considered: unsystematic input selection is the cause of its lack of precision, and even if all previous lag combinations are tried, the method is time-consuming. For example, if all combinations are selected from the most recent 12 lags, the number of combinations is

$$\sum_{i=1}^{12} \binom{12}{i} = 2^{12} - 1 = 4095 \tag{1}$$
The ACF approach, by contrast, yields only a few candidate input combinations compared with the trial and error process. For the time series model, the preferred ARMA model is selected, with input variables chosen using the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The McLeod–Li test is then applied to the result of this model, and its outcome shows whether a nonlinear time series model must be constructed.

Step 4: Post-process the estimated data for the models in which the data were preprocessed.

Step 5: Evaluate the reliability of each model. The ANOVA test is used to compare the models. The preferred ARMA model is selected from the plausible models by the Box–Pierce Q-statistic test, the Akaike information criterion (AIC) and the Schwarz Bayesian criterion (SBC), and is called L1. The nonlinearity of the process is determined by applying the McLeod–Li test to the result of L1. If the test indicates a nonlinear process, plausible nonlinear models are constructed, and the best one is called NL1 and defined as Time Series. Otherwise, L1 is defined as Time Series. Finally, the ANOVA test shows the preferred model from the Time Series and Fuzzy models.

The main elements of the proposed algorithm are described next.
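To illustrate the contrast drawn in Step 3, the following minimal sketch counts the lag subsets that exhaustive trial and error would face and then selects candidate lags from the sample ACF. The significance band of 1.96/sqrt(n) is a common rule of thumb and an assumption here, since the paper does not state its threshold; the function names and the toy series are hypothetical.

```python
import math
from statistics import mean


def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 1..max_lag."""
    n = len(series)
    m = mean(series)
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag=12, z=1.96):
    """Lags whose |r_k| exceeds the approximate band z / sqrt(n).

    The band is an assumed rule of thumb, not the paper's stated criterion.
    """
    bound = z / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Exhaustive trial and error over the 12 most recent lags:
# every non-empty subset of 12 lags is a candidate input set.
n_subsets = sum(math.comb(12, i) for i in range(1, 13))

# Toy series with a period of 12 (hypothetical data, not the paper's).
toy = [math.sin(2 * math.pi * t / 12) for t in range(120)]
lags = significant_lags(toy)
```

On a strongly seasonal toy series such as this one, the ACF flags the seasonal lag and several others while leaving out lags with near-zero autocorrelation, so far fewer candidates remain than in the exhaustive search.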
2.1. Fuzzy system

2.1.1. Basics of fuzzy set theory

2.1.1.1. Introduction. Fuzzy set theory was first introduced in 1965 by Zadeh [20]. It may be regarded both as a generalization of classical set theory and as a generalization of dual logic. A classical set can be regarded as a grouping of elements that all share at least one common characteristic. If an element possesses this characteristic, it belongs to the set; otherwise it does not. In fuzzy set theory, the set is no longer restricted to this binary (yes/no) definition of membership but allows a graduated definition of membership. This means that a degree of membership in a set can be specified for each element. Such a set is referred to as a fuzzy set. In the following sections, classical sets are indicated by capital letters, while fuzzy sets are identified by capital letters bearing a tilde (~).

2.1.1.2. Fuzzy sets. For a crisp (classical) set $X$ of objects that is to be analyzed by fuzzy means, $\tilde{A}$ (footnote 2) is defined as

$$\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) \mid x \in X\} \tag{2}$$

where $\mu_{\tilde{A}}$ represents the membership function and $\mu_{\tilde{A}}(x)$ represents the degree of membership of element $x$ in the fuzzy set $\tilde{A}$.

2.1.1.3. Membership function. A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. Triangular, trapezoidal and Gaussian membership functions are defined below and are shown in Fig. 2:

$$\mathrm{tri}(x; a, b, c) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ \dfrac{c-x}{c-b}, & b \le x \le c \\ 0, & c \le x \end{cases} \tag{3}$$

$$\mathrm{trap}(x; a, b, c, d) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{d-x}{d-c}, & c \le x \le d \\ 0, & d \le x \end{cases} \tag{4}$$

$$\mathrm{Gauss}(x; \sigma, c) = e^{-\frac{(x-c)^2}{2\sigma^2}} \tag{5}$$
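Eqs. (3)–(5) translate directly into code. The sketch below is a minimal illustration, assuming strictly ordered parameters (a < b < c for the triangle, a < b <= c < d for the trapezoid); the function names are chosen here for illustration.

```python
import math


def tri(x, a, b, c):
    """Triangular membership function, Eq. (3); assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trap(x, a, b, c, d):
    """Trapezoidal membership function, Eq. (4); assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gauss(x, sigma, c):
    """Gaussian membership function, Eq. (5), centered at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))
```

For example, `tri(0.5, 0, 1, 2)` lies halfway up the rising edge, and `gauss(c, sigma, c)` is 1 for any positive `sigma`.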
2.1.1.4. Linguistic variables. A linguistic variable is a variable whose values are not numbers (as in the case of a deterministic variable) but linguistic constructs (so-called terms). The contents of these terms are defined by fuzzy sets over a base variable.

Footnote 2: $\tilde{A}$ represents a fuzzy set of $X$.

[Fig. 2. The curve of membership functions. Panels: (1) a triangular membership function with parameters a, b, c; (2) a trapezoidal membership function with parameters a, b, c, d; (3) a Gaussian membership function plotted over c − 3σ, c, c + 3σ. Vertical axis: degree of membership.]

2.1.2. Knowledge-based fuzzy systems

In general, a fuzzy system has four basic elements:

- Knowledge base (definition of the linguistic variables, terms and rules).
- Inference engine (analysis).
- Processing of the input variables (fuzzifying).
- Processing of the results (defuzzifying).

The basic structure of a fuzzy system can be seen in Fig. 3.

[Fig. 3. Basic structure of a fuzzy system. Blocks: Fuzzification (F), Inference (I), Defuzzification (D), Knowledge Base.]

2.1.2.1. Knowledge base. The knowledge base contains all the "knowledge" for the solution of the given problem, that is, the definition of the linguistic variables and their terms (fuzzy sets) as well as the (fuzzy) production rules. The linguistic variables and their terms form the building blocks for the production rules, which have the form

If (Premise) Then (Conclusion)

The premises (conditions) and the conclusions each consist of statements of the form

(Variable) Is (Term)

which are connected with one another by the standard logic operator AND. Certain variables (namely input variables) are used only in the premise part of a rule. Others (output variables) are used exclusively in the conclusion part. The knowledge found in the knowledge base forms the basis of the actual decision process of a knowledge-based fuzzy system: the fuzzifying, the inference and the defuzzifying.

2.1.2.2. Fuzzifying. Fuzzifying is the process by which numeric values of the input variables are transformed into memberships of the terms of the linguistic variables. For each term of an input variable, a membership value is assigned to the scalar x. Note that fuzzifying is not necessary for all variables in a knowledge-based fuzzy system.

2.1.2.3. Inference engine. The process by which conclusions are derived from existing facts and available knowledge is called inference. The inference process of a knowledge-based fuzzy system uses forward chaining. The given facts (i.e. the membership values of the input variable terms) are analyzed and new statements are derived (i.e. the rule conclusion terms). The process is repeated for the new, fuller set of facts and their corresponding rules until the membership values of the terms of the output variables are known. An inference step (the evaluation of a rule) consists of two steps:

- Aggregation
- Accumulation

Aggregation: Aggregation is the calculation of the fulfillment of the whole rule, based on the fulfillments of the individual premises. This process generally corresponds to the logical AND operator applied to the individual premise expressions. Table 1 shows the most important aggregation operators.

Table 1. The aggregation operators

Aggregation operator    Definition
Minimum                 $\min\{\mu_{\tilde{A}}(x), \mu_{\tilde{B}}(x)\}$
Maximum                 $\max\{\mu_{\tilde{A}}(x), \mu_{\tilde{B}}(x)\}$
Algebraic product       $\mu_{\tilde{A}}(x)\,\mu_{\tilde{B}}(x)$
Algebraic sum           $\mu_{\tilde{A}}(x) + \mu_{\tilde{B}}(x) - \mu_{\tilde{A}}(x)\,\mu_{\tilde{B}}(x)$

Accumulation: In knowledge-based systems, often more than one rule leads to the same conclusion (e.g. the fault is a short circuit). While this poses no problem for classical crisp logic, the case needs to be handled separately in fuzzy systems. If the conclusion has a degree of fulfillment of 0.7 under one rule but 0.3 under another, the different degrees of fulfillment need to be combined into a single value. This is achieved by the process of accumulation, which corresponds to unifying the individual results with the logical OR operator. Table 2 shows the most important accumulation operators.
Table 2. Accumulation operators

Accumulation operator    Definition
Maximum                  $\max\{\mu_{\tilde{A}}(x), \mu_{\tilde{B}}(x)\}$
Algebraic sum            $\mu_{\tilde{A}}(x) + \mu_{\tilde{B}}(x) - \mu_{\tilde{A}}(x)\,\mu_{\tilde{B}}(x)$

Table 3. Defuzzifying operators

Defuzzifying operator         Definition
Center of gravity             $x^{*} = \dfrac{\int_V x\,\mu_{\tilde{B}}(x)\,dx}{\int_V \mu_{\tilde{B}}(x)\,dx}$
Center average defuzzifier    $x^{*} = \dfrac{\sum_{i=1}^{M} x_i w_i}{\sum_{i=1}^{M} w_i}$
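The operators in Tables 1–3 can be sketched in a few lines. The example below is a minimal illustration only: it covers the minimum and algebraic product for aggregation, the maximum and algebraic sum for accumulation, and the center average defuzzifier. The function names, and the reading of $x_i$ as the center of the ith conclusion term and $w_i$ as its degree of fulfillment, are assumptions made here.

```python
def aggregate_min(memberships):
    """Aggregation by minimum: fuzzy AND over the premise memberships."""
    return min(memberships)


def aggregate_product(memberships):
    """Aggregation by algebraic product over the premise memberships."""
    result = 1.0
    for m in memberships:
        result *= m
    return result


def accumulate_max(a, b):
    """Accumulation by maximum: fuzzy OR of two degrees of fulfillment."""
    return max(a, b)


def accumulate_algebraic_sum(a, b):
    """Accumulation by algebraic sum: a + b - a*b."""
    return a + b - a * b


def center_average(centers, weights):
    """Center average defuzzifier: weighted mean of the term centers.

    centers[i] is taken as the center of the i-th conclusion term and
    weights[i] as its degree of fulfillment (an assumed interpretation).
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("no rule fired; the weights sum to zero")
    return sum(x * w for x, w in zip(centers, weights)) / total
```

With the degrees of fulfillment 0.7 and 0.3 used in the accumulation example, `accumulate_max` keeps 0.7, whereas `accumulate_algebraic_sum` gives 0.79, so the choice of operator changes how strongly agreeing rules reinforce one another.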
2.1.2.4. Defuzzifying. The results of the inference process of a knowledge-based system must often be translated from fuzzy logic (memberships of the terms of the linguistic variables) into crisp values, in other words into a concrete recommendation of the action to take. This process is called defuzzifying. Table 3 shows the most important defuzzifying operators. In this paper, fuzzy systems are used to estimate the demand function. The look-up table and decision tree methods, explained below, are used to extract the rules of the fuzzy systems.

2.1.3. Rule extraction methods

Human knowledge is categorized into conscious knowledge and unconscious knowledge. Conscious knowledge can be stated explicitly in words. For unconscious knowledge, a method is needed to extract the related rules. Two such methods are explained in the following.

2.1.3.1. Look-up table method. For unconscious knowledge, a look-up table can be used to extract the knowledge. Suppose that the unconscious knowledge is in the form of input–output pairs. At best, one rule can be extracted from each pair. The numbers of inputs and outputs of the fuzzy system equal the numbers of inputs and outputs in the existing data. To extract the rule from the ith pair, a fuzzy partition with a number of linguistic terms is defined for each input and output variable. For the ith input–output pair, the membership of each value in every term of its variable is evaluated, and the highest membership value shows which linguistic term the value belongs to.

2.1.3.2. Decision tree method. Decision trees have a tree-like structure. A tree consists of leaves and branching points, the so-called nodes. Each leaf of the tree corresponds to a class and each node corresponds to a feature (attribute). The nodes of the tree are in turn connected by edges. Two edges leading to successive leaves or nodes go out from each branching point. An edge is always assigned to a value of the feature belonging to the node.
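Returning briefly to the look-up table method of Section 2.1.3.1, the sketch below shows one way a rule could be extracted from each input–output pair: every value is assigned to the term in which it has the highest membership. The three triangular terms on [0, 1], the function names and the conflict handling (keeping the first rule seen for a premise) are assumptions for illustration and are not taken from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function; assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical linguistic terms shared by all variables (data scaled to [0, 1]).
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}


def best_term(value):
    """Name of the term in which the value has the highest membership."""
    return max(TERMS, key=lambda name: tri(value, *TERMS[name]))


def extract_rule(inputs, output):
    """One IF-THEN rule (premise terms, conclusion term) from one pair."""
    return tuple(best_term(v) for v in inputs), best_term(output)


def build_rule_base(pairs):
    """Rule base from input-output pairs.

    Simplifying assumption: when two pairs yield the same premise,
    the first extracted rule is kept.
    """
    rules = {}
    for inputs, output in pairs:
        premise, conclusion = extract_rule(inputs, output)
        rules.setdefault(premise, conclusion)
    return rules
```

For instance, the pair with inputs (0.1, 0.9) and output 0.55 yields the rule "IF x1 is low AND x2 is high THEN y is medium" under these assumed terms.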
The "entry" into the decision tree begins at a node known as the root. Each leaf of the tree corresponds to a statement of the following type: "The object examined belongs to class c", where c is the class assigned to the leaf. At each node (branching point) of the tree, an object is tested with regard to a feature: "Test object for feature A", where A is the feature assigned to the node. There can be several leaves for one class and several nodes for one feature [58].

Two basic types of decision tree are distinguished:

- Classification tree
- Regression tree

The classification tree is mainly used for problems with categorical target values and objects with categorical features that are to be classified. The regression tree is used for problems with a continuous target value, and its purpose is the processing of objects with numerical features.

Two basic concepts of rule induction are:

- AQ training
- Divide and conquer

AQ training is a method for rule induction based on a simple psychological model of training. Production rules (paths from the root to a leaf) are generated in the form of IF-THEN rules. The IF part contains the description of an object. With the help of the THEN part, an object is assigned to a class. Divide and conquer takes a different route from AQ training: it proceeds from "the general to the special". For this reason, the method is also termed "top-down induction of decision trees" (TDIDT).

In the induction of decision trees, the following basic steps are carried out:

1. Statistical analysis of data records.
2. Evaluation via a measure (the attribute/feature with the best evaluation result becomes the test attribute).
3. Division of the data records according to the different values of the test attribute; then recursion.

In the first step, a statistical analysis of the features (attributes) of the objects (data records) is made. In the second step, the features examined are evaluated using a so-called measure (for example information gain or the Gini index). The feature with the best evaluation is selected as the test feature for the decision node. The measure is an important parameter of the induction method, and various measures are available. In the third step, the data records are divided into two groups according to the different values of the test feature. The method is then applied recursively to the subdivided groups of data records. The recursion terminates if one of the following three criteria applies:

1. All objects (cases) of a subset belong to the same class.
2. No feature (attribute) causes the classification to improve.
3. No more features are available.

In a decision tree, a path through the tree from the root to a leaf is known as a production rule. A production rule can be given in the form of an IF-THEN rule. The decisions in the IF part correspond to the decisions in the nodes of the tree, and the assignment of the object to a class is represented by the THEN part (corresponding to the assignment in a leaf).

2.2. Conventional time series models

Time series models are well known for predicting the future behavior of a variable from its past behavior. One of the best-known time series models is the autoregressive moving average (ARMA) model. The ARMA model belongs to a family of flexible linear time series models that can be used to model many different types of seasonal as well as non-seasonal time series. In its most popular multiplicative form, the ARMA model can be expressed as

$$\Phi_p(L)\, y_t = \theta_q(L)\, \varepsilon_t \tag{6}$$

with

$$\Phi_p(L) = 1 - \Phi_1 L - \cdots - \Phi_p L^p, \qquad \theta_q(L) = 1 - \theta_1 L - \cdots - \theta_q L^q$$
[Fig. 4. The raw data and preprocessed data by the normalization methods. Panels: (a) the trend of raw data (MW versus months); (b) the preprocessed data by Min–Max normalization; (c) the preprocessed data by Z-score normalization; (d) the preprocessed data by sigmoidal normalization. Horizontal axes: months.]
In Eq. (6), $s$ is the seasonal length, $L$ is the backshift operator defined by $L^k y_t = y_{t-k}$, and $\varepsilon_t$ is a white-noise sequence with zero mean and constant variance. Eq. (6) is often referred to as the ARMA(p, q) model. Box and Jenkins (1970) proposed a set of effective model-building strategies for the identification, estimation, diagnostic checking and forecasting of ARMA models [59]. In the identification stage, the sample autocorrelation function (ACF), which shows the correlation between values of the series at different lags, is examined. A slowly decaying autocorrelation function suggests non-stationary behavior. In such circumstances Box and Jenkins recommend differencing the data. A common practice is to use a logarithmic transformation if the variance does not appear to be constant. After preprocessing, if needed, the ACF and PACF (footnote 3) of the preprocessed data are examined to determine all plausible ARMA models.

Some nonlinear time series models were also developed, mainly by Granger and Priestley [60] (footnote 4). One of these nonlinear models is the bilinear model, whose first-order form is given by

$$X_t = a X_{t-1} + b Z_t + c Z_{t-1} X_{t-1} \tag{7}$$

in which $Z_t$ is a stochastic process and $a$, $b$ and $c$ are the model parameters. Note that only the last term of Eq. (7) is nonlinear. Another type of nonlinear model is the threshold autoregressive (TAR) model, in which the parameters depend on the past values of the process. One example of such a model is

$$X_t = \begin{cases} a_1^{(1)} X_{t-1} + Z_{t-1}, & X_{t-1} < d \\ a_2^{(2)} X_{t-1} + Z_{t-1}, & X_{t-1} \ge d \end{cases} \tag{8}$$

Furthermore, the proposed algorithm fits the best linear or nonlinear model to the data set. This is quite important because most studies assume that linear time series models such as ARMA provide the best fit.
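The two nonlinear recursions can be made concrete with one-step update functions. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the TAR step takes its noise term as an explicit argument, so the caller decides which innovation enters the recursion.

```python
def bilinear_step(x_prev, z_now, z_prev, a, b, c):
    """One step of the first-order bilinear model, Eq. (7)."""
    return a * x_prev + b * z_now + c * z_prev * x_prev


def tar_step(x_prev, noise, a_low, a_high, d):
    """One step of a two-regime threshold AR model in the spirit of Eq. (8).

    a_low applies when x_prev < d, a_high otherwise (names assumed here).
    """
    coeff = a_low if x_prev < d else a_high
    return coeff * x_prev + noise


def simulate_tar(x0, noises, a_low, a_high, d):
    """Path of the TAR recursion started at x0 and driven by the given noises."""
    path = [x0]
    for z in noises:
        path.append(tar_step(path[-1], z, a_low, a_high, d))
    return path
```

With zero noise, `simulate_tar(4.0, [0.0, 0.0, 0.0], 0.5, -0.5, 2.0)` switches from the upper to the lower regime after the first step, which shows how the coefficient depends on the previous value of the process.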
2.3. Data preprocessing

In time series methods, a covariance stationary (footnote 5) process is one of the basic assumptions. Preprocessed data are also more useful in most heuristic methods [30], which requires the stationarity assumption to be investigated for the models. If the data are not covariance stationary, the most suitable preprocessing method should be identified and applied. In forecasting models, a preprocessing method should be able to transform the preprocessed data back to their original scale (called postprocessing). Therefore, in time series forecasting, an appropriate preprocessing method should have two main properties: it should make the process stationary, and it must have the postprocessing capability. The most useful preprocessing methods are studied in the following sections.

2.3.1. Normalization

There are different normalization algorithms, including Min–Max normalization, Z-score normalization and sigmoid normalization. The Min–Max normalization scales the numbers in a data set to improve the accuracy of the subsequent numeric computations. Nayak et al. [40], Karunasinghe et al. [41], Tseng et al. [42], Oliveira et al. [43], Niska et al. [44] and Aznarte et al. [45] used this method to estimate time series functions using heuristic approaches. If $x_{old}$, $x_{max}$ and $x_{min}$ are the original value, the maximum and the minimum of the raw data, respectively, and $x'_{max}$ and $x'_{min}$ are the maximum and minimum of the normalized data, respectively, then the normalized value $x'_{new}$ of $x_{old}$ is obtained by the following transformation:

$$x'_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}}\,(x'_{max} - x'_{min}) + x'_{min} \tag{9}$$

In Z-score normalization the data are transformed so that their mean and variance are 0 and 1, respectively. The transformation is as follows, where std is the standard deviation of the raw data:

$$x_{new} = \frac{x_{old} - \mathrm{mean}}{\mathrm{std}} \tag{10}$$

Footnote 3: Partial autocorrelation function.
Footnote 4: This book is an updated version of the "classic" Box and Jenkins [59]. It includes new material on intervention analysis, outlier detection, testing for unit roots, and process control.
Footnote 5: By definition, an ARMA model is covariance stationary if it has a finite and time-invariant mean and covariance.
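Because postprocessing requires each transformation to be invertible, it can help to build the forward and inverse maps together. The sketch below does this for Eqs. (9) and (10). The function names are hypothetical, and the use of the population standard deviation is an assumption, since the paper does not say which estimator is meant.

```python
from statistics import mean, pstdev


def minmax_fit(data, new_min=0.0, new_max=1.0):
    """Return (forward, inverse) maps for Min-Max normalization, Eq. (9)."""
    lo, hi = min(data), max(data)
    scale = (new_max - new_min) / (hi - lo)

    def forward(x):
        return (x - lo) * scale + new_min

    def inverse(y):
        return (y - new_min) / scale + lo

    return forward, inverse


def zscore_fit(data):
    """Return (forward, inverse) maps for Z-score normalization, Eq. (10).

    Uses the population standard deviation (an assumption made here).
    """
    m, s = mean(data), pstdev(data)

    def forward(x):
        return (x - m) / s

    def inverse(y):
        return y * s + m

    return forward, inverse
```

Fitting the maps on the training data and reusing the same `inverse` on model outputs returns the estimates to the original scale.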
[Fig. 5. The raw data and preprocessed data by the difference methods. Panels: (a) the preprocessed data by the first difference method; (b) the preprocessed data by the first difference of logarithm method. Horizontal axes: paired comparison.]
The sigmoidal normalization uses a sigmoid function to scale the data into the range [−1, 1]. The transformation function is as follows:

$$x_{new} = \frac{1 - e^{-a}}{1 + e^{-a}}, \qquad a = \frac{x_{old} - \mathrm{mean}}{\mathrm{std}} \tag{11}$$

2.3.2. The first difference method

The first step in the Box–Jenkins method is to transform the data so as to make them stationary. The difference method was proposed by Box and Jenkins [59]. Tseng et al. also used this method to estimate time series functions using a heuristic approach [42]. The following transformation is applied:

$$y_t = x_t - x_{t-1} \tag{12}$$

For the first difference of the logarithm method, the transformation is adjusted as follows:

$$y_t = \log(x_t) - \log(x_{t-1}) \tag{13}$$

2.4. The McLeod–Li test

McLeod and Li (1983) proposed a test for detecting nonlinearity in time series data [61]. The McLeod–Li test seeks to determine whether there are significant autocorrelations in the squared residuals from a linear equation. To perform the test, the autocorrelation coefficients in the Ljung–Box statistic are replaced by the autocorrelation coefficients of the squared residuals. This statistic determines whether the squared residuals exhibit serial correlation. The Ljung–Box statistic is discussed in Appendix II.

2.5. Error estimation methods

There are four basic error estimation methods:

- Mean absolute error (MAE)
- Mean square error (MSE)
- Root mean square error (RMSE)
- Mean absolute percentage error (MAPE)

They can be calculated by the following equations, respectively, where $x_t$ is the actual value, $x'_t$ is the estimated value and $n$ is the number of observations:

$$\mathrm{MAE} = \frac{\sum_{t=1}^{n} |x_t - x'_t|}{n}, \qquad \mathrm{MSE} = \frac{\sum_{t=1}^{n} (x_t - x'_t)^2}{n}, \qquad \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (x_t - x'_t)^2}{n}}, \qquad \mathrm{MAPE} = \frac{\sum_{t=1}^{n} \left| \dfrac{x_t - x'_t}{x_t} \right|}{n} \tag{14}$$

All of these measures except MAPE are scale-dependent. MAPE is the most suitable method for estimating the relative error because the input data used for model estimation, the preprocessed data and the raw data have different scales [62–67].

[Fig. 6. Panels: (1) a trapezoidal membership function for a < x < b; (2) a triangular membership function for x > b, extending to max(x); (3) a triangular membership function for x < b, starting at min(x). Vertical axis: degree of membership.]

3. The case study

The proposed algorithm is applied to a set of 130 data points, which are the monthly consumption in Iran from April 1992 to February 2004. The proposed algorithm is applied to the data set as follows.

3.1. Step 1

It can be seen from Fig. 4a that the raw data have a trend. Because removing the trend is needed for more precise estimation in time series methods and also for studying the impact of preprocessing on fuzzy system models, all preprocessing methods are applied to the data set.
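As a minimal sketch of how preprocessing, postprocessing and error estimation fit together, the code below applies the first difference of the logarithm, inverts it and computes MAPE on the original scale. The function names and the sample values are hypothetical, and rebuilding the series from its first value is one possible postprocessing choice rather than the paper's stated procedure.

```python
import math


def log_difference(series):
    """First difference of the logarithm: y_t = log(x_t) - log(x_{t-1})."""
    return [
        math.log(series[t]) - math.log(series[t - 1])
        for t in range(1, len(series))
    ]


def undo_log_difference(first_value, diffs):
    """Postprocessing: rebuild the original-scale series from x_0 and the y_t."""
    out = [first_value]
    for y in diffs:
        out.append(out[-1] * math.exp(y))
    return out


def mape(actual, estimated):
    """Mean absolute percentage error as a fraction (multiply by 100 for %)."""
    return sum(abs((a - e) / a) for a, e in zip(actual, estimated)) / len(actual)


# Hypothetical consumption values growing by 10% per period.
raw = [100.0, 110.0, 121.0, 133.1]
diffs = log_difference(raw)
rebuilt = undo_log_difference(raw[0], diffs)
```

Because MAPE is a relative measure, it can be compared across models whose inputs were preprocessed differently, provided the estimates are first returned to the original scale as done here.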
2172
A. Azadeh et al. / Energy Conversion and Management 49 (2008) 2165–2177
1) IF
Y(-12)<-0. 042 AND Y(-1)<-0. 035
THEN Y=-0 .04
2) IF
Y(-12)<-0. 042 AND Y(-1)>-0. 035
THEN Y=-0 .07
3) IF
-0.042
THEN Y=-0 .02
4)IF
-0.042
5) IF
-0.042
6) IF
-0.042
0 .009 THEN Y=-0 .006
7) IF
-0.015
THEN Y=-0 .002
8)IF
-0.011
THEN Y=0 .04
9)IF
-0.005
THEN Y=0 .003
10)IF
-0.005
11) IF
-0.005
12) IF
-0.005
THEN Y=0 .003
13) IF
0.008
THEN Y=0 .008
14) IF
-0.011
15) IF
0.0003
16) IF
0. 027
AND Y(-9)< -0 .043 THEN Y=0 .04
17) IF
0. 027
AND 0. 027
18) IF
0. 027
AND -0. 043
19) IF
0. 027
AND 0. 01
20)IF 0. 027
AND -0. 043
21) IF 0. 027
AND -0. 043
Fig. 7. Produced rules with decision tree.
Fig. 8. Decision tree structure for preprocessed data. The internal nodes test thresholds on Y(-1), Y(-3), Y(-9) and Y(-12) (for example, Y(-12) < -0.042 followed by Y(-1) < -0.035), and the leaves hold the predicted values of Y.
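Each rule in Fig. 7 corresponds to one root-to-leaf path of the tree in Fig. 8. The sketch below shows this reading on a two-level fragment built from the thresholds of rules 1 and 2. The class names, the helper `extract_rules` and the use of `None` for the part of the tree not reproduced here are assumptions made for illustration; the paper does not describe its rule extraction code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    value: Optional[float]  # None marks a subtree that is not reproduced here


@dataclass
class Split:
    feature: str                   # e.g. "Y(-12)"
    threshold: float
    below: Union["Split", Leaf]    # branch taken when feature < threshold
    above: Union["Split", Leaf]    # branch taken otherwise


def extract_rules(node, conditions=()):
    """Walk the tree depth-first; return one (conditions, value) pair per leaf."""
    if isinstance(node, Leaf):
        return [] if node.value is None else [(list(conditions), node.value)]
    lower = extract_rules(
        node.below, conditions + (f"{node.feature} < {node.threshold}",))
    upper = extract_rules(
        node.above, conditions + (f"{node.feature} >= {node.threshold}",))
    return lower + upper


# Fragment of the tree: only the branch behind rules 1 and 2 is filled in.
tree = Split(
    "Y(-12)", -0.042,
    below=Split("Y(-1)", -0.035, below=Leaf(-0.04), above=Leaf(-0.07)),
    above=Leaf(None),
)

rules = extract_rules(tree)
for conds, value in rules:
    print("IF " + " AND ".join(conds) + f" THEN Y = {value}")
```

The sketch uses `>=` for the upper branch so that every input falls into exactly one rule; the figure prints `>` instead.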
FW (FLTW & FDMW) and ARMA model. Consequently, the best preprocessing method is selected to convert the process to a covariance stationary process. The results of applying the preprocessing methods to the given data set are discussed in the next section.
3.2. Step 2

3.2.1. Normalization
All three normalization methods are used to preprocess the data. However, as can be seen in Fig. 4b–d, which show the normalized consumption data, none of the normalization methods removes the trend of the data. Therefore, normalization is not suitable for preprocessing the data set.

3.2.2. The first difference method
The data preprocessed with the first difference method are shown in Fig. 5a. Although the first difference of the series seems to have a constant mean, its variance increases over time. Thus, the differenced series is not covariance stationary and cannot be used for data preprocessing as prescribed by the algorithm. It can be seen in Fig. 5b that the first difference of logarithm is the most likely candidate to yield a covariance stationary process. Moreover, it is the most applicable preprocessing method for the data set.

3.3. Step 3
The 130 rows of data are divided into a training set of 118 rows and a test set of 12 rows. Likewise, the 129 preprocessed data points are divided into a training set of 117 points and a test set of 12 points.

3.3.1. Fuzzy models
In this part, the fuzzy model parameters must be determined. Only the FDMW parameters are explained in the following sections.

3.3.1.1. Knowledge base. In this part, the linguistic variables, terms and rules must be determined. The linguistic variables are fixed once the input and output variables are determined. According to Fig. 6, $y_t$ is a function of consumption at the 1st, 3rd, 9th and 12th lags of the preprocessed data. So $y_t$, $y_{t-1}$, $y_{t-3}$, $y_{t-9}$ and $y_{t-12}$ are the linguistic variables. Three types of terms are used: x < b, a < x < b and b < x, where x is a linguistic variable. Triangular, trapezoidal and triangular membership functions, respectively, are used to define these terms. Fig. 6 shows these membership functions. Twenty-one rules are produced by applying the decision tree to the preprocessed data. Figs. 7 and 8 show these rules and the related tree.

3.3.1.2. Inference engine. After all plausible operators were tried, the operators listed in Table 4 gave the best performance for the FDMW model.

Table 4. Inference steps for FDMW
Steps: aggregation operator, implication operator, accumulation operator.
Operators: algebraic product, maximum, minimum.

Fig. 9. The comparison of fuzzy models output with test actual data: (a) FDMW, (b) FDMWO, (c) FLTW, (d) FLTWO. Each panel plots the actual data and the model's monthly electricity consumption estimation (electricity consumption in kWh) over the test months, February 2004 to January 2005.
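To make the inference and defuzzification steps concrete, the sketch below fires two rules and combines them with a center-average defuzzifier (the weighted mean of the rule output centers). It is an illustration under stated assumptions: the antecedent degrees are made up, the algebraic product is assumed to combine the antecedents of a rule, and the function names are hypothetical. Only the two THEN-values come from rules 1 and 2 of Fig. 7.

```python
def firing_strength(memberships):
    """Combine the antecedent membership degrees of one rule.

    Assumption: the algebraic product is used as the AND operator.
    """
    strength = 1.0
    for degree in memberships:
        strength *= degree
    return strength


def center_average(strengths, centers):
    """Center-average defuzzifier: strength-weighted mean of rule centers."""
    total = sum(strengths)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(w * c for w, c in zip(strengths, centers)) / total


# Hypothetical membership degrees of the two antecedents of each rule.
rule_memberships = [(0.8, 0.5), (0.2, 0.5)]
# THEN-values of rules 1 and 2 in Fig. 7.
rule_centers = [-0.04, -0.07]

strengths = [firing_strength(m) for m in rule_memberships]
estimate = center_average(strengths, rule_centers)
print(strengths, estimate)
```

With these numbers the firing strengths are 0.4 and 0.1, so the crisp output lies between the two rule values and closer to the first one.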
Fig. 10. The ACF and PACF chart for preprocessed data.

Table 6. The McLeod–Li test results (Ljung–Box Q-statistic)

Q(4)a          Q(8)         Q(12)
1.4 (0.846)    7.3 (0.5)    11.2 (0.51)

a Q(n) reports the Ljung–Box Q-statistic for the first n autocorrelation coefficients of the squared residuals of the estimated model. Significance levels are in parentheses.
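The statistic behind Table 6 can be sketched as follows: compute the sample autocorrelations of the squared residuals and plug them into the Ljung–Box form $Q = T(T+2)\sum_{k=1}^{s} r_k^2/(T-k)$, which is compared with a $\chi^2$ critical value with $s$ degrees of freedom. The code is a plain-Python illustration, not the authors' procedure; the function names and the synthetic residual series are assumptions.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a sequence."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box Q-statistic over the first max_lag autocorrelations."""
    n = len(series)
    r = autocorrelations(series, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


def mcleod_li_q(residuals, max_lag):
    """McLeod-Li statistic: Ljung-Box Q applied to the squared residuals."""
    return ljung_box_q([e ** 2 for e in residuals], max_lag)


# 5% critical values of the chi-square distribution for 4, 8 and 12 d.o.f.
CHI2_CRITICAL_5PCT = {4: 9.488, 8: 15.507, 12: 21.026}

# Synthetic residuals whose magnitude switches every 10 points, so the
# squared residuals are strongly autocorrelated (illustration only).
clustered = [(-1) ** t * (1.0 if (t // 10) % 2 == 0 else 0.1) for t in range(100)]
q4 = mcleod_li_q(clustered, 4)
print(q4 > CHI2_CRITICAL_5PCT[4])
```

For comparison, the values reported in Table 6 (1.4, 7.3 and 11.2) are all below these critical values, in line with the reported significance levels.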
3.3.1.3. Defuzzifying. The center average defuzzifier performs better on FDMW than the center of gravity defuzzifier. The comparison of fuzzy models output with test actual data is shown in Fig. 9.

3.3.2. Time series model
To find the preferred time series model, appropriate ARMA models must first be estimated, and the preferred ARMA model is selected with the aid of the ANOVA test. The McLeod–Li test is then applied to the result of this model, and its outcome shows whether a nonlinear time series model must be used. The following sections show that the ARMA model is sufficient for the case study. Moreover, previous research shows that a linear time series is the most suitable for our case study [68,69].

3.3.2.1. ARMA model. The preprocessed data are used to find the best time series model. Fig. 10 shows the ACF and PACF charts. The theoretical ACF of a pure MA(q) process cuts off to zero at lag q, and the theoretical ACF of an AR(1) model decays geometrically. Examination of Fig. 10 suggests that neither of these specifications seems appropriate for the electricity consumption. The ACF does not decay geometrically, and it is suggestive of an AR(2) process or a process with both autoregressive and moving average components. A seasonal factor at lag 12 is incorporated because the data are monthly. Therefore, five models are considered for the training data set: AR(1), AR(2), ARMA(1,1), ARMA(1,(1,12)) and AR(2,MA(12)). Table 5 shows the models' information, including Q-statistics, AIC, SBC and coefficient estimates. The Box–Pierce Q-statistic test shows that AR(1) and AR(2) should be eliminated. Furthermore, as measured by AIC and SBC, ARMA(1,1) and AR(2,MA(12)) do not fit the data as well as ARMA(1,(1,12)).

3.3.2.2. The McLeod–Li test. The residuals of ARMA(1,(1,12)) are used to run the McLeod–Li test. Examination of Table 6 shows that the nonlinearity condition is not satisfied (all reported significance levels are well above 0.05). So, ARMA(1,(1,12)) is called TM.

3.3.2.3. Nonlinear time series model. As mentioned, the ARMA model is sufficient for our data and there is no need to identify an appropriate nonlinear time series model. The comparison of the ARMA(1,(1,12)) output with test actual data is shown in Fig. 11.

3.4. Step 4
Since the data are preprocessed for the FLTW, FDMW and ARMA models, the estimates obtained by these models should be postprocessed. Let $\mathrm{FDMW}x_i$ ($i = 1, \ldots, 12$) be the FDMW output for the preprocessed test data. $\mathrm{FDMW}x_i$ is postprocessed by the following formula, which inverts the first difference of logarithms:

$$ 10^{\,\log(x_{i-1}) + \mathrm{FDMW}x_i} \tag{15} $$
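The preprocessing and postprocessing steps form a round trip, which the sketch below illustrates. It assumes base-10 logarithms (suggested by the power of 10 in the postprocessing formula) and assumes that the previous actual value $x_{i-1}$ is available when each estimate is converted back. The function names and the sample series are hypothetical.

```python
import math


def log_difference(series):
    """Preprocess: y_t = log10(x_t) - log10(x_{t-1}); output has one fewer point."""
    return [math.log10(curr) - math.log10(prev)
            for prev, curr in zip(series, series[1:])]


def postprocess(previous_actual, predicted_difference):
    """Invert the log difference: x_i = 10 ** (log10(x_{i-1}) + y_i)."""
    return 10 ** (math.log10(previous_actual) + predicted_difference)


# Illustrative consumption values, not taken from the case study.
consumption = [100.0, 110.0, 121.0, 115.0]
differences = log_difference(consumption)

# Feeding the exact differences back recovers the original series.
restored = [postprocess(consumption[i], differences[i])
            for i in range(len(differences))]
print(restored)
```

The preprocessed series is one point shorter than the raw series, which matches the 130 raw rows and 129 preprocessed rows used in the case study.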
Fig. 11. The comparison of ARMA output with test actual data (actual data and ARMA monthly electricity consumption estimation, in kWh, over the test months February 2004 to January 2005).

Table 5. The Q-statistics, AIC, SBC and coefficients estimation

         AR(1)           AR(2)          ARMA(1,1)       ARMA(1,(1,12))   AR(2,MA(12))
A0       0.009           0.008          0.009           0.003            0.008
A1       0.201           0.235          0.616           0.235            0.122
A2       n/a             0.174          n/a             n/a              0.076
B1       n/a             n/a            0.879           0.149            n/a
B2       n/a             n/a            n/a             0.757            0.885
Q(4)     15.6 (0.00)     8.94 (0.07)    4.66 (0.33)     1.3 (0.85)       2.02 (0.72)
Q(8)     25.005 (0.00)   16.04 (0.05)   11.642 (0.17)   7.2 (0.51)       8.5 (0.41)
Q(12)    31.35 (0.00)    21.4 (0.05)    18.13 (0.11)    11.5 (0.501)     12.1 (0.41)
AIC      623             632            639             648              645
SBC      620             625            631             640              642
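The two selection criteria reported in Table 5 follow the formulas given in Appendix I, AIC = T ln(SSR) + 2n and SBC = T ln(SSR) + n ln(T). The sketch below evaluates them for two hypothetical models; the function names, the sums of squared residuals and the parameter counts are assumptions used only to show how the comparison works.

```python
import math


def aic(ssr, n_params, n_obs):
    """Akaike information criterion: T ln(SSR) + 2n."""
    return n_obs * math.log(ssr) + 2 * n_params


def sbc(ssr, n_params, n_obs):
    """Schwartz Bayesian criterion: T ln(SSR) + n ln(T)."""
    return n_obs * math.log(ssr) + n_params * math.log(n_obs)


# Two hypothetical fits on the same 117 usable observations.
T = 117
model_a = {"ssr": 0.05, "n_params": 2}
model_b = {"ssr": 0.04, "n_params": 4}

aic_a, aic_b = aic(T=None or model_a["ssr"], n_params=model_a["n_params"], n_obs=T) if False else aic(model_a["ssr"], model_a["n_params"], T), aic(model_b["ssr"], model_b["n_params"], T)
sbc_a, sbc_b = sbc(model_a["ssr"], model_a["n_params"], T), sbc(model_b["ssr"], model_b["n_params"], T)

# The model with the smaller criterion value is preferred.
print("AIC prefers", "B" if aic_b < aic_a else "A")
print("SBC prefers", "B" if sbc_b < sbc_a else "A")
```

Both criteria are negative here because the sum of squared residuals is below one, and SBC penalizes each extra parameter more heavily than AIC whenever ln(T) exceeds 2.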
Table 9. The MAPE estimation of the proposed algorithm versus GA and ANN

         Genetic algorithm   Artificial neural network   The proposed algorithm
MAPE     0.014               0.0156                      0.02

Table 7. ANOVA table for comparison of ARMA, fuzzy models and actual data

Groups        Count   Sum         Average
Actual        12      1.63E+08    13,623,656
FDMW          12      1.64E+08    13,643,155
FDMWO         12      1.56E+08    13,013,172
FLTW          12      1.53E+08    12,775,555
FLTWO         12      1.6E+08     13,350,555
Time series   12      1.65E+08    13,765,363

Source of variation          Sum square   Degree of freedom   Mean square   F      p-Value
Between groups (treatment)   9.3113E+12   5                   1.8623E+12    0.59   0.707
Blocks (months)              1.0970E+14   11                  9.9728E+12    3.17   0.002
Within groups                1.7323E+14   55                  3.1496E+12
Total                        2.9224E+14   71
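The F ratios in the ANOVA table (Table 7) can be checked directly from its sums of squares and degrees of freedom: each mean square is a sum of squares divided by its degrees of freedom, and each F ratio divides a mean square by the within-groups mean square. The helper name `f_ratios` is an assumption; the numbers are those printed in Table 7.

```python
def f_ratios(ss_treatment, df_treatment, ss_block, df_block, ss_error, df_error):
    """Return (F_treatment, F_block) for a randomized block design."""
    ms_treatment = ss_treatment / df_treatment
    ms_block = ss_block / df_block
    ms_error = ss_error / df_error
    return ms_treatment / ms_error, ms_block / ms_error


f_treatment, f_block = f_ratios(
    ss_treatment=9.3113e12, df_treatment=5,
    ss_block=1.0970e14, df_block=11,
    ss_error=1.7323e14, df_error=55,
)
print(round(f_treatment, 2), round(f_block, 2))  # 0.59 3.17
```

The treatment F ratio of 0.59 is small, which is what the reported p-value of 0.707 reflects, while the block effect (months) is clearly larger.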
The FLTW and ARMA outputs are postprocessed in the same way as described above.

3.5. Step 5
The fuzzy models (FLTW, FLTWO, FDMW and FDMWO) and the ARMA model are compared by an ANOVA test. The experiment was designed so that variability arising from extraneous sources can be systematically controlled. Time is the common source of variability in the experiment that can be systematically controlled through blocking [70]. Therefore, a one-way blocked design of ANOVA was applied. The results are shown in Table 7. The test of hypothesis is defined as

$$
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6 \\
H_1 &: \mu_i \neq \mu_j, \quad i, j = 1, 2, \ldots, 6;\ i \neq j
\end{aligned}
\tag{16}
$$

where $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, $\mu_5$ and $\mu_6$ are the average estimations obtained from the fuzzy models, the ARMA model and the actual data. It can be seen from Table 7 that the null hypothesis is not rejected (p-value of 0.707 for the treatments), so MAPE is used to select the preferred model. It can be seen from Table 8 that FDMW has the lowest MAPE, which shows the efficiency of FDMW among the models.

Table 8. The relative error (MAPE) of the models

Models   ARMA    FLTW   FDMW   FLTWO   FDMWO
MAPE     0.103   0.2    0.02   0.24    0.13

4. Comparison with other intelligent methods
With the aid of the FDMW model, electricity consumption for the next 12 months is forecasted (Fig. 12). Table 9 shows the MAPE estimation of the genetic algorithm (GA) and the artificial neural network (ANN) versus the proposed algorithm [65,71]. Examination of this table shows that the proposed algorithm provides good estimation relative to GA and ANN: its MAPE (0.02) is of the same order as those of GA (0.014) and ANN (0.0156).

Fig. 12. Monthly electrical energy consumption prediction (February 2005–January 2006).

5. Conclusion
An intelligent integrated algorithm was proposed to model non-stationary time series data. The algorithm is composed of conventional time series, fuzzy systems, decision tree, ANOVA, ACF, the McLeod–Li test, and preprocessing and postprocessing methods. Processes that are not covariance stationary are converted to covariance stationary processes by an appropriate data preprocessing method. The ACF is used for input selection in the proposed algorithm. Fuzzy systems with and without preprocessed data and conventional time series (either linear or nonlinear) are run to select the preferred time series approach.

To construct fuzzy systems, a rule base is needed. Because a rule base is not available for the case of the demand function, the look up table, which is one of the rule extraction methods, is used to extract the rule base. The decision tree method, which is a data mining approach, is similarly used to extract the rule base. The preferred time series model is selected from linear or nonlinear models. For this, after the preferred ARMA model is selected, the McLeod–Li test is applied to check the nonlinearity condition. When the nonlinearity condition is satisfied, the preferred nonlinear model is selected and defined as the preferred time series model. Finally, the preferred model among the fuzzy models and the conventional time series model is selected by ANOVA.

As mentioned, the impact of data preprocessing and postprocessing on the performance of the fuzzy systems is considered by the proposed algorithm. In addition, another unique feature of the proposed algorithm is the use of the autocorrelation function (ACF) to define the input variables, whereas conventional methods use trial and error. Monthly electricity consumption of Iran from 1995 to 2005 was considered as the case of this study. MAPE results showed the negative impact of preprocessing data on fuzzy system. Furthermore, it was shown that FDMW has better performance than conventional time series. The algorithm also showed that data postprocessing is needed after data preprocessing. The MAPE estimation of the genetic algorithm (GA) and the artificial neural network (ANN) versus the proposed algorithm shows the appropriateness of the proposed algorithm.

In summary, previous studies do not consider the following features of the proposed algorithm:
(1) The impact of preprocessing data on fuzzy systems.
(2) The use of ACF for input selection, versus the trial and error process in previous studies.
(3) The use of the McLeod–Li test to consider both linearity and nonlinearity conditions of time series modeling.
(4) The use of ANOVA to compare time series models.
(5) The use of a decision tree to construct the rule base mechanism for fuzzy system estimation.
Appendix I. Akaike information criterion (AIC) and Schwartz Bayesian criterion (SBC)

The two most commonly used model selection criteria are the AIC and the SBC. These criteria are used to select the most appropriate model. They have the following formulas:

$$ \mathrm{AIC} = T \ln(\text{sum of squared residuals}) + 2n $$
$$ \mathrm{SBC} = T \ln(\text{sum of squared residuals}) + n \ln(T) $$

where n is the number of parameters estimated (p + q + possible constant term) and T is the number of usable observations. Ideally, the AIC and SBC will be as small as possible (note that both can be negative). As the fit of the model improves, the AIC and SBC will approach $-\infty$. Model A is said to fit better than model B if the AIC (or SBC) for A is smaller than for B.

Appendix II. Box–Pierce Q-statistic

The Q-statistic can be used to test whether a group of autocorrelations is significantly different from zero. Box and Pierce used the sample autocorrelations to form the Q-statistic. Let there be T observations labeled $y_1$ through $y_T$. We can let $\bar{y}$ and $r_k$ be estimates of $\mu$ and $\rho_k$, respectively, where:

$$ \bar{y} = (1/T) \sum_{t=1}^{T} y_t $$
$$ r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2} $$

The Q-statistic has the following formula:

$$ Q = T \sum_{k=1}^{s} r_k^2 $$

Under the null hypothesis that all values of $r_k = 0$, Q is asymptotically $\chi^2$ distributed with s degrees of freedom. The intuition behind the use of the statistic is that high sample autocorrelations lead to large values of Q. Certainly, a white noise process (in which all autocorrelations should be zero) would have a Q value of zero. If the calculated value of Q exceeds the appropriate value in a $\chi^2$ table, the null hypothesis of no significant autocorrelations can be rejected.

Appendix III. Data mining

Over the past several years, the field of data mining has seen an explosion of interest from both academia and industry [48]. Data mining is an interdisciplinary field and draws heavily on both statistics and machine learning. Thearling (1999) proposed that data mining is ‘the extraction of hidden predictive information from large databases’, a cutting-edge technology with great potential to help companies dig out the most important trends in their huge databases [49]. Emerging data mining tools can answer business questions that have traditionally been too time-consuming to solve. Lejeune (2001) stated that data mining techniques allow the transformation of raw data into business knowledge [50]. The SAS Institute (2000) defines data mining as ‘the process of selecting, exploring and modeling large amount of data to uncover previously unknown data patterns for business advantage’ [51]. Consequently, we would say that data mining is the application of data analysis and discovery algorithms to detect patterns in data for prediction and description. With sufficient database size and quality, data mining technology can provide business intelligence to generate new opportunities [52–57]. The most commonly used data mining techniques include clustering, associations, data analysis, rule induction, genetic algorithm, decision tree, and neural network.
References [1] Hosoz M, Ertunc HM, Bulgurcu H. Performance prediction of a cooling tower using artificial neural network. Energy Convers Manage 2007;48:1349–59. [2] Atashkari K, Nariman-Zadeh N, Gölcü M, Khalkhali A, Jamali A. Modelling and multi-objective optimization of a variable valve-timing spark-ignition engine using polynomial neural networks and evolutionary algorithms. Energy Convers Manage 2007;48:1029–41. [3] Tien Pao H. Forecasting electricity market pricing using artificial neural networks. Energy Convers Manage 2007;48:907–12. [4] O’ztopal A. Artificial neural network approach to spatial estimation of wind velocity data. Energy Convers Manage 2006:395–406. [5] Gölcü M. Artificial neural network based modeling of performance characteristics of deep well pumps with splitter blade. Energy Convers Manage 2006:3333–43. [6] Gareta R, Romeo LM, Gil A. Forecasting of electricity prices with neural networks. Energy Convers Manage 2006;47:1770–8. [7] Pai P. Hybrid ellipsoidal fuzzy systems in forecasting regional electricity loads. Energy Convers Manage 2006;47:2283–9. [8] Mandal Paras, Senjyu Tomonobu, Funabashi Toshihisa. Neural networks approach to forecast several hour ahead electricity prices and loads in deregulated market. Energy Convers Manage 2006;47:2128–42. [9] Yalcinoz T, Eminoglu U. Short term and medium term power distribution load forecasting by neural networks. Energy Convers Manage 2005;46:1393–405. [10] Pai P, Hong W. Support vector machines with simulated annealing algorithms in electricity load forecasting. Energy Convers Manage 2005;46:2669–88. [11] Ben-Nakhi AE, Mahmoud MA. Cooling load prediction for buildings using general regression neural networks. Energy Convers Manage 2004:2127–41. [12] Sozen A, Arcaklio lu E, Ozalp M. Estimation of solar potential in Turkey by artificial neural networks using meteorological and geographical data. Energy Convers Manage 2004;45:3033–52. [13] Beccali M, Cellura M, Lo Brano V, Marvuglia A. 
Forecasting daily urban electric load profiles using artificial neural networks. Energy Convers Manage 2004;45:2879–900. [14] Hsu C, Chen C. Regional load forecasting in Taiwan––applications of artificial neural networks. Energy Convers Manage 2003;44:1941–9. [15] Reddy KS, Ranjan M. Solar resource estimation using artificial neural networks and comparison with other correlation models. Energy Convers Manage 2003;44:2519–30. [16] Yao SJ, Song YH, Zhang LZ, Cheng XY. Wavelet transform and neural networks for short-term electrical load forecasting. Energy Convers Manage 2000;41:1975–88. [17] Azadeh A, Ghaderi SF, Gitiforooz A. Estimating electricity demand function in residential sector by fuzzy regression. In: Proceeding of IEEE international symposium industrial electronics (ISIE06); 2006. [18] Iqbal A, He N, Li L, Ullah Dar N. A fuzzy expert system for optimizing parameters and predicting performance measures in hard-milling process. Expert Syst Appl 2007;32(4):1020–7. [19] Konar A. Artificial intelligence and soft computing. CRC Press/LLC; 2000. [20] Zadeh LA. Fuzzy sets. Inform Control 1965;8:338–53. [21] Marks R. Fuzzy logic technology and applications. In: IEEE technology series. New York: IEEE; 1994. [22] Tzafestas SG, Venetsanopoulos AN. Fuzzy logic in information. Decision and control systems. Norwell, MA: Kluwer; 1994. [23] Perrot N. Fuzzy concept applied to food product quality control: a review. Fuzzy Set Syst 2006;157(9):1143–4. [24] Zeng XJ, Singh MG. Approximation theory of fuzzy systems—SISO case. IEEE Trans Fuzzy Syst 1994;2:162–76. [25] Zeng XJ, Singh MG. Approximation theory of fuzzy systems—MIMO case. IEEE Trans Fuzzy Syst 1995;3:219–35. [26] Approximation accuracy of fuzzy systems as function approximators. IEEE Trans Fuzzy Syst 1996;4:44–63. [27] Klir GJ, Yuan B. Fuzzy sets and fuzzy logic: theory and applications. Englewood Cliffs, NJ: Prentice-Hall; 1995. [28] Kosko. 
Neural networks and fuzzy systems: a dynamical approach to machine intelligence. Englewood Cliffs, NJ: Prentice-Hall; 1992. [29] Aznarte JL, Sanchez JMB, Lugilde DN, Fernandez CDL, Guardia CD, Sanchez FA. Forecasting airborne pollen concentration time series with neural and neurofuzzy models. Expert Syst Appl 2007;2(4):1218–25. [30] Zhang GP, Qi M. Neural network forecasting for seasonal and trend time series. Eur J Oper Res 2005;160:501–14. [31] Zhang G, Hu MY. Neural network forecasting of the British pound/US dollar exchange rate. Omega Int J Manage Sci 1998;26:495–506. [32] Hwarng HB. Insights into neural-network forecasting of time series corresponding to ARMA(p;q) structures. Omega 2001;29:273–89. [33] Palmer A, Montano J, Sese A. Designing an artificial neural network for forecasting tourism time series. Tour Manage 2006;27:781–90. [34] Kim TY, Oh KJ, Kim C, Do JD. Artificial neural networks for non-stationary time series. Neurocomputing 2004;61:439–47. [35] Al-Saba T, El-Amin I. Artificial neural networks as applied to long-term demand forecasting. Artif Intel Eng 1999;13:189–97.
[36] Zhang GP. An investigation of neural networks for linear time-series forecasting. Comput Oper Res 2001;28:1183–202. [37] Ghiassi M, Saidane H, Zimbra DK. A dynamic artificial neural network model for forecasting time series events. Int J Forecast 2005;21:341–62. [38] Dundar MF. Fuzzy logic and its industrial engineering applications. MS thesis, ITU Institute of Science and Technology, Istanbul, Turkey; 1996. [39] Zhang HC, Huang SH. A fuzzy approach to process plan selection. Int J Prod Res 1994;32:1265–79. [40] Nayak PC, Sudheer KP, Rangan DM, Ramasastri KS. A neuro-fuzzy computing technique for modeling hydrological time series. J Hydrol 2004;291:52–66. [41] Karunasinghea DSK, Liong SY. Chaotic time series prediction with a global model: artificial neural network. J Hydrol 2006;323:92–105. [42] Tseng FM, Yu HC, Tzeng GH. Combining neural network model with seasonal time series ARIMA model. Technol Forecast Social Change 2002;69:71–87. [43] Oliveira ALI, Meira SRL. Detecting novelties in time series through neural networks forecasting with robust confidence intervals. Neurocomputing 2006;70(1–3):79–92. [44] Niskaa H, Hiltunen T, Karppinen A, Ruuskanen J, Kolehmainena M. Evolving the neural network model for forecasting air pollution time series. Eng Appl Artif Intel 2004;17:159–67. [45] Aznarte JL, Sanchez JMB, Lugilde DN, Fernandez CDL, Guardia CD, Sanchez FA. Forecasting airborne pollen concentration time series with neural and neurofuzzy models. Expert Syst Appl 2007;32(4):1218–25. [46] Jain A, Kumar AM. Hybrid neural network models for hydrologic time series forecasting. Appl Soft Comput 2007;7(2):585–92. [47] Ross I TJ. Fuzzy logic with engineering applications. New York: McGraw-Hill; 1995. [48] Michalski SR. Machine learning and data mining. John Wiley & Sons; 1998. [49] Thearling K. An introduction of data mining. Direct Market Mag 1999. [50] Lejeune M. 
Measuring the impact of data mining on churn management. Internet Res: Electron Network Appl Policy 2001;11(5):375–87. [51] SAS Institute. Best price in churn prediction. SAS Institute White Paper; 2000. [52] Lau HCW, Wong CWY, Hui I, Pun KF. Design and implementation of an integrated knowledge system. Knowledge Based Syst 2003;16(2):69–76. [53] Su CT, Hsu HH, Tsai CH. Knowledge mining from trained neural networks. J Comput Inform Syst 2002;42(4):61–70. [54] Langley P, Simon HA. Applications of machine learning and rule induction. Commun ACM 1995;38(11):55–64. [55] Bortiz JE, Kennedy DB. Effectiveness of neural network types for prediction of business failure. Expert Syst Appl 1995;9(4):503–12.
[56] Fletcher D, Goss E. Forecasting with neural networks: an application using bankruptcy data. Inform Manage 1993;3:159–67. [57] Salchenberger LM, Cinar EM, Lash NA. Neural networks: a new tool for predicting thrift failures. Decis Sci 1992;23(4):899–916. [58] Tam KY, Kiang MY. Managerial applications of neural networks: the case of bank failure predictions. Manage Sci 1992;38(7):926–47. [59] Box GEP, Jenkins GM. Time series analysis: forecasting and control. San Francisco; 1970. [60] Box GEP, Jenkins GM, Reinsel GC. Time series analysis: forecasting and control. Englewood Cliffs, NJ: Prentice-Hall; 1994. [61] McLeod AI, Li WK. Diagnostic checking ARMA time series models using squared-residual autocorrelations. J Time Ser Anal 1983;4:269–73. [62] Azadeh A, Ghaderi SF, Anvari M, Saberi M. Measuring performance electric power generations using artificial neural networks and fuzzy clustering. In: Capolino GA, Franquelo LG, editors. Proceedings of the 32nd annual conference of the IEEE industrial electronics society, IECON; 2006. [63] Azadeh A, Ghaderi SF, Anvari M, Saberi M, Izadbakhsh H. An integrated artificial neural network and fuzzy clustering algorithm for performance assessment of decision making units. Appl Math Comput 2007;187(2):584–99. [64] Azadeh A, Ghaderi SF, Anvari M, Saberi M. Performance assessment of electric power generations using an adaptive neural network algorithm. Energy Policy 2007;35:3155–66. [65] Azadeh A, Ghaderi SF, Sohrabkhani S. Forecasting electrical consumption by integration of neural network, time series and ANOVA. Appl Math Comput 2007;186:1753–61. [66] Azadeh A, Ghaderi SF, Tarverdian S, Saberi M. Integration of artificial neural networks and GA to predict electrical energy consumption. In: Capolino GA, Franquelo LG, editors. Proceedings of the 32nd annual conference of the IEEE industrial electronics society – IECON’06; 2006. [67] Azadeh A, Ghaderi SF, Tarverdian S, Saberi M. 
Integration of artificial neural networks and genetic algorithm to predict electrical energy consumption. Appl Math Comput 2007;186:1731–41. [68] Zamani M. Estimate electricity demand function for the economic sectors. MS thesis, Faculty of Economics, University of Tehran, Iran; 1998. [69] Sadeghi M. Demand stability for energy in Iran. PhD, Faculty of Economics. Iran: VCH; 1999. [70] Montgomery DC. Design & analysis of experiments. New York: John Wiley & Sons; 1991. [71] Azadeh A, Tarverdian S. Integration of genetic algorithm, computer simulation and design of experiment for forecasting electrical consumption. Energy Policy 2007;35:5229–41.