Evolutionary learning of fuzzy grey cognitive maps for the forecasting of multivariate, interval-valued time series ✩

Wojciech Froelich (a,*), Jose L. Salmeron (b,**)

(a) Institute of Computer Science, University of Silesia, ul. Bedzinska 39, Sosnowiec, Poland
(b) Computational Intelligence Lab, University Pablo de Olavide, 1st km. Utrera road, 41013 Seville, Spain
Article info

Article history: Received 1 July 2013; Received in revised form 13 February 2014; Accepted 17 February 2014

Keywords: Multivariate interval-valued time series; Forecasting; Fuzzy grey cognitive maps; Evolutionary learning
Abstract

Time series are built as a result of real-valued observations ordered in time; however, in some cases, the values of the observed variables change significantly, and those changes do not produce useful information. Therefore, within defined periods of time, only the bounds within which the variables change are considered. The temporal sequence of vectors with interval-valued elements is called a 'multivariate interval-valued time series.' In this paper, the problem of forecasting such data is addressed. It is proposed to use fuzzy grey cognitive maps (FGCMs) as a nonlinear predictive model. Using interval arithmetic, an evolutionary algorithm for learning FGCMs is developed, and it is shown how the new algorithm can be applied to learn FGCMs on the basis of historical time series data. Experiments with real meteorological data provided evidence that, for properly-adjusted learning and prediction horizons, the proposed approach can be used effectively for the forecasting of multivariate, interval-valued time series. The domain-specific interpretability of the obtained FGCM-based model is also confirmed.
1. Introduction

Forecasting multivariate time series is an important problem that is considered in many application domains, including meteorology, economics, and medicine. Forecasts are usually produced using a certain predictive model that generalizes the dynamics of the observed process; however, in several application areas, the values of time series are uncertain, unstable, or change significantly over time. Therefore, the variations of the data are treated as noise that contains no useful information and that should not be reflected by the model. The solution is to use an interval-valued representation of the data: instead of accurate values, every observation is represented in the form of the interval in which the actual value falls.

Interval-valued time series (ITS) are usually used when data are gathered in a nearly continuous way, but there is only a need to track the range of their changes. Examples of interval-valued data include the minimum and maximum temperatures on a given day or the minimum and maximum stock prices during a trading day. The importance of the prediction of ITS is based on the fact that interval forecasts contain information on the range of the variation of the observed variables [1,2]. ITS are able to represent phenomena that traditional, single-valued time series cannot describe accurately, e.g., when variability in interval form is considered.
✩ This document is a collaborative effort.
* Principal corresponding author.
** Corresponding author.
E-mail addresses: [email protected] (W. Froelich), [email protected] (J.L. Salmeron).
http://dx.doi.org/10.1016/j.ijar.2014.02.006
Table 1
Different approaches to the prediction of ITS.

Type           Lower and upper bounds separately   Lower and upper bounds combined   Interval arithmetic
Univariate     [10,11]                             [10-12,9]                         [11,1]
Multivariate   –                                   –                                 this paper
The advantage of an interval forecast is that it provides an interval that contains a future observation instead of providing a single value for the observation. The predicted interval thus conveys the uncertainty associated with the forecast.

Interval-valued numbers are the subject of so-called 'interval analysis' [3]. Note that, in comparison to fuzzy sets theory [4], interval analysis does not deal with membership functions, nor does it consider the probabilistic distribution of data within intervals. Thus, no background knowledge about the observed data is required. Grey systems theory (GST) proposes a specific approach to analyzing interval-valued data [5]. GST uses interval-valued numbers and the same arithmetic as interval analysis. One of the problems addressed by GST is the prediction of time series; however, the approach proposed by GST [5] is, in fact, still a specific, single-valued time series prediction. The first step of the GST-based approach is to transform the original univariate sequence of real numbers into an accumulated sequence using the so-called 'accumulating generation operator' (AGO). The resulting sequence is called a 'grey time series' and is not interval-valued. For that reason, the research presented in this paper is not related to the problem of grey time series prediction as it is known within GST.

Regarding the relationship between ITS and traditional prediction methods, it is necessary to distinguish three general approaches for forecasting ITS [6]. The first approach breaks the prediction of ITS down into standard time series forecasting, i.e., forecasting the lower and upper bounds of the intervals, respectively. In such a case, traditional forecasting methods are used to separately forecast the bounds of the intervals. However, this approach does not simultaneously use all of the information contained in the interval sample. A special case is the use of multivariate methods, e.g., vector autoregression (VAR) or the multi-layer perceptron (MLP), to forecast the lower and upper bounds of a single, dependent variable [7,8]. However, such a case is still a prediction of univariate ITS. The second approach combines both bounds of the intervals to predict the resulting ITS, i.e., the prediction uses the dependency between the bounds, but it does not use interval arithmetic. Selected traditional forecasting methods can be adapted to ITS following this approach [9]. The third approach applies interval arithmetic to adapt known forecasting methods in order to address ITS. The method presented in this paper represents the third approach. However, in contrast to the already-known univariate methods, it is targeted at the prediction of multivariate ITS. The relationship of our method to the existing approaches is clarified in Table 1.

To date, several predictive models for univariate ITS forecasting have been proposed. A new method for fitting a linear regression model to ITS has been used for the prediction of medical time series [12]. A regression model has also been presented for use in forecasting ITS in the financial sector [13]. The forecasting of electric power demand was addressed in [7], which included a comparison of autoregressive and multi-layer perceptron models using interval arithmetic. A resampling approach for interval-valued data regression was proposed in [14]. A review and comparison of existing ITS forecasting techniques was presented in [15,8].
A clustering technique (i.e., k-nearest neighbor) was reported to be the best method for the prediction of the Euro-Dollar exchange rate. Recently, the autoregressive integrated moving average (ARIMA) model was recommended as the best model for the prediction of the univariate Dow Jones Industrial Average (DJIA) index. In the meteorological domain, an interval analysis approach was proposed for the prediction of weather variables under uncertainty [16]. An exponential smoothing method for univariate (temperature) ITS was proposed in [11]. In addition, the forecasting of minimum and maximum daily average temperatures using autoregressive and artificial neural network (ANN) models was presented in [15]. Recently, the ARIMA model was fitted to daily maximum and minimum temperature anomalies from the mean seasonal cycle using data from a number of weather stations in Australia and New Zealand; it has been reported that the ARIMA model is suitable for describing the variability of temperature time series [17].

The comparison of different approaches to time series prediction is a challenging task because the results they produce depend on the assumed error measure and the type of data used [9]. The consensus in the extant literature is that no single method is superior to the others in every situation. For univariate ITS, it was shown that the ARIMA model outperforms the other methods (i.e., the naive, exponential smoothing, and multi-layer perceptron models) [11]. The experiments based on multi-valued linear models for univariate ITS (with multiple explanatory variables and a single dependent variable) confirmed that ITS prediction models based on interval arithmetic outperform single-valued approaches [1]. It is claimed that the separation of the lower and upper bounds in information processing deprives the user of much of the variability information that is contained in ITS [1].

The limitations of the existing solutions to the ITS prediction problem are such that, in the cases in which real data were used, they were always univariate ITS. To the best of our knowledge, the problem of predicting real-world, multivariate ITS has never been addressed. Moreover, due to the diverse accuracy measures used by different authors to calculate prediction errors, even for univariate ITS, the presented results are hardly comparable with each other from the perspective of their effectiveness.
Another problem with the majority of the presented results is that, due to the use of data that are not publicly available, the results cannot be reproduced.

In this paper, the aforementioned gaps are addressed by the adaptation of Fuzzy Cognitive Maps (FCMs) [18] to the forecasting of multivariate ITS. An FCM is a nonlinear predictive model that represents real-world problems in terms of their concepts and causal relationships. The concepts are represented as nodes, and the causal relationships are represented as directed edges. Each edge is assigned a weight that reflects the strength of the causal relationship between the nodes. This representation yields a graph of nodes and arrows in which the concepts are considered as the variables of the system being modeled. The reasoning performed by an FCM is a nonlinear, additive process, which is described in detail in Section 2.1.

One advantage of FCMs over black-box models (e.g., artificial neural networks) is that FCMs can be interpreted easily by experts. The selection of the best number of nodes in a neural network involves experimentation, whereas all of the nodes within an FCM are determined by the considered prediction problem. Unlike the nodes of neural networks, each FCM node has a specific meaning. Another advantage of FCMs is that they are easily scalable with respect to the number of variables. FCMs are learned using scale-independent data, which makes their predictions easily interpretable and comparable. FCMs also have several other desirable properties, including abstraction, flexibility, and adaptability. Application examples can be found in many areas, e.g., decision-making methods, geographical information systems, agriculture, management, and software engineering [19-21]. An FCM model can be used effectively to forecast single-valued, multivariate time series [22-25]. FCMs have also been used to forecast meteorological time series [26]. A review of FCM research over the last decade is given in [27].

Extensive research has been done on diverse methods of learning FCMs. Generally, there are two different approaches to FCM learning, i.e., the adaptive approach, which incrementally modifies the weights of the FCM, and the population-based approach, which generates so-called 'candidate FCMs' and applies a fitness function to assess each of them. The aim of learning FCMs using adaptive (Hebbian-based) methodologies is to produce an FCM adjacency matrix that leads the FCM to converge toward a given decision state or convergence region. The Hebbian-based learning methods involve the BDA [28], AHL [29], and NHL [19] algorithms. For the purpose of predicting single-valued, multivariate time series, population-based algorithms are the most effective for the learning of FCMs [30]. The population-based methods involve several algorithms, i.e., the real-coded genetic algorithm (RCGA) [22]; the PSO-based algorithm, which applies particle swarm optimization [31]; the simulated annealing optimization-based algorithm [32]; and the differential evolution-based algorithm [26]. Surveys of the learning methods used for FCMs can be found in [33,34]. Standard, single-valued FCMs have three main disadvantages:
• They are unable to deal with uncertain source data, e.g., data that are given in the form of intervals.
• The dependencies among the concepts of an FCM are always given as real-valued weights assigned to pairs of concepts; however, in practical applications, the dependencies among real factors are not crisp and are expected to be given in an approximate form.
• Standard FCMs are unable to estimate the extent to which the deviations of input data influence the uncertainty of the prediction results. Such variations in predicted values are expected by experts due to the uncertainty involved in real processes.

On the basis of GST, the standard FCM model has been extended to deal with interval-valued data, and the extended FGCMs were proposed in [35]. FGCMs have been used for modeling different problem domains [36,37]. A single attempt to learn FGCMs was made, and it involved the application of an adaptive approach, i.e., a nonlinear Hebbian-based algorithm. The goal of that learning approach was to determine the success of radiation therapy. The studies presented by Papageorgiou and Salmeron [38] were unrelated to the prediction of ITS. Thus, the prediction of multivariate ITS by FGCMs has not yet been addressed. Also, there have been no attempts to apply evolutionary algorithms to learn FGCMs.

Recently, as an exemplification of the granular approach to building data models, granular cognitive maps were proposed to deal with the uncertainties in the dependencies among concepts [39]. However, no research has been attempted to date on learning granular cognitive maps.

The limitations of the existing ITS prediction methods and the shortcomings of traditional FCMs are addressed in this paper. The contributions of this paper are as follows.
• The application of FGCMs to the prediction of multivariate, interval-valued time series.
• An interval-based evolutionary learning algorithm that can cope with the problem of learning FGCMs.
• Experimental results that provide evidence for the usefulness and satisfactory effectiveness of the proposed method.

The remainder of this paper is organized as follows. Section 2 provides an overview of the theoretical background of FCMs, grey systems theory, FGCMs, and the estimation of prediction errors. The theoretical contributions of this paper are described in Section 3. In Section 4, the experimental results are discussed. Our conclusions and recommendations for future work are presented in Section 5.
2. Theoretical background

Fuzzy Cognitive Maps are assumed to be a basic version of FGCMs; therefore, an introduction to the theory of standard FCMs is presented first.

2.1. Fuzzy cognitive maps

Let V = {v_1, v_2, ..., v_n} be a set of real-valued variables and let t ∈ {1, 2, ..., t_e} be a discrete time scale, where t_e ∈ ℕ is a parameter. The values of V(t) are observed over time. A multivariate time series is denoted as a sequence {V(t)} = {V(1), V(2), ..., V(t_e)}. Let C = {c_1, c_2, ..., c_n} be a set of fuzzy sets c_i, where n = card(C) denotes the cardinality of C. Every v_i(t) is mapped by the fuzzification function μ_i to the corresponding fuzzy set c_i. The value c_i(t) = μ_i(v_i(t)) denotes the degree to which v_i(t) belongs to the fuzzy set c_i at time t. In most practical cases, the construction of μ_i(v_i(t)) is substantially simplified, and it is assumed to be the normalization to the [0, 1] interval given by Eq. (1):

c_i(t) = (v_i(t) − min(v_i)) / (max(v_i) − min(v_i))   (1)

The parameters min(v_i) and max(v_i) are the minimum and maximum values reachable by v_i(t), and they must be provided by experts or calculated on the basis of available data. In fact, using the interpretation of fuzzy sets theory, Eq. (1) calculates the degree to which the value v_i(t) belongs to the fuzzy set represented by the linguistic term 'v_i(t) is high'. Defuzzification is done using the inverse function μ_i^(−1)(c_i(t)) given by Eq. (2):

v_i(t) = c_i(t) · (max(v_i) − min(v_i)) + min(v_i)   (2)
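As an illustration of Eqs. (1)-(2), the following minimal Python sketch implements the normalization and its inverse; the function names fuzzify/defuzzify are ours, not part of any FCM library.

```python
# A minimal sketch of Eqs. (1)-(2); `fuzzify`/`defuzzify` are illustrative names.

def fuzzify(v: float, v_min: float, v_max: float) -> float:
    """Eq. (1): degree to which v belongs to the fuzzy set 'v is high'."""
    return (v - v_min) / (v_max - v_min)

def defuzzify(c: float, v_min: float, v_max: float) -> float:
    """Eq. (2): map a concept value from [0, 1] back to the raw scale."""
    return c * (v_max - v_min) + v_min

# round trip: a temperature of 17.5 within [-10, 40] maps to 0.55 and back
assert abs(defuzzify(fuzzify(17.5, -10.0, 40.0), -10.0, 40.0) - 17.5) < 1e-9
```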
The constructed vector C(t) describes the state of the concepts at time t. Thus, the multivariate time series is represented by the sequence of the concepts' state vectors {C(t)} = {C(1), C(2), ..., C(t_e)}. Due to the assumed normalization, in most known practical applications, prediction errors are calculated just for {C(t)} and not for the original {V(t)} [22,23]. Thus, the errors obtained by different FCMs can be compared easily.

An FCM is defined as an ordered pair FCM = ⟨C, W⟩, where C is a set of concepts and W is the connection matrix [18]. The definition proposed by Kosko [18] assumed that every concept c_i = q_i ∪ ¬q_i, where q_i is a fuzzy set and ¬q_i is its complement. However, in practical applications, it is usually assumed that concepts are equivalent to fuzzy sets, i.e., c_i = q_i [33]. The matrix W stores the weights w_ij ∈ [−1, 1] that are assigned to the pairs of concepts. The value w_ij = 1 represents the full positive impact and the value w_ij = −1 the full negative impact of the ith causal concept on the jth effect concept. Intermediate values of the weights relate to partial causality, and the value w_ij = 0 represents the absence of an effect. The FCM model is used to predict the subsequent, unknown state vector C(t). The prediction is conducted numerically, most often by means of the nonlinear reasoning equation (3):

c_j(t+1) = f( Σ_{i=1, i≠j}^{n} w_ij · c_i(t) )   (3)
where n = card(C) is the cardinality of the set C and f(·) is the transformation function that introduces nonlinearity into the reasoning process. Diverse types of transformation functions can be used [22], e.g., bivalent, trivalent, or, most commonly, the logistic function given in Eq. (4):

f(x) = 1 / (1 + e^(−λ·x))   (4)
where λ > 0 is the parameter that determines the gain and thus the shape of the transformation, e.g., λ = 5 [22]. Depending on the value of λ, the transformation function amplifies or inhibits the weighted sum of the concepts' states. It also restricts the accumulated impact of the concepts to the interval [0, 1]. Recently, it was proposed that an evolutionary method be used to learn the values of λ_j separately for every target concept c_j [23].

To accomplish the prediction of c_j(t + h) for some prediction horizon h > 1, Eq. (3) must be used iteratively. Iterative calling of the reasoning equation leads to one of the alternative types of behavior of the state vector C(t) [34]: 1) a fixed-point attractor, in which the state vector becomes fixed after some simulation steps; 2) a limit cycle, in which the state vector keeps cycling; or 3) a chaotic attractor, in which the state vector changes in a chaotic way.

An FCM can be learned from historical time series data. Learning is usually understood as the adjustment of the matrix W with the objective of minimizing prediction errors. The most effective learning algorithm, RCGA [22], creates a population of chromosomes (genotypes), each of which is a vector of weights of a candidate FCM. The populations of genotypes are evaluated iteratively by decoding them to FCMs and then calculating the fitness function (5):
fitness = 1 / (ε + 1)   (5)

where ε is the prediction error calculated by Eq. (6) [22]:

ε = (1 / ((t_e − 1) · n)) · Σ_{t=2}^{t_e} Σ_{i=1}^{n} |c_i(t) − c_i′(t)|   (6)

where t ∈ {1, 2, ..., t_e}, t_e − 1 is the number of predictions, n is the number of concepts, c_i(t) is the real value of the concept, and c_i′(t) is the predicted value.
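The reasoning and learning objective of Eqs. (3)-(6) can be summarized in a short sketch. The following Python code is illustrative only (the NumPy-based encoding and all names are our assumptions); it computes one reasoning step and the RCGA fitness of a candidate weight matrix.

```python
import numpy as np

# Sketch of FCM reasoning (Eqs. (3)-(4)) and the RCGA fitness (Eqs. (5)-(6)).
# W[i, j] holds the weight of concept i on concept j; `series` has shape (t_e, n).

def fcm_step(c: np.ndarray, W: np.ndarray, lam: float = 5.0) -> np.ndarray:
    """One reasoning step: c_j(t+1) = f(sum over i != j of w_ij * c_i(t))."""
    x = c @ W - np.diag(W) * c               # drop the self-influence term i == j
    return 1.0 / (1.0 + np.exp(-lam * x))    # logistic transformation, Eq. (4)

def fitness(series: np.ndarray, W: np.ndarray, lam: float = 5.0) -> float:
    """Eqs. (5)-(6): one-step-ahead error averaged over t = 2..t_e and all concepts."""
    preds = np.array([fcm_step(c, W, lam) for c in series[:-1]])
    eps = np.mean(np.abs(series[1:] - preds))
    return 1.0 / (eps + 1.0)
```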
2.2. The arithmetic of grey numbers

A grey number is denoted as ⊗g ∈ [g̲, ḡ], where g̲ ≤ ḡ [40]. A white number is ⊗g ∈ [g̲, ḡ] with g̲ = ḡ, and a black number is ⊗g ∈ (−∞, +∞). The length of a grey number is defined as ℓ(⊗g) = |ḡ − g̲|. In that sense, if the length of a grey number is zero, ℓ(⊗g) = 0, it is a white number. The arithmetic of grey numbers is, in fact, equivalent to interval arithmetic. For two grey numbers ⊗g_a and ⊗g_b, the following operations are defined:
⊗g_a + ⊗g_b = [g̲_a + g̲_b, ḡ_a + ḡ_b]   (7)

⊗g_a − ⊗g_b = ⊗g_a + (−⊗g_b) = [g̲_a − ḡ_b, ḡ_a − g̲_b]   (8)

⊗g_a · ⊗g_b = [min(g̲_a·g̲_b, g̲_a·ḡ_b, ḡ_a·g̲_b, ḡ_a·ḡ_b), max(g̲_a·g̲_b, g̲_a·ḡ_b, ḡ_a·g̲_b, ḡ_a·ḡ_b)]   (9)

r · ⊗g = [r · g̲, r · ḡ]   (10)

where r denotes a real number (for r < 0 the resulting bounds are swapped).

⊗g_a / ⊗g_b = ⊗g_a · [ḡ_b^(−1), g̲_b^(−1)] = [min(g̲_a·g̲_b^(−1), g̲_a·ḡ_b^(−1), ḡ_a·g̲_b^(−1), ḡ_a·ḡ_b^(−1)), max(g̲_a·g̲_b^(−1), g̲_a·ḡ_b^(−1), ḡ_a·g̲_b^(−1), ḡ_a·ḡ_b^(−1))], 0 ∉ [g̲_b, ḡ_b]   (11)

The transformation of grey numbers into white numbers is called 'whitenization' [40]. The white value is calculated using Eq. (12):

ġ = δ · g̲ + (1 − δ) · ḡ, δ ∈ [0, 1]   (12)

For δ = 0.5, the equal mean whitenization is obtained, which calculates, in fact, the center of the considered interval.
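Because the grey operations above are plain interval arithmetic, they can be stated compactly in code. The sketch below is illustrative; the Grey type alias and the function names are our assumptions, not an established API.

```python
# Sketch of grey-number (interval) arithmetic, Eqs. (7)-(12).
# A grey number is a pair (lo, hi) with lo <= hi.
from typing import Tuple

Grey = Tuple[float, float]

def g_add(a: Grey, b: Grey) -> Grey:                      # Eq. (7)
    return (a[0] + b[0], a[1] + b[1])

def g_sub(a: Grey, b: Grey) -> Grey:                      # Eq. (8)
    return (a[0] - b[1], a[1] - b[0])

def g_mul(a: Grey, b: Grey) -> Grey:                      # Eq. (9)
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))

def g_div(a: Grey, b: Grey) -> Grey:                      # Eq. (11); requires 0 not in b
    return g_mul(a, (1.0 / b[1], 1.0 / b[0]))

def whiten(g: Grey, delta: float = 0.5) -> float:         # Eq. (12)
    return delta * g[0] + (1.0 - delta) * g[1]

assert g_sub((-1, 3), (-1, 3)) == (-4, 4)   # the widening example discussed in Section 2.4
```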
2.3. Fuzzy grey cognitive maps

Fuzzy Grey Cognitive Maps were proposed by Salmeron [35] as an extension and generalization of conventional FCMs. FGCMs represent approximate knowledge of the concepts' states and the causal relationships among them. Formally, an FGCM is defined as a 4-tuple:
FGCM = ⟨⊗C, ⊗W, f(·), η⟩   (13)

where ⊗C is the set of grey concepts, ⊗W is the matrix of grey weights, f(·) is the transformation function, and η is the concepts' information space (range). For the purposes of this research, it was assumed that η ∈ [0, 1]. The graph of an FGCM is shown in Fig. 4 in Section 4. At every time step, the state of the ith grey concept is a grey number denoted as ⊗c_i(t). ⊗W is the quadratic matrix that stores the grey weights ⊗w_ij ∈ [w̲_ij, w̄_ij], where w̲_ij ≤ w̄_ij and w̲_ij, w̄_ij ∈ [−1, +1]. Reasoning is made using Eq. (14):
⊗c_j(t+1) = f( Σ_{i=1, i≠j}^{n} ⊗w_ij · ⊗c_i(t) ) = f([c̲_j(t+1), c̄_j(t+1)]) = [f(c̲_j(t+1)), f(c̄_j(t+1))] = [c̲_j(t+1), c̄_j(t+1)]   (14)
It is assumed that each concept cannot affect itself, i.e., the diagonal of the matrix ⊗W is not used. If f(·) is a sigmoid function, the inference assumes the following form (Eq. (15)):

⊗c_j(t+1) = [ 1 / (1 + e^(−λ · Σ_{i=1}^{n} min(⊗w_ij · ⊗c_i(t)))), 1 / (1 + e^(−λ · Σ_{i=1}^{n} max(⊗w_ij · ⊗c_i(t)))) ]   (15)

where the bounds of the grey products are calculated as:

min(⊗w_ij · ⊗c_i(t)) = min(w̲_ij · c̲_i(t), w̲_ij · c̄_i(t), w̄_ij · c̲_i(t), w̄_ij · c̄_i(t))   (16)

max(⊗w_ij · ⊗c_i(t)) = max(w̲_ij · c̲_i(t), w̲_ij · c̄_i(t), w̄_ij · c̲_i(t), w̄_ij · c̄_i(t))   (17)
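A minimal sketch of the grey reasoning step of Eqs. (15)-(17) follows; the array names and the double-loop formulation are our illustrative choices, not the authors' implementation.

```python
import numpy as np

# Sketch of grey reasoning, Eqs. (15)-(17): the impact bounds of every edge are
# the min/max of the four bound products, then the sigmoid is applied per bound.
# W_lo, W_hi hold the bounds of the grey weights; c_lo, c_hi the concept bounds.

def fgcm_step(c_lo, c_hi, W_lo, W_hi, lam=5.0):
    n = len(c_lo)
    lo, hi = np.zeros(n), np.zeros(n)
    for j in range(n):
        for i in range(n):
            if i == j:                       # the diagonal of W is not used
                continue
            p = (W_lo[i, j] * c_lo[i], W_lo[i, j] * c_hi[i],
                 W_hi[i, j] * c_lo[i], W_hi[i, j] * c_hi[i])
            lo[j] += min(p)                  # Eq. (16)
            hi[j] += max(p)                  # Eq. (17)
    sig = lambda x: 1.0 / (1.0 + np.exp(-lam * x))
    return sig(lo), sig(hi)                  # sigmoid is monotone, so bounds stay ordered
```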
2.4. Accuracy measures for the calculation of prediction errors

Let ⊗g and ⊗g′ be the real and predicted grey numbers, respectively. It was shown that the difference ⊗g − ⊗g′, calculated using the grey-number (interval) arithmetic of Eq. (8), cannot properly reflect prediction errors [8]. Following the example from [8], if we assume that ⊗g = [−1, 3] and ⊗g′ = ⊗g, then the difference is ⊗g − ⊗g′ = [−4, 4]. The use of interval arithmetic leads to a resulting interval that bounds all possible results obtained by considering all of the possible real numbers that lie within each of the operands. For that reason, several alternative distance measures have been proposed [10,41]. The main ones are described below using the previously introduced notation of grey numbers.

The Mean Distance Error based on the Ichino-Yaguchi distance is defined as shown in Eq. (18):

MDE_IY = (1 / (t_e − 1)) · Σ_{t=2}^{t_e} 0.5 · (|g̲(t) − g̲′(t)| + |ḡ(t) − ḡ′(t)|)   (18)

where t_e − 1 is the number of predictions.

Suppose now that a grey number ⊗g is given, alternatively, by its center g_c = (g̲ + ḡ)/2 and its radius g_r = (ḡ − g̲)/2. The Mean Distance Error based on the Hausdorff distance is defined as shown in Eq. (19):

MDE_H = (1 / (t_e − 1)) · Σ_{t=2}^{t_e} (|g_c(t) − g_c′(t)| + |g_r(t) − g_r′(t)|)   (19)

The interpretation of MDE_IY and MDE_H is clear because they account for the deviation in minima and maxima and in midpoints and radii, respectively. MDE_H is probably the most popular accuracy measure, and it is commonly used when dealing with intervals [42]. The kernel-based distance, which was proposed in [8], is defined as shown in Eq. (20):

MDE_K = (1 / (t_e − 1)) · Σ_{t=2}^{t_e} (1/√2) · √((g̲(t) − g̲′(t))² + (ḡ(t) − ḡ′(t))²)   (20)
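For reference, the three measures can be sketched as follows; the array layout (rows = time steps, columns = [lower, upper] bounds) is an assumption made for illustration.

```python
import numpy as np

# Sketch of the three interval accuracy measures, Eqs. (18)-(20).
# `real` and `pred` are arrays of shape (T, 2) holding [lower, upper] bounds
# of the actual and predicted grey numbers.

def mde_iy(real: np.ndarray, pred: np.ndarray) -> float:      # Ichino-Yaguchi, Eq. (18)
    return np.mean(0.5 * (np.abs(real[:, 0] - pred[:, 0]) +
                          np.abs(real[:, 1] - pred[:, 1])))

def mde_h(real: np.ndarray, pred: np.ndarray) -> float:       # Hausdorff, Eq. (19)
    c, cp = real.mean(axis=1), pred.mean(axis=1)                  # interval centers
    r, rp = np.diff(real).ravel() / 2, np.diff(pred).ravel() / 2  # interval radii
    return np.mean(np.abs(c - cp) + np.abs(r - rp))

def mde_k(real: np.ndarray, pred: np.ndarray) -> float:       # kernel-based, Eq. (20)
    d = real - pred
    return np.mean(np.sqrt(d[:, 0] ** 2 + d[:, 1] ** 2) / np.sqrt(2.0))
```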
There is an issue with all of the aforementioned measures, i.e., their dependency on the assumed scale. Consequently, if the source data are not normalized (as is usually done for experiments using FCMs), the results obtained for different datasets are hardly comparable [43]. There is a possibility of using the scale-independent Mean Absolute Scaled Error (MASE) [44] or the Root Mean Squared Scaled Error (RMSSE) [7]. However, the errors produced by both the MASE and the RMSSE measures are scaled with respect to the so-called 'naive prediction,' which assumes v(t + 1) = v(t). This property may not always be desired. Moreover, the data processed by an FCM are already normalized; therefore, the use of MASE or RMSSE during the learning phase does not provide a benefit. In the testing phase, the relationship between the errors generated by the FGCM and the naive prediction can be shown explicitly.

The comparison of ITS prediction errors, especially those generated by any standard, single-valued method, is usually made following two general approaches: 1) independent of the prediction method (single- or multi-valued), after the prediction is made, errors are calculated using specific ITS-based measures; 2) errors are calculated using the upper and lower bounds separately [45,7]. It is also worth mentioning that traditional single-valued time series represent a particular case of ITS in which the intervals are degenerated, i.e., g̲(t) = ḡ(t). In such a case, the values of MDE_H and MDE_IY are equivalent to the Mean Absolute Error of the traditional time series [10].

The selection of an ITS error measure depends upon the concept of accuracy as determined by the application domain or the requirements stated by experts [10]. In the extant literature, very different measures have been used for the benchmarking of ITS prediction methods, i.e., MDE_H [10,11], RMSE [45], and RMSSE [7]. A detailed discussion of the pros and cons of the different error measures used for interval-valued time series was presented in [10,44,8]. For the purpose of this research, the aforementioned measures, i.e., MDE_IY, MDE_H, and MDE_K, were used.

3. Evolutionary learning of FGCM

To obtain efficient predictions, an evolutionary algorithm is proposed for learning FGCMs on the basis of historical ITS.
3.1. Source data

In accordance with the previous notation, V is the set of single-valued variables that are observed over time. Assume that the observations were made continuously or with a high temporal resolution. The vector V(τ) is observed at τ ∈ ℝ, τ ∈ [0, τ_e], where τ_e ∈ ℝ is the time horizon, a parameter that is dependent on the availability of data. The sequence of the observations constitutes a multivariate, single-valued time series {V(τ)}.

To introduce a discrete time scale, the interval [0, τ_e] is partitioned into t_e ∈ ℕ (t_e ≤ τ_e) equal subintervals [0, 1], (1, 2], ..., (t_e − 1, t_e], indexed by t ∈ ℕ, where the signs '(' and ']' denote open and closed bounds of the intervals, respectively. Thus, a new discrete time scale is introduced, with the time variable t ∈ [1, t_e] and the parameter t_e as the time horizon.

Assume that the observations v(τ) of every v ∈ V change significantly with regard to τ and that only their minimum v_min and maximum v_max within a period of time τ ∈ (t − 1, t] are considered. The time scale of t, the resolution of which is given by experts, typically is much coarser than the frequency of observations recorded in the original time scale of τ. Let min(v(τ)), τ ∈ (t − 1, t], be the lower bound of the grey number ⊗g(t) observed at time t, i.e., g̲(t) = min(v(τ)). Similarly, ḡ(t) = max(v(τ)), τ ∈ (t − 1, t]. Let ⊗G be the set of all considered grey variables, a counterpart of the set V that contains the single-valued variables. The value of every grey variable ⊗g_i ∈ ⊗G at time t is ⊗g_i(t) = [g̲_i(t), ḡ_i(t)]. Thus, the univariate interval-valued time series {⊗g_i(t)} is the sequence of grey numbers ⊗g_i(t), i ∈ [1, card(⊗G)], observed over time t = {1, 2, ..., t_e}. Considering all grey variables from ⊗G, the vector of grey numbers ⊗G(t) is observed at any t, and the multivariate ITS assumes the form {⊗G(t)} = {⊗G(1), ⊗G(2), ..., ⊗G(t_e)}.

To be processed by an FGCM, the ITS = {⊗G(t)} should be fuzzified. Fuzzification is conducted separately for every grey concept, i.e., ⊗c_i(t) = μ_i(⊗g_i(t)). The calculation is made for the lower bound, c̲_i(t) = μ_i(g̲_i(t)), and for the upper bound, c̄_i(t) = μ_i(ḡ_i(t)), of the concept. Fuzzification (normalization, Eq. (1)) with respect to the interval [g_min, g_max] is conducted, i.e.:

c̲_i(t) = (g̲_i(t) − g_min) / (g_max − g_min),   c̄_i(t) = (ḡ_i(t) − g_min) / (g_max − g_min),

where g_min = min(g̲_i(t)) and g_max = max(ḡ_i(t)) over the entire time horizon t ∈ [1, t_e]. The constants g_min and g_max are the parameters of the fuzzification function.

It is worth mentioning that fuzzification overcomes the problem with the scale-dependent error measures described in Section 2.4. The prediction errors are calculated for the fuzzified values, and thus, they are independent of the scale. To summarize, two preprocessing steps of the raw data {V(τ)} must be performed:

1. Construction of the ITS = {⊗G(t)},
2. Fuzzification of the ITS, i.e., ⊗C(t) = μ(⊗G(t)).

All further processing is done by the FGCM using the multivariate ITS {⊗C(t)} = {⊗C(1), ⊗C(2), ..., ⊗C(t_e)}, which is in fact a sequence of exemplary concepts' states of the FGCM.
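A minimal sketch of these two preprocessing steps, under the assumption that the raw observations of one variable are available as a (days × observations-per-day) array:

```python
import numpy as np

def build_its(raw: np.ndarray) -> np.ndarray:
    """Step 1: one grey value [min, max] per time step; raw is (days, obs_per_day)."""
    return np.stack([raw.min(axis=1), raw.max(axis=1)], axis=1)

def fuzzify_its(its: np.ndarray, g_min=None, g_max=None) -> np.ndarray:
    """Step 2: normalize both bounds with respect to [g_min, g_max], cf. Eq. (1)."""
    g_min = its[:, 0].min() if g_min is None else g_min   # min over the lower bounds
    g_max = its[:, 1].max() if g_max is None else g_max   # max over the upper bounds
    return (its - g_min) / (g_max - g_min)

# e.g., hourly readings of one variable over a week: raw.shape == (7, 24)
```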
constants g , g max are the parameters of the fuzzyfication function. min It is worth mentioning that fuzzyfication overcomes the problem with scale-dependent error measures described in Section 2.4. The prediction errors are calculated for the fuzzyfied values, and thus, they are independent of the scale. To summarize, two preprocessing steps of raw data {⊗ V (t )} must be performed: 1. Construction of ITS = {⊗G (t )}, 2. Fuzzyfication of ITS, i.e., ⊗C (t ) = μ(⊗G (t )). All further processing is done by FGCM using multivariate ITS, i.e., {⊗C (t )} = {⊗C (1), ⊗C (2), . . . , ⊗C (t e )} which is in fact the sequence of the exemplary concepts’ states of FGCM. 3.2. Evolutionary algorithm An evolutionary approach was selected for FGCMs because it was the most efficient for standard FCMs. The goal of the evolutionary learning process is to optimize FGCMs with respect to the prediction errors that are produced. First, the matrix ⊗ W must be learned. Second, separate transformation functions were assigned to every target concept c j , and the corresponding parameters λ j were involved in the evolutionary optimization. The proposed algorithm relies on the template of a genetic algorithm (Algorithm 1). The novel approach that we introduced was its adaptation to interval-valued arithmetic required by the source data and by the grey type of reasoning performed by the FGCM. Algorithm 1. Evolutionary Learning of FGCM. Input: The learning sequence: {⊗C (1), ⊗C (2), . . . , ⊗C (t e )} Output: Optimized matrix: ⊗ W , optimized gains: λi Initialize randomly the first population H i , i = 1 of chromosomes ; While(stopping-criterion is not satisfied) { Evaluation( H i ) ; H i +1 ← Selection( H i ) ; Mutation( H i +1 ) ; Crossover( H i +1 ) ; i←i+1 ; } return hbest ∈ H i – the chromosome with the highest fitness value ;
As the template shows, it is necessary to define several key elements of the algorithm. The first element is the population of chromosomes H_i, in which every chromosome represents a single possible solution of the problem, i.e., it encodes a so-called 'candidate FGCM,' which is later evaluated with respect to the prediction errors that are generated.
Chromosome. Every chromosome h_i is assumed to consist of the following two parts:

1) Grey part. The first part includes the vector of grey numbers coming from the matrix ⊗W of the candidate FGCM. Subsequent rows of the matrix ⊗W are placed linearly, one after the other, into the vector of the chromosome. The elements on the diagonal of the matrix ⊗W are omitted because they do not take part in the reasoning performed by Eq. (14). The length of the grey part of the chromosome is n² − n, where n = card(⊗C).

2) Single-valued part. This is the second part of the genotype, and it includes the vector of single-valued λ_j, i.e., the gains of the transformation functions assigned to the concepts. The length of this part of the chromosome is n = card(⊗C).

The length of the entire chromosome, i.e., the number of genes it has, is n².

Evaluation of chromosomes. Proper evaluation of every chromosome is the key to obtaining an efficient FGCM. For this purpose, the fitness function (5) is used in combination with an interval-based error measure assigned to ε. The calculation of the prediction error is performed as an average over multiple concepts (Eq. (21)):
ε = (1/n) · Σ_{i=1}^{n} ERR(⊗c_i(t), ⊗c_i′(t))   (21)
where n = card(⊗C), ⊗c_i(t) is the state of the grey concept taken from the learning sequence, ⊗c_i′(t) is the state of the grey concept predicted by the FGCM, and ERR stands for an interval-based distance (Ichino-Yaguchi, Hausdorff, or kernel-based). As stated in Section 2.4, the selection of an accuracy measure depends on the application domain and the requirements of experts. During the experiments, we tested all three of the interval-based error measures identified in Section 2.4.

Selection. During the selection process, a new population of chromosomes was produced. For the purpose of this paper, elite selection [46,47] was used. A group (the so-called 'elite') of the best chromosomes was selected and copied to the next population. The cardinality of the elite was a parameter of the experiments. Afterwards, the newly-created population was supplemented using the operators of mutation and crossover.

Mutation and crossover. To supplement the population, the offspring of the elite chromosomes are produced using standard probabilistic mutation and one-point crossover. The probabilities of mutation and crossover are parameters. Mutation and crossover are performed for both the first, grey part and the second, single-valued part of the chromosome. In the case of mutation, a randomly (with uniform distribution) selected gene of the chromosome is replaced by a new grey or single-valued number. For grey numbers, to retain the order of their bounds, first the lower bound is generated as h̲_j = rand(−1, 1), and then the upper bound is obtained as h̄_j = rand(h̲_j, 1). Here, the function rand(a, b) denotes a generator of real numbers with uniform distribution in the interval [a, b].

Stopping criterion. The stopping criterion of the algorithm is determined through the identification of the convergence of the evolution. The algorithm stops when the fitness value of the best individual does not increase during the previous i_prev iterations. In case the process does not converge, a second condition related to the number of iterations is checked, i.e., the algorithm stops after reaching the maximum number of iterations i_max. The parameters i_prev and i_max were assigned during the experiments.

The results of the algorithm are the matrix ⊗W and the vector of gains of the transformation functions λ_i, both encoded within the chromosome that had the highest fitness value. The best chromosome is decoded into an FGCM, which is then ready to be tested and used as a predictive model.
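For illustration, the bound-preserving mutation and the elite selection described above can be sketched as follows; the chromosome representation (a dict with 'grey' and 'gains' parts) and the sampling range for the gains λ_j are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_grey_gene():
    """Two-step draw that keeps the grey bounds ordered, as described above."""
    lo = rng.uniform(-1.0, 1.0)          # lower bound: rand(-1, 1)
    hi = rng.uniform(lo, 1.0)            # upper bound drawn from [lo, 1]
    return lo, hi

def mutate(chromosome: dict, p_mut: float = 0.6) -> None:
    """Replace one uniformly selected gene in each part with some probability."""
    if rng.random() < p_mut:
        k = rng.integers(len(chromosome["grey"]))
        chromosome["grey"][k] = random_grey_gene()       # grey part
    if rng.random() < p_mut:
        k = rng.integers(len(chromosome["gains"]))
        chromosome["gains"][k] = rng.uniform(0.1, 10.0)  # single-valued part (range assumed)

def select_elite(population: list, fitnesses: list, elite_frac: float = 0.2) -> list:
    """Copy the best chromosomes (the 'elite') into the next population."""
    order = np.argsort(fitnesses)[::-1]                  # best chromosomes first
    return [population[i] for i in order[: int(elite_frac * len(population))]]
```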
4. Experiments

The problem of weather prediction was used to validate the proposed evolutionary FGCM. The following plan of validation was used:

• Preprocessing and statistical analysis of the source data
• Setting up the main parameters for the experiments
• Evaluation of in-sample and out-of-sample prediction errors
• A comparative experiment of the FGCM-based predictions with traditional forecasting methods
• Meteorological interpretation of the results
4.1. Data and their statistical properties

Real, publicly-available, quality-controlled climatological data from the John F. Kennedy Airport [48] were used for learning and testing of the proposed evolutionary FGCM. Five of the most important variables from the meteorological perspective were selected for experimentation (Table 2). Thus, the considered FGCM relies on only five measurement variables that correspond to five grey concepts. This is a limitation of the model; however, such simple models can be used, e.g., by so-called 'personal weather stations,' which are weather-forecasting devices intended for home use.
Table 2
Meteorological variables.

v_i   Description
v1    Dry Bulb [Celsius]: the temperature of air measured by a thermometer freely exposed to the air, shielded from radiation and moisture.
v2    Dew Point [Celsius]: the temperature below which the water vapor in a volume of humid air at a constant barometric pressure will condense into liquid.
v3    Relative humidity
v4    Wind speed
v5    Station pressure
Table 3
ADF test for the yearly ITS.

Variable                        ADF test / p-value
                                Lower bound        Upper bound
⊗g1   Dry Bulb (temperature)    −0.7541/0.9652     −0.865/0.9553
⊗g2   Dew Point                 −1.5218/0.7788     −1.9524/0.597
⊗g3   Relative Humidity         −5.0816/0.01       −6.1265/0.01
⊗g4   Wind Speed                −6.1284/0.01       −5.2422/0.01
⊗g5   Station Pressure          −4.8007/0.01       −5.545/0.01
Due to commercial restrictions, we cannot provide references to a specific product. Using the proposed general FGCM-based methodology, the simple climatological model can be scaled up in the future, e.g., by incorporating a larger number of meteorological concepts.

The selection of learning and testing data while building models for weather prediction is a complex, unresolved problem, and it is a subject of ongoing interdisciplinary research [49,17]. For that reason, for the purpose of the experiments performed in this paper, we decided to limit our investigations to data from the year 2012.

The preprocessing of the data was performed in accordance with the methodology described in Section 3.1. In the first step, the hourly data of every day were preprocessed to calculate the daily minima assigned to g̲_i and the daily maxima assigned to ḡ_i. Thus, the grey values ⊗g_i were obtained, and the ITS = {⊗G(t)} was constructed.

In the second step of data preprocessing, for every month of the year, fuzzification was performed. The parameters g_min, g_max of the fuzzification function were assigned as the monthly minima and maxima of the corresponding grey variables, respectively. In general, those monthly minima and maxima can be provided by experts on the basis of many years of data. However, it was decided to follow a different approach. First, the monthly minima and maxima were calculated separately for every month using the historical data from 2012; then the ITS was fuzzified, and further experiments were conducted, as described in Section 4.3. In a second trial, both the monthly minima and the monthly maxima were increased by 20%. Afterward, the fuzzification and the experiments with predictions were repeated. The prediction errors obtained for the two trials differed by less than 10⁻². Thus, the prediction errors were not sensitive to the values of the parameters g_min and g_max used during fuzzification. For this reason, it was decided to assign to them the monthly minima and maxima calculated using the historical data from 2012, i.e., those used for the first of the described trials.

The stationarity of time series is one of the main factors that determine the applicability of many predictive models. The conventional approach to time series analysis in climatology presumes that the climate is stationary [49]. However, the analysis of meteorological data using statistical tests showed that the data can be nonstationary [7]. To examine this issue for the data used in our experiments, several statistical tests were performed using the R software package [50]. The augmented Dickey-Fuller (ADF) test [51] was performed on both the upper and lower bounds of the intervals, separately for every considered variable. The results for 2012 and the significance level of 0.05 (5%) are shown in Table 3. As noted, the first two series, for ⊗g_1 and ⊗g_2, are nonstationary with respect to their bounds. The series for ⊗g_3, ⊗g_4, and ⊗g_5 are stationary. Thus, taking into account the one-year time horizon, the considered multivariate time series was based on a mixture of nonstationary and stationary types of data. In addition to the ADF test, the lower and upper bounds of the ITS were decomposed assuming an additive prediction model. The exemplary result for the temperature series {⊗g_1} is shown in Fig. 1. The decomposition recognized that the series contains both a trend and a seasonal component.
To test the stationarity of the ITS for periods shorter than a year, the ADF test was performed separately for all quarters and months of 2012. This time, assuming the significance level of 0.05 (5%), the nonstationarity hypothesis was confirmed for all variables and bounds. Tests for series shorter than one month were not performed because such series would have been less representative from a statistical perspective. Comparing the results of the ADF test for the entire year with those for separate quarters and months, it was recognized that the nonstationarity of the considered meteorological multivariate ITS increases for shorter periods of time.

When analyzing a multivariate model, it is also important to determine whether the time series are co-integrated and thus share an underlying stochastic trend [52]. To test co-integration, the multivariate type (five variables) of the Johansen test was performed [53], separately for the lower and upper bounds of all variables.
Fig. 1. Decomposition of the temperature ITS.
Table 4
Johansen test.

Hypothesis   Test/critical value (lower bound)   Test/critical value (upper bound)
r ≤ 4        7.57/8.18                           8.12/8.18
r ≤ 3        57.22/14.90                         55.75/14.90
r ≤ 2        100.51/21.07                        91.61/21.07
r ≤ 1        117.29/27.14                        108.85/27.14
r = 0        141.58/33.32                        173.29/33.32
Table 4 provides the results obtained for 2012, and the results confirm the null hypothesis of the existence of co-integration vectors. The hypothesis r ≤ 4 (where r denotes the number of co-integration vectors) cannot be rejected because the test value was less than the corresponding critical value. The significance level of 0.05 (5%) was assumed. Similarly, as for the ADF tests, the testing periods were shortened. The co-integration test was performed separately for every quarter and month of 2012, and the results indicated the existence of a single co-integration vector in all cases. A decrease of the co-integration between the series while shortening the considered time series was recognized. The statistical analysis of the source data indicated that the lengths of the learning and testing horizons were important factors that may influence the performance of the model.

4.2. Experimental setup

In most of the existing publications on the prediction of ITS, the data were arbitrarily split into two disjointed sets. The first sample was used for fitting the model to the data; the second sample was used for the calculation of errors. As a consequence, the obtained results are hardly comparable with each other because of the fixed selection of the learning and testing periods within the time series and the fixed cardinalities of both samples. The other approach is to perform repeated learning and testing trials using the idea of a sliding window (also called a moving or rolling window). To the best of our knowledge, the sliding-window approach was used for the prediction of ITS only in [9] and [54]; however, the lengths of the learning and testing windows (related to the cardinalities of the learning and testing samples) and their influence on the final results have not been investigated. We decided to use the sliding window in our research. Also, we investigated the influence of its selection on the performance of the FGCM.

Let us denote the sliding window W as a subsequence of the ITS starting at time start(W) and finishing at end(W), with length(W) = end(W) − start(W) + 1. The sliding window moves forward within the ITS, starting from day start(W) = 1 to the last possible starting day within the considered ITS, i.e., start(W) = length(ITS) − length(W) + 1, where length(ITS) denotes the number of time steps. In the example given in Fig. 2, the sliding window is split into two overlapping parts, i.e., the learning window W_L and the testing window W_T, following one after the other, with start(W) = 2, end(W) = 7, and length(W) = 6. The last prediction within W_L is made for the day end(W_L). The number of predictions in W_L is length(W_L) − 1. The data of the last day end(W_L) of the learning window are taken as the input for the first prediction performed within the testing window, which starts at start(W_T) = end(W_L).
Fig. 2. Example of a sliding window.
Table 5
Means of the in-sample errors.

          Length of learning/testing window [days]
          2        3        4        5        6        7        30
MDE_IY    0.0039   0.0344   0.0522   0.0580   0.0659   0.0674   0.1106
MDE_H     0.0040   0.0529   0.0744   0.0826   0.0927   0.0946   0.1532
MDE_K     0.0045   0.0419   0.0598   0.0656   0.0726   0.0749   0.1253
Thus, the first testing prediction is made for the day start(W_T) + 1, and the number of testing predictions within W_T is equal to length(W_T) − 1. Due to the overlapping of W_L and W_T within W, we calculate length(W) = length(W_L) + length(W_T) − 1.

The other parameters used for the evolutionary-learning algorithm were assigned in accordance with those used by previous researchers [46]. They also were validated for our purposes by performing numerous trials. Similar to the work reported in [22-25], high values of the mutation and crossover probabilities were assumed due to the large number of sub-optimal solutions. By doing so, the algorithm avoids getting stuck in any sub-optimal solution due to the high degree of exploration. Similarly, as in [47], the elite count was assigned as 20% of the entire population. After numerous trials, it became clear that it was impossible to improve the fitness of the best chromosome after detecting the convergence of the evolution during i_prev = 200 generations, so it was unnecessary to iterate the evolution further. Besides that, for a population of 500 chromosomes, the algorithm always converged before 10,000 generations, independent of the type of experiment. The parameters of the evolutionary algorithm were as follows: cardinality of the population card(H_i) = 500; elite count 0.2 · card(H_i) = 100; probability of mutation = 0.6; probability of crossover = 0.9; i_prev = 200 (parameter of the stopping condition); i_max = 10,000 (maximal number of generations); and accuracy of calculations (rounding) = 10⁻¹².

4.3. Estimation of prediction errors

First, we decided to check the approximation capabilities of the proposed evolutionary FGCMs by the calculation of in-sample errors. For that purpose, the data were assumed to be the same for learning and testing, i.e., the sliding window W is not split and W_L = W_T. Let us remind the reader here that the FGCM is a nonlinear model, with the nonlinearity introduced by the logistic transformation function used during the reasoning process. In addition, the shape of the transformation function changes individually for every jth concept according to the value of the parameter λ_j. Therefore, unlike in linear models [55], determining the minimal reasonable cardinality of a learning sample is a complex problem. For this reason, we decided to set the minimal length of the sliding window as length(W_L) = 2 and then increase it in a stepwise manner to check its influence on the prediction errors. The sliding window was moved through the entire ITS of 2012, starting from start(W) = 1 January 2012. The means of the in-sample errors are given in Table 5. The standard deviations for all measurements were in the range of [0.01, 0.05]. As can be noticed, independent of the accuracy measure, the errors that occurred were low, but they increased with length(W). For the range of 30 < length(W) ≤ 90, the errors increased by less than 0.05; therefore, we do not present them. However, it was recognized that the in-sample prediction errors generated by the FGCM increase substantially as the period being considered increases up to 30 days, after which they continue to grow but at a much smaller rate.

To calculate the out-of-sample errors, the learning and testing windows were separated, as was done in the example given in Fig. 2. The sliding window was moved through the entire period of the year 2012.
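For clarity, the index arithmetic of the overlapping windows described in Section 4.2 can be sketched as follows (the day numbering and the example values are illustrative):

```python
# Sketch of the sliding-window protocol of Section 4.2: W_L and W_T overlap on
# one day, so length(W) = length(W_L) + length(W_T) - 1. Indices are day numbers.

def sliding_windows(series_len: int, len_learn: int, len_test: int):
    len_w = len_learn + len_test - 1
    for start in range(1, series_len - len_w + 2):   # start(W) = 1 .. len(ITS)-len(W)+1
        learn = (start, start + len_learn - 1)       # [start(W_L), end(W_L)]
        test = (learn[1], learn[1] + len_test - 1)   # W_T starts at end(W_L)
        yield learn, test

# e.g., with len_learn = 4 and len_test = 3, the window starting at day 2 gives
# learn = (2, 5) and test = (5, 7), i.e., length(W) = 6, as in the Fig. 2 example.
```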
First, the experiment indicated that the considered FGCM model was the most efficient for short-term prediction horizons. Independent of the accuracy measure and the length of the learning window, the mean errors increased as the length of the testing window increased. This effect was interpreted to mean that the learned model becomes inadequate for the testing data.
Table 6
Means (standard deviations) of the out-of-sample errors for length(W_L) = 9.

          Testing window [days]
          2                 3                 4                 5
MDE_IY    0.1327 (0.0612)   0.1364 (0.0471)   0.1382 (0.0410)   0.1395 (0.0373)
MDE_H     0.1887 (0.0951)   0.1959 (0.0752)   0.1984 (0.0672)   0.2002 (0.0613)
MDE_K     0.1487 (0.0706)   0.1538 (0.0557)   0.1535 (0.0488)   0.1553 (0.0450)
Table 7
Monthly prediction errors (standard deviations) for 2012.

Month       MDE_IY            MDE_H             MDE_K
January     0.1556 (0.0792)   0.2066 (0.1151)   0.1702 (0.0801)
February    0.1412 (0.0755)   0.1986 (0.0991)   0.1626 (0.0796)
March       0.1385 (0.0636)   0.1951 (0.0988)   0.1503 (0.0817)
April       0.1263 (0.0599)   0.1803 (0.0983)   0.1467 (0.0767)
May         0.1203 (0.0586)   0.1749 (0.0852)   0.1300 (0.0704)
June        0.1068 (0.0552)   0.1674 (0.0859)   0.1213 (0.0633)
July        0.0965 (0.0474)   0.1566 (0.0831)   0.1144 (0.0673)
August      0.0919 (0.0422)   0.1572 (0.0784)   0.1156 (0.0632)
September   0.1198 (0.0554)   0.1713 (0.0782)   0.1217 (0.0636)
October     0.1310 (0.0598)   0.1865 (0.0989)   0.1382 (0.0679)
November    0.1379 (0.0732)   0.1950 (0.1017)   0.1629 (0.0775)
December    0.1432 (0.0746)   0.1981 (0.1082)   0.1683 (0.0854)
Second, it was observed that the prediction errors decreased as the length of the learning window increased, but they started to increase slowly after a certain minimum at length(W_L) = 9 was reached. However, in the range of 9 ≤ length(W_L) ≤ 90, the prediction errors increased by less than 0.01, independent of the length of the testing window and the accuracy measure. Based on the experimental results, we determined that the minimum length of the learning window should be set to length(W_L) ≥ 9, and the values of the prediction errors for length(W_L) = 9 are presented in Table 6.

In the case of a short learning period, the FGCM is not specific enough to cover the regularity encountered in the testing data very well. As the learning sample grows, the FGCM acquires more knowledge, which makes it more capable of making accurate predictions. However, the FGCM cannot fit the data better after the length of the learning period is increased further, and this leads to an increase in prediction errors during testing.

In addition to calculating the mean errors for the entire year, we decided to check the efficiency of the predictions with respect to every month of 2012. The parameters were set as length(W_L) = 9 and length(W_T) = 2, corresponding to the minimal mean errors obtained for the entire year. Table 7 shows that the FGCM achieved the best efficiency during the hot summer months of July and August and that the predictions for January and December were not as good. Thus, we concluded that the FGCM performs better during the absence of a trend and when there is less variability in the data, both of which occurred during the summer months.

4.4. A comparative study

As already mentioned, most of the results available for univariate ITS were presented using scale-dependent accuracy measures or fixed learning and testing periods. To the best of our knowledge, the multivariate prediction of ITS is considered for the first time in this paper. Therefore, following the approach presented in [9], we only compared the performance of the proposed evolutionary FGCM with the classic models, i.e., those based on independent forecasts of the lower and upper bounds of the ITS. The naive prediction, seasonal ARIMA, vector autoregression (VAR), and exponential smoothing (ES) models were selected for the comparative experiments. This selection is justified as follows:
• The naive approach is often used as a reference method for time series prediction [8]. It assumes that every prediction is the same as the previously observed value, i.e., in the case of ITS, ∀i: ⊗c_i(t + 1) = ⊗c_i(t).
• The seasonal ARIMA model was reported in the literature as the best model for univariate meteorological time series. The seasonal ARIMA model is a linear model dedicated to the prediction of nonstationary time series. A known obstacle to using the ARIMA model is the selection of its parameters; an overview of numerous publications that address that problem is given in [56].
• The linear, multivariate VAR model was selected for our experiments because, as reported in [57], regression models can exhibit better performance than the ARIMA model for samples with limited data, which was the case in which our evolutionary FGCM demonstrated its best performance.
• The exponential smoothing model uses its parameters to fit the local part of the time series. It is usually an effective method for the short-term prediction of time series [58,9].
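As a concrete reference point, the naive ITS baseline from the first item above can be written in a few lines; the array layout is an assumption, and the error is computed here with the Hausdorff-based measure of Eq. (19).

```python
import numpy as np

# Sketch of the naive ITS baseline: every grey concept is predicted to repeat
# its previous value; evaluated with the Hausdorff-based MDE_H of Eq. (19).

def naive_mde_h(its: np.ndarray) -> float:
    """its has shape (T, n, 2): T days, n variables, [lower, upper] bounds."""
    real, pred = its[1:], its[:-1]                     # forecast: c(t+1) = c(t)
    c, cp = real.mean(axis=2), pred.mean(axis=2)       # interval centers
    r = (real[..., 1] - real[..., 0]) / 2              # interval radii
    rp = (pred[..., 1] - pred[..., 0]) / 2
    return float(np.mean(np.abs(c - cp) + np.abs(r - rp)))
```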
Table 8
Prediction errors (standard deviations) for length(W_L) = 30 and length(W_T) = 2.

Method   MDE_IY            MDE_H             MDE_K
naive    0.1606 (0.0692)   0.2290 (0.1087)   0.1798 (0.0811)
ARIMA    0.1582 (0.0399)   0.2156 (0.0526)   0.1705 (0.0433)
VAR      0.1423 (0.0575)   0.2023 (0.0772)   0.1603 (0.0643)
ES       0.1407 (0.0582)   0.1965 (0.0701)   0.1573 (0.0627)
FGCM     0.1373 (0.0628)   0.1922 (0.0949)   0.1528 (0.0719)
Table 9
DM test (p-values in brackets) for length(W_L) = 30 and length(W_T) = 2.

             MDE_IY             MDE_H              MDE_K
FGCM/naive   −5.1941 (1)        −6.1418 (1)        −7.4937 (1)
FGCM/ARIMA   −4.7273 (1)        −4.625 (1)         −6.2510 (1)
FGCM/VAR     −1.5536 (0.9391)   −1.3057 (0.9034)   −2.5106 (0.9936)
FGCM/ES      −0.3434 (0.6342)   −0.3057 (0.6199)   −1.4516 (0.9259)
The first issue with the comparative experiments was setting the parameters of all of the models in such a way that they could achieve their best performance for the data that we were considering. Obviously, selecting the parameters for all of these models was a complex issue [59]. This was especially true with respect to meteorological data, for which the problem of selecting a proper learning horizon is the subject of ongoing research [49,17]. Thus, certain common assumptions had to be made for all of the models in order to compare their performances. For the proposed FGCM model, the adjustment of the parameters was, in fact, made by the proposed application of an evolutionary algorithm.

For the learning of the seasonal ARIMA, VAR, and ES models, we decided to use the R statistical software package [50]. For fitting the ARIMA(p, d, q) model to the data and adjusting the parameters automatically, we used the function 'auto.arima' from the package 'forecast.' The parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model, respectively. In case seasonality is detected, the model is extended and denoted as SARIMA(p, d, q)(P, D, Q), with the parameters P, D, Q corresponding to the seasonal parts of the model. For the VAR model, the function 'VAR' from the 'vars' package was used. The exponential smoothing model ES(E, T, S)(α, β, γ) was learned using the 'ets' function from the 'forecast' package with the automatic adjustment of parameters. The first letter, E, denotes the error type, i.e., "A" or "M"; the second letter, T, denotes the trend type, i.e., "N," "A," or "M"; and the third letter, S, denotes the season type, i.e., "N," "A," or "M." For all types, the notation is as follows: "N" = none, "A" = additive, and "M" = multiplicative. The smoothing parameters α, β, γ refer to the estimated level, slope, and seasonal effect, respectively. If a smoothing parameter is near 0, the smoothing is high; if it is near 1, there is little smoothing.

ITS-based accuracy measures were selected for testing all of the models. Independent of the model, every forecast produced within the testing window was stored and then used for further calculations of the ITS-based errors, their means and standard deviations, and for testing statistical hypotheses. Besides the calculation of the mean errors and their standard deviations, the Diebold-Mariano (DM) statistical test [60] was performed to compare the models. Following the approach presented in [13,54], the DM test was performed assuming a comparison of interval-based error measures. The null hypothesis of the DM test, H_0: Model 1/Model 2, was that Model 1 is more accurate than Model 2, taking into account the series of the generated prediction errors. The alternative hypothesis was that Model 2 is better than Model 1. The series of prediction errors generated by the models were used as the parameters of the 'dm.test' function, which is available in the 'forecast' library of the R package.

Due to the requirements imposed by the ARIMA and VAR models regarding the reasonable length of the learning period, the length of the learning window was initially set as length(W_L) = 30 for the first comparative experiment. As was described in Section 4.3, such a setting ensured that the performance of the FGCM was very close to its best, but not optimal.
As described in Section 4.3, the performance of the FGCM model decreased as the length of the testing window increased. Since the main goal of the experiment was to identify any competitive model that might be better than the FGCM, the length of the testing window was initially set to the shortest possible value, length(W_T) = 2, which was the best for the FGCM. To compare all of the models over different learning and testing horizons, the lengths of the corresponding windows were increased in additional experiments. The sliding window was moved through the entire period of 2012. The mean errors and their standard deviations were calculated on the basis of the stored predictions, and Table 8 presents the results. As can be noticed, taking into account the obtained mean errors, the evolutionary FGCM outperformed all of the other methods. However, the standard deviation of the errors generated by the FGCM was high; only the naive method had a higher standard deviation. For that reason, the DM test was performed. Table 9 presents the values obtained by the test and the corresponding p-values (in brackets). The DM test confirmed the superiority of the FGCM method over the naive, ARIMA, and VAR methods.
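A minimal sketch of this sliding-window protocol is given below. It assumes, purely for illustration, that the interval error is the standard Hausdorff distance between the predicted and observed intervals, d_H([a, b], [c, d]) = max(|a − c|, |b − d|); the functions 'learn.fgcm' and 'predict.fgcm' are hypothetical stand-ins for the evolutionary learning and inference procedures described earlier, and 'its' is a hypothetical data frame with columns 'lo' and 'up' holding one concept's interval bounds per day.

# Hedged sketch (R): learn on W_L, forecast W_T, slide by one day
# through the year, and accumulate an interval-based error.
hausdorff <- function(pred.lo, pred.up, obs.lo, obs.up)
  pmax(abs(pred.lo - obs.lo), abs(pred.up - obs.up))

len.L <- 30; len.T <- 2
errs <- c()
for (s in 1:(nrow(its) - len.L - len.T + 1)) {
  train <- its[s:(s + len.L - 1), ]
  test  <- its[(s + len.L):(s + len.L + len.T - 1), ]
  model <- learn.fgcm(train)              # hypothetical learning step
  pred  <- predict.fgcm(model, h = len.T) # hypothetical inference step
  errs  <- c(errs, hausdorff(pred$lo, pred$up, test$lo, test$up))
}
mean(errs); sd(errs)   # mean error and its standard deviation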
Table 10
Parameters selected for the ARIMA and ES models, for length(W_L) = 30.

Method   Bound   ⊗c1         ⊗c2         ⊗c3         ⊗c4         ⊗c5
ARIMA    lower   (1, 0, 0)   (2, 0, 0)   (0, 0, 1)   (1, 0, 1)   (1, 0, 1)
ARIMA    upper   (1, 0, 0)   (0, 1, 1)   (0, 1, 1)   (1, 0, 0)   (1, 0, 0)
ES       lower   (A, N, N)   (A, N, N)   (M, M, N)   (A, A, N)   (A, A, N)
ES       upper   (A, N, N)   (A, N, N)   (M, M, N)   (A, A, N)   (A, A, N)
Table 11
DM test for length(W_L) = 90 and length(W_T) = 4.

             MDE_IY              MDE_H               MDE_K
FGCM/naive   −3.1670 (0.9991)    −3.2632 (0.9994)    −3.5960 (0.9998)
FGCM/ARIMA    2.7388 (0.0033)     4.4108 (0.0008)     7.1973 (0.000)
FGCM/VAR     −4.3261 (1)         −4.1463 (1)         −4.6271 (1)
FGCM/ES      −0.2131 (0.5192)    −0.3832 (0.6481)    −0.2899 (0.6139)
Table 12
ARIMA and ES models, length(W_L) = 150.

Method   Bound   ⊗c1         ⊗c2         ⊗c3         ⊗c4         ⊗c5
ARIMA    lower   (2, 1, 1)   (2, 1, 3)   (1, 1, 2)   (1, 1, 2)   (2, 1, 1)
ARIMA    upper   (0, 1, 2)   (2, 2, 1)   (1, 0, 1)   (1, 1, 1)   (2, 1, 2)
ES       lower   (A, N, N)   (A, N, N)   (M, M, N)   (A, A, N)   (A, A, N)
ES       upper   (A, N, N)   (A, N, N)   (M, M, N)   (A, A, N)   (A, A, N)
However, as shown in Table 9, the efficiencies of the FGCM and ES were very close to each other, i.e., the p-values of the DM test for the MDE_IY and MDE_H error measures were low.

Depending on the location of the learning window within the time series and on the variable considered, the functions 'auto.arima' and 'ets' selected different values of the parameters. Table 10 provides an overview of the values that were detected, for example, for the month of April 2012. As can be noted, the seasonal component was not detected by either method. In the case of ES, the smoothing parameters were high (> 0.9) for ⊗c1 and low (< 0.1) for ⊗c2 and the rest of the variables, independent of the model.

The experiment was repeated while gradually increasing the length of the testing window up to length(W_T) = 7. In this case, the performance of all of the methods decreased; however, taking into account both the mean errors and the DM test, the relationship of the FGCM to the other methods was retained. As the learning horizon was increased, and independent of the accuracy measure, the differences between the mean errors generated by the methods appeared only at the third decimal place, and all cases had high standard deviations. This precluded a comparative analysis of the models on the basis of the mean errors and their standard deviations; for this reason, only the DM test was used. Increasing the length of the learning window further caused the performance of the VAR model to decrease substantially, to the point that it was even worse than the naive method. The ARIMA method, in contrast, became substantially better. The errors generated by the FGCM method, as stated in Section 4.3, were essentially unaffected, as were the errors produced by the ES method. After reaching length(W_L) = 90 and length(W_T) = 4, the ARIMA model outperformed all of the other methods. Table 11 provides the results of the DM test for that case. The ARIMA model also remained the best for the longer testing windows, length(W_T) ≥ 4. For length(W_L) = 120, length(W_L) = 150, and length(W_L) = 180, the efficiencies of the FGCM and ES were both better than that of ARIMA, but only for the prediction horizons length(W_T) = 2 and length(W_T) = 3. For longer prediction horizons, i.e., length(W_T) ≥ 4, the ARIMA model outperformed all of the other models.

Table 12 shows the models selected for ARIMA and ES for the learning window with start(W_L) = 1 January 2012 and length(W_L) = 150. Neither function detected the seasonal component within the data. As can be noticed, the numbers of lagged variables of all of the ARIMA models increased in comparison to those given in Table 10. The models detected by the 'ets' function remained the same as in Table 10, but the smoothing parameters changed, with α ∈ [0.2, 0.4] for ⊗c1 and most of the other variables and α > 0.9 for the remaining one, independent of the model and the bound of the interval.

The experiments confirmed that the FGCM model performs best for short prediction periods. Its efficiency was very close to that of the ES method, and it outperformed the other single-valued methods for up to three predictions ahead.

4.5. Domain-specific interpretation of the FGCM

To make an intuitive evaluation of the results that were obtained, it was decided to defuzzify the predictions separately for every bound of the grey values. As an example, for the month of April, the comparison of the real and predicted values of the lower and upper bounds of ⊗c1 is shown in Fig. 3.
Fig. 3. Prediction of air temperature.
Fig. 4. FGCM for April 2012.
The learning and testing windows were assumed to be the best for the FGCM, i.e., length(W_L) = 9 and length(W_T) = 2. Intuitively, temperature predictions with errors in the range of 1 to 2 degrees Celsius would be regarded as acceptable by most users of the proposed prediction system.

In contrast to black-box models, one of the advantages of standard FCMs and FGCMs is the possibility of interpreting them. Therefore, it was decided to prepare a graphical representation of the month-specific FGCMs and to check their interpretation. Fig. 4 shows a graph that was generated for the month of April 2012. For the sake of simplicity, only those weights are shown that, after equal mean whitenization, i.e., using Eq. (12) with δ = 0.5, assumed absolute values |ẇij| ≥ 0.4. In this way, a certain trade-off between complexity and interpretability was obtained. Decreasing this threshold made the number of weighted edges too high to be easily interpretable, while increasing it led to a trivial, degenerate graph that did not provide enough useful information. To improve the readability, the lower and upper bounds of the weights of the FGCM shown in Fig. 4 were rounded to 10^−2.

It can be easily noticed that, with the exception of w54, the values of the weights were very high or very low, and the radii of the grey weights were very narrow. Thus, Fig. 4 shows that, with the exception of w54, the dependencies that were determined had a high degree of certainty, and we decided to verify them against available meteorological knowledge. The results were as follows:

⊗w13 – the negative dependency between dry-bulb temperature and relative humidity was confirmed, and related calculations can be made using [61];
⊗w23, ⊗w32 – the positive dependency between dew point and relative humidity was confirmed in the tables given in [62];
⊗w24 – the negative dependency between dew point and wind speed also was confirmed [63];
⊗w51 – the positive dependency between air pressure and dry-bulb temperature was confirmed and explained in [64];
⊗w53 – the positive dependency between air pressure and relative humidity was also confirmed [64];
⊗w54 – the dependency between pressure and wind speed is a complex problem, and the dependency that was identified was not confirmed.

The interpretation of the other dependencies, i.e., those that were filtered out and are not shown in Fig. 4, is obviously harder and requires further investigation; however, the high degree of accordance between the strongest dependencies identified by the FGCM and meteorological knowledge confirmed the interpretability of the FGCM.
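As a minimal sketch of this filtering step, assuming that equal mean whitenization with δ = 0.5 reduces to the midpoint of the grey interval, and with hypothetical matrices W.lower and W.upper holding the bounds of the learned grey weights:

# Hedged sketch (R): equal mean whitenization (Eq. (12), delta = 0.5)
# and the |w| >= 0.4 filter used to draw the graph of Fig. 4.
delta   <- 0.5
W.white <- delta * W.lower + (1 - delta) * W.upper  # whitenized weights
W.white <- round(W.white, 2)                        # rounded to 10^-2
W.shown <- ifelse(abs(W.white) >= 0.4, W.white, 0)  # edges kept in the graph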
It is worth mentioning that, for different periods of the climatological ITS, different FGCM models were obtained with respect to the values of the weights, i.e., the dependencies among the concepts. The learned FGCM can be used most effectively when the testing window directly follows the learning window. However, every learned FGCM represents only partial knowledge, and, thus, a single FGCM does not generalize all of the available climatological data. As was determined, the effectiveness of the short-term predictions demonstrated by FGCMs decreases as the length of the learning period increases. For these reasons, the discovered FGCMs, even if reasonable and interpretable by experts, cannot be regarded as complete, global meteorological models; they are usable only within restricted temporal horizons. A challenge and open question for future research is the investigation of the similarities between different partial FGCM-based models and the possibility of producing an effective global climatological model.

4.6. Final remarks on the experimental results

The results of the experiments we performed are summarized as follows:
• The nonstationarity of the considered meteorological multivariate ITS increases as the period of time that is considered decreases. However, as the period of time decreases, the co-integration within ITS also decreases.
• The application of FGCM is limited to short prediction horizons. The learned model becomes inadequate when the length of the testing period is increased.
• The FGCM performs better during the absence of trends and when the variability of the data is low, e.g., during the summer months.
• The comparative experiments indicated that the FGCM outperforms the naive, VAR, and ARIMA models for prediction horizons of up to three days. Its effectiveness for such short-term predictions is very close to that of exponential smoothing. For longer prediction horizons, ARIMA outperforms the FGCM.
• The FGCM model that was developed can be interpreted easily and is compatible with existing meteorological knowledge.

5. Conclusions

This paper addressed the problem of forecasting multivariate, interval-valued time series. To solve this problem, the use of fuzzy grey cognitive maps was proposed because they are an easy-to-interpret knowledge representation tool. A genetic algorithm was adapted to interval-valued arithmetic and used to learn FGCMs. To evaluate the methodology and the evolutionarily learned FGCM model, a weather-prediction problem was investigated using real, publicly-available climatological data. It was found that the proposed evolutionarily learned FGCM is a tool that can be applied to the effective prediction of multivariate, interval-valued time series. The results of our experiments provided evidence of the satisfactory effectiveness of the method. Several directions for further research emerged during our study, including the problem of the similarity between different FGCMs as related to the effectiveness of prediction. Domain-specific research also is necessary to enhance the simple meteorological model that we have proposed, thus motivating interdisciplinary research.

References

[1] L. He, C. Hu, Impacts of interval computing on stock market variability forecasting, Comput. Econ. 33 (3) (2009) 263–276.
[2] A. Han, Y. Hong, S. Wang, Autoregressive conditional models for interval-valued time series data, in: The 3rd International Conference on Singular Spectrum Analysis and Its Applications, 2012, p. 27.
[3] R.E. Moore, Interval Analysis, Prentice-Hall Series in Automatic Computation, Prentice-Hall, Englewood Cliffs, NJ, 1966.
[4] L. Zadeh, Fuzzy sets, Inf. Control 8 (1965) 338–353.
[5] J. Deng, Introduction to grey system theory, J. Grey Syst. 1 (1989) 1–24.
[6] P.M. Rodrigues, N. Salish, Modeling and forecasting interval time series with threshold models: An application to S&P500 index returns, Working Papers w201128, Banco de Portugal, Economics and Research Department, 2011.
[7] C. García-Ascanio, C. Maté, Electric power demand forecasting using interval time series: A comparison between VAR and iMLP, Energy Policy 38 (2) (2010) 715–725.
[8] J. Arroyo, R. Espínola, C. Maté, Different approaches to forecast interval time series: A comparison in finance, Comput. Econ. 37 (2) (2011) 169–191.
[9] A.L.S. Maia, F. de A.T. de Carvalho, Holt's exponential smoothing and neural network models for forecasting interval-valued time series, Int. J. Forecast. 27 (3) (2011) 740–759.
[10] J. Arroyo, C. Maté, Introducing interval time series: Accuracy measures, in: Proceedings in Computational Statistics, COMPSTAT 2006, Physica-Verlag, Heidelberg, 2006, pp. 1139–1146.
[11] J. Arroyo, A.M.S. Roque, C. Maté, A. Sarabia, Exponential smoothing methods for interval time series, in: Proceedings of the 1st European Symposium on Time Series Prediction, 2007, pp. 231–240.
[12] E. de A. Lima Neto, F. de A.T. de Carvalho, Constrained linear regression models for symbolic interval-valued variables, Comput. Stat. Data Anal. 54 (2010) 333–347.
[13] G. Gonzalez-Rivera, C. Maté, Forecasting with interval and histogram data. Some financial applications, in: Communication in 20th International Symposium of Mathematical Programming (ISMP), 2009, pp. 23–27.
[14] J. Ahn, M. Peng, C. Park, Y. Jeon, A resampling approach for interval-valued data regression, Statistical Analysis and Data Mining 5 (4) (2012) 336–348.
[15] A.L.S. Maia, F. de A.T. de Carvalho, T.B. Ludermir, Forecasting models for interval-valued time series, Neurocomputing 71 (16–18) (2008) 3344–3352.
[16] J. Xia, G.H. Huang, B. Bass, Combination of differentiated prediction approach and interval analysis for the prediction of weather variables under uncertainty, J. Environ. Manag. 49 (1) (1997) 95–106.
[17] O. Kärner, C.R. Freitas, Modelling long-term variability in daily air temperature time series for southern hemisphere stations, Environ. Model. Assess. 17 (3) (2012) 221–229.
[18] B. Kosko, Fuzzy cognitive maps, Int. J. Man-Mach. Stud. 24 (1) (1986) 65–75.
[19] E.I. Papageorgiou, C.D. Stylios, P.P. Groumpos, Fuzzy cognitive map learning based on nonlinear Hebbian rule, in: Australian Conference on Artificial Intelligence, 2003, pp. 256–268.
[20] J.L. Salmeron, Augmented fuzzy cognitive maps for modelling LMS critical success factors, Knowl.-Based Syst. 22 (2009) 275–278.
[21] J.L. Salmeron, Fuzzy cognitive maps for artificial emotions forecasting, Appl. Soft Comput. 12 (2012) 3704–3710.
[22] W. Stach, L. Kurgan, W. Pedrycz, M. Reformat, Genetic learning of fuzzy cognitive maps, Fuzzy Sets Syst. 153 (3) (2005) 371–401.
[23] E.I. Papageorgiou, W. Froelich, Application of evolutionary fuzzy cognitive maps for prediction of pulmonary infections, IEEE Trans. Inf. Technol. Biomed. 16 (1) (2012) 143–149.
[24] E.I. Papageorgiou, W. Froelich, Multi-step prediction of pulmonary infection with the use of evolutionary fuzzy cognitive maps, Neurocomputing 92 (2012) 28–35.
[25] W. Froelich, E.I. Papageorgiou, M. Samarinas, K. Skriapas, Application of evolutionary fuzzy cognitive maps to the long-term prediction of prostate cancer, Appl. Soft Comput. 12 (12) (2012) 3810–3817.
[26] P. Juszczuk, W. Froelich, Learning fuzzy cognitive maps using a differential evolution algorithm, Pol. J. Environ. Stud. 12 (3B) (2009) 108–112.
[27] E.I. Papageorgiou, J.L. Salmeron, A review of fuzzy cognitive maps research during the last decade, IEEE Trans. Fuzzy Syst. 21 (1) (2013) 66–79.
[28] A.V. Huerga, A balanced differential learning algorithm in fuzzy cognitive maps, in: Proceedings of the 16th International Workshop on Qualitative Reasoning, 2002, pp. 1–7.
[29] E. Papageorgiou, C.D. Stylios, P.P. Groumpos, Active Hebbian learning algorithm to train fuzzy cognitive maps, Int. J. Approx. Reason. 37 (3) (2004) 219–249.
[30] W. Froelich, P. Juszczuk, Predictive capabilities of adaptive and evolutionary fuzzy cognitive maps – a comparative study, in: N.T. Nguyen, E. Szczerbicki (Eds.), Intelligent Systems for Knowledge Management, in: Stud. Comput. Intell., vol. 252, Springer, 2009, pp. 153–174.
[31] E.I. Papageorgiou, K.E. Parsopoulos, C.S. Stylios, P.P. Groumpos, M.N. Vrahatis, Fuzzy cognitive maps learning using particle swarm optimization, J. Intell. Inf. Syst. 25 (1) (2005) 95–121, http://dx.doi.org/10.1007/s10844-005-0864-9.
[32] M. Ghazanfari, S. Alizadeh, Learning FCM with simulated annealing, Chaos Solitons Fractals 41 (3) (2009) 1182–1190, http://dx.doi.org/10.1016/j.chaos.2008.04.058.
[33] E.I. Papageorgiou, Learning algorithms for fuzzy cognitive maps – a review study, IEEE Trans. Syst. Man Cybern., Part C, Appl. Rev. 42 (2) (2012) 150–163.
[34] W. Stach, L.A. Kurgan, W. Pedrycz, A survey of fuzzy cognitive map learning methods, in: P. Grzegorzewski, M. Krawczak, S. Zadrozny (Eds.), Issues in Soft Computing: Theory and Applications, Academic Publishing Company Exit, 2005, pp. 71–84.
[35] J.L. Salmeron, Modelling grey uncertainty with fuzzy grey cognitive maps, Expert Syst. Appl. 37 (2010) 7581–7588.
[36] J.L. Salmeron, E.I. Papageorgiou, A fuzzy grey cognitive maps-based decision support system for radiotherapy treatment planning, Knowl.-Based Syst. 30 (2012) 151–160.
[37] J.L. Salmeron, E. Gutierrez, Fuzzy grey cognitive maps in reliability engineering, Appl. Soft Comput. 12 (2012) 3818–3824.
[38] E.I. Papageorgiou, J.L. Salmeron, Learning fuzzy grey cognitive maps using nonlinear Hebbian-based approach, Int. J. Approx. Reason. 53 (1) (2012) 54–65.
[39] W. Pedrycz, W. Homenda, From fuzzy cognitive maps to granular cognitive maps, in: N.-T. Nguyen, K. Hoang, P. Jedrzejowicz (Eds.), Computational Collective Intelligence. Technologies and Applications, in: Lecture Notes in Computer Science, vol. 7653, Springer, 2012, pp. 185–193.
[40] S. Liu, Y. Lin, Grey Information, Springer, 2006.
[41] J. Arroyo, A. Muñoz San Roque, C. Maté, A. Sarabia, Exponential smoothing methods for interval time series, in: Proceedings of the 1st European Symposium on Time Series Prediction, 2007, pp. 231–240.
[42] F. Palumbo, C. Lauro, A PCA for interval-valued data based on midpoints and radii, in: H. Yanai, et al. (Eds.), New Developments on Psychometrics, Springer-Verlag, Tokyo, 2003, pp. 641–648.
[43] A.L. Maia, F.A.T. de Carvalho, T.B. Ludermir, A hybrid model for symbolic interval time series forecasting, in: I. King, L.W.C.J. Wang, D.L. Wang (Eds.), Neural Information Processing, in: Lecture Notes in Computer Science, vol. 4233, Springer, Berlin, Heidelberg, 2006, pp. 934–941.
[44] R.J. Hyndman, A.B. Koehler, Another look at measures of forecast accuracy, Int. J. Forecast. (2006) 679–688.
[45] A. de O. Silva, E. de A. Lima Neto, U.U. Anjos, A regression model to interval-valued variables based on copula approach, in: Proc. 58th World Statistical Congress, Dublin (Session CPS027), 2011, pp. 6481–6486.
[46] D. Goldberg, Genetic Algorithms, Pearson Publishing, 2013.
[47] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, 1999.
[48] National Climatic Data Center, Quality controlled local climatological data, http://cdo.ncdc.noaa.gov/qclcd/QCLCD?prior=N.
[49] G.R. North, R.F. Cahalan, Predictability in a solvable stochastic climate model, J. Atmos. Sci. 38 (1981) 504–513.
[50] The R Foundation for Statistical Computing, http://www.r-project.org.
[51] D.A. Dickey, W.A. Fuller, Distribution of the estimators for autoregressive time series with a unit root, J. Am. Stat. Assoc. 74 (366) (1979) 427–431.
[52] P.S.P. Cowpertwait, A.V. Metcalfe, Introductory Time Series with R, 1st edition, Springer Publishing Company, Incorporated, 2009.
[53] S. Johansen, Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models, Econometrica 59 (6).
[54] Y. He, Y. Hong, A. Han, S. Wang, Forecasting interval-valued crude oil prices via autoregressive conditional interval models, http://www.wise.xmu.edu.cn/Master/News/NewsPic/20114141066473.pdf.
[55] R.J. Hyndman, A.V. Kostenko, Minimum sample size requirements for seasonal forecasting models, Int. J. Appl. Forecast. 6 (2007) 12–15.
[56] R.J. Hyndman, Y. Khandakar, Automatic time series forecasting: The forecast package for R, J. Stat. Softw. 27 (3) (2008) 1–22.
[57] T. Abeyshinghe, U. Balasooriya, A. Tsui, Small-sample forecasting regression or ARIMA models?, J. Quant. Econ. 1 (2003) 103–130.
[58] E.S. Gardner, E. McKenzie, Forecasting trends in time series, Manag. Sci. 31 (10) (1985) 1237–1246.
[59] R.H. Shumway, D.S. Stoffer, Time Series Analysis and Its Applications, Springer, 2000.
[60] F.X. Diebold, R.S. Mariano, Comparing predictive accuracy, J. Bus. Econ. Stat. 13 (3) (1995) 253–263.
[61] A JavaScript routine that calculates the relative humidity given the wet and dry bulb temperatures, http://home.fuse.net/clymer/water/wet.html.
[62] Dew point tables, http://www.table-references.info/meteo-table-dew-point.php.
[63] How does wind speed effect dew point temperatures?, http://wiki.answers.com/Q/How_does_wind_speed_effect_dew_point_temperatures.
[64] Answers archive: The relation between temperature and air pressure, http://usatoday30.usatoday.com/weather/resources/askjack/2003-03-18-archiveair-pressure_x.htm.