Fuzzy cognitive maps in the modeling of granular time series
Wojciech Froelich a,∗, Witold Pedrycz b,c,d
a Institute of Computer Science, University of Silesia, ul. Bedzinska 39, Sosnowiec, Poland
b Computational Intelligence, Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6R 2V4 AB, Canada
c Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
d Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Article info
Article history: Received 30 June 2016; Revised 16 September 2016; Accepted 15 October 2016; Available online xxx
Keywords: Fuzzy cognitive maps; Granular computing; Time series
Abstract
In this study we propose a new approach to granular modeling of time series. In contrast to the existing fuzzy set-based models of time series, we engage information granules in time (granulation resulting in temporal segments). This method subsequently gives rise to information granules formed in the representation space of the series (in particular, the space of amplitude and the space of changes of amplitude). Initially, the time series is approximated as a sequence of granules forming a so-called granular time series (GTS). To develop a forecasting (prediction) model of the GTS, we cluster all information granules and regard the centers of the clusters obtained through fuzzy clustering as the concepts of a fuzzy cognitive map (FCM). We propose a matching mechanism to carry out the description of the GTS and form the results as a vector of the concepts' activations. In this way the GTS is represented as a sequence of vectors of the concepts' activations, which is forecasted by the FCM. At the conceptual level, the forecasted granule is the FCM concept associated with the maximal degree of activation. At the numeric level, the predicted granule, regarded as a fuzzy set, is described in terms of its bounds and modal value. Experimental studies involving publicly available real-world data demonstrate the usefulness and satisfactory efficiency of the proposed approach. © 2016 Elsevier B.V. All rights reserved.
1. Introduction
Modeling time series is an important and challenging problem that has long been addressed by many researchers. Less well-known are the approaches that rely on the approximation of time series aiming to construct their higher-level representation. Let us first observe that in several application domains the information on accurate numerical values of time series is less important; only knowledge of their approximation is required. For example, in the case of meteorological data analyzed on a daily time scale (24-h intervals), it is important to predict minimal and maximal daily temperatures [1]. Also, the expected range of precipitation for every day is important because it influences water demand, which in turn affects the management of local water resources [2,3]. Apart from the ranges in which the daily temperatures and precipitation fall, it is beneficial to predict at least an approximate distribution of the amplitude or changes in amplitude of the time series for the next day. Another example comes from the stock market.
∗ Corresponding author.
E-mail addresses: [email protected] (W. Froelich), [email protected] (W. Pedrycz).
Information about the possible change of stock prices in the forthcoming period substantially influences the decisions of investors. They are usually not interested in the short-term random component of the time series, but rather in the range into which the stock prices are expected to fall in the following daily, monthly or even longer time periods. Also in this case, information on the distribution of data within the specified intervals is beneficial to investors. In both examples mentioned above, instead of forecasting numerical time series, there is an interest in modeling and realizing forecasting at the level of symbols or information granules. When looking at the stated problem from a general perspective, we encounter the issues of specificity and generality. On the one hand, we have to deal with crisp, real-valued observations that are usually hard to predict. On the other hand, we generalize source data, making the representation less accurate but in turn increasing the likelihood of a good performance due to the operation at a higher level of abstraction. The performance measures used for the numerical and granular forecasts are not directly comparable. However, when creating more abstract representations of time series, we attempt to get rid of the included random component that, apart from its statistical properties, is hard to predict. Moreover, we approximate time series taking into account a specific criterion (which in our case is the compromise between the generality and specificity of the representation).
Fig. 1. General idea of the proposed approach.
In this way we deal with the sequence of granules approximating the numerical time series. The sequence of these granules is forecasted. The perfect forecast of this sequence is achieved when the actual and forecasted granules are equal. The performance of the forecasting is measured in terms of the overlap between the actual and forecasted granules. The more overlap recognized between these granules, the better the obtained forecast. For the measurement of the overlap between the granules, we propose a new function (to be explained later), expressed by formulas (28) and (29) for the amplitude and the change of amplitude of the time series, respectively. One of the approaches used in the approximation of raw data is information granulation. For example, concept learning from raw data via multi-granularity has been used in [4]. By using granulation, it is possible to form a higher-abstraction model representing time series [5]. The outcome of the granulation is the granular time series (GTS), which is subject to forecasting. The ease of human interpretation of the GTS is one of the benefits of granulation. For the forecasting of the GTS, a specialized granular model is required. This model is used to predict the sequence of granules instead of the numerical time series. In Section 2 we provide a review of previous works related to the modeling and forecasting of granular time series. We also stress in that section the originality of our study with respect to the previous studies. There are two main objectives of this study:
1. Approximation of numerical time series, aiming to represent them in the form of the GTS.
2. Construction of an effective model for the forecasting of the previously formed GTS.
The general idea behind the proposed approach is illustrated in Fig. 1. To perform the approximation of time series, we partition the time domain of the series into intervals of the same length. For every interval, we construct an information granule in the form of a triangular fuzzy number. These information granules are developed with the use of the principle of justifiable granularity (PJG). The sequence of the granules is the granular time series. We refer to this process as the first phase of information granulation. The PJG is a general method for the construction of data representation in the form of entities called granules. In comparison to clustering techniques, the PJG does not explicitly consider the distance between data points. Instead, the PJG optimizes the
parameters of the granules, finding a trade-off between the two criteria of generality and specificity. For these contradicting criteria, the user specifies the related functions that depend on the dimensionality of data and the planned application. The goal of the optimization performed by the PJG is to form the most appropriate (justified) representation of data in terms of the two provided functions. When it comes to the second objective, to form the forecasting model of the GTS, we cluster all previously obtained granules by regarding them as entities described in a three-dimensional space, namely by the use of their bounds (a, b) and modal values (m). The representatives of the clusters become the concepts used in the fuzzy cognitive map. We call this process the second-phase granulation, and the concepts of the FCM the second-phase granules. In addition, to interpret the GTS at a conceptual level, we order the second-phase granules with respect to their modal values and assign them linguistic terms, e.g., 'LOW', 'MEDIUM', 'HIGH' (as presented in Fig. 1). To perform the forecasting of the GTS, we need to discover and exploit temporal relationships between the second-phase granules. To accomplish this task, we decided to apply the FCM model. FCMs are a specific type of artificial neural network (ANN). However, in the case of FCMs there is no need to specify input and output nodes, as there usually is for most ANNs. All nodes of the FCM play both roles. After initiating the states of the FCM's concepts, the reasoning formula is used to forecast the future states of all concepts. In addition, contrary to most applications of ANNs, FCMs do not contain hidden nodes. All FCM concepts are explicitly related to data. Thanks to this, the laborious task of selecting the number of hidden nodes and hidden layers of the ANN is no longer necessary. From a practical point of view, recent studies revealed excellent performances from FCMs for the time series forecasting task [6–9]. To form the forecasting model of the GTS, we assume the second-phase granules as the concepts of the FCM. The concepts play a role similar to that of the regressors in auto-regressive forecasting models. The connections (directed arcs) between the concepts are interpreted as the temporal dependencies recognized between the regressors. Note that the concepts are ordered according to the linguistic terms assigned to them – this means that the arc between two concepts specifies the dependency between the linguistic values of the time series observed at consecutive time stamps. In addition, every discovered arc is labeled with a weight, which expresses the strength of the considered relationship. The strengths of the relationships between concepts are crucial for the effectiveness of forecasting demonstrated by the FCM; therefore, we perform genetic optimization to adjust them. To make the proposed model functional, the relationships of every first-phase granule to all second-phase granules (the concepts of the FCM) are described by a vector of numerical values. To calculate the elements of this vector, we propose a specialized function evaluating the degree of matching between granules. Consequently, for the entire GTS, we obtain the sequence of activation vectors of the FCM concepts. The FCM is then applied to the forecasting of the activation vectors. The result of this forecasting is twofold. At the conceptual level, it is the concept with the highest predicted activation grade.
This concept represents the cluster in which the predicted granule is expected to fall. The predicted state of the FCM is degranulated to come up with the numeric results. As the second result of prediction, we obtain the first-phase granule that predicts the GTS in the following time interval. The key contributions of this study can be summarized as follows:
• We propose a new approach to granular modeling of time series. This approach is based on two phases of information granulation.
In the development process, we engage mechanisms of both supervised and unsupervised learning.
• We propose a new function to evaluate the degree of matching between the granules. The function is used for the description of the first-phase granule by the set of the second-phase granules. The same function is used for degranulation and the calculation of forecasting errors.
• We propose an approach to degranulation that enables the conversion of the predicted state of the FCM to a first-phase granule.
The proposed modeling approach is performed separately for the amplitude and for the first-order differences of the amplitude of the time series. In this way, we obtain the prediction of granular time series in the two-dimensional space of amplitude and its change. The paper is organized as follows. In Section 2, we provide a review of the relevant literature. In Section 3, we present background knowledge on all tools applied in our research: the principle of justifiable granularity, Fuzzy C-Means clustering, and fuzzy cognitive maps. In Section 4, we describe the proposed approach to the double-phase granular modeling and forecasting of time series. An example illustrating the proposed approach is given in Section 5. Section 6 describes the performed experiments, and Section 7 concludes our paper.
2. Granular modeling of time series – literature review
It is possible to distinguish two general approaches to the granulation of time series. The first one is the single-phase approach. In this case, the time series is initially granulated using different methods. The forecasting model relies on the set of those granules, using a relational, rule-based system or the FCM to generate predictions. The outcome of the forecasting is a numerical value, which means that the time series is forecasted in its original numerical form. The research described in this paper does not consider single-phase granulation and the forecasting of numerical time series. However, because our research is inspired by some of the ideas of single-phase granulation, we decided to provide here a brief overview of recent related works. The best-known method, which in fact initiated the research on granulation of time series, relies on the notion of fuzzy time series (FTS) proposed by Song and Chissom [5]. In this case, numerical time series are expressed as a sequence of fuzzy sets. The method for the construction of these fuzzy sets was not specified. Fuzzy logical relationships between the fuzzy sets are drawn from historical data and later applied to forecasting. The obtained forecasts can be defuzzified into numerical values. A review of the most recent works related to the modeling approaches based on fuzzy time series was made in [10,11]. We do not create any models for the forecasting of numerical time series in this paper. Instead, the two-phase granular model of time series is introduced with the aim of forecasting FTS. The other approach related to the single-phase granulation and forecasting of time series was recently proposed in [12]. First, the Fuzzy C-Means method was used to cluster the numerical values of the time series. It was then proposed to forecast the sequence of the clusters by fuzzy cognitive maps. The forecasts were defuzzified to obtain numerical values. A simplification strategy used later led to the removal of less important nodes and arcs from the FCM model [13]. A similar approach to the granulation of time series was used in [6]; however, high-order fuzzy cognitive maps were applied for forecasting.
Although in this paper we also use the FCM model for forecasting, we use it for the forecasting of granular rather than numerical time series.
The other method of single-phase granulation of time series was proposed in [14]. In this work, the principle of justifiable granularity [15,16] was used to construct granules over the selected intervals. In [17], the PJG and the relational forecasting model were used for the partitioning of the universe of discourse. Also in this case, numerical time series rather than granular ones were forecasted. In all aforementioned works, the granulation was used for the partitioning of the universe of discourse of time series: the amplitude, the change of the amplitude, or their two-dimensional Cartesian product. The other single-phase method of granulation relies on the partitioning of time series in the time domain. In this case, the original time series is partitioned into time intervals, and the granules are constructed over those intervals. A combined approach can also be used when the granulation is made over both domains (the universe of discourse and time). In [18], the information volume was used to measure the informativeness of granules spanned over temporal intervals. The data in every interval were clustered and then forecasted using fuzzy rules. In [19], granules were related to specific characteristic features (e.g., 'rapid increase', 'steady signal') recognized within the consecutive intervals of the time series. A fuzzy relation was approximated by an artificial neural network and, alternatively, a rule-based inference system was applied for forecasting. In [20] the granulation was made in a two-dimensional space of amplitude and amplitude change, and the interval-based granules were fitted to the time series. A relational forecasting model, optimized by Particle Swarm Optimization, was used. In all aforementioned works, the final outcome of the forecasting was the numerical time series. As stated before, in this paper we address the problem of forecasting granular time series, not numerical ones. As mentioned previously, the alternative approach is the two-phase granulation of time series. In the first phase, the original time series is modeled by a sequence of granules, hence it becomes a granular time series. This GTS is the subject of forecasting. To create the forecasting model of the GTS, the second phase of granulation is performed. The set of second-phase granules serves as the constituents of the higher-abstraction forecasting model. The outcome of the forecasting is the first-phase granules. The research presented in this paper addresses the problem of modeling and forecasting time series using double-phase granulation. In [21], the time series were segmented in the time domain, and the PJG was used for the purpose of the granulation of the time series. The approach proposed in [21] assumed that the fuzzy sets playing the role of the second-phase granules had to be defined by the expert. Arbitrary partitioning of time series in the amplitude domain was required for this purpose. Consequently, the first-phase granules were described by arbitrarily defined second-phase granules. The necessity of reliance on expert knowledge is a limitation that can be recognized in [21]. The Chen relational model [22] was used in [21] for the forecasting; in this paper we use the FCM for this purpose. After degranulation, we obtain not only the predicted intervals (as in [21]), but also a fully-described sequence of first-phase granules.
In addition, instead of using the possibility function applied in [21], we propose a new function for the evaluation of matching between the first-phase and second-phase granules. We provide evidence that our function performs better. In [23], the first-phase interval granules were formed over the Cartesian product of amplitude and change of amplitude. The PJG was used to optimize these interval granules. To form the second-phase granules, Fuzzy C-Means clustering was used, which overcame the problem of requiring expert knowledge in [21]. A relational model was used for the forecasting of granular time series. The limitation of [23] is that the forecasts were considered only at the conceptual level, i.e., at the second phase of granulation. In comparison with [23], in this paper we perform degranulation of the forecasted second-phase granules.
Table 1. Different approaches to information granulation.
Type | 1st grade | 2nd grade | Model | Forecast | References
Single-phase | Not spec. | – | Relations | Numerical | [5]
Single-phase | Clustering | – | FCM | Numerical | [6,12,13]
Single-phase | PJG | – | Relations | Numerical | [14,17]
Single-phase | Clustering | – | Relations | Numerical | [18]
Single-phase | Clustering | – | ANN | Numerical | [19]
Double-phase | PJG | Expert | Relations | Granular | [21]
Double-phase | PJG | Clustering | Relations | Granular | [23]
Double-phase | PJG | Clustering | FCM | Granular | This paper
As a result, we obtain fully-described first-phase granules, and thus more detailed forecasts of the time series. In contrast to [23], we use the newly proposed function for measuring the degree of matching between the granules. In Table 1, we provide an overview of the literature related to the granular modeling of time series and position our paper in relation to other works.
3. Theoretical background
In this section, we briefly recall the essential ideas of the principle of justifiable granularity [19,24] and fuzzy clustering as the key conceptual and algorithmic vehicles for designing the granular models of time series.
3.1. Principle of justifiable granularity
Although the notion of the granule can be defined in a broader sense [15,16], in this paper we limit our investigation to the specific case in which the granule is represented as a parameterized triangular fuzzy number. Let x ∈ [x_min, x_max] be a real-valued variable, where x_min, x_max are its lower and upper bounds, respectively. We form in the interval [x_min, x_max] a parameterized triangular fuzzy number X. Its membership function is denoted as X(x; m, a, b), with the parameters a, b standing for its bounds and m (a < m < b) as its modal value, i.e., X(m; m, a, b) = 1. We assume that m is calculated as the median [16]. We fit X to the observations of the variable x ∈ [x_min, x_max]. For this purpose we optimize the parameters a and b of X(x; m, a, b). First, we decompose X(x; m, a, b) into two linear functions. For a ≤ x ≤ m, we have the left-hand side of the membership function:
X(x; m, a) = (1/(m − a))·x − a/(m − a).   (1)
For m ≤ x ≤ b, the right-hand side of the membership function is accordingly defined:
X(x; m, b) = (1/(m − b))·x − b/(m − b).   (2)
When developing an information granule on the basis of provided experimental evidence, we have to address two contradictory measures of coverage and specificity. We define the coverage of X in the interval [a, b] by formulas (3) and (4) for the left-hand and right-hand sides of the membership function, respectively:
cov([a, m]) = Σ_{x∈[a,m]} X(x; m, a),   (3)
cov([m, b]) = Σ_{x∈[m,b]} X(x; m, b).   (4)
The specificity measures the amount of information contained in a fuzzy subset. The specificity also evaluates the degree to which
a fuzzy subset points to one element as its member. An increase in the specificity of information tends to increase the usefulness of the information [25]. For example, in the case of a numeric interval, the inverse of the length of this interval can serve as a sound measure of specificity. The shorter the interval, the better the satisfaction of the specificity requirement [17,23]. When it comes to a triangular fuzzy set A, the specificity measure is determined by considering the specificity of a certain α-cut of the fuzzy set and then integrating the corresponding partial results. This yields the following expressions. For X(x; m, a) we have:
sp([a, m]) = 1 − 0.5·|m − a| / |m − x_min|.   (5)
Accordingly for X(x; m, b) we have:
sp([m, b]) = 1 − 0.5·|b − m| / |x_max − m|.   (6)
To optimize the coverage and specificity of the fuzzy granule the performance index Q is defined separately for both sides of the membership function. Formulas (7) and (8) are used for this purpose:
Q(a) = cov([a, m]) · sp([a, m]),   (7)
Q(b) = cov([m, b]) · sp([m, b]).   (8)
The bounds a, b of the granule are optimized to find the trade-off between the coverage and specificity of the granule. The optimization results in the following expressions:
a_opt = argmax_a Q(a),   (9)
b_opt = argmax_b Q(b).   (10)
As stated in the introduction, we will use the Fuzzy C-Means algorithm to group the optimized granules in terms of their similarity.
3.2. Fuzzy C-Means clustering
We recall here the Fuzzy C-Means algorithm [26], which we apply to cluster information granules. Let us assume x_i ∈ X_d is a data element (in our case, a parameterized information granule) in d-dimensional space, and let a_j be the d-dimensional center (prototype) of the cluster A_j formed in this space. To partition all data into clusters, the following objective function (11) is minimized:
J = Σ_{i=1}^{N} Σ_{j=1}^{c} u_ij^m · ||x_i − a_j||²,   (11)
where m ≥ 1, N is the cardinality of dataset Xd , c is the number of clusters, uij is the degree of membership of xi in the Aj , and ||.|| is any norm measuring the similarity between the data and the center of the cluster. During every kth iteration, the membership function is updated using the following formula:
u_ij = 1 / Σ_{k=1}^{c} (||x_i − a_j|| / ||x_i − a_k||)^{2/(m−1)}.   (12)
The position of the cluster center is updated in the form:
a_j = (Σ_{i=1}^{N} u_ij^m · x_i) / (Σ_{i=1}^{N} u_ij^m).   (13)
The iteration stops when max_{ij}(|u_ij^{k+1} − u_ij^k|) < ε, where ε ∈ [0, 1] is the parameter and k is the iteration step. As a result, each
data element x_i is associated with each cluster A_j by a membership grade. The membership grade indicates the strength of the association between x_i and the cluster with the center a_j. A new data instance x_new is assigned to the cluster with the highest value of the membership function (12).
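The alternating updates (12) and (13) can be written compactly as below. This is a generic sketch operating on a NumPy data matrix; the random initialization, the stopping threshold, and the fuzzifier value are illustrative choices, not prescribed by the paper. In the proposed approach, the rows of the data matrix would later be the parameter triples of the first-phase granules (Section 4.2).

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """X: (N, d) data matrix; returns (cluster centers (c, d), memberships U (N, c)).
    Implements the alternating updates of Eqs. (12) and (13)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((N, c))
    U /= U.sum(axis=1, keepdims=True)                # start from a random fuzzy partition

    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]                       # Eq. (13)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # ||x_i - a_j||
        dist = np.fmax(dist, 1e-12)                                          # avoid division by zero
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)                                      # Eq. (12)
        if np.max(np.abs(U_new - U)) < eps:                                  # stopping rule
            U = U_new
            break
        U = U_new
    return centers, U

# Example: cluster first-phase granules described by their triples <m, a, b>
granules = np.array([[62.5, 36.0, 105.4], [71.4, 44.9, 118.5], [85.3, 56.1, 154.6], [88.3, 66.8, 142.8]])
centers, U = fuzzy_c_means(granules, c=2)
```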
3.3. Fuzzy cognitive maps
As mentioned in the introduction, we apply fuzzy cognitive maps [27,28] for the forecasting of granular time series. We recall here the basic notions of this model. Let x ∈ R be a real-valued variable whose values are observed at a discrete time scale t ∈ [0, 1, 2, ..., n], where n ∈ ℕ is the length of the considered period. Let {x(t)} = {x(0), x(1), ..., x(n)} be the time series defined for the period t ∈ [0, n]. An FCM is an ordered pair <A, W>, where A is the collection of fuzzy sets (concepts) and W is the connection matrix that stores the weights w_ij assigned to the edges of the FCM [28]. The state λ_j(t) of every concept A_j at time stamp t can be assumed as the degree to which the observed value of the time series x(t) belongs to the fuzzy set A_j, i.e., λ_j(t) = A_j(x(t)), where A_j(x) denotes the fuzzy membership function. The FCM is applied to one-step ahead forecasting of the concepts' states using the following equation [28]:
λ̂_j(t) = f( Σ_{i=1, i≠j}^{c} w_ij · λ_i(t − 1) ),   (14)
where c is the number of concepts and f(z) is the selected transformation function. For our purposes we apply the unipolar sigmoid function f(z) = 1 / (1 + e^(−gain·z)). The parameter gain determines how quickly the transformation reaches the values of 0 and 1. The learning of FCMs is realized in a supervised mode through adjustments of their weights, leading to the lowest forecasting errors in the selected period of time. In the best-known studies, the set of concepts A is provided by an expert, and only the matrix W is learned (using historical data). There are two known approaches to learning FCMs: adaptive and population-based. Adaptive algorithms are based on the idea of Hebbian learning borrowed from the theory of artificial neural networks. The adaptive learning methods involve the DHL [29,30], BDA [31], AHL [32] and NHL [33] algorithms. The population-based approaches for learning FCMs are: RCGA [34], PSO [35], simulated annealing [36] and differential evolution [37]. For the purposes of this study, we use the genetic algorithm for learning FCMs because it was identified in earlier studies as one of the best methods [38,39]. When using the genetic algorithm, the subsequent rows of matrix W are placed linearly one after the other into the genotype vector [34]. The elements on the diagonal of the matrix W are omitted because they do not take part in the reasoning; see formula (14). In the genetic algorithm we used, the populations of candidate FCMs are evaluated iteratively with the use of a fitness function, fitness(FCM) = −e, where e is the accumulated prediction error given as:
e = (1 / ((t_e − 1)·n)) · Σ_{t=1}^{t_e−1} Σ_{j=1}^{n} ε_j,   (15)
where t ∈ [0, 1, 2, ..., t_e], t_e is the length of the learning period, n = card(A) is the number of concepts, and ε_j is the individual forecasting error. It is calculated by the following formula:
ε_j(t + 1) = (λ̂_j(t + 1) − λ_j(t + 1))²,   (16)
where λ̂_j(t + 1) is the state of the jth concept at the time t + 1 forecasted by the candidate FCM, and λ_j(t + 1) is the actual state of the jth concept at the time t + 1. For testing, the out-of-sample errors were also calculated by formula (16). In that case, t_e is the length of the testing part of the time series.
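The reasoning step (14) and the error terms (15)–(16) reduce to a few lines of code. The sketch below is a minimal NumPy version; the gain value and the toy weight matrix are chosen only for illustration and are not taken from the paper.

```python
import numpy as np

def sigmoid(z, gain=5.0):
    # Unipolar sigmoid transformation f(z) = 1 / (1 + exp(-gain * z))
    return 1.0 / (1.0 + np.exp(-gain * z))

def fcm_step(W, state, gain=5.0):
    """One-step-ahead reasoning, Eq. (14).
    W: (c, c) weight matrix with W[i, j] = influence of concept i on concept j.
    state: (c,) activations lambda_i(t-1); returns the predicted activations at t."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)              # self-influences are excluded from the reasoning
    return sigmoid(W.T @ state, gain)

def accumulated_error(W, states, gain=5.0):
    """Accumulated prediction error of Eq. (15): mean squared one-step error.
    states: (t_e, n) matrix of observed concept activations over the learning period."""
    t_e, n = states.shape
    preds = np.array([fcm_step(W, states[t - 1], gain) for t in range(1, t_e)])
    errors = (preds - states[1:]) ** 2    # individual errors, Eq. (16)
    return errors.sum() / ((t_e - 1) * n)

# Example with two concepts and an invented weight matrix
W = np.array([[0.0, -0.7], [-0.4, 0.0]])
states = np.array([[0.9, 0.2], [0.8, 0.3], [0.7, 0.4]])
print(accumulated_error(W, states))
```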
4. Granular – FCM-based approach to modeling time series
The general idea of the proposed approach to the granulation of time series has been discussed in the introduction. The approach consists of the following steps:
1. In first-phase granulation, the original time series is converted to a sequence of granules (the first-phase granular time series). A new interval-based time scale is introduced in which the granular time series is considered. We use the PJG to optimize the granules.
2. In second-phase granulation, the Fuzzy C-Means method is used to cluster all granules obtained after the first-phase granulation. As a result, we obtain the set of second-phase granules.
3. Conceptual description of the first-phase granular time series by the second-phase granules. Construction of the second-phase granular time series.
4. Construction of the forecasting model using a fuzzy cognitive map. The second-phase granules become the concepts of the FCM; the FCM is learned by the genetic algorithm, and the time series is forecasted for both phases of granulation.
5. Evaluation of the forecasting errors at both phases of granulation.
In the following, we detail every step of the proposed approach.
4.1. First-phase granulation of time series
Let {X} = {x(0), x(1), ..., x(n)}, t ∈ [0, n], be the considered time series. For the purposes of this research, we also consider the first-order dynamics of the time series, i.e., the sequence of differences defined as {∂X} = {∂x(1), ∂x(2), ..., ∂x(n)}, where ∂x(i) = x(i) − x(i − 1). Let us partition both time series {X} and {∂X} in the time domain. For this purpose we determine N = n/w (w ≥ 2) equal-length intervals in the time period t ∈ [0, n]. The parameter w is the length of every interval. For t ∈ [(k − 1)·w, k·w], k ∈ [1, N], the values of x(t) and ∂x(t) belong to the kth interval. In this way, we obtain a new, interval-based time scale k ∈ [1, N] with N approximation intervals. Let us denote G1(·) as the first-phase granulation function, which produces a granule for each kth interval. For {X} we have formula (17):
X_k = G1(x(t)), t ∈ [(k − 1)·w, k·w].   (17)
For {∂ X} formula (18) is applied:
∂X_k = G1(∂x(t)), t ∈ [(k − 1)·w, k·w].   (18)
Function G1 works according to the principle of justifiable granularity stated in Section 3.1. First, it calculates the modal value m, minimum x_min and maximum x_max of the time series in the kth time interval t ∈ [(k − 1)·w, k·w]. Then an interval [x_min, x_max] within the universe of discourse (amplitude or the change of amplitude) is created. Over this interval, a triangular fuzzy number X_k is constructed with the membership function X_k(x; m, a, b), where the parameters a and b are subject to optimization. Initially, they are assumed as a = x_min and b = x_max; then they are optimized using the coverage functions (3), (4), the specificity functions (5), (6), and the optimization indexes (7), (8) given in Section 3.1. As a result, the parameters of the granule are optimized as a = a_opt and b = b_opt (see formulas (9) and (10)). Thus, the resulting, optimized fuzzy membership function is obtained as X_k(x; m, a_opt, b_opt).
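A minimal sketch of the segmentation behind formulas (17)–(21) is shown below. For every window of length w it returns the triple (window minimum, median, window maximum); in the full method the lower and upper bounds would subsequently be tightened by the PJG of Section 3.1 (that optimization step is deliberately left out here, and the function name is our own).

```python
import statistics

def first_phase_granulation(series, w):
    """Partition a numeric series into N = len(series) // w windows and return, for both the
    amplitude and its first-order differences, one (a, m, b) triple per window.
    Here a and b are simply the window minimum and maximum; in the full method they are
    subsequently optimized with the principle of justifiable granularity (Section 3.1)."""
    diffs = [series[i] - series[i - 1] for i in range(1, len(series))]
    N = len(series) // w
    amplitude_granules, change_granules = [], []
    for k in range(N):
        window = series[k * w:(k + 1) * w]
        dwindow = [d for i, d in enumerate(diffs, start=1) if k * w <= i < (k + 1) * w]
        amplitude_granules.append((min(window), statistics.median(window), max(window)))
        if dwindow:
            change_granules.append((min(dwindow), statistics.median(dwindow), max(dwindow)))
    return amplitude_granules, change_granules

# Example: a 24-point series split into two 12-point segments (one "season" each)
series = [32.2, 35.0, 40.1, 55.0, 60.2, 62.5, 63.0, 70.4, 80.9, 95.3, 110.0, 113.9,
          44.9, 50.3, 58.8, 66.1, 71.4, 73.0, 78.2, 90.5, 104.7, 120.0, 127.4, 118.2]
GX, dGX = first_phase_granulation(series, w=12)
```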
Fig. 3. Example of measuring possibility between fuzzy sets X and A.
Fig. 2. First-phase granulation of time series.
In this way, X_k and ∂X_k are constructed as the first-phase granules describing the amplitude and the first-order differences of the amplitude, respectively, for every period t ∈ [(k − 1)·w, k·w]. In the interval-based time scale k ∈ [1, N], we obtain two sequences of information granules (first-phase granular time series):
{X_k} = {X_1, X_2, ..., X_N},   (19)
and
{∂X_k} = {∂X_1, ∂X_2, ..., ∂X_N}.   (20)
Both sequences combined constitute the sequence:
{GX} = <X_1, ∂X_1>, <X_2, ∂X_2>, ..., <X_N, ∂X_N>.   (21)
Note that the index of the first-phase granules is equivalent to the label of the time interval over which the granule was formed. The idea of first-phase granulation is illustrated in Fig. 2.
4.2. Second-phase granulation of time series
Every granule obtained after the first stage of granulation is represented in the parameterized form as X_k(x; m, a, b) for the amplitude and as ∂X_k(x; m, a, b) for the change of amplitude. The parameters m, a, b (a < m < b) stand for the modal value and the bounds of the granule. Thus every granule is an entity described in the three-dimensional space of its parameters. The objective of the second-stage clustering is to group similar granules with respect to their parameters. To perform this clustering we assume that the triples of parameters <m, a, b> of the granules play the role of the data elements x_i for the Fuzzy C-Means clustering described in Section 3.2. So we have two three-dimensional spaces (one for the amplitude and one for the change of amplitude) in which clustering is carried out. Euclidean distance is assumed to be the norm ||.|| used in Eq. (12). We cluster all granules from the sequence {X_k} and then separately cluster all those from the sequence {∂X_k} obtained after the first-phase granulation. According to the notation used in Section 3.2, we denote the obtained second-phase granules as A_1, A_2, ..., A_c and ∂A_1, ∂A_2, ..., ∂A_c, where c ≪ N is the parameter. The second-stage granulation is expressed by the function G2(·). For the amplitude of the time series we get the collection of second-phase granules (22):
{A_j} = G2({X_k}), j ∈ [1, c], k ∈ [1, N],   (22)
and consequently for the change of the amplitude we get:
{∂A_j} = G2({∂X_k}), j ∈ [1, c], k ∈ [1, N].   (23)
Both collections of granules are assumed to be sorted with respect to their modal values. In this way, separately for the amplitude and the change of amplitude we obtain the ordered sets of granules by which we are able to construct the conceptual description of the first-phase granular time series.
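The ordering and labeling of the second-phase granules is straightforward; a short, hedged sketch (the term list and function name are illustrative, and the numeric values merely echo the two-cluster example of Table 4) could look as follows.

```python
def label_second_phase_granules(centers, terms):
    """Sort second-phase granules (given as (m, a, b) cluster centers) by modal value
    and pair each with a linguistic term, as in Fig. 1 and Table 4.
    terms must contain one entry per granule, ordered from lowest to highest."""
    ordered = sorted(centers, key=lambda g: g[0])      # ascending modal value
    return list(zip(terms, ordered))

print(label_second_phase_granules([(87.42, 63.0, 152.83), (69.11, 42.34, 114.67)], ["LOW", "HIGH"]))
```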
4.3. Conceptual description of the first-phase granular time series
The objective of the conceptual description of the first-phase granular time series by the second-phase granules is to construct a link between both types of granules. To describe {X_k} and {∂X_k} by the collections of second-phase granules {A_j} and {∂A_j}, respectively, we use the generic function Desc(·), which will be replaced later with a specific one. The objective of the Desc(·) function is the quantitative measurement of the matching between two granules. For the amplitude, we evaluate the matching between the first-phase granule X_k occurring at the kth time interval and every second-phase granule A_j for j = 1, 2, ..., c. To evaluate this matching we calculate:
λ_1j(k) = Desc(X_k, A_j).   (24)
And similarly for the change of amplitude:
λ_2j(k) = Desc(∂X_k, ∂A_j).   (25)
The generic descriptive function Desc is substituted with a specific one. For the purposes of this research, we consider two options. 1. For the first option, we assume that the descriptive function is the possibility function (26), which was already used in [15,16,21].
Poss(X_k, A_j) = max_x min(X_k(x), A_j(x)).   (26)
Similarly, for the change of amplitude, we have (27):
Poss(∂X_k, ∂A_j) = max_x min(∂X_k(x), ∂A_j(x)).   (27)
Illustration of the possibility function, calculated for two triangular fuzzy sets X(x) and A(x) is given in Fig. 3. The values of min(X(x), A(x)) are marked by the bold line. The maximum of those values (i.e., the highest point of the corresponding line) is the value of the possibility function. Note that the possibility function in fact measures the maximum of the membership function of the product of the two fuzzy sets. The limitation of this function is that it does not relate the degree of overlap of both sets to their cardinalities. This means that the overlap of two small fuzzy sets can lead to the same value of the possibility function as for much larger sets. 2. To overcome the limitation of the possibility function, we propose another function for matching the granules. In this case, we relate the common part (the overlapping area) of both granules to their sum by using the following formulas:
Match(X_k, A_j) = Σ_x min(X_k(x), A_j(x)) / Σ_x max(X_k(x), A_j(x)).   (28)
Match(∂X_k, ∂A_j) = Σ_x min(∂X_k(x), ∂A_j(x)) / Σ_x max(∂X_k(x), ∂A_j(x)).   (29)
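The two options (26)–(29) can be compared numerically by evaluating both membership functions on a shared grid. The sketch below is illustrative: the grid resolution and the two example granules are our own choices, and the triangular membership helper uses the parameterization of Section 3.1.

```python
import numpy as np

def tri(x, a, m, b):
    # Triangular membership X(x; m, a, b) evaluated on a NumPy grid.
    left = np.where((x >= a) & (x <= m), (x - a) / (m - a) if m > a else 1.0, 0.0)
    right = np.where((x > m) & (x <= b), (b - x) / (b - m) if b > m else 0.0, 0.0)
    return np.maximum(left, right)

def poss(g1, g2, grid):
    # Possibility measure, Eqs. (26)-(27): height of the intersection of the two fuzzy sets.
    return float(np.max(np.minimum(tri(grid, *g1), tri(grid, *g2))))

def match(g1, g2, grid):
    # Proposed matching, Eqs. (28)-(29): ratio of the overlap to the union.
    m1, m2 = tri(grid, *g1), tri(grid, *g2)
    return float(np.sum(np.minimum(m1, m2)) / np.sum(np.maximum(m1, m2)))

# Two granules given as (a, m, b): a narrow and a wide one with coinciding peaks
grid = np.linspace(-50, 50, 2001)
narrow, wide = (-5.0, 3.0, 10.0), (-30.0, 3.0, 27.0)
print(poss(narrow, wide, grid), match(narrow, wide, grid))
# Poss is essentially 1 because the peaks coincide, while Match stays well below 1,
# reflecting how different the supports of the two granules are.
```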
Using formulas (24) and (25) and the Desc(.) function, we describe Xk in terms of A1 , A2 , . . . , Ac for every kth time interval. As a result, we obtain the following vector of numerical values:
<λ_11(k), λ_12(k), ..., λ_1c(k)>.   (30)
Similarly, for the ∂X_k, we obtain:
<λ_21(k), λ_22(k), ..., λ_2c(k)>.   (31)
From each of these vectors, we select the highest value of λ. For the amplitude:
j_1k = arg max_{j=1,2,...,c} (λ_1j(k)),   (32)
which points to A_{j_1k}. The second-phase granule A_{j_1k} represents the first-phase granule X_k in terms of its description Desc(·). We perform a similar operation for the change of amplitude:
j_2k = arg max_{j=1,2,...,c} (λ_2j(k)),   (33)
which points to ∂A_{j_2k}. After performing this operation for all time intervals, we obtain the sequences (34) and (35) of the second-phase granules:
{A(k)} = {A_{j_11}, A_{j_12}, ..., A_{j_1N}},   (34)
and
{∂A(k)} = {∂A_{j_21}, ∂A_{j_22}, ..., ∂A_{j_2N}}.   (35)
By combining both sequences, we obtain the second-phase granular time series:
{G²X} = <A_{j_11}, ∂A_{j_21}>, <A_{j_12}, ∂A_{j_22}>, ..., <A_{j_1N}, ∂A_{j_2N}>.   (36)
The constructed time series {G²X} is the generalization of the granular time series {GX}. After performing both phases of granulation, the original time series {X} is approximated by the granular time series {GX}, which is then approximated by {G²X}. A similar double-phase approximation related to the change of amplitude is performed for {∂X}.
4.4. Forecasting granular time series using fuzzy cognitive maps
To perform the forecasting of the granular time series, we construct two fuzzy cognitive maps, FCM1 and FCM2, for the amplitude and the change of amplitude, respectively. The second-phase granules are assumed to be the concepts of FCM1 and FCM2 formed for the amplitude and the change of amplitude, respectively. The activation of the jth concept of FCM1 is assumed as λ_1j(k). For FCM2 the concepts' activations are denoted as λ_2j(k). To perform the forecasting, we accordingly adapt formula (14) for the amplitude:
λ̂_1j(k + 1) = f( Σ_{i=1, i≠j}^{c} w_ij · λ_1i(k) ),   (37)
and for the change of amplitude:
λ̂_2j(k + 1) = f( Σ_{i=1, i≠j}^{c} w_ij · λ_2i(k) ).   (38)
For both FCMs, we define performance indexes measuring the forecasting errors over the time horizon k ∈ [1, N] and over all concepts A_j, j ∈ [1, c]. After the adaptation of formula (16), we obtain (39) and (40) for the amplitude and the change of amplitude, respectively:
Q_1 = Σ_{k=1}^{N−1} Σ_{j=1}^{c} (λ̂_1j(k + 1) − λ_1j(k + 1))²,   (39)
Q_2 = Σ_{k=1}^{N−1} Σ_{j=1}^{c} (λ̂_2j(k + 1) − λ_2j(k + 1))².   (40)
The objective is to minimize Q_1 and Q_2 by learning FCM1 and FCM2, respectively. We use the genetic algorithm for this purpose. After both FCMs are learned, they are exploited for forecasting. The forecasts are obtained in the form of the FCMs' states in the following (k + 1)th interval. For every concept of the FCM, we obtain its predicted activation state determined by (37) and (38) for FCM1 and FCM2, respectively. Using the forecasted states of the FCMs, we calculate the forecast of the time series {G²X}. For the amplitude, we select the index of the FCM concept for which the activation state is maximal. We use formula (32) for this purpose, pointing toward the forecasted granule Â_{j_1,k+1}. Similarly, using formula (33), we obtain ∂Â_{j_2,k+1}.
To obtain the forecast of the {GX} at the first phase of granulation, the predicted states of FCM1 and FCM2 (i.e., the activation states of their concepts) are degranulated. The objective of the degranulation is to calculate the parameters (modal value and bounds) of the predicted first-stage granule. For FCM1 and the amplitude, we achieve this through the following formulas:
â = (Σ_{j=1}^{c} λ̂_1j(k + 1)·a_j) / (Σ_{j=1}^{c} λ̂_1j(k + 1)),   (41)
m̂ = (Σ_{j=1}^{c} λ̂_1j(k + 1)·m_j) / (Σ_{j=1}^{c} λ̂_1j(k + 1)),   (42)
b̂ = (Σ_{j=1}^{c} λ̂_1j(k + 1)·b_j) / (Σ_{j=1}^{c} λ̂_1j(k + 1)),   (43)
where a_j, m_j, b_j denote the lower bound, median, and upper bound of the second-phase granules A_j, the constituents of FCM1. The â, m̂, b̂ are the parameters of the predicted first-phase granule X̂_{k+1}(x; m̂, â, b̂). Similarly, for FCM2, we use the following formulas:
â = (Σ_{j=1}^{c} λ̂_2j(k + 1)·a_j) / (Σ_{j=1}^{c} λ̂_2j(k + 1)),   (44)
m̂ = (Σ_{j=1}^{c} λ̂_2j(k + 1)·m_j) / (Σ_{j=1}^{c} λ̂_2j(k + 1)),   (45)
b̂ = (Σ_{j=1}^{c} λ̂_2j(k + 1)·b_j) / (Σ_{j=1}^{c} λ̂_2j(k + 1)).   (46)
In this case, the parameters â, m̂, b̂ are calculated for the granule ∂X̂_{k+1}(x; m̂, â, b̂) representing the changes of the amplitude.
4.5. Evaluation of forecasting accuracy
To calculate the forecasting errors at the conceptual level for {G²X}, we propose to check whether the actual and the predicted second-phase granules are exactly the same. We assume them to be the same when the indexes of the actual and predicted granules calculated using formulas (32) and (33) are the same; otherwise, they are assumed to be different. To calculate the forecasting error rate over K intervals, we use the following formulas:
G2e_1 = (1/K) · Σ_{k=1}^{K} 1|A_{j_1k} ≠ Â_{j_1k},   (47)
and
G2e_2 = (1/K) · Σ_{k=1}^{K} 1|∂A_{j_2k} ≠ ∂Â_{j_2k}.   (48)
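The forecasting and degranulation steps of Section 4.4 combine the reasoning of (37)–(38) with the weighted averages (41)–(46). The sketch below is a minimal illustration: the weight matrix, the activation vector, and the sigmoid gain are invented, and the second-phase granule triples are quoted from the two-cluster example of Table 4 purely for illustration.

```python
import numpy as np

def sigmoid(z, gain=5.0):
    return 1.0 / (1.0 + np.exp(-gain * z))

def forecast_next(W, activations, gain=5.0):
    """Eqs. (37)/(38): predicted concept activations for interval k+1 from those at interval k."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return sigmoid(W.T @ activations, gain)

def degranulate(pred_activations, second_phase_granules):
    """Eqs. (41)-(43) (and (44)-(46) for the changes): the predicted first-phase granule is the
    activation-weighted average of the bounds and modal values of the second-phase granules."""
    lam = np.asarray(pred_activations)
    G = np.asarray(second_phase_granules)            # rows (a_j, m_j, b_j)
    a_hat, m_hat, b_hat = (lam @ G) / lam.sum()
    return a_hat, m_hat, b_hat

# Example with two amplitude concepts
W1 = np.array([[0.0, -0.6], [-0.5, 0.0]])            # invented weights
lam_k = np.array([0.85, 0.13])                       # activations at interval k
lam_next = forecast_next(W1, lam_k)
concept = int(np.argmax(lam_next))                   # conceptual-level forecast, Eq. (32)
a_hat, m_hat, b_hat = degranulate(lam_next, [(63.0, 87.42, 152.83), (42.34, 69.11, 114.67)])
```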
Table 2. Parameters of the original time series after temporal partitioning.
Interval (k) | Time (t) from | to | x(t): x_min | x_m | x_max | ∂x(t): x_min | x_m | x_max
1 | 1 | 12 | 32.2 | 62.5 | 113.9 | −31.1 | 4.5 | 22.6
2 | 13 | 24 | 44.9 | 71.35 | 127.4 | −33 | 4.7 | 24.3
3 | 25 | 36 | 53.2 | 82.2 | 140.4 | −39.1 | 4.9 | 28.7
4 | 37 | 48 | 56.1 | 85.3 | 157.8 | −39 | 5 | 27.1
5 | 49 | 60 | 61.5 | 92 | 157.7 | −41.3 | −1 | 27.8
6 | 61 | 72 | 59.6 | 86.55 | 154.9 | −38.3 | −1 | 26.5
7 | 73 | 84 | 55 | 84.55 | 149.5 | −31 | 2.7 | 24.3
8 | 85 | 96 | 66.8 | 88.3 | 149.1 | −30.9 | −0.1 | 25.1
9 | 97 | 108 | 63.6 | 79.85 | 149.6 | −30.9 | 5.5 | 25.1
Fig. 4. Passenger miles (Mil) flown domestic U.K.
At the first level of granulation, we compare the predicted and actual granules in a more refined way. Let us denote the generic function used for the comparison of granules when calculating forecasting errors as Comp(·); similarly to Desc(·), Comp(·) can be substituted with a selected specific measure, Comp(·) = Poss(·) or Comp(·) = Match(·). Let us note that a higher value of 0 ≤ Comp(·) ≤ 1 means a better prediction and a lower forecasting error rate. For the calculation of errors and the reversal of this dependency, we subtract the values of Comp(·) from 1. The individual errors for a single prediction are evaluated as e_1^Comp = (1 − Comp(X_k, X̂_k)) and e_2^Comp = (1 − Comp(∂X_k, ∂X̂_k)) for the amplitude and the change of amplitude, respectively. The error rates over K time steps are calculated using the following formulas:
G1e_1^Comp = (1/K) · Σ_{k=1}^{K} (1 − Comp(X_k, X̂_k)),   (49)
and
G1e_2^Comp = (1/K) · Σ_{k=1}^{K} (1 − Comp(∂X_k, ∂X̂_k)).   (50)

Table 3. Granular time series {GX}.
Interval (k) | Time from | to | X(t): a | m | b | ∂X(t): a | m | b
1 | 1 | 12 | 36.04 | 62.5 | 105.43 | −7.63 | 4.5 | 22.47
2 | 13 | 24 | 44.9 | 71.35 | 118.49 | −6.4 | 4.7 | 23.19
3 | 25 | 36 | 53.2 | 82.2 | 130.48 | −29.59 | 4.9 | 26.42
4 | 37 | 48 | 56.1 | 85.3 | 154.56 | −32.78 | 5 | 27.09
5 | 49 | 60 | 76.68 | 92 | 157.33 | −13.16 | −1 | 27.79
6 | 61 | 72 | 59.6 | 86.55 | 152.12 | −27.54 | −1 | 26.49
7 | 73 | 84 | 55.75 | 84.55 | 141.08 | −27.2 | 2.7 | 24.29
8 | 85 | 96 | 66.8 | 88.3 | 142.81 | −26.34 | −0.1 | 24
9 | 97 | 108 | 63.6 | 79.85 | 147.03 | −17.98 | 5.5 | 23.86

Table 4. Second-phase granules.
j | A_j: a | m | b | Term | ∂A_j: a | m | b | Term
1 | 63 | 87.42 | 152.83 | HIGH | −29.9 | 3.05 | 26.68 | LOW
2 | 42.34 | 69.11 | 114.67 | LOW | −8.73 | 3.06 | 24.2 | HIGH
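Before moving to the illustrative example, a minimal sketch of how the error rates (47)–(50) could be computed, assuming the actual and predicted concept indices and the Comp(·) values over the testing period are already available; the numeric values in the usage lines reproduce the amplitude results discussed in Section 5 and are quoted only for illustration.

```python
def conceptual_error_rate(actual_idx, predicted_idx):
    # Eqs. (47)-(48): fraction of testing intervals whose predicted concept differs from the actual one.
    K = len(actual_idx)
    return sum(1 for a, p in zip(actual_idx, predicted_idx) if a != p) / K

def granular_error_rate(comp_values):
    # Eqs. (49)-(50): average of 1 - Comp(actual granule, predicted granule) over K intervals.
    K = len(comp_values)
    return sum(1.0 - c for c in comp_values) / K

# Amplitude over the testing period k = 7..9: one conceptual miss at k = 8
print(conceptual_error_rate([1, 2, 1], [1, 1, 1]))                 # 1/3
print(granular_error_rate([1 - 0.0724, 1 - 0.1288, 1 - 0.0178]))   # approx. 0.073
```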
5. Illustrative example
In this section we present an illustrative example demonstrating the proposed modeling approach. For this purpose we selected a dataset from the domain of transportation and tourism available at https://datamarket.com and described as "Passenger miles (Mil) flown domestic U.K. Jul. 62-May 72" [40]. We used only data from a period of 9 full years, from 1 January 1963 to 31 December 1971 (108 months). A distinctive feature of this monthly series (monthly time scale) is the existence of seasonality (cycles) related to the increased tourist flow during summer holidays. The season in the time series is one year, and the forecasting is made at the beginning of every season. The goal of forecasting is to predict the distribution of flights for the following year. The analyzed time series is depicted in Figs. 4a and 4b for the amplitude and the change of amplitude, respectively. For the first-phase granulation, we set up the parameters in the following way: n = 108 months is the length of the original time series, w = 12 is the length of the approximation interval related to one season (i.e., 12 months of the year), and N = 9 is the number of intervals (the length of the resulting granular time series). To partition both time series {X} and {∂X} with respect to seasons, we calculated the minimum (x_min), median (x_m) and maximum (x_max) of the amplitude and the change of amplitude for every interval. The results are given in Table 2. In the next phase of the experiment, for every interval we applied the PJG to form first-phase granules. Table 3 shows the results. When comparing Table 2 with Table 3, it can be noted that
the lower and upper bounds of the granules have been optimized in each of the considered intervals. The first-phase granular time series {GX}, given in Table 3, is subject to forecasting. For our demonstration, we partitioned it into the learning part (70%, k = 1–6) and the testing part (30%, k = 7–9). To create the forecasting model, we used only the learning part of {GX}. We cluster the granules from the learning part of {GX} using the Fuzzy C-Means method. As stated before, the objective of the considered example is solely to illustrate the proposed theoretical approach. For this reason, as well as to enable the presentation of all calculations and make them easy to analyze, we assumed for this example the minimal possible number of clusters c = 2. The experiments with higher numbers of clusters are described in Section 6. In Table 4 we show the centers of the obtained clusters, which became the second-phase granules. Separately for the amplitude and the change of amplitude, the granules were sorted with respect to their modal values. Accordingly, the linguistic terms 'LOW' and 'HIGH' were assigned to the granules. In Fig. 5 we illustrate the relationship between the obtained clusters (second-phase granules) and the entire space of first-phase granules. The second-phase granules are marked in bold. In Fig. 6, the relationship between all granules in the two-dimensional space A − ∂A is shown. Fig. 6a depicts first-phase granules; Fig. 6b illustrates the same set of granules with the added second-phase granules. In the next step, we describe the learning part of {GX} in terms of the second-phase granules. As the first option, we applied the possibility function (also used in [21]) for this purpose.
Table 6. Forecasting results for the amplitude, Desc = Poss, Comp = Poss.
k | A_{j_1k} | Â_{j_1k} | a | m | b | â | m̂ | b̂ | e_1^Poss
7 | A1 | A1 | 55.75 | 84.55 | 141.08 | 52.85 | 78.42 | 134.07 | 0.0724
8 | A2 | A1 | 66.8 | 88.3 | 142.81 | 52.77 | 78.36 | 133.94 | 0.1288
9 | A1 | A1 | 63.6 | 79.85 | 147.03 | 52.99 | 78.55 | 134.34 | 0.0178
Fig. 5. First-phase and second-phase granules.
Table 7. Forecasting results for the change of amplitude, Desc = Poss, Comp = Poss.
k | ∂A_{j_2k} | ∂Â_{j_2k} | ∂a | ∂m | ∂b | ∂â | ∂m̂ | ∂b̂ | e_2^Poss
7 | A2 | A2 | −27.2 | 2.7 | 24.29 | −19.4 | 3.05 | 25.45 | 0.0081
8 | A2 | A2 | −26.34 | −0.1 | 24 | −19.35 | 3.05 | 25.44 | 0.0679
9 | A2 | A2 | −17.98 | 5.5 | 23.86 | −19.39 | 3.05 | 25.45 | 0.0535
Table 8. Description of {GX} for Desc = Match.
k | A1: λ_11(k) | A2: λ_12(k) | ∂A1: λ_21(k) | ∂A2: λ_22(k)
1 | 0.01 | 0.54 | 0.79 | 0.3
2 | 0.19 | 0.78 | 0.71 | 0.31
3 | 0.43 | 0.38 | 0.35 | 0.91
4 | 0.77 | 0.11 | 0.33 | 0.75
5 | 0.62 | 0.13 | 0.58 | 0.46
6 | 0.85 | 0.13 | 0.37 | 0.76
Fig. 6. Second phase of granulation.
Table 5. Description of {GX} for Desc = Poss.
k | A1: λ_11(k) | A2: λ_12(k) | ∂A1: λ_21(k) | ∂A2: λ_22(k)
1 | 1 | 0.91 | 0.96 | 0.96
2 | 0.78 | 0.97 | 0.95 | 0.95
3 | 0.93 | 0.82 | 0.97 | 0.97
4 | 0.98 | 0.78 | 0.97 | 0.97
5 | 0.94 | 1 | 0.9 | 0.93
6 | 0.99 | 0.76 | 0.9 | 0.93
Fig. 7. Performance indexes during the learning of FCM .
The results are provided in Table 5. As can be easily noted, the possibility function does not discriminate well between the granules ∂A1 and ∂A2. The values of λ_21(k) and λ_22(k) are frequently the same for both granules; also, the differences between λ_11(k) and λ_12(k) are not very high. Using the data from Table 5, we learned FCM1 and FCM2 by means of the genetic algorithm. Fig. 7 depicts the variability of the performance indexes Q_1 and Q_2 during the learning of both FCMs. The experiment was performed assuming that the cardinality of the initial population was equal to 20. As can be noted, 30 iterations were enough to stabilize both indexes. The forecasting results for the amplitude (FCM1) are given in Table 6. We assumed rounding to 10^−2 for the presentation of data, and to 10^−4 for the forecasting errors. The columns in Table 6 are
Table 9. Forecasting results for the amplitude, Desc = Match, Comp = Poss.
k | A_{j_1k} | Â_{j_1k} | a | m | b | â | m̂ | b̂ | e_1^Poss
7 | A1 | A1 | 55.75 | 84.55 | 141.08 | 55.31 | 80.61 | 138.63 | 0.0455
8 | A1 | A1 | 66.8 | 88.3 | 142.81 | 55.16 | 80.47 | 138.36 | 0.0986
9 | A1 | A1 | 63.6 | 79.85 | 147.03 | 55.21 | 80.51 | 138.43 | 0.0071
denoted as follows: k is the number of the interval, i.e., the time stamp of the granular time series; A_{j_1k} and Â_{j_1k} are the actual and predicted second-phase granules; a, m, b denote the actual lower bound, median, and upper bound of the first-phase granular time series, respectively (copied from Table 3); â, m̂, b̂ are the corresponding forecasts obtained after degranulation; and e_1^Poss denotes the forecasting error obtained at the first phase of granulation, calculated using the possibility function. Note that in Table 6 only a single forecasting error occurred at the second level of granulation, for k = 8. For the testing period k ∈ [7, 9], the error rate is calculated as G2e_1 = 1/3 using formula (47). At the first phase of granulation, the error rate is calculated as G1e_1^Poss = (0.0724 + 0.1288 + 0.0178)/3, or G1e_1^Poss = 0.073, using formula (49). The results obtained for the change of amplitude are given in Table 7. In this case, according to formula (50), the error rate is G1e_2^Poss = (0.0081 + 0.0679 + 0.0535)/3, i.e., G1e_2^Poss = 0.0432. For the second option, we used the proposed matching function Desc = Match for the description of the learning part of {GX}. The results are shown in Table 8. As can be noted, the discriminative capability of the function Match(·) is substantially better. For every kth interval, we marked in bold the concepts A_j and ∂A_j for which the matching with the first-phase granules was the highest. They constitute the sequences of the second-phase granules, i.e., the second-phase granular time series {G²X} (defined by formula (36)). For the following experiment we assumed the description function Desc = Match; but for comparative purposes, the forecasting errors were calculated in the same way as before by assuming Comp = Poss. The results are given in Table 9. In this case, we calculate the forecasting error rate as G1e_1^Poss = (0.0455 + 0.0986 + 0.0071)/3, i.e., G1e_1^Poss = 0.0504.
Table 10. Forecasting results for the change of amplitude, Desc = Match, Comp = Poss.
k | ∂A_{j_2k} | ∂Â_{j_2k} | ∂a | ∂m | ∂b | ∂â | ∂m̂ | ∂b̂ | e_2^Poss
7 | A2 | A2 | −27.2 | 2.7 | 24.29 | −21.17 | 3.05 | 25.66 | 0.0079
8 | A2 | A2 | −26.34 | −0.1 | 24 | −21.16 | 3.05 | 25.66 | 0.0654
9 | A1 | A2 | −17.98 | 5.5 | 23.86 | −21.11 | 3.05 | 25.65 | 0.0532
Fig. 8. Example FCMs.
Table 11. Forecasting results for the amplitude, Desc = Match, Comp = Match.
k | A_{j_1k} | Â_{j_1k} | a | m | b | â | m̂ | b̂ | e_1^Match
7 | A1 | A1 | 55.75 | 84.55 | 141.08 | 55.31 | 80.61 | 138.63 | 0.1305
8 | A1 | A1 | 66.8 | 88.3 | 142.81 | 55.16 | 80.47 | 138.36 | 0.4408
9 | A1 | A1 | 63.6 | 79.85 | 147.03 | 55.21 | 80.51 | 138.43 | 0.3032
Table 12. Forecasting results for the change of amplitude, Desc = Match, Comp = Match.
k | ∂A_{j_2k} | ∂Â_{j_2k} | ∂a | ∂m | ∂b | ∂â | ∂m̂ | ∂b̂ | e_2^Match
7 | A2 | A2 | −27.2 | 2.7 | 24.29 | −21.17 | 3.05 | 25.66 | 0.4116
8 | A2 | A2 | −26.34 | −0.1 | 24 | −21.16 | 3.05 | 25.66 | 0.4369
9 | A1 | A2 | −17.98 | 5.5 | 23.86 | −21.11 | 3.05 | 25.65 | 0.3472
As can be noted, the forecasting errors given in Table 9 are always lower than those in Table 6. A comparison of the results seen in Tables 6 and 9 clearly shows the advantage of using Desc = Match instead of Desc = Poss. A comparison of Tables 7 and 10 leads to the conclusion that the forecasting errors measured in terms of Comp = Poss are slightly lower when using Desc = Match than when assuming Desc = Poss. Similar comparisons performed for all datasets considered in this paper confirmed the advantage of using Desc = Match instead of Desc = Poss. For this reason, we decided to use Desc = Match and Comp = Match for the descriptive purpose and the calculation of errors, respectively. In the following, we show the results obtained with this setting. The obtained FCMs are shown in Figs. 8a and 8b for the amplitude and the change of amplitude, respectively. The linguistic terms are assigned to the granules, enabling the interpretation of the obtained FCMs. The negative values of the weights suggest that an increase in the activation of the concept A1 causes a decrease in the activation of concept A2, which is in accordance with the intuitive expectation. The same is true for the opposite influence. The results of the forecasting are provided in Tables 11 and 12. Due to using the function Match(·) for the comparison of the predicted and actual granules, the obtained e_1^Match and e_2^Match cannot be compared with those obtained previously. In Fig. 9 we present the interpretation of the results obtained for the interval k = 7 at the second phase of granulation. The obtained forecast is illustrated as a dot in the two-dimensional space of amplitude and the change of amplitude.
6. Experimental studies
The objective of the experiments was to check the forecasting accuracy achieved by the proposed method at both phases of granularity.
Fig. 9. Interpretation of the forecast at the conceptual level.

6. Experimental studies

The objective of the experiments was to check the forecasting accuracy achieved by the proposed method at both phases of granularity. We set up the required parameters before starting the experiments.

Table 13
Parameters of the genetic algorithm.

Description                                       Value
initial chromosomes (FCM's weights)               random
cardinality of the initial population             30
maximal number of generations                     100
number of generations without any improvement     10
probability of mutation                           0.1
probability of crossover                          0.8
elite (for the elite selection)                   20%
6.1. Experimental setup

As with the previous example, in all further experiments we used a genetic algorithm for learning the FCM. We applied elite selection, standard mutation, and single-point crossover. The values of the parameters of the genetic algorithm are listed in Table 13 and were adjusted on the basis of numerous trial-and-error runs; a minimal sketch of such a learning loop is given after the list of data sets below.

The experiments were performed using many time series exhibiting different characteristics. In the following, we present the results obtained for five of them, representing the different phenomena observed during the experiments. The selected time series are publicly available data sets from the repository https://datamarket.com. A list is given below.

1. Passenger miles (Mil) flown domestic U.K., Jul. 62–May 72 [40]. This data set was used previously in the numerical example described in Section 5. The length of the partitioning interval was selected according to the yearly seasonality as 12 months. We obtained 9 intervals.
2. Monthly water usage (ml/day), London, Ontario, 1966–1988 [41]. According to the recognized seasonality, the interval length is assumed to be 12 months, resulting in 23 intervals.
3. Daily minimum temperatures in Melbourne, Australia, 1981–1990 [42]. According to the recognized seasonality, the interval length is assumed to be 365 days, resulting in 10 intervals.
4. Number of daily births in Quebec, Jan. 01, 1977 to Dec. 31, 1990 [43]. The interval length is assumed to be 365 days, resulting in 14 intervals.
5. IBM common stock closing prices: daily, 29th June 1959 to 30th June 1960 [44]. The interval length is assumed to be 7 days, resulting in 36 intervals.
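The sketch below outlines the evolutionary learning loop described above, parameterized as in Table 13. The fitness function (mean squared one-step-ahead error of the FCM on a sequence of concept-activation vectors), the sigmoid transformation, and the re-draw mutation operator are simplifying assumptions made for this sketch rather than the exact definitions used in the paper.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * z))

def forecast_error(W, series):
    """Mean squared one-step-ahead error of an FCM with weight matrix W on a
    sequence of concept-activation vectors (one vector per row of `series`)."""
    pred = sigmoid(series[:-1] @ W.T)
    return float(np.mean((pred - series[1:]) ** 2))

def learn_fcm_ga(series, n_concepts, pop_size=30, max_gen=100, patience=10,
                 p_mut=0.1, p_cross=0.8, elite_frac=0.2):
    """Genetic learning of FCM weights: elite selection, single-point
    crossover, and standard (re-draw) mutation, with the stopping rules and
    probabilities of Table 13."""
    dim = n_concepts * n_concepts
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))   # random initial chromosomes
    n_elite = max(1, int(elite_frac * pop_size))
    best, best_err, stall = None, np.inf, 0

    for _ in range(max_gen):
        errs = np.array([forecast_error(c.reshape(n_concepts, n_concepts), series)
                         for c in pop])
        order = np.argsort(errs)
        if errs[order[0]] < best_err:
            best, best_err, stall = pop[order[0]].copy(), errs[order[0]], 0
        else:
            stall += 1
            if stall >= patience:            # no improvement for `patience` generations
                break

        elite = pop[order[:n_elite]]         # elite selection (top 20%)
        children = []
        while len(children) < pop_size - n_elite:
            p1, p2 = pop[rng.integers(pop_size, size=2)]
            child = p1.copy()
            if rng.random() < p_cross:       # single-point crossover
                cut = rng.integers(1, dim)
                child[cut:] = p2[cut:]
            mask = rng.random(dim) < p_mut   # standard mutation (re-draw genes)
            child[mask] = rng.uniform(-1.0, 1.0, size=int(mask.sum()))
            children.append(child)
        pop = np.vstack([elite, np.array(children)])

    return best.reshape(n_concepts, n_concepts), best_err

# Usage on a toy sequence of activation vectors for two concepts:
toy_activations = rng.uniform(0.0, 1.0, size=(9, 2))
W, err = learn_fcm_ga(toy_activations, n_concepts=2)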
Table 14
Forecasting errors for time series 1.

                     Number of clusters c
                     2      3      4      5      6
^{G2}E_1             0      0      0.67   0.68   0.69
^{G1}E_1^{Match}     0.46   0.29   0.32   0.32   0.34
^{G2}E_2             0.33   0.33   1      1      1
^{G1}E_2^{Match}     0.4    0.32   0.35   0.37   0.39
For all data sets, we partitioned the time series into 70% for learning and 30% for testing. To describe the first-phase granules by the second-phase granules, the proposed matching function Desc = Match was used. For the second phase of granulation (the conceptual phase), the forecasting accuracy was measured using formulas (47) and (48). For the first phase of granulation we used formulas (49) and (50) with Comp = Match for the calculation of forecasting errors. The main parameter of the performed experiments was the number of second-phase granules. This parameter was adjusted individually for every data set.

6.2. Results of experiments

The results obtained for data set 1, which was previously analyzed in Section 5 as the numerical example, are given in Table 14. Two types of the reported errors are related to the amplitude of the time series. The errors ^{G2}E_1 and ^{G1}E_1^{Match} were measured at the second and at the first phase of granulation, respectively. They are given in the first two rows of Table 14. Note that in the second phase of granulation, the forecasts of the amplitude are perfect for two and three clusters; in these cases we obtained ^{G2}E_1 = 0. When increasing the number of clusters to 4, the forecasting error rapidly grew to 0.67. A further increase in the number of clusters (c ≥ 4) led to only minor variability of the errors. A different situation can be observed in the first phase of granulation. The error ^{G1}E_1^{Match} decreased rapidly from 0.46 to 0.29 when going from two to three clusters. Then, it increased slightly but remained at a similar level of approximately 0.3 up to six clusters. Since there was no substantial variability of the errors observed for c > 6, we do not report those results here. It can be recognized in Table 14 that, when considering both phases of granulation and the forecasting of the amplitude, the number of clusters c = 3 can be recommended, for which the sum of errors was the lowest.

Regarding the variability of the forecasts with respect to different numbers of clusters, a similar situation can be observed for the change of amplitude (errors ^{G2}E_2 and ^{G1}E_2^{Match}). However, when taking into account the absolute values of the errors, the forecasts in the second phase of granulation are much worse. For c ≥ 4, the error ^{G2}E_2 = 1. This means that all forecasted and actual granules in the considered testing period were different. When considering both phases of granularity, the number of clusters c = 3 is recommended, which is the same as for the forecasts of the amplitude.

Our interpretation of the observed phenomenon is as follows. At the conceptual level, when the number of clusters is low and their generality is high, the forecasts are good. When the clusters become more specific, and the user tries to deal with more detailed data, the forecasting errors grow. At the numeric level, when the assumed number of clusters is low, the forecasting errors are high. A possible cause is that the calculation of the first-phase granule made by the degranulation is less precise. Also, the FCM forecasting model seems not to work well with a low number of concepts. Note that the forecasting errors decrease with an increase in the number of clusters, but, when the complexity of the model becomes higher, the errors begin to increase slightly.
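The selection of the recommended number of clusters can be reproduced directly from Table 14. The short sketch below simply sums the four error rates for every c and picks the minimizer; the variable names are ours.

# Selecting the number of clusters c: sum the four error rates reported in
# Table 14 for every c and choose the value of c with the lowest total.
errors = {   # c: (G2_E1, G1_E1_Match, G2_E2, G1_E2_Match), values from Table 14
    2: (0.00, 0.46, 0.33, 0.40),
    3: (0.00, 0.29, 0.33, 0.32),
    4: (0.67, 0.32, 1.00, 0.35),
    5: (0.68, 0.32, 1.00, 0.37),
    6: (0.69, 0.34, 1.00, 0.39),
}
best_c = min(errors, key=lambda c: sum(errors[c]))
print(best_c)   # 3, in line with the recommendation above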
Fig. 10. The time series 2–5 analyzed in the experiments.
Table 15
Forecasting errors for time series 2.

                     Number of clusters c
                     2      3      4      5      6
^{G2}E_1             0      0      0.64   0.64   0.65
^{G1}E_1^{Match}     0.47   0.30   0.30   0.32   0.34
^{G2}E_2             0.33   0.33   1      1      1
^{G1}E_2^{Match}     0.4    0.38   0.3    0.35   0.37
Apart from the first data set, which was previously used in our numerical example (Section 5), we report the results of similar experiments for the other four time series. These time series (2, 3, 4, and 5) are illustrated in Fig. 10. Time series 2 contains a yearly seasonal component (12 months) related to water usage. However, in comparison to the previously analyzed data set, it also contains a quite strong, long-term increasing trend. Note also that in this case the considered time series is longer, which makes the testing period longer as well. The results of the experiment are shown in Table 15. As can be observed, the obtained forecasting errors and their variability with respect to the number of clusters are very similar to those obtained previously. The conclusion that can be drawn from the experiment is that a stable trend added to the seasonality does not strongly influence the obtained forecasting errors. Note that the smallest error for the change of amplitude at the numeric level was obtained for c = 4. This makes the decision as to the selected number of clusters harder. However, when taking into account all the results shown in Table 15, we can recommend using c = 3, for which the sum of errors is the lowest.

Data set 3 also contains a yearly seasonal component related to daily temperatures (365 days). However, in comparison to time series 1, it contains a strong random component related to the daily time scale used in this case. In comparison to time series 2, we do not deal with trends. By assuming in the first phase of granulation that the partitioning interval contains data for 365 days, we intended to approximate the variability occurring on the yearly time scale.
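A rough sketch of this first-phase partitioning step is given below: a daily series is cut into windows of 365 observations and each window is summarized by a triple (lower bound, middle value, upper bound). The per-window min/median/max summary is only a crude stand-in used for illustration; the paper builds the first-phase granules with the principle of justifiable granularity, not with raw extrema, and the synthetic input series is purely illustrative.

import numpy as np

def partition_into_granules(series, interval_len=365):
    """Cut a numeric series into consecutive windows of `interval_len`
    observations and summarize each window by (min, median, max)."""
    granules = []
    for start in range(0, len(series) - interval_len + 1, interval_len):
        window = series[start:start + interval_len]
        granules.append((float(np.min(window)),
                         float(np.median(window)),
                         float(np.max(window))))
    return granules

# Synthetic stand-in for ten years of daily observations:
daily = np.random.default_rng(1).normal(11.0, 4.0, size=10 * 365)
print(len(partition_into_granules(daily)))   # 10 intervals, as for time series 3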
Table 16
Forecasting errors for time series 3.

                     Number of clusters c
                     2      3      4      5      6
^{G2}E_1             0.33   0.67   1      1      1
^{G1}E_1^{Match}     0.42   0.36   0.35   0.37   0.36
^{G2}E_2             0      1      1      1      1
^{G1}E_2^{Match}     0.20   0.17   0.14   0.14   0.14

Table 17
Forecasting errors for time series 4.

                     Number of clusters c
                     2      3      4      5      6
^{G2}E_1             0.75   0.5    0.5    0.75   1
^{G1}E_1^{Match}     0.46   0.44   0.47   0.48   0.49
^{G2}E_2             0.25   0.5    0.5    1      0.75
^{G1}E_2^{Match}     0.15   0.14   0.15   0.15   0.15

Table 18
Forecasting errors for time series 5.

                     Number of clusters c
                     2      3      4      5      6
^{G2}E_1             0.27   0.45   0.82   0.82   0.83
^{G1}E_1^{Match}     0.75   0.72   0.75   0.75   0.75
^{G2}E_2             0.09   0.44   0.65   0.82   0.91
^{G1}E_2^{Match}     0.61   0.59   0.57   0.58   0.58
The results of the experiment are given in Table 16. As can be noted, for the amplitude at the conceptual level, the results are worse than for the previous data sets. Note also that perfect forecasts of the amplitude at the conceptual level were no longer possible. Also at the numerical level, the forecasts of the amplitude were slightly worse than previously. For the change of amplitude at the conceptual level, the best number of clusters was c = 2, for which perfect forecasting was possible. At the numerical level, four clusters were preferred. In this case, substantially better forecasts were observed than in the previous experiments. This means that the differentiation of the time series helped to decrease errors in cases in which the time series contained a random component. The conclusion drawn from the experiment with time series 3 is that, in spite of the applied first-phase granulation, the existence of a random component in the numerical time series makes the forecasting harder. The existence of randomness also requires an increase in the number of clusters at the second phase of granulation when considering forecasting errors at the numerical level.

In the case of time series 4, both a cyclic component and a strong random component were recognized. An additional problem was that the existing cycle was not related to any particular, easily recognizable interval; in other words, the time series was too short to recognize any stable period of the cycle. For this reason, the length of the partitioning interval was assumed intuitively. We assumed that the forecasting of the number of births would be interesting in terms of their distribution over the following years; for this reason, we assumed the partitioning interval to be a year. As can be noted in Table 17, the forecasting errors reported for the amplitude are again a bit higher than previously. Surprisingly, for the change of amplitude, the errors obtained at the numerical level are satisfactory. Note that the values of ^{G1}E_2^{Match} do not change much with respect to the number of clusters. The reason is that the centers of the clusters are close to each other in terms of the matching function.

The last analyzed time series (time series 5) contains a slight random component and does not include cycles. We decided to partition this time series into weekly intervals according to the potential needs of investors in the stock exchange. Note in Fig. 10d that in its second part, part of which is used for testing, the time series starts to exhibit a growing trend. Such characteristics made the forecasting very hard, because the learning part of the time series does not contain any regularity that could be followed during testing. As can be noted in Table 18, the obtained results are the worst of all those previously analyzed.
6.3. Conclusions drawn from the experiments

The performed experiments provide evidence that the proposed method can be applied to the approximation and forecasting of time series. A time series containing a stable cyclic component is the most appropriate for the proposed approach. Also, adding a stable trend to the seasonality does not make the forecasting errors substantially higher. The proposed method is slightly less effective when randomness is involved in the time series, but it remains quite effective when forecasting the changes of the amplitude.

As can be observed in the results of the experiments, the most appropriate number of second-phase granules depends on the requirements of the particular application and on the characteristics of the time series. By selecting this number accordingly, the user can influence the degree of generalization of the time series and, consequently, the obtained forecasting errors.

7. Conclusions

A novel two-phase approach to the modeling and forecasting of time series was presented in this paper. We used the principle of justifiable granularity and Fuzzy C-Means clustering for the granular approximation of time series. In this way we found a compromise between specificity and generality when producing the approximation of the time series. In addition, after the granulation and the assignment of linguistic terms to the obtained granules, the time series can be easily interpreted by humans, which is not normally the case when analyzing raw data. The approximated time series were forecasted using fuzzy cognitive maps. In comparison to previously proposed granular models of time series, the advantage of our approach is the possibility of forecasting the time series at both levels of granularity. The performed experiments provide evidence that the proposed approach results in good forecasting accuracy at both levels of granulation of the time series.

A challenge for further research is the investigation of the granular approximation of time series. For different types of time series, alternative granules with different shapes and parameters of membership functions could be considered. Such granules can be compared in terms of their capability to approximate the time series and the forecasting errors they eventually produce. The matching function proposed in this paper can be used for the calculation of forecasting errors independently of the applied granules.

Acknowledgments

The work of Wojciech Froelich was supported by the ISS-EWATUS project, which has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619228.

References

[1] W. Froelich, J.L. Salmeron, Evolutionary learning of fuzzy grey cognitive maps for the forecasting of multivariate, interval-valued time series, Int. J. Approximate Reasoning 55 (5) (2014) 1319–1335.
[2] J. Thornton, R. Sturm, G. Kunkel, Water Loss Control, second ed., The McGraw-Hill Companies, Inc., 2008.
[3] J. Bougadis, K. Adamowski, R. Diduch, Short-term municipal water demand forecasting, Hydrol. Process. 19 (1) (2005) 137–148.
[4] J. Li, C. Huang, J. Qi, Y. Qian, W. Liu, Three-way cognitive concept learning via multi-granularity, Inf. Sci. (2016). http://dx.doi.org/10.1016/j.ins.2016.04.051.
[5] Q. Song, B.S. Chissom, Fuzzy time series and its models, Fuzzy Sets Syst. 54 (3) (1993) 269–277.
[6] W. Lu, J. Yang, X. Liu, W. Pedrycz, The modeling and prediction of time series based on synergy of high-order fuzzy cognitive map and fuzzy c-means clustering, Knowl.-Based Syst. 70 (0) (2014a) 242–255.
[7] W. Lu, J. Yang, X. Liu, Numerical prediction of time series based on FCMs with information granules, Int. J. Comput. Commun. 9 (3) (2014b) 313–324.
[8] W. Homenda, A. Jastrzebska, W. Pedrycz, Modeling time series with fuzzy cognitive maps, in: FUZZ-IEEE 2014, Beijing, China, 2014, pp. 2055–2062.
[9] J.L. Salmeron, W. Froelich, Dynamic optimization of fuzzy cognitive maps for time series forecasting, Knowl.-Based Syst. 105 (C) (2016) 29–37.
[10] P. Singh, A brief review of modeling approaches based on fuzzy time series, Int. J. Mach. Learn. Cybern. (2015) 1–24.
[11] P. Singh, Applications of Soft Computing in Time Series Forecasting: Simulation and Modeling Techniques, Springer International Publishing, pp. 11–39.
[12] W. Homenda, A. Jastrzebska, W. Pedrycz, Joining concept's based fuzzy cognitive map model with moving window technique for time series modeling, in: 13th International Conference, CISIM, Ho Chi Minh City, Vietnam, 2014a, pp. 397–408.
[13] W. Homenda, A. Jastrzebska, W. Pedrycz, Time series modeling with fuzzy cognitive maps: simplification strategies - the case of a posteriori removal of nodes and weights, in: 13th International Conference, CISIM, Ho Chi Minh City, Vietnam, 2014b, pp. 409–420.
[14] W. Lu, X. Chen, W. Pedrycz, X. Liu, J. Yang, Using interval information granules to improve forecasting in fuzzy time series, Int. J. Approx. Reasoning 57 (2015) 1–18.
[15] W. Pedrycz, The principle of justifiable granularity and an optimization of information granularity allocation as fundamentals of granular computing, JIPS 7 (3) (2011) 397–412.
[16] W. Pedrycz, W. Homenda, Building the fundamentals of granular computing: a principle of justifiable granularity, Appl. Soft Comput. 13 (10) (2013) 4209–4218.
[17] L. Wang, X. Liu, W. Pedrycz, Y. Shao, Determination of temporal information granules to improve forecasting in fuzzy time series, Expert Syst. Appl. 41 (6) (2014) 3134–3142.
[18] W. Wang, W. Pedrycz, X. Liu, Time series long-term forecasting model based on information granules and fuzzy clustering, Eng. Appl. AI 41 (2015) 17–24.
[19] R. Dong, W. Pedrycz, A granular time series approach to long-term forecasting and trend forecasting, Phys. A 387 (13) (2008) 3253–3270.
[20] R. Al-Hmouz, W. Pedrycz, A. Balamash, Description and prediction of time series: a general framework of granular computing, Expert Syst. Appl. 42 (10) (2015) 4830–4839.
[21] W. Lu, W. Pedrycz, X. Liu, J. Yang, P. Li, The modeling of time series based on fuzzy information granules, Expert Syst. Appl. 41 (8) (2014) 3799–3808.
[22] S.-M. Chen, Forecasting enrollments based on fuzzy time series, Fuzzy Sets Syst. 81 (3) (1996) 311–319.
[23] W. Pedrycz, W. Lu, X. Liu, W. Wang, L. Wang, Human-centric analysis and interpretation of time series: a perspective of granular computing, Soft Comput. 18 (12) (2014) 2397–2411.
[24] W. Pedrycz, W. Homenda, Building the fundamentals of granular computing: a principle of justifiable granularity, Appl. Soft Comput. 13 (10) (2013) 4209–4218.
[25] R.R. Yager, Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 94–113.
[26] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern. 3 (3) (1973) 32–57.
[27] B. Kosko, Fuzzy cognitive maps, Int. J. Man Mach. Stud. 24 (1) (1986) 65–75.
[28] J. Dickerson, B. Kosko, Virtual worlds as fuzzy cognitive maps, Presence 3 (2) (1994) 173–189.
[29] B. Kosko, Differential Hebbian learning, in: Neural Networks for Computing, American Institute of Physics, April 1986, pp. 277–282.
[30] W. Froelich, A. Wakulicz-Deja, Mining temporal medical data using adaptive fuzzy cognitive maps, in: Human System Interactions, 2009. HSI'09. 2nd Conference on, IEEE, 2009, pp. 16–23.
[31] A.V. Huerga, A balanced differential learning algorithm in fuzzy cognitive maps, in: Proceedings of the 16th International Workshop on Qualitative Reasoning, 2002, pp. 1–7.
[32] E. Papageorgiou, C.D. Stylios, P.P. Groumpos, Active Hebbian learning algorithm to train fuzzy cognitive maps, Int. J. Approximate Reasoning 37 (3) (2004) 219–249.
[33] E.I. Papageorgiou, C.D. Stylios, P.P. Groumpos, Fuzzy cognitive map learning based on nonlinear Hebbian rule, in: Australian Conference on Artificial Intelligence, 2003, pp. 256–268.
[34] W. Stach, L. Kurgan, W. Pedrycz, M. Reformat, Genetic learning of fuzzy cognitive maps, Fuzzy Sets Syst. 153 (3) (2005) 371–401.
[35] E.I. Papageorgiou, K.E. Parsopoulos, C.D. Stylios, P.P. Groumpos, M.N. Vrahatis, Fuzzy cognitive maps learning using particle swarm optimization, J. Intell. Inf. Syst. 25 (2005) 95–121.
[36] S. Alizadeh, M. Ghazanfari, Learning FCM by chaotic simulated annealing, Chaos, Solitons Fractals 41 (3) (2009) 1182–1190.
[37] P. Juszczuk, W. Froelich, Learning fuzzy cognitive maps using a differential evolution algorithm, Polish J. Environ. Stud. 12 (3B) (2009) 108–112.
[38] W. Froelich, P. Juszczuk, Predictive capabilities of adaptive and evolutionary fuzzy cognitive maps - a comparative study, in: N.T. Nguyen, E. Szczerbicki (Eds.), Intelligent Systems for Knowledge Management, Vol. 252 of Studies in Computational Intelligence, Springer, 2009, pp. 153–174.
[39] G.A. Papakostas, D.E. Koulouriotis, A.S. Polydoros, V.D. Tourassis, Towards Hebbian learning of fuzzy cognitive maps in pattern classification problems, Expert Syst. Appl. 39 (12) (2012) 10620–10629.
[40] O. Anderson, Passenger miles (Mil) flown domestic U.K. Jul. 62–May 72, 1976.
[41] Hipel, McLeod, Monthly water usage (ml/day), London, Ontario, 1966–1988, 1994.
[42] L.A. Gil-Alana, Long memory behaviour in the daily maximum and minimum temperatures in Melbourne, Australia, Meteorol. Appl. 11 (4) (2004) 319–328.
[43] Hipel, McLeod, Number of daily births in Quebec, Jan. 01, 1977 to Dec. 31, 1990, 1994.
[44] B. Jenkins, IBM common stock closing prices: daily, 29th June 1959 to 30th June 1960, 1976.