A novel intelligent approach for state space evolving forecasting of seasonal time series

Engineering Applications of Artificial Intelligence 64 (2017) 272–285

Selmo Eduardo Rodrigues Júnior^a, Ginalber Luiz de Oliveira Serra^b,*

^a Federal University of Maranhão, Av. dos Portugueses, 1966, Bacanga, CEP: 65080-805, São Luís-MA, Brazil
^b Federal Institute of Education, Sciences and Technology, Av. Getúlio Vargas, 04, Monte Castelo, CEP: 65030-005, São Luís-MA, Brazil

Keywords: Seasonal time series; Forecasting; Unobservable components; Evolving systems; Neuro-fuzzy modelling; Singular spectrum analysis

Abstract

This paper proposes a new modelling methodology based on an evolving Neuro-Fuzzy Takagi–Sugeno (NF-TS) network for seasonal time series forecasting. The NF-TS considers the unobservable components extracted from the time series data to evolve, that is, to adapt and adjust its structure, where the number of fuzzy rules of this network can increase or decrease according to the components' behaviour. The method used to extract these components is a recursive version, proposed in this paper, of the Singular Spectrum Analysis (SSA) technique. The NF-TS network adopts the divide-and-conquer principle: it splits a complex problem into subproblems that are easier to handle, forecasting each unobservable component separately, because the components present dynamic behaviours that are simpler to forecast. The consequent propositions of the fuzzy rules are linear state space models, whose states are the unobservable components data. When observations of the time series are available, the training stage of the NF-TS is performed, i.e., the NF-TS evolves its structure and adapts its parameters to carry out the mapping between the components data and the available sample of the original time series. On the other hand, if such an observation is not available, the network enters the forecasting stage, keeping its structure fixed and using the states of the consequent fuzzy rules to feed the unobservable components data back to the NF-TS. The NF-TS was evaluated and compared with other recent and traditional techniques for seasonal time series forecasting, obtaining competitive and advantageous results in relation to other published works. This paper also presents a case study on real-time detection of anomalies based on a patient's electrocardiogram data. © 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Time series forecasting is an important study theme in several fields, such as physics, economics, medicine, and engineering. Time series forecasting methods are important tools to help decision making and the planning of preventive actions by experts. These forecasting methods seek to identify patterns in historical time series data such that a model can be developed to project the future temporal patterns of the considered series (Brockwell and Davis, 2002; Abdollahzade et al., 2015). Besides forecasting, experts may also be interested in the characterization of a time series, i.e., identifying its unobservable components or patterns in the data. These components can have features that are important to understand the hidden behaviour of the time series and to improve the forecasting results (Abdollahzade et al., 2015). The amount and type of unobservable components depend on the method used to extract them from the time series data (Brockwell and Davis, 2002; Young, 2011; Harvey and Koopman, 2009). Examples of unobservable components are trend, seasonality, cyclic pattern, level, impulse, and other types. In particular, this paper is interested in the multi-step-ahead forecasting and characterization of seasonal time series. Empirically, seasonal series have patterns that occur regularly over a period of time that can be monthly, quarterly, annual, and so on (Brockwell and Davis, 2002; Štěpnička et al., 2013).

Currently, there is a large volume of data in various formats, generated at high speed. Thus, experts need to deal with an increasing amount of information efficiently, extracting useful knowledge for modelling complex systems. In addition, real data are usually complex, nonlinear, and nonstationary. Therefore, researchers are challenged to develop algorithms that handle large amounts of data, are online and run in real time, adapt to changes in the environment, are computationally efficient, and preserve the transparency of the model (Angelov, 2013). These challenges also involve time series analysis, and the proposed methodology is included in this context.

* Corresponding author.

E-mail addresses: [email protected] (S.E. Rodrigues Júnior), [email protected] (G.L.O. Serra). http://dx.doi.org/10.1016/j.engappai.2017.06.016 Received 26 December 2016; Received in revised form 2 May 2017; Accepted 21 June 2017 0952-1976/© 2017 Elsevier Ltd. All rights reserved.


Because of these current requirements, the paradigm of evolving systems has recently emerged. This paradigm is based on the concept of system structure modification, i.e., the system is able to adapt to changes in the environment, modifying its internal architecture and adjusting its parameters. Evolving models are applied in situations where purely adaptive models are not sufficient to represent systems (Lughofer, 2011). It is important that evolving-system algorithms be online and work recursively. In addition, these systems are able to extract knowledge from the data samples at each instant (Abdollahzade et al., 2015; Angelov, 2013; Lughofer, 2011). Therefore, there is a need to develop a methodology that satisfies these current requirements and challenges in the time series area. In this context, the main motivation of this paper is the need for a methodology that deals with strict time requirements and with the amount of data arriving at every instant, using the promising theory of evolving systems, and considering seasonal time series, which are very important in practice. Another important motivation is the gap that exists in the time series domain regarding methods that address both objectives recursively: forecasting and characterization. Hence, this paper focuses on the development of a methodology that provides important knowledge to support decision-making by experts.

1.1. State-of-the-art

To deal with the nonstationary behaviour, uncertainties, unobservable components, and other complexities of real time series, methods for time series forecasting based on computational intelligence techniques have been widely investigated in the scientific community, mainly fuzzy, neuro-fuzzy, and neural network approaches (Siminski, 2017; Papageorgiou and Poczetac, 2017; Bodyanskiy et al., 2017; Bodyanskiy and Vynokurova, 2013; Li et al., 2013; Chai and Lim, 2016; Li and Hu, 2012; Singh and Borah, 2013; Peng et al., 2015; Xiong et al., 2015). A modified Takagi–Sugeno–Kang fuzzy neural system, entitled Locally Recurrent Neuro-Fuzzy Forecasting System (LR-NFFS), is presented in Mastorocostas and Hilas (2012), where the consequent parts of the fuzzy rules are neural networks with an internal recurrence, introducing dynamics to the overall system. The LR-NFFS is used for telecommunications time series forecasting. In Abdollahzade et al. (2015), a hybrid method for nonlinear and chaotic time series forecasting based on a local linear neuro-fuzzy model and optimized singular spectrum analysis is developed. This hybrid method is applied to forecast several well-known time series with different structures and characteristics. A hybrid evolutionary system composed of a simple exponential smoothing filter, ARIMA and autoregressive (AR) linear models, and a Support Vector Regression (SVR) model is applied for time series forecasting in de Oliveira and Ludermir (2016). Particle swarm optimization is employed to optimize the order of the AR model, the SVR parameters, and the number of time series lags. In Cui et al. (2015), the authors propose a novel hybrid network model, Complex Rotation Quantum Dynamic Neural Networks (CRQDNN), applied to chaotic time series prediction and electronic remaining-useful-life prediction.

New methodologies are also being developed for seasonal time series forecasting (Zhang and Qi, 2005). In Štěpnička et al. (2013), the researchers introduce novel methods based on computational intelligence techniques for multi-step seasonal time series forecasting, for example, evolutionary artificial neural networks, support vector machines, and linguistic fuzzy rules. The most important contribution of Štěpnička et al. (2013) is the introduction of a new hybrid combination using linguistic fuzzy rules and the other computational intelligence methods presented. In Gan et al. (2014), a quasi-linear autoregressive model, which belongs to a class of varying-coefficient models whose autoregressive coefficients are constructed by radial basis function networks, is applied for seasonal and trend time series forecasting.

Much research on time series forecasting considers machine learning concepts, highlighting the unsupervised learning method called clustering (Lapidot et al., 2017; Homenda and Jastrzebska, 2017; Rodriguez-Fernandez et al., 2017; Huang et al., 2016). A forecasting model based on a modified fuzzy c-means and information granulation, which does not require that data have the same dimensionality, is proposed in Wang et al. (2015) for long-term time series prediction. In Wu and Lee (2015), a local modelling strategy is studied, investigating the effectiveness of local modelling with three popular machine-learning-based forecasting methods, Neural Network, Adaptive Neuro-Fuzzy Inference System, and Least Squares Support Vector Machine, applied to time series prediction. An improved extreme learning machine for online sequential prediction of multivariate time series is presented in Wang and Han (2015); on the basis of the specific network function of the extreme learning machine, an improved Levenberg–Marquardt algorithm, in which the Hessian matrix and gradient vector are calculated iteratively, is developed to implement online sequential prediction of multivariate time series.

We can also highlight, in the state of the art, the use of evolving neuro-fuzzy networks, which adapt their structure to the data (Angelov, 2013; Lughofer, 2011; Lughofer and Angelov, 2011; Lughofer, 2013; Rocha and Serra, 2017; Lemos et al., 2011). In Komijani et al. (2012), an eTS-LSSVM algorithm (evolving Takagi–Sugeno Least Squares Support Vector Machine) applied to time series forecasting is presented; its main contribution is addressing nonlinear local models based on least squares support vector machines as the consequents of the fuzzy rules. In Birek et al. (2014), a modified evolving fuzzy Takagi–Sugeno (eTS) algorithm applied to a leakage forecasting problem is described. The eTS groups the data into several clusters based on the Euclidean distance between the relevant independent variables. The Mod eTS algorithm, which incorporates a modified dynamic update of cluster radii while accommodating newly available data, is then proposed. The created clusters serve as a base for fuzzy If-Then rules with Gaussian membership functions, which are defined using the cluster centres and have linear functions in the consequent. An evolving fuzzy neural network (eFNN) predictor is proposed in Li et al. (2014) to extract representative information from multi-dimensional data sets for more accurate system state forecasting. The proposed predictor possesses online learning capability and can address nonstationary properties of data sets. In Júnior and Serra (2016), an algorithm for nonstationary and seasonal time series forecasting, with an evolving Neuro-Fuzzy Takagi–Sugeno (NF-TS) structure, is proposed. For this algorithm, the NF-TS inputs are unobservable patterns extracted from the time series by the Holt–Winters method optimized by Particle Swarm Optimization.

1.2. Originality and contributions

In this paper, an evolving Neuro-Fuzzy Takagi–Sugeno (NF-TS) modelling approach applied to multi-step forecasting and characterization of seasonal time series is proposed. The unobservable components are extracted from the time series through a recursive version, proposed in this paper, of the Singular Spectrum Analysis (SSA) method (Golyandina et al., 2001). This new approach is named Recursive SSA (RSSA). The NF-TS uses the unobservable components data to adapt and evolve its structure based on fuzzy rules (adding or removing them) in order to forecast future observations of the time series. First, the forecasting is performed for each unobservable component separately. Next, the forecasting results for each component are combined to compose the future value of the original time series. The consequent propositions of the NF-TS fuzzy rules are equations in the state space, whose states are formed by the extracted components, representing the dynamic behaviour of the time series components. The originality and main contributions of the proposed methodology are the following:

∙ Evolving forecasting methodology for seasonal time series, multiple steps ahead;
∙ Use of unobservable components data to forecast the time series;


Fig. 1. Structure of the neuro-fuzzy Takagi–Sugeno network.

∙ Forecasting of each unobservable component separately;
∙ Consequent propositions of fuzzy rules as state space models, whose states are formed by the extracted components;
∙ Recursive SSA algorithm for seasonal time series characterization;
∙ Inclusion of a new condition for quality evaluation of clusters, considering seasonal time series (Condition C, Eq. (33)).

1.3. Paper organization

The paper is organized as follows. In Section 2, the description of each layer of the evolving NF-TS is performed, showing its characteristics, mathematical formalism, and objectives. The experimental results of the proposed methodology, considering a benchmark with real time series, are presented in Section 3; a case study using the NF-TS for the problem of anomaly detection in electrocardiogram data is presented in Section 3 as well. Finally, the final remarks are given in Section 4.

2. Methodology: functional formulation

The proposed NF-TS network is divided into six layers, as shown in Fig. 1. The description of each layer is performed in the following subsections.

2.1. Structural description of NF-TS network

2.1.1. Layer 1: extraction of unobservable components

The purpose of this layer is to extract the unobservable components or hidden patterns from the time series data. These patterns are simpler than the original time series and are very helpful for experts to understand the series behaviour. Complex time series can exhibit components or patterns of several natures. The chosen decomposition or extraction method depends on the time series features and the experts' goals (Abdollahzade et al., 2015; Harvey and Koopman, 2009; de Oliveira and Ludermir, 2016).

Many approaches can be used to perform this extraction or decomposition task. We can highlight the large class of exponential smoothing methods, which are easy to implement and have reasonable accuracy, computational efficiency, and applications in several papers (Júnior and Serra, 2016; Junior, 2006; Yang et al., 2015; Wu et al., 2016; Bergmeir et al., 2016). Another method is the X-11 procedure developed by the U.S. Bureau of the Census, based on iterative estimation of time series components using moving averages (Deng, 2014). Similarly, the Seasonal and Trend decomposition using Loess (STL) is another method in the literature that decomposes the trend and seasonal patterns (Bergmeir et al., 2016; Deng, 2014; Theodosiou, 2011).

In the proposed methodology, the Singular Spectrum Analysis (SSA) technique is explored. A detailed description of the SSA algorithm is given in Abdollahzade et al. (2015), Golyandina et al. (2001), Deng (2014) and Hassani and Mahmoudvand (2013). The SSA method draws on concepts from classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems, and signal processing. Nevertheless, SSA does not make any statistical assumptions. Compared to other methods, SSA is not restricted to predetermined components such as trend, seasonal, and cycle patterns; the component types are determined from the time series data itself. In addition, the expert can select the number of components or patterns to be extracted. However, this method operates in batch mode, while the methodology proposed in this paper requires a recursive solution for the decomposition task. Because of this limitation, a modification of the original SSA method is proposed in this paper. The modified SSA performs a recursive update of the covariance matrix and extracts the unobservable components data at each instant k. This proposed procedure is called RSSA (Recursive Singular Spectrum Analysis); it is summarized in Fig. 2 and fully described below.

Initial conditions. In this step, the user defines some initial parameters of the proposed methodology, and the initial covariance matrix of SSA is formulated. Three parameters are determined initially by the user:

∙ W: initial amount of time series samples considered;
∙ L: dimension of the lagged vectors, for 1 < L < W/2;
∙ n: number of unobservable components extracted.

These parameter values are critical factors that influence the performance of the proposed methodology and should be properly chosen. The initial conditions of the RSSA approach require W time series samples. Thus, let y_k be a time series sample at instant k.
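The initial-conditions step (Eqs. (1)-(5)) amounts to stacking lagged windows of the series into a Hankel trajectory matrix and forming its covariance. A minimal sketch in Python/NumPy, with our own (hypothetical) function and variable names rather than the authors' reference implementation:

```python
import numpy as np

def rssa_initial_conditions(y, L):
    """Build the trajectory (Hankel) matrix M (Eqs. (2)-(4)) and the
    initial covariance matrix S_W = M M^T (Eq. (5)) from the first W
    samples of a series y."""
    y = np.asarray(y, dtype=float)
    W = len(y)                    # initial amount of samples, parameter W
    K_W = W - L + 1               # number of lagged vectors, Eq. (2)
    # column i is the lagged vector m_i = [y_i ... y_{i+L-1}]^T
    M = np.column_stack([y[i:i + L] for i in range(K_W)])  # L x K_W
    S_W = M @ M.T                 # Eq. (5), fixed L x L dimension
    return M, S_W

# toy seasonal series with W = 24 monthly samples
y = np.sin(2 * np.pi * np.arange(24) / 12)
M, S_W = rssa_initial_conditions(y, L=6)
assert M.shape == (6, 19) and S_W.shape == (6, 6)
# Hankel property: ascending anti-diagonals of M are constant
assert np.isclose(M[1, 0], M[0, 1])
```

Note that S_W has the fixed L x L size regardless of how long the series later grows, which is the point of the recursive variant described next in the text.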


Fig. 2. Cycle of RSSA method for time series decomposition.

An interval 𝒜 with W time series samples is represented by:

𝒜 = { y_k ∈ R | k = 1, …, W }.  (1)

Next, K_W lagged vectors m_i with dimension L are created using the data contained in the interval 𝒜:

m_i = [y_i ⋯ y_{i+L−1}]^T,  m_i ∈ R^{L×1},  for K_W = W − L + 1 and i = 1, 2, …, K_W,  (2)

where K_W is associated with the time series sample y_W. In the generic case, K_k is associated with a time series sample y_k; the value of K_k is incremented at each instant by the RSSA step described in the next subsection. The trajectory matrix M containing the time series data from the interval 𝒜 is defined by:

M = [m_1 m_2 ⋯ m_{K_W}],  M ∈ R^{L×K_W};  (3)

in other words, the trajectory matrix is:

M = ⎡ y_1  y_2      ⋯  y_{K_W}   ⎤
    ⎢ y_2  y_3      ⋯  y_{K_W+1} ⎥
    ⎢  ⋮    ⋮       ⋱   ⋮        ⎥
    ⎣ y_L  y_{L+1}  ⋯  y_W       ⎦.  (4)

Note that M is a Hankel matrix, that is, the values along each ascending diagonal from left to right are equal. Finally, the initial covariance matrix considering the W time series samples is computed by:

S_W = M M^T,  S_W ∈ R^{L×L}.  (5)

After the initial conditions, the RSSA cycle begins (Fig. 2), and the following RSSA steps are repeated for every instant k = W+1, W+2, …. The covariance matrix is updated recursively in the next step.

Recursive update of the covariance matrix. In the original SSA method, all historical time series data are contained in the trajectory matrix M, with dimension L × K_k. With the arrival of a new time series sample and the update of K_k, this matrix would need to change its dimension to incorporate the new sample, which is computationally undesirable. In the proposed approach, all the historical information from the time series is instead stored in the covariance matrix S_k, whose dimension is fixed (L × L) and which is updated recursively for every k. For each time series sample y_k arriving at instants k = W+1, W+2, …, the value of K_k is incremented:

K_k = K_{k−1} + 1,  or  K_k = k − L + 1.  (6)

To update S_k recursively, the following vector is needed:

ω_k = [y_{K_k} y_{K_k+1} ⋯ y_k]^T,  ω_k ∈ R^{L×1}.  (7)

Therefore:

S_k = S_{k−1} + Q_k,  S_k ∈ R^{L×L},  where  Q_k = ω_k ω_k^T,  Q_k ∈ R^{L×L}.  (8)

The next step performs the Singular Value Decomposition (SVD) of this covariance matrix S_k.

Singular value decomposition (SVD). The SVD step begins with the computation of the eigenvalues of the covariance matrix S_k obtained in Eq. (8). These eigenvalues are denoted in decreasing order by (Abdollahzade et al., 2015; Golyandina et al., 2001):

ρ^1_k ≥ ρ^2_k ≥ ρ^3_k ≥ ⋯ ≥ ρ^L_k ≥ 0.  (9)

Let p^1_k, p^2_k, p^3_k, …, p^L_k be the orthonormal system of eigenvectors of the matrix S_k (left-singular vectors) corresponding to the eigenvalues in Eq. (9), and consider d = max{ i : ρ^i_k > 0 }. The SVD of the time series sample y_k can be represented by:

y_k = θ^1_k + θ^2_k + ⋯ + θ^d_k,  θ^i_k = p̃^i_k ω_k^T p^i_k,  for i = 1, …, d,  (10)

where p̃^i_k is the L-th (last) element of the eigenvector p^i_k, since only this element is required to decompose the sample y_k in the proposed approach.

Grouping of RSSA. The function of this step is to distribute the set of indices {1, …, d} generated previously into n disjoint subsets I_1, …, I_n (Abdollahzade et al., 2015; Golyandina et al., 2001), where n is the number of unobservable components to be extracted by the RSSA method, defined by the expert in the initial conditions. Let I = {i_1, …, i_q} be a group or set of indices. The term θ^I_k associated with the set I is defined as θ^I_k = θ^{i_1}_k + θ^{i_2}_k + ⋯ + θ^{i_q}_k. This procedure is performed for each group I = I_1, …, I_n, resulting in the following expression:
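One pass of the RSSA cycle (Eqs. (7)-(10)) can be sketched as follows. Because the eigenvectors of S_k form an orthonormal basis, the terms θ^i_k must sum exactly to the current sample y_k, which the sketch checks. Function and variable names are ours, not the authors':

```python
import numpy as np

def rssa_step(S_prev, window):
    """One RSSA cycle for the newest sample y_k.

    `window` is the lagged vector w_k = [y_{K_k} ... y_k]^T of length L
    (Eq. (7)). Returns the updated covariance matrix S_k (Eq. (8)) and
    the decomposition terms theta_k^i of Eq. (10)."""
    w = np.asarray(window, dtype=float)
    S_k = S_prev + np.outer(w, w)        # Eq. (8): S_k = S_{k-1} + w w^T
    rho, P = np.linalg.eigh(S_k)         # eigenpairs of the symmetric S_k
    P = P[:, np.argsort(rho)[::-1]]      # Eq. (9): decreasing eigenvalues
    # Eq. (10): theta_k^i = p~_k^i (w^T p_k^i), p~ being the last element
    theta = P[-1, :] * (w @ P)
    return S_k, theta

L = 6
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * np.arange(40) / 12) + 0.1 * rng.standard_normal(40)
S = np.zeros((L, L))                     # simplified start; the paper uses S_W
for k in range(L - 1, len(y)):
    w_k = y[k - L + 1:k + 1]             # lagged vector ending at y_k
    S, theta = rssa_step(S, w_k)
    # the terms of the decomposition sum exactly to the current sample
    assert np.isclose(theta.sum(), y[k])
```

Grouping the entries of `theta` into the n index subsets I_1, ..., I_n then yields the component samples of Eqs. (11)-(12).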


y_k = θ^{I_1}_k + θ^{I_2}_k + ⋯ + θ^{I_n}_k.  (11)

Unobservable components determination. The last step of this approach converts each term of the grouped decomposition in Eq. (11) into an unobservable component sample at instant k:

c^j_k = θ^{I_j}_k,  for j = 1, …, n,  (12)

so that

y_k = c^1_k + c^2_k + ⋯ + c^n_k,

where j = 1, …, n indexes the unobservable component or hidden pattern extracted from the time series sample y_k. Note that the sum of these components is exactly equal to the original time series sample. Moreover, the unobservable components are linearly independent of each other (Golyandina et al., 2001). The following algorithms present the procedure of the first layer; Alg. 1 displays the initial conditions of RSSA.

Algorithm 1 Initial Conditions of RSSA
1: Set W, n, and L values;
2: Initial interval 𝒜 (Eq. (1));
3: Compute K_W = W − L + 1;
4: for i = 1 to K_W do
5:   Obtain lagged vectors m_i (Eq. (2));
6: end for
7: Trajectory matrix M (Eq. (3));
8: Initial covariance matrix S_W (Eq. (5)).

Alg. 2 effectively begins the unobservable components extraction, repeating this cycle for each instant k, as in Fig. 2.

Algorithm 2 Extraction of Unobservable Components by RSSA
1: Compute the K_k value (Eq. (6));
2: Formulate the vector ω_k (Eq. (7));
3: Update the covariance matrix S_k (Eq. (8));
4: Compute the eigenvalues and eigenvectors of S_k;
5: Apply the SVD procedure (Eq. (10));
6: Grouping of RSSA (Eq. (11));
7: Determine the unobservable components c^j_k (Eq. (12)).

2.1.2. Layer 2: evolving computation of antecedent propositions

After the unobservable components extraction, the Evolving Computation of Antecedent Propositions layer is initiated. The main purpose of this layer is to generate the antecedent propositions of the fuzzy rules using the knowledge extracted from the unobservable components data. In this layer, the number of fuzzy rules can grow or shrink, modifying the structure of the neuro-fuzzy network, which characterizes the evolving feature of this methodology (Angelov, 2013; Li et al., 2014; Lughofer and Angelov, 2011; Tung et al., 2013; Lughofer, 2013; Kazemi and Abdollahzade, 2015).

This layer generates a data stream d_k for each instant k (Lughofer and Angelov, 2011; Lughofer and Sayed-Mouchaweh, 2015; Bordignon and Gomide, 2014). The stream d_k contains the n unobservable components from the first layer (c^1_{k−1}, c^2_{k−1}, ⋯, c^n_{k−1}), i.e.:

d_k = [c^1_{k−1}, c^2_{k−1}, …, c^n_{k−1}, y_k],  d_k ∈ R^{1×(n+1)},  (13)

where c^1_{k−1}, c^2_{k−1}, …, c^n_{k−1} are the inputs to the fuzzy inference system of the NF-TS network and y_k is the desired output used for NF-TS training. Note that the components data at instant k−1 are used to forecast the time series data at instant k. After building the data stream, it is necessary to normalize or standardize d_k, because the data of different components can have significantly different ranges (Angelov, 2013, 2010). Applying one of these techniques is important to transform the data so that they have comparable ranges. In this paper, standardization is adopted because it is very convenient for this methodology: the mean and the standard deviation accumulate the effects of all data online. The standardization of each element d^j_k, j = 1, …, n+1, of the data stream d_k can be written as (Angelov, 2013, 2010):

d̂^j_k = (d^j_k − d̄^j_k) / δ^j_k,  for j = 1, …, n+1,  k = W+2, W+3, …,  (14)

where d̂^j_k is the standardized value of the original data stream element d^j_k. The mean d̄^j_k and the standard deviation δ^j_k can be updated recursively by (Angelov, 2013, 2010):

d̄^j_k = ((k−1)/k) d̄^j_{k−1} + (1/k) d^j_k,
δ^j_k = sqrt( ((k−1)/k) (δ^j_{k−1})² + (1/k) ‖d^j_k − d̄^j_k‖² ),
for j = 1, …, n+1,  k = W+2, W+3, ….  (15)

The initial conditions of the standardization process are:

d̄^j_k = 0,  δ^j_k = 1,  d̂^j_k = 0,  for j = 1, …, n+1,  k = W+1,  (16)

where W+1 is the first instant k after the initial conditions of the RSSA method presented in Section 2.1.1.

Next, the proposed methodology uses an unsupervised learning technique to extract knowledge from the standardized data streams d̂_k. This technique is an evolving clustering method based on the eClustering+ algorithm proposed by P. Angelov in Angelov (2010). This recursive and evolving method divides the data space of d̂_k into overlapping clusters, where each one is associated with a fuzzy rule. The evolving clustering algorithm in this layer aims to build clusters with high generalization capacity, to determine the suitable number of clusters automatically from the standardized data streams, and to simplify the fuzzy rule base by removing redundant clusters. The criterion used to create new clusters is based on the concept of Recursive Density Estimation (RDE) (Angelov, 2013). A data stream d̂_k that satisfies Condition A below has high generalization capacity, i.e., it carries information about unexplored regions of the data space; in this case, the data stream becomes the focal point of a new cluster in the n-dimensional space. If the condition is not satisfied, the points of d̂_k are associated with the closest existing cluster. Condition A is denoted by (Angelov, 2010):

D_k(d̂_k) > max_{i=1,…,R} D_k(f̂_{i*})  OR  D_k(d̂_k) < min_{i=1,…,R} D_k(f̂_{i*}),  (17)

where R is the number of clusters or fuzzy rules, D_k(d̂_k) is the density of the data stream d̂_k, f̂_{i*} corresponds to the focal point of the i-th cluster, and D_k(f̂_{i*}) is the density of this focal point. The density value for each standardized data stream d̂_k is computed according to the following expression (Angelov, 2010):

D_k(d̂_k) = (k−1) / [ (k−1) ( Σ_{j=1}^{n+1} (d̂^j_k)² + 1 ) + b_k − 2 Σ_{j=1}^{n+1} d̂^j_k g^j_k ],  k = W+2, W+3, …,  (18)

where D_{W+1}(d̂_{W+1}) = 1 (initial condition), and

b_k = b_{k−1} + Σ_{j=1}^{n+1} (d̂^j_k)²,  b_{W+1} = 0,
g^j_k = g^j_{k−1} + d̂^j_k,  g^j_{W+1} = 0,  for j = 1, 2, …, n+1,

where the first data stream is defined as the focal point of the first cluster, with density D_{W+1}(d̂_{W+1}) = 1, remembering that the evolving process begins at instant W+1. When d̂_k is chosen as the focal point of a new cluster by Condition A, its computed density value is stored as D_k(f̂_{i*}).
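The recursive standardization (Eqs. (14)-(16)) and the RDE density (Eq. (18)) can be sketched as below. All names are ours; in particular, the extracted text leaves the order of the accumulator updates in Eq. (18) ambiguous, so we take the reading in which the density is evaluated with the accumulators of the previous streams and b, g are updated afterwards, which keeps the density in (0, 1]:

```python
import numpy as np

def standardize(d, mean, std, k):
    """Element-wise recursive standardization of a data stream
    (Eqs. (14)-(15)); `k` plays the role of the paper's time index."""
    d = np.asarray(d, dtype=float)
    mean = (k - 1) / k * mean + d / k                        # Eq. (15), mean
    std = np.sqrt((k - 1) / k * std**2 + (d - mean)**2 / k)  # Eq. (15), deviation
    return (d - mean) / std, mean, std                       # Eq. (14)

def rde_density(z, t, b, g):
    """Recursive Density Estimation (Eq. (18), Angelov's RDE).
    `t` counts the streams seen so far (the paper indexes from W+1)."""
    z = np.asarray(z, dtype=float)
    if t == 1:
        D = 1.0                  # first stream: initial condition D = 1
    else:
        denom = (t - 1) * (np.sum(z**2) + 1) + b - 2 * np.sum(z * g)
        D = (t - 1) / denom
    b = b + np.sum(z**2)         # b_k = b_{k-1} + sum_j (z^j)^2
    g = g + z                    # g^j_k = g^j_{k-1} + z^j
    return D, b, g

rng = np.random.default_rng(0)
n = 3
mean, std = np.zeros(n + 1), np.ones(n + 1)   # Eq. (16) initial conditions
b, g = 0.0, np.zeros(n + 1)
for t in range(1, 200):
    d = rng.standard_normal(n + 1)            # stand-in for d_k of Eq. (13)
    z, mean, std = standardize(d, mean, std, t + 1)
    D, b, g = rde_density(z, t, b, g)
    assert np.all(std > 0) and 0.0 < D <= 1.0
```

In Condition A terms, a stream whose D falls outside the band spanned by the focal-point densities would open a new cluster.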


This density value must be updated by the following equation while the cluster exists (Angelov, 2013):

D_k(f̂_{i*}) = (k−1) / [ k − 1 + (k−2) ( 1/D_{k−1}(f̂_{i*}) − 1 ) + Σ_{j=1}^{n+1} (d̂^j_k − d̂^j_{k−1})² ],  (19)

because every data stream that arrives influences the density of all existing clusters. After a new fuzzy rule is generated, the following Condition B is evaluated for each existing cluster (Angelov, 2013):

IF μ^j_i(d̂^j_k) > e^{−1} ∀j (i = 1, 2, …, R and j = 1, 2, …, n) THEN R ← R − 1,  (20)

where μ^j_i is the membership function of the i-th cluster and the j-th input or component in d̂_k, for j = 1, …, n. This condition is very important to avoid redundancy and to control high overlap between the clusters. All existing clusters that satisfy this condition are replaced by the new cluster generated in Condition A. The Gaussian membership function μ^j_i is computed for the inputs or components in a data stream d̂_k according to the following equation (Angelov, 2013; Birek et al., 2014):

μ^j_i(d̂^j_k) = exp( −(f̂^j_{i*} − d̂^j_k)² / (2 (σ^j_{ki})²) ),  for i = 1, 2, …, R and j = 1, 2, …, n,  (21)

where σ^j_{ki} is the radius of cluster i along input axis j in the n-dimensional space at instant k. The radius σ^j_{ki} is updated recursively by (Angelov, 2013):

σ^j_{ki} = sqrt( ζ (σ^j_{(k−1)i})² + (1 − ζ) (1/S^i_k) (d̂^j_k − f̂^j_{i*})² ),  (22)

where ζ is a learning constant and S^i_k is the support, i.e., the number of points in the n-dimensional space associated with the i-th cluster. The value of ζ and the initial condition for the cluster radii are critical factors that affect the performance of the proposed methodology; therefore, their values must be properly chosen. Finally, the antecedent propositions of the fuzzy rules can be created using the focal points of the clusters identified by the evolving clustering algorithm described above. Considering the standardized data stream d̂_k, they have the following composition (Angelov, 2013; Birek et al., 2014; Lughofer, 2013):

Rule_i:  IF (ĉ^1_{k−1} ∼ f̂^1_{i*}) AND (ĉ^2_{k−1} ∼ f̂^2_{i*}) AND (ĉ^3_{k−1} ∼ f̂^3_{i*}) AND ⋯ AND (ĉ^n_{k−1} ∼ f̂^n_{i*}),  (23)

where f̂^j_{i*} is the focal point of cluster i for the j-th input or component, with j = 1, …, n. The algorithm that represents the running of this layer is presented below:

Algorithm 3 Evolving Computation of Antecedent Propositions
1: Formulate the data stream d_k (Eq. (13));
2: Standardize d_k (Eqs. (14), (15) and (16));
3: Compute the density D_k(d̂_k) (Eq. (18));
4: for i = 1 to R do
5:   Compute D_k(f̂_{i*}) (Eq. (19));
6: end for
7: if Condition A is true (Eq. (17)) then
8:   Generate a new cluster: R = R + 1 and f̂_{R*} = d̂_k;
9:   if Condition B is true (Eq. (20)) then
10:    Replace the clusters that satisfy this condition;
11:  end if
12: end if
13: for i = 1 to R do
14:   Update the membership functions (Eq. (21));
15:   Update the cluster radii (Eq. (22)).
16: end for

2.1.3. Layer 3: computation of the normalized activation degrees of rules

With the antecedents of the fuzzy rules built, the third layer is initiated to compute the influence of each fuzzy rule on the time series forecasting. This NF-TS network layer uses the product t-norm to compute the activation degree of each fuzzy rule, considering the membership functions obtained in the previous layer, as follows (Angelov, 2013, 2010):

τ^i_k = t{ μ^j_i(d̂^j_k) } = Π_{j=1}^{n} μ^j_i(d̂^j_k),  for i = 1, 2, …, R.  (24)

The τ^i_k values should be normalized in order to measure the contribution of each fuzzy rule output; the sum of all normalized activation degrees must be equal to 1, i.e.:

λ^i_k = τ^i_k / Σ_{q=1}^{R} τ^q_k.  (25)

Alg. 4 presents the simple implementation of this layer:

Algorithm 4 Computation of the Normalized Activation Degrees of the Rules
1: for i = 1 to R do
2:   Compute the activation degree τ^i_k (Eq. (24));
3: end for
4: for i = 1 to R do
5:   Compute the normalized activation degree λ^i_k (Eq. (25)).
6: end for

2.1.4. Layer 4: recursive computation of consequent propositions

The value of λ^i_k computed previously is fundamental to measure the contribution of each fuzzy rule. The fourth layer involves the consequent propositions of the fuzzy rules. In this paper, a complete NF-TS fuzzy rule is represented by:

Rule_i:  IF (ĉ^1_{k−1} ∼ f̂^1_{i*}) AND ⋯ AND (ĉ^n_{k−1} ∼ f̂^n_{i*})
         THEN x̂_k = A^i_k x̂_{k−1} + B^i_k û_{k−s},  (26)

where A^i_k ∈ R^{n×n}, B^i_k ∈ R^{n×n}, and:

x̂_k = [ĉ^1_k, ĉ^2_k, …, ĉ^n_k]^T,  x̂_k ∈ R^{n×1};
x̂_{k−1} = [ĉ^1_{k−1}, ĉ^2_{k−1}, …, ĉ^n_{k−1}]^T,  x̂_{k−1} ∈ R^{n×1};  (27)
û_{k−s} = [ĉ^1_{k−s}, ĉ^2_{k−s}, …, ĉ^n_{k−s}]^T,  û_{k−s} ∈ R^{n×1}.

In other words, the extracted unobservable components are the states, and the state space models represent their dynamic behaviours. The vector û_{k−s} carries pattern information at instant k−s, where s is the seasonal period of the time series. The value of s should be informed by experts or estimated from the sample autocovariance function or the periodogram (Brockwell and Davis, 2002). For seasonal series, the information at this instant is important because it shows a behaviour similar to the current instant k. The Local Fuzzy Weighted Recursive Least Squares (L-FWRLS) method (Angelov, 2010) estimates the matrices A^i_k and B^i_k of the consequent propositions for each fuzzy rule, using the normalized activation degrees λ^i_k (Eq. (25)). So, considering the patterns or states obtained from the Extraction layer:

𝐱̂ 𝐤 = 𝐀𝐢𝐤 𝐱̂ 𝐤−𝟏 + 𝐁𝐢𝐤 𝐮̂ 𝐤−𝐬 , [ ] [ ] 𝐱̂ 𝐤−𝟏 𝐱̂ 𝐤 = 𝐀𝐢𝐤 𝐁𝐢𝐤 , 𝐮̂ 𝐤−𝐬

(28)

𝐱̂ 𝐤 = 𝚫𝒊𝑻 𝐯𝐓 . 𝐤 Thus, the recursive estimation for each fuzzy rule 𝑖 = 1, 2, … , 𝑅 is given by: 𝐆𝐢𝐤 =

𝐏𝐢𝐤−𝟏 𝐯𝐓 𝐯𝜆𝑖𝑘 𝐏𝐢𝐤−𝟏 𝐯𝐓 + 1

,

𝐆𝐢𝐤 ∈ R2𝑛×1 ,

𝚫𝒊𝒌 = 𝚫𝒊𝐤−𝟏 + 𝜆𝑖𝑘 𝐆𝐢𝐤 (𝐱̂ 𝐤𝐓 − 𝐯𝚫𝒊𝐤−𝟏 ), , 𝐏𝐢𝐤 = 𝐏𝐢𝐤−𝟏 − 𝜆𝑖𝑘 𝐆𝐢𝐤 𝐯𝐓 𝐏𝐢𝐓 𝐤−𝟏 277

𝚫𝒊𝒌 ∈ R2𝑛×𝑛 ,

𝐏𝐢𝐤 ∈ R2𝑛×2𝑛 ,

(29)
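As a compact sketch of the antecedent and consequent computations (Eqs. (21), (22), (24), (25) and (29)), the following functions illustrate one update step. The function and variable names are illustrative, not from the paper, and NumPy is assumed:

```python
import numpy as np

def memberships(d_hat, focal, sigma):
    # Gaussian membership of the standardized stream d_hat to each cluster
    # axis (Eq. (21)); focal, sigma: (R, n) focal points and radii.
    return np.exp(-((focal - d_hat) ** 2) / (2.0 * sigma ** 2))

def activation_degrees(mu):
    # Rule activation tau as the product of memberships over the inputs
    # (Eq. (24)), normalized so that the degrees sum to 1 (Eq. (25)).
    tau = np.prod(mu, axis=1)
    return tau, tau / tau.sum()

def update_radii(sigma_prev, d_hat, focal, support, zeta=0.5):
    # Recursive radii update of Eq. (22); support: (R,) points per cluster.
    return np.sqrt(zeta * sigma_prev ** 2
                   + (1.0 - zeta) * (d_hat - focal) ** 2 / support[:, None])

def l_fwrls_step(Delta, P, v, x_k, lam):
    # One L-FWRLS update (Eq. (29)) for a single rule: v is the (1, 2n)
    # regressor [x_{k-1}^T, u_{k-s}^T], Delta the (2n, n) stacked [A; B]
    # parameters, P the (2n, 2n) covariance, lam the rule's degree.
    G = P @ v.T / (lam * (v @ P @ v.T)[0, 0] + 1.0)
    Delta = Delta + lam * G @ (x_k[None, :] - v @ Delta)
    P = P - lam * G @ v @ P.T
    return Delta, P
```

The vectorized form updates all cluster axes of a rule at once; per-rule loops, as in Algorithms 3–5, would call these functions for each i.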

S.E. Rodrigues Júnior, G.L.O. Serra

Engineering Applications of Artificial Intelligence 64 (2017) 272–285

where 𝐆𝐢𝐤 is the gain matrix for each rule and 𝐏𝐢𝐤 is the 𝑖th fuzzy rule covariance matrix. A suggestion to initialize the L-FWRLS for each fuzzy rule is to consider the matrix 𝚫𝒊𝒌 null and the covariance matrix 𝐈 × 103 ≤ 𝐏𝐢𝐤 ≤ 𝐈 × 107 , where 𝐈 is the identity matrix (Ljung, 1987). The running of this layer is represented in the following algorithm:

Algorithm 5 Recursive Computation of Consequent Propositions
1: for i = 1 to R do
2:  Obtain the normalized activation degree λ_k^i (Eq. (25));
3:  Apply the L-FWRLS method (Eq. (29)).
4: end for

2.1.5. Layer 5: inference engine — composition of rules by weighted average

After the procedure performed by the L-FWRLS method, the Inference Engine layer generates the global states of the NF-TS, denoted by x̂'_k, considering the contribution x̂_k^i of each fuzzy rule i through the weighted average by the normalized activation degree λ_k^i. These terms mean:

∙ x̂_k^i: standardized output containing the component values after the update of the state matrices for each fuzzy rule i, i.e., the submodel output;
∙ x̂'_k: global standardized output containing the component values after the update of the state matrices, i.e., the forecasting of each component.

In this case, the global states are computed by:

x̂'_k = ∑_{i=1}^{R} λ_k^i x̂_k^i. (30)

Note that the proposed methodology performs the forecasting of each component separately, since the global output is composed of the global state vector x̂'_k. Below, the Alg. 6 details the procedure performed in the Inference Engine layer:

Algorithm 6 Inference Engine
1: for i = 1 to R do
2:  Compute the local states x̂_k^i = A_k^i x̂_{k−1} + B_k^i û_{k−s};
3: end for
4: Compute the global state x̂'_k (Eq. (30)).

2.1.6. Layer 6: forecasting based on unobservable states

The last layer uses the global states obtained from the previous layer (Eq. (30)) and rebuilds the time series at instant k by summing up the components (a property of the SSA method; Golyandina et al., 2001). However, the x̂'_k data are standardized; therefore, this layer needs to return them to the original data range using the standard deviation and the mean previously computed in Eq. (15). Thus, the global states in the original range x̃'_k are:

x̃'^j_k = x̂'^j_k δ_k^j + d̄_k^j, for j = 1, 2, …, n. (31)

The forecasting value for the time series at instant k of the NF-TS network is computed as:

ỹ_k = [1 1 ⋯ 1] x̃'_k = ∑_{j=1}^{n} x̃'^j_k = c̃_k^1 + c̃_k^2 + ⋯ + c̃_k^n. (32)

The Alg. 7 represents this layer of the proposed methodology:

Algorithm 7 Forecasting Based on Unobservable States
1: Transform the global state data x̂'_k to the original scale (Eq. (31));
2: Obtain the NF-TS forecasting by Eq. (32).

2.2. Monitoring of clusters quality

During the operation of the NF-TS layers, the clusters are monitored, and those that have low quality are removed. The following Condition C is responsible for evaluating the quality of the clusters or fuzzy rules for seasonal time series forecasting. If this condition is satisfied, the normalized activation degree of cluster i (λ_k^i) is set to zero, indicating that the fuzzy rule output will have no influence on the NF-TS global states (Eq. (30)). The proposed Condition C is:

IF (Ω_k^i < Ω̄_k − Ω̂_k) AND (S_k^i < 3) AND (k ≥ I_{i∗} + 10) THEN λ_k^i ← 0, (33)

where S_k^i is the support of cluster i and I_{i∗} is the time k when the cluster or fuzzy rule i was generated. The term Ω_k^i corresponds to the utility of the i-th cluster at instant k, which is computed as (Angelov, 2013, 2010):

Ω_k^i = ( ∑_{q=1}^{k} λ_q^i ) / (k − I_{i∗}), for i = 1, 2, …, R, (34)

where the numerator is the sum of all normalized activation degrees of cluster i since its generation. The terms Ω̄_k and Ω̂_k denote, respectively, the mean value and the standard deviation of the utility after k time series samples are read. Both can be computed recursively by (Angelov, 2013, 2010):

Ω̄_k = ((k − 1)/k) Ω̄_{k−1} + (1/k) Ω_k, Ω̄_{W+1} = 0 (initial value),
Ω̂_k = sqrt( ((k − 1)/k) Ω̂²_{k−1} + (1/k) (Ω_k − Ω̄_k)² ), Ω̂_{W+1} = 0 (initial value). (35)

After the Inference Engine layer, the clusters that satisfy Condition C are removed.

2.3. Training stage formulation

These layers perform their functions for each time series sample that arrives at every instant; therefore, the NF-TS operates cyclically. The proposed NF-TS has two operation stages: the training stage and the forecasting stage. In the first, the observation y_k from the time series is available. So, the NF-TS is trained and it evolves (adds or removes fuzzy rules) to perform the mapping between the unobservable components data at time k − 1 and the one-step-ahead time series data at time k. The Alg. 8 presents the complete operation of the training stage, considering N time series samples used for NF-TS training.

2.4. Forecasting stage formulation

On the other hand, when the data y_k is not available, the forecasting stage works. In this case, the NF-TS does not evolve and does not update the fuzzy rule consequents, i.e., it keeps its structure fixed. So, the Extraction of Unobservable Components layer and the Recursive Computation of Consequent Propositions layer do not run. The first layer does not work because there is no y_k available to extract the unobservable components. In order to obtain the unobservable components or pattern values and build the data stream d_k, the global state x̃'_{k−1} computed at the previous instant by the NF-TS is fed back to its input, then:

d_k = [x̃_{k−1}^T  NULL] = [c̃_{k−1}^1  c̃_{k−1}^2  ⋯  c̃_{k−1}^n  NULL]. (36)

After that, the Computation of the Normalized Activation Degrees of the Rules layer runs normally, as well as the Inference Engine and the Computation of Time Series Based on Unobservable Components layers, to generate new forecasting data. The standardization process of this data stream is performed too. To describe the forecasting stage operation, Alg. 9 is presented below, considering the forecast horizon h and N the number of time series samples used in the training stage.
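Layers 5 and 6 (Eqs. (30)–(32)) reduce to a weighted average of the rule submodels followed by de-standardization and summation of the components. A minimal sketch, with illustrative names and NumPy assumed:

```python
import numpy as np

def infer_global_state(A, B, x_prev, u_seas, lam):
    # Local submodel states x_k^i = A_k^i x_{k-1} + B_k^i u_{k-s} (Eq. (26)),
    # then the weighted average by lambda_k^i gives the global states (Eq. (30)).
    # A, B: (R, n, n); x_prev, u_seas: (n,); lam: (R,) normalized degrees.
    local = np.einsum('rij,j->ri', A, x_prev) + np.einsum('rij,j->ri', B, u_seas)
    return lam @ local

def rebuild_series(x_global, std, mean):
    # Return the standardized global states to the original range (Eq. (31))
    # and sum the n components to recover the forecast value (Eq. (32)).
    return float(np.sum(x_global * std + mean))
```

In the forecasting stage the returned global state would also be fed back as the next input stream (Eq. (36)), since no new observation is available.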


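The cluster-quality monitoring of Section 2.2 (Eqs. (33) and (34)), which Algorithm 8 applies at every training step, can be sketched as two small predicates. Names are illustrative, not from the paper:

```python
def cluster_utility(lams_since_birth, k, birth_k):
    # Utility of a cluster (Eq. (34)): accumulated normalized activation
    # degree since the cluster was generated, divided by its age.
    return sum(lams_since_birth) / (k - birth_k)

def condition_c(utility, mean_util, std_util, support, k, birth_k):
    # Condition C (Eq. (33)): a low-utility, low-support cluster that is
    # at least 10 samples old loses its influence (lambda set to 0).
    return bool(utility < mean_util - std_util
                and support < 3
                and k >= birth_k + 10)
```

In the full method, clusters flagged by this condition are removed after the Inference Engine layer runs.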

Algorithm 8 Training Stage
1: Set ζ and the initial radii values;
2: Initial Conditions of RSSA (Alg. 1);
3: for k = W + 1 to N do
4:  Use Alg. 2;
5:  if the first d_k then
6:   Initial Conditions of Standardization (Eq. (16));
7:   Initial Conditions of Utility (Eq. (35));
8:   Initial Conditions of Density (Eq. (18));
9:  else
10:   Data stream standardization (Eq. (14));
11:   Evolving Computation of Antecedent Propositions Layer (Alg. 3);
12:   Computation of the Normalized Activation Degrees of the Rules Layer (Alg. 4);
13:   if Condition A is not satisfied (Eq. (17)) then
14:    Associate the point d̂_k with the existing cluster with the highest activation degree τ_k^i;
15:   end if
16:   Recursive Computation of Consequent Propositions Layer (Alg. 5);
17:   Update the utility (Eq. (34));
18:   Update the parameters of the utility equation (Eq. (35));
19:   for i = 1 to R do
20:    if Condition C is satisfied (Eq. (33)) then
21:     Disable the influence of fuzzy rule i on the global output (λ_k^i = 0);
22:    end if
23:   end for
24:   Inference Engine Layer (Alg. 6);
25:   Global output of NF-TS (Alg. 7);
26:   Removal of the clusters that satisfy Condition C.
27:  end if
28: end for

Algorithm 9 Forecasting Stage
1: for k = N to N + h do
2:  States feedback;
3:  Formulate the data stream (Eq. (36));
4:  Data stream standardization (Eq. (14));
5:  Computation of the Normalized Activation Degrees of the Rules Layer (Alg. 4);
6:  Inference Engine Layer (Alg. 6);
7:  Global output of NF-TS (Alg. 7).
8: end for

3. Experimental results

In this section, the proposed evolving methodology is evaluated and its results are compared with other important papers widely cited in the literature. First, seven benchmark seasonal time series are considered in the experiments. All experiments are carried out using MATLAB on a 1.7 GHz CPU. The proposed methodology has some critical parameters whose variation affects the results:

1. W: initial amount of time series samples considered (Section 2.1.1);
2. L: lagged vectors dimension of the SSA method (Section 2.1.1);
3. n: number of components (Section 2.1.1);
4. σ_{W+1}^j: initial cluster radii when the fuzzy rules are generated, for each input j (Eq. (22));
5. ζ: learning constant (Eq. (22)).

Therefore, the values of these parameters must be properly selected for each time series in the following experiments.

3.1. Quantitative analysis

Seven benchmark time series are considered in this subsection to quantitatively evaluate the performance of the evolving methodology, comparing the proposed approach with other forecasting techniques presented in the Štěpnička et al. paper (Štěpnička et al., 2013). These time series contain real-world data from different areas and domains, which makes them interesting to forecast. All of them are monthly, with strong seasonality defined by s = 12. They can be easily obtained from Hyndman's Time Series Data Library (Hyndman, 0000) and their descriptions are:

1. International airline passengers: monthly totals in thousands, Jan 49–Dec 60 (144 samples);
2. Monthly total number of pigs slaughtered in Victoria, Jan 1980–Aug 1995 (188 samples);
3. Monthly car sales in Quebec, 1960–1968 (108 samples);
4. Monthly gasoline demand in Ontario, millions of gallons, 1960–1975 (192 samples);
5. Monthly milk production: pounds per cow, Jan 62–Dec 75 (168 samples);
6. Industry sales for printing and writing paper (in thousands of French francs), January 1963–December 1972 (120 samples);
7. Portland, Oregon, average monthly bus ridership (/100), January 1973 through June 1982 (114 samples).

The first technique discussed in the Štěpnička et al. paper (Štěpnička et al., 2013) is a well-known method used as a comparison benchmark: a seasonal variant of the Autoregressive Integrated Moving Average (ARIMA) model. The ARIMA model was implemented in that reference paper through the Forecast Pro (Goodrich, 2000) software, which executed a fully automatic parameter selection for ARIMA. Another method is the Automatic Design of Artificial Neural Networks (ADANN), which considers genetic algorithms to build ANN structures. The next technique discussed in this experiment is the Support Vector Machine (SVM). Finally, the last two techniques have hybrid characteristics, combining the linguistic fuzzy approach with Artificial Neural Networks (FANN) and with SVM (FSVM).

The same evaluation criteria used in the Štěpnička et al. paper (Štěpnička et al., 2013) are adopted in this analysis, namely:

∙ Symmetric Mean Absolute Percentage Error (SMAPE):

SMAPE = (1/h) ∑_{k=N+1}^{N+h} |y_k − ỹ_k| / ((|y_k| + |ỹ_k|)/2) × 100%; (37)

∙ Mean Absolute Scaled Error (MASE):

MASE = (1/h) ∑_{k=N+1}^{N+h} |y_k − ỹ_k| / ( (1/(N − 1)) ∑_{j=2}^{N} |y_j − y_{j−1}| ). (38)

The SMAPE and MASE criteria are considered simultaneously in order to perform a consistent evaluation of the compared methodologies. It is also important to compare the results for different forecasting horizons; therefore, two forecasting horizons are adopted: h1 = 12 and h2 = 24. The comparison results between the techniques according to the SMAPE and MASE criteria are shown in Tables 1 and 2, respectively. The proposed methodology presented better results for all time series in both forecasting horizons (h1 and h2), except for the gasoline series, where the ADANN technique was better for h2, with a SMAPE value of 3.8 and a MASE value of 0.77. In relation to the writing time series, there was a technical draw between the NF-TS and ADANN for h2, both with a SMAPE value of 7.7 and a MASE value of 0.5. An overall comparison is performed using the arithmetic mean and the median computed for each
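Eqs. (37) and (38) translate directly into code; a short sketch with illustrative names, NumPy assumed:

```python
import numpy as np

def smape(y, y_hat):
    # Symmetric Mean Absolute Percentage Error over the horizon (Eq. (37)).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs(y - y_hat) / ((np.abs(y) + np.abs(y_hat)) / 2.0))

def mase(y, y_hat, y_train):
    # Mean Absolute Scaled Error (Eq. (38)): forecast errors scaled by the
    # in-sample mean absolute one-step naive error of the training data.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    scale = np.mean(np.abs(np.diff(np.asarray(y_train, float))))
    return float(np.mean(np.abs(y - y_hat)) / scale)
```

MASE below 1 means the forecast beats the in-sample naive one-step predictor on average, which is why it complements the purely relative SMAPE.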


Table 1
Quantitative comparison of the techniques by the SMAPE criterion (best values in bold).

Time series    ARIMA        ADANN        SVM          FANN         FSVM         NF-TS
               h1    h2     h1    h2     h1    h2     h1    h2     h1    h2     h1    h2
Passengers     6.5   8.0    2.2   2.5    5.0   7.1    2.1   2.5    3.0   3.9    1.9   2.3
Pigs           6.1   7.1    7.3   11.5   5.8   7.2    6.7   8.2    7.7   7.8    4.6   6.3
Car sales      7.4   9.1    10.6  9.8    11.0  10.0   12.1  10.0   11.6  10.9   5.4   7.8
Gasoline       5.5   6.2    4.1   3.8    6.2   7.0    4.0   5.6    4.8   5.9    2.3   4.0
Milk           0.8   0.9    1.4   2.3    1.1   1.3    1.1   1.1    1.0   1.0    0.5   0.9
Writing        7.3   9.9    7.1   7.7    6.1   8.3    8.8   8.7    7.5   9.9    5.3   7.7
Bus ridership  9.0   13.8   3.5   6.0    5.6   7.4    16.1  18.5   14.1  17.7   2.5   3.2
Mean           6.1   7.9    5.2   6.2    5.8   6.9    7.3   7.8    7.1   8.2    3.2   4.6
Median         6.5   8.0    4.1   6.0    5.8   7.2    6.7   8.2    7.5   7.8    2.5   4.0

Table 2
Quantitative comparison of the techniques by the MASE criterion (best values in bold).

Time series    ARIMA        ADANN        SVM          FANN         FSVM         NF-TS
               h1    h2     h1    h2     h1    h2     h1    h2     h1    h2     h1    h2
Passengers     1.24  1.60   0.42  0.51   1.02  1.52   0.39  0.49   0.56  0.80   0.37  0.48
Pig            0.64  0.76   0.74  1.17   0.61  0.76   0.70  0.87   0.80  0.81   0.50  0.66
Car sales      0.54  0.66   0.70  0.68   0.72  0.69   0.79  0.75   0.75  0.74   0.39  0.53
Gasoline       1.12  1.27   0.83  0.77   1.24  1.41   0.80  1.15   0.96  1.22   0.45  0.80
Milk           0.19  0.21   0.31  0.53   0.25  0.29   0.24  0.24   0.22  0.22   0.12  0.20
Writing        0.47  0.69   0.42  0.50   0.42  0.60   0.61  0.62   0.52  0.69   0.38  0.50
Bus ridership  3.06  4.77   1.15  1.95   1.84  2.42   5.66  6.52   4.91  6.18   0.76  1.00
Mean           1.04  1.42   0.65  0.87   0.87  1.10   1.31  1.52   1.25  1.52   0.42  0.60
Median         0.64  0.76   0.70  0.68   0.72  0.76   0.70  0.75   0.75  0.80   0.39  0.53

table column. Therefore, according to the mean and the median, the NF-TS presented the best performance for all time series and both horizons in these experiments, since it has the lowest error values. However, it is not possible to generalize that the proposed methodology will obtain better results for any time series, but its forecasting potential was clearly shown in these experiments. Additionally, Table 3 shows the values of the critical parameters of the proposed NF-TS that provided the best results reported in Tables 1 and 2, considering both forecast horizons. Furthermore, the number of fuzzy rules or clusters generated in the training stage and used to forecast each time series (R) and the running time of the experiments, in seconds, are also shown in Table 3. Observing the mean, note that for horizon h2 a higher initial amount of time series samples W is generally necessary for the SSA method. Moreover, greater values of L and a larger number of unobservable components n are required for h2 compared to h1. On the other hand, the initial radii value is smaller for h2; thus, more clusters are needed to cover the data space, increasing R. The learning constant ζ and the runtime are at the same level, according to the mean, for both forecasting horizons. Another advantage of the proposed method is its running time. While the approaches based on SVM and ADANN run in tens of seconds and tens of minutes, respectively (Štěpnička et al., 2013), the NF-TS needs less than a second, due to its recursive feature. For monthly time series, time requirements are not very demanding, but in other applications, time can be a key issue. The NF-TS is still able to evolve its structure of fuzzy rules, fitting the data behaviour.

Fig. 3. Writing time series.

3.2. Characteristics evaluation of the proposed forecasting methodology

In this subsection, an evaluation of the characteristics of the proposed forecasting methodology is performed, considering the writing time series (Fig. 3) commented on previously. This time series has 120 monthly samples, with seasonal period s = 12. This seasonal period can be confirmed by visualizing the sample autocorrelation function of the time series up to lag 40 in Fig. 4, which indicates a repeating pattern every 12 samples. Therefore, the information at time k − 12 is very useful for forecasting. The evolving NF-TS network needs to forecast the last twelve time series data, i.e., the experiment performed in the previous subsection for h1 = 12. Thus, the first 108 samples from the time series are used in the training stage, and the remainder is compared with the forecasting data of the proposed NF-TS. Recall that the parameters chosen for this experiment with the writing time series and h1 = 12 are shown in Table 3. In Fig. 5, the results of the training and forecasting stages for this time series are shown. The real time series data are represented in black; the initial black part corresponds to the W samples from the initial interval in Eq. (1). The NF-TS results of the training stage, when the network evolves its structure, are represented in blue. In red, the forecasting data h1 = 12 steps ahead are displayed. Note that a good approximation was achieved even when there were no available data of the original time series to evolve the NF-TS. The three unobservable components or patterns extracted from this time series (respectively c_k^1, c_k^2, and c_k^3) are shown in Figs. 6, 7, and 8. The values of the components or states are displayed in blue, and the values of the states that are fed back in the forecasting stage are shown in red. The state space model keeps the pattern of the dynamics of each unobservable component, obtaining good results in forecasting.
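The visual check of the sample autocorrelation function in Fig. 4 can also be automated by picking the lag where the sample ACF peaks, one of the estimation options mentioned for s (Brockwell and Davis, 2002). A sketch with illustrative names, NumPy assumed:

```python
import numpy as np

def estimate_seasonal_period(y, max_lag=40):
    # Estimate the seasonal period s as the lag (>= 2) maximizing the
    # biased sample autocorrelation, mirroring the inspection of Fig. 4.
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    acf = np.array([np.dot(y[:-k], y[k:]) for k in range(1, max_lag + 1)])
    acf /= np.dot(y, y)
    # skip lag 1 so short-range correlation does not mask the seasonal peak
    return int(np.argmax(acf[1:]) + 2)
```

A periodogram-based estimate (dominant frequency of the spectrum) would be the alternative route suggested in the text.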


Table 3
Parameter values for each time series in both forecast horizons.

Time series    W           L          n          σ_{W+1}^j       ζ            R          Time (s)
               h1    h2    h1   h2    h1   h2    h1     h2       h1    h2     h1   h2    h1    h2
Passengers     14    19    6    3     6    3     0.14   0.06     0.5   0.5    5    19    0.22  0.36
Pig            21    20    3    6     3    6     0.09   0.14     0.5   0.5    5    11    0.25  0.37
Cars           13    15    3    6     3    6     0.33   0.34     0.5   0.5    6    4     0.17  0.19
Gasoline       13    25    5    5     5    5     0.09   0.179    0.5   0.51   3    9     0.26  0.29
Milk           13    17    4    6     4    6     0.05   0.14     0.5   0.5    6    4     0.37  0.24
Writing        14    30    3    7     3    7     0.45   0.1      0.5   0.5    3    5     0.17  0.17
Bus ridership  13    13    3    4     3    4     0.12   0.14     0.5   0.5    6    3     0.19  0.16
Mean           14.4  19.8  3.86 5.28  3.86 5.29  0.18   0.16     0.5   0.5    4.86 7.86  0.23  0.25

Fig. 6. Unobservable component 𝑐𝑘1 of writing time series. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Sample autocorrelation function of writing time series.

Fig. 5. Results in training and forecasting stage for writing time series. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 7. Unobservable component 𝑐𝑘2 of writing time series. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Because of the evolving feature of this methodology, the number of fuzzy rules or clusters is modified, increasing or decreasing, so that the structure fits the time series data. Thus, the evolution of these fuzzy rules or clusters for the writing series in the training stage (where the NF-TS structure evolves) is represented in Fig. 9. The density behaviour of the data streams for the writing time series is shown in Fig. 10. For each instant k where the data stream density D_k (blue) goes beyond the minimum (green) or maximum (red) limits presented in Condition A (Eq. (17)), a fuzzy rule or cluster is generated.
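The generation rule visualized in Fig. 10 can be expressed as a simple predicate. This is only an illustration of Condition A as described here; the density recursion itself (Eq. (18)) lies outside this excerpt, and the names are illustrative:

```python
def condition_a(density_new, focal_densities):
    # A new cluster is generated when the density of the incoming data
    # stream exceeds the maximum or falls below the minimum of the
    # densities evaluated at the existing focal points (Condition A).
    return bool(density_new > max(focal_densities)
                or density_new < min(focal_densities))
```

When the predicate is false, the point is instead associated with the existing cluster of highest activation degree, as in line 14 of Algorithm 8.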

To analyse the features of the clusters, we consider a particular cluster generated at k = 16. The radii of this cluster on each input axis, related to the three unobservable components and the one-step-ahead time series observation (4-dimensional space), are shown in Fig. 11. The radii begin at 0.45 (Table 3) and adapt according to the arriving data in the training stage. The utility of this same cluster can be seen in Fig. 12. Over time, the utility of clusters decreases. This fact usually occurs with older


Fig. 11. Radii in 4-dimensional space of a particular cluster generated in k = 16.

Fig. 8. Unobservable component c_k^3 of writing time series. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 12. Utility of a particular cluster generated in 𝑘 = 16.

Fig. 9. Clusters amount during training stage for writing time series.

clusters. If there were more samples in the training stage, this cluster would probably be removed by Condition C (Eq. (33)). Finally, the normalized activation degree λ_k^i for each instant k is shown in Fig. 13. The λ_k^i values for the training stage and the forecasting stage are displayed in blue and red, respectively. These values show the degree of influence of this cluster on the time series forecasting conducted by the NF-TS.

3.3. Case study: detection of anomalies based on electrocardiogram (ECG) data

In this case study, a time series with data from an electrocardiogram (ECG) of a patient with heart failure is considered (Baim et al., 1986). An electrocardiogram is a type of examination in which the variations of the electrical potentials generated by the heart are recorded. This test is used for the analysis of cardiac diseases, where experts can visualize anomalies and arrhythmias (Baim et al., 1986). Thus, the objective of the proposed methodology in this case study is to evolve and adapt to the ECG behaviour in order to monitor the electrical potentials generated by the heart in real time. In the training stage, the normal behaviour of the patient is learned and represented by the NF-TS, while the forecasting stage performs the monitoring and comparison. If the difference between the forecasting data and the measured data is very large, the error is considered high and the expert should be

Fig. 10. Density of data streams for condition 𝐴 evaluation. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)



Fig. 15. Number of clusters or fuzzy rules for electrocardiogram data.

Fig. 13. Normalized activation degree of a particular cluster generated in 𝑘 = 16.

Fig. 14. NF-TS Results for electrocardiogram data of a patient. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 16. Unobservable component 𝑐𝑘1 of electrocardiogram data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

alerted about the nonstandard behaviour. The seasonal period was determined by inspection in this case, where s = 210. In Fig. 14, the electrocardiogram data in the training stage are shown in blue, and the proposed methodology reproduces the behaviour considered normal in red. However, the real data (in black) of the electrocardiogram indicate that the signal magnitude is lower than expected. Therefore, the proposed methodology may alert about the patient's anomaly in this case. The evolving behaviour is evidenced in Fig. 15, where the modification of the NF-TS structure, through the quantity of fuzzy rules or clusters, is shown. Finally, the four unobservable components extracted from the electrocardiogram are shown in Figs. 16–19.

4. Final remarks

This paper presented an evolving methodology applied to the problems of forecasting and characterization of time series. This methodology is based on an evolving neuro-fuzzy architecture and it uses the extracted unobservable patterns data to forecast future time series observations.

Fig. 17. Unobservable component c_k^2 of electrocardiogram data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4.1. Conclusions

The proposed methodology achieved good forecasting results for seasonal time series compared with other techniques in the literature, according to the evaluation criteria considered. These results are relevant because


the methodology uses the divide-and-conquer philosophy, identifying simpler patterns whose dynamic behaviour is modelled using state space equations. Besides that, the method has the advantage of being evolving, i.e., it learns and identifies the NF-TS structure from the unobservable components data, where the number of fuzzy rules can increase or decrease. This paper showed that the unobservable components can be exploited to perform the forecasting. It is also important to highlight the contribution of this paper in relation to the RSSA method, which was modified to be incorporated into the proposed evolving methodology. Another contribution of this proposed methodology is the condition for the exclusion of clusters or fuzzy rules using quality measures, especially for seasonal time series. Therefore, this proposed methodology is an important alternative to forecast seasonal time series, because it offers good forecasting results while identifying unobservable time series patterns that are interesting to experts.

Fig. 18. Unobservable component c_k^3 of electrocardiogram data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 19. Unobservable component c_k^4 of electrocardiogram data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

4.2. Future works

Finally, future works are presented below:

∙ Choose suitable critical parameter values for the NF-TS automatically;
∙ Consider time series with other characteristics, for example, chaotic and multivariate;
∙ Evolving extraction of useful knowledge from Big Data, exploring the deep learning concept.

Acknowledgements

The authors thank CAPES for the financial support, and IFMA and PPGEE-UFMA for encouraging the development of this paper.

References

Abdollahzade, M., Miranian, A., Hassani, H., Iranmanesh, H., 2015. A new hybrid enhanced local linear neuro-fuzzy model based on the optimized singular spectrum analysis and its application for nonlinear and chaotic time series forecasting. Inform. Sci. 295, 107–125. http://dx.doi.org/10.1016/j.ins.2014.09.002.
Angelov, P., 2010. Evolving Takagi–Sugeno fuzzy systems from streaming data. In: Angelov, P., Filev, D., Kasabov, N. (Eds.), Evolving Intelligent Systems: Methodology and Applications. IEEE Press Series on Computational Intelligence. John Wiley & Sons, Hoboken, New Jersey, pp. 21–50, Chapter 2.
Angelov, P., 2013. Autonomous Learning Systems: From Data Streams to Knowledge in Real-time. John Wiley & Sons, Lancaster University, United Kingdom.
Baim, D., Colucci, W., Monrad, E., Smith, H., Wright, F., Lanoue, A., Gauthier, D., Ransil, B., Grossman, W., Braunwald, E., 1986. Survival of patients with severe congestive heart failure treated with oral milrinone. J. Amer. College Cardiol. 7, 661–670.
Bergmeir, C., Hyndman, R.J., Bentez, J.M., 2016. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation. Int. J. Forecast. 32, 303–312. http://dx.doi.org/10.1016/j.ijforecast.2015.07.002.
Birek, L., Petrovic, D., Boylan, J., 2014. Water leakage forecasting: the application of a modified fuzzy evolving algorithm. Appl. Soft Comput. 14, 305–315. http://dx.doi.org/10.1016/j.asoc.2013.05.021.
Bodyanskiy, Y., Vynokurova, O., 2013. Hybrid adaptive wavelet-neuro-fuzzy system for chaotic time series identification. Inform. Sci. 220, 170–179. http://dx.doi.org/10.1016/j.ins.2012.07.044.
Bodyanskiy, Y., Vynokurova, O., Setlak, G., Peleshko, D., Mulesa, P., 2017. Adaptive multivariate hybrid neuro-fuzzy system and its on-board fast learning. Neurocomputing 230, 409–416. http://dx.doi.org/10.1016/j.neucom.2016.12.042.
Bordignon, F., Gomide, F., 2014. Uninorm based evolving neural networks and approximation capabilities. Neurocomputing 127, 13–20. http://dx.doi.org/10.1016/j.neucom.2013.04.047.
Brockwell, P.J., Davis, R.A., 2002. Introduction to Time Series and Forecasting, second ed. Springer, New York.
Chai, S., Lim, J., 2016. Forecasting business cycle with chaotic time series based on neural network with weighted fuzzy membership functions. Chaos Solitons Fractals 90, 118–126. http://dx.doi.org/10.1016/j.chaos.2016.03.037.
Cui, Y., Shi, J., Wang, Z., 2015. Complex Rotation Quantum Dynamic Neural Networks (CRQDNN) using Complex Quantum Neuron (CQN): Applications to time series prediction. Neural Netw. 71, 11–26. http://dx.doi.org/10.1016/j.neunet.2015.07.013.
Deng, C., 2014. Time series decomposition using singular spectrum analysis. Master's thesis, East Tennessee State University, Johnson City, Tennessee.
de Oliveira, J.F.L., Ludermir, T.B., 2016. A hybrid evolutionary decomposition system for time series forecasting. Neurocomputing 180, 27–34. http://dx.doi.org/10.1016/j.neucom.2015.07.113.
Gan, M., Cheng, Y., Liu, K., Zhang, G., 2014. Seasonal and trend time series forecasting based on a quasi-linear autoregressive model. Appl. Soft Comput. 24, 13–18. http://dx.doi.org/10.1016/j.asoc.2014.06.047.
Golyandina, N., Nekrutkin, V., Zhigljavsky, A., 2001. Analysis of Time Series Structure: SSA and Related Techniques. Chapman & Hall/CRC, New York–London.
Goodrich, R., 2000. The Forecast Pro methodology. Int. J. Forecast. 16, 533–535. http://dx.doi.org/10.1016/S0169-2070(00)00086-8.
Harvey, A., Koopman, S., 2009. Unobserved components models in economics and finance: The role of the Kalman filter in time series econometrics. IEEE Control Syst. Mag. 29, 71–81. http://dx.doi.org/10.1109/MCS.2009.934465.
Hassani, H., Mahmoudvand, R., 2013. Multivariate singular spectrum analysis: a general view and new vector forecasting approach. Int. J. Energy Stat. 1, 55–83. http://dx.doi.org/10.1142/S2335680413500051.
Homenda, W., Jastrzebska, A., 2017. Clustering techniques for Fuzzy Cognitive Map design for time series modeling. Neurocomputing 232, 3–15. http://dx.doi.org/10.1016/j.neucom.2016.08.119.
Huang, X., Ye, Y., Xiong, L., Lau, R., Jianga, N., Wang, S., 2016. Time series k-means: A new k-means type smooth subspace clustering for time series data. Inform. Sci. 367–368, 1–13. http://dx.doi.org/10.1016/j.ins.2016.05.040.
Hyndman, R., 0000. Time Series Data Library. URL https://datamarket.com/data/list/?q=provider%3Atsdl. Accessed: 07.22.16.
Junior, E.S.G., 2006. Exponential smoothing: The state of the art - Part II. Int. J. Forecast. 22, 637–666. http://dx.doi.org/10.1016/j.ijforecast.2006.03.005.


Peng, H., Wu, C.W.S., Lee, S., 2015. Time series forecasting with a neuro-fuzzy modeling scheme. Appl. Soft Comput. 32, 481–493. http://dx.doi.org/10.1016/j.asoc.2015.03.059.
Rocha, O., Serra, G., 2017. Adaptive neuro-fuzzy black-box modeling based on instrumental variable evolving algorithm. J. Control Autom. Syst. 28, 50–67. http://dx.doi.org/10.1007/s40313-016-0285-8.
Rodriguez-Fernandez, V., Menendez, H.D., Camacho, D., 2017. Analysing temporal performance profiles of UAV operators using time series clustering. Expert Syst. Appl. 70, 103–118. http://dx.doi.org/10.1016/j.eswa.2016.10.044.
Siminski, K., 2017. Interval type-2 neuro-fuzzy system with implication-based inference mechanism. Expert Syst. Appl. 79, 140–152. http://dx.doi.org/10.1016/j.eswa.2017.02.046.
Singh, P., Borah, B., 2013. High-order fuzzy-neuro expert system for time series forecasting. Knowl.-Based Syst. 46, 12–21. http://dx.doi.org/10.1016/j.knosys.2013.01.030.
Štěpnička, M., Cortez, P., Donate, J., Štěpničková, L., 2013. Forecasting seasonal time series with computational intelligence: on recent methods and the potential of their combinations. Expert Syst. Appl. 40, 1981–1992. http://dx.doi.org/10.1016/j.eswa.2012.10.001.
Theodosiou, M., 2011. Forecasting monthly and quarterly time series using STL decomposition. Int. J. Forecast. 27, 1178–1195. http://dx.doi.org/10.1016/j.ijforecast.2010.11.002.
Tung, S., Quek, C., Guan, C., 2013. eT2FIS: An evolving type-2 neural fuzzy inference system. Inform. Sci. 220, 124–148. http://dx.doi.org/10.1016/j.ins.2012.02.031.
Wang, X., Han, M., 2015. Improved extreme learning machine for multivariate time series online sequential prediction. Eng. Appl. Artif. Intell. 40, 28–36. http://dx.doi.org/10.1016/j.engappai.2014.12.013.
Wang, W., Pedrycz, W., Liu, X., 2015. Time series long-term forecasting model based on information granules and fuzzy clustering. Eng. Appl. Artif. Intell. 41, 17–24. http://dx.doi.org/10.1016/j.engappai.2015.01.006.
Wu, S., Lee, S., 2015. Employing local modeling in machine learning based methods for time-series prediction. Expert Syst. Appl. 42, 341–354. http://dx.doi.org/10.1016/j.eswa.2014.07.032.
Wu, L., Liu, S., Yang, Y., 2016. Grey double exponential smoothing model and its application on pig price forecasting in China. Appl. Soft Comput. 39, 117–123. http://dx.doi.org/10.1016/j.asoc.2015.09.054.
Xiong, T., Bao, Y., Hu, Z., Chiong, R., 2015. Forecasting interval time series using a fully complex-valued RBF neural network with DPSO and PSO algorithms. Inform. Sci. 305, 77–92. http://dx.doi.org/10.1016/j.ins.2015.01.029.
Yang, D., Sharma, V., Ye, Z., Lim, L.I., Zhao, L., Aryaputera, A.W., 2015. Forecasting of global horizontal irradiance by exponential smoothing, using decompositions. Energy 81, 111–119. http://dx.doi.org/10.1016/j.energy.2014.11.082.
Young, P., 2011. Recursive Estimation and Time-Series Analysis: An Introduction for the Student and Practitioner, second ed. Springer.
Zhang, G.P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European J. Oper. Res. 160, 501–514. http://dx.doi.org/10.1016/j.ejor.2003.08.037.

Júnior, S.R., Serra, G., 2016. An evolving algorithm based on unobservable components neuro-fuzzy model for time series forecasting. In: IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), pp. 122–129. http://dx.doi.org/10.1109/EAIS.2016.7502502.
Kazemi, R., Abdollahzade, M., 2015. Introducing an evolving local neuro-fuzzy model – Application to modeling of car-following behavior. ISA Trans. 59, 375–384. http://dx.doi.org/10.1016/j.isatra.2015.09.002.
Komijani, M., Lucas, C., Araabi, B., Kalhor, A., 2012. Introducing evolving Takagi–Sugeno method based on local least squares support vector machine models. Evol. Syst. 3, 81–93. http://dx.doi.org/10.1007/s12530-011-9043-0.
Lapidot, I., Shoa, A., Furmanov, T., Aminov, L., Moyal, A., Bonastre, J.-F., 2017. Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization. Comput. Speech Lang. 45, 1–20. http://dx.doi.org/10.1016/j.csl.2017.01.011.
Lemos, A., Caminhas, W., Gomide, F., 2011. Multivariable Gaussian evolving fuzzy modeling system. IEEE Trans. Fuzzy Syst. 19, 91–104. http://dx.doi.org/10.1109/TFUZZ.2010.2087381.
Li, C., Chiang, T., Yeh, L., 2013. A novel self-organizing complex neuro-fuzzy approach to the problem of time series forecasting. Neurocomputing 99, 467–476. http://dx.doi.org/10.1016/j.neucom.2012.07.014.
Li, C., Hu, J., 2012. A new ARIMA-based neuro-fuzzy approach and swarm intelligence for time series forecasting. Eng. Appl. Artif. Intell. 25, 295–308. http://dx.doi.org/10.1016/j.engappai.2011.10.005.
Li, D.Z., Wang, W., Ismail, F., 2014. An evolving fuzzy neural predictor for multi-dimensional system state forecasting. Neurocomputing 145, 381–391. http://dx.doi.org/10.1016/j.neucom.2014.05.014.
Ljung, L., 1987. System Identification: Theory for the User. Prentice-Hall, New Jersey, USA.
Lughofer, E., 2011. Evolving Fuzzy Systems: Methodologies, Advanced Concepts and Applications. Springer.
Lughofer, E., 2013. On-line assurance of interpretability criteria in evolving fuzzy systems – Achievements, new concepts and open issues. Inform. Sci. 251, 22–46. http://dx.doi.org/10.1016/j.ins.2013.07.002.
Lughofer, E., Angelov, P., 2011. Handling drifts and shifts in on-line data streams with evolving fuzzy systems. Appl. Soft Comput. 11, 2057–2068. http://dx.doi.org/10.1016/j.asoc.2010.07.003.
Lughofer, E., Sayed-Mouchaweh, M., 2015. Autonomous data stream clustering implementing split-and-merge concepts – Towards a plug-and-play approach. Inform. Sci. 304, 54–79. http://dx.doi.org/10.1016/j.ins.2015.01.010.
Mastorocostas, P., Hilas, C., 2012. A computational intelligence-based forecasting system for telecommunications time series. Eng. Appl. Artif. Intell. 25, 200–206. http://dx.doi.org/10.1016/j.engappai.2011.04.004.
Papageorgiou, E., Poczetac, K., 2017. A two-stage model for time series prediction based on fuzzy cognitive maps and neural networks. Neurocomputing 232, 113–121. http://dx.doi.org/10.1016/j.neucom.2016.10.072.
