Applied Soft Computing 14 (2014) 194–209
A fast learning algorithm for evolving neo-fuzzy neuron

Alisson Marques Silva a, Walmir Caminhas a, Andre Lemos a, Fernando Gomide b

a Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Avenida Antonio Carlos, 6627, 31270-901, Belo Horizonte, MG, Brazil
b School of Electrical and Computer Engineering, University of Campinas, Avenida Albert Einstein, 400, 13083-852, Campinas, SP, Brazil

Article history: Received 8 September 2012; Received in revised form 19 January 2013; Accepted 24 March 2013; Available online 3 May 2013

Keywords: Evolving neural fuzzy systems; Neo-fuzzy neuron; Adaptive modeling

Abstract

This paper introduces an evolving neural fuzzy modeling approach constructed upon the neo-fuzzy neuron and network. The approach uses an incremental learning scheme to simultaneously granulate the input space and update the neural network weights. The neural network structure and parameters evolve simultaneously as data are input. Initially the space of each input variable is granulated using two complementary triangular membership functions. New triangular membership functions may be added, excluded and/or have their parameters adjusted depending on the input data and modeling error. The parameters of the network are updated using a gradient-based scheme with optimal learning rate. The performance of the approach is evaluated using instances of time series forecasting and nonlinear system identification problems. Computational experiments and comparisons against alternative evolving models show that the evolving neural neo-fuzzy network is accurate and fast, characteristics which are essential for adaptive systems modeling, especially in real-time, on-line environments.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Fuzzy sets and systems have been successful in many instances of problems in various application domains such as modeling, control, classification, and regression. Examples of recent developments include [1–4] in system modeling, [5–7] in system control, [8–15] in forecasting and prediction, [16–21] in classification, and [22–27] in fault detection and diagnosis, to mention a few. There is an increasing interest in augmenting fuzzy systems with learning and adaptation capabilities. Two of the most visible approaches combine fuzzy systems with the learning and adaptation abilities of neural networks and of evolutionary algorithms. The current challenge is to develop methodologies and algorithms to build systems with a high degree of adaptability and autonomy, developed from data streams collected on-line, eventually in real time [28]. Adaptability and autonomy mean the aptitude to simultaneously modify the system structure and its parameters whenever new information is found in the input data. The evolving system paradigm provides such a capability because it allows learning from experience, inheritance, gradual change, and knowledge generation from data streams. From the neural networks point of view, an evolving approach may start with an empty or initial untrained neural network and develop new links and weight values from data. The development of the neural network structure is gradual, and neurons are neither fixed nor pre-defined. Continuous learning may produce new neurons each time new data do not fit the existing model [29].

This paper suggests an evolving fuzzy neural modeling approach using the neo-fuzzy neuron (NFN) introduced by [30] and its extensions [31,32] as the basic processing unit. The choice of the NFN is due to its simplicity, transparency, low computational cost and accuracy, items of utmost importance especially in on-line and real-time environments. The original NFN and its extensions granulate the input space with triangular membership functions whose number and modal values remain fixed. The network weights are updated, but the number of fuzzy neurons does not change during learning. Contrary to the previous NFN models, in this paper both the granulation and the modal values of the membership functions of each input variable can be simultaneously updated. The neural fuzzy network evolves its structure and modifies its weights whenever necessary, as dictated by the input data. That is, whenever a new sample is input, connection weights and the parameters of membership functions can be modified and/or new membership functions added/eliminated to keep a uniform and small output error for any value of the input variable.

After this introduction the rest of the paper proceeds as follows. The next section briefly overviews related work to recall the main ideas and approaches for evolving fuzzy systems modeling. Section 3 describes the neo-fuzzy neuron (NFN) and network, while Section 4 addresses its learning algorithm. Section 5 details the evolving neo-fuzzy neural network (eNFN) approach. In Section 6 the computational complexity of the eNFN is compared with alternative


evolving methods. Section 7 focuses on streamflow forecasting, the classic Mackey–Glass benchmark, and the identification of a high-dimensional nonlinear system. It also includes experiments to compare the processing time of the eNFN against other evolving models considering the dimension of the input space and the number of neurons. Finally, Section 8 concludes the paper, summarizing its contributions and issues for future development.

2. Related works

Evolving fuzzy system modeling is a synergy between fuzzy systems and methods of machine learning. It is characterized by its ability to extract and update knowledge from data streams to account for environmental changes [33]. The system model structure is not fixed or predetermined a priori. The model structure is produced incrementally, allowing new features to be added, deleted or modified to better fit the information contained in the data. The algorithms developed for evolving fuzzy system modeling constitute a major contribution to building adaptive, nonlinear models. Most of them use a form of recursive clustering algorithm that processes new samples to incrementally adjust or revise the existing clusters [34]. Usually, the model structure is found during the clustering step because key model components, such as fuzzy neurons or fuzzy rules, are directly associated with clusters. Evolving clustering procedures may add, delete or modify existing components dynamically. A typical mechanism to update the model structure is shown in Fig. 1.

Fig. 1. Mechanism to update model structure.

The evolving fuzzy system modeling approach was originally introduced in [35–38] to meet a demand for flexible, adaptive, robust and interpretable models needed by many applications in several areas, such as smart sensors, autonomous systems, nonlinear adaptive control, fault detection and diagnosis, forecasting, and knowledge extraction. Industrial applications have also been on the applications agenda [39]. There is nowadays an increasing demand for evolving fuzzy systems problem solving methodologies and algorithms. Currently, such methodologies and algorithms are under intensive development to form a basis for truly intelligent systems [40].

Several evolving functional fuzzy rule-based modeling methods have been developed lately, especially for fuzzy functional models in the Takagi–Sugeno (TS) form [41]. Here, the model structure (number of rules) is found by unsupervised recursive clustering algorithms. The rule consequent parameters are updated using recursive least squares and its variations. Major approaches include the ones reported in [3,4,12,24,27,29,35,42–47]. Online system identification is addressed in [48]. This scheme is inspired by a learning algorithm which recursively updates the structure of TS models and their parameters through a combination of supervised and unsupervised learning. This is the Evolving Takagi–Sugeno (eTS) approach. Its adaptive nature, in combination with a transparent and extremely compact rule base, makes eTS an important tool for on-line modeling and control of complex processes. A faster version of eTS, Simpl_eTS, was suggested in [44,48]. The aim of Simpl_eTS is to simplify the computation of the potential information. Simpl_eTS combines the concept of dispersion with a measure of data density, which makes the algorithm computationally more efficient. It uses Cauchy membership functions instead of Gaussians, and the recursive least squares algorithm for dynamic parameter learning. A further development of eTS [48] is the xTS (eXtended Takagi–Sugeno) [49]. The xTS combines zero-order and first-order Takagi–Sugeno models with incremental data clustering.

Two early approaches for evolving fuzzy neural networks (EFuNNs), mEFuNNs and dmEFuNNs [50], adopt hybrid learning (supervised/unsupervised) to obtain Mamdani [51] linguistic and functional Takagi–Sugeno rules, respectively. The dynamic evolving neuro-fuzzy inference system (DENFIS) was proposed in [37,52]. DENFIS evolves incrementally, using a hybrid learning method to insert, update or delete rules and classes. Learning employs the evolving clustering method (ECM) [53] to adapt the rule-base structure, and a weighted recursive least squares with forgetting factor algorithm to update the rule consequent parameters. DENFIS uses local generalization, as do EFuNNs [50] and the CMAC networks [54], and needs more training data than models based on global generalization such as ANFIS [55] and MLP [34].

The flexible fuzzy inference system (FLEXFIS) [46,56,57] uses a recursive clustering algorithm derived from a modification of the vector quantization technique (eVQ) developed in [58]. Consequent parameter estimation is done with the weighted recursive least squares algorithm. Different implementations of FLEXFIS are addressed in the literature, e.g. [4,14,15,57,59]. Applications of FLEXFIS include control and prediction of pollutant emissions (NOx) of automotive internal combustion engines, as detailed in [15]. A feature selection scheme with incremental learning for an evolving fuzzy classifier (FLEXFIS-Class) was introduced in [21]. In FLEXFIS-Class, weights are assigned to each input variable according to their relevance. Variables important for class discrimination have high values, while less important variables have their weights reduced. The larger the weight of a variable, the higher its impact on learning. The weights are continuously updated during learning.

An on-line self-organizing fuzzy modified least squares network (SOFMLS) [60] was another major development. SOFMLS employs a novel evolving nearest neighborhood clustering algorithm. An important development is the sequential adaptive fuzzy inference system (SAFIS) [61], which uses a distance criterion together with an influence measure for the rules.

Participatory learning (PL) [62], in particular a robust clustering method based on it developed in [63], was used in [64] to perform granulation of the input space in the evolving fuzzy participatory modeling approach (ePL). The PL paradigm is based on a mechanism of human learning whose main idea is that current knowledge is part of the learning process. In PL, an association is made between the information collected and current knowledge to decide whether current knowledge should be smoothed or revised [65].
ePL uses TS fuzzy rules, participatory clustering to update the rule base, and a recursive least squares procedure to update the rule consequent parameters. A new approach to modeling evolving fuzzy systems, called multivariable Gaussian evolving fuzzy modeling (eMG), was introduced in [8]. The eMG is an evolving TS functional fuzzy scheme that uses a new recursive clustering algorithm based on participatory learning to continuously update the rule base. The evolving clustering can be seen as an extension of the approach detailed in [63]. Unlike the original algorithm, in eMG each cluster is represented by a multivariable Gaussian function and the cluster structure is updated recursively, with thresholds set automatically.

More recently, [66] introduced a fuzzy linear regression tree approach for evolving fuzzy system modeling. The topology of the tree is continuously updated through a statistical model selection test. The approach was evaluated using time series forecasting problems. The fuzzy linear regression tree is generated from a data stream and statistical methods of model selection. This is done recursively, replacing current tree leaves by sub-trees that improve the model quality. Quality is evaluated by a screening test, taking into account the accuracy and the number of parameters of the resulting model. An extension of the evolving fuzzy linear regression tree was reported in [67]. This is a new approach that allows feature selection during learning.

A structure evolved learning method (SELM) for fuzzy systems identification was proposed in [68]. The structure of SELM evolves aiming at reducing the modeling error. Error reduction is performed by adding a new partition to the sub-region with the largest average error. Adding a new partition proceeds as follows. The problem domain is initially split into two sub-regions. The sub-region with the maximal average error is selected for further splitting. This splitting process continues until satisfactory accuracy is obtained. SELM was used to develop linguistic and functional fuzzy models.

3. Neo-fuzzy neuron and neural network

This section reviews the fuzzy neural network model (NFN) constructed with the neo-fuzzy neuron introduced in [30]. Basically, the structure of the NFN (see Fig. 2) mirrors a set of n zero-order TS functional rules. Its output is computed using

$$y = f_1(x_1) + \cdots + f_n(x_n) = \sum_{i=1}^{n} f_i(x_i) = \sum_{i=1}^{n} y_i(x_i). \tag{1}$$

Fig. 2. Neo-fuzzy neuron and network structure.

Each output y_i is computed using a set of m_i TS fuzzy rules. The domain of each input variable x_i is granulated into m_i complementary triangular membership functions, as shown in Fig. 3. The corresponding m_i TS rules are as follows:

$$\begin{aligned} R_i^1 &: \text{if } x_i \text{ is } A_i^1 \text{ then } y_i \text{ is } y_i^1 = q_{i1}\\ R_i^2 &: \text{if } x_i \text{ is } A_i^2 \text{ then } y_i \text{ is } y_i^2 = q_{i2}\\ &\;\;\vdots\\ R_i^j &: \text{if } x_i \text{ is } A_i^j \text{ then } y_i \text{ is } y_i^j = q_{ij}\\ &\;\;\vdots\\ R_i^{m_i} &: \text{if } x_i \text{ is } A_i^{m_i} \text{ then } y_i \text{ is } y_i^{m_i} = q_{im_i} \end{aligned} \tag{2}$$

The values of y_i = f_i(x_i) are found using

$$y_i = f_i(x_i) = \frac{\sum_{j=1}^{m_i} w_{ij}\, y_{ij}}{\sum_{j=1}^{m_i} w_{ij}} = \frac{a}{b} \tag{3}$$

where w_{ij} = A_{ij}(x_i) and y_{ij} = q_{ij}. Thus,

$$y_i = \frac{\sum_{j=1}^{m_i} A_{ij}(x_i)\, q_{ij}}{\sum_{j=1}^{m_i} A_{ij}(x_i)} = \frac{a}{b}. \tag{4}$$

Since the membership functions are complementary, at most two of the m_i rules are active for a given input x_i. These two rules are indexed by k_i and k_i + 1. A rule is active when its firing degree is positive. Fig. 3 shows an example. In this case only rules R_i^2 and R_i^3 are active because the membership degree of x_i is positive for fuzzy sets A_i^2 and A_i^3. Alternatively, we may say that the membership functions of fuzzy sets A_i^2 and A_i^3 are active. The value of k_i is the index of the first active membership function, and it can be easily found by computing the smallest positive difference between the input variable x_i and the modal values of the membership functions.

Fig. 3. Complementary membership functions.

Because the membership functions are complementary, b becomes

$$b = \sum_{j=1}^{m_i} A_{ij}(x_i) = A_{ik_i}(x_i) + A_{ik_i+1}(x_i) = 1. \tag{5}$$

Therefore, as Fig. 4 shows, the structure of the neo-fuzzy neural network is simplified because

$$y_i(x_i) = a = A_{ik_i}(x_i)\, q_{ik_i} + A_{ik_i+1}(x_i)\, q_{ik_i+1}. \tag{6}$$

Fig. 4. Simplified neo-fuzzy neural network structure.

Notice that only the active neurons are relevant during data processing, and thus only the weights of the active neurons need updating during NFN learning. This explains why, as will be seen later, the NFN has shorter learning and processing times than alternative neural network models. This is especially important when modeling high-dimensional systems, because the number of operations is kept small and only simple function computations are required [69].
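To make the computations in (1)–(6) concrete, the following sketch implements the NFN forward pass over a grid of complementary triangular membership functions. The representation (one list of modal values and one list of weights per input) and the function names are our illustrative assumptions, not constructs from the paper.

```python
import numpy as np

def active_pair(x, b):
    """Index k of the first of the two active complementary triangular
    membership functions; b holds sorted modal values b[0] <= ... <= b[m-1]."""
    k = int(np.searchsorted(b, x) - 1)
    return min(max(k, 0), len(b) - 2)

def memberships(x, b, k):
    """Degrees of the two active functions; by complementarity they sum to 1 (Eq. (5))."""
    left = (b[k + 1] - x) / (b[k + 1] - b[k])    # A_{i,k}(x_i)
    left = min(max(left, 0.0), 1.0)
    return left, 1.0 - left                      # A_{i,k+1}(x_i)

def nfn_output(x, b_list, q_list):
    """Eq. (1): y = sum_i f_i(x_i), with each f_i evaluated as in Eq. (6)."""
    y = 0.0
    for xi, b, q in zip(x, b_list, q_list):
        k = active_pair(xi, b)
        w0, w1 = memberships(xi, b, k)
        y += w0 * q[k] + w1 * q[k + 1]           # only the two active weights matter
    return y
```

Because the two active degrees sum to one, the denominator b in (3)–(4) equals 1 and no normalization is needed, which is precisely the simplification expressed by (6).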

4. Learning algorithm of NFN

In this section we summarize the procedure to update the neo-fuzzy network weights using a gradient-based algorithm with optimal learning rate. Here learning is supervised and aims to adjust the parameters q_{ij}. The procedure developed next works on a per-sample basis. Alternatives, such as batch and momentum learning, have been addressed in [30].


Let x_t = (x_{t1}, x_{t2}, ..., x_{ti}, ..., x_{tn}) be the input at step t, y_t the corresponding output, and y_t^d the desired output value. With the values of y_t and y_t^d, one can compute the squared error at t:

$$E_t = \frac{1}{2}\,(y_t - y_t^d)^2 = E_t(q_{ij}). \tag{7}$$

If the granulation and the membership functions are fixed, then learning of the NFN translates into finding weights that minimize E_t. Recalling that for each input only the membership functions μ_{ik_i}(x_{ti}) and μ_{ik_i+1}(x_{ti}) have nonzero values, the minimization of E_t turns out to be

$$\min E_t = E_t(q_{ij}) = E_t(\tilde{q}) \tag{8}$$

where q_{ij} ∈ R, ∀i = 1, ..., n; j = k_i, (k_i + 1), and \tilde{q} is

$$\tilde{q} = [\,q_{1k_1}\; q_{1(k_1+1)}\; \ldots\; q_{ik_i}\; q_{i(k_i+1)}\; \ldots\; q_{nk_n}\; q_{n(k_n+1)}\,]. \tag{9}$$

Note that E_t(\tilde{q}) has 2n variables and is a quadratic function of the weights [32]. Thus, a local minimum is also global. E_t(\tilde{q}) is a nonlinear function and hence convergence to the optimal solution depends on the initial point, the search direction, and the learning rate (step-size). NFN learning sets null initial values. The simplest search direction is the steepest descent d_t = (y_t − y_t^d). Therefore, q_{ik_i} is updated as follows:

$$q_{ik_i} = q_{ik_i} - \alpha\,(y_t - y_t^d)\,\mu_{ik_i}(x_{ti}). \tag{10}$$

In general the learning rate α is found using unidimensional search [70]. In this work we adopt the closed formula developed in [32] to compute α. Because E_t(\tilde{q}) is quadratic, α can be computed easily to achieve zero error at each learning step. The learning step formula is

$$\alpha = \frac{1}{\sum_{i=1}^{n}\left[\mu_{ik_i}(x_{ti})^2 + \mu_{ik_i+1}(x_{ti})^2\right]}. \tag{11}$$

It is always possible to find a \tilde{q}^1 such that E_t(\tilde{q}) is null at t, independently of the initial value of \tilde{q}^0.
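A minimal sketch of the per-sample learning step of Eqs. (7)–(11), reusing the helpers of the previous sketch; the interface is again our assumption.

```python
def nfn_learn_step(x, y_desired, b_list, q_list):
    """One supervised step: evaluate the output, then update only the 2n
    active weights by steepest descent with the optimal rate of Eq. (11)."""
    active, y = [], 0.0
    for i, (xi, b, q) in enumerate(zip(x, b_list, q_list)):
        k = active_pair(xi, b)
        w0, w1 = memberships(xi, b, k)
        active.append((i, k, w0, w1))
        y += w0 * q[k] + w1 * q[k + 1]
    alpha = 1.0 / sum(w0 ** 2 + w1 ** 2 for _, _, w0, w1 in active)  # Eq. (11)
    err = y - y_desired                                              # d_t
    for i, k, w0, w1 in active:
        q_list[i][k] -= alpha * err * w0                             # Eq. (10)
        q_list[i][k + 1] -= alpha * err * w1
    return y
```

Substituting the updated weights back into (6) changes the output by exactly α·err·Σ(w0² + w1²) = err, which is the zero-error-per-step property of the closed-form rate.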

5. Evolving neo-fuzzy neural network

The evolving neo-fuzzy neural network (eNFN) updates its weights using the gradient search with optimal learning rate, as described in Section 4. However, in contrast with the NFN, the evolving approach updates the model structure whenever data are input. Moreover, the modal values of the membership functions, the number of membership functions, and the number of neurons can be modified simultaneously during the weight update. The eNFN learning can be summarized by the following four steps:

1. Choose initial values of the membership function parameters. This step is performed only once, using the lower and upper bounds of the input variable domains.
2. Compute the membership degrees of input x_t, find the most active membership functions, and update their modal values.
3. Check whether the most active membership function represents well the neighborhood of x_t. Decide if a new membership function should be created and inserted to refine the neighborhood of x_t.
4. Find the oldest inactive membership function. Decide if this membership function should be deleted. A membership function is deleted if its age, the number of steps during which it has been inactive, is greater than a threshold.

A detailed description of the four steps is given next.

5.1. Initialization of the membership functions


We assume complementary triangular membership functions. A triangular membership function can be represented by three parameters: the lower limit (a), the modal value (b) and the upper limit (c). Since the membership functions are complementary, the modal value of the k-th function is b_k, its lower limit is the modal value of the lower adjacent membership function, a_k = b_{k−1}, and its upper limit is the modal value of the upper adjacent membership function, c_k = b_{k+1}, as shown in Fig. 5.

Fig. 5. Membership functions parameters.

The modal values of the membership functions are found as follows:

$$b_{ij} = x_{min_i} + (j - 1)\,\Delta_i \tag{12}$$

where i indexes the input variable, j indexes the membership function, x_{min_i} is the lower bound of the i-th input variable, and Δ_i is

$$\Delta_i = \frac{x_{max_i} - x_{min_i}}{m_i - 1} \tag{13}$$

with x_{max_i} the upper bound of the i-th input variable and m_i the number of membership functions of the i-th input variable.

In systems modeling there may be changes in the environment that produce data whose values are outside the range defined by the maximum and minimum bounds. Therefore, it is important that the values of x_{min_i} and x_{max_i} be updated to adapt to the prevailing environmental conditions. The procedure to update the input variable bounds is called context adaptation and can be done as follows:

$$\text{if } x_{ti} < x_{min_i} \text{ then } x_{min_i} = x_{ti} \text{ and } b_{i1} = x_{min_i}; \qquad \text{if } x_{ti} > x_{max_i} \text{ then } x_{max_i} = x_{ti} \text{ and } b_{im_i} = x_{max_i}. \tag{14}$$
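A short sketch of the uniform initialization (12)–(13) and of the context adaptation (14); keeping the bounds in a tuple is our bookkeeping assumption.

```python
def init_modal_values(x_min, x_max, m):
    """Eqs. (12)-(13): m uniformly spaced modal values over [x_min, x_max]."""
    delta = (x_max - x_min) / (m - 1)
    return [x_min + j * delta for j in range(m)]

def context_adaptation(x, b, bounds):
    """Eq. (14): stretch the domain when a sample falls outside the bounds."""
    x_min, x_max = bounds
    if x < x_min:
        x_min, b[0] = x, x     # move the lower bound and the first modal value
    if x > x_max:
        x_max, b[-1] = x, x    # move the upper bound and the last modal value
    return b, (x_min, x_max)
```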

5.2. Modal values of the membership functions adaptation

This step finds b*_i, the index of the most active membership function enabled by x_{ti}, i = 1, ..., n. If 1 < b*_i < m_i, then the b*_i-th membership function has its modal value updated using (15). Otherwise, if the most active function is such that b*_i = 1 or b*_i = m_i, then the modal value is kept the same, because in this case the modal values correspond to x_{min_i} and x_{max_i}, respectively:

$$b^{new}_{i,b^*_i} = b^{old}_{i,b^*_i} + \beta\,(x_{ti} - b^{old}_{i,b^*_i}) \tag{15}$$

where β is the learning rate.
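Eq. (15) amounts to an exponential smoothing of the interior modal values toward the incoming samples; a minimal sketch (names ours):

```python
def adapt_modal_value(x, b, k_star, beta=0.01):
    """Eq. (15): nudge the most active interior modal value toward the sample.
    Boundary functions (k_star == 0 or m-1) stay anchored at the domain bounds."""
    if 0 < k_star < len(b) - 1:
        b[k_star] += beta * (x - b[k_star])
    return b
```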

5.3. Creation of the membership functions

Creation and insertion of a new membership function aims to refine the input space granulation to reduce the output error uniformly. Unlike [68], where the refinement of the input space granulation is performed for the region with the maximal average error, in this paper the granulation of the input space is based on the error of the current most active membership function. Here, the mean value of the local output errors of the rules corresponding to the most active membership functions is compared against the mean value of the overall modeling error. If the local mean error is greater than the global mean error, then the local region is refined by adding a new membership function as follows.

The mean value μ̂_m and variance σ̂²_m of the global modeling error are computed recursively, for input x_t, as follows:

$$\hat{\mu}_{m_t} = \hat{\mu}_{m_{t-1}} - \beta\,(\hat{\mu}_{m_{t-1}} - e_t) \tag{16}$$

$$\hat{\sigma}^2_{m_t} = (1 - \beta)\left(\hat{\sigma}^2_{m_{t-1}} + \beta\,(\hat{\mu}_{m_t} - e_t)^2\right) \tag{17}$$

where e_t = y_t − y_t^d. Similarly, the local mean error μ̂_{b_t} corresponding to the most active membership function (b*_i) is computed recursively, for input x_t:

$$\hat{\mu}_{b_t} = \hat{\mu}_{b_{t-1}} - \beta\,(\hat{\mu}_{b_{t-1}} - e_{b_t}). \tag{18}$$

To prevent an excessively fine granulation of a region of the input domain, a limit λ is used to control the smallest distance between the modal values of adjacent membership functions. That is, if 1 < b*_i < m_i, then the distance between the modal values of the membership functions after a new function is added is given by (20). Otherwise, if the most active function is such that b*_i = 1 or b*_i = m_i, then the distance is computed using (23) and (25), respectively. Notice that the number of rules is not fixed a priori; it depends on the learning process only. Limiting excessive granulation of a particular region of the input domain is a mechanism to avoid complex models and overfitting. The value of λ allows, indirectly, control of the number of rules.

More precisely, if μ̂_{b_t} > μ̂_{m_t} + σ̂_{m_t} and dist > λ, then a new membership function is created. The value of λ is computed using

$$\lambda = \frac{x_{max_i} - x_{min_i}}{\eta} \tag{19}$$

where η is the number of membership functions that granulate the input variable domain. The new membership function requires updating the i-th input variable domain as follows.

• Case 1: If b*_i ≠ 1 and b*_i < m_i, then the most active function is replaced by two membership functions whose modal values are given by (21) and (22). This procedure is illustrated in Fig. 6.

$$dist = \frac{b_{i,b^*_i+1} - b_{i,b^*_i-1}}{3} \tag{20}$$

$$b_1^{new} = b_{i,b^*_i-1} + dist \tag{21}$$

$$b_2^{new} = b_{i,b^*_i-1} + 2\,dist \tag{22}$$

Fig. 6. Membership function creation – Case 1.

• Case 2: If the most active membership function is such that b*_i = 1, then a new membership function is created between the first and the second. The modal value of the newly created membership function is given by (24). Fig. 7 shows the most active function and the input space granulation before and after the creation of the new membership function.

$$dist = \frac{b_{i,b^*_i+1} - b_{i,b^*_i}}{2} \tag{23}$$

$$b^{new} = b_{i,b^*_i} + dist \tag{24}$$

Fig. 7. Membership function creation – Case 2.

• Case 3: If the most active function is such that b*_i = m_i, then the new membership function is created and inserted between the last and the previous one, with modal value given by (26), as shown in Fig. 8.

$$dist = \frac{b_{i,b^*_i} - b_{i,b^*_i-1}}{2} \tag{25}$$

$$b^{new} = b_{i,b^*_i} - dist \tag{26}$$

Fig. 8. Membership function creation – Case 3.
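The creation step can be sketched as below, with λ obtained from Eq. (19) as (x_max − x_min)/η. The list-based modal-value representation and the function names are our assumptions.

```python
def update_error_stats(mu, var, e, beta=0.01):
    """Eqs. (16)-(17): recursive mean and variance of the global modeling error."""
    mu = mu - beta * (mu - e)
    var = (1.0 - beta) * (var + beta * (mu - e) ** 2)
    return mu, var

def maybe_create_mf(b, k_star, mu_local, mu_global, var_global, lam):
    """Create a new membership function near the most active one when its
    local mean error exceeds the global mean plus one standard deviation."""
    if mu_local <= mu_global + var_global ** 0.5:
        return b
    if 0 < k_star < len(b) - 1:                        # Case 1, Eqs. (20)-(22)
        dist = (b[k_star + 1] - b[k_star - 1]) / 3.0
        if dist > lam:
            b[k_star:k_star + 1] = [b[k_star - 1] + dist,
                                    b[k_star - 1] + 2.0 * dist]
    elif k_star == 0:                                  # Case 2, Eqs. (23)-(24)
        dist = (b[1] - b[0]) / 2.0
        if dist > lam:
            b.insert(1, b[0] + dist)
    else:                                              # Case 3, Eqs. (25)-(26)
        dist = (b[-1] - b[-2]) / 2.0
        if dist > lam:
            b.insert(len(b) - 1, b[-1] - dist)
    return b
```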

In any of the three cases, the mean error of the most active membership function and the number of membership functions of the corresponding variable should be updated. Notice that creation of a new membership function modifies the support of the adjacent membership functions, but their modal values remain the same. Modification of the supports is done to keep the membership functions complementary.

5.4. Elimination of the membership functions

This section describes a method for automatic elimination of membership functions and fuzzy rules using the concept of the age of a rule. The concept of the age of a rule was introduced in [44] and extended in [71]. In this paper the concept of age is used to determine for how long a membership function has been inactive. The age of a membership function is found using

$$age_i = k - a_i \tag{27}$$

where a_i is the step at which the i-th membership function was last activated, and k is the current step. A membership function is eliminated as follows. For each input variable i, find b−_i, the index of the least active membership function enabled by x_t. The least active membership function is excluded if

$$age_{b^-_i} > \omega \;\text{ and }\; m_i > 2 \tag{28}$$

where ω indicates the age limit of a membership function. After elimination of a membership function, the granulation is updated as follows.

• Case 1: If b−_i ≠ 1 and b−_i < m_i, then the least active membership function is eliminated and the lower and upper limits of the support of the adjacent membership functions are adjusted. The procedure is illustrated in Fig. 9a, which shows the input domain granulation before, and Fig. 9b after, exclusion of the membership function. In the example illustrated in Fig. 9, A_i^4 is the least active membership function. Exclusion of A_i^4 requires the upper limit of the support of A_i^3 and the lower limit of the support of A_i^5 to be adjusted. The upper and lower limits change to keep the membership functions complementary.

Fig. 9. Membership function elimination and adjustment – Case 1.

• Case 2: If the least active membership function is such that b−_i = 1, then the least active membership function is eliminated and the modal value of the adjacent membership function is set to x_{min_i}. In this case, exclusion of the membership function modifies the lower limit of the support of the adjacent membership function whose modal value is modified. Fig. 10a shows the input space and the least active function before, and Fig. 10b the input space granulation after, excluding the membership function. Notice in Fig. 10b that exclusion of a membership function changes the modal value of A_i^2 and the lower limit of the support of A_i^3.

Fig. 10. Membership function elimination and adjustment – Case 2.

• Case 3: If the least active membership function is such that b−_i = m_i, then the least active membership function is eliminated and the modal value of the adjacent membership function is set to x_{max_i}. Here, exclusion of a membership function modifies the upper limit of the support of its adjacent membership functions, as shown in Fig. 11a and b, respectively.

Fig. 11. Membership function elimination and adjustment – Case 3.
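A sketch of the age-based pruning of Eqs. (27)–(28); recording the last activation step of each membership function in a list is our bookkeeping assumption.

```python
def eliminate_old_mf(b, last_active, k, omega, bounds):
    """Eqs. (27)-(28): drop the oldest inactive function when its age exceeds
    omega, keeping at least two functions and the domain bounds covered."""
    ages = [k - a for a in last_active]                # Eq. (27)
    j = max(range(len(b)), key=lambda i: ages[i])      # least active function
    if ages[j] > omega and len(b) > 2:                 # Eq. (28)
        del b[j], last_active[j]
        b[0], b[-1] = bounds                           # Cases 2 and 3: re-anchor bounds
    return b, last_active
```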

The learning algorithm of the eNFN has four parameters:

• m_i: the number of membership functions in the granulation of the i-th input variable domain, i = 1, ..., n. At least two complementary triangular membership functions are necessary to cover the input variable domain, so granulations should start with two or more membership functions, that is, m_i ≥ 2. In all experiments reported in this paper, the domain of each input variable is initially granulated using two membership functions (m_i = 2) because, if necessary, new membership functions can be added depending on the input data and modeling error.
• β: learning rate used to update the modal values of the membership functions, to compute the mean value and the variance of the global modeling error, and to compute the mean error of the most active membership function. The learning rate β is usually set to a small value, e.g. β ∈ [10^{−3}, 10^{−1}].
• η: parameter that helps to compute the smallest distance allowed between the modal value of the membership function to be created and the modal values of its adjacent membership functions. Therefore, indirectly, it limits the number of rules. Values for η are chosen considering the following: (1) low values of η may not be enough to capture the information in the input data and may decrease model performance; (2) high values of η increase model complexity and may cause overfitting. Practice suggests η ∈ [5, 25].
• ω: threshold to delete a membership function by age. The threshold used to eliminate a membership function is associated with the ability of the model to forget old knowledge whenever it becomes useless. The value of ω is typically chosen between 50 and 250.

Algorithm 1 shows how to compute the eNFN model output and Algorithm 2 details the learning procedure.

Algorithm 1. Compute eNFN output.

  Input: x_t, y_t^d; Output: y_t
  Initialize b_ij
  for t = 1, 2, ... do
    Read x_t, y_t^d
    for i = 1 : n do
      Compute μ_{ik_i}(x_{ti}), μ_{ik_i+1}(x_{ti})
      Compute y_i
    endfor
    Compute α
    Update q̃
    Compute y_t
  endfor
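Putting the earlier sketches together, an on-line pass in the spirit of Algorithm 1 (predict first, then adapt the weights) could look as follows; the `stream` interface is illustrative, and the structural steps of Algorithm 2 are omitted for brevity.

```python
def run_online(stream, b_list, q_list):
    """For each (input vector, target) pair: predict, then learn."""
    predictions = []
    for x, y_d in stream:
        predictions.append(nfn_output(x, b_list, q_list))
        nfn_learn_step(x, y_d, b_list, q_list)
    return predictions
```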

Algorithm 2. Learning of eNFN.

  Input: x_t, m_i, β, η, ω
  for t = 1, 2, ... do
    // Update parameters
    Check context adaptation
    Update b_ij
    Update q_{ik_i}
    Compute μ̂_{m_t}
    Compute σ̂²_{m_t}
    for i = 1 : n do
      // Creation of membership functions
      Find b*_i
      Compute μ̂_{b_t}
      Compute λ
      Compute dist
      if (μ̂_{b_t} > μ̂_{m_t} + σ̂_{m_t}) and (dist > λ) then
        Create and insert new membership function
        Update parameters
      endif
      // Elimination of membership functions
      Update age_i
      Find b−_i
      if age_{b−_i} > ω and m_i > 2 then
        Remove membership function indexed by b−_i
        Update parameters
      endif
    endfor
  endfor

6. Complexity analysis

The computational complexity of neural fuzzy networks based on the NFN has been investigated and compared with feedforward multilayer artificial neural networks (ANN) in [32], where it is shown that the NFN-based neural network complexity is substantially smaller, and that its processing speed is much higher than that of the ANN.

The complexity of the eNFN depends primarily on the number of input variables. The number of membership functions and/or the number of rules is not relevant, since it does not affect eNFN performance. Therefore the complexity of the eNFN is O(n), where n is the number of input variables. The complexity of DENFIS, eMG, eTS and xTS depends on the number of input variables (n) and the number of rules (r). DENFIS and eMG are O(rn²) and O(rn³), respectively, while eTS and xTS are both O(rn³). Clearly the eNFN has the lowest complexity among DENFIS, eMG, eTS and xTS; it is the only model whose complexity depends linearly on the dimension of the input space. Another important factor to note is that the operations performed by the eNFN are multiplication, addition and comparison only.

7. Computational results

The eNFN was evaluated through forecasting of the average weekly inflow of a hydroelectric plant using actual data (Section 7.1), the Mackey–Glass benchmark (Section 7.2), and a high-dimensional system identification problem (Section 7.3). The results obtained were compared with four evolving systems: DENFIS, eMG, eTS and xTS. The Matlab implementation of DENFIS adopted for comparison is available from the Knowledge Engineering and Discovery Research Institute (KEDRI, http://www.aut.ac.nz/research/research-institutes/kedri/books). The Matlab code of eMG and the Java implementations of eTS and xTS were provided by their respective authors. The eNFN itself was developed in Matlab.

Evolving systems frameworks are primarily developed for nonstationary dynamic systems and time-varying environments. In these circumstances, traditional techniques used to estimate parameters, such as cross-validation, are not useful. In the experiments reported in this work we opted to use a new approach to estimate parameters and to evaluate the performance of the models. This approach splits the dataset into two subsets with 50% of the samples each. The first subset is used to estimate the parameters of the models, and the second to test modeling and model performance. The best values of the model parameters are found through exhaustive search. The parameter values that achieve the lowest error are used to evaluate model and modeling performance. The error measure adopted is the RMSE (Root Mean Squared Error):

$$RMSE = \left(\frac{1}{N}\sum_{t=1}^{N}(y_t - \hat{y}_t)^2\right)^{1/2} \tag{29}$$

where N is the number of data samples, y_t is the actual output and ŷ_t is the model output. The performance of the models is evaluated through the RMSE over all samples in the test subset. During parameter estimation and model evaluation, the model structure evolves for all dataset samples.

The RMSE is a common measure to verify model accuracy, but it does not reveal whether a model is statistically superior [66]. This paper uses the MGN test [72], a parametric statistical test that helps to compare the accuracy of two models. The MGN statistic is computed using

$$MGN = \frac{\hat{\rho}_{sd}}{\sqrt{(1 - \hat{\rho}_{sd}^2)/(n - 1)}} \tag{30}$$

where ρ̂_{sd} is the estimated correlation coefficient between s = r_1 + r_2 and d = r_1 − r_2, with r_1 and r_2 the residuals of the two models. The test statistic is viewed as a Student's t distribution with n − 1 degrees of freedom. In the MGN test, if the models are equally accurate, then the correlation between s and d should be zero.

Evolving models adapt their parameters and structure according to the data stream: modeling starts with an initial sample, and the model estimates the output and adjusts its parameters and structure for each subsequent input. To verify the behavior of the residual modeling error over time we suggest the Residual Sum of Squares (RSS) analysis. The RSS is computed as the sum of squared residuals for all inputs processed up to step k:

$$RSS_k = \sum_{t=1}^{k}(y_t - \hat{y}_t)^2 \tag{31}$$
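The three evaluation measures (29)–(31) are simple to compute; the sketch below assumes the residual series are NumPy arrays, and the two-sided p-value is our reading of how the t statistic is used.

```python
import numpy as np
from scipy import stats

def rmse(y, y_hat):
    """Eq. (29)."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))

def mgn(r1, r2):
    """Eq. (30): MGN statistic and p-value for residual series r1, r2."""
    s, d = r1 + r2, r1 - r2
    rho = np.corrcoef(s, d)[0, 1]
    n = len(r1)
    t = rho / np.sqrt((1.0 - rho ** 2) / (n - 1))
    p = 2.0 * (1.0 - stats.t.cdf(abs(t), df=n - 1))   # two-sided (assumption)
    return t, p

def rss(y, y_hat):
    """Eq. (31): cumulative sum of squared residuals up to each step k."""
    return np.cumsum((np.asarray(y) - np.asarray(y_hat)) ** 2)
```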


7.1. Streamflow forecasting

In this section the eNFN is used to forecast the average weekly inflow of a large hydroelectric plant located in northeastern Brazil (Sobradinho). The data cover the period between 1931 and 2000. As in [66], the data were normalized between 0 and 1 to preserve privacy. Analysis and forecasting of seasonal streamflow series are extremely important in the management of water resources. A major difficulty in this type of forecasting is its non-stationary nature, due to periods of flood and drought during the year. The purpose of the model is to predict the current average weekly flow using lagged weekly flow values in the series. The model employed by [66] is of the form

$$\hat{y}_k = f(y_{k-1}, y_{k-2}, y_{k-3}). \tag{32}$$

The dataset used in the experiment consists of 3707 samples, of which 1854 are used to estimate the parameters of the models and 1853 to evaluate the performance of the models. The models evolve their structures and parameters for all observations in the dataset. Fig. 12 illustrates the membership functions (initial and final) for the streamflow forecasting. Fig. 13 shows the evolution of the structure (number of membership functions) of the model for each input variable, and Fig. 14 illustrates the desired output and the output of the eNFN for the first 500 samples, where we see its fast convergence.

All observations of the testing dataset were used to verify the performance of the models using the RMSE, MGN and RSS. DENFIS was run as first-order Takagi–Sugeno with the following parameters: distance threshold dthr = 0.07 and number of rules in the dynamic fuzzy inference system mofn = 3. The eNFN parameters used were: β = 0.01, ω = 200 and η = 15. The eMG parameters adopted were: λ = 0.05, w = 10, Σ_init = 10^{−2} I_3, α = 0.01. The parameters of the eTS were r = 0.02 and Ω = 100, and of the xTS, Ω = 250.

Table 1 shows the RMSE and the number of rules after simulation. It suggests that the eNFN performs better than DENFIS and worse than eMG, eTS and xTS.

Table 1
Performance of the streamflow forecast methods.

Model         Number of rules   RMSE
DENFIS [37]   15                0.0562
eMG [8]       05                0.0454
eNFN          14                0.0551
eTS [48]      11                0.0409
xTS [49]      05                0.0408

Table 2 illustrates the pairwise comparison between the forecasts of the eNFN and the remaining models using the MGN test. The table shows the MGN statistic and the corresponding p-value. The results of Table 2 show that the eNFN performance is statistically similar to that of DENFIS, but lower than that of eMG, eTS and xTS. Fig. 15 shows the Residual Sum of Squares (RSS) of the models used for streamflow forecasting. The accumulated residual eNFN curve confirms the comparable performance of DENFIS and eNFN, and the statistical superiority of eMG, eTS and xTS.

Table 2
Pairwise comparison for streamflow forecast.

Model                  MGN       p-Value
eNFN vs DENFIS [37]    1.0249    0.1528
eMG [8] vs eNFN        15.0234   0.0000
eTS [48] vs eNFN       19.2015   0.0000
xTS [49] vs eNFN       18.0250   0.0000

7.2. Mackey–Glass time-series forecasting

The Mackey–Glass time-series [73] is a classic benchmark used for long-term time-series forecasting. The series is created using the differential equation with time delay

$$\frac{dx(t)}{dt} = \frac{0.2\,x(t-\tau)}{1 + x(t-\tau)^{10}} - 0.1\,x(t) \tag{33}$$

where x(0) = 1.2 and τ = 17. The goal is to predict x_{t+6} for any value of t, considering the inputs x_{t−18}, x_{t−12}, x_{t−6}, x_t. The dataset consists of 3500 samples, where 3000 samples were collected with t ∈ [201, 3200] and 500 with t ∈ [5001, 5500] [37]. In these experiments we use the first 1750 samples to estimate the model parameters and the last 1750 samples to evaluate the performance of the models. Fig. 16 shows how the structure of the model evolves for each input. Fig. 17 depicts the actual value and the eNFN output.

The experiment was conducted evolving model structure and parameters for all samples of the dataset. DENFIS was run in on-line mode with the following parameters: dthr = 0.1 and mofn = 10. The eMG parameters adopted were: λ = 0.05, w = 25, Σ_init = 10^{−2} I_4, α = 0.01. The eNFN parameters used were: β = 0.01, ω = 100 and η = 05. The parameters of the eTS were r = 0.06 and Ω = 500, and of the xTS, Ω = 500.

The performance of the models was evaluated using the RMSE, MGN and RSS computed for all samples of the testing dataset. Table 3 shows the RMSE and the number of rules. They suggest that the best performance is obtained by DENFIS, followed by eNFN and eMG. The results achieved by DENFIS, eNFN and eMG are comparable, and are better than those of eTS and xTS by an order of magnitude.

Table 3
Performance of Mackey–Glass time-series forecasting.

Model         Number of rules   RMSE
DENFIS [37]   23                0.0425
eMG [8]       07                0.0871
eNFN          21                0.0745
eTS [48]      05                0.3739
xTS [49]      04                0.3750

Table 4 shows the pairwise comparisons between the eNFN and the remaining models using the MGN test. The results of Table 4 show that the eNFN is statistically superior to eMG, eTS and xTS. They also show that DENFIS is better than the eNFN.

Table 4
Pairwise comparison for Mackey–Glass time-series forecasting.

Model                  MGN       p-Value
DENFIS [37] vs eNFN    27.8106   0.0000
eNFN vs eMG [8]        6.9862    0.0000
eNFN vs eTS [48]       5.7841    0.0000
eNFN vs xTS [49]       8.6803    0.0000

The logarithm of the Residual Sum of Squares, Log(RSS), of the models for each step t is shown in Fig. 18. It suggests that the eNFN accumulated residual curve is lower than those of eTS and xTS, similar to the eMG curve, and higher with respect to DENFIS. These results confirm that DENFIS adapts better to the flow of data than the eNFN.
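The Mackey–Glass series of Eq. (33) can be generated, for instance, by forward Euler integration; the integration scheme and step size are our assumptions, as the paper does not state them.

```python
import numpy as np

def mackey_glass(n, tau=17, dt=0.1, x0=1.2):
    """Integrate Eq. (33) with a simple Euler scheme and sample at t = 0, 1, 2, ..."""
    steps, lag = int(n / dt), int(tau / dt)
    x = np.full(steps + lag, x0)          # constant history as initial condition
    for t in range(lag, steps + lag - 1):
        dx = 0.2 * x[t - lag] / (1.0 + x[t - lag] ** 10) - 0.1 * x[t]
        x[t + 1] = x[t] + dt * dx
    return x[lag::int(1 / dt)][:n]        # one sample per unit of time
```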


Fig. 12. Initial (above) and final (below) membership functions for streamflow forecast.

Fig. 13. Number of membership functions for streamflow forecast during learning.

Fig. 14. Streamflow forecast.


Fig. 15. Residual sum of squares for streamflow forecast methods.

Fig. 16. Number of membership functions for time-series forecasting during learning.

Fig. 17. Mackey–Glass time-series.


Fig. 18. Logarithm of the residual sum of squares for Mackey–Glass time-series methods.

7.3. High-dimensional system identification problem

This experiment evaluates the behavior of eNFN modeling when handling higher dimensional instances. The nonlinear system to be identified is as follows:

$$y_t = \frac{\sum_{i=1}^{m} y_{t-i}}{1 + \sum_{i=1}^{m} (y_{t-i})^2} + u_{t-1} \tag{34}$$

where u_t = sin(2πt/20), and y_j = 0 for j = 1, ..., m, with m = 10 [8]. The aim is to predict the current output using past inputs and outputs. The model is of the form

$$\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-10}, u_{t-1}) \tag{35}$$

where ŷ_t is the model output.
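The dataset of Eqs. (34)–(35) can be reproduced with a few lines; the absolute phase of u_t relative to the zero initial conditions is our assumption.

```python
import numpy as np

def generate_system(n_samples, m=10):
    """Eq. (34): y_t = sum(y_{t-1..t-m}) / (1 + sum(y_{t-i}^2)) + u_{t-1},
    with u_t = sin(2*pi*t/20) and zero initial conditions y_j = 0."""
    y = np.zeros(n_samples + m)
    for t in range(m, n_samples + m):
        past = y[t - m:t]
        y[t] = past.sum() / (1.0 + (past ** 2).sum()) \
               + np.sin(2.0 * np.pi * (t - 1) / 20.0)
    return y[m:]
```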

where, yˆ t is the model output. A total of 3300 samples have been produced, of which 1650 are used to estimate the parameters of the models, and 1650 to evaluate the performance of the model. Fig. 19 illustrates the actual and eNFN model outputs for the last 250 samples. Performance of the models were computed for all samples of the test data. DENFIS was run in online mode using the following = 4. The values of the paramparameters values: dthr = 0.1 and mofn −1 eters of eMG were: = 0.05, w = 10, init = 10 I11 , ˛ = 0.01. The eNFN adopted the following parameters values: ˇ = 0.01, w = 100 and  = 15. The eTS used r = 0.04 and ˝ = 750, and xTS ˝ = 750. The RMSE performance of the modeling approaches are summarized in Table 5. The best performance is achieved by eNFN followed by eMG. The performance of the eNFN and eMG are better than DENFIS, eTS and xTS by one order of magnitude. The pairwise comparisons between the eNFN and the remaining models using the MGN are presented in Table 6. The results of Table 2 show that the eNFN is statistically better than DENFIS, eTS and xTS and that the eNFN and eMG are statistically comparable.

Table 5 Modeling performance for high-dimensional nonlinear system identification.

The Logarithm of the Residual Sum of Squares -Log (RSS)- of models of Fig. 20 support the results of Tables 5 and 6, showing the superiority of eNFN when compared with DENFIS, eTS, and xTS. 7.4. Analysis of the runtime of the modeling approaches This section summarizes the simulation results performed to evaluate the runtime of the evolving modeling approaches addressed in this paper. The algorithms were run using the same parameters and datasets of sections 7.1, 7.2 and 7.3. Table 7 shows the average processing time (in milliseconds (ms)) of the algorithms to forecast average weekly inflows. Table 8 refers to the average processing times to forecast Mackey–Glass time-series, and Table 9 the average processing time for highdimensional system identification problem. Averages and standard deviations were computed after 10 repetitions of each experiment.

Table 6 Pairwise comparison for high-dimensional nonlinear system modeling. Model

MGN

p-Value

eNFN vs DENFIS [37] eNFN vs eMG [8] eNFN vs eTS [48] eNFN vs xTS [49]

20.8619 1.1195 109.5962 109.5019

0.0000 0.1316 0.0000 0.0000

Table 7 Processing time for streamflow forecast. Model

Number of rules

Time

Stand. deviation

DENFIS [37] eMG [8] eNFN eTS [48] xTS [49]

15 05 14 11 05

1.9643 1.3028 0.3128 1.5417 1.4580

0.0214 0.0175 0.0115 0.0125 0.0152

Table 8 Processing time for Mackey–Glass time-series forecast.

Model

Number of rules

RMSE

Model

Number of rules

Time (ms)

Stand. deviation

DENFIS [37] eMG [8] eNFN eTS [48] xTS [49]

13 10 107 9 3

0.2080 0.1244 0.1210 0.8303 0.8316

DENFIS [37] eMG [8] eNFN eTS [48] xTS [49]

23 07 21 05 04

6.0275 3.3908 0.5607 1.8574 1.5416

0.0245 0.0458 0.0365 0.0218 0.0167


Fig. 19. High-dimensional nonlinear system identification.

Fig. 20. Logarithm of the residual sum of squares for the high-dimensional nonlinear system identification.

Tables 7–9 reveal that the eNFN shows the smallest processing time. For the three datasets, the processing time of the eNFN is substantially smaller than the processing time of the remaining approaches.

Table 9
Processing time for high-dimensional nonlinear system identification.

Model         Number of rules   Time (ms)   Stand. deviation
DENFIS [37]   13                3.3646      0.0216
eMG [8]       10                3.8738      0.0189
eNFN          107               1.3825      0.0258
eTS [48]      09                37.2541     0.0987
xTS [49]      03                7.1874      0.0584

7.5. Analysis of the relationship between the size of the input space, number of rules and processing time

This section addresses the scalability of the eNFN in terms of the dimension of the input space and the number of fuzzy rules. For this purpose, 19 datasets were generated with 3500 samples each, varying the dimension of the input space from 2 to 20. The datasets were created using (34) and (35). As stated earlier, it is expected that the processing time of the eNFN grows linearly with the dimension of the input space. It is also expected that the number of rules does not affect the processing time of the eNFN.

Fig. 21 shows how the number of rules and the processing time of the eNFN vary with the dimension of the input space. During this experiment, parameter ω was set to 100 for both models and parameter η was set to 10 and 20 for eNFN_1 and eNFN_2, respectively. The number of rules generated for each dimension of the input space is displayed in Table 10.

Table 10
Number of rules and dimension of the input space.

Dimension   eNFN_1   eNFN_2
02          16       31
03          29       46
04          41       65
05          51       75
06          58       95
07          66       105
08          78       119
09          86       139
10          98       138
11          110      169
12          118      183
13          132      209
14          136      212
15          147      218
16          158      246
17          173      257
18          172      288
19          187      294
20          197      302


Fig. 21. Scalability as a function of the input space dimension.

As expected, the processing time of the eNFN increases linearly with the input space dimension and is only slightly affected by the number of rules. Processing time increases when the number of rules increases mainly due to the computational cost of including/excluding rules.

8. Conclusion

This paper has introduced a new approach to develop evolving fuzzy models. The approach uses the structure of neo-fuzzy neurons and networks with triangular membership functions whose structure and parameters evolve as data are input. It is particularly attractive to model nonstationary nonlinear systems and time-varying environments. The approach was evaluated using time series forecasting and nonlinear system identification problems. Computational results suggest that the eNFN has comparable error performance when compared with alternative fuzzy and neural fuzzy evolving modeling approaches. In addition to its accuracy, the eNFN requires lower computational resources than the remaining evolving alternatives by at least an order of magnitude. This indicates that the evolving neo-fuzzy approach detailed in this paper is a promising alternative to build accurate and fast evolving fuzzy models. Its simplicity and low processing time suggest the eNFN as a suitable approach for real-time environments. Future work shall address techniques of adaptive feature selection and generalization of the model to problems with multiple outputs.

Acknowledgements

This work was supported by CAPES of the Brazilian Ministry of Education and Innovation, the Brazilian National Research Council (CNPq) and the Research Foundation of the State of Minas Gerais (FAPEMIG). The authors are grateful to the anonymous reviewers for their constructive comments and help to improve the paper.

References

[1] W. Farag, V. Quintana, G. Lambert-Torres, A genetic-based neuro-fuzzy approach for modeling and control of dynamical systems, IEEE Transactions on Neural Networks 9 (5) (1998) 756–767, http://dx.doi.org/10.1109/72.712150.

[2] P. Angelov, R. Buswell, Identification of evolving fuzzy rule-based models, IEEE Transactions on Fuzzy Systems 10 (5) (2002) 667–677, http://dx.doi.org/10.1109/TFUZZ.2002.803499. [3] C. Cernuda, E. Lughofer, W. Marzinger, J. Kasberger, Nir-based quantification of process parameters in polyetheracrylat (pea) production using flexible nonlinear fuzzy systems, Chemometrics and Intelligent Laboratory Systems 109 (1) (2011) 22–33, http://dx.doi.org/10.1016/j.chemolab.2011.07.004. [4] C. Cernuda, E. Lughofer, L. Suppan, T. Roder, R. Schmuch, P. Hintenaus, W. Marzinger, J. Kasberger, Evolving chemometric models for predicting dynamic process parameters in viscose production, Analytica Chimica Acta 725 (2012) 22–38, http://dx.doi.org/10.1016/j.aca.2012.03.012. [5] F. Smith, A. Tighe, Adapting in an uncertain world, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 6, 2005, pp. 5958–5963, http://dx.doi.org/10.1109/ICSMC.2004.1401148. [6] R. Babuska, H. Verbruggen, An overview of fuzzy modeling for control, Control Engineering Practice 4 (11) (1996) 1593–1606, http://dx.doi.org/10.1016/0967-0661(96)00175-X. [7] J. Barros, A. Dexter, Evolving fuzzy model-based adaptive control, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE ’07, 2007, pp. 1–5, http://dx.doi.org/10.1109/FUZZY.2007.4295552. [8] A. Lemos, W. Caminhas, F. Gomide, Multivariable gaussian evolving fuzzy modeling system, IEEE Transactions on Fuzzy Systems 19 (1) (2011) 91–104, http://dx.doi.org/10.1109/TFUZZ.2010.2087381. [9] P. Chang, C. Fan, J. Hsieh, A weighted evolving fuzzy neural network for electricity demand forecasting, in: Proceedings of the Asian Conference on Intelligent Information and Database Systems, ACIIDS ’09, 2009, pp. 330–335, http://dx.doi.org/10.1109/ACIIDS.2009.93. [10] M. Nguyen, J. Guo, D. Shi, Esofcmac: evolving self-organizing fuzzy cerebellar model articulation controller, in: Proceedings of the IEEE International Joint Conference on Neural Networks, IJCNN ’06, 2006, pp. 3694–3699, http://dx.doi.org/10.1109/IJCNN.2006.247384. [11] P. Nayak, K. Sudheer, Fuzzy model identification based on cluster estimation for reservoir inflow forecasting, Hydrological Processes 22 (6) (2008) 827–841, http://dx.doi.org/10.1002/hyp.6644. [12] W. Wang, J. Vrbanek, An evolving fuzzy predictor for industrial applications, IEEE Transactions on Fuzzy Systems 16 (6) (2008) 1439–1449, http://dx.doi.org/10.1109/TFUZZ.2008.925918. [13] E. Lughofer, S. Kindermann, Sparsefis: data-driven learning of fuzzy systems with sparsity constraints, IEEE Transactions on Fuzzy Systems 18 (2) (2010) 396–411, http://dx.doi.org/10.1109/TFUZZ.2010.2042960. [14] E. Lughofer, B. Trawinski, K. Trawinski, O. Kempa, T. Lasota, On employing fuzzy modeling algorithms for the valuation of residential premises, Information Sciences 181 (23) (2011) 5123–5142, http://dx.doi.org/10.1016/j.ins.2011.07.012. [15] E. Lughofer, V. Macian, C. Guardiola, E. Klement, Identifying static and dynamic prediction models for nox emissions with evolving fuzzy systems, Applied Soft Computing 11 (2) (2011) 2487–2500, http://dx.doi.org/10.1016/j.asoc.2010.10.004. [16] H. Mallinson, P. Bentley, Evolving fuzzy rules for pattern classification, in: Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation, CIMCA ’99, vol. 55 of Concurrent Systems Engineering Series, IOS Press, 1999, pp. 184–191. [17] P. Angelov, X. Zhou, F. 
Klawonn, Evolving fuzzy rule-based classifiers, in: Proceedings of the IEEE Symposium on Computational Intelligence in Image and Signal Processing, CIISP ’07, 2007, pp. 220–225, http://dx.doi.org/10.1109/CIISP.2007.369172.


[18] P. Angelov, X. Zhou, D. Filev, E. Lughofer, Architectures for evolving fuzzy rule-based classifiers, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2007, pp. 2050–2055, http://dx.doi.org/10.1109/ICSMC.2007.4413728.
[19] B. Carse, A. Pipe, A framework for evolving fuzzy classifier systems using genetic programming, in: Proceedings of the International Florida Artificial Intelligence Research Society Conference, AAAI Press, 2001, pp. 465–469.
[20] C. Xydeas, P. Angelov, S. Chiao, M. Reoullas, Advances in classification of EEG signals via evolving fuzzy classifiers and dependant multiple HMMs, Computers in Biology and Medicine 36 (10) (2006) 1064–1083, http://dx.doi.org/10.1016/j.compbiomed.2005.09.006.
[21] E. Lughofer, On-line incremental feature weighting in evolving fuzzy classifiers, Fuzzy Sets and Systems 163 (1) (2011) 1–23, http://dx.doi.org/10.1016/j.fss.2010.08.012.
[22] J. Gomez, D. Dasgupta, Evolving fuzzy classifiers for intrusion detection, in: Proceedings of the IEEE Workshop on Information Assurance, IEEE Computer Press, 2002, pp. 1–7.
[23] A. Lemos, W. Caminhas, F. Gomide, Fuzzy multivariable Gaussian evolving approach for fault detection and diagnosis, in: Computational Intelligence for Knowledge-based Systems Design, Lecture Notes in Computer Science, vol. 6178, Springer, Berlin/Heidelberg, 2010, pp. 360–369, http://dx.doi.org/10.1007/978-3-642-14049-5_37.
[24] E. Lughofer, C. Guardiola, Applying evolving fuzzy models with adaptive local error bars to on-line fault detection, in: Proceedings of the International Workshop on Genetic and Evolving Systems, GEFS '08, 2008, pp. 35–40, http://dx.doi.org/10.1109/GEFS.2008.4484564.
[25] J. Gomez, E. Leon, A fuzzy set/rule distance for evolving fuzzy anomaly detectors, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '06, 2006, pp. 2286–2292, http://dx.doi.org/10.1109/FUZZY.2006.1682017.
[26] J. Iglesias, P. Angelov, A. Ledezma, A. Sanchis, Modelling evolving user behaviours, in: Proceedings of the IEEE Workshop on Evolving and Self-Developing Intelligent Systems, ESDIS '09, 2009, pp. 16–23, http://dx.doi.org/10.1109/ESDIS.2009.4938994.
[27] P. Angelov, R. Ramezani, X. Zhou, Autonomous novelty detection and object tracking in video streams using evolving clustering and Takagi–Sugeno type neuro-fuzzy system, in: Proceedings of the IEEE International Joint Conference on Neural Networks, IJCNN '08, 2008, pp. 1456–1463, http://dx.doi.org/10.1109/IJCNN.2008.4633989.
[28] P. Angelov, D. Filev, N. Kasabov, Guest editorial: evolving fuzzy systems – preface to the special section, IEEE Transactions on Fuzzy Systems 16 (6) (2008) 1390–1392, http://dx.doi.org/10.1109/TFUZZ.2008.2006743.
[29] P. Angelov, X. Zhou, Evolving fuzzy-rule-based classifiers from data streams, IEEE Transactions on Fuzzy Systems 16 (6) (2008) 1462–1475, http://dx.doi.org/10.1109/TFUZZ.2008.925904.
[30] T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi, A neo fuzzy neuron and its applications to system identification and prediction of the system behavior, in: Proceedings of the International Conference on Fuzzy Logic and Neural Networks, vol. 1, 1992, pp. 477–484.
[31] E. Uchino, T. Yamakawa, High speed fuzzy learning machine with guarantee of global minimum and its applications to chaotic system identification and medical image processing, in: Proceedings of the International Conference on Tools with Artificial Intelligence, 1995, pp. 242–249, http://dx.doi.org/10.1109/TAI.1995.479590.
[32] W. Caminhas, F. Gomide, A fast learning algorithm for neofuzzy networks, in: Proceedings of the Information Processing and Management of Uncertainty in Knowledge Based Systems, IPMU '00, vol. 1, 2000, pp. 1784–1790.
[33] N. Kasabov, D. Filev, Evolving intelligent systems: methods, learning & applications, in: Proceedings of the International Symposium on Evolving Fuzzy Systems, 2006, http://dx.doi.org/10.1109/ISEFS.2006.251185.
[34] R. Duda, P. Hart, D. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, New York, NY, USA, 2000.
[35] P. Angelov, R. Buswell, Evolving rule-based models: a tool for intelligent adaptation, in: Proceedings of the Joint IFSA World Congress and NAFIPS International Conference, vol. 2, 2001, pp. 1062–1067, http://dx.doi.org/10.1109/NAFIPS.2001.944752.
[36] N. Kasabov, Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31 (6) (2001) 902–918, http://dx.doi.org/10.1109/3477.969494.
[37] N. Kasabov, Q. Song, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction, IEEE Transactions on Fuzzy Systems 10 (2) (2002) 144–154, http://dx.doi.org/10.1109/91.995117.
[38] N. Kasabov, Q. Song, Dynamic evolving fuzzy neural networks with "m-out-of-n" activation nodes for on-line adaptive systems, Tech. Rep., Department of Information Science, University of Otago, Dunedin, New Zealand, 1999.
[39] P. Angelov, A. Kordon, X. Zhou, Evolving fuzzy inferential sensors for process industry, in: Proceedings of the International Workshop on Genetic and Evolving Systems, GEFS '08, 2008, pp. 41–46, http://dx.doi.org/10.1109/GEFS.2008.4484565.
[40] P. Angelov, N. Kasabov, Evolving intelligent systems – EIS, IEEE SMC eNewsLetter 15 (2006) 1–13.
[41] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Transactions on Systems, Man, and Cybernetics 15 (1) (1985) 116–132.
[42] S. McDonald, P. Angelov, Evolving Takagi–Sugeno modelling with memory for slow processes, KES Journal: Innovation in Knowledge-based & Intelligent Engineering Systems 14 (1) (2010) 11–19.
[43] P. Angelov, C. Xydeas, D. Filev, On-line identification of MIMO evolving Takagi–Sugeno fuzzy models, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '04, 2004, pp. 55–60, http://dx.doi.org/10.1109/FUZZY.2004.1375687.
[44] P. Angelov, D. Filev, Simpl_eTS: a simplified method for learning evolving Takagi–Sugeno fuzzy models, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '05, 2005, pp. 1068–1073, http://dx.doi.org/10.1109/FUZZY.2005.1452543.
[45] P. Angelov, J. Victor, A. Dourado, D. Filev, On-line evolution of Takagi–Sugeno fuzzy models, in: Proceedings of the IFAC Workshop on Advanced Fuzzy/Neural Control, 2004, pp. 67–72.
[46] E. Lughofer, FLEXFIS: a robust incremental learning approach for evolving Takagi–Sugeno fuzzy models, IEEE Transactions on Fuzzy Systems 16 (6) (2008) 1393–1410, http://dx.doi.org/10.1109/TFUZZ.2008.925908.
[47] P. Angelov, X. Zhou, On-line learning fuzzy rule-based system structure from data streams, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '08, 2008, pp. 915–922, http://dx.doi.org/10.1109/FUZZY.2008.4630479.
[48] P. Angelov, D. Filev, An approach to online identification of Takagi–Sugeno fuzzy models, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (1) (2004) 484–498, http://dx.doi.org/10.1109/TSMCB.2003.817053.
[49] P. Angelov, X. Zhou, Evolving fuzzy systems from data streams in real-time, in: Proceedings of the International Symposium on Evolving Fuzzy Systems, 2006, pp. 29–35, http://dx.doi.org/10.1109/ISEFS.2006.251157.
[50] N. Kasabov, D. Filev, Evolving fuzzy neural networks – algorithms, applications and biological motivation, in: Methodologies for the Conception, Design and Application of Soft Computing, World Scientific, Iizuka, Fukuoka, Japan, 1998, pp. 271–274.
[51] E. Mamdani, S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, International Journal of Human–Computer Studies 51 (2) (1999) 135–147, http://dx.doi.org/10.1006/ijhc.1973.0303.
[52] N. Kasabov, Q. Song, Dynamic evolving neuro-fuzzy inference system (DENFIS): on-line learning and application for time-series prediction, IEEE Transactions on Fuzzy Systems 10 (2) (2002) 144–154, http://dx.doi.org/10.1109/91.995117.
[53] Q. Song, N. Kasabov, ECM – a novel on-line, evolving clustering method and its applications, in: Proceedings of the Biannual Conference on Artificial Neural Networks and Expert Systems, ANNES '01, 2001, pp. 87–92.
[54] J. Albus, A new approach to manipulator control: the cerebellar model articulation controller (CMAC), Journal of Dynamic Systems, Measurement and Control 97 (1975) 220–227.
[55] J. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man, and Cybernetics 23 (3) (1993) 665–685, http://dx.doi.org/10.1109/21.256541.
[56] E. Lughofer, E. Klement, FLEXFIS: a variant for incremental learning of Takagi–Sugeno fuzzy systems, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '05, 2005, pp. 915–920, http://dx.doi.org/10.1109/FUZZY.2005.1452516.
[57] E. Lughofer, P. Angelov, X. Zhou, Evolving single- and multi-model fuzzy classifiers with FLEXFIS-Class, in: Proceedings of the IEEE International Fuzzy Systems Conference, FUZZ-IEEE '07, 2007, pp. 1–6, http://dx.doi.org/10.1109/FUZZY.2007.4295393.
[58] E. Lughofer, Extensions of vector quantization for incremental clustering, Pattern Recognition 41 (3) (2008) 995–1011, http://dx.doi.org/10.1016/j.patcog.2007.07.019.
[59] P. Angelov, E. Lughofer, Data-driven evolving fuzzy systems using eTS and FLEXFIS: comparative analysis, International Journal of General Systems 37 (1) (2008) 45–67 (Special Issue: Soft Computing Systems), http://dx.doi.org/10.1080/03081070701500059.
[60] J. Rubio, SOFMLS: online self-organizing fuzzy modified least-squares network, IEEE Transactions on Fuzzy Systems 17 (6) (2009) 484–498, http://dx.doi.org/10.1109/TFUZZ.2009.2029569.
[61] H. Rong, N. Sundararajan, G. Huang, P. Saratchandran, Sequential adaptive fuzzy inference system (SAFIS) for non-linear system identification and prediction, Fuzzy Sets and Systems 157 (9) (2006) 1260–1275, http://dx.doi.org/10.1016/j.fss.2005.12.011.
[62] R. Yager, A model of participatory learning, IEEE Transactions on Systems, Man, and Cybernetics 20 (5) (1990) 1229–1234, http://dx.doi.org/10.1109/21.59986.
[63] L. Silva, F. Gomide, R. Yager, Participatory learning in fuzzy clustering, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '05, 2005, pp. 857–861, http://dx.doi.org/10.1109/FUZZY.2005.1452506.
[64] E. Lima, M. Hell, R. Ballini, F. Gomide, Evolving fuzzy modeling using participatory learning, in: Evolving Intelligent Systems: Methodology and Applications, 1st ed., John Wiley & Sons, Inc., Hoboken, NJ, USA, 2010, pp. 67–86 (Chapter 4), http://dx.doi.org/10.1002/9780470569962.
[65] R. Yager, Participatory learning: a paradigm for more human like learning, in: Proceedings of the IEEE International Conference on Fuzzy Systems, FUZZ-IEEE '04, 2004, pp. 79–84, http://dx.doi.org/10.1109/FUZZY.2004.1375693.
[66] A. Lemos, W. Caminhas, F. Gomide, Fuzzy evolving linear regression trees, Evolving Systems 2 (2011) 1–14, http://dx.doi.org/10.1007/s12530-011-9028-z.
[67] A. Lemos, W. Caminhas, F. Gomide, Evolving fuzzy linear regression trees with feature selection, in: Proceedings of the IEEE Workshop on Evolving and Adaptive Intelligent Systems, EAIS '11, vol. 1, 2011, pp. 31–38, http://dx.doi.org/10.1109/EAIS.2011.5945919.
[68] D. Wang, X. Zeng, J. Keane, A structure evolving learning method for fuzzy systems, Evolving Systems 1 (2010) 83–95, http://dx.doi.org/10.1007/s12530-010-9009-7.
[69] A. Bacelar, E. Filho, F. Neves, R. Landim, On-line linear system parameter estimation using the neo-fuzzy-neuron algorithm, in: Proceedings of the IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 2003, pp. 115–118, http://dx.doi.org/10.1109/IDAACS.2003.1249529.

[70] M. Bazaraa, H. Sherali, C. Shetty, Nonlinear Programming: Theory and Algorithms, 3rd ed., John Wiley & Sons, Hoboken, NJ, USA, 1993.
[71] E. Lughofer, P. Angelov, Handling drifts and shifts in on-line data streams with evolving fuzzy systems, Applied Soft Computing 11 (2) (2011) 2057–2068, http://dx.doi.org/10.1016/j.asoc.2010.07.003.
[72] F. Diebold, R. Mariano, Comparing predictive accuracy, Journal of Business & Economic Statistics 13 (3) (1995) 253–263.
[73] M. Mackey, L. Glass, Oscillation and chaos in physiological control systems, Science 197 (4300) (1977) 287–289, http://dx.doi.org/10.1126/science.267326.