Regional models: A new approach for nonlinear system identification via clustering of the self-organizing map

Neurocomputing 147 (2015) 31–46. http://dx.doi.org/10.1016/j.neucom.2013.11.046
Amauri H. Souza Júnior a, Guilherme A. Barreto a,*, Francesco Corona b

a Federal University of Ceará, Department of Teleinformatics Engineering, Av. Mister Hull, S/N, Center of Technology, Campus of Pici, CP 6005, CEP 60455-760 Fortaleza, Ceará, Brazil
b Aalto University, Department of Information and Computer Science, Konemiehentie 2, Espoo, Finland
* Corresponding author. E-mail addresses: [email protected] (A.H. Souza Júnior), [email protected] (G.A. Barreto), francesco.corona@aalto.fi (F. Corona).

Article history: Received 17 April 2013; Received in revised form 31 August 2013; Accepted 28 November 2013; Available online 11 June 2014

Abstract

Global modelling consists in fitting a single regression model to the available data, using the whole set of input and output observations. At the other end of the spectrum stands the local modelling approach, in which the input space is segmented into several small partitions and a specialized regression model is fitted to each partition. In this paper, we propose a novel approach, called Regional Models (RM), that stands in between the global and local modelling approaches. The proposal extends the two-level clustering approach of Vesanto and Alhoniemi (2000) [1] to regression problems, more specifically, to system identification. In this regard, we first partition the input space using the Self-Organizing Map (SOM), and then perform clustering over the prototypes of the trained SOM. Finally, regional regression models are built over the clusters (i.e. over the regions) of SOM prototypes, not over each SOM prototype as in local modelling. Under the proposed framework, we build regional linear and nonlinear regression models. For the linear case, we use autoregressive models with exogenous inputs (ARX), whose parameters are estimated using the ordinary least-squares (OLS) method. Regional nonlinear ARX (NARX) models are built using the Extreme Learning Machine (ELM) network. Additionally, since the OLS is highly sensitive to outliers, we develop robust variants of the proposed regional models through the use of M-estimation, a statistical framework for handling them. A comprehensive performance evaluation of the proposed models on synthetic and real-world datasets is carried out and the results are compared to those achieved by standard global and local models. © 2014 Elsevier B.V. All rights reserved.

Keywords: System identification; Self-Organizing Maps; Global models; Local models; Outliers; Robust regression

1. Introduction

System identification is concerned with the development of regression models that describe the dynamics of a system from measurements of its inputs and outputs. A model that describes the diversity of behaviors a dynamical system can reveal, especially the nonlinear ones, is essential not only for theoretical or applied research, but also for the process or control engineer interested in better understanding the dynamics of the system he/she is dealing with. As an ultimate goal, the resulting model must approximate the actual system as faithfully as possible, so that it can be used for purposes such as predictive control or fault detection.

Modern industrial plants have been the source of challenging tasks in dynamical system identification and control [2]. Designing control systems to achieve the level of quality demanded by


current industry standards requires building accurate models of the plant being controlled, which in turn requires reliable data, usually in the form of input and output time series. Once such data are available, they can be used for building direct and/or inverse models of nonlinear systems by means of computational intelligence methods, such as neural networks [3–6], Takagi–Sugeno–Kang fuzzy models [7–9] or hybrid systems [10–14], to mention just a few possibilities.

Although several techniques for nonlinear dynamical system identification have been proposed [15,16], they can be categorized into one of two approaches: global and local modelling. Global modelling involves a single regression model, such as a feedforward or recurrent neural network, that approximates the whole mapping between the input and the output of the system being identified. Global models constitute the mainstream in nonlinear system identification and control [17–20,5]. Local modelling instead utilizes multiple models to represent the input–output dynamics of the system of interest [21]. Local approaches have attracted much interest because of their ability to fit the local shape of an arbitrary mapping. This feature is particularly important when the dynamical system


characteristics vary considerably throughout the input space. The input space is usually divided into smaller, localized regions, each one associated with a simpler (usually linear) model. To estimate the system output at a given time, a single model is chosen from the pool of available local models according to some criterion defined on the current input data. In the neural network literature, multiple local models for system identification and control have been implemented mostly by means of either the Local Model Network (LMN) [22–27] or the Self-Organizing Map (SOM) [28–33,6,34,35]. The LMN uses basis functions to implement its localized nature, similarly to standard Radial Basis Function networks (RBFN). In LMNs, however, the output weights of an RBFN are replaced with local functions of the inputs. As a consequence, due to the better approximation properties of these local functions, the LMN model does not require as many basis functions as the standard RBFN to achieve the desired accuracy and, hence, the number of parameters is reduced dramatically. The SOM is an unsupervised competitive learning algorithm which has been commonly applied to clustering, vector quantization and data visualization tasks [36–38]. When applied to multiple local modelling, the SOM is used to partition the input–output space into smaller regions, over which the local models are built. The results reported in the aforementioned studies are rather appealing, indicating that SOM-based local models can be feasible alternatives to global models based on supervised neural network architectures, such as the Multilayer Perceptron (MLP) and the Extreme Learning Machine (ELM) [39].

The main advantage of the local modelling approach over the global one is that complex (e.g. nonlinear) dynamics of the input–output mapping can be represented by multiple simpler mappings. Another advantage is interpretability: since multiple local models are used, one can easily associate IF–THEN rules with them in order to describe the current dynamics of the system. However, there is no free lunch in the realm of local modelling, and the flexibility of using multiple local models comes at some cost. One of the main problems is that it is not straightforward to select the appropriate number of local models beforehand, without a priori information. An inappropriate selection may cause over- or under-identification of the original system dynamics [40]. Another shortcoming is the occurrence of dead or interpolating neurons (i.e. neurons whose prototypes are located at positions without data points) after SOM training. In this case, it is impossible to associate a local model with such a neuron, since there are no data points with which to estimate the parameters of the corresponding local model.

To handle these shortcomings, we propose a novel approach to system identification, called Regional Models (RM), that stands in between the global and local modelling approaches. (A previous, shorter version of this work [41] was published in the 9th Workshop on Self-Organizing Maps (WSOM'2012), held in Santiago, Chile.) RM is motivated by the two-level clustering approach introduced by Vesanto and Alhoniemi [1] and can be thought of as an extension of their work to regression problems, more specifically, to nonlinear dynamical system identification. For this purpose, we first partition the input space using a single SOM network with C prototypes, and then perform clustering with the K-means algorithm [42] over the prototypes of the trained SOM in order to find an optimal number K_opt (K_opt ≪ C) of clusters of SOM prototypes.
The optimal number of clusters can be found using any cluster validation technique, such as the Davies–Bouldin index [43,44]. A given cluster of SOM prototypes defines a region in the input space formed by merging the Voronoi cells of the prototypes belonging to that cluster. Finally, for each individual cluster of SOM prototypes, we build a regional regression model using only the data vectors mapped to that specific cluster.

It is worth mentioning that, by using K_opt regional models instead of C local models, the RM approach is much more parsimonious than the local modelling approach, i.e. fewer models are required to faithfully describe the dynamics of the system of interest (recall that K_opt ≪ C). A second advantage is that the user does not have to worry too much about the specification of the number C of SOM prototypes, since the subsequent application of K-means clustering to the SOM prototypes makes the number of regional models relatively decoupled from large variations in the value of C; in other words, large variations in C do not produce large variations in K_opt. Finally, a third advantage of the RM approach over local modelling is that regional models can be constructed even if dead/interpolating neurons occur after SOM training. This is possible because any existing dead/interpolating neuron will eventually belong to one of the K_opt available regions.

Using the proposed RM framework, we develop regional linear and nonlinear regression models. For the linear case, we use autoregressive models with exogenous inputs (ARX), whose parameters are estimated using the ordinary least-squares (OLS) method. Regional nonlinear ARX (NARX) models are built using the Extreme Learning Machine. Additionally, we develop robust variants of the proposed regional models through the use of M-estimation, a robust statistics framework introduced by Huber [45,46] for handling outliers, since the OLS is highly sensitive to them. A comprehensive performance evaluation of the proposed models on synthetic and real-world datasets is carried out and the results are compared to those achieved by standard global and local models.

The remainder of the paper is organized as follows. In Section 2, we briefly review the basics of dynamical system identification using ARX and NARX models. In Section 3, the SOM network and its learning process are described. In Section 4, the regional modelling framework for nonlinear system identification is introduced. In Section 5, we present the fundamentals of M-estimation and its use in the context of regional modelling. Computer simulations and results are presented and discussed in Section 6. The paper is concluded in Section 7.

2. Basics of dynamical system identification

Let us assume that the dynamical system we are dealing with can be described mathematically by the following ARX model [47]:

$$y(k) = a_1 y(k-1) + \cdots + a_p y(k-p) + b_1 u(k-1) + \cdots + b_q u(k-q) = \sum_{j=1}^{p} a_j y(k-j) + \sum_{l=1}^{q} b_l u(k-l), \quad (1)$$

where u(k) ∈ ℝ and y(k) ∈ ℝ denote, respectively, the input and output of the model at time step k, while q ≥ 1 and p ≥ 1 (q ≤ p) are the input-memory and output-memory orders, respectively. The coefficients a_j ∈ ℝ, j = 1, …, p, and b_l ∈ ℝ, l = 1, …, q, are the parameters of the model to be estimated from the available data. By defining the input vector x(k) ∈ ℝ^{p+q} at time step k and the vector of parameters θ ∈ ℝ^{p+q} as

$$x(k) = [y(k-1) \cdots y(k-p) \,|\, u(k-1) \cdots u(k-q)]^T, \quad (2)$$

$$\theta = [a_1 \cdots a_p \,|\, b_1 \cdots b_q]^T, \quad (3)$$

we can write the output of the ARX model in Eq. (1) simply as

$$y(k) = \theta^T x(k). \quad (4)$$

The parameter vector θ is commonly estimated by means of the ordinary least-squares (OLS) method, which leads to the following estimating equation:

$$\hat{\theta} = (X^T X)^{-1} X^T y, \quad (5)$$

where, assuming that p > q, X is an (N−p) × (p+q) regressor matrix defined as

$$X = \begin{bmatrix} y(p) & \cdots & y(1) & u(p) & \cdots & u(p-q+1) \\ y(p+1) & \cdots & y(2) & u(p+1) & \cdots & u(p-q+2) \\ \vdots & & \vdots & \vdots & & \vdots \\ y(N-1) & \cdots & y(N-p) & u(N-1) & \cdots & u(N-q) \end{bmatrix}, \quad (6)$$

while y ∈ ℝ^{N−p} is the output vector defined as

$$y = [y(p+1) \; y(p+2) \cdots y(N)]^T, \quad (7)$$
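To make the estimation pipeline concrete, the following minimal Python sketch builds the regressor matrix of Eq. (6) and the target vector of Eq. (7) and then solves Eq. (5). It is illustrative code under our own naming conventions, not the authors' implementation; `np.linalg.lstsq` is used in place of the explicit normal equations for numerical stability.

```python
import numpy as np

def build_arx_regressors(u, y, p, q):
    """Build the regressor matrix X of Eq. (6) and the target vector of
    Eq. (7) from input/output series u, y (1-D arrays), assuming p >= q."""
    N = len(y)
    rows = [[y[k - j] for j in range(1, p + 1)] +
            [u[k - l] for l in range(1, q + 1)]
            for k in range(p, N)]          # first usable sample: k = p (0-based)
    return np.asarray(rows), np.asarray(y[p:])

def ols_fit(X, y_out):
    """OLS estimate of theta (Eq. (5))."""
    theta, *_ = np.linalg.lstsq(X, y_out, rcond=None)
    return theta
```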

where N is the number of input–output data pairs {u(k), y(k)}_{k=1}^{N} used to estimate the parameters of the model. The nonlinear version of Eq. (1) defines the NARX model [15]:

$$y(k) = f[y(k-1), \ldots, y(k-p), u(k-1), u(k-2), \ldots, u(k-q) \,|\, \Theta] = f[x(k) \,|\, \Theta], \quad (8)$$

where f(·) is an unknown nonlinear mapping, whose parameter matrix is denoted generically as Θ. In a neural network context, Eq. (8) is usually implemented by a global model, such as an ELM or an MLP network with p+q inputs, h hidden neurons and one linear output neuron. Mathematically, we can write

$$y(k) = \mathrm{ELM}(x(k) \,|\, \Theta) = m^T \varphi(W x(k)), \quad (9)$$

where W is an h × (p+q) input-to-hidden-layer weight matrix and m ∈ ℝ^h is the hidden-to-output-layer weight vector. The function φ(·) denotes the nonlinear activation function of the h hidden neurons, usually of sigmoidal form. In comparison with Eq. (8), the parameter matrix is defined as Θ = [m | W].
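Similarly, the ELM of Eq. (9) can be sketched in a few lines (again an illustration, not the authors' code): the input-to-hidden weights W are drawn at random and frozen, and only the output weights m are estimated, by OLS.

```python
import numpy as np

def elm_fit(X, y_out, h=64, seed=0):
    """ELM of Eq. (9): W is drawn once from N(0, 1) and kept fixed;
    only the hidden-to-output weights m are estimated, by OLS."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0, size=(h, X.shape[1]))
    H = 1.0 / (1.0 + np.exp(-X @ W.T))        # phi(W x(k)) for every sample
    m, *_ = np.linalg.lstsq(H, y_out, rcond=None)
    return W, m

def elm_predict(X, W, m):
    return (1.0 / (1.0 + np.exp(-X @ W.T))) @ m
```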

3. The self-organizing map

The Self-Organizing Map (SOM) is a well-known competitive learning algorithm [36]. The SOM learns from examples a mapping (projection) from a high-dimensional continuous input space X onto a low-dimensional discrete space (lattice) A of C neurons, which are arranged in fixed topological forms, e.g. as a rectangular two-dimensional array. The map i*(x): X → A, defined by the weight matrix W = {w_1, w_2, …, w_C}, w_i ∈ ℝ^{p+q} ⊂ X, assigns to each input vector x(k) ∈ ℝ^{p+q} ⊂ X a winning neuron i*(k) ∈ A, determined by

$$i^*(k) = \arg\min_{\forall i} \| x(k) - w_i(k) \|, \quad (10)$$

where ‖·‖ denotes the Euclidean distance and k symbolizes a discrete time step associated with the iterations of the algorithm. The weight vector of the current winning neuron, as well as the weight vectors of its neighboring neurons, are simultaneously adjusted according to the following learning rule:

$$w_i(k+1) = w_i(k) + \alpha(k)\, h(i^*, i; k)\, [x(k) - w_i(k)], \quad (11)$$

where 0 < α(k) < 1 is the learning rate and h(i*, i; k) is a weighting function which limits the neighborhood of the winning neuron. A usual choice for h(i*, i; k) is the Gaussian function

$$h(i^*, i; k) = \exp\!\left( - \frac{\| r_i(k) - r_{i^*}(k) \|^2}{2 \sigma^2(k)} \right), \quad (12)$$

where r_i(k) and r_{i*}(k) are, respectively, the coordinates of neurons i and i* in the output array, and σ(k) > 0 defines the radius of the neighborhood function at time k. The variables α(k) and σ(k) should both decay with time to guarantee convergence of the weight vectors to stable steady states. In this paper, we adopt an exponential decay for both variables:

$$\alpha(k) = \alpha_0 \left( \frac{\alpha_T}{\alpha_0} \right)^{k/T} \quad \text{and} \quad \sigma(k) = \sigma_0 \left( \frac{\sigma_T}{\sigma_0} \right)^{k/T}, \quad (13)$$

where α_0 (σ_0) and α_T (σ_T) are the initial and final values of α(k) (σ(k)), respectively. Weight adjustment is performed until a steady state of global ordering of the weight vectors has been achieved; in this case, we say that the map has converged. The resulting map also preserves the topology of the input samples in the sense that adjacent patterns are mapped into adjacent regions on the map. Due to this topology-preserving property, the SOM is able to cluster input information and spatial relationships of the data on the map. Despite its simplicity, the SOM algorithm has been applied to a variety of complex problems [37,38,48] and has become one of the most popular artificial neural networks.

4. The proposed approach: regional models

As mentioned in the Introduction, regional modelling is an alternative to the more traditional local and global modelling techniques. A single global model may not be able to capture the dynamics of regions with high curvature, smoothing out the details. Local models, on the one hand, are good at capturing such details of the system dynamics; on the other hand, it is relatively difficult to specify an adequate number of local models. Too small a number may cause under-identification, with too few submodels to represent important regions of the system dynamics; too large a number causes over-identification, with submodels attached to unimportant regions. The RM approach then comes out as an alternative, offering a tradeoff between the standard global and local approaches.

In what follows, we describe the main stages of the RM design through a step-by-step procedure, using a synthetic plant [40] to illustrate each step. It consists of a SISO system given by

$$y(n+1) = \frac{29}{40} \sin\!\left( \frac{16 u(n) + 8 y(n)}{3 + 4 u^2(n) + 4 y^2(n)} \right) + \frac{2}{10} \left( u(n) + y(n) \right), \quad (14)$$

with exogenous input u(n) = 10 sin(0.25n). The dataset contains 100 samples and the input space is two-dimensional, since the input and output memory orders are equal to 1 (see Fig. 1(a)).

Step 0: Setting the hyper-parameters. In this step, we specify the memory orders p and q (recall that p must be greater than or equal to q) and the number of SOM prototypes C. The maximum number of regions K_max must also be defined. Without any prior knowledge, we usually set K_max = C or K_max = √C. For the experiments with the synthetic plant, we directly set p = q = 1 and K_max = C = 50.

Step 1: SOM training. In order to build regional models, we follow the procedure introduced by Vesanto and Alhoniemi [1]. Thus, the very first step requires training the SOM as usual, with C prototypes, using the rows of the regression matrix in Eq. (6) as training vectors. Once the SOM is trained, each prototype w_i, i = 1, …, C, is assigned a Voronoi cell V_i:

$$V_i = \{ x \in \mathbb{R}^{p+q} : \| x - w_i \| \leq \| x - w_j \|, \; \forall j = 1, \ldots, C, \; j \neq i \}. \quad (15)$$

As an illustration, the resulting Voronoi cells are shown in Fig. 1(b), where blue dots depict the SOM prototypes.

Step 2: Clustering of the SOM. The second step consists in performing clustering over the C SOM prototypes. Although any clustering algorithm may be used for this step, for the sake of simplicity we use the standard K-means algorithm [49] in combination with the Davies–Bouldin (DB) index.

Fig. 1. Step-by-step illustration of the main stages in RM design. (a) The data. (b) SOM prototypes. (c) Clustering of the SOM. (d) K_opt regions in the map. (e) Data partitioning.

The DB index [43,44] is a clustering validity measure commonly used for finding the optimal number of clusters, but any suitable measure can be used instead (see [44] and references therein). The DB index algorithm is given in Appendix A. Thus, we compute the partitioning of the SOM prototypes for K = 1, 2, …, K_max, together with the corresponding DB index value. The optimal partitioning, represented by K_opt partitions, is then chosen by the following search procedure:

$$K_{opt} = \arg\min_{K = 1, \ldots, K_{max}} DB(\mathcal{W}, \mathcal{P}_K), \quad (16)$$

where P_K = {p_j}_{j=1}^{K}, p_j ∈ ℝ^{p+q}, denotes the set of K prototypes of the K-means algorithm, while W = {w_i}_{i=1}^{C}, w_i ∈ ℝ^{p+q}, is the set of C prototypes of the SOM. It is worth mentioning that, although we have used the standard Vesanto and Alhoniemi approach to SOM clustering [1], more recent approaches, such as the one introduced in [50], could also be used. The clustering of the SOM for the synthetic plant is shown in Fig. 1(c), where the black dots denote the centroids of each cluster and the blue dots correspond to SOM prototypes. The optimal number of clusters is K_opt = 5.


Step 3: Partitioning SOM prototypes into regions. Once K_opt is selected, the r-th cluster of SOM prototypes, r = 1, …, K_opt, is composed of all weight vectors w_i that are mapped onto the prototype p_r of the K-means algorithm. More formally, the set of SOM prototypes associated with the r-th prototype of the K-means algorithm is defined as

$$\mathcal{W}_r = \{ w_i \in \mathbb{R}^{p+q} : \| w_i - p_r \| < \| w_i - p_j \|, \; \forall j = 1, \ldots, K_{opt}, \; j \neq r \}. \quad (17)$$

This step is illustrated in Fig. 1(d) by coloring the trained map in order to highlight the different regions of SOM prototypes.

Step 4: Mapping data points to regions. The fourth step consists in finding K_opt data partitions, denoted X_1, X_2, …, X_{K_opt}, of the training dataset by mapping each row of the regression matrix in Eq. (6) to a region r ∈ {1, …, K_opt}. In other words, let N_r denote the number of data vectors in X_r. Then the partition X_r is composed of those input vectors x_{rμ}, μ = 1, …, N_r, whose closest SOM prototype belongs to W_r. This step is illustrated in Fig. 1(e) through the convex hull of each region.

Step 5: Building regression models over the regions. Finally, once the original dataset has been divided into K_opt subsets (one per region), the last step consists in building K_opt regional regression models using X_r, r = 1, …, K_opt. For regional linear models, one can simply estimate the coefficient vector θ_r of the r-th regional model by the standard OLS method:

$$\theta_r = (X_r^T X_r)^{-1} X_r^T y_r, \quad r = 1, \ldots, K_{opt}, \quad (18)$$

where y_r ∈ ℝ^{N_r} is the vector of N_r observed output values associated with the N_r row vectors in X_r. For this computation, we assume that the vectors x_{rμ}, μ = 1, …, N_r, are arranged as the rows of X_r. Henceforth, we denote by Regional Linear Model (RLM) a regional model built using K_opt linear models.

If we choose to build nonlinear regional models, one can train the r-th regional model using the input vectors in X_r and the corresponding target values in y_r, for r = 1, …, K_opt. In this paper, we use ELM networks to build nonlinear regional models and use the term Regional Extreme Learning Machine Models (RELMs) to denote regional models composed of ELM networks. It is worth clarifying that the main reason for choosing the ELM network to build regional models is its fast learning algorithm [51], which does not require the long training procedures of most MLP-like architectures. Additionally, the standard ELM uses the OLS estimation method to compute the output weights (the input-to-hidden layer weights are randomly selected). This feature will be used later in this paper to extend the proposed regional modelling approach to deal with outliers.
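Steps 2–5 can be summarized in code. The sketch below is an illustration under our own naming (it assumes X, y_out and the trained prototype matrix W from the earlier sketches) and delegates K-means and the Davies–Bouldin score to scikit-learn; note that scikit-learn's DB score requires at least two clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def build_regional_models(X, y_out, W, K_max):
    """Cluster the SOM prototypes W with K-means, pick K_opt by
    minimizing the Davies-Bouldin index (Eq. (16)), then fit one OLS
    model per region (Eq. (18))."""
    best_db, best_km = np.inf, None
    for K in range(2, K_max + 1):            # the DB index needs K >= 2
        km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(W)
        db = davies_bouldin_score(W, km.labels_)
        if db < best_db:
            best_db, best_km = db, km
    # Step 4: assign each data vector to the region of its closest prototype
    bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
    region = best_km.labels_[bmu]
    # Step 5: one linear (ARX) model per non-empty region
    thetas = {r: np.linalg.lstsq(X[region == r], y_out[region == r], rcond=None)[0]
              for r in np.unique(region)}
    return best_km, thetas
```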


4.1. Testing regional models

Once the regional models are trained, given an out-of-sample input vector x(k) ∈ ℝ^{p+q}, the test procedure consists in computing the predicted output value using the suitable regional model. For RLMs, the output value is computed as

$$\hat{y}_{RLM}(k) = \theta_{r^*}^T x(k), \quad (19)$$

where r* is the index of the winning regional model. By the same token, for RELMs the output value is computed as

$$\hat{y}_{RELM}(k) = \mathrm{ELM}(x(k) \,|\, \Theta_{r^*}) = m_{r^*}^T \varphi(W_{r^*} x(k)), \quad (20)$$

where Θ_{r*} = [m_{r*} | W_{r*}] denotes the parameter matrix of the winning regional ELM model. Assuming that the K-means algorithm is used as the clustering method, the index of the winning regional model, r*, is selected as

$$r^*(k) = \arg\min_{r = 1, \ldots, K_{opt}} \| x(k) - p_r \|, \quad (21)$$

where p_r is the r-th prototype of the K-means algorithm. This procedure is valid for both linear and nonlinear models.

It is worth noticing that regional models are able to deal with multiple-input multiple-output (MIMO) systems, since the underlying regression models handle multiple outputs. This is the case for the proposed RLM and RELM. In the next section, we discuss a principled approach for extending the proposed RLM and RELM to deal with outliers.
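In code, the test procedure reduces to a nearest-prototype lookup (Eq. (21)) followed by the winning regional model's prediction (Eq. (19)). A sketch, reusing the hypothetical build_regional_models helper above and assuming every cluster received at least one training vector:

```python
import numpy as np

def predict_regional(x, km, thetas):
    """Eq. (21): winning region = nearest K-means prototype;
    Eq. (19): apply that region's linear model."""
    r_star = int(np.argmin(np.linalg.norm(km.cluster_centers_ - x, axis=1)))
    return thetas[r_star] @ x
```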

5. Extending regional modelling through M-estimation

An important feature of OLS is that it assigns the same importance to all error samples, i.e. all errors contribute in the same way to the final solution. A common approach to handle outliers consists in removing them from the data and then applying the usual least-squares fit. A more principled approach, known as robust regression, uses estimation methods that are not as sensitive to outliers as the OLS. Huber [45] introduced the concept of M-estimation (where M stands for "maximum likelihood" type), in which robustness is achieved by minimizing a function other than the sum of the squared errors. Based on Huber's theory, a general M-estimator applied to the output of the r-th RLM minimizes the following objective function:

$$J(\theta_r) = \sum_{\mu=1}^{N_r} \rho(e_{r\mu}) = \sum_{\mu=1}^{N_r} \rho(d_{r\mu} - y_{r\mu}) = \sum_{\mu=1}^{N_r} \rho(d_{r\mu} - \theta_r^T x_{r\mu}), \quad (22)$$

where the function ρ(·) computes the contribution of each error e_{rμ} = d_{rμ} − y_{rμ} to the objective function, d_{rμ} is the target value of the r-th RLM for the μ-th input vector x_{rμ} ∈ X_r, and θ_r is the parameter vector associated with the r-th RLM. Reasonable choices of ρ(·) should possess the following properties [52]: (i) ρ(e_{rμ}) ≥ 0; (ii) ρ(0) = 0; (iii) ρ(e_{rμ}) = ρ(−e_{rμ}); and (iv) ρ(e_{rμ}) ≥ ρ(e_{r′μ}) for |e_{rμ}| > |e_{r′μ}|. The OLS is a particular M-estimator, obtained when ρ(e_{rμ}) = e²_{rμ}.

Parameter estimation is carried out by solving the estimating equation, a weighted function of the derivative of the objective function. Let ψ = ρ′ be the partial derivative of ρ with respect to the error e_{rμ}. Differentiating the objective function in Eq. (22) with respect to the estimated parameter vector θ̂_r and setting the partial derivatives to 0, we have

$$\frac{\partial J(\hat{\theta}_r)}{\partial \hat{\theta}_r} = \sum_{\mu=1}^{N_r} \frac{\partial \rho(e_{r\mu})}{\partial \hat{\theta}_r} = \sum_{\mu=1}^{N_r} \frac{\partial \rho(e_{r\mu})}{\partial e_{r\mu}} \frac{\partial e_{r\mu}}{\partial \hat{\theta}_r} = \sum_{\mu=1}^{N_r} \psi(y_{r\mu} - \hat{\theta}_r^T x_{r\mu})\, x_{r\mu}^T = 0, \quad (23)$$

where 0 is a (p+q)-dimensional row vector of zeros. Then, dividing both sides of Eq. (23) by e_{rμ}, we get

$$\sum_{\mu=1}^{N_r} \frac{\psi(e_{r\mu})}{e_{r\mu}} x_{r\mu}^T = \sum_{\mu=1}^{N_r} w(e_{r\mu})\, x_{r\mu}^T = \sum_{\mu=1}^{N_r} w_{r\mu}\, x_{r\mu}^T = 0, \quad (24)$$

where w_{rμ} = w(e_{rμ}) and w(e_{rμ}) = ψ(e_{rμ})/e_{rμ}. Thus, solving the estimating equations corresponds to solving a weighted least-squares problem, minimizing ∑_μ w²_{rμ} e²_{rμ}. It is worth noting, however, that the weights depend upon the residuals, the residuals depend upon the estimated coefficients, and the estimated coefficients depend upon the weights. As a consequence, an iterative estimation method called iteratively reweighted least-squares (IRLS) [52] is commonly used. The steps of the IRLS algorithm in the context of training the RLM are described next.


IRLS algorithm for RLM training.

Step 1: Provide an initial estimate θ̂_r(0) using the OLS solution in Eq. (18).
Step 2: At each iteration t of the IRLS algorithm, collect the residuals from the previous iteration, e^{(t−1)}_{rμ}, for all input vectors x_{rμ} ∈ X_r, μ = 1, …, N_r, and then compute their corresponding weights w^{(t−1)}_{rμ} = w(e^{(t−1)}_{rμ}).
Step 3: Solve for the new weighted least-squares estimate of θ_r^{(t)}:

$$\hat{\theta}_r^{(t)} = \left[ X_r^T B_r^{(t-1)} X_r \right]^{-1} X_r^T B_r^{(t-1)} y_r, \quad (25)$$

where B_r^{(t−1)} = diag{w^{(t−1)}_{rμ}} is an N_r × N_r weight matrix. Repeat Steps 2 and 3 until either the estimated coefficient vector θ̂_r^{(t)} converges or t reaches its maximum allowed value t_max.

Several weighting functions can be chosen for the M-estimators, such as Huber's weighting function:

$$w(e_{r\mu}) = \begin{cases} \dfrac{k_e}{|e_{r\mu}|} & \text{if } |e_{r\mu}| > k_e, \\[4pt] 1 & \text{otherwise}, \end{cases} \quad (26)$$

where the parameter k_e is a tuning constant. According to Huber's function in Eq. (26), once the condition |e_{rμ}| > k_e is met, the higher the absolute value of the residual |e_{rμ}|, the smaller the value of the weight w(e_{rμ}). Smaller values of k_e produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed. In particular, k_e = 1.345s for the Huber function, where s is a robust estimate of the standard deviation of the residuals. A usual approach is to take s = MAR/0.6745, where MAR is the median absolute residual.

Although the theoretical development just presented has focused on the RLM, it is straightforward to apply it to the RELM as well; in this case, one just replaces the parameter vector θ̂_r with the hidden-to-output layer weight vector m_r. From now on, the proposed robust variants of the RLM and RELM will be denoted Robust RLM (R2LM) and Robust RELM (R2ELM), respectively. As a group, they will be referred to as Robust Regional Models (R2M).

As a final remark, it is worth mentioning that, in order to allow both linear and nonlinear regional models to handle outliers, we have replaced their standard OLS estimation method with the M-estimation framework. An alternative approach would be to use robust versions of the SOM algorithm which are inherently capable of dealing with outliers, such as those introduced in [53,54]. In particular, Salas et al. [53] have introduced robust M-estimators into the learning process of a hierarchical tree structure of growing self-organizing maps. This robust hierarchical SOM model could be used, for example, to extend the regional approach to online system identification problems, where models have to adapt themselves to non-stationary and noisy environments in real time.

In the next section, the performances of the proposed RM and R2M methods are evaluated on a synthetic dataset as a proof of concept, and on two benchmarking real-world datasets for a more in-depth evaluation.
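The IRLS loop with Huber weights can be sketched as follows; t_max and the convergence tolerance are our own illustrative choices.

```python
import numpy as np

def huber_weights(e):
    """Huber weighting function (Eq. (26)) with k_e = 1.345*s and the
    robust scale s = MAR/0.6745 suggested in the text."""
    s = np.median(np.abs(e)) / 0.6745
    if s == 0.0:                       # degenerate case: all residuals zero
        return np.ones_like(e)
    ke = 1.345 * s
    ae = np.abs(e)
    w = np.ones_like(e)
    w[ae > ke] = ke / ae[ae > ke]
    return w

def irls_fit(X, y, t_max=50, tol=1e-6):
    """IRLS for one regional linear model: OLS start (Step 1), then the
    weighted LS update of Eq. (25) until convergence (Steps 2-3)."""
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(t_max):
        w = huber_weights(y - X @ theta)                    # B = diag(w)
        theta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```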

6. Computer experiments and results

In this section, we present the results of computer simulations carried out with synthetic and real-world datasets commonly used for benchmarking purposes in system identification. The experimental section is divided into (i) a description of the datasets, (ii) experiments without outliers, and (iii) experiments with outliers.

6.1. Datasets

This subsection briefly describes the datasets of the three case studies analyzed in this paper. One dataset comes from a simulated system, while the other two correspond to real-world data.

Artificial plant: The artificial plant was first evaluated by Gregorčič and Lightbody [23]. It consists of a discrete-time SISO system corrupted by additive noise, given by

$$y(t+1) = 0.2 \tanh(y(t)) + \sin(u(t)) + \epsilon(t). \quad (27)$$

In order to illustrate the nonlinearity of the system, Fig. 2(a) shows the target function. The corresponding dataset contains 2000 samples with a uniformly distributed random input signal, u(t) ~ U[−5, 5], and ε(t) ~ N(0, 0.01); see Fig. 2(b).
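For reference, generating the artificial-plant data of Eq. (27) takes only a few lines (a sketch; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
u = rng.uniform(-5.0, 5.0, size=n)            # u(t) ~ U[-5, 5]
eps = rng.normal(0.0, np.sqrt(0.01), size=n)  # eps(t) ~ N(0, 0.01), variance 0.01
y = np.zeros(n + 1)
for t in range(n):                            # Eq. (27)
    y[t + 1] = 0.2 * np.tanh(y[t]) + np.sin(u[t]) + eps[t]
```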

Fig. 2. Artificial plant. (a) The target function. (b) The data.


Fig. 3. Hydraulic actuator dataset. (a) Measured values of valve position. (b) Measured values of the oil pressure.


Fig. 4. pH neutralization process data. (a) HAC, u1(t). (b) NaOH, u2(t). (c) pH, y(t).

The hydraulic actuator dataset: This dataset consists of input and output time series of a hydraulic actuator of a mechanical crane. The structure has four actuators: one for the rotation of the whole structure, one to move the arm, one to move the forearm, and one to move a telescopic extension of the forearm. This plant was chosen because its long arm and forearm give the mechanical structure considerable flexibility, making the movement of the whole crane oscillatory and hard to control. The position of the arm is controlled by a hydraulic actuator; the oil pressure in the actuator is controlled by the size of the valve opening through which the oil flows into the actuator, and the position of the robot arm is then a function of the oil pressure. Fig. 3 shows measured values of the valve position (input time series, {u(t)}) and the oil pressure (output time series, {y(t)}). As seen in the oil pressure, it contains a very oscillatory settling period after a step change of the valve size; these oscillations are caused by mechanical resonances in the robot arm [55].

pH neutralization process: This benchmark dataset consists of simulation data of a pH neutralization process in a constant-volume stirring tank [56]. The underlying MISO (multiple-input single-output) system is modelled by a NARX model in which the output variable corresponds to the pH of the solution in the tank, and the inputs u1(t) and u2(t) are the acid solution (HAC) and the base solution (NaOH), respectively. Fig. 4 illustrates the discrete input and output time series, sampled with a period of 10 s.

6.2. Experiments with outlier-free data

In this section, we evaluate the performance of the proposed regional models (R2ELM, RELM, R2LM and RLM) on outlier-free data, and we compare their performance with that of an ELM-based global model (ELM/OLS), a global linear model fitted by OLS (ARX/OLS), and two local models: the KSOM model [29] and the LLM model [57]. Additionally, aiming at a comprehensive and fair comparison against the proposed robust regional models, we implement a robust variant of the global ARX model using the M-estimation framework, referred to as the ARX/IRLS model. By the same token, we implement a robust variant of the ELM proposed by Horata et al. [58] (ELM/IRLS). For the sake of completeness, we also compare the proposed regional models with the clustering-based approach of Wang and Syrmos [40], henceforth referred to as the Direct-Clustering (DirClust) approach. In the DirClust approach, a clustering algorithm (e.g. K-means) is applied directly to the training data samples, while an optimal number of clusters is searched for, using for example the DB cluster validity index. Regression models are then constructed, one for each cluster, using the subset of data samples mapped to a given cluster. A related approach has also been developed by Ferrari-Trecate et al. [59]. These direct-clustering approaches can be considered a variant of regional modelling, without the intermediate step of clustering the SOM prototypes.

The hyper-parameters of all the methods have been selected using hold-out validation. For the task, we split the whole dataset into training and test sets, with 70% and 30% of the samples, respectively. Throughout the validation procedure, 20% of the training samples are randomly selected in order to estimate the generalization error. After the selection of the optimal hyper-parameters, the whole training set is used to build the final model. Moreover, the datasets have been evaluated without any pre-processing step.

The hidden neurons in the global ELM use a logistic sigmoid activation function, and the input-to-hidden layer weights are randomly drawn from a normal distribution with zero mean and unit variance. The number of hidden neurons is searched within the set {2², 2³, 2⁴, …, 2⁹}, and the number that produces the minimum MSE on the validation set is selected. The same configuration is used for the ELMs in the regional modelling approach (RELM and R2ELM). The regional approaches have been trained using the default settings of the SOM Matlab toolbox [60], with the number of prototypes given by 5√N, where N is the number of training samples. By using the default settings, we expect that if the regional modelling paradigm provides acceptable performance in its basic configuration, this is a good indication in favour of the method.


For the clustering step required by the proposed regional models and by the DirClust approach, we use the K-means algorithm, with the maximum number of clusters given by the square root of the number of units to be clustered: for the regional approaches these are the SOM units (K_max = √C), while for the DirClust approach they are the learning samples (K_max = √N). As originally proposed, for the SOM-based local models (KSOM and LLM) we use the sequential training of a one-dimensional SOM with the same number of prototypes used for the regional approaches, i.e. C = 5√N. The initial and final learning rates are set to α_0 = 0.1 and α_T = 0.01, and the initial and final values of the neighborhood radius are σ_0 = 10 and σ_T = 0.01. The learning rate used to adapt the local model coefficients is set to 0.01 and the number of epochs is 100. For the KSOM model, we also perform an exhaustive search on the parameter K, ranging from 5% to 20% (step size of 5%) of the number of training samples. For the robust models (R2LM, R2ELM, ARX/IRLS and ELM/IRLS), we use the IRLS estimation method with Huber's weighting function, with the tuning parameter k_e defined as explained in Section 5.

Regarding the memory orders for each dataset, we directly set p = q = 1 for the artificial plant. For the hydraulic actuator plant, we use the same memory orders selected by Sjöberg et al. [55]: p = 3 and q = 2. For the pH process data, we carried out a grid search over the range {1, 2, …, 5} and selected the combination of q1, q2 and p that minimizes the Minimum Description Length (MDL) for the ARX/OLS model, resulting in q1 = 1, q2 = 1 and p = 2. The choice based on the ARX/OLS performance is motivated by the fact that this model is still the mainstream in system identification and is one of the simplest modelling choices.

All models are evaluated using the mean and standard deviation of the resulting MSE (mean squared error) over 30 independently drawn test sets. In addition, in order to compare the accuracy achieved by the best performing model with that of all the other methods from a statistical point of view, we carried out the nonparametric two-sample Kolmogorov–Smirnov (KS) test.

The null hypothesis is that the two samples come from the same distribution; the random variables in this case are the residuals. The results are reported in Table 1.

On the basis of the reported results for the artificial plant, the best performances were achieved by the ELM-based models, including both global and regional approaches; all ELM-based models are statistically equivalent according to the KS test. It is worth pointing out that, since the best performances were achieved by models using a nonlinear neural network as building block (ELM/OLS, ELM/IRLS, RELM and R2ELM), this may be indicative of the highly nonlinear nature of the underlying system. Consistently, the LLM model achieved better average performance than the regional linear approaches, indicating that a large number of linear models was beneficial. The worst performance was achieved by the global ARX/OLS model. Moreover, the use of M-estimation did not considerably degrade the performance of the standard approaches.

Fig. 5 illustrates the test results when estimating the response with instances of the three modelling paradigms considered in this work: global, local and regional. As one can observe, a single (i.e. global) linear model is not able to learn the dynamics, whereas the reconstruction provided by the RELM is very accurate.

For the artificial dataset, we can evaluate the assumptions made about the residuals; in this case, the modelling assumption is that the sequence of residuals should be normally distributed. By analyzing the sequence of residuals produced by a model on the test data, the user can assess how well the statistical properties of the residuals match the theoretical modelling assumptions. This analysis can be carried out qualitatively, by plotting the histogram of the residuals, and quantitatively, by applying a Gaussianity test. The histograms are shown in Fig. 6 for the RELM, RLM, ELM/OLS, LLM, KSOM and ARX/OLS models, together with the outcome of KS tests applied to the residuals to assess Gaussianity. As one may see, the assumption of Gaussianity of the residuals is satisfied by the RELM and ELM/OLS models.

Table 1. Test performance: MSE statistics (mean ± standard deviation over 30 test sets) and Kolmogorov–Smirnov tests (✓ for accept, ✗ for reject). Cells without a mark correspond to the best performing model for that dataset (shown in boldface in the original), which serves as the reference for the KS tests.

Models         | Artificial plant      | Actuator               | pH process
R2ELM          | 1.12e−2 ± 1.27e−3 ✓  | 2.57e−2 ± 2.45e−2 ✗   | 2.85e−2 ± 2.03e−2 ✗
R2LM           | 8.86e−2 ± 8.28e−2 ✗  | 7.89e−3 ± 5.42e−4     | 3.10e−3 ± 1.37e−4
RELM           | 1.07e−2 ± 6.76e−4 ✓  | 3.54e−2 ± 3.87e−2 ✗   | 8.05e−2 ± 5.06e−2 ✗
RLM            | 8.66e−2 ± 8.02e−2 ✗  | 8.70e−3 ± 8.77e−4 ✗   | 2.82e−2 ± 4.58e−3 ✗
DirClust [40]  | 1.19e−1 ± 8.08e−2 ✗  | 7.93e−3 ± 3.63e−4 ✗   | 2.62e−2 ± 1.06e−17 ✗
KSOM           | 5.11e−1 ± 1.55e−2 ✗  | 1.02e−2 ± 3.53e−18 ✗  | 1.01e−1 ± 4.23e−2 ✗
LLM            | 3.35e−2 ± 2.88e−3 ✗  | 2.21e−2 ± 2.06e−4 ✗   | 9.00e−2 ± 1.06e−2 ✗
ELM/IRLS       | 1.07e−2 ± 6.54e−4 ✓  | 8.79e−3 ± 1.66e−3 ✗   | 1.59e−2 ± 1.14e−2 ✗
ARX/IRLS       | 5.12e−1 ± 1.61e−2 ✗  | 8.70e−3 ± 1.76e−18 ✗  | 4.87e−3 ± 8.82e−19 ✗
ELM/OLS        | 1.06e−2 ± 5.89e−4    | 8.52e−3 ± 1.96e−3 ✗   | 1.09e−1 ± 1.06e−1 ✗
ARX/OLS        | 5.12e−1 ± 1.61e−2 ✗  | 9.19e−3 ± 3.53e−18 ✗  | 5.17e−2 ± 1.41e−17 ✗



Fig. 5. Artificial plant: responses vs. estimates. (a) RELM. (b) LLM. (c) ARX/OLS.


Fig. 6. Artificial plant: histogram of the residuals and Gaussianity test. (a) RELM. (b) RLM. (c) ELM/OLS. (d) LLM. (e) KSOM. (f) ARX/OLS.

Interestingly, the MSEs achieved by these models tend to the variance of the noise, and thus also to the smallest MSE that any regression model can achieve without over-fitting. The residuals of the ARX/OLS model deviate considerably from Gaussianity. By visual inspection, one might argue that the residuals provided by the LLM and RLM models follow a normal distribution, but these hypotheses were rejected by the KS test.

For the hydraulic actuator data, the R2LM achieved the smallest average MSE, followed by the DirClust approach and the ELM/OLS model. In addition, the standard deviation of the R2LM MSE is smaller than those produced by the DirClust approach and the ELM/OLS model. The use of nonlinear models did not bring a significant improvement over a single linear model. It is worth mentioning that the poor performance of the RELM and R2ELM was mainly due to ill-conditioning problems presented by the regional ELMs during the model-building phase. The M-estimation framework slightly improved over the corresponding OLS-based models, even though there are no explicit outliers.

With respect to the estimated outputs, Fig. 7 shows typical estimated output time series provided by a regional (RLM), a local (LLM) and a global (ARX/OLS) model for the test time series. All the models are able to reconstruct the system dynamics, even though the LLM did not accurately detect the peak around sample 210.

For the pH process plant, the R2LM provided the smallest MSE; the worst results were achieved by the ELM/OLS and KSOM. For this dataset, the robust approaches perform better than the classical ones: the use of the M-estimation framework improves the MSE by almost one order of magnitude in comparison to the non-robust models, both regional and global. Again, as for the hydraulic actuator data, ELM-based models did not improve over linear ones. Fig. 8 illustrates output estimates for the R2LM, KSOM and ARX/OLS models. As aforementioned, the R2LM better describes the system dynamics, whereas the KSOM model presents a highly varying response, thus producing a large reconstruction error.



Fig. 7. Estimates for the actuator plant. (a) RLM. (b) LLM. (c) ARX/OLS.


Fig. 8. Estimates for the pH process data. (a) R2LM. (b) KSOM. (c) ARX/OLS.


Fig. 9. Optimal number of clusters (K_opt) found during the training phase for the regional and DirClust approaches. (a) Artificial plant. (b) Actuator. (c) pH process.

An interesting experiment consists in comparing the number of clusters found by the regional and the DirClust approaches. The results are shown in Fig. 9 for the 30 independent training runs. In this figure, we observe that the regional modelling approach generates, on average, a higher number of clusters than the DirClust approach. This behavior may indicate that the intermediate step of SOM clustering is essential for discovering important sub-domains (i.e. regions) within the input space which are useful for data modelling purposes. Another advantage of clustering the SOM, instead of clustering the data directly, is the reduced computational time [1]. Interestingly, both approaches achieved the same optimal number of clusters for the pH process data, which explains the similar results provided by the RLM and DirClust models in Table 1.

Since the amount of available data in each partition of the regional modelling approach is obviously smaller than the amount in the entire input space, one may also expect a smaller number of optimal hidden units in the RELM in comparison to a global ELM/OLS model.

In order to verify this assertion, Fig. 10 shows the optimal number of hidden neurons found for each independent run. As expected, regional modelling yields smaller ELMs, indicating that the computational complexity of the RELM does not necessarily grow as the number of regions increases. Actually, the use of many smaller ELMs can even reduce the overall computational time in contrast to a single global ELM network, since this complexity grows linearly with the number of regions but quadratically with the number of hidden units.

6.3. Experiments with outliers

In order to evaluate the proposed methods in the presence of outliers, we corrupt the output time series with impulsive, normally distributed outliers of the form N(δ, 1), which replace the original output values.



Fig. 10. Optimal number of hidden units found during validation phase for the ELM-OLS and RELM models. (a) Artificial plant. (b) Actuator. (c) pH process.

Table 2. Test performance for the artificial plant with outliers: MSE statistics (mean ± standard deviation) and Kolmogorov–Smirnov tests (✓ for accept, ✗ for reject). Cells without a mark correspond to the best performing model (boldface in the original).

Models         | N_o = 5%              | N_o = 15%             | N_o = 30%
R2ELM          | 1.22e−2 ± 2.24e−3 ✓  | 1.80e−2 ± 1.67e−2 ✓  | 5.72e−2 ± 4.46e−2 ✓
R2LM           | 8.50e−2 ± 8.19e−2 ✗  | 8.67e−2 ± 8.07e−2 ✗  | 1.38e−1 ± 9.16e−2 ✗
RELM           | 1.91e−2 ± 3.11e−3 ✗  | 4.64e−2 ± 9.65e−3 ✗  | 1.28e−1 ± 2.32e−2 ✗
RLM            | 8.76e−2 ± 8.07e−2 ✗  | 1.08e−1 ± 7.80e−2 ✗  | 2.03e−1 ± 7.38e−2 ✗
DirClust       | 1.35e−1 ± 7.79e−2 ✗  | 1.26e−1 ± 7.89e−2 ✗  | 1.91e−1 ± 7.51e−2 ✗
KSOM           | 5.12e−1 ± 1.46e−2 ✗  | 5.17e−1 ± 1.49e−2 ✗  | 5.17e−1 ± 1.58e−2 ✗
LLM            | 3.97e−2 ± 3.89e−3 ✗  | 6.92e−2 ± 8.47e−3 ✗  | 1.51e−1 ± 2.01e−2 ✗
ELM/IRLS       | 1.15e−2 ± 2.77e−3    | 1.52e−2 ± 7.72e−3    | 5.66e−2 ± 7.63e−2
ARX/IRLS       | 5.12e−1 ± 1.43e−2 ✗  | 5.20e−1 ± 1.63e−2 ✗  | 5.52e−1 ± 1.91e−2 ✗
ELM/OLS        | 1.68e−2 ± 3.25e−3 ✗  | 4.80e−2 ± 1.38e−2 ✗  | 1.36e−1 ± 4.79e−2 ✗
ARX/OLS        | 5.12e−1 ± 1.42e−2 ✗  | 5.23e−1 ± 1.73e−2 ✗  | 5.65e−1 ± 2.07e−2 ✗

In the experiments, we adopt δ = μ_y + σ_y, where μ_y denotes the mean value of the output time series y(t) and σ_y the corresponding standard deviation. It is important to mention that the contamination with outliers is randomly applied to a portion of the output samples only. Again, we adopt exactly the same parameter settings and methodology used in the experiments without outliers (Section 6.2). The performance metric is the approximation error with respect to the actual (i.e. observed) response: all models are evaluated using the mean and standard deviation of the MSE between the estimated and the actual output time series over 30 independently drawn test sets. The models are evaluated in simulated scenarios in which 5%, 15% and 30% of the training output samples are replaced by outliers. There are no outliers in the portion of the output time series used for testing the trained models.
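A sketch of this contamination procedure (a hypothetical helper of our own; the replaced positions are drawn uniformly at random without replacement):

```python
import numpy as np

def contaminate(y_train, frac, seed=0):
    """Replace a fraction `frac` of the training outputs with impulsive
    outliers drawn from N(delta, 1), delta = mean(y) + std(y)."""
    rng = np.random.default_rng(seed)
    y_c = np.array(y_train, dtype=float)
    delta = y_c.mean() + y_c.std()
    idx = rng.choice(len(y_c), size=int(frac * len(y_c)), replace=False)
    y_c[idx] = rng.normal(delta, 1.0, size=idx.size)
    return y_c
```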

The rationale for this setup is that we want to evaluate how efficiently a given model learns the system dynamics when outliers are present in the training data.

The results for the experiments with the artificial plant corrupted by outliers are reported in Table 2. As one can observe, the best models are the ELM/IRLS and R2ELM, for all three cases; moreover, these two models achieved MSE values of the same order of magnitude in all cases. It is worth pointing out the improvements brought by the use of M-estimation, which become more significant as more outliers are added: for example, the performance gap between the ARX/OLS and ARX/IRLS models, as well as between the RLM and R2LM, increases with the number of outliers.

Fig. 11 shows typical estimates provided by the robust models for the case with 30% of outliers. The models built using nonlinear blocks (R2ELM and ELM/IRLS) produce better estimates than those built using linear models (R2LM and ARX/IRLS).



−1.5 −1.5

Fig. 11. Typical estimates provided by robust (regional and global) models trained with data from the artificial plant corrupted by 30% of outliers. (a) R2ELM. (b) R2LM. (c) ARX/IRLS. (d) ELM/IRLS. Table 3 Test performance for the hydraulic actuator plant with outliers: MSE statistics and Kolmogorov–Smirnov tests (✓ for accept and  for reject). The best performing models are in boldface. Models

N o ¼ 5%

N o ¼ 15%

N o ¼ 30%

R2ELM

7.70e 2 76.57e  2 

2.59e  1 7 7.43e  1 

9.37e  1 72.39e þ 0 

R2LM

2.87e  2 71.01e  2 

1.70e 1 7 2.73e  1 

4.72e  1 7 3.81e  1 

RELM

1.06e  1 7 7.14e  2 

3.53e  1 7 5.03e  1 

5.84e  1 7 3.26e  1 

RLM

8.34e  2 72.08e  2 

3.10e  1 7 3.50e  1 

6.62e  1 7 4.01e  1 

DirClust

8.45e  2 72.15e  2 

2.78e  1 7 3.18e  1 

1.12e þ 0 7 1.32e þ 0 

KSOM

2.68e  1 7 1.05e  1 

4.51e  1 71.52e  1 

6.75e  1 7 2.53e  1 

LLM

8.98e  2 77.59e  2 

3.41e  1 7 1.58e  1 

2.24eþ0 7 2.20e þ0 

ELM-IRLS

1.76e  2 7 5.85e  3

7.29e  2 7 5.08e  2

2.64e  1 7 2.46e  1 

ARX-IRLS

2.33e  2 7 6.95e  3 

9.86e  2 7 2.27e  2 

2.37e  1 7 3.25e  2

ELM-OLS

5.60e  2 71.94e  2 

2.01e  1 7 9.46e  2 

4.14e  1 7 8.19e  2 

ARX-OLS

9.68e  2 7 1.69e  2 

2.14e  1 7 3.09e  2 

4.73e  1 7 6.10e  2 


By visual inspection of Fig. 11, one can notice that the ELM/IRLS and R2ELM are able to reconstruct the system dynamics even with 30% of outliers in the training samples.

The performance results for the hydraulic actuator with outliers are reported in Table 3. Generally speaking, the best performances were achieved by global models using M-estimation: ELM/IRLS (N_o = 5% and N_o = 15%) and ARX/IRLS (N_o = 30%). As in the case without outliers, a single linear model is among the best in reconstructing the system dynamics for one-step-ahead prediction. As expected, the robust extensions again improved over the corresponding non-robust methods. At the other end of the spectrum, the local models (LLM and DirClust) seem to be very sensitive to outliers: in the case with 30% of outliers, these models presented the highest MSE values. Comparing the RLM with the DirClust, the performance results are rather similar, except for the case with N_o = 30%, where the RLM performed better.

Fig. 12 shows typical estimates provided by the robust models for the hydraulic actuator data corrupted by 30% of outliers in its training samples. None of the approaches is able to reconstruct the dynamics accurately, although the ELM/IRLS best approximates the peak around sample 210; the estimates provided by the regional approaches are strongly deformed in this case.

The performance results for the pH process plant corrupted by outliers are reported in Table 4. The best model overall is the R2LM, which achieved the smallest average MSE values for all three cases.


Moreover, the null hypotheses were rejected for all the tests. Interestingly, the ARX/IRLS models seem to be rather sensitive to outliers, since their performance degrades considerably as the amount of outliers increases, whereas the R2LM and ELM/IRLS performances are less sensitive in this case. For this dataset, it is worth stressing the importance of the M-estimation framework, which significantly improved both the classical linear and nonlinear approaches.

Fig. 13 shows estimates for the ARX/IRLS, ELM/IRLS, R2LM and R2ELM models on the test output time series of the pH process plant with 30% of outliers. One may observe that the R2LM best captures the trend, even though it assigns high-frequency behavior to periods that should be low-frequency. All the other models fail to capture the system dynamics.

7. Conclusions

In this paper we have introduced the regional modelling framework, a novel approach to nonlinear dynamical system identification based on the SOM. The proposed methodology leads to the design of regional models, which stand in between global and local models. Regional modelling extends the two-level clustering approach of Vesanto and Alhoniemi [1] to regression problems, more specifically, to system identification. The first step in regional modelling partitions the input space

[Figure 12 panels (test output, samples 50–300): R2ELM (MSE: 6.55e−01), R2LM (MSE: 9.68e−01), ARX-IRLS (MSE: 2.36e−01), ELM-IRLS (MSE: 1.99e−01); each panel overlays the observations and the model estimates.]

Fig. 12. Typical estimated output time series for robust (regional and global) models trained with data from the hydraulic actuator plant corrupted by 30% of outliers. (a) R2ELM. (b) R2LM. (c) ARX/IRLS. (d) ELM/IRLS.


Table 4
Test performance for the pH process plant with outliers: MSE statistics and Kolmogorov–Smirnov tests (✓ for accept and ✗ for reject). The best performing models are in boldface.

Models      N_o = 5%             N_o = 15%            N_o = 30%
R2ELM       3.55e−2 ± 1.55e−2    3.90e−2 ± 1.88e−2    8.07e−2 ± 2.55e−2
R2LM        6.01e−3 ± 2.25e−3    3.16e−2 ± 5.14e−3    6.05e−2 ± 1.18e−2
RELM        8.21e−2 ± 6.38e−2    8.74e−2 ± 3.90e−2    1.51e−1 ± 6.15e−2
RLM         1.18e−1 ± 2.92e−2    1.21e−1 ± 6.62e−2    2.25e−1 ± 4.00e−2
DirClust    1.31e−1 ± 2.14e−2    1.94e−1 ± 9.85e−2    2.26e−1 ± 3.78e−2
KSOM        6.14e−1 ± 1.61e−1    9.58e−1 ± 2.18e−1    1.19e+0 ± 1.61e−1
LLM         7.16e−2 ± 4.18e−2    8.05e−2 ± 2.47e−2    1.68e−1 ± 3.91e−2
ELM-IRLS    4.88e−2 ± 7.30e−2    5.38e−2 ± 3.09e−2    9.34e−2 ± 4.68e−2
ARX-IRLS    8.14e−3 ± 3.75e−4    3.30e−2 ± 1.09e−2    1.54e−1 ± 1.77e−2
ELM-OLS     2.40e−1 ± 6.11e−1    2.70e−1 ± 4.82e−1    2.05e−1 ± 1.36e−1
ARX-OLS     1.99e−1 ± 2.42e−2    3.64e−1 ± 3.04e−2    4.34e−1 ± 3.89e−2

[Figure 13 panels (test output, samples 100–500): R2ELM (MSE: 1.09e−01), R2LM (MSE: 2.74e−02), ARX-IRLS (MSE: 1.57e−01), ELM-IRLS (MSE: 8.70e−02); each panel overlays the observations and the model estimates.]

Fig. 13. Typical estimated output time series for robust (regional and global) models trained with data from the pH process plant corrupted by 30% of outliers. (a) R2ELM. (b) R2LM. (c) ARX/IRLS. (d) ELM/IRLS.


The first step in regional modelling partitions the input space using the SOM; clustering is then performed over the prototypes of the trained SOM. In the second step, regional regression models are built over the clusters (i.e. over the regions) of SOM prototypes. In this paper, the optimal number of clusters of SOM prototypes was searched for using the Davies–Bouldin validity index. Using the proposed framework, we built regional linear and nonlinear regression models. For the linear case, we used ARX models whose parameters were estimated using the ordinary least-squares method. Regional NARX models were built using the ELM network. Additionally, we extended the regional modelling framework to handle data with outliers by developing robust variants of the proposed regional models through the use of M-estimation. It is worth noticing that the regional paradigm is also inherently able to operate on multidimensional responses (multiple-input multiple-output systems), since this ability depends only on the regression model used within the regional approach; both the linear regression model and the Extreme Learning Machine used in this paper can deal with multiple-output systems. Comprehensive performance evaluation of the proposed regional models on a synthetic and a benchmarking real-world dataset was carried out. The obtained results have shown that the proposed regional modelling framework constitutes a viable and competitive alternative to the standard global and local modelling approaches to nonlinear dynamical system identification. Currently, we are evaluating the effects on performance when the SOM is replaced with another neural vector quantization network, such as the neural gas network [61] or the PLSOM [62]. We are also developing adaptive variants of the proposed regional models using growing self-organizing neural networks [63] in order to provide an online version of the regional models.
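To make the two-level procedure concrete, the following sketch assembles a simplified regional model from off-the-shelf components: a SOM for the first-level quantization, k-means over the SOM prototypes with the number of clusters chosen by the Davies–Bouldin index (Appendix A), and one OLS-fitted linear model per region. The MiniSom and scikit-learn dependencies, all names and hyperparameters are our illustrative choices and not the implementation used in the experiments; a regional NARX variant would replace the per-region OLS fit with an ELM, and a robust variant would replace it with IRLS.

```python
import numpy as np
from minisom import MiniSom                      # any SOM implementation would do
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def fit_regional_models(X, y, som_shape=(10, 10), max_clusters=8, seed=42):
    """Two-level sketch: SOM -> prototype clustering -> one linear model per region."""
    # Level 1: quantize the input (regressor) space with a SOM.
    som = MiniSom(som_shape[0], som_shape[1], X.shape[1], random_seed=seed)
    som.train_random(X, num_iteration=5000)
    protos = som.get_weights().reshape(-1, X.shape[1])
    # Level 2: cluster the prototypes; pick K that minimizes the DB index.
    fits = [KMeans(n_clusters=k, n_init=10, random_state=seed).fit(protos)
            for k in range(2, max_clusters + 1)]
    best = min(fits, key=lambda km: davies_bouldin_score(protos, km.labels_))
    # Assign each training sample to the region of its best-matching prototype.
    bmu = np.argmin(((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2), axis=1)
    regions = best.labels_[bmu]
    # One OLS model (with bias term) per region; assumes every region gets samples.
    models = {r: np.linalg.lstsq(
                  np.column_stack([X[regions == r],
                                   np.ones((regions == r).sum())]),
                  y[regions == r], rcond=None)[0]
              for r in np.unique(regions)}
    return protos, best.labels_, models

def regional_predict(x, protos, proto_labels, models):
    """Route a regressor vector to its region and apply that region's model."""
    bmu = np.argmin(((protos - x) ** 2).sum(axis=1))
    return np.append(x, 1.0) @ models[proto_labels[bmu]]
```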

Acknowledgments

The authors thank CNPq (grant 309841/2012-7) for the financial support and NUTEC (Fundação Núcleo de Tecnologia Industrial do Ceará) for providing the laboratory infrastructure for the execution of the research activities reported in this paper.

Appendix A. The Davies–Bouldin index

The DB index is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. Let $C_i$ be a cluster of vectors. First, we define the scatter within the $i$-th cluster as

$$S_i = \sqrt{\frac{1}{N_i}\sum_{\mathbf{x} \in C_i} \|\mathbf{x} - \mathbf{m}_i\|^2}, \qquad (A.1)$$

where $\mathbf{m}_i$ is the centroid of cluster $C_i$, $N_i$ is the number of data points in this cluster and $\|\cdot\|$ denotes the Euclidean norm. Then, we compute the distance $M_{i,j}$ between the $i$-th and $j$-th clusters as

$$M_{i,j} = \|\mathbf{m}_i - \mathbf{m}_j\|, \qquad (A.2)$$

where $\mathbf{m}_j$ is the centroid of cluster $C_j$. Finally, if $K$ is the number of clusters, the DB index combines the ratios of $S_i$ and $M_{i,j}$ such that

$$\mathrm{DB}(K) = \frac{1}{K}\sum_{i=1}^{K} D_i, \qquad (A.3)$$

where

$$D_i = \max_{j:\, j \neq i} \left\{ \frac{S_i + S_j}{M_{i,j}} \right\}. \qquad (A.4)$$
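For completeness, the following is a direct Python transcription of Eqs. (A.1)–(A.4); the function and variable names are ours, and the routine assumes at least two clusters with distinct centroids.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index of a clustering, after Eqs. (A.1)-(A.4).

    X : (N, d) vectors (e.g. SOM prototypes); labels : (N,) cluster indices.
    Lower values indicate compact, well-separated clusters.
    """
    ids = np.unique(labels)
    K = len(ids)
    m = np.array([X[labels == i].mean(axis=0) for i in ids])   # centroids
    # Within-cluster scatter S_i, Eq. (A.1).
    S = np.array([np.sqrt(np.mean(np.sum((X[labels == i] - m[k]) ** 2, axis=1)))
                  for k, i in enumerate(ids)])
    # Between-cluster centroid distances M_ij, Eq. (A.2).
    M = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=2)
    # D_i = max over j != i of (S_i + S_j) / M_ij, Eq. (A.4).
    D = [max((S[i] + S[j]) / M[i, j] for j in range(K) if j != i)
         for i in range(K)]
    return float(np.mean(D))                                   # Eq. (A.3)
```

In the regional modelling experiments, this index is evaluated over the SOM prototypes for a range of candidate numbers of clusters, and the K yielding the smallest DB(K) is retained.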


References

[1] J. Vesanto, E. Alhoniemi, Clustering of the self-organizing map, IEEE Trans. Neural Netw. 11 (2000) 586–600.
[2] H. Peng, K. Nakano, H. Shioya, A comprehensive review for industrial applicability of artificial neural networks, IEEE Trans. Control Syst. Technol. 15 (2007) 130–143.
[3] K.S. Narendra, K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Netw. 1 (1990) 4–27.
[4] S. Chen, S.A. Billings, P.M. Grant, Nonlinear system identification using neural networks, Int. J. Control 51 (1990) 1191–1214.
[5] K.S. Narendra, Neural networks for control theory and practice, Proc. IEEE 84 (1996) 1385–1406.
[6] G.A. Barreto, A.F.R. Araújo, Identification and control of dynamical systems using the self-organizing map, IEEE Trans. Neural Netw. 15 (2004) 1244–1259.
[7] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. 15 (1985) 116–132.
[8] E. Kim, H. Lee, M. Park, M. Park, A simply identified Sugeno-type fuzzy model via double clustering, Inf. Sci. 110 (1998) 25–39.
[9] B. Rezaee, M. Fazel Zarandi, Data-driven fuzzy modeling for Takagi–Sugeno–Kang fuzzy system, Inf. Sci. 180 (2010) 241–255.
[10] C.A.M. Lima, A.L.V. Coelho, F.J. Von Zuben, Hybridizing mixtures of experts with support vector machines: investigation into nonlinear dynamic systems identification, Inf. Sci. 177 (2007) 2049–2074.
[11] M.F. Azeem, M. Hanmandlu, N. Ahmad, Generalization of adaptive neuro-fuzzy inference systems, IEEE Trans. Neural Netw. 11 (2000) 1332–1346.
[12] R. Babuška, H. Verbruggen, Neuro-fuzzy methods for nonlinear system identification, Annu. Rev. Control 27 (2003) 73–85.
[13] J. de Jesús Rubio, SOFMLS: online self-organizing fuzzy modified least-squares network, IEEE Trans. Fuzzy Syst. 17 (2009) 1296–1309.
[14] J.-Q. Chen, Y.-G. Xi, Nonlinear system modeling by competitive learning and adaptive fuzzy inference system, IEEE Trans. Syst. Man Cybern. Part C 28 (1998) 231–238.
[15] M. Norgaard, O. Ravn, N.K. Poulsen, L.K. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, Springer-Verlag, London, 2000.
[16] J. Abonyi, Fuzzy Model Identification for Control, Birkhäuser, Boston, 2003.
[17] X. Han, W.F. Xie, Z.J. Fu, W.D. Luo, Nonlinear systems identification using dynamic multi-time scales neural networks, Neurocomputing 74 (2011) 3428–3439.
[18] W.F. Xie, Y.Q. Zhu, Z.Y. Zhao, Y.K. Wong, Nonlinear system identification using optimized dynamic neural network, Neurocomputing 72 (2009) 3277–3287.
[19] W. Yu, Nonlinear system identification using discrete-time recurrent neural networks with stable learning algorithms, Inf. Sci. 158 (2004) 131–157.
[20] X. Li, W. Yu, Dynamic system identification via recurrent multilayer perceptrons, Inf. Sci. 147 (2002) 45–63.
[21] S. Lawrence, A.C. Tsoi, A.D. Back, Function approximation with neural networks and local methods: bias, variance and smoothness, in: Australian Conference on Neural Networks (ACNN), 1996, pp. 16–21.
[22] C. Hametner, S. Jakubek, Local model network identification for online engine modelling, Inf. Sci. 220 (2013) 210–225.
[23] G. Gregorčič, G. Lightbody, Nonlinear system identification: from multiple-model networks to Gaussian processes, Eng. Appl. Artif. Intell. 21 (2008) 1035–1055.
[24] G. Gregorčič, G. Lightbody, Local model network identification with Gaussian processes, IEEE Trans. Neural Netw. 18 (2007) 1404–1423.
[25] R. Shorten, R. Murray-Smith, R. Bjørgan, H. Gollee, On the interpretation of local models in blended multiple model structures, Int. J. Control 72 (1999) 620–628.
[26] E.N. Skoundrianos, S.G. Tzafestas, Fault diagnosis via local neural networks, Math. Comput. Simul. 60 (2002) 169–180.
[27] R. Murray-Smith, H. Gollee, A constructive learning algorithm for local model networks, in: Proceedings of the IEEE Workshop on Computer-Intensive Methods in Control and Signal Processing, 1994, pp. 21–29.
[28] S.E. Papadakis, V.G. Kaburlasos, Piecewise-linear approximation of non-linear models based on probabilistically/possibilistically interpreted intervals' numbers (INs), Inf. Sci. 180 (2010) 5060–5076.
[29] L.G.M. Souza, G.A. Barreto, On building local models for inverse system identification with vector quantization algorithms, Neurocomputing 73 (2010) 1993–2005.
[30] J. Liu, D. Djurdjanovic, Topology preservation and cooperative learning in identification of multiple model systems, IEEE Trans. Neural Netw. 19 (2008) 2065–2072.
[31] A. Fatehi, K. Abe, Flexible structure multiple modeling using irregular self-organizing maps neural network, Int. J. Neural Syst. 18 (2008) 347–370.
[32] J. Cho, J. Principe, D. Erdogmus, M. Motter, Quasi-sliding mode control strategy based on multiple linear models, Neurocomputing 70 (2007) 962–974.
[33] J. Cho, J. Principe, D. Erdogmus, M. Motter, Modeling and inverse controller design for an unmanned aerial vehicle based on the self-organizing map, IEEE Trans. Neural Netw. 17 (2006) 445–460.
[34] J. Abonyi, S. Nemeth, C. Vincze, P. Arva, Process analysis and product quality estimation by self-organizing maps with an application to polyethylene production, Comput. Ind. 52 (2003) 221–234.
[35] J.C. Principe, L. Wang, M.A. Motter, Local dynamic modeling with self-organizing maps and applications to nonlinear system identification and control, Proc. IEEE 86 (1998) 2240–2258.


[36] T. Kohonen, Essentials of the self-organizing map, Neural Netw. 37 (2013) 52–65.
[37] M. van Hulle, Self-organizing maps, in: G. Rozenberg, T. Baeck, J. Kok (Eds.), Handbook of Natural Computing: Theory, Experiments, and Applications, Springer-Verlag, Berlin Heidelberg, 2010, pp. 1–45.
[38] H. Yin, The self-organizing maps: background, theories, extensions and applications, in: J. Fulcher, L.C. Jain (Eds.), Computational Intelligence: A Compendium, Studies in Computational Intelligence, vol. 115, Springer-Verlag, Berlin Heidelberg, 2008, pp. 715–762.
[39] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006) 489–501.
[40] X. Wang, V.L. Syrmos, Nonlinear system identification and fault detection using hierarchical clustering analysis and local linear models, in: Proceedings of the 15th Mediterranean Conference on Control & Automation (MED'07), 2007, pp. T23–015.
[41] A.H. Silva Jr., F. Corona, G.A. Barreto, Robust regional modeling for nonlinear system identification using self-organizing maps, in: P.A. Estévez, J.C. Príncipe, P. Zegers (Eds.), Advances in Self-Organizing Maps, vol. 198, Springer, Berlin Heidelberg, 2013, pp. 215–224.
[42] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: L.M. Le Cam, J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, CA, 1967, pp. 281–297.
[43] D.L. Davies, D.W. Bouldin, A cluster separation measure, IEEE Trans. Pattern Anal. Mach. Intell. 1 (1979) 224–227.
[44] M. Halkidi, Y. Batistakis, M. Vazirgiannis, On clustering validation techniques, J. Intell. Inf. Syst. 17 (2001) 107–145.
[45] P.J. Huber, Robust estimation of a location parameter, Ann. Math. Stat. 35 (1964) 73–101.
[46] P.J. Huber, Robust Statistics, John Wiley & Sons, New York, 2004.
[47] L. Ljung, System Identification: Theory for the User, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1999.
[48] T.K. Kohonen, E. Oja, O. Simula, A. Visa, J. Kangas, Engineering applications of the self-organizing map, Proc. IEEE 84 (1996) 1358–1384.
[49] A.K. Jain, Data clustering: 50 years beyond K-means, Pattern Recognit. Lett. 31 (2010) 651–666.
[50] S. Wu, T.W.S. Chow, Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density, Pattern Recognit. 37 (2004) 175–188.
[51] Y. Miche, A. Sorjamaa, P. Bas, O. Simula, C. Jutten, A. Lendasse, OP-ELM: optimally pruned extreme learning machine, IEEE Trans. Neural Netw. 21 (2010) 158–162.
[52] J. Fox, S. Weisberg, An R Companion to Applied Regression, 2nd edition, Sage Publications, Thousand Oaks, 2010.
[53] R. Salas, S. Moreno, H. Allende, C. Moraga, A robust and flexible model of hierarchical self-organizing maps for nonstationary environments, Neurocomputing 70 (2007) 2744–2757.
[54] A.M. Muñoz, J. Muruzábal, Self-organizing maps for outlier detection, Neurocomputing 18 (1998) 33–60.
[55] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Deylon, P.-Y. Glorennec, H. Hjalmarsson, A. Juditsky, Non-linear black-box modeling in system identification: a unified overview, Automatica 31 (1995) 1691–1724.
[56] J. Abonyi, F. Szeifert, System identification using Delaunay tessellation of self-organizing maps, in: Neural Networks and Soft Computing, Advances in Soft Computing, vol. 19, Physica-Verlag HD, Heidelberg, 2003, pp. 680–685.
[57] J. Walter, H. Ritter, K. Schulten, Non-linear prediction with self-organizing map, in: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'90), vol. 1, 1990, pp. 587–592.
[58] P. Horata, S. Chiewchanwattana, K. Sunat, Robust extreme learning machine, Neurocomputing 102 (2013) 31–44.
[59] G. Ferrari-Trecate, M. Muselli, D. Liberati, M. Morari, A clustering technique for the identification of piecewise affine systems, Automatica 39 (2003) 205–217.
[60] J. Vesanto, J. Himberg, E. Alhoniemi, J. Parhankangas, Self-organizing map in Matlab: the SOM toolbox, in: Proceedings of the Matlab DSP Conference, 1999, pp. 35–40.
[61] T. Martinetz, K. Schulten, Topology representing networks, Neural Netw. 7 (1994) 507–522.
[62] E. Berglund, J. Sitte, The parameterless self-organizing map algorithm, IEEE Trans. Neural Netw. 17 (2006) 305–316.
[63] B. Fritzke, Growing grid—a self-organizing network with constant neighborhood range and adaptation strength, Neural Process. Lett. 2 (1995) 9–13.

Amauri Holanda Souza Júnior was born in Fortaleza, Ceará, Brazil, in 1986. He received his B.S. degree in Teleinformatics from the Federal Institute of Education, Science and Technology of Ceará in 2004, and the M.Sc. degree in Teleinformatics Engineering from the Federal University of Ceará in 2009. Currently, he is pursuing his Ph.D. degree in Teleinformatics Engineering at the same institution. He is also an assistant professor in the Department of Telematics at the Federal Institute of Education, Science and Technology of Ceará, Campus of Maracanaú, Maracanaú, Ceará, Brazil. His main research interests are neural networks for pattern classification and regression, nonlinear system identification and computer programming.

Guilherme Barreto was born in Fortaleza, Ceará, Brazil, in 1973. He received his B.S. degree in Electrical Engineering from the Federal University of Ceará in 1995, and both the M.Sc. and Ph.D. degrees in Electrical Engineering from the University of São Paulo in 1998 and 2003, respectively. Currently, he is a full professor of the Department of Teleinformatics Engineering, Federal University of Ceará (UFC), Fortaleza, Ceará, Brazil. At this institution, Prof. Guilherme Barreto leads the Group of Advanced Machine Learning (GRAMA), whose members pursue a variety of research topics, such as neural networks & computational intelligence, pattern recognition & machine learning, nonlinear system identification, time series prediction, and intelligent robotics. More recently, members of GRAMA have been collaborating extensively with outstanding research groups in Portugal (FEUP), Spain (University of Granada), Germany (University of Bielefeld) and Finland (Aalto University). Prof. Barreto has been serving as a reviewer for several international journals (IEEE TNNLS, Neural Networks, Neurocomputing, Neural Processing Letters, Computers and Electrical Engineering, Biomedical Signal Processing and Control, and IEEE Sensors) and conferences (IJCNN, ESANN, IbPRIA, IDEAL, among others). He is also serving as the editor-in-chief of the journal Learning & Nonlinear Models (L&NLM), published by the Brazilian Computational Intelligence Society (http://sbic.ct.ufrn.br/), and as an associate editor of Springer's International Journal of Machine Learning and Cybernetics (IJMLC).

Francesco Corona received the Laurea degree (M.Sc.) in Chemical Engineering and the Dottorato di Ricerca (Ph.D.) in Industrial Engineering from the University of Cagliari (Italy). He is now an Adjunct Professor in ‘Industrial Applications of Machine Learning’ at the Department of Information and Computer Science of Aalto University (Finland). His research interests concentrate on Machine Learning technologies for process modelling, visualization, control and optimization. Specific domains are chemometrics and information visualization for industrial and environmental systems.