Recursive System Identification Using Outlier-Robust Local Models

12th IFAC Symposium on Dynamics and Control of Process Systems, including Biosystems
Florianópolis - SC, Brazil, April 23-26, 2019

Available online at www.sciencedirect.com (ScienceDirect)

IFAC PapersOnLine 52-1 (2019) 436–441. DOI: 10.1016/j.ifacol.2019.06.101

Jessyca A. Bessa and Guilherme A. Barreto

Federal University of Ceará, Graduate Program on Teleinformatics Engineering, Campus of Pici, Fortaleza, Ceará, Brazil
e-mails: [email protected], [email protected]

Abstract: In this paper we revisit the design of neural-network based local linear models for dynamic system identification, aiming at extending their use to scenarios contaminated with outliers. To this purpose, we modify well-known local linear models by replacing their original recursive rules with outlier-robust variants developed from the M-estimation framework. The performances of the proposed variants are evaluated in free simulation tasks over three benchmarking datasets. The obtained results corroborate the considerable improvement in the performance of the proposed models in the presence of outliers.

© 2019, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: System identification, neural networks, local linear models, outliers, M-estimation.

1. INTRODUCTION

Dynamical system identification is a very challenging research area, whose goal is to build a useful model from the observed input-output data (Sjöberg et al., 1995). For this purpose, there are several design and modeling methodologies to describe a system, select a suitable structure for the model and then estimate its parameters. The global approach, for example, is based on the assumption that all the parameters of the approximating model (e.g. a neural network) would be estimated using the entire data (Narendra and Parthasarathy, 1990). This is not always computationally feasible, especially if a large number of observations are available. An alternative is to use local models whose parameters are estimated using only a partition of the data (Wang and Tanaka, 1999). Local modeling has a long history of contributions in system identification, with important contributions originating within the fields of neural networks (Moshou and Ramon, 1997) and fuzzy modeling (Takagi and Sugeno, 1985). Recent applications are still being reported in specialized conferences and journals (Belz et al., 2017; Münker et al., 2017; Barreto and Souza, 2016; Costa et al., 2015).

Regardless of the structure of the model (either global or local), it is desirable that the approximating model be capable of dealing with abnormal samples, commonly called outliers, in order to avoid biased/unstable responses. Bearing this in mind, in this paper we revisit the design of neural-network based local linear models for the identification of dynamic systems. Our ultimate goal is to extend the recursive learning rules of such local models aiming at improving their performance in scenarios with strong occurrence of outliers in the data. For this purpose, we selected three widely used local linear models and modified their adaptive learning rules with the help of mechanisms grounded in the robust statistical framework known as M-estimation (Huber, 1964). The main goal here is to validate our hypothesis that the use of learning rules based on M-estimators considerably increases the outlier-robustness of the evaluated models. For this sake, a comprehensive performance comparison is carried out using three datasets commonly used for benchmarking purposes in the field of system identification.

The remainder of the paper is organized as follows. In Section 2 we describe the three local models evaluated in this article. The fundamentals of the robust framework of M-estimation are briefly described in Section 3. Experiments are described and the results are reported in Section 4. The paper is concluded in Section 5.

2. ANN-BASED LOCAL MODEL APPROACHES

In this section, we describe three ANN-based local linear models for system identification: the local linear map (LLM) (Walter et al., 1990), the radial basis function network (RBFN) (Chen et al., 1990; Yan et al., 2000) and the local model network (LMN) (Belz et al., 2017). For all these approaches we assume that each input vector $x(t) \in \mathbb{R}^{p+q}$ is defined as

$$ x(t) = [u(t-1), \ldots, u(t-p), y(t-1), \ldots, y(t-q)]^T, \quad (1) $$

where $x(t)$ is also called the vector of regressors and $p + q = n$. Also, $y(t) = f(x(t))$ is the observed output, so that the function $f: \mathbb{R}^n \to \mathbb{R}$ is unknown and will be approximated by multiple local linear models.

The LLM network approximates the unknown function with a set of linear filters, each of which is constrained to a non-overlapping local partition of the input space $\mathbb{R}^n$. Each partition is associated with a prototype vector and a coefficient vector. The continuous input space is partitioned by a reduced number of prototype vectors, while the coefficients of the linear filter associated with each prototype vector provide a local estimator of the output of the mapping.

More formally, the input space $\mathcal{X}$ is partitioned via the self-organizing map (SOM) algorithm (Kohonen, 2013), with each neuron $j$ owning a prototype vector $w_j$, where $j = 1, \ldots, S$, with $S$ denoting the number of neurons of the SOM. The linear filter associated with the j-th prototype


vector $w_j$ is defined by a coefficient vector $a_j \in \mathbb{R}^{p+q}$, which plays the role of the coefficients of the j-th local ARX model:

$$ a_j(t) = [a_{j,1}, \ldots, a_{j,p}, \ldots, a_{j,p+q}]^T. \quad (2) $$

Thus, the adjustable parameters of the LLM model are the set of prototype vectors $w_j$ and their coefficient vectors $a_j$, for $j = 1, \ldots, S$. Given the winner-take-all nature of the SOM, only one neuron per iteration is used to estimate the output of the LLM model. The index of the winning neuron at time $t$ is obtained as follows:

$$ j^*(t) = \arg\min_{\forall j} \| x(t) - w_j(t) \|^2, \quad (3) $$

where $\|\cdot\|$ denotes the Euclidean norm. The estimate of the output of the LLM is then computed as

$$ \hat{y}(t) = \hat{y}_{j^*}(t) = a_{j^*}^T(t)\, x(t), \quad (4) $$

where $a_{j^*}(t)$ is the coefficient vector of the linear filter associated with the current winning neuron $j^*(t)$, used to build a local estimate of the output. The learning rule for the prototype vectors $w_j$ is that of the usual SOM algorithm:

$$ w_j(t+1) = w_j(t) + \alpha(t)\, h(j^*, j; t)\, [x(t) - w_j(t)], \quad (5) $$

while the learning rule of the coefficient vectors $a_j(t)$ is given by

$$ a_j(t+1) = a_j(t) + \alpha'(t)\, h(j^*, j; t)\, \Delta a_j(t), \quad (6) $$

where $0 < \alpha, \alpha' \ll 1$ are, respectively, the learning rates of the weight and coefficient vectors. The correction term $\Delta a_j(t)$ is computed by means of a variant of the normalized LMS algorithm (Widrow, 2005):

$$ \Delta a_j(t) = e_j(t) \frac{x(t)}{\|x(t)\|^2} = [y(t) - a_j^T(t)\, x(t)] \frac{x(t)}{\|x(t)\|^2}, \quad (7) $$

where $e_j(t)$ is the prediction error of the j-th local model and $y(t)$ is the actual observed output.
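For concreteness, one recursive LLM update, Eqs. (3) to (7), can be sketched as below. This is an illustrative NumPy transcription, not the authors' code: the Gaussian neighborhood function and the fixed learning rates are simplifying assumptions of ours.

```python
import numpy as np

def llm_step(x, y, W, A, alpha, alpha_c, sigma_h):
    """One recursive LLM update, Eqs. (3)-(7).

    x : (n,) regressor vector, y : scalar observed output,
    W : (S, n) SOM prototypes, A : (S, n) local ARX coefficient vectors.
    Returns the local prediction y_hat of Eq. (4).
    """
    # Eq. (3): winner-take-all search for the best-matching prototype.
    j_star = int(np.argmin(np.sum((W - x) ** 2, axis=1)))
    # Eq. (4): output of the linear filter attached to the winner.
    y_hat = A[j_star] @ x
    # Gaussian neighborhood h(j*, j; t) over neuron indices (an assumption;
    # any standard SOM neighborhood would do).
    j = np.arange(W.shape[0])
    h = np.exp(-((j - j_star) ** 2) / (2.0 * sigma_h ** 2))
    # Eq. (5): SOM rule for the prototype vectors.
    W += alpha * h[:, None] * (x - W)
    # Eqs. (6)-(7): normalized-LMS correction of the coefficient vectors,
    # using each neuron's own prediction error e_j(t) = y - a_j^T x.
    e = y - A @ x                                   # (S,) errors
    A += alpha_c * h[:, None] * np.outer(e, x) / (x @ x + 1e-12)
    return y_hat
```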


The RBFN is a classical feedforward neural network architecture with a single hidden layer of neurons (Chen et al., 1990). Hidden neurons have nonlinear activation functions, hereafter referred to as radial basis functions, while output neurons use linear ones. In RBFNs, the j-th basis function is comprised of two elements: a distance metric $d_j(x) = d_j(x; c_j)$ and the basis function itself $z_j = \phi(d_j(x))$, where $c_j$ denotes the center of the j-th function. A Euclidean distance metric, $d_j(x) = \|x - c_j\|$, and a Gaussian basis function $z_j = \exp\{-d_j^2 / 2\sigma_j^2\}$ are common choices, with $\sigma_j$ denoting the radius (or width) of the basis function.

The design of the RBF network basically involves the specification of the number $S$ of basis functions, the determination of their parameters $(c_j, \sigma_j)$, $j = 1, \ldots, S$, and the computation of the output weights. For this purpose, we follow the approach introduced by Moody and Darken (1989), which consists of three sequentially executed stages: (i) the positions of the $S$ centers are found by means of a vector quantization algorithm, such as the SOM network; (ii) heuristics are used for specifying the radii $\sigma_j$ of the $S$ basis functions, and in this paper $\sigma_j$ is computed as half of the distance between the center $c_j$ and the nearest center; (iii) the output weights are computed by means of the LMS learning rule (a.k.a. the Widrow-Hoff rule).

In the current paper, we are interested in identifying a MISO system, thus we assume only one output neuron (however, the proposed approach can be easily extended for handling MIMO systems). Hence, the estimate of the output of the RBFN is computed as

$$ \hat{y}(t) = w^T(t)\, z(t) = \sum_{j=1}^{S} w_j(t)\, z_j(t), \quad (8) $$

where $w = [w_1 \; w_2 \; \cdots \; w_S]^T$ is the weight vector of the output neuron and $z = [z_1 \; z_2 \; \cdots \; z_S]^T$ is the vector of basis function activations. The LMS rule is used to adapt the output weights as

$$ w(t+1) = w(t) + \alpha(t)\, e(t)\, z(t) = w(t) + \alpha(t)\, [y(t) - \hat{y}(t)]\, z(t), \quad (9) $$

where $e(t) = y(t) - \hat{y}(t)$ is the model's prediction error. The NARX-RBFN model can be understood as an ANN-based implementation of a zero-order Takagi-Sugeno (TS) model (Takagi and Sugeno, 1985).

The LMN was introduced by Johansen and Foss (1992) as a generalization of the RBF network. It can be viewed as implementing a decomposition of the complex, nonlinear system into a set of locally accurate submodels which are then smoothly integrated by associated basis functions. This means that a smaller number of local models can cover larger areas of the input space, when compared with the simple RBFN model. In the LMN, the output estimation of the RBF model in Eq. (8) is extended to involve not a constant weight associated with each basis function, but instead a function $f_j(x; w_j)$ associated with each basis function:

$$ \hat{y}(t) = \sum_{j=1}^{S} z_j(t)\, f_j(x; w_j), \quad (10) $$


where a common choice for $f_j(x; w_j)$ is the multiple linear regression function: $f_j(x; w_j) = w_j^T x$.


Like the NARX-RBFN, the NARX-LMN model can be interpreted as an ANN-based implementation of the TS model. In the NARX-LMN model, however, the radial basis functions correspond to the rules in the TS model, while the local function $f_j(x; w_j) = w_j^T x$ is used in the rules' consequents. The LMN is still a common choice for ANN-based system identification and control (König et al., 2014; Costa et al., 2015; Münker et al., 2017; Belz et al., 2017; Maier et al., 2018).
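To make the construction concrete, the following sketch assembles the LMN prediction of Eq. (10) with Gaussian basis functions whose widths follow the half-distance heuristic of stage (ii) above; setting each local model to a constant weight recovers the RBFN output of Eq. (8). Center placement (stage (i)) is assumed to have been done already by the SOM or any other vector quantizer, and all function names are ours.

```python
import numpy as np

def half_distance_widths(C):
    """Stage (ii) heuristic: each radius is half the distance from
    center c_j to its nearest neighboring center."""
    D = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return 0.5 * D.min(axis=1)

def lmn_predict(x, C, sigmas, W):
    """LMN output of Eq. (10): y_hat = sum_j z_j * f_j(x; w_j),
    with f_j(x; w_j) = w_j^T x (one coefficient vector per basis function).

    C : (S, n) centers, sigmas : (S,) radii, W : (S, n) local coefficients.
    """
    d2 = np.sum((C - x) ** 2, axis=1)        # squared distances d_j^2
    z = np.exp(-d2 / (2.0 * sigmas ** 2))    # Gaussian activations z_j
    return float(z @ (W @ x))                # sum_j z_j * (w_j^T x)

def rbfn_predict(x, C, sigmas, w):
    """RBFN special case of Eq. (8): constant local models f_j = w_j."""
    d2 = np.sum((C - x) ** 2, axis=1)
    z = np.exp(-d2 / (2.0 * sigmas ** 2))
    return float(w @ z)
```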


3. OUTLIER-ROBUSTNESS VIA M-ESTIMATION

In real-world applications of recursive estimation, the available data are often contaminated with outliers, which are roughly defined as observations differing markedly from those usually expected in the task of interest. Robust regression techniques, such as those developed within the framework of M-estimation (Huber, 1964), seek to develop estimators that are more resilient to outliers. In this context, Zou et al. (2000) and Chan and Zhou (2010) introduced modifications to the standard LMS rule in order to robustify it for outlier-rich data. The proposed method was named the least mean M-estimate (LMM) and



one of its main virtues is simplicity, in the sense that no extra computational burden is added to the parameter estimation process. As a consequence, the robust LMM rule works as fast as the original non-robust LMS rule. As an example, let us take the LMS rule shown in Eq. (9); the equivalent LMM rule is simply written as

$$ w(t+1) = w(t) + \alpha(t)\, q(e(t))\, e(t)\, z(t) = w(t) + \alpha(t)\, q(e(t))\, [y(t) - \hat{y}(t)]\, z(t), \quad (11) $$

where $q(e(t))$ is a scalar function that penalizes high values of the errors $e(t)$ (usually caused by outliers). A similar change is introduced in the LMS-based rules of the other local models discussed previously.


Fig. 1. RMSE values for validation data as a function of the amount of outliers in training data (Dryer dataset).


In this paper, we will use $q(e(t))$ as the Huber function:

$$ q(e(t)) = \begin{cases} \dfrac{\kappa}{|e(t)|}, & \text{if } |e(t)| > \kappa, \\ 1, & \text{otherwise,} \end{cases} \quad (12) $$

where the constant $\kappa > 0$ is a user-defined threshold. When $|e(t)|$ is less than $\kappa$, the weight function $q(e(t))$ is equal to 1 and Eq. (11) reduces to the LMS rule. When $|e(t)|$ is greater than $\kappa$, $q(e(t))$ decays to zero as $|e(t)| \to \infty$. This way, the LMM rule effectively reduces the effect of large errors, usually caused by outliers. It is recommended to use $\kappa = 1.345\sigma$, where $\sigma$ corresponds to the standard deviation of the residuals.
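In code, the LMM rule amounts to one extra scalar factor in the LMS update. The sketch below illustrates Eqs. (11) and (12) for the RBFN output weights; the recursive residual-scale tracker used to set $\kappa = 1.345\sigma$ is our own simplification, since the paper does not prescribe how $\sigma$ is estimated online.

```python
import numpy as np

def huber_weight(e, kappa):
    """Huber weight function of Eq. (12)."""
    return 1.0 if abs(e) <= kappa else kappa / abs(e)

def lmm_step(z, y, w, alpha, kappa):
    """Robust LMM update of Eq. (11) for the RBFN output weights.
    z : (S,) basis activations, y : scalar target, w : (S,) weights."""
    e = y - w @ z                                  # prediction error e(t)
    w += alpha * huber_weight(e, kappa) * e * z    # one extra scalar factor
    return e

class ResidualScale:
    """One simple way to track kappa = 1.345 * sigma online via an
    exponentially weighted residual variance (an assumption of this
    sketch, not the paper's recipe)."""
    def __init__(self, lam=0.99):
        self.lam, self.var = lam, 1.0
    def update(self, e):
        self.var = self.lam * self.var + (1.0 - self.lam) * e * e
        return 1.345 * self.var ** 0.5             # current kappa
```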

As mentioned in the introduction, the main goal of this paper is to replace the original LMS rule used by the local linear models described in Section 2 with the outlier-robust LMM rule. We hypothesize that this replacement provides the local models with greater resilience to outliers. The computational experiments testing this hypothesis and the obtained results are presented in the next section.

4. RESULTS AND DISCUSSION

In this section, we report the results of a comprehensive performance comparison of the local models described in Section 2. Six models are implemented from scratch using Octave [2]: the three original versions (LLM/LMS, RBF/LMS and LMN/LMS) and the three proposed outlier-robust variants (LLM/LMM, RBF/LMM and LMN/LMM). We first report results on two benchmarking datasets (Dryer and pH), which are publicly available for download from the DaISy repository website at the Katholieke Universiteit Leuven [3]. Additionally, we evaluate all the aforementioned models on a large-scale dataset. For this matter, we choose the Silverbox dataset, which was introduced in Schoukens et al. (2003). Some important features of the Dryer, pH and Silverbox datasets are summarized in Table 1.

Table 1. Summary of the evaluated datasets.

Dataset      Estimation samples   Test samples   $\hat{L}_u$   $\hat{L}_y$
Dryer        500                  500            5             5
pH           200                  800            5             5
Silverbox    40000                91072          10            10

All models are evaluated by the root mean square error, $RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N} e^2(t)}$. It must be pointed out that our goal in the experiments is not to find the best performing model for each dataset, but rather to confirm the improvement in performance of all models brought by the use of the outlier-robust LMM rule. Thus, for each dataset we report the results for only one of the local models described in Section 2.
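For reference, the figure of merit above is a direct transcription of the RMSE formula (a trivial sketch; names are ours):

```python
import numpy as np

def rmse(e):
    """Root mean square error over the N test residuals e(t)."""
    e = np.asarray(e)
    return float(np.sqrt(np.mean(e ** 2)))
```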

[2] www.gnu.org/software/octave/
[3] http://homes.esat.kuleuven.be/~smc/daisy/


Training was carried out using the one-step-ahead prediction mode, a.k.a. the series-parallel training mode (Ribeiro and Aguirre, 2018). Outliers were artificially introduced, in different proportions, into the training data only. For this purpose, we followed the same procedure used by Mattos et al. (2017). The reported results correspond to post-training evaluation of the 6 models in outlier-free scenarios. The rationale for this approach is to assess how the outliers affect the parameter estimation process (training) and its impact on the model validation (testing) phase. Trained models were tested under the free simulation regime, in which predicted output values were fed back to the input regression vector. During testing, the parameters of the models were not updated.

We defined 200 epochs for training all models. The number of neurons was set to S = 5 after some initial experimentation with the data. Higher values of S did not significantly improve the results, while smaller values led to poorer performances for all evaluated models. The initial and final learning rates were defined as $\alpha_0 = 0.5$ and $\alpha_T = 0.001$.

For the Dryer dataset, we report the results of the LMN models only. This dataset corresponds to a mechanical SISO system from a laboratory setup acting like a hair dryer. In this system, the air temperature $y_n$ is measured by a thermocouple at the output. The input $u_n$ is the voltage over the heating device. From the 1000 available samples of $u_n$ and $y_n$, we use the first half for model building and the other half for model evaluation. The RMSE values as a function of the percentage of outliers for the Dryer dataset are shown in Figure 1 and Table 2. One can easily see the improvement in performance of the LMN model due to the replacement of the original LMS rule by the outlier-robust LMM rule. While the performance of the original LMN-LMS model deteriorates with the increase in the number of outliers in the training data, the performance of the robust LMN-LMM model is practically insensitive to the presence of outliers. A typical prediction result of the LMN model, shown in Figure 2, exemplifies this improvement for the Dryer dataset.
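For clarity, the free simulation (parallel) test regime described above can be sketched as follows; this is an illustrative Python transcription, with function and variable names of our own choosing:

```python
import numpy as np

def free_simulation(u, y_init, predict, p, q):
    """Free-run (parallel) test regime: the regressor of Eq. (1) is
    filled with past *predictions* instead of past measurements, and
    no model parameter is updated.

    u       : (N,) input sequence
    y_init  : initial output samples, len(y_init) >= max(p, q)
    predict : any trained local model, mapping a regressor x to y_hat
    """
    y_hat = list(y_init)
    for t in range(len(y_init), len(u)):
        x = np.r_[u[t - p:t][::-1],      # u(t-1), ..., u(t-p)
                  y_hat[-q:][::-1]]      # y_hat(t-1), ..., y_hat(t-q)
        y_hat.append(float(predict(x)))
    return np.asarray(y_hat)
```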



Fig. 2. Predictive performance of the LMN-LMS (L) and LMN-LMM (R) for 15% of outliers (Dryer dataset).

Table 2. RMSE results for the LMN model on the Dryer dataset.

                          0% outliers              15% outliers
Models                    RMSE        STD          RMSE        STD
LMN-LMS (original)        1.66E-02    3.75E-03     2.68E-02    1.63E-03
LMN-LMM (robust)          1.76E-02    1.52E-03     1.26E-02    3.13E-03

Table 3. RMSE results for the RBF models on the pH dataset.

                          0% outliers              15% outliers
Models                    RMSE        STD          RMSE        STD
RBF-LMS (original)        5.96E-01    1.21E-03     6.53E-01    4.50E-03
RBF-LMM (robust)          5.90E-01    2.97E-04     5.88E-01    1.70E-03

For the pH dataset, we evaluate the RBF models only. The data come from a pH neutralization process in a constant-volume stirring tank. The control input is the base solution flow and the output is the pH value of the solution in the tank. We use the first 200 samples for model building and parameter estimation and the next 800 samples for model validation (testing).

A numerical comparison in terms of RMSE values of the two variants of the RBF model for the pH dataset is shown in Table 3 and Figure 3. Two scenarios are tested: the outlier-free scenario and one with 15% of outlier contamination. We observe that, while the RMSE values are similar in the outlier-free scenario, the robust RBF model achieved lower RMSE values than the original RBF model in the outlier-contaminated scenario. We also observe in Figure 3 that the deterioration in performance with the increase in contamination levels is much smaller for the RBF-LMM model. A typical example of the prediction results of the RBF models under the free simulation regime for the pH dataset is shown in Figure 4. As can be seen, both models were able to capture the system dynamics; however, the robust RBF-LMM model was much less influenced by the presence of outliers in the training data.

Fig. 1. RMSE values for validation data as a function of the amount of outliers in training data (Dryer dataset).

Fig. 3. RMSE values for validation data as a function of the amount of outliers in training data (pH dataset).

As a final experiment, we evaluated the performance of the LLM models on a large-scale dataset. To this end, we selected the Silverbox dataset (Schoukens et al., 2003). Obtained from an electrical circuit simulating a mass-spring-damper system, it corresponds to a nonlinear dynamical system with feedback, with a dominant linear behavior. This dataset contains a total of 131,072 samples of each sequence $u_n$ and $y_n$, where the first 40,000 samples of $u_n$ and $y_n$ were used for model building and the remaining 91,072 for model validation. Since the Silverbox dataset is very long, training is not repeated for several epochs. In other words, the model is used in a fully online mode with recursive parameter estimation, in which only one pass through the data is enough for the model to converge.

A numerical comparison in terms of RMSE values of the two variants of the LLM model for the Silverbox dataset is shown in Table 4 and Figure 5. Two scenarios were tested: the outlier-free scenario and one with 15% of outlier contamination. It can be easily observed that the deterioration of performance with the increase in contamination levels is much smaller for the robust LLM-LMM model. A typical example of the prediction results of the LLM models under the free simulation regime for the Silverbox dataset is shown in Figure 6. For better visualization of the results, we show only the first 200 output samples of the validation data. As expected, the improvement in performance of the LLM model due to the replacement of the original LMS rule by the outlier-robust LMM rule is clearly observed.
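The fully online regime mentioned above amounts to a single streaming sweep over the data. The sketch below combines the winner-take-all LLM step of Section 2 with the Huber-weighted (LMM) coefficient update; omitting the SOM neighborhood, and the names used, are simplifications of ours.

```python
import numpy as np

def train_single_pass(u, y, W, A, p, q, alpha, alpha_c, kappa):
    """Fully online estimation as used for Silverbox: a single sweep,
    no epochs. Winner-take-all LLM with the LMM (Huber-weighted)
    coefficient update."""
    for t in range(max(p, q), len(u)):
        x = np.r_[u[t - p:t][::-1], y[t - q:t][::-1]]     # regressor, Eq. (1)
        j = int(np.argmin(np.sum((W - x) ** 2, axis=1)))  # winner, Eq. (3)
        e = y[t] - A[j] @ x                               # prediction error
        qe = 1.0 if abs(e) <= kappa else kappa / abs(e)   # Huber weight, Eq. (12)
        W[j] += alpha * (x - W[j])                        # prototype update, Eq. (5)
        A[j] += alpha_c * qe * e * x / (x @ x + 1e-12)    # robust NLMS, Eqs. (7) and (11)
    return W, A
```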



Fig. 4. Predictive performance of the RBF-LMS (L) and RBF-LMM (R) for 15% of outliers (pH dataset).

Table 4. RMSE results for the LLM models on the Silverbox dataset.

                          0% outliers              15% outliers
Models                    RMSE        STD          RMSE        STD
LLM-LMS (original)        2.05E-05    2.63E-06     3.40E-05    4.05E-06
LLM-LMM (robust)          1.21E-05    1.68E-07     1.93E-05    6.43E-07







Fig. 5. RMSE values for validation data as a function of the amount of outliers in training data (Silverbox dataset).

5. CONCLUSIONS

In this paper, we revisited a class of ANN-based identification models known as local models and addressed the issue of how outliers affect the parameter estimation (i.e. learning) of the approximating model. The evaluated local models use the well-known Widrow-Hoff rule, also called the LMS rule, in the recursive estimation of their parameters. The main goal of the article was, firstly, to evaluate how badly outliers affect the predictive performance of the approximating local models and, secondly, to verify whether the replacement of the original LMS rule by an outlier-robust version, called the LMM rule (Zou et al., 2000), was capable of improving the performance of the local models by providing them with higher resilience to outliers.

The results of a comprehensive comparative study with benchmarking datasets revealed that the simple replacement of the LMS rule by the outlier-robust LMM rule provided a considerable improvement in the performance of the local models evaluated in outlier-contaminated scenarios. This study also showed a considerable improvement for all the robust versions of the local models evaluated, and that the improvement was even more evident as the proportion of outliers in the training set increased.

Currently, we are evaluating the performance of growing robust local models; that is, models in which the prior specification of the number of neurons is not necessary because neurons are allocated over time only when required.

ACKNOWLEDGEMENTS

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. The authors also thank IFCE (Campus of Maranguape) and CNPq (grant no. 309451/2015-9) for supporting this research.

REFERENCES

Barreto, G.A. and Souza, L.G.M. (2016). Novel approaches for parameter estimation of local linear models for dynamical system identification. Applied Intelligence, 44(1), 149–165.
Belz, J., Münker, T., Heinz, T.O., Kampmann, G., and Nelles, O. (2017). Automatic modeling with local model networks for benchmark processes. IFAC-PapersOnLine, 50(1), 470–475.
Chan, S.C. and Zhou, Y. (2010). On the performance analysis of the least mean M-estimate and normalized least mean M-estimate algorithms with Gaussian inputs and additive Gaussian and contaminated Gaussian noises. Journal of Signal Processing Systems, 60(1), 81–103.



Fig. 6. Predictive performance of the LLM-LMS (L) and LLM-LMM (R) for 15% of outliers (Silverbox dataset).

Chen, S., Billings, S.A., Cowan, C.F.N., and Grant, P.M. (1990). Non-linear systems identification using radial basis functions. International Journal of Systems Science, 21(12), 2513–2539.
Costa, T.V., Fileti, A.M.F., Oliveira-Lopes, L.C., and Silva, F.V. (2015). Experimental assessment and design of multiple model predictive control based on local model networks for industrial processes. Evolving Systems, 6(4), 243–253.
Huber, P. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1), 73–101.
Johansen, T.A. and Foss, B.A. (1992). A NARMAX model representation for adaptive control based on local models. Modeling, Identification and Control, 13(1), 25.
Kohonen, T. (2013). Essentials of the self-organizing map. Neural Networks, 37, 52–65.
König, O., Hametner, C., Prochart, G., and Jakubek, S. (2014). Battery emulation for power-HIL using local model networks and robust impedance control. IEEE Transactions on Industrial Electronics, 61(2), 943–955.
Maier, C.C., Schirrer, A., and Kozek, M. (2018). Real-time capable nonlinear pantograph models using local model networks in state-space configuration. Mechatronics, 50, 292–302.
Mattos, C.L.C., Dai, Z., Damianou, A., Barreto, G.A., and Lawrence, N.D. (2017). Deep recurrent Gaussian processes for outlier-robust system identification. Journal of Process Control, 60, 82–94.
Moody, J. and Darken, C.J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2), 281–294.
Moshou, D. and Ramon, H. (1997). Extended self-organizing maps with local linear mappings for function approximation and system identification. In Proceedings of the 1st Workshop on Self-Organizing Maps (WSOM'97), 1–6.
Münker, T., Heinz, T.O., and Nelles, O. (2017). Hierarchical model predictive control for local model networks. In Proceedings of the American Control Conference (ACC'2017), 5026–5031.

Narendra, K. and Parthasarathy, K. (1990). Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), 4–27.
Ribeiro, A.H. and Aguirre, L.A. (2018). Parallel training considered harmful?: Comparing series-parallel and parallel feedforward network training. Neurocomputing, 316, 222–231.
Schoukens, J., Nemeth, J.G., Crama, P., Rolain, Y., and Pintelon, R. (2003). Fast approximate identification of nonlinear systems. Automatica, 39(7), 1267–1274.
Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.Y., Hjalmarsson, H., and Juditsky, A. (1995). Nonlinear black-box modeling in system identification: a unified overview. Automatica, 31(12), 1691–1724.
Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applications to modelling and control. IEEE Transactions on Systems, Man, and Cybernetics, 15(1), 116–132.
Walter, J., Ritter, H., and Schulten, K. (1990). Nonlinear prediction with self-organizing maps. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN'90), 589–594.
Wang, S. and Tanaka, M. (1999). Nonlinear system identification with piecewise-linear functions. IFAC Proceedings Volumes, 32(2), 3796–3801.
Widrow, B. (2005). Thinking about thinking: The discovery of the LMS algorithm. IEEE Signal Processing Magazine, 22(1), 100–106.
Yan, L., Sundararajan, N., and Saratchandran, P. (2000). Nonlinear system identification using Lyapunov based fully tuned dynamic RBF networks. Neural Processing Letters, 12(3), 291–303.
Zou, Y., Chan, S.C., and Ng, T.S. (2000). Least mean M-estimate algorithms for robust adaptive filtering in impulse noise. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(12), 1564–1569.