Maximum Likelihood identification of Wiener-Hammerstein system with process noise


Proceedings, 18th IFAC Symposium on System Identification, July 9-11, 2018. Stockholm, Sweden




IFAC PapersOnLine 51-15 (2018) 401–406

Giuseppe Giordano ∗, Jonas Sjöberg ∗

∗ Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden.

∗ Corresponding author: G. Giordano, Tel. +46765606453, [email protected]

Abstract: The Wiener-Hammerstein model is a block-oriented model consisting of two linear blocks and a static nonlinearity in the middle. We address the identification problem of this model, when a disturbance affects the input of the non-linearity, i.e. process noise. For this case, a Maximum Likelihood estimator is derived, which delivers a consistent estimate of the model parameters. In the presence of process noise, in fact, a standard Prediction Error Method normally leads to biased results. The Maximum Likelihood estimate is then used together with the Best Linear Approximation of the system, in order to implement a complete identification scheme when the parametrization of the linear blocks is not known a priori. The computation of the likelihood function requires numerical integration, which is solved by Monte Carlo and Metropolis-Hastings techniques. Numerical examples show the effectiveness of the identification scheme.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. 2405-8963. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2018.09.178

Keywords: Non-linear systems, Maximum Likelihood, Best Linear Approximation, Wiener-Hammerstein, process noise

1. INTRODUCTION

Block-oriented models represent a structured and useful tool to describe non-linear dynamical relations and, recently, the identification of such model structures has attracted a lot of attention. Block-oriented models are built combining blocks which are either linear dynamic systems or static non-linear functions. See, e.g. Lauwers et al. (2007), for some general aspects on block-oriented models. The Wiener-Hammerstein (W-H) system is a block-oriented model structure where a static non-linearity is squeezed between two linear dynamic systems. In the literature, many approaches deal with the identification of Wiener-Hammerstein systems. The differences among the approaches are mainly related to the use of different noise assumptions. Depending on these assumptions, two main problems need to be addressed:

(1) The choice of the criterion to minimize/maximize, in order to obtain a consistent estimate. The common choice in this case is the Maximum Likelihood (ML) criterion, Ljung (1999), Wald (1949). This, depending on the assumption on the noise framework, can be implemented in different ways.
(2) The choice of the initial guess. In general, the identification criterion can be non-convex and the search for the minimum requires an iterative procedure. Hence, a good initial guess for the model parameters needs to be retrieved in order to avoid local minima.

In the literature, most of the approaches assume the output-noise framework only, see e.g. Wills and Ninness (2012), Westwick and Schoukens (2012), Schoukens et al. (2014), Giordano and Sjöberg (2015). In this case, the ML criterion simplifies into the Prediction Error Method (PEM) (Ljung, 1999). Hence, a consistent estimate can be retrieved, using PEM, given a good initial guess. For finding this, many results are based on the Best Linear Approximation (BLA), i.e. the best linear model, in the least-squares sense, that can be derived from the data, see Schoukens and Tiels (2017) for an exhaustive overview. Under the assumption of Gaussian excitation, the BLA provides a consistent estimate of the concatenation of the dynamics of the two linear parts (Enqvist and Ljung, 2005), (Pintelon and Schoukens, 2001). Thus, once this concatenation is known, the next step of the identification consists of finding the right split of the dynamics between the two linear parts of the W-H model, and identifying the non-linearity in the middle. For example, in Sjöberg et al. (2012), this splitting is done by testing all possible combinations of the dynamics in the two linear parts and, for each combination, the non-linearity is identified via PEM, implemented as a Least Squares problem. Since in the output-noise framework PEM is consistent, the lowest prediction error is achieved at the right split, which is used as initial guess. Another approach based on the BLA is the Fractional Approach in Vanbeylen (2014). In this case, the dynamics from the BLA are placed both in the first and in the second linear part, and their positions are parametrized via fractional exponents. The fractional exponents and the non-linear function are then identified, using again PEM, and the split of the dynamics is found. However, as mentioned before, these approaches assume additive noise on the output of the system only. In this work, instead, we will also consider a disturbance entering



before the non-linearity of the W-H system, i.e. the process noise. In particular, we will address the problem of finding the right split of the dynamics from the BLA, when both process and output noise affect the system. In Giordano and Sjöberg (2016), the authors showed that, in this noise framework, the BLA is still a consistent estimate of the dynamics of the two linear parts, but the PEM estimate of all model parameters is not consistent.

In this paper, we will briefly recall the consistency results of W-H estimation in case of process noise. Then, a Maximum Likelihood estimator is derived, and the splitting algorithm in Sjöberg et al. (2012) is modified in order to obtain the right split of the dynamics. Furthermore, we will show that the computation of the likelihood function requires the solution of high-dimensional integrals. Thus, an efficient numerical integration technique, based on Monte Carlo (MC) and Metropolis-Hastings (MH) sampling (Metropolis et al., 1953), (Hastings, 1970), is proposed.

In the literature, Haryanto and Hong (2013) consider process noise for W-H systems, but with the assumption that the correct split of the linear parts is known. Moreover, the multidimensional integrals are solved via Monte Carlo integration with one global sampling, which may require an enormous number of sampling points. With the Metropolis-Hastings technique, instead, the number of points is considerably reduced, see Section 4. Zhang and Schoukens (2017), instead, address the issue of assessing the presence of process noise in a W-H system from the generated data. Once this presence is assessed, identification schemes like the one presented in this paper can be implemented.

The paper is organized as follows. Section 2 introduces the identification problem and recalls the consistency results of the BLA.
In Section 3, first the ML estimator is derived, then the computational aspects and the identification algorithm based on the ML estimate are presented and discussed. The ML estimate is then combined with the BLA splitting algorithm in Section 3.4, and the combined approach is tested both in simulation and on benchmark data in Section 4. Finally, Section 5 concludes the paper.

2. PRELIMINARIES

In this section, we introduce the W-H system and its identification problem. We recall the results concerning the consistency of the BLA. Based on the BLA, we introduce a splitting algorithm which will be modified in order to provide a consistent estimate in case of process noise.

2.1 The Wiener-Hammerstein system with process noise

The W-H system with process noise is depicted in Figure 1, and described by the following equations in discrete time:

x0(t) = GW(q, θW)u(t)   (1a)
x(t) = x0(t) + w(t)   (1b)
y(t) = GH(q, θH)f(x(t), η) + e(t).   (1c)

Fig. 1. Wiener-Hammerstein system with process and output noise. The input of the non-linearity is affected by the disturbance w(t).

The input signal, u(t), is filtered by GW(q, θW). The output x0(t) of the filter is corrupted by additive process noise w(t). The corrupted signal x(t) is the input of the static non-linear function f(x, η), whose output is filtered by a second linear dynamical system GH(q, θH). Finally, the output of the filter is corrupted by additive measurement noise e(t). We will assume that the intermediate signal x(t) is unknown and the two noises, w(t), e(t), are independent.

The two linear filters GW and GH are parametrized as rational transfer functions with parameters θW and θH. The non-linear function f is a basis function expansion, linearly parametrized in η. The identification procedure for W-H models aims at estimating the parameter vector θ = [θW, θH, η], given a set of N data points, Z = {u(t), y(t)}, t = 1, ..., N, i.e. measured input and output of the system.

As mentioned in the introduction, we will present an identification algorithm based on the splitting of the dynamics contained in the BLA. In the next section, we will recall the consistency result of the BLA identification in presence of process noise, and we will discuss a possible splitting algorithm.

2.2 BLA consistency and a splitting algorithm

The Best Linear Approximation of a time-invariant nonlinear system to a given stationary input u(t), t = 1, ..., N, is defined as the best linear system approximating the system's output in the least-squares sense (Enqvist and Ljung, 2005),

GBLA(q) = arg min_{G ∈ G} (1/N) Σ_{t=1}^{N} (y(t) − G(q, γ)u(t))².   (2)

We recall the consistency result of the BLA for W-H systems in presence of process noise.

Theorem 1. We assume that the following holds:

(1) The input u(t) and the process noise w(t) are independent, Gaussian, stationary processes;
(2) The measurement noise e(t) is a stationary stochastic process, independent of u and w;
(3) G(q, γ) is an arbitrary transfer function parametrization with freely adjustable gain, such that G(q, γ0) = G⁰W(q)G⁰H(q), for some parameter value γ0, and G⁰W(q), G⁰H(q) being the true linear dynamical parts;
(4) Parameter γ is estimated from u and y using an output error method,

γ̂N = arg min_γ (1/N) Σ_{t=1}^{N} (y(t) − G(q, γ)u(t))².   (3)

Then

G(q, γ̂N) → k G⁰W(q)G⁰H(q)  w.p. 1 as N → ∞.   (4)
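As a numerical illustration of Theorem 1, the sketch below simulates a W-H system of the form (1a)-(1c) with process noise and fits a Best Linear Approximation over an FIR model class by least squares. Up to a scalar gain k, the estimated impulse response approaches that of the concatenation G⁰W(q)G⁰H(q). The specific first-order filters, the cubic non-linearity, and the FIR model class are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20000

def filt(b, a, x):
    """Direct-form IIR filter: a[0]*y[t] = sum b[k]x[t-k] - sum a[k]y[t-k]."""
    y = np.zeros_like(x)
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t - k >= 0)
        acc -= sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y[t] = acc / a[0]
    return y

# hypothetical linear blocks and non-linearity (illustrative, not from the paper)
bW, aW = [1.0], [1.0, -0.5]          # G_W(q) = 1 / (1 - 0.5 q^-1)
bH, aH = [1.0, 0.3], [1.0, -0.2]     # G_H(q) = (1 + 0.3 q^-1) / (1 - 0.2 q^-1)
f = lambda x: x + 0.5 * x**3         # static non-linearity

u = rng.standard_normal(N)           # Gaussian excitation (assumption 1 of Thm 1)
w = 0.1 * rng.standard_normal(N)     # process noise
e = 0.1 * rng.standard_normal(N)     # output noise

x0 = filt(bW, aW, u)                 # (1a)
x = x0 + w                           # (1b)
y = filt(bH, aH, f(x)) + e           # (1c)

# BLA over an FIR class: least-squares fit of y on u(t), ..., u(t-n+1), cf. (2)
n = 30
Phi = np.column_stack([np.roll(u, k) for k in range(n)])
Phi[:n, :] = 0.0                     # discard samples corrupted by wraparound
g_bla = np.linalg.lstsq(Phi, y, rcond=None)[0]

# impulse response of the true concatenation G_W * G_H, truncated to n taps
delta = np.zeros(n); delta[0] = 1.0
g_true = filt(bH, aH, filt(bW, aW, delta))

# best scalar gain k matching g_bla to g_true, as predicted by Theorem 1
k = (g_true @ g_bla) / (g_true @ g_true)
print(k, np.max(np.abs(g_bla - k * g_true)))
```

For this odd non-linearity and Gaussian input, the gain k is close to the Bussgang gain E[f'(x)], and the residual mismatch between g_bla and k·g_true shrinks as N grows, consistent with the w.p. 1 convergence in (4).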

The proof can be found in Giordano and Sjöberg (2016). Hence, the BLA is a consistent estimate of the true dynamics, and the parameter γ̂N describes the concatenation of θW, θH, i.e. γ̂N = [θW, θH]. In the following, we will investigate a modification of the exhaustive search algorithm presented in Sjöberg et al.



(2012), to solve the splitting problem of the BLA, in presence of process noise. The algorithm performs an exhaustive search over all the possible combinations of the dynamics of the BLA. For each combination, the estimation of the non-linearity is performed, by improving a certain criterion VN. The estimation providing the best value of the criterion is selected as the split of the dynamics.

In order to find the right split, the criterion VN should be the one providing a consistent estimate of the parameters η of the non-linearity. In the next section, we will use the Maximum Likelihood (ML) criterion to retrieve a consistent estimate. We will therefore assume that the regularity conditions for consistency of the ML estimate hold, see Wald (1949). We will further assume that, for each ML estimation, the global maximum is retrieved. In practice, this is not always the case, since the ML criterion can have multiple local optima, and some globalization technique needs to be implemented. We will come back and discuss this point in Sections 3.3 and 3.4. We will finally show that, when process noise is not present, the ML criterion boils down to the PEM criterion.

3. MAXIMUM LIKELIHOOD ESTIMATION IN THE PRESENCE OF PROCESS NOISE

In this section, we first derive the likelihood function for the W-H system. Then, the inconsistency of the standard PEM approach is presented. Finally, a splitting algorithm based on the Maximum Likelihood estimator is derived.

3.1 The likelihood function

The likelihood function is the probability density function (PDF) of the model output $\hat{y}(t)$ for given parameters $\theta$ and a given set of data $Z = \{u(t), y(t)\}_{t=1}^N$. In our case, the probability of the observed data depends on both process and output noise. Hence, similarly to the derivation in Hagenblad et al. (2008) for Wiener models, we consider two conditional probabilities, in order to separate the two stochastic contributions given by the output and process noise. The first contribution is the probability of the output $y(t)$ given the disturbed and unknown signal $x(t)$, see Figure 1. For this contribution, in fact, the only stochasticity is introduced by the output noise. Thus, we can write
$$p_{y|x}(\theta; Z) = p_e\big(y(t) - G_H(q, \theta_H) f(x(t), \eta)\big), \quad (5)$$
where $p_e$ is the PDF of the output noise $e(t)$. The second conditional probability is the one describing the probability of $x(t)$ given the input $u(t)$. This is determined by the process noise only, hence
$$p_{x|u}(\theta; Z) = p_w\big(x(t) - G_W(q, \theta_W) u(t)\big), \quad (6)$$
where $p_w$ is the PDF of the process noise $w(t)$. In this case, since the input $u(t)$ is known and deterministic, (6) is a total probability,
$$p_{x|u}(\theta; Z) = p_x(\theta; Z). \quad (7)$$
We can derive the likelihood function, i.e. the total probability of $y$, by integrating the disturbed and, hence, unknown signal $x(t)$ out from the joint PDF $p_{y,x}(\theta; Z)$,
$$p_y(\theta; Z) = \int_x p_{y,x}(\theta; Z)\, dx = \int_x p_{y|x}(\theta; Z)\, p_x(\theta; Z)\, dx = \int_x p_e\, p_w\, dx. \quad (8)$$

Assuming $e(t)$, $w(t)$ are IID white Gaussian random variables with variances $\sigma_e^2$, $\sigma_w^2$, for each time instant, we have
$$p_y(\theta; Z) = \int_{x(t) \in \mathbb{R}^N} \prod_{t=1}^{N} \frac{1}{2\pi\sqrt{\sigma_e^2 \sigma_w^2}}\, e^{-\frac{1}{2} E(x(t), \theta)}\, dx^N, \quad (9)$$
where
$$E(x(t), \theta) = \frac{1}{\sigma_e^2}\big(y(t) - G_H(q, \theta_H) f(x(t), \eta)\big)^2 + \frac{1}{\sigma_w^2}\big(x(t) - G_W(q, \theta_W) u(t)\big)^2. \quad (10)$$
Hence, the maximization of (9) provides a Maximum Likelihood estimate of $\theta$. Furthermore, the formulation (9) allows the estimation of the noise variances $\sigma_w^2$, $\sigma_e^2$ too. Hence, we have
$$\hat{\theta}, \hat{\sigma}_e^2, \hat{\sigma}_w^2 = \arg\max_{\theta, \sigma_e^2, \sigma_w^2} p_y(\theta; Z). \quad (11)$$
Given the intractability of the likelihood function (9), its integration and maximization must be carried out numerically. This is discussed in Section 3.3.

3.2 The case of no process noise

When there is no process noise, (9) and (10) are simplified. In fact, if $\sigma_w^2 = 0$, the integral in (9) becomes
$$\int_{x(t) \in \mathbb{R}^N} e^{-\frac{1}{2} E(x(t), \theta)}\, \delta\big(x(t) - x_0(t, \theta)\big)\, dx^N, \quad (12)$$
where $\delta(x(t) - x_0(t, \theta))$ is a Dirac delta function centred at $x_0(t, \theta)$. Hence, the term in the exponent simplifies into
$$E(x_0(t, \theta), \theta) = \frac{1}{\sigma_e^2}\big(y(t) - G_H(q, \theta_H) f(x_0(t), \eta)\big)^2, \quad (13)$$
with $x_0(t, \theta) = G_W(q, \theta_W) u(t)$. Thus, (9) becomes
$$p_y(\theta; Z) = \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi\sigma_e^2}}\, e^{-\frac{1}{2\sigma_e^2}\left(y(t) - G_H(q, \theta_H) f(x_0(t), \eta)\right)^2}. \quad (14)$$
Maximizing this is equivalent to minimizing the criterion
$$V_N(\theta) = \frac{1}{N} \sum_{t=1}^{N} \big(y(t) - G_H(q, \theta_H) f(x_0(t, \theta), \eta)\big)^2, \quad (15)$$
which is the Prediction Error criterion. Hence, in this case ($\sigma_w^2 = 0$), the PEM is equivalent to ML and, thus, consistent (Wald, 1949). Conversely, when $\sigma_w \neq 0$, the PEM is not the ML and its use normally leads to biased results. For a formal proof of PEM inconsistency for W-H systems in the presence of process noise, see Theorem 3 in Giordano and Sjöberg (2016).

3.3 Implementation of the ML estimate

In this section, we discuss the numerical implementation of computing (11). This can be formulated as minimization of the negative log-likelihood,
$$\hat{\theta}, \hat{\sigma}_e^2, \hat{\sigma}_w^2 = \arg\min_{\theta, \sigma_e^2, \sigma_w^2} L(\theta, \sigma_e^2, \sigma_w^2), \quad (16)$$
where


$$L(\theta, \sigma_e^2, \sigma_w^2) = -\log p_y(\theta, \sigma_e^2, \sigma_w^2; Z) = N \log(2\pi) + \frac{N}{2} \log(\sigma_e^2 \sigma_w^2) - \sum_{t=1}^{N} \log \int_{x(t) \in \mathbb{R}^N} e^{-\frac{1}{2} E(x(t), \theta)}\, dx^N, \quad (17)$$
with $E(x(t), \theta)$ given by (10). The criterion $L(\theta, \sigma_e^2, \sigma_w^2)$ contains analytically intractable multi-dimensional integrals. Hence, the integral is approximated numerically,
$$\int_{x(t) \in \mathbb{R}^N} e^{-\frac{1}{2} E(x(t), \theta)}\, dx^N \approx I(\theta, t). \quad (18)$$
Since $x(t)$ is unknown, sampling techniques are required to compute $I(\theta, t)$. In particular, Importance Sampling (IS) (Doucet et al., 2001) is used,
$$I(\theta, t) = \frac{1}{M} \sum_{k=1}^{M} \frac{e^{-\frac{1}{2} E(X_k^t, \theta)}}{q(X_k^t)}, \quad X_k^t \sim q(x(t)), \quad (19)$$
where $M$ indicates the number of samples, and $X_k^t$ is one sample from the proposal distribution $q(x(t))$. Ideally, one would like to sample directly from $p_w(x(t) - x_0(t, \theta))$, i.e. the distribution of the process noise, but this may be difficult to sample from. Hence, the Metropolis-Hastings (MH) technique can be used to approximate the distribution,
$$X_k^t \sim q(x(t)) \approx p_w\big(x(t) - x_0(t, \theta)\big). \quad (20)$$
To simplify the notation, in the following we will use $p_w(\theta)$ in place of $p_w(x(t) - x_0(t, \theta))$. The output linear filter $G_H$, which in general has an infinite impulse response, correlates in time, at each instant $t$, the whole sequence of $f(x(t))$, from 0 to $t$. Thus, for each $t$, a multi-dimensional sample $X_k^t \in \mathbb{R}^t$ needs to be generated in order to properly evaluate the integral. In Neal and Roberts (2006), a modification of the Metropolis-Hastings technique, where each dimension of the variable is sampled separately, is proven to be optimal for independent and identically distributed variables, which is the case for $p_w(\theta)$. This is the component-wise Metropolis-Hastings (CWMH) technique, which is used in our approach. Details on CWMH can be found in Appendix A.

Once the numerical integration is performed, $L(\theta, \sigma_e^2, \sigma_w^2)$ is a deterministic function and Newton's method can be used to minimize over the model parameters and the variances. Since Metropolis-Hastings is a local, $\theta$-dependent sampling technique, at each Newton iteration the parameters are updated and therefore $L(\theta, \sigma_e^2, \sigma_w^2)$ needs to be re-sampled. However, if new samples and weights $\{X_k^t, q_k^t\}_{k=1}^M$ were generated at each iterate, the algorithm would be highly unstable, since $L(\theta, \sigma_e^2, \sigma_w^2)$ would not be deterministic, see Durbin and Koopman (1997). Hence, new samples and weights are generated only upon convergence of Newton's method, which delivers a new guess $\theta_{j+1}$. The procedure is then repeated until convergence of the series $\theta_j$. This is illustrated in Algorithm 1.

Algorithm 1 ML estimate of W-H model parameters
1: Set initial guess $\theta_j = [\theta_W^j, \theta_H^j, \eta^j, \sigma_e^{2,j}, \sigma_w^{2,j}]$, for $j = 0$
2: for $j = 0, 1, \dots$ do
3:   Generate samples and weights $\{X_k^t, q_k^t\}_{k=1}^M = \mathrm{MetropolisHastings}(\theta_j, E(x, \theta_j))$
4:   Build the integral approximation with IS, $I(\theta_j, t) = \frac{1}{M} \sum_{k=1}^{M} \frac{e^{-\frac{1}{2} E(X_k^t, \theta_j)}}{q_k^t}$
5:   Build the log-likelihood function $L(\theta_j) = \frac{N}{2} \log(\sigma_{e,j}^2 \sigma_{w,j}^2) - \sum_{t=1}^{N} \log I(\theta_j, t)$
6:   Run Newton's method to solve $\theta_{j+1} = \arg\min_{\theta_j} L(\theta_j)$
7: end for

A description of the routine MetropolisHastings can be found in Appendix A. Under the hypothesis of infinite sample size $M$, the series $\theta_j$ converges to a true local maximum of the likelihood function, see Doucet et al. (2001). In practice, the same result is achieved with $M$ sufficiently large. In order to find such an $M$, one can monitor the error introduced in the integral approximation and choose the sample size that achieves the desired accuracy. More importantly, consistency is actually
achieved at the global maximum of the likelihood. Hence, some globalization technique has to be implemented. However, this is outside the scope of this paper, since here we address the main issue of finding the right split of the BLA, given a consistent estimate of the W-H non-linearity, as explained in the next section.

3.4 Exhaustive search with ML estimate

The consistent ML estimate derived in the previous section is here combined with the Exhaustive Search algorithm of Sjöberg et al. (2012). For each combination of poles and zeros from the BLA, an ML estimate of the non-linearity is found. For this, Algorithm 1 is modified to estimate only the variances and the parameters $\eta$ of the non-linearity, since $\theta_W$, $\theta_H$ are fixed for each combination. As discussed before, we assume that, for each ML estimation, the global maximum is achieved. The steps of the algorithm are the following.

(1) Use the data set $Z$ to estimate the BLA of the system, $G_{BLA}(q)$, see Equation (2).
(2) Split $G_{BLA}(q)$ in all possible ways into $\hat{G}_W(q, \hat{\theta}_W)$, $\hat{G}_H(q, \hat{\theta}_H)$.
(3) For each partition, fix $\hat{\theta}_W$, $\hat{\theta}_H$ in the ML criterion $L(\theta, \sigma_e^2, \sigma_w^2)$ (17) and use Algorithm 1 to find the ML estimate of $\theta = [\eta, \sigma_e^2, \sigma_w^2]$.
(4) Order the partitions with respect to the ML criterion, and select the best one.

The selected partition will correspond to the right split of the dynamics, and it can be used as initial guess for a final optimization over all model parameters, again using ML. If, at step (3), a PEM estimator were used instead, even under the assumption that the global optimum of the criterion was found, there would be no guarantee that the best split is also the right one, given the inconsistency of the PEM estimate. This is also illustrated via simulations in Section 4.

4. SIMULATION AND EXPERIMENTAL RESULTS

In this section, first the Exhaustive Search combined with the ML estimate is tested on a simulation example. Since simulated data are used, it is possible to compare

Table 1. BLA estimation

Poles | True | Estimated (µ ± σ)
a1    | 0.4  | 0.3988 ± 0.0100
a2    | 0.8  | 0.7998 ± 0.0401

Table 2. BLA Splitting - Value of the criterion

                   | ML     | PEM
True split (RMSE)  | 1.6934 | 8.11246
Wrong split (RMSE) | 4.3872 | 8.07622
the estimated parameters with the true ones. Then, the splitting algorithm is tested on the experimental data provided by the benchmark, Schoukens and Noel (2016).

4.1 Simulation example

Noisy data are generated using the following W-H system,
$$x_0(t) = \frac{1}{1 - a_1 q^{-1}}\, u(t), \quad (21a)$$
$$z(t) = f\big(x_0(t) + w(t), \eta\big), \quad (21b)$$
$$y(t) = \frac{1}{1 - a_2 q^{-1}}\, z(t) + e(t), \quad (21c)$$
with $f(x(t))$ being a third degree polynomial, $f(x(t)) = c_0 + c_1 x(t) + c_2 x^2(t) + c_3 x^3(t)$. The total number of poles and zeros, and the degree of the polynomial, are assumed to be known. The system is excited with Gaussian noise with standard deviation 1. The process and output noise are white and Gaussian with standard deviations $\sigma_w = 4$ and $\sigma_e = 1$, respectively. The signals $u$, $w$, and $e$ are mutually independent.

First the BLA is estimated. 1000 data points are used to fit a second order OE model. In Table 1, the comparison between true and estimated parameters is shown. As expected, the BLA is unbiased. Mean (µ) and standard deviation (σ) of the estimates are computed over 1000 Monte-Carlo runs. Since only two poles are present, the identified BLA can be split in two possible ways. For each partition, the non-linearity is estimated, according to the procedure in Section 3.4. Again, 1000 data points have been used for estimation, and a window of M = 100 samples for the integration. For comparison purposes, the non-linearity has been estimated with PEM too. In Table 2, the final values of the ML and PEM criteria for each split are reported. As expected, the ML estimate finds the right split, while the PEM, inconsistent in the process noise case, attains a lower value of the criterion for the wrong split. Finally, for the true split, the estimated parameters with PEM and ML are reported in Figure 2. It can be seen that the PEM estimate is biased, while the ML one is not.

Fig. 2. PEM vs ML. Non-linear parameters. Histograms over 1000 Monte-Carlo simulations. The ML estimates are unbiased and they also show quite narrow variance, compared to those of the PEM estimates.

From a computational perspective, the integration method in Haryanto and Hong (2013) uses a uniform grid of M = 30000 points. Hence, a significant reduction of the number of samples needed for integration is achieved here (M = 100). The reason is that MH progressively constructs an approximation of the likelihood function through a local (around the current value of $\theta$) sampling procedure. With a uniform grid, instead, a global picture of the likelihood needs to be built, requiring an enormous sample size. On the other hand, a drawback of this local sampling technique is that re-sampling is needed for each new guess of $\theta$, see step 3 in Algorithm 1. This operation is, in general, costly. However, it has been observed that, using the PEM result as initial guess for the ML estimate, the total number of iterations stays low (4-5 in this example). Nevertheless, there is no general evidence that initializing the ML estimate with the (inconsistent) PEM estimate always leads to good results.
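To make the simulation setup concrete, the data generation of system (21) can be sketched in a few lines of Python. The polynomial coefficients $c_i$ below are illustrative placeholders, since the paper does not report their values; the pole values and noise standard deviations follow the text above.

```python
import numpy as np

def ar1_filter(a, v):
    """First-order filter 1/(1 - a q^-1): s(t) = a*s(t-1) + v(t)."""
    s = np.zeros_like(v)
    for t in range(len(v)):
        s[t] = a * s[t - 1] + v[t] if t > 0 else v[t]
    return s

rng = np.random.default_rng(0)
N = 1000
a1, a2 = 0.4, 0.8              # true poles (cf. Table 1)
c = [0.0, 1.0, 0.5, 0.1]       # polynomial coefficients (illustrative; not given in the paper)

u = rng.normal(0.0, 1.0, N)    # input, std 1
w = rng.normal(0.0, 4.0, N)    # process noise, std 4
e = rng.normal(0.0, 1.0, N)    # output noise, std 1

x0 = ar1_filter(a1, u)                         # (21a)
x = x0 + w                                     # disturbed intermediate signal
z = c[0] + c[1]*x + c[2]*x**2 + c[3]*x**3      # (21b)
y = ar1_filter(a2, z) + e                      # (21c)
```

Note that the estimator only observes $u$ and $y$; the intermediate signals $x_0$, $x$, $z$ are kept here only to mirror the block structure of (21).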

4.2 Benchmark

The identification procedure based on BLA splitting and the ML estimate has been tested on real data provided by the Non-linear Systems Identification Benchmark, Schoukens and Noel (2016). The first linear filter $G_W$ is a third order low pass filter, while $G_H$ is an inverse Chebyshev filter. The static non-linearity is realized with a diode-resistor network. The BLA was estimated by fitting a 6-th order OE model on 10000 input/output data points. The input used for estimation was the multisine provided by the benchmark. The BLA achieved an RMSE of 35 mV on the validation data (no process noise). The linear model was then split into two sub-models in all possible ways, excluding the combinations resulting in improper transfer functions. For each split, a 5-th order polynomial function was estimated as static non-linearity via Algorithm 1. In this case, 3000 data points and an integration window of M = 300 points were used. The RMSE achieved by the best split, i.e. the one with the lowest value of the ML criterion, was 16.2 mV. This result can be compared with the other methods presented at the invited session on non-linear system identification applied to benchmark datasets, Schoukens and Noel (2016), of the SYSID 2018 conference.

5. CONCLUSIONS

A Maximum Likelihood estimate for the W-H system in the presence of process noise has been derived. Efficient numerical integration techniques have been implemented to solve the analytically intractable multi-dimensional integrals contained in the likelihood function. In particular, the high-dimensional problem has been tackled via the component-wise Metropolis-Hastings technique. Given the


samples generated by the Metropolis-Hastings technique, the likelihood becomes a deterministic function of the parameters and can be locally maximized via Newton's method. Finally, the Maximum Likelihood estimate has been combined with the Exhaustive Search approach in Sjöberg et al. (2012), in order to derive a complete identification algorithm for W-H systems based on the BLA. Future work will investigate the use of the Maximum Likelihood estimate in combination with other approaches based on BLA splitting, e.g. the Fractional Approach. In fact, the exhaustive search has the drawback of being limited to low order models (up to approximately 10), given the combinatorial growth of the complexity. Moreover, the minimization of the likelihood function requires multiple re-sampling steps, which is in general quite expensive in terms of computational time. Further research will investigate the possibility of reducing the number of re-sampling steps needed.

REFERENCES

Doucet, A., Smith, A., de Freitas, N., and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Information Science and Statistics. Springer, New York.
Durbin, J. and Koopman, S.J. (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika, 84(3), 669–684.
Enqvist, M. and Ljung, L. (2005). Linear approximations of nonlinear FIR systems for separable input processes. Automatica, 41(3), 459–473.
Giordano, G. and Sjöberg, J. (2015). A Time-Domain Fractional Approach for Wiener-Hammerstein Systems Identification. IFAC-PapersOnLine, 48(28), 1232–1237.
Giordano, G. and Sjöberg, J. (2016). Consistency Aspects of Wiener-Hammerstein Model Identification in Presence of Process Noise. Conference on Decision and Control, Las Vegas, USA, 3042–3047.
Hagenblad, A., Ljung, L., and Wills, A. (2008). Maximum likelihood identification of Wiener models. Automatica, 44(11), 2697–2705.
Haryanto, A. and Hong, K.S. (2013). Maximum likelihood identification of Wiener-Hammerstein models. Mechanical Systems and Signal Processing, 41(1-2), 54–70.
Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97–109.
Lauwers, L., Schoukens, J., Pintelon, R., Moer, W.V., and Gomme, L. (2007). Some Practical Applications of a Nonlinear Block Structure Identification Procedure. 2007 IEEE Instrumentation & Measurement Technology Conference (IMTC 2007), 1–6.
Ljung, L. (1999). System Identification (2nd Ed.): Theory for the User. Prentice Hall PTR, Upper Saddle River, NJ, USA.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics, 21(6), 1087–1092.
Neal, P. and Roberts, G. (2006). Optimal scaling for partially updating MCMC algorithms. Annals of Applied Probability, 16(2), 475–515.
Pintelon, R. and Schoukens, J. (2001). System Identification: A Frequency Domain Approach. IEEE Press, New York.

Schoukens, M. and Noel, J.P. (2016). Wiener-Hammerstein system with process noise. Nonlinear System Identification Benchmarks Workshop, Brussels, Belgium.
Schoukens, M., Pintelon, R., and Rolain, Y. (2014). Identification of Wiener-Hammerstein systems by a nonparametric separation of the best linear approximation. Automatica, 50(2), 628–634.
Schoukens, M. and Tiels, K. (2017). Identification of block-oriented nonlinear systems starting from linear approximations: A survey. Automatica, 85, 272–292.
Sjöberg, J., Lauwers, L., and Schoukens, J. (2012). Identification of Wiener-Hammerstein models: Two algorithms based on the best split of a linear model applied to the SYSID'09 benchmark problem. Control Engineering Practice, 20(11), 1119–1125.
Vanbeylen, L. (2014). A fractional approach to identify Wiener-Hammerstein systems. Automatica, 50(3), 903–909.
Wald, A. (1949). Note on the Consistency of the Maximum Likelihood Estimate. The Annals of Mathematical Statistics, 20(4), 595–601.
Westwick, D.T. and Schoukens, J. (2012). Initial estimates of the linear subsystems of Wiener-Hammerstein models. Automatica, 48(11), 2931–2936.
Wills, A. and Ninness, B. (2012). Generalised Hammerstein-Wiener system estimation and a benchmark application. Control Engineering Practice, 20(11), 1097–1108.
Zhang, E., Schoukens, M., and Schoukens, J. (2017). Structure Detection of Wiener-Hammerstein Systems With Process Noise. IEEE Transactions on Instrumentation and Measurement, 66(3), 569–576.

Appendix A. COMPONENT-WISE METROPOLIS-HASTINGS

Metropolis-Hastings makes use of a Markov chain to construct a progressive picture of a target distribution, see Hastings (1970). In its component-wise formulation, Neal and Roberts (2006), the Markov chain simulates one dimension $i$ of a high-dimensional random variable, $X_k^t = \{\dots, X_k^{ti}, \dots\} \in \mathbb{R}^t$, at a time. The $i$-th dimension of the $k$-th sample is generated as a random walk from the $i$-th dimension of sample $k-1$, $X_k^{ti} \sim g(X_k^{ti} \mid X_{k-1}^{ti})$.
Hence, the procedure needs an initial sample $X_1^t$. The first generated samples are highly dependent on $X_1^t$, but they are usually removed as burn-in samples. The Markov chain then accepts or rejects the generated sample component, according to the following rule
$$X_k^{ti} = \begin{cases} \tilde{X}_k^{ti} & \text{if } u \sim U(0,1) \le P \\ X_{k-1}^{ti} & \text{otherwise} \end{cases}$$
where $P$ is the acceptance probability computed as $\min(1, a_1)$, with
$$a_1 = \frac{e^{-\frac{1}{2} E(\tilde{X}_k^{ti}, \theta_j)}}{e^{-\frac{1}{2} E(X_{k-1}^{ti}, \theta_j)}}.$$
The procedure also returns the weight associated to $X_k^{ti}$,
$$q_k^{ti} = e^{-\frac{1}{2} E(X_k^{ti}, \theta_j)} \Big/ \frac{1}{M} \sum_{k} e^{-\frac{1}{2} E(X_k^{ti}, \theta_j)}.$$
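The component-wise scheme above can be sketched for a generic target proportional to $e^{-\frac{1}{2}E(x)}$. This is a minimal illustration, not the paper's implementation: the function name, the Gaussian random-walk proposal, and the step size are assumptions, and the returned weights are the (unnormalized) target densities of the retained samples, in the spirit of the $q_k^{ti}$ above.

```python
import numpy as np

def cw_metropolis_hastings(E, x_init, n_samples, step=0.5, rng=None):
    """Component-wise random-walk Metropolis-Hastings (sketch).

    E:       function returning the exponent E(x), so that the target
             density is proportional to exp(-0.5 * E(x))
    x_init:  initial sample X_1 (1-D array); early samples should be
             discarded as burn-in
    """
    rng = rng or np.random.default_rng()
    x = np.array(x_init, dtype=float)
    samples = []
    for _ in range(n_samples):
        for i in range(x.size):              # update one component at a time
            prop = x.copy()
            prop[i] += step * rng.normal()   # random-walk proposal on dim i
            # acceptance probability min(1, a1), with
            # a1 = exp(-0.5*E(prop)) / exp(-0.5*E(x))
            a1 = np.exp(-0.5 * (E(prop) - E(x)))
            if rng.uniform() <= min(1.0, a1):
                x = prop
        samples.append(x.copy())
    samples = np.array(samples)
    # weights: target density of each retained sample, scaled by its mean
    dens = np.exp(-0.5 * np.array([E(s) for s in samples]))
    q = dens / dens.mean()
    return samples, q
```

For instance, with $E(x) = x^\top x$ the routine targets a standard Gaussian, and the retained samples (after burn-in) should have zero mean and unit variance.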