Proceedings, 18th IFAC Symposium on System Identification
July 9-11, 2018. Stockholm, Sweden

Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 51-15 (2018) 796–801
Using Decoupling Methods to Reduce Polynomial NARX Models

David T. Westwick ∗, Gabriel Hollander ∗∗, Kiana Karami ∗, Johan Schoukens ∗∗

∗ Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive NW, Calgary Alberta, T2N 1N4 Canada (e-mail: [email protected], [email protected])
∗∗ Department of Fundamental Electricity and Instrumentation (ELEC), Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium (e-mail: [email protected], [email protected])
Abstract: The polynomial NARX model, where the output is a polynomial function of past inputs and outputs, is a commonly used equation error model for nonlinear systems. While it is linear in the variables, which simplifies its identification, it suffers from two major drawbacks: the number of parameters grows combinatorially with the degree of the nonlinearity, and it is a black box model, which makes it difficult to draw any insights from the identified model. Polynomial decoupling techniques are used to replace the multiple-input single-output polynomial with a decoupled polynomial structure comprising a transformation matrix followed by a bank of SISO polynomials, whose outputs are then summed. This approach is demonstrated on two benchmark systems: the Bouc-Wen friction model and the data from the Silverbox model. In both cases, the decoupling results in a substantial reduction in the number of parameters, and allows some insight into the nature of the nonlinearities in the system.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: Nonlinear System Identification, NARX Models, Polynomial Decoupling

1. INTRODUCTION

System identification refers to the process of building mathematical models of dynamic systems using measurements of their inputs and outputs, while making as few a-priori assumptions about the system as possible (Ljung, 1999). The identification of linear systems has been extensively studied, resulting in a mature technology that is supported by accessible textbooks (Ljung, 1999; Pintelon and Schoukens, 2001; Schoukens et al., 2012). However, most physical systems behave nonlinearly. The identification of nonlinear systems is an active research area, as can be seen in recent survey papers (Kerschen et al., 2006; Noël and Kerschen, 2017; Schoukens et al., 2016) and special issues (Kerschen and Braun, 2017).

The Nonlinear AutoRegressive eXogenous Input (NARX) model developed by Billings (2013) is widely used in modelling and system identification applications. The output of a NARX model is given by

y(t) = F(u(t), u(t − 1), . . . , u(t − n_u), y(t − 1), . . . , y(t − n_y)) + e(t)   (1)

where t is the discrete time index, u(t) and y(t) are the input and output, respectively, and e(t) is an equation error that is assumed to be a sequence of independent identically distributed (IID) random variables. The nonlinear function F(·) can take many forms. In the case of a Polynomial NARX (P-NARX) model, F(·) is a polynomial function of its arguments (Peyton-Jones and Billings, 1989):

F(u(t), . . . , u(t − n_u), y(t − 1), . . . , y(t − n_y)) = Σ_{m=1}^{M} Σ_{p=0}^{m} Σ_{k_1,...,k_p=1}^{n_y} Σ_{k_{p+1},...,k_m=1}^{n_u} c_{p,q}(k_1, k_2, . . . , k_m) Π_{i=1}^{p} y(t − k_i) Π_{i=p+1}^{m} u(t − k_i)   (2)

where M is the degree of the polynomial, and the polynomial coefficient, c_{p,q}(k_1, . . . , k_m), multiplies the product of p delayed outputs, y(t − k_1)y(t − k_2) . . . y(t − k_p), and q = m − p delayed inputs, u(t − k_{p+1}) . . . u(t − k_m).

One of the major advantages of the P-NARX model is that its output is a linear function of the parameters. As such, the c_{p,q}(k_1, . . . , k_m) may be estimated using a linear regression. However, due to its structure, the number of parameters increases combinatorially with the number of past input and output values, n_u and n_y, respectively, and the maximum polynomial degree, M. As a result, the linear regression can become very large and ill-conditioned. The standard approach is to use an orthogonal forward

⋆ Supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through grant RGPIN/06464-2015 and the Fund for Scientific Research (FWO-Vlaanderen), by the ERC Advanced Grant SNLSID under contract 320378 and by FWO project G.0280.15N.
Copyright © 2018 IFAC
2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2018.09.133
regression algorithm to incrementally build up the model until a desired accuracy has been achieved (Billings, 2013).

One major disadvantage of NARX models, in general, is that they are black-box models. While they are useful for predicting or simulating the response of the system, they do not provide any insight into the system's structure or internal functions. Dreesen et al. (2015a) developed an algorithm for decoupling multiple-input multiple-output (MIMO) polynomial functions, which applied tensor factorization techniques to a tensor constructed from the polynomial's Jacobian matrix evaluated at multiple operating points. The MIMO polynomial is replaced by a transformation matrix, followed by a bank of single-input single-output (SISO) polynomials, followed by a second transformation matrix. This approach has been used to decouple the systems in a parallel Wiener-Hammerstein model (Dreesen et al., 2015b), to decouple MIMO systems in control design (Stoev et al., 2017), and to decouple the polynomials in a polynomial nonlinear state-space model (Esfahani et al., 2018). In this paper, we will develop a similar algorithm for decoupling multiple-input single-output polynomials, and in particular P-NARX models, into a structure consisting of a transformation matrix followed by a bank of SISO polynomials, whose outputs are summed to yield the output signal.

1.1 Notation

Lower and upper case letters in a regular type-face, a, A, will refer to scalars, bold faced lower case letters, a, refer to vectors, matrices are indicated by bold faced upper case letters, A, and bold faced calligraphic script will be used for tensors, A. The outer, or tensor, product will be denoted ⊗.

2. DECOUPLED POLYNOMIALS

Let u = [u_1 u_2 . . . u_m]^T be a vector containing the m inputs to a function, and let y(u) be a polynomial function of its elements. Thus

y(u) = y_0 + Σ_{i_1=1}^{m} α_{i_1} u_{i_1} + Σ_{i_1=1}^{m} Σ_{i_2=i_1}^{m} α_{i_1,i_2} u_{i_1} u_{i_2} + . . . + Σ_{i_1=1}^{m} . . . Σ_{i_n=i_{n−1}}^{m} α_{i_1,...,i_n} u_{i_1} . . . u_{i_n}   (3)
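To make the combinatorial growth discussed in the introduction concrete, the number of coefficients in the coupled polynomial (3) can be counted directly. The sketch below (Python; function and variable names are illustrative, not from the paper) checks the enumeration against the closed form C(m + M, M):

```python
from itertools import combinations_with_replacement
from math import comb

def num_poly_params(m, M):
    """Count the coefficients of a full multivariate polynomial of
    degree up to M in m variables, including the constant term y0.
    A degree-d monomial is a multiset of d variables, which matches
    the non-decreasing index pattern i1 <= i2 <= ... in eq (3)."""
    return sum(len(list(combinations_with_replacement(range(m), d)))
               for d in range(M + 1))

# Closed form: there are C(m + M, M) monomials of degree <= M.
assert num_poly_params(6, 3) == comb(6 + 3, 3)   # 84 coefficients

# Even 6 lagged signals at degree 3 already give 84 coefficients;
# the count grows combinatorially in both m and M.
```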
Let y_k = y(u(k)) be the output of the polynomial due to the input u(k), k = 1, . . . , N, and collect the inputs and outputs at these N operating points into the following:

U = [u(1) u(2) . . . u(N)]
y = [y(1) y(2) . . . y(N)]^T
Consider approximating the unstructured polynomial y(u), with the decoupled structure, shown in Fig. 1, consisting of a bank of r univariate polynomial functions, each of which transforms a linear combination of the inputs:
Fig. 1. Decoupling a multiple-input nonlinearity into r SISO polynomial branches, preceded by a transformation matrix, V.

ŷ(u) = c_0 + Σ_{i=1}^{r} g_i(v_i^T u)   (4)

g_i(x_i) = Σ_{j=1}^{n} c_{j,i} x_i^j   (5)

where the v_i ∈ ℝ^{m×1}. Let V = [v_1 v_2 . . . v_r], and define the intermediate signals:

x_i(k) = v_i^T u(k)   (6)

Construct the Vandermonde matrices

X_i = [x_i  x_i^2  . . .  x_i^n]   (7)

where x_i = [x_i(1) x_i(2) . . . x_i(N)]^T and the powers are applied element by element. The output of the decoupled polynomial may be written as:

ŷ = Xc   (8)

where X = [1_N X_1 X_2 . . . X_r], 1_N is an N element column of 1s, and c ∈ ℝ^{(rn+1)} is a vector containing the polynomial coefficients:

c = [c_0 c_{1,1} . . . c_{n,1} c_{1,2} . . . c_{n,r}]^T
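For concreteness, the decoupled output (4)–(8) can be evaluated with a few lines of NumPy. This is a sketch following the paper's notation, not the authors' implementation:

```python
import numpy as np

def decoupled_output(U, V, c, n):
    """Evaluate the decoupled polynomial (4) at N operating points.
    U: m x N input matrix, V: m x r mixing matrix,
    c: [c0, c_{1,1}..c_{n,1}, ..., c_{n,r}], n: SISO polynomial degree."""
    N = U.shape[1]
    sig = V.T @ U                    # r x N intermediate signals, eq (6)
    cols = [np.ones((N, 1))]         # 1_N column multiplying c0
    for xi in sig:                   # one Vandermonde block per branch, eq (7)
        cols.append(np.column_stack([xi**j for j in range(1, n + 1)]))
    Xmat = np.hstack(cols)           # N x (r*n + 1) regression matrix
    return Xmat @ c                  # eq (8)

# Single branch with g1(x) = x + 2x^2 and v1 = [1, 1]^T:
U = np.array([[1.0, 2.0],
              [0.0, 1.0]])          # two operating points (columns)
yhat = decoupled_output(U, np.array([[1.0], [1.0]]),
                        np.array([0.0, 1.0, 2.0]), 2)
# x = [1, 3], so yhat = [1 + 2, 3 + 18] = [3, 21]
```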
2.1 Estimation of the Decoupled Model

Given the polynomial inputs and outputs at N different operating points, U and y, the objective is to find estimates of V and c that minimize the norm of the error

V_N(V, c) = ‖y − ŷ(V, c)‖²₂

Note from (8) that ŷ is a linear function of the polynomial coefficients, c. However, the regression matrix, X, depends nonlinearly on the elements of V, due to the Vandermonde matrices (7). Thus, V_N may be minimized using a quasi-Newton algorithm such as the Gauss-Newton or Levenberg-Marquardt algorithms (Ljung, 1999). Since the polynomial coefficients, c, appear linearly in the model output, they can be written in closed-form as functions of the remaining parameters, V. Thus, one could optimize the separated cost function:

V̂ = arg min_V V_N(V, c(V))   (9)
with respect to only V , using the Variable Projection Algorithm (Golub and Pereyra, 2003) to account for the dependence of c on V . Note also that there are r redundant degrees of freedom, in that any scaling applied to one of the vectors vi , can be removed by scaling the corresponding polynomial coefficients. This problem also appears in the identification of block oriented models, such as the Wiener and Hammerstein models (Giri and Bai, 2010). Thus, the vectors vi should be normalized after each iteration, to prevent numerical problems due to scaling.
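A minimal sketch of the linear inner step of (9), together with the column normalization discussed above, is given below; the one-branch test data are illustrative:

```python
import numpy as np

def coeffs_given_V(U, y, V, n):
    """Inner linear step of the separable problem (9): for a fixed
    mixing matrix V, the coefficients c follow from ordinary least
    squares on the regression matrix X of eq (8)."""
    N = U.shape[1]
    sig = V.T @ U                                   # x_i(k), eq (6)
    cols = [np.ones((N, 1))]
    for xi in sig:
        cols.append(np.column_stack([xi**j for j in range(1, n + 1)]))
    Xmat = np.hstack(cols)
    c, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    return c

def normalize_columns(V):
    """Remove the scaling redundancy: force ||v_i|| = 1 after each
    iteration; the scale is absorbed by the SISO coefficients."""
    return V / np.linalg.norm(V, axis=0, keepdims=True)

# One-branch example with g(x) = 2x^3 (illustrative data):
U = np.vstack([np.linspace(-1, 1, 50), np.linspace(-1, 1, 50)**2])
Vt = normalize_columns(np.array([[1.0], [1.0]]))
yv = 2.0 * ((Vt.T @ U).ravel())**3
c = coeffs_given_V(U, yv, Vt, 3)
# c recovers [0, 0, 0, 2]: only the cubic coefficient is nonzero.
```

In a full variable projection iteration, V would then be updated by a Gauss-Newton step on the residual y − X c(V), with c re-solved at each step.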
Regardless of whether a classical quasi-Newton algorithm, or a separable least squares based approach is used, the result is a non-convex optimization which must be solved iteratively. The result may be sensitive to the initial solution used to start the iterations. In the following subsections, we develop methods for this initialization.

2.2 Jacobian Based Decoupling

Following the algorithm proposed by Dreesen et al. (2015a) for decoupling MIMO polynomials, compute the Jacobian of the output, y, with respect to its inputs, u:

J_y = ∂y(u)/∂u = Σ_{i=1}^{r} g_i′(v_i^T u) v_i^T   (10)

where g_i′ is the derivative of g_i with respect to its argument. Let u(k) and y(u(k)) for k = 1, . . . , N be a collection of operating points, where the function has been evaluated. Compute the Jacobian at each of these operating points, and stack the results to form the matrix:

J = V H^T   (11)

where H is an N × r matrix whose entries are the derivatives of the r SISO polynomials, evaluated at the N operating points,

H(i, j) = g_j′(v_j^T u(i))   (12)

and V ∈ ℝ^{m×r} is a matrix whose columns are the v_i.
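The Jacobian expression (10) can be checked numerically on a toy decoupled model; the sketch below compares it to central finite differences (the branch polynomials and weights are illustrative):

```python
import numpy as np

# Toy decoupled model with r = 2 branches and quadratic SISO
# polynomials g_i(x) = x^2, so g_i'(x) = 2x.
V = np.array([[1.0, 0.0],
              [0.5, 1.0]])                  # columns are v_1, v_2

y = lambda u: np.sum((V.T @ u)**2)          # y(u) = sum_i g_i(v_i^T u)
grad = lambda u: V @ (2.0 * (V.T @ u))      # eq (10): sum_i g_i'(v_i^T u) v_i

u0 = np.array([0.3, -0.7])
eps = 1e-6
fd = np.array([(y(u0 + eps*e) - y(u0 - eps*e)) / (2*eps)
               for e in np.eye(2)])         # central finite differences
assert np.allclose(fd, grad(u0), atol=1e-6)
```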
Dreesen et al. (2015a) considered MIMO polynomials, resulting in Jacobian matrices at each operating point, which were then stacked to form a 3-way tensor which could then be uniquely factored using the Canonical Polyadic Decomposition. In this case, the Jacobian in (11) is a matrix, so no unique decomposition exists. Indeed, matrix decompositions such as QR and SVD enforce additional structure on the matrix factors in order to guarantee uniqueness.

2.3 Factoring of the Hessian

This section will use an approach based on tensor factorization to generate an initial estimate for the separable least squares optimization (9). The Hessian of the output with respect to the input variables generates a matrix at each operating point. Evaluating the Hessian at multiple operating points results in a three-way tensor. Evaluate the Hessian, with respect to the inputs, of the decoupled model (4) at the operating point u:

H_y(u) = ∂²y(u)/∂u² = Σ_{i=1}^{r} g_i″(v_i^T u) v_i v_i^T   (13)

If the Hessian is evaluated at N different operating points, the results may be stacked into a 3 dimensional tensor, whose entries are given by:

H(i, j, k) = Σ_{ℓ=1}^{r} g_ℓ″(v_ℓ^T u(k)) v_ℓ(i) v_ℓ(j)   (14)

Define the vector w_ℓ, for ℓ = 1, . . . , r, which contains the second derivative of the polynomial in branch ℓ, evaluated at the operating points:

w_ℓ = [g_ℓ″(v_ℓ^T u(1)) . . . g_ℓ″(v_ℓ^T u(N))]^T   (15)
then (14) may be rewritten as:

H = Σ_{ℓ=1}^{r} v_ℓ ⊗ v_ℓ ⊗ w_ℓ   (16)
This is the Canonical Polyadic Decomposition (CPD) (Kolda and Bader, 2009) of the Hessian tensor, which is often abbreviated as

H = ⟦V, V, W⟧   (17)

where the w_ℓ are the columns of the matrix W ∈ ℝ^{N×r}. There are well developed tools, such as those included in the tensorlab toolbox (Vervliet et al., 2016), for computing the CPD.

3. DECOUPLED POLYNOMIAL NARX MODELS

The discussion in the previous section dealt with a polynomial function of an arbitrary vector of variables, u, which was then evaluated at an arbitrary series of operating points, y(u(k)). In the case of a P-NARX model, as shown in (2), the vector of inputs, u(t), contains past inputs and outputs:

u(t) = [u(t − n_k) . . . u(t − n_k − n_u + 1)  y(t − 1) . . . y(t − n_y)]^T   (18)
where n_u and n_y are the number of past input and output samples used in the model, and n_k ≥ 0 is the input delay. The operating points will consist of the N samples of identification data. Given the model structure (number of past inputs, outputs and polynomial degree), the following algorithm identifies a decoupled polynomial NARX model from input/output data.

(1) Fit a polynomial NARX model to the data, using standard approaches such as an orthogonal forward regression algorithm (Billings, 2013).
(2) Evaluate the Hessian of the coupled model output, and compute its CPD (17).
(3) Use the V factor from the CPD as an initial estimate of the input mixing matrix in the decoupled model, see Fig. 1, and refine it using the SLS based optimization (9). Note that the polynomial coefficients will be estimated during each step of the optimization using ordinary least squares.

4. BENCHMARK RESULTS

The polynomial NARX uncoupling algorithm was applied to data from two benchmark problems: a simulation of a Bouc-Wen friction model (Schoukens and Noël, 2017), and measurement data from the Silverbox model (Wigren and Schoukens, 2013).

4.1 Silverbox Data

The Silverbox system can be interpreted as an electronic implementation of a Duffing oscillator, comprising a 2nd order closed-loop system with a cubic nonlinearity in feedback, as shown in Fig. 2. This model represents a nonlinear mechanical resonating system with a moving mass m, a viscous damping d, and a nonlinear spring k(y)
(Wigren and Schoukens, 2013). The relation between the input u(t) and output y(t) is given by:

m d²y(t)/dt² + d dy(t)/dt + k(y(t))y(t) = u(t),   (19)

together with the nonlinear spring characteristic:

k(y(t)) = a + b y²(t).   (20)

Fig. 2. Block Diagram of the Silverbox system
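To make (19)–(20) concrete, a toy simulation is sketched below. The parameter values are illustrative placeholders, not the identified Silverbox constants:

```python
import numpy as np

# Toy simulation of the Duffing-type dynamics (19)-(20):
#   m*y'' + d*y' + (a + b*y^2)*y = u(t)
# The parameter values below are arbitrary placeholders, not the
# identified Silverbox constants.
m_, d_, a_, b_ = 1.0, 1.0, 1.0, 5.0

def simulate(u, dt):
    """Integrate the oscillator with semi-implicit Euler steps."""
    y = np.zeros(len(u))
    v = 0.0                                    # velocity dy/dt
    for t in range(1, len(u)):
        acc = (u[t-1] - d_*v - (a_ + b_*y[t-1]**2) * y[t-1]) / m_
        v += dt * acc                          # update velocity first,
        y[t] = y[t-1] + dt * v                 # then position (symplectic)
    return y

y = simulate(np.ones(4000), dt=0.01)
# With constant u = 1, the response settles near the root of
# (a + b*y^2)*y = 1, i.e. y close to 0.47 for these placeholder values.
```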
Data sets for this benchmark are described in Wigren and Schoukens (2013). The input signal consists of two parts: the first 40000 samples, used for model validation, is a white Gaussian noise sequence with a linearly increasing amplitude, filtered by a 9th order discrete time Butterworth filter with a 200Hz cut-off frequency. This is followed by 10 realizations of a random-phase odd multi-sine, each of which is 8192 samples long and separated from the previous segment by 100 zeros. The first 9 realizations were used for the identification, while the last was used as a testing sequence for model structure selection.
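A random-phase odd multisine of the kind described above can be sketched as follows; the number of samples and the excited line count here are illustrative, not the exact benchmark settings:

```python
import numpy as np

def random_phase_odd_multisine(N, kmax, rng):
    """Random-phase odd multisine: excite only the odd frequency bins
    k = 1, 3, ..., kmax, each with unit amplitude and an independent
    uniformly distributed phase. N and kmax are illustrative choices."""
    spectrum = np.zeros(N // 2 + 1, dtype=complex)
    for k in range(1, kmax + 1, 2):                  # odd bins only
        spectrum[k] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
    x = np.fft.irfft(spectrum, n=N)                  # real time-domain signal
    return x / np.std(x)                             # normalize to unit RMS

x = random_phase_odd_multisine(8192, 101, np.random.default_rng(0))
X = np.fft.rfft(x)
# Even-indexed (non-DC) bins carry no energy; the odd bins are excited.
```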
Polynomial NARX models, with third degree nonlinearities and between 1 and 5 lagged inputs and outputs, were fitted to the system. A model with 3 lagged inputs, 3 lagged outputs and no input delay produced the best predictions in the test data. Decoupled P-NARX models, with 3 lagged inputs and outputs, were then fitted to the Silverbox data. Results are summarized in Table 1.

Model accuracy was assessed using the RMS model fit:

\mathrm{Fit} = 100 \times \left( 1 - \sqrt{ \frac{\sum_{t=1}^{N_t} (y(t) - \hat{y}(t))^2}{\sum_{t=1}^{N_t} (y(t) - y_0)^2} } \right)    (21)

where y_0 is the output mean.

Number of Branches   Polynomial Degree   Number of Parameters   Accuracy (Fit Percent)
        1                    3                    10                   98.96
        2                    3                    19                   99.78
        3                    3                    28                   99.79
        4                    3                    37                   99.81
        5                    3                    46                   99.78
        4                    5                    45                   99.81

Table 1. Accuracy of Decoupled NARX Model Structures in the Test Segment.

Using third degree nonlinearities, the prediction accuracy in the test data started to decrease after the 4th branch was added to the model. Similarly, increasing the polynomial degree also reduced the model accuracy in the test segment. Thus, a 4 branch model with 3rd degree polynomial nonlinearities was used. Fig. 3 shows the magnitude frequency responses of the FIR filters applied to the input and output terms in each of the 4 branches. Since all branches receive the system output (rather than the output of any of the linear elements), it is not instructive to consider the branches as recursive filters. Thus, the responses of the input and output terms are presented separately.

Fig. 3. Frequency Response magnitudes for the input (top) and output (bottom) terms of the filters in the 4 branch model identified from the Silverbox data

Fig. 4 shows the simulation and prediction errors in the triangular validation data, using the 4 branch model. The model fits were 98.81% and 99.76% in simulation and one-step-ahead prediction, respectively. Note that the performance of the predictor was similar to that achieved in the testing data.

Fig. 4. Prediction and Simulation Errors on the Validation Data. Errors are scaled up 10x to enhance visibility

4.2 Bouc-Wen Benchmark Model
The second example uses data from the Bouc-Wen benchmark system, described in detail in Schoukens and Noël (2017). Briefly, the system consists of a single mass, spring and damper with a hysteretic friction term. In continuous-time, the system is governed by the second-order differential equation:

m_L \ddot{y}(t) + c_L \dot{y}(t) + k_L y(t) + z(y(t), \dot{y}(t)) = u(t)    (22)

where u(t) and y(t) are the applied force and resulting displacement, m_L, k_L, and c_L are the linear mass, stiffness
coefficient and viscous damping coefficient, respectively, and the hysteretic force z(t) obeys the first-order differential equation:

\dot{z}(y(t), \dot{y}(t)) = \alpha \dot{y}(t) - \beta \left( \gamma |\dot{y}(t)| |z(t)|^{\nu-1} z(t) + \delta \dot{y}(t) |z(t)|^{\nu} \right)    (23)
The data are generated by integrating Eqs. (22) and (23) using Newmark integration at a sampling rate of 15,000 Hz. The data are then low-pass filtered and downsampled to 750 Hz. This system was chosen because its nonlinear behaviour depends on an internal state, z(t), which is not directly measurable. Thus, an input-output model, such as the decoupled NARX structure, must reconstruct the internal state from input/output measurements in order to model the system's nonlinear behaviour. Two sets of data were generated using instances of the default multi-sine input excitation, but with the RMS force increased to 55 N, as suggested in Esfahani et al. (2018), so that the identification data would be richer than the two validation sequences. The first dataset was used for parameter estimation, while the second was used for structure selection. The identified models were then tested on the two validation sets provided with the benchmark: a multi-sine input and a swept sine input. The validation results are to be reported as the (un-normalized) RMS error:

E_{RMSt} = \sqrt{ \frac{1}{N_t} \sum_{t=1}^{N_t} (y(t) - \hat{y}(t))^2 }    (24)
where ŷ(t) is the output of the model, y(t) is the output provided in the validation data, and Nt is the total number of points in the validation signal.
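To make the data-generation step concrete, the sketch below integrates (22)–(23) with a plain Euler scheme in place of the Newmark method used by the benchmark, and computes the RMS error of (24). The parameter values are illustrative placeholders, not the official benchmark constants.

```python
import numpy as np

def simulate_bouc_wen(u, dt, mL=2.0, cL=10.0, kL=5e4,
                      alpha=5e4, beta=1e3, gamma=0.8, delta=0.5, nu=1.0):
    """Euler integration of the Bouc-Wen system (22)-(23).

    Parameter values are placeholders for illustration; the benchmark itself
    uses Newmark integration at 15 kHz followed by downsampling to 750 Hz.
    """
    n = len(u)
    y = np.zeros(n)          # displacement
    v = 0.0                  # velocity
    z = 0.0                  # hysteretic force: the unmeasured internal state
    for t in range(n - 1):
        acc = (u[t] - cL * v - kL * y[t] - z) / mL
        zdot = alpha * v - beta * (gamma * abs(v) * abs(z) ** (nu - 1) * z
                                   + delta * v * abs(z) ** nu)
        v += dt * acc
        z += dt * zdot
        y[t + 1] = y[t] + dt * v
    return y

def e_rms(y, y_hat):
    """Un-normalized RMS error of Eq. (24)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

dt = 1.0 / 15000.0
u = 50.0 * np.sin(2 * np.pi * 40.0 * np.arange(7500) * dt)  # toy forcing
y = simulate_bouc_wen(u, dt)
```

Because z(t) never appears in the returned data, any input/output model fitted to (u, y) must, as noted above, reconstruct the hysteretic state implicitly.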
Pruned NARX Model

An orthogonal least squares pruning algorithm was used to fit P-NARX models to the identification data. All models had third-degree polynomial nonlinearities including all possible cross-terms. The number of delayed input and output terms was scanned from 0 to 16 of each type, while the input delay was scanned from 0 to 2 samples. The model that achieved the lowest prediction error on the testing data was selected as the best model structure. This model had 15 input terms, with delays of 0 through 14 samples, and 15 delayed output terms. A full third-order polynomial model would have included 5456 terms, but the pruned model retained only 638 coefficients.

Decoupled NARX Models

The Hessian of the pruned NARX model was computed. Since there were 15 delayed inputs and 15 delayed outputs, and the dataset contained 40,960 data points, the Hessian was a 30 × 30 × 40,960 tensor. Decoupled models with r branches were obtained by computing the rank-r CPD of the Hessian, using the cpd operator in Tensorlab (Vervliet et al., 2016), which was then used to initialize the separable least squares optimization (9). Results are summarized in Table 2. The model that resulted in the most accurate 1-step-ahead predictions in the testing data was selected. The final model was then validated using the two validation datasets provided with the benchmark: the multi-sine and the swept-sine input signals.
Number of Branches   Polynomial Degree   Number of Parameters   Accuracy (Fit Percent)
        4                    3                   133                   98.30
        5                    3                   166                   98.46
        6                    3                   199                   98.33
        5                    5                   176                   98.49
        5                    7                   196                   98.54
        5                    9                   206                   98.55
        5                   11                   226                   98.50

Table 2. Accuracy of Decoupled NARX Model Structures in the Testing Data

The elements of the 5-branch model with degree 9 polynomial nonlinearities are shown in Figs. 5 and 6. The polynomial nonlinearities are shown in Fig. 5. For display purposes, the inputs and outputs of the nonlinearities have been normalized, and the 5 curves are displayed with a vertical offset, so that they can be seen individually. Fig. 6 shows the magnitude of the frequency responses of the input and output terms in the 5 filters.

Fig. 5. Nonlinearities of the 5 branch model. The linear terms have been removed to emphasize the nonlinear aspects of the functions. Their inputs and outputs have been normalized, and the curves have been offset, to facilitate comparison

All of the nonlinearities, as shown in Fig. 5, appear to have a "dead-zone" like behaviour, as they are all flat near 0. The input terms in the linear filters all display a notch near 1000 rad/sec. The low-frequency behaviour varies from branch to branch. All of the output filters have a wide peak near 700 rad/sec. As with the input terms, the low-frequency behaviour varies from branch to branch. Table 3 lists the results of the validation conducted with both the multi-sine and swept sine inputs, showing both the prediction and simulation accuracy of the identified model. Fig. 7 superimposes the prediction and simulation errors on top of the output from the swept sine validation data.

Fig. 6. Frequency Response magnitudes for the input (top) and output (bottom) terms of the filters in the 5 branches

              Multi-Sine                      Sine-Sweep
              E_RMSt           Fit %          E_RMSt           Fit %
Simulation    5.36 × 10^{-5}   91.97          1.67 × 10^{-5}   97.48
Prediction    9.71 × 10^{-6}   98.54          3.48 × 10^{-6}   99.48

Table 3. Results of the validation experiment with the final model. Model accuracy is defined both using the un-normalized RMS error, E_RMSt (24), and as the Fit Percent (21).

Fig. 7. Prediction and Simulation Errors on the Swept-Sine Validation Data, scaled up 10x for comparison.

5. CONCLUSIONS

The decoupled NARX model has the potential, as demonstrated in the presented examples, to dramatically reduce the number of parameters in a NARX model, and allows for the inclusion of high-degree nonlinearities without producing a combinatorial increase in the model complexity. The primary disadvantage of this structure is that the model is nonlinear in the parameters. Thus, the identification requires a non-convex optimization. However, tensor factorization methods may be used to produce a useful initial estimate of the system.

REFERENCES
Billings, S. (2013). Nonlinear System Identification: NARMAX Methods in the Time, Frequency and Spatio-Temporal Domains. Wiley.
Dreesen, P., Ishteva, M., and Schoukens, J. (2015a). Decoupling multivariate polynomials using first-order information and tensor decompositions. SIAM J. Matrix Anal. Appl., 36(2), 864–879.
Dreesen, P., Ishteva, M., and Schoukens, J. (2015b). Recovering Wiener-Hammerstein nonlinear state-space models using linear algebra. In IFAC Sys. Ident. Symp., 951–956.
Esfahani, A.F., Dreesen, P., Tiels, K., Noël, J.P., and Schoukens, J. (2018). Parameter reduction in nonlinear state-space identification of hysteresis. Mech. Sys. Sig. Pro., 104, 884–895.
Giri, F. and Bai, E.W. (eds.) (2010). Block-Oriented Nonlinear System Identification. Lecture Notes in Control and Information Sciences. Springer, Berlin.
Golub, G. and Pereyra, V. (2003). Separable nonlinear least squares: the variable projection method and its applications. Inverse Problems, 19, R1–R26.
Kerschen, G. and Braun, S. (2017). Special issue on the identification of nonlinear mechanical systems. Mechanical Systems and Signal Processing, 84 Part B, 1.
Kerschen, G., Worden, K., Vakakis, A., and Golinval, J. (2006). Past, present and future of nonlinear system identification in structural dynamics. Mech. Sys. Sig. Pro., 20, 505–592.
Kolda, T. and Bader, B. (2009). Tensor decompositions and applications. SIAM Rev., 51(3), 455–500.
Ljung, L. (1999). System Identification: Theory for the User. Prentice Hall, Upper Saddle River, NJ, 2nd Ed.
Noël, J.P. and Kerschen, G. (2017). Nonlinear system identification in structural dynamics: 10 more years of progress. Mech. Sys. Sig. Pro., 83, 2–35.
Peyton-Jones, J.C. and Billings, S.A. (1989). Recursive algorithm for computing the frequency response of a class of nonlinear difference equation models. Int. J. Control, 64(6), 1925–1940.
Pintelon, R. and Schoukens, J. (2001). System Identification: A Frequency Domain Approach. IEEE Press, Piscataway, NJ.
Schoukens, J., Vaes, M., and Pintelon, R. (2016). Linear system identification in a nonlinear setting. IEEE Control Systems Magazine, 38–69.
Schoukens, J., Pintelon, R., and Rolain, Y. (2012). Mastering System Identification in 100 Exercises. Wiley.
Schoukens, M. and Noël, J. (2017). Three benchmarks addressing open challenges in nonlinear system identification. In IFAC World Congress, 448–453.
Stoev, J., Ertveldt, J., Oomen, T., and Schoukens, J. (2017). Tensor methods for MIMO decoupling and control design using frequency response functions. Mechatronics, 45, 71–81.
Vervliet, N., Debals, O., Sorber, L., Van Barel, M., and De Lathauwer, L. (2016). Tensorlab 3.0. Available online at http://www.tensorlab.net.
Wigren, T. and Schoukens, J. (2013). Three free data sets for development and benchmarking in nonlinear system identification. In Eur. Control Conf., 2933–2938.