Preprints, 1st IFAC Conference on Modelling, Identification and Control of Nonlinear Systems (MICNON 2015), June 24-26, 2015. Saint Petersburg, Russia. Available online at www.sciencedirect.com
ScienceDirect
IFAC-PapersOnLine 48-11 (2015) 814–818
Application of Rank-Constrained Optimisation to Nonlinear System Identification
Ramón A. Delgado ∗ Juan C. Agüero ∗∗ Graham C. Goodwin ∗ Eduardo M. A. M. Mendes ∗∗∗
∗ School of Electrical Engineering and Computer Science, The University of Newcastle, Australia
[email protected], [email protected]
∗∗ Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
[email protected]
∗∗∗ Departamento de Engenharia Electrônica, Universidade Federal de Minas Gerais, Brazil
[email protected]
Abstract: Nonlinear system identification has a rich history spanning at least 5 decades. A very flexible approach to this problem depends upon the use of Volterra series expansions. Related work includes Hammerstein models, where a static nonlinearity is followed by a linear dynamical system, and Wiener models, where a static nonlinearity is inserted after a linear dynamical model. A problem with these methods is that they inherently depend upon series type expansions, and hence it is difficult to know which terms should be included. In this paper we present a possible solution to this problem using recent results on rank-constrained optimisation. Simulation results are included to illustrate the efficacy of the proposed strategy.
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Peer review under responsibility of International Federation of Automatic Control. 10.1016/j.ifacol.2015.09.290
J. C. Agüero was partially supported by the Chilean Research Council (CONICYT) through Basal Project FB0008, and FONDECYT (grant no. 1150954).
1. INTRODUCTION
System identification tools are frequently used to fit models to experimentally obtained data from a system. There is a vast literature on the topic (Ljung, 1999). Predominantly, the literature deals with the case of linear systems. However, some systems are known to exhibit inherently nonlinear behaviour.
Since nonlinear includes anything other than linear, there are endless options for which model structure should be chosen. A broad classification of model types includes Black Box, White Box and Grey Box. The key distinction is that Black Box methods use a flexible structure under the "one-size-fits-all" philosophy. On the other hand, White Box methods build the model using physical or biological understanding. Grey Box lies between these extremes and utilises elements of both Black and White Box methods.
An advantage of Black Box methods is that they are flexible and can be used in situations where it is difficult to obtain a good model structure from physical reasoning. The current paper focuses on this class of models.
An inherent feature of Black Box methods is that they depend upon series type expansions. Hence, there is a problem of how to choose which terms to include and which to exclude in the series. Indeed, if used naively, there is the potential for including too many terms, leading to overfitting. This is a well known phenomenon which is often termed the Bias-Variance problem. Specifically, including too few terms will lead to deterministic bias errors, whereas including too many terms will lead to variance errors due to the impact of noise on the parameter estimates.
The topic of bias-variance trade-off has been addressed in many recent papers, see for example (Chen et al., 2012; Mendes and Billings, 2001). Standard tools for restricting the complexity of a model include the $\ell_1$ heuristic (Candes et al., 2006), the nuclear norm (Fazel et al., 2001), and rank-constrained optimisation, see e.g. (Delgado et al., 2014a; Markovsky, 2013).
Here we adopt the rank-constrained optimisation approach. We apply this idea to the problem of choosing which terms of a series expansion to include in a nonlinear model. We believe that this is the first time that this class of methods has been applied in the context of nonlinear system identification.
Methods for imposing rank constraints are closely related to the problem of rank-minimisation. The latter problem has received considerable attention over the past few decades. The focus has centred on various approximations such as trace, nuclear norm and log-det heuristics (see e.g. (Fazel et al., 2001, 2003)). These developments have been applied to several system identification problems, e.g. to identify a moving bed process from incomplete data sets (Grossmann et al., 2009), for identification of Box-Jenkins models (Hjalmarsson et al., 2012), and for system identification with missing inputs and outputs (Liu et al., 2013). Other approaches use the related idea of structured low-rank approximation. This idea has been applied to several system identification problems. For example, it has been used for identification of periodically time-varying systems (Markovsky et al., 2014) and system identification in the behavioural setting (Markovsky, 2013).
There also exist approaches that solve rank-minimisation problems exactly, see e.g. (d'Aspremont, 2003). However, the computational complexity of the associated methods is formidable even for small problems. Most heuristics developed for rank-minimisation can also be applied to rank-constrained problems. However, when using these heuristics, the condition on the rank is not imposed as a hard constraint.
Recently, a novel approach to dealing with sparsity and rank constraints was proposed in (Delgado et al., 2014a,b). The current paper builds on this earlier work and applies the approach to nonlinear system identification.
The layout of the remainder of the paper is as follows. In Section 2, we formulate the problem of interest. In Section 3, we discuss rank and cardinality constraints. Section 4 shows how the approach can be adapted to solve a nonlinear system identification problem. Simulated examples are given in Section 5. Finally, conclusions are drawn in Section 6.
Notation and basic definitions: $\operatorname{rank}(A)$ denotes the rank of a matrix $A$. $\lambda_i(A)$ denotes the $i$-th largest eigenvalue of a symmetric matrix $A$. $A \circ B$ denotes the Hadamard product of $A$ and $B$. $A \succeq 0$ denotes that $A$ is positive semidefinite, and $A \succeq B$ denotes that $A - B \succeq 0$. We represent the transpose of a given matrix $A$ as $A^\top$. $\mathbb{S}^n$ denotes the set of symmetric matrices of size $n \times n$, and $\mathbb{S}^n_+$ the set of symmetric positive semidefinite matrices, i.e. $\mathbb{S}^n_+ := \{A \in \mathbb{S}^n \mid A \succeq 0\}$. $\mathbf{1}$ denotes a vector with ones as entries. $\|\cdot\|_F$ denotes the Frobenius norm.
2. NONLINEAR BLACK-BOX MODELS
There are many options for nonlinear models, including Volterra expansions and neural networks. For example, the Volterra model has the following generic structure
$$
y(k) = \sum_{n_1=1}^{N} h_1(n_1)\,u(k-n_1) + \sum_{n_1=1}^{N}\sum_{n_2=1}^{N} h_2(n_1,n_2)\,u(k-n_1)\,u(k-n_2) + \dots + n(k) \tag{1}
$$
where $\{h(n_1, \dots)\}$ are generalised impulse response terms, and $y(k)$, $u(k)$ and $n(k)$ denote the output, input and additive noise, respectively. It is readily seen from (1) that the model potentially has a huge number of parameters, namely $N + N^2 + \dots + N^P$, where $P$ is the order of the expansion. There is thus clearly a potential for over-fitting, especially when little data is available.
A special case of (1) occurs when the choice
$$
h(n_1, n_2, \dots, n_P) = \begin{cases} h_j(n_1) & \text{if } n_1 = n_2 = \dots = n_P \\ 0 & \text{otherwise} \end{cases} \tag{2}
$$
is made. This simplification leads to a generalised Hammerstein model structure
$$
y(k) = \sum_{j=1}^{P}\sum_{n_1=0}^{N} h_j(n_1)\,u(k-n_1)^j + n(k) \tag{3}
$$
In the sequel, and for the sake of simplicity of exposition, we will focus on the above structure. However, the extension to more general instances of the model (1) is straightforward.
Our goal will be to use rank-constrained optimisation ideas to restrict the class of impulse response functions and the set of terms $\{u(k)^j\}$ used in the model.
3. OVERVIEW OF RANK-CONSTRAINED OPTIMISATION
We have previously applied rank-constrained optimisation to a class of problems in linear system identification (see (Delgado et al., 2015)). Our goal here is to extend these tools to the nonlinear case. We first summarise the key elements of rank-constrained optimisation that we will use.
3.1 Rank-Constrained optimisation
Consider the following rank-constrained optimisation problem
$$
\mathcal{P}_{rco}: \quad \min_{x \in \mathbb{R}^p} f(x) \quad \text{subject to} \quad x \in \Omega, \quad \operatorname{rank}(G(x)) \le r
$$
Also, consider the following optimisation problem involving bilinear constraints
$$
\begin{aligned}
\mathcal{P}_{rcoequiv}: \quad \min_{x \in \mathbb{R}^p,\, W \in \mathbb{S}^n} \quad & f(x) \\
\text{subject to} \quad & x \in \Omega \\
& G(x)\,W = 0_{m \times n} \\
& 0 \preceq W \preceq I_n \\
& \operatorname{trace}(W) = n - r
\end{aligned}
$$
where $\Omega \subset \mathbb{R}^p$ is a constraining set, $f : \mathbb{R}^p \to \mathbb{R}$ is the objective function and $G : \mathbb{R}^p \to \mathbb{R}^{m \times n}$.
The following result shows that $\mathcal{P}_{rco}$ and $\mathcal{P}_{rcoequiv}$ are equivalent.
Corollary 1. (Delgado et al., 2015) $x \in \mathbb{R}^p$ is a global solution of $\mathcal{P}_{rco}$ if and only if there exists a $W$ such that the pair $(x, W)$ is a global solution of $\mathcal{P}_{rcoequiv}$.
Proof. See (Delgado et al., 2015). Background of the proof can be found in (Delgado et al., 2014b).
A key observation is that problem $\mathcal{P}_{rco}$ is combinatorial in nature, whereas problem $\mathcal{P}_{rcoequiv}$ can be solved using standard tools of nonlinear programming. Details are given in (Delgado et al., 2014a).
3.2 Cardinality-constrained optimisation
Problem $\mathcal{P}_{rco}$ can also cover optimisation problems subject to cardinality constraints, i.e. constraints on the number of non-zero elements of a vector. This is achieved by considering $G(x)$ to have a diagonal structure, i.e. $G(x) = \operatorname{diag}(x)$, so that $\operatorname{rank}(G(x)) = \operatorname{card} x$. Hence, if we consider
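the following cardinality-constrained optimisation problem
$$
\mathcal{P}_{card}: \quad \min_{x \in \Omega \subseteq \mathbb{R}^n} f(x) \quad \text{subject to} \quad \operatorname{card} x \le r
$$
then, using Corollary 1, this problem can be formulated as an optimisation problem subject to bilinear constraints as follows
$$
\begin{aligned}
\mathcal{P}_{cardequiv}: \quad \min_{x \in \Omega \subseteq \mathbb{R}^n,\, w \in \mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & x_i w_i = 0, \quad i = 1, \dots, n \\
& 0 \le w_i \le 1, \quad i = 1, \dots, n \\
& \mathbf{1}^\top w = n - r
\end{aligned}
$$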
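The relationship between the two formulations can be sketched numerically. The following is a minimal illustration under stated assumptions, not the solution method of the paper: the function names `complementary_w` and `best_subset_least_squares`, the least squares cost, and the test data are all hypothetical. It solves a tiny cardinality-constrained least squares problem by exhaustive enumeration of supports (rather than by nonlinear programming on the bilinear form), and then constructs a vector w that satisfies the bilinear constraints for the solution found.

```python
import itertools
import numpy as np

def complementary_w(x, r, tol=1e-12):
    """Return w with x_i*w_i = 0, 0 <= w_i <= 1 and sum(w) = n - r,
    or None when card(x) > r (no such w exists in that case)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    zero_idx = np.flatnonzero(np.abs(x) <= tol)
    if zero_idx.size < n - r:        # fewer than n - r zeros means card(x) > r
        return None
    w = np.zeros(n)
    w[zero_idx[: n - r]] = 1.0       # mark exactly n - r of the zero entries
    return w

def best_subset_least_squares(A, b, r):
    """Minimise ||A x - b||^2 subject to card(x) <= r by enumerating supports.
    Only practical for small n; shown purely for illustration."""
    n = A.shape[1]
    best_x, best_cost = np.zeros(n), float(b @ b)   # x = 0 is always feasible
    for size in range(1, r + 1):
        for support in itertools.combinations(range(n), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            resid = A[:, cols] @ coef - b
            cost = float(resid @ resid)
            if cost < best_cost:
                best_x = np.zeros(n)
                best_x[cols] = coef
                best_cost = cost
    return best_x, best_cost

# Hypothetical noise-free data with a 2-sparse parameter vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 6))
x_true = np.array([0.0, 1.5, 0.0, 0.0, -2.0, 0.0])
b = A @ x_true

x_hat, cost = best_subset_least_squares(A, b, r=2)
w_hat = complementary_w(x_hat, r=2)
```

Enumeration scales combinatorially with the number of parameters, which is the motivation for working with the bilinear formulation instead.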
The equivalence between $\mathcal{P}_{card}$ and $\mathcal{P}_{cardequiv}$ has recently been obtained independently in (Burdakov et al., 2015). In addition, optimisation problems where the cost contains a term that induces a cardinality constraint have been analysed in (Mitchell et al., 2013; Piga and Tóth, 2013). The approach considered in these papers can be regarded as a special case of the equivalence between $\mathcal{P}_{card}$ and $\mathcal{P}_{cardequiv}$, see (Delgado et al., 2014b).
3.3 Group-constrained optimisation
A problem closely related to that of cardinality-constrained optimisation is that of group-constrained optimisation. The idea of group-handling in sparse representations has received attention in the last decade, see e.g. (Kim et al., 2006; Yuan and Lin, 2006). These methods are based on the $\ell_1$-norm heuristic. Here we show how group constraints can be incorporated into cardinality-constrained optimisation problems.
In problem $\mathcal{P}_{cardequiv}$, the variable $w$ at the optimum is a binary variable taking the value $w_i = 1$ for those elements corresponding to $x_i = 0$. Additional constraints on $w$ can be included in the optimisation problem to manage how the zero and non-zero elements of $x$ interact.
To illustrate these ideas, consider the problem of minimising $f(x)$ subject to $\operatorname{card} x \le r$, with $x \in \mathbb{R}^m$, and with $m$ and $r$ being even numbers. Moreover, say we split the vector $x$ into two groups: group $G_1$ consisting of the first $m/2$ elements of $x$, and group $G_2$ consisting of the last $m/2$ elements of $x$. An interesting group constraint in this framework is to require that when the $i$-th element of $G_1$ is non-zero, i.e. $x_i \neq 0$, then the $i$-th element of $G_2$ must be zero, i.e. $x_{m/2+i} = 0$, and vice versa. Thus, a non-zero element in one group induces a zero in the other group. The associated optimisation problem can be formulated as the following group-constrained optimisation problem:
$$
\begin{aligned}
\min_{x \in \mathbb{R}^m,\, w \in \mathbb{R}^m} \quad & f(x) \\
\text{subject to} \quad & x_i w_i = 0, \quad i = 1, \dots, m \\
& 0 \le w_i \le 1, \quad i = 1, \dots, m \\
& \mathbf{1}^\top w = m - r \\
& w_i + w_j \ge 1, \quad i = 1, \dots, m/2, \quad j = m/2 + i
\end{aligned}
$$
where the last constraint in the above problem governs the interaction between the zero and non-zero elements of the vector $x$.
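The effect of the pairing constraint can be checked on small vectors. The sketch below is illustrative only: the helper names `binary_w` and `satisfies_group_constraints` and the example vectors are assumptions introduced here, and the binary choice of w is used only for vectors whose cardinality equals r exactly.

```python
import numpy as np

def binary_w(x, tol=1e-12):
    """Binary indicator of the zeros of x: w_i = 1 where x_i = 0, else 0.
    This choice satisfies sum(w) = m - r only when card(x) equals r."""
    return (np.abs(np.asarray(x, dtype=float)) <= tol).astype(float)

def satisfies_group_constraints(x, w, r, tol=1e-9):
    """Check the constraints of the group-constrained problem for a pair (x, w)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    m = x.size
    half = m // 2
    return bool(
        np.all(np.abs(x * w) <= tol)                  # x_i w_i = 0
        and np.all(w >= -tol) and np.all(w <= 1 + tol)  # 0 <= w_i <= 1
        and abs(w.sum() - (m - r)) <= tol             # 1^T w = m - r
        and np.all(w[:half] + w[half:] >= 1 - tol)    # w_i + w_{m/2+i} >= 1
    )

m, r = 4, 2
x_ok = np.array([3.0, 0.0, 0.0, -1.0])   # pairs (x1, x3) and (x2, x4): one non-zero each
x_bad = np.array([3.0, 0.0, 2.0, 0.0])   # pair (x1, x3): both non-zero

ok = satisfies_group_constraints(x_ok, binary_w(x_ok), r)
bad = satisfies_group_constraints(x_bad, binary_w(x_bad), r)
```

For `x_bad`, the bilinear constraint forces both paired entries of w to zero, so the pairing constraint cannot hold for any admissible w, not only for the binary one.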
4. APPLICATION TO HAMMERSTEIN NONLINEAR MODEL
There is a multitude of problem formulations that we could adopt to illustrate the above circle of ideas. We adopt one particular strategy for illustration, but anticipate that the same general methodology would apply, mutatis mutandis, to other black-box nonlinear system identification problems.
We begin with the generalised Hammerstein model (3). We can then use the ideas of Section 3 to apply two constraints, namely
(i) Only a fixed number (say $M$) of parameters can be used ($M \le (N+1)P$).
(ii) We could extend the above idea to incorporate the standard Hammerstein structure by adding a group constraint to the parameters. Namely, if any parameter in a group $\{h_j(n_1),\ n_1 = 0, \dots, N\}$ is non-zero, then all parameters in that group can be non-zero.
To satisfy condition (i), we can use the ideas presented in Section 3.2. We impose the cardinality constraint by requiring
$$
h_j(n_1)\,w_{j,n_1} = 0 \tag{4}
$$
$$
0 \le w_{j,n_1} \le 1 \tag{5}
$$
$$
\sum_{j=1}^{P}\sum_{n_1=0}^{N} w_{j,n_1} = (N+1)P - M \tag{6}
$$
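As a small worked instance of constraints (4)-(6), take the lag index to run over $n_1 = 0, \dots, N$ as in (3). The values $P = 2$, $N = 1$, $M = 2$ and the choice of which parameters are non-zero are illustrative assumptions, not taken from the simulation study. There are $(N+1)P = 4$ parameters; suppose only $h_1(0)$ and $h_2(1)$ are non-zero. Then a valid choice of $w$ is
$$
w_{1,0} = 0, \qquad w_{1,1} = 1, \qquad w_{2,0} = 1, \qquad w_{2,1} = 0,
$$
which gives $h_j(n_1)\,w_{j,n_1} = 0$ for every pair $(j, n_1)$, keeps each $w_{j,n_1}$ in $[0, 1]$, and satisfies
$$
\sum_{j=1}^{2}\sum_{n_1=0}^{1} w_{j,n_1} = 2 = (N+1)P - M .
$$
If a third parameter were non-zero, (4) would force three entries of $w$ to zero, the sum could be at most 1, and (6) could not hold.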
Additionally, condition (ii) can be seen as a group constraint, as discussed in Section 3.3. Specifically, condition (ii) can be easily satisfied by including the following additional constraints
$$
w_{j,0} = w_{j,1} = \dots = w_{j,N-1} = w_{j,N}, \qquad j = 1, \dots, P. \tag{7}
$$
5. SIMULATION STUDY
We simulate the following system
$$
y(k) = \varphi(k)^\top \theta + n(k) \tag{8}
$$
where
$$
\varphi(k) = \begin{bmatrix} u(k) & u^2(k) & \cdots & u^P(k) & u(k-1) & \cdots & u^P(k-1) & \cdots & u^P(k-N) \end{bmatrix}^\top \tag{9}
$$
We note that this system potentially has $(N+1)P$ parameters, but we restrict the model so that only $M$ parameters are non-zero. However, we assume that the specific location of the non-zero elements is unknown. We then choose to fit a generalised Hammerstein model of the form (3) where we set $P = 3$, $N = 7$, but we consider that only $M = 4$ parameters take a non-zero value. Notice that the model potentially contains 24 parameters. Consider that we are given $N_s = 1000$ input-output samples. The identification problem can be formulated as the least squares optimisation problem $\mathcal{P}_{nsi}$ in (10).
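The regressor (9) and the stacked data can be assembled as in the following sketch. It is a minimal illustration under stated assumptions: the function name `hammerstein_regressor_matrix`, the random seed, and the 0-based sample indexing (rows for k = N, ..., Ns - 1) are choices made here, not details given in the paper. The ordinary least squares fit at the end is the unconstrained baseline only; it does not impose the cardinality constraint.

```python
import numpy as np

def hammerstein_regressor_matrix(u, N, P):
    """Stack phi(k)^T for k = N, ..., len(u) - 1 (0-based sample index).
    Column order follows (9): u(k), ..., u(k)^P, u(k-1), ..., u(k-N)^P,
    so h_j(n1) sits in column n1 * P + (j - 1)."""
    u = np.asarray(u, dtype=float)
    rows = []
    for k in range(N, u.size):
        rows.append([u[k - n] ** j
                     for n in range(N + 1)
                     for j in range(1, P + 1)])
    return np.array(rows)

# Values from the simulation study: P = 3, N = 7, M = 4, Ns = 1000.
P, N, M, Ns = 3, 7, 4, 1000
n_par = (N + 1) * P                       # 24 candidate parameters

rng = np.random.default_rng(1)            # assumed seed, for repeatability only
u = rng.standard_normal(Ns)               # zero-mean, unit-variance Gaussian input
noise = rng.standard_normal(Ns - N)       # zero-mean, unit-variance Gaussian noise

theta = np.zeros(n_par)
support = rng.choice(n_par, size=M, replace=False)   # unknown non-zero locations
theta[support] = rng.standard_normal(M)

Phi = hammerstein_regressor_matrix(u, N, P)
Y = Phi @ theta + noise

# Unconstrained baseline: ordinary least squares over all 24 parameters.
theta_ols, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
```

With this column ordering, reshaping a parameter vector to shape `(N + 1, P)` puts lag `n1` along the first axis and power `j` along the second, which is convenient when imposing the group constraint (7) on whole columns.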
Figure 1a shows the estimation error of the proposed approach, and Figure 1b shows the estimation error for the least squares estimates. Notice that the estimation error of the proposed approach has much smaller variance than the estimation error of the least squares estimate. This confirms the principal claim of the paper, namely, that the use of rank constrained optimisation gives a superior bias-variance trade-off in the context of nonlinear system identification. Also, note that in 86% of cases the methodology recovered the correct model structure. Interestingly, in 13% of the cases one of the non-zero elements was chosen erroneously, and in 1% of the cases two non-zero elements were chosen erroneously. Note that the missed parameters were very small, making their selection unimportant to the overall model behaviour.
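One way such structure-recovery percentages could be tallied is sketched below. This is an illustrative assumption rather than the procedure used in the paper: the function names and the tolerance `tol` that decides when an estimate counts as zero are hypothetical.

```python
import numpy as np

def missed_nonzeros(theta_true, theta_hat, tol=1e-8):
    """Count true non-zero elements that the estimate sets to zero."""
    true_support = np.abs(np.asarray(theta_true, dtype=float)) > tol
    est_support = np.abs(np.asarray(theta_hat, dtype=float)) > tol
    return int(np.sum(true_support & ~est_support))

def tally_structure_errors(pairs, tol=1e-8):
    """Map 'number of missed elements' to the percentage of runs with that count.

    `pairs` is a list of (theta_true, theta_hat) tuples, one per Monte Carlo run.
    """
    counts = {}
    for theta_true, theta_hat in pairs:
        k = missed_nonzeros(theta_true, theta_hat, tol)
        counts[k] = counts.get(k, 0) + 1
    total = len(pairs)
    return {k: 100.0 * c / total for k, c in sorted(counts.items())}
```

A key of 0 in the returned dictionary corresponds to runs where the correct structure was recovered, a key of 1 to runs with one erroneously chosen element, and so on.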
[Fig. 1(a): Proposed approach (cardinality constrained). Vertical axis: estimation error, from −0.06 to 0.06; horizontal axis: element index, from 0 to 25.]
6. CONCLUSIONS
We have applied recently developed ideas for rank constrained optimisation to nonlinear system identification. The advantages of the proposed approach have been illustrated via a numerical example.
[Fig. 1(b): Least Squares. Vertical axis: estimation error, from −0.06 to 0.06; horizontal axis: element index, from 0 to 25.]
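Fig. 1. Error bar plot over 100 Monte Carlo simulations.

$$\mathcal{P}_{nsi}: \quad \min_{\theta \in \Theta \subseteq \mathbb{R}^{(N+1)P}} \; \| Y - \Phi\theta \|_2^2 \quad \text{subject to} \quad \operatorname{card}(\theta) \le M \tag{10}$$

where

$$\Phi = \begin{bmatrix} \varphi(N)^{\top} \\ \varphi(N+1)^{\top} \\ \vdots \\ \varphi(N_s)^{\top} \end{bmatrix}, \qquad Y = \begin{bmatrix} y(N) \\ y(N+1) \\ \vdots \\ y(N_s) \end{bmatrix} \tag{11}$$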
We run $N_{mc} = 100$ Monte Carlo simulations, with different realizations of the input $\{u(k)\}$, of the noise $\{n(k)\}$, and of $\theta$, all having at most $M = 4$ non-zero elements. The input and the noise are taken to be zero-mean Gaussian with unit variance. The non-zero elements of $\theta$ are drawn from a normal distribution. We compare the estimates obtained by solving problem $\mathcal{P}_{nsi}$ with the estimates obtained via Ordinary Least Squares. Figures 1a-1b show the mean and variance of the estimation error over the Monte Carlo simulations.
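A single run of this experiment can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' method: the cardinality-constrained problem is solved here by exhaustive enumeration of supports, which stands in for the rank-constrained solver of the paper and is only practical for small problems. All function names are hypothetical, exactly `M` non-zero elements are drawn (the text allows at most `M`), and indexing is 0-based, so rows run over k = N, ..., Ns − 1.

```python
import itertools
import numpy as np

def build_regressors(u, P, N):
    """Stack phi(k)^T row by row, with lags 0..N and powers 1..P per lag."""
    rows = []
    for k in range(N, len(u)):
        rows.append([u[k - lag] ** p
                     for lag in range(N + 1)
                     for p in range(1, P + 1)])
    return np.array(rows)

def best_subset_ls(Phi, Y, M):
    """Brute-force least squares over all supports of size M.

    Supports of size exactly M suffice, since enlarging a support never
    increases the least squares residual.
    """
    G = Phi.T @ Phi                      # Gram matrix
    b = Phi.T @ Y
    n = Phi.shape[1]
    best_gain, best_idx, best_coef = -np.inf, None, None
    for support in itertools.combinations(range(n), M):
        idx = list(support)
        coef = np.linalg.solve(G[np.ix_(idx, idx)], b[idx])
        gain = b[idx] @ coef             # equals ||Y||^2 minus the residual norm squared
        if gain > best_gain:
            best_gain, best_idx, best_coef = gain, idx, coef
    theta = np.zeros(n)
    theta[best_idx] = best_coef
    return theta

def one_run(rng, P=3, N=7, M=4, Ns=1000):
    """Simulate (8) once; return the true, OLS and cardinality-constrained estimates."""
    n = (N + 1) * P
    theta = np.zeros(n)
    theta[rng.choice(n, size=M, replace=False)] = rng.standard_normal(M)
    u = rng.standard_normal(Ns)                          # zero-mean, unit-variance input
    Phi = build_regressors(u, P, N)
    Y = Phi @ theta + rng.standard_normal(Phi.shape[0])  # unit-variance noise
    theta_ols = np.linalg.lstsq(Phi, Y, rcond=None)[0]
    theta_card = best_subset_ls(Phi, Y, M)
    return theta, theta_ols, theta_card
```

Calling `one_run(np.random.default_rng(0))` returns the three parameter vectors for one realization; repeating the call and collecting `theta_ols - theta` and `theta_card - theta` gives the per-element error statistics of the kind plotted in Fig. 1. With the default sizes the enumeration visits 10626 supports, each requiring only a 4 × 4 linear solve.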