Proceedings, 18th IFAC Symposium on System Identification
July 9-11, 2018. Stockholm, Sweden

Available online at www.sciencedirect.com

ScienceDirect

IFAC PapersOnLine 51-15 (2018) 856–861

Weighted Null-Space Fitting for Identification of Cascade Networks

Miguel Galrinho ∗ Riccardo Prota ∗∗ Mina Ferizbegovic ∗ Håkan Hjalmarsson ∗

∗ Department of Automatic Control, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden (e-mail: {galrinho,minafe,hjalmars}@kth.se)
∗∗ Department of Information Engineering, University of Padova, Padova, Italy (e-mail: [email protected])

Abstract: For identification of systems embedded in dynamic networks, the prediction error method (PEM) with a correct parametrization of the complete network provides asymptotically efficient estimates. However, the network complexity often hinders a successful application of PEM, which requires minimizing a non-convex cost function that can become more intricate for more complex networks. For this reason, identification in dynamic networks often focuses on obtaining consistent estimates of modules of interest. A downside of these approaches is that splitting the network into several modules for identification often costs asymptotic efficiency. In this paper, we consider dynamic networks with the modules connected in serial cascade, with measurements affected by sensor noise. We propose an algorithm that estimates all the modules in the network simultaneously without requiring the minimization of a non-convex cost function. This algorithm is an extension of Weighted Null-Space Fitting (WNSF), a weighted least-squares method that provides asymptotically efficient estimates for single-input single-output systems. We illustrate the performance of the algorithm with simulation studies, which suggest that a network WNSF method may also be asymptotically efficient when applied to cascade structures. Finally, we discuss the possibility of extension to more general networks affected by sensor noise.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: System identification, least-squares identification, networks

1. INTRODUCTION

Identification in dynamic networks gives rise to several problems and approaches. A first problem is the detection of the network topology (e.g., Materassi and Innocenti, 2010). A second problem is determining conditions for network identifiability (Weerts et al., 2015; Gevers et al., 2017). A third problem concerns variance analysis (Everitt et al., 2013, 2014). Finally, there is the problem of developing appropriate methods for identification of the systems in the network, which is the problem considered in this paper.

To estimate particular modules in dynamic networks where all the nodes are observed with process noise (but no sensor noise), the inputs to every module are known exactly, and the prediction error method (Ljung, 1999) can be applied to a multi-input single-output (MISO) model (Van den Hof et al., 2013). A key issue to obtain consistent estimates is how to choose the signals that should be included in the predictor as inputs because they influence the output (Dankers et al., 2016). Also, similarly to the standard closed-loop case (Forssell and Ljung, 1999), a noise model that is in the model set must be estimated to obtain consistent estimates; for this reason, two-stage methods (Van den Hof and Schrama, 1993), which do not have this requirement, are also considered in the aforementioned works. If the objective is to identify the whole network, PEM can be applied with a multi-input multi-output (MIMO) model of the whole network, which with a correct parametrization provides asymptotically efficient estimates. The main limitation with this approach is that the non-convex cost function of PEM becomes more complicated as the size of the network increases.

If there is also sensor noise, PEM cannot be applied to particular modules using internal network signals, because the inputs are noisy. In this case, instrumental variable (IV) methods (Söderström and Stoica, 1983) have been applied to provide consistent estimates of particular modules in the network (Dankers et al., 2015). The approaches by Everitt et al. (2016) and Galrinho et al. (2017) handle this by using external excitations and auxiliary non-parametric models, and often provide more accurate estimates than IV. In this case, because of the presence of the different noise sources, applying PEM to the whole network with a correct parametrization to obtain asymptotically efficient estimates is often a very complex problem.

For simplicity of presentation, we consider a particular case of dynamic networks: serial cascade networks, where at each node (i.e., between each module) there is either a known external excitation or a measurement affected by sensor noise (i.e., not all nodes are necessarily measured). Although PEM can be applied to a tailor-made parametrization of this type of network, the non-convexity of the cost function is a concern. Wahlberg et al. (2009) have pointed out how indirect PEM (Söderström et al., 1991) can be useful to provide asymptotically efficient estimates, but only with models for which the predictor is linear in the parameters. In some cases, subspace methods (Van Overschee and De Moor, 2012) can be applied (Wahlberg and Sandberg, 2008; Hägg et al., 2010), but they in general do not provide asymptotically efficient estimates.

The main purpose of this paper is to propose a method that provides asymptotically efficient estimates of all the cascade network modules without solving non-convex optimizations. The proposed method is based on the Weighted Null-Space Fitting (WNSF) method, a weighted least-squares method proposed by Galrinho et al. (2014) as an alternative to the non-convexity of PEM. It has been shown by Galrinho et al. (2018) that the method is consistent and asymptotically efficient for single-input single-output (SISO) systems. In the network case, the challenge is that WNSF must incorporate the network structure, as standard MIMO models in general do not. In this paper, we propose an algorithm to do this. To keep notation simple, we consider one example, and later elaborate on the extension to other networks. In a simulation study, we illustrate the robustness of the method compared to PEM, which is prone to converge to non-global minima. Supported by the simulation study and the theoretical analysis in Galrinho et al. (2018), there are strong reasons to believe that the WNSF network extension is asymptotically efficient.

Notation. Let x be a p-dimensional column vector. Then, Tn×m{x} (n ≥ p, n ≥ m) is the n × m Toeplitz matrix with first column [x⊤ 0_{1×(n−p)}]⊤ and first row [x1 0_{1×(m−1)}], with x1 the first element of x.

This work was supported by the Swedish Research Council through the project System identification: Unleashing the algorithms (contract 2015-05285) and the research environment NewLEADS: New Directions in Learning Dynamical Systems (contract 2016-06079).

2405-8963 © 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2018.09.116
Fig. 1. Serial cascade network example (u1 enters G1; u2 enters at the node between G1 and G2; y1 and y2 are measured at the outputs of G2 and G3, with sensor noises e1 and e2).

2. PROBLEM STATEMENT

Consider the cascade network in Fig. 1, where {Gj(q)}, j = 1, 2, 3, are stable transfer functions (q is the forward-shift operator), {u1(t), u2(t)} are known inputs, and {y1(t), y2(t)} are measured outputs subject to mutually independent white Gaussian noises {e1(t), e2(t)} with variances λ1 and λ2. The relation between these signals can be written as

y(t) = G(q)u(t) + e(t),   (1)

where

u(t) = [u1(t); u2(t)],   y(t) = [y1(t); y2(t)],   e(t) = [e1(t); e2(t)],

and (argument q omitted for notational simplicity)

G(q) = [ G2 G1   G2 ; G3 G2 G1   G3 G2 ].

For simplicity of notation, we consider this network example for the paper, and generalizations are discussed in Section 7.

For the model, we consider that each Gj(q) is parametrized by a rational transfer function in the parameter vector θj:

Gj(q, θj) = Bj(q, θj) / Fj(q, θj),   (2)

where Bj(q, θj) and Fj(q, θj) are polynomials

Fj(q, θj) = 1 + f1^(j) q^{-1} + · · · + fm^(j) q^{-m},
Bj(q, θj) = b0^(j) + b1^(j) q^{-1} + · · · + b_{m−1}^(j) q^{-(m−1)},   (3)

and

θj = [f1^(j) · · · fm^(j)  b0^(j) · · · b_{m−1}^(j)]⊤.   (4)

Although each polynomial in every transfer function may have a different number of parameters, as well as number of delays in the numerator, we assume the structure in (3) for simplicity of notation. This gives the cascade model

y(t) = G(q, θ)u(t) + e(t),   (5)

where θ = [θ1⊤ θ2⊤ θ3⊤]⊤. We assume also that there is θj^o such that Gj(q) = Gj(q, θj^o) and that the network is identifiable. The problem considered in this paper is how to obtain consistent and asymptotically efficient estimates of θ without the need of minimizing a non-convex cost function.
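To make the parameter layout in (3)-(4) concrete, the following sketch shows how a parameter vector θj maps to the coefficient arrays of Bj and Fj. This is an illustration only, not part of the original presentation; the language (Python/NumPy) and the function name are our own choices, and the example values are those of G2(q) used later in Section 6.

```python
import numpy as np

def theta_to_polys(theta_j, m):
    """Split theta_j = [f_1 ... f_m  b_0 ... b_{m-1}] (cf. (4)) into the
    denominator F_j(q) = 1 + f_1 q^-1 + ... + f_m q^-m and the
    numerator  B_j(q) = b_0 + b_1 q^-1 + ... + b_{m-1} q^-(m-1)."""
    theta_j = np.asarray(theta_j, dtype=float)
    f = np.concatenate(([1.0], theta_j[:m]))  # denominator coefficients of F_j
    b = theta_j[m:2 * m]                      # numerator coefficients of B_j
    return b, f

# Example: G_2(q) = (0.6 - 0.2 q^-1) / (1 - 1.3 q^-1 + 0.6 q^-2), m = 2.
b2, f2 = theta_to_polys([-1.3, 0.6, 0.6, -0.2], m=2)
```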
3. PREDICTION ERROR METHOD

In this section, we review how the prediction error method (PEM) can be applied to the cascade network in Fig. 1 to obtain an asymptotically efficient estimate of θ. We discuss the limitation of non-convexity and how initial estimates can be obtained. This limitation serves as motivation for the method we will propose.

The idea of PEM is to minimize a cost function of the prediction errors. For the cascade network considered in this paper, the prediction errors are given by ε(t, θ) = y(t) − G(q, θ)u(t), where, at the true parameters, ε(t, θ^o) is a white noise sequence. Then, PEM with a quadratic cost consists of minimizing

VN(θ) = det[ (1/N) Σ_{t=1}^{N} ε(t, θ) ε⊤(t, θ) ],   (6)

where N is the sample size. The global minimizer θ̂N of this cost function is an asymptotically efficient estimate of θ. The problem is that this is, in general, a non-convex cost function, for which optimization algorithms can converge to a non-global minimum. Although this is the case also for SISO systems, the cascade case is of more concern. First, the size of the problem increases with the number of systems in the network; second, methods to provide initialization points for PEM may not be directly applicable to cascade networks.

Nevertheless, the following procedure could be applied to provide initial estimates for the cost function (6). First, estimate an over-parametrized MIMO OE model with

G(q, η) = [ G11(q, η11)   G12(q, η12) ; G21(q, η21)   G22(q, η22) ],   (7)
where this parametrization is defined similarly to (2), (3) and (4). The difference is that the structure (1) of
the cascade is not captured by this parametrization, and we now have a standard MIMO OE problem. Although this problem also requires minimizing a non-convex cost function, standard methods are available to initialize it, such as instrumental variable (Söderström and Stoica, 1983) or subspace methods (Van Overschee and De Moor, 2012). Second, using G(q, θ) = G(q, η), let an estimate of G2(q, θ̂2) be given by G12(q, η̂12), and consider

Ĝ1(q) = G11(q, η̂11) / G12(q, η̂12),   Ĝ3(q) = G22(q, η̂22) / G12(q, η̂12)   (8)

as estimates of G1(q) and G3(q). Except for G2(q), these estimates do not have the desired structure (i.e., because of the noise in the estimates η̂12 and η̂22, there will not be any pole-zero cancellation, and Ĝ1(q) and Ĝ3(q) will be over-parametrized). Third, apply a model order reduction technique to Ĝ1(q) and Ĝ3(q) in order to obtain estimates with the desired structures G1(q, θ̂1) and G3(q, θ̂3).

Although this procedure provides initial estimates for (6) by solving standard PEM problems, the risk of converging to non-global minima in some cost function is still high, as the problems are MIMO. As basis to propose a method for the cascade problem that does not have this limitation, we now consider WNSF for SISO OE models.

4. WEIGHTED NULL-SPACE FITTING

The weighted null-space fitting (WNSF) method was introduced by Galrinho et al. (2014) and is consistent and asymptotically efficient under similar assumptions as PEM (Galrinho et al., 2017). We now review the method, which we will extend for cascade networks.

Consider an OE model as (5) with a SISO transfer function G(q, θ). WNSF is a three-step weighted least-squares method to estimate θ. First, a non-parametric model is estimated by least squares. Second, the estimated non-parametric model is reduced to a parametric model by least squares. Third, the parametric model is re-estimated by weighted least squares, where the weighting is constructed using the parametric estimate obtained in Step 2.

Step 1: Non-parametric model

In the first step, we approximate the parametric OE model with the non-parametric FIR model y(t) = Σ_{k=0}^{n−1} gk q^{-k} u(t) + e(t), where n is sufficiently large so that the bias error by truncation is negligible. The PEM estimate of g^n = [g0 g1 · · · g_{n−1}]⊤ is then obtained by least squares with

ĝ^n = [ (1/N) Σ_{t=1}^{N} ϕ(t)ϕ⊤(t) ]^{-1} (1/N) Σ_{t=1}^{N} ϕ(t)y(t),   (9)

where ϕ(t) = [u(t) u(t−1) · · · u(t−n+1)]⊤. If we assume the error made by truncation is negligible, the noise in the non-parametric estimate Δ^n_g = ĝ^n − g^n_o (g^n_o are the first n true impulse response coefficients) is distributed as Δ^n_g ∼ N(0, P), where N is the normal distribution and

P = [ (1/λ) Σ_{t=1}^{N} ϕ(t)ϕ⊤(t) ]^{-1}.   (10)
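As an illustration of Step 1, the sketch below estimates the truncated impulse response by least squares as in (9) and returns the covariance (10) up to the scalar λ. It is a minimal Python/NumPy rendering of the equations above (assuming zero initial conditions for u), not the authors' implementation; the function and variable names are ours.

```python
import numpy as np

def fir_least_squares(u, y, n):
    """Step 1: least-squares estimate of g^n = [g_0 ... g_{n-1}] as in (9),
    together with the covariance (10) up to the unknown scalar lambda."""
    N = len(y)
    Phi = np.zeros((N, n))          # row t is phi(t)^T = [u(t), u(t-1), ..., u(t-n+1)]
    for k in range(n):
        Phi[k:, k] = u[:N - k]      # samples of u before t = 0 taken as zero
    g_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # equation (9)
    P_over_lambda = np.linalg.inv(Phi.T @ Phi)        # equation (10) divided by lambda
    return g_hat, P_over_lambda
```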
Step 2: Estimation of Parametric Model

In the second step, we use the relation

B(q, θ) / F(q, θ) = Σ_{k=0}^{n−1} gk q^{-k}   (11)

to obtain an estimate of θ from the estimate of g^n we have obtained in Step 1. This is done by re-writing (11) as F(q, θ) Σ_{k=0}^{n−1} g^n_k q^{-k} − B(q, θ) = 0, which can be re-written in matrix form as

g^n − Q(g^n)θ = 0,   (12)

where

Q(g^n) = [ −Tn×m{Γn g^n}   [Im×m; 0_{(n−m)×m}] ],

with

Γn = [ 0   0_{1×(n−1)} ; I_{n−1}   0_{(n−1)×1} ].

Because (12) is linear in θ, we may replace g^n by the estimate ĝ^n from Step 1, and solve for θ with least squares:

θ̂_LS = [Q⊤(ĝ^n)Q(ĝ^n)]^{-1} Q⊤(ĝ^n)ĝ^n.   (13)
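The following sketch renders Step 2 in the same style: the helper toeplitz_nm builds the matrix Tn×m{x} from the Notation paragraph, Q(ĝ^n) is assembled as in (12), and (13) is solved by least squares. The function names and the NumPy rendering are ours, not the authors' code.

```python
import numpy as np

def toeplitz_nm(x, n, m):
    """T_{n x m}{x}: Toeplitz matrix with first column [x^T 0 ... 0]^T and
    first row [x_1 0 ... 0], as in the Notation paragraph."""
    col = np.zeros(n)
    col[:len(x)] = x
    T = np.zeros((n, m))
    for j in range(m):
        T[j:, j] = col[:n - j]
    return T

def wnsf_step2(g_hat, m):
    """Step 2: build Q(g^n) as in (12) and solve (13); returns
    theta_LS = [f_1 ... f_m  b_0 ... b_{m-1}] and Q for later reuse."""
    n = len(g_hat)
    shifted = np.concatenate(([0.0], g_hat[:-1]))                   # Gamma_n g^n
    Q = np.hstack([-toeplitz_nm(shifted, n, m),                     # multiplies the f-part of theta
                   np.vstack([np.eye(m), np.zeros((n - m, m))])])   # multiplies the b-part
    theta_ls, *_ = np.linalg.lstsq(Q, g_hat, rcond=None)            # equation (13)
    return theta_ls, Q
```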
Step 3: Re-estimation of Parametric Model

The idea of the third step is to re-estimate θ with a statistically sound approach. We do this by replacing the noisy estimate ĝ^n in (12) and re-writing the residuals as

ĝ^n − Q(ĝ^n)θ = g^n + Δ^n_g − Q(g^n + Δ^n_g)θ = [g^n − Q(g^n)θ] + [Δ^n_g − Q̃(Δ^n_g)θ],   (14)

where Q̃ is defined similarly to Q but with the identity matrix in the top-right block replaced by the zero matrix. Using (12), (14) can be re-written as

ĝ^n − Q(ĝ^n)θ = T(θ)Δ^n_g,   (15)

where T(θ) = Tn×n{[1 f1 · · · fm]⊤}. If the residuals we try to minimize when we replace g^n by ĝ^n in (12) are given by (15), the estimate of θ with minimum variance is given by solving a weighted least-squares problem, where the weighting is the inverse of the covariance of the residuals (15). This covariance is given by

W^{-1}(θ) = T(θ) P T⊤(θ).   (16)

Because the true value of θ is not available to compute (16), we replace it by the estimate θ̂_LS obtained in Step 2. The noise variance λ in P (10) is also typically unknown, but because it is a scalar, it can be disregarded in the weighting. Then, we re-estimate θ using

θ̂_WLS = [Q⊤(ĝ^n)W(θ̂_LS)Q(ĝ^n)]^{-1} Q⊤(ĝ^n)W(θ̂_LS)ĝ^n.   (17)

The estimate θ̂_WLS is an asymptotically efficient estimate of θ. However, for finite sample size, continuing to iterate, constructing the weighting with the estimate obtained at the previous iteration, may provide an improvement.
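Continuing the sketch (and reusing toeplitz_nm and the output of wnsf_step2 above), Step 3 forms the weighting from (16) with the Step 2 estimate and solves (17); an optional extra iteration rebuilds the weighting from the latest estimate, as discussed above. Again, this is our own illustrative rendering.

```python
import numpy as np

def wnsf_step3(g_hat, Q, theta_ls, P, m, iters=1):
    """Step 3: weighted least squares (17) with the weighting inverse
    W^-1(theta) = T(theta) P T(theta)^T from (16); P may be known only
    up to the scalar lambda without affecting the estimate."""
    n = len(g_hat)
    theta = theta_ls
    for _ in range(iters):
        f = np.concatenate(([1.0], theta[:m]))   # [1 f_1 ... f_m]
        T = toeplitz_nm(f, n, n)                 # T(theta) = T_{n x n}{[1 f_1 ... f_m]}
        W = np.linalg.inv(T @ P @ T.T)           # inverse of the residual covariance (16)
        theta = np.linalg.solve(Q.T @ W @ Q, Q.T @ W @ g_hat)   # equation (17)
    return theta
```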
Remark 1. Three properties are required to apply WNSF:
(1) the parametric model of interest can be approximated by a non-parametric model estimated by least squares;
(2) the non-parametric and parametric models can be related using the form (12) (i.e., linear in the parametric model parameters);
(3) the residuals (15) are linear in the error of the non-parametric estimate, so that a closed-form expression for the covariance can be obtained.
5. WEIGHTED NULL-SPACE FITTING FOR CASCADE NETWORKS

In this section, we extend WNSF for cascade networks, using the network in Fig. 1 as example. We follow the WNSF steps from Section 4.

Step 1: Non-parametric model

The key aspect of Step 1, as stated in Remark 1, is to estimate with least squares a non-parametric model that can approximate the parametric model of interest. For the cascade network in Fig. 1, we use the non-parametric MIMO FIR model

[ y1(t) ; y2(t) ] = [ G̃11(q, g^n_11)   G̃12(q, g^n_12) ; G̃21(q, g^n_21)   G̃22(q, g^n_22) ] u(t) + [ e1(t) ; e2(t) ],   (18)

where, for i, j = {1, 2}, G̃ij(q, g^n_ij) = Σ_{k=0}^{n−1} g_k^(ij) q^{-k} and g^n_ij = [g_0^(ij) g_1^(ij) · · · g_{n−1}^(ij)]⊤. Then, the PEM estimate of g^n = [(g^n_11)⊤ (g^n_12)⊤ (g^n_21)⊤ (g^n_22)⊤]⊤ is obtained by least squares with (9), but with y(t) = [y1(t) y2(t)]⊤ and

ϕ(t) = [ ϕ11(t)   0 ; ϕ12(t)   0 ; 0   ϕ21(t) ; 0   ϕ22(t) ],

where in this case, because all modules have no delay,

ϕ11(t) = [u1(t) u1(t−1) · · · u1(t−n+1)]⊤ = ϕ21(t),
ϕ12(t) = [u2(t) u2(t−1) · · · u2(t−n+1)]⊤ = ϕ22(t).

Now, the covariance of ĝ^n is given by, similarly to (10) but for the MIMO case,

P = [ Σ_{t=1}^{N} ϕ(t) Λ^{-1} ϕ⊤(t) ]^{-1},   (19)

where Λ is a diagonal matrix with diagonal [λ1 λ2].
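A sketch of this first step is given below. Because ϕ11 = ϕ21, ϕ12 = ϕ22, and Λ is diagonal, the least-squares problem splits into one ordinary problem per output, which the code exploits; the covariance (19) is then block diagonal, with blocks λ1(Σ φφ⊤)^{-1} and λ2(Σ φφ⊤)^{-1} for [g11; g12] and [g21; g22], where φ(t) = [ϕ11⊤(t) ϕ12⊤(t)]⊤. The function and variable names are ours.

```python
import numpy as np

def cascade_fir_step1(u1, u2, y1, y2, n):
    """Step 1 for the cascade network: least-squares estimate of the four
    FIR vectors in (18). The two outputs share the same regressors, so the
    problem splits into one least-squares fit per output."""
    N = len(y1)
    def lags(u):
        M = np.zeros((N, n))
        for k in range(n):
            M[k:, k] = u[:N - k]
        return M
    Phi = np.hstack([lags(u1), lags(u2)])                # rows are [phi_11(t)^T, phi_12(t)^T]
    g_row1, *_ = np.linalg.lstsq(Phi, y1, rcond=None)    # stacks g11 and g12
    g_row2, *_ = np.linalg.lstsq(Phi, y2, rcond=None)    # stacks g21 and g22
    g11, g12 = g_row1[:n], g_row1[n:]
    g21, g22 = g_row2[:n], g_row2[n:]
    R_inv = np.linalg.inv(Phi.T @ Phi)   # lambda_i * R_inv are the diagonal blocks of (19)
    return g11, g12, g21, g22, R_inv
```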
Step 2: Estimation of Parametric Model

The second step consists of estimating θ from the non-parametric estimate. The key aspect, as mentioned in Remark 1, is that the relation between the non-parametric and the parametric model can be written linearly in the parameters θ, as in (12). In this case, we have

[ Gθ2 Gθ1   Gθ2 ; Gθ3 Gθ2 Gθ1   Gθ3 Gθ2 ] = [ G̃g11   G̃g12 ; G̃g21   G̃g22 ],   (20)

where we used the simplified notation Gθj = Gj(q, θj) and G̃gij = G̃ij(q, g^n_ij). Because of the products between transfer functions parametrized by θ, this cannot be written linearly in θ as in (12). Thus, the second point in Remark 1 is not satisfied, and WNSF cannot be applied directly.

To solve this problem, we re-write (20) as

[ Gθ1 G̃g12   Gθ2 ; Gθ1 G̃g22   Gθ3 G̃g12 ] = [ G̃g11   G̃g12 ; G̃g21   G̃g22 ],   (21)

where we replaced some of the products of θ by non-parametric models. Replacing Gθj with the respective numerator and denominator, we can re-write (21) as

[ Fθ1 G̃g11 − Bθ1 G̃g12   Fθ2 G̃g12 − Bθ2 ; Fθ1 G̃g21 − Bθ1 G̃g22   Fθ3 G̃g22 − Bθ3 G̃g12 ] = 0,   (22)

which is linear in θ, and can be written as (12) with

Q(g^n) = [ Q1(g^n)   0   0 ; 0   Q2(g^n)   0 ; Q3(g^n)   0   0 ; 0   0   Q4(g^n) ],

where

Q1(g^n) = [ −Tn×m{Γn g^n_11}   Tn×m{g^n_12} ],
Q2(g^n) = [ −Tn×m{Γn g^n_12}   [Im×m; 0_{(n−m)×m}] ],
Q3(g^n) = [ −Tn×m{Γn g^n_21}   Tn×m{g^n_22} ],
Q4(g^n) = [ −Tn×m{Γn g^n_22}   Tn×m{g^n_12} ].

Then, the estimate ĝ^n from Step 1 can be used to obtain the estimate θ̂_LS with least squares, using (13).
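The sketch below assembles Q(g^n) from the blocks Q1-Q4 and solves the resulting least-squares problem, mirroring (22) and (13). It reuses the toeplitz_nm helper from the SISO sketch, and the ordering of θ = [θ1⊤ θ2⊤ θ3⊤]⊤ follows (4); as before, this is our own illustrative rendering.

```python
import numpy as np

def cascade_step2(g11, g12, g21, g22, m):
    """Step 2 for the cascade network: stack the four relations in (22) as
    g^n - Q(g^n) theta = 0 and solve for theta = [theta_1; theta_2; theta_3]."""
    n = len(g11)
    shift = lambda g: np.concatenate(([0.0], g[:-1]))     # Gamma_n g
    I0 = np.vstack([np.eye(m), np.zeros((n - m, m))])
    Z = np.zeros((n, 2 * m))
    Q1 = np.hstack([-toeplitz_nm(shift(g11), n, m), toeplitz_nm(g12, n, m)])
    Q2 = np.hstack([-toeplitz_nm(shift(g12), n, m), I0])
    Q3 = np.hstack([-toeplitz_nm(shift(g21), n, m), toeplitz_nm(g22, n, m)])
    Q4 = np.hstack([-toeplitz_nm(shift(g22), n, m), toeplitz_nm(g12, n, m)])
    Q = np.block([[Q1, Z, Z],
                  [Z, Q2, Z],
                  [Q3, Z, Z],
                  [Z, Z, Q4]])
    g = np.concatenate([g11, g12, g21, g22])
    theta_ls, *_ = np.linalg.lstsq(Q, g, rcond=None)      # as in (13)
    return theta_ls, Q, g
```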
Remark 2. The replacement from (20) to (21) to achieve linearity in θ is not unique. For example,

[ Gθ1 G̃g12   Gθ2 ; Gθ3 G̃g11   Gθ3 G̃g12 ] = [ G̃g11   G̃g12 ; G̃g21   G̃g22 ]   (23)

is also possible. As we will see in Section 6, simulations support that the replacement chosen does not change the asymptotic properties of the estimate.

Remark 3. Although not all equations defined in (22) need to be used to obtain consistent estimates of all transfer functions (in this case, there are two equations that determine Gθ1), discarding "redundant" equations will have a cost in terms of efficiency.

Step 3: Re-estimation of Parametric Model

In the third step, we re-estimate θ by weighted least squares. The key aspect to derive the weighting matrix, as mentioned in Remark 1, is that the residuals of (22) when G̃gij are replaced by estimates can be written linearly in the non-parametric estimate noise, as in (15). This will be possible because (22) can be written linearly in g^n. In this case, T(θ) is given by

T(θ) = [ T11(θ)   0_{2n×2n} ; T21(θ)   T22(θ) ],

where

T11(θ) = [ Tn×n{f^(1)}   −Tn×n{b^(1)} ; 0_{n×n}   Tn×n{f^(2)} ],
T21(θ) = [ 0_{n×n}   0_{n×n} ; 0_{n×n}   −Tn×n{b^(3)} ],
T22(θ) = [ Tn×n{f^(1)}   −Tn×n{b^(1)} ; 0_{n×n}   Tn×n{f^(3)} ],

with f^(j) = [1 f1^(j) · · · fm^(j)]⊤ and b^(j) = [b0^(j) · · · b_{m−1}^(j)]⊤, while the covariance of Δ^n_g is given by (19). In the SISO case, it was given by (10), where λ was a scalar and could be neglected in the weighting. Now, this is replaced by Λ, which is a matrix. Thus, it cannot be ignored in the weighting and must be estimated from data using Λ̂ = (1/N) Σ_{t=1}^{N} ε(t, θ̂_LS)ε⊤(t, θ̂_LS), where the off-diagonal elements can then be replaced with zeros under the assumption that the noise sources are mutually independent. Then, if we denote by P̂ the matrix (19) with Λ replaced by Λ̂, we can re-estimate θ with weighted least squares using (17), where

W^{-1}(θ) = T(θ) P̂ T⊤(θ).   (24)

Because of the similarities with the SISO case and the theoretical analysis by Galrinho et al. (2018), there are reasons to believe that θ̂_WLS is an asymptotically efficient estimate of θ. Also as in the SISO case, for finite sample size, it may be possible to improve the accuracy by iterating, where the weighting at some iteration is constructed using the estimate from the previous iteration.
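For completeness, the sketch below assembles T(θ) from the blocks above, forms the weighting (24), and re-estimates θ with (17). It assumes P̂ has already been built as in (19) with Λ̂ and with the same ordering [g11; g12; g21; g22] used in Step 2, and it reuses toeplitz_nm; this is our own illustrative rendering, not the authors' code.

```python
import numpy as np

def cascade_step3(g, Q, theta_ls, P_hat, m):
    """Step 3 for the cascade network: weighted least squares (17) with the
    weighting inverse (24), W^-1(theta) = T(theta) P_hat T(theta)^T."""
    n = len(g) // 4
    t = lambda j: theta_ls[2 * m * j: 2 * m * (j + 1)]                    # theta_{j+1}
    Tf = lambda th: toeplitz_nm(np.concatenate(([1.0], th[:m])), n, n)    # T{f^(j)}
    Tb = lambda th: toeplitz_nm(th[m:], n, n)                             # T{b^(j)}
    Z = np.zeros((n, n))
    # Rows follow the four relations in (22); columns follow the noise in
    # [g11, g12, g21, g22].
    T = np.block([[Tf(t(0)), -Tb(t(0)), Z,         Z        ],
                  [Z,         Tf(t(1)), Z,         Z        ],
                  [Z,         Z,        Tf(t(0)), -Tb(t(0)) ],
                  [Z,        -Tb(t(2)), Z,         Tf(t(2)) ]])
    W = np.linalg.inv(T @ P_hat @ T.T)                    # inverse of (24)
    return np.linalg.solve(Q.T @ W @ Q, Q.T @ W @ g)      # equation (17)
```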
Algorithm 1. The following WNSF algorithm can be applied to identify the cascade network in Fig. 1:
(1) estimate the non-parametric FIR model (18) with least squares (9);
(2) estimate the parametric model with least squares (13), using (22) or (23) with the non-parametric estimate;
(3) re-estimate the parametric model with weighted least squares (17), with (24) the weighting inverse.

6. SIMULATIONS

In this section, we perform a simulation study to illustrate the performance of the method. Mainly, we observe how the simulation supports that the method is asymptotically efficient, and how it is robust against convergence to non-global minima of the PEM cost function.

In the simulation, we use the network in Fig. 1 with

G1(q) = (0.7q^{-1} + 0.5q^{-2}) / (1 − 1.2q^{-1} + 0.5q^{-2}),
G2(q) = (0.6 − 0.2q^{-1}) / (1 − 1.3q^{-1} + 0.6q^{-2}),
G3(q) = (0.6 + 0.8q^{-1} − 1.2q^{-2}) / (1 − 0.75q^{-1} + 0.56q^{-2}).

The noises {e1(t), e2(t)} have variances 2 and 3, respectively, and the inputs {u1(t), u2(t)} are given by uj(t) = (1 − 0.9q^{-1})^{-1} euj(t), where {eu1(t), eu2(t)} are unit-variance mutually-uncorrelated Gaussian white-noise sequences. We compare the following methods:

• minimization of the cost function (6) initialized at the true parameter values (PEM-true);
• minimization of the cost function (6) initialized according to the procedure described in Section 3 (PEM-oe);
• weighted null-space fitting according to the relations (22) (denoted WNSF-1);
• weighted null-space fitting according to the relations (23) (denoted WNSF-3).

For the PEM methods, the network structure is implemented with the MATLAB function greyest, and the cost function is then minimized using the pem function, with a maximum of 1000 iterations. For the initialization of PEM-oe, we first estimate the over-parametrized MIMO OE model (7) using the MATLAB function oe with default initialization. Then, an estimate of G2(q, θ) is readily available through G12(q, η̂12), while estimates of G1(q, θ) and G3(q, θ) are obtained by performing model order reduction on the over-parametrized estimates (8); for this, we use the MATLAB function reduce. For WNSF, we apply the algorithm for a grid of non-parametric model orders n = {20, 30, 40} (all the polynomials in the MIMO FIR were chosen to have the same order) and a maximum of 1000 iterations, and the parametric estimate that minimizes the PEM criterion (6) is chosen. This is the same approach that has been used for the SISO case (Galrinho et al., 2018).

We perform 1000 Monte Carlo runs for seven sample sizes N logarithmically spaced between 300 and 60000 (rounded to the nearest integer).
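The following sketch shows how one data set from this setup can be generated; it is a Python/SciPy rendering of the systems, noise variances, and input filters given above, and the function itself and the seeding are our own choices.

```python
import numpy as np
from scipy.signal import lfilter

def generate_cascade_data(N, seed=0):
    """One realization of the Fig. 1 network with the systems and signals
    of the simulation study (coefficients in powers of q^-1)."""
    rng = np.random.default_rng(seed)
    # Inputs u_j(t) = (1 - 0.9 q^-1)^-1 e_uj(t), with unit-variance white noise.
    u1 = lfilter([1.0], [1.0, -0.9], rng.standard_normal(N))
    u2 = lfilter([1.0], [1.0, -0.9], rng.standard_normal(N))
    G1 = ([0.0, 0.7, 0.5], [1.0, -1.2, 0.5])
    G2 = ([0.6, -0.2], [1.0, -1.3, 0.6])
    G3 = ([0.6, 0.8, -1.2], [1.0, -0.75, 0.56])
    x = lfilter(*G1, u1) + u2               # node between G1 and G2, where u2 enters
    z = lfilter(*G2, x)                     # noise-free y1
    y1 = z + np.sqrt(2.0) * rng.standard_normal(N)                 # e1 has variance 2
    y2 = lfilter(*G3, z) + np.sqrt(3.0) * rng.standard_normal(N)   # e2 has variance 3
    return u1, u2, y1, y2
```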
Fig. 2. Average MSE as function of sample size N (legend: PEM-oe, PEM-true, WNSF-1, WNSF-3).

Table 1. Average computational times (in seconds) for several sample sizes.

N:         300    725    1754   4243   10260   24811   60000
PEM-oe:    40     24     19     18     18      26      41
PEM-true:  11     10     9.1    10     12      18      31
WNSF-1:    5.0    2.4    1.7    1.6    1.8     3.0     6.1
WNSF-3:    4.7    2.2    1.6    1.5    1.8     2.9     6.1
The results are presented in Fig. 2, where we plot the average mean-square error (MSE = ||θ̂ − θ^o||²) as function of the sample size N, where θ̂ is the estimate obtained for a particular method and sample size. In Table 1, we present the average computational times for each method and sample size N (although most methods show longer computational time for low N, this is probably unrelated to the network setting, but to the need for more iterations to converge when the data are less informative).

We draw the following conclusions from these results. First, the two alternative procedures (21) and (23) to make the equations for WNSF linear perform similarly. Second, initializing PEM by estimating an over-parametrized MIMO OE model followed by model reduction often did not provide accurate enough estimates to attain the global minimum of the cost function, as the MSE is considerably different from when PEM is initialized at the true parameters. Third, WNSF is an appropriate method to avoid the non-convexity of PEM, as it did not converge to low-performance non-global minima. Fourth, the computational time was considerably lower for WNSF than for PEM, even when initialized at the true parameters (for PEM-oe the times in Table 1 do not take into account the MIMO OE estimate for initialization). Fifth, as supported by the SISO analysis by Galrinho et al. (2018), the delineated network WNSF procedure seems to be asymptotically efficient.

7. DISCUSSION

In this paper, we proposed a method for identification of networks with sensor noise. The method is based on WNSF, which has been shown to be asymptotically efficient for SISO models (Galrinho et al., 2018). A simulation supports that this extension of WNSF is still asymptotically efficient, and the aforementioned work gives theoretical support to this idea. We also saw that the method proposed allows for different formulations when constructing the equations that link the non-parametric and parametric models. Although these formulations should be asymptotically the same (as pointed out in Remark 2 and supported by the simulation), there could be differences in estimation accuracy for finite sample sizes. More thorough simulations will be performed in the future to investigate this.
Fig. 3. Another serial cascade network example (u1 enters G1; y1 is measured at the output of G1, with sensor noise e1; u2 enters at the node between G2 and G3; y2 is measured at the output of G3, with sensor noise e2).

For simplicity of presentation, we introduced the method for the network in Fig. 1. We now address how the method generalizes to other networks. In particular, serial cascade networks for which all nodes with an input appear prior to all nodes with outputs are straightforwardly covered by the proposed algorithm. For space concerns, this case is not detailed here in general, but it will be considered in a future contribution. Moreover, if additional inputs or outputs exist, the algorithm is also applicable straightforwardly, but there will be additional equations to consider.

Serial cascade networks that are not covered straightforwardly occur when not all nodes with inputs appear prior to all nodes with outputs. Consider the cascade network in Fig. 3. In this case, the relation between the parametric and non-parametric models would be given by

[ Gθ1   0 ; Gθ3 Gθ2 Gθ1   Gθ3 ] − [ G̃g11   0 ; G̃g21   G̃g22 ] = 0.   (25)
Although (25) can be made linear in θ by replacing Gθ3 Gθ2 Gθ1 = G̃g22 Gθ2 G̃g11, a closed-form expression for the weighting is not available, because it will not be possible to write the residuals of (25) linearly in the non-parametric estimate noise (the third criterion in Remark 1 is not satisfied). This is a consequence of the product G̃g11 G̃g22. Solving this requires an intermediate step, which will be detailed in a future contribution.

If the measurement noise is colored (and possibly correlated between the different outputs), a non-parametric ARX model should be estimated instead of an FIR, as in the SISO case (Galrinho et al., 2014). However, differently from the SISO case, products of non-parametric ARX polynomials will now always appear in Step 2, similarly to (25). This hinders successful application of Step 3, and solving it requires a similar approach as to solve the aforementioned case, which will be detailed in the same contribution.

These contributions will complete the proposed algorithm for serial cascade networks. However, the same principles can be used to apply WNSF to other network structures affected by sensor noise. These contributions will then serve as basis to propose a more general network WNSF method.

REFERENCES

Dankers, A., Van den Hof, P., Bombois, X., and Heuberger, P. (2015). Errors-in-variables identification in dynamic networks—consistency results for an instrumental variable approach. Automatica, 62, 39–50.
Dankers, A., Van den Hof, P.M., Bombois, X., and Heuberger, P.S. (2016). Identification of dynamic models in complex networks with prediction error methods: Predictor input selection. IEEE Transactions on Automatic Control, 61(4), 937–952.
Everitt, N., Bottegal, G., Rojas, C.R., and Hjalmarsson, H. (2016). Identification of modules in dynamic networks: An empirical Bayes approach. In 55th IEEE Conference on Decision and Control, 4612–4617.
Everitt, N., Rojas, C.R., and Hjalmarsson, H. (2013). A geometric approach to variance analysis of cascaded systems. In 52nd IEEE Conference on Decision and Control, 6496–6501.
Everitt, N., Rojas, C.R., and Hjalmarsson, H. (2014). Variance results for parallel cascade serial systems. In 19th IFAC World Congress, 2317–2322.
Forssell, U. and Ljung, L. (1999). Closed-loop identification revisited. Automatica, 35(7), 1215–1241.
Galrinho, M., Rojas, C.R., and Hjalmarsson, H. (2018). Parametric identification using weighted null-space fitting. Provisionally accepted at Transactions on Automatic Control (arXiv:1708.03946).
Galrinho, M., Everitt, N., and Hjalmarsson, H. (2017). Incorporating noise modeling in dynamic networks using non-parametric models. In 20th IFAC World Congress, 4612–4617.
Galrinho, M., Rojas, C.R., and Hjalmarsson, H. (2014). A weighted least-squares method for parameter estimation in structured models. In 53rd IEEE Conference on Decision and Control, 3322–3327.
Gevers, M., Bazanella, A.S., and Parraga, A. (2017). On the identifiability of dynamical networks. In 20th IFAC World Congress, 10580–10585.
Hägg, P., Wahlberg, B., and Sandberg, H. (2010). On subspace identification of cascade structured systems. In 49th IEEE Conf. on Decision and Control, 2843–2848.
Ljung, L. (1999). System Identification. Theory for the User. Prentice-Hall.
Materassi, D. and Innocenti, G. (2010). Topological identification in networks of dynamical systems. IEEE Transactions on Automatic Control, 55(8), 1860–1871.
Söderström, T. and Stoica, P. (1983). Instrumental Variable Methods for System Identification. Springer.
Söderström, T., Stoica, P., and Friedlander, B. (1991). An indirect prediction error method for system identification. Automatica, 27(1), 183–188.
Van den Hof, P. and Schrama, R. (1993). An indirect method for transfer function estimation from closed loop data. Automatica, 29(6), 1523–1527.
Van den Hof, P.M., Dankers, A., Heuberger, P.S., and Bombois, X. (2013). Identification of dynamic models in complex networks with prediction error methods—basic methods for consistent module estimates. Automatica, 49(10), 2994–3006.
Van Overschee, P. and De Moor, B. (2012). Subspace identification for linear systems: Theory, Implementation, Applications. Springer Science & Business Media.
Wahlberg, B., Hjalmarsson, H., and Mårtensson, J. (2009). Variance results for identification of cascade systems. Automatica, 45(6), 1443–1448.
Wahlberg, B. and Sandberg, H. (2008). Cascade structural model approximation of identified state space models. In 47th IEEE Conf. on Decision and Control, 4198–4203.
Weerts, H.H., Dankers, A.G., and Van den Hof, P.M. (2015). Identifiability in dynamic network identification. In 17th IFAC Symposium on System Identification, 1409–1414.