Blind system identification using kernel-based methods*

17th IFAC Symposium on System Identification
Beijing International Convention Center
October 19-21, 2015. Beijing, China

Available online at www.sciencedirect.com


IFAC-PapersOnLine 48-28 (2015) 466–471

Giulio Bottegal ∗ Riccardo S. Risuleo ∗ Håkan Hjalmarsson ∗

∗ ACCESS Linnaeus Centre, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, Sweden (e-mail: {bottegal; risuleo; hjalmars}@kth.se)

Abstract: We propose a new method for blind system identification (BSI). Resorting to a Gaussian regression framework, we model the impulse response of the unknown linear system as a realization of a Gaussian process. The structure of the covariance matrix (or kernel) of such a process is given by the stable spline kernel, which has been recently introduced for system identification purposes and depends on an unknown hyperparameter. We assume that the input can be linearly described by few parameters. We estimate these parameters, together with the kernel hyperparameter and the noise variance, using an empirical Bayes approach. The related optimization problem is efficiently solved with a novel iterative scheme based on the Expectation-Maximization (EM) method. In particular, we show that each iteration consists of a set of simple update rules. Through some numerical experiments we show that the proposed method gives very promising performance.

© 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

1. INTRODUCTION

In many engineering problems requiring data-driven modeling of dynamical systems, the experimenter may not have access to the input data. In these cases, standard system identification tools such as PEM (Ljung [1999]) cannot be applied and specific methods, namely blind system identification (BSI) methods (or blind deconvolution, if one is mainly interested in the input), need be employed (Abed-Meraim et al. [1997]).

BSI finds applications in a wide range of engineering areas, such as image reconstruction (Ayers and Dainty [1988]), biomedical sciences (McCombie et al. [2005]) and in particular communications (Gustafsson and Wahlberg [1995], Moulines et al. [1995]), for which literally hundreds of methods have been developed. It would be impossible to give a thorough literature review here.

The unavailability of the input signal makes BSI problems generally ill-posed. Without further information on the input sequence or the structure of the system, it is impossible to retrieve a unique description of the system (Tong et al. [1991]). To circumvent (at least partially) this intrinsic non-uniqueness issue, we shall assume some prior knowledge on the input. Following the framework of Ohlsson et al. [2014] and Ahmed et al. [2014], we describe the input sequence using a number of parameters considerably smaller than the length of the input sequence; see Section 2 for details and applications.

The main contribution of this paper is to propose a new BSI method. Our system modeling approach relies upon the kernel-based methods for linear system identification recently introduced in a series of papers (Pillonetto and De Nicolao [2010], Pillonetto et al. [2011], Chen et al. [2012], Pillonetto et al. [2014]). The main advantage of these methods, compared to standard parametric methods (Ljung [1999]), is that the user is not required to select the model structure and order of the system, an operation that might be difficult if little is known about the dynamics of the system. Thus, we model the impulse response of the unknown system as a realization of a Gaussian random process, whose covariance matrix, or kernel, is given by the so called stable spline kernel (Pillonetto and De Nicolao [2010], Bottegal and Pillonetto [2013]), and which encodes prior information on BIBO stability and smoothness. Such a kernel depends on a hyperparameter which regulates the exponential decay of the generated impulse responses.

In the kernel-based framework, the estimate of the impulse response can be obtained as its Bayes estimate given the output data. However, when applied to BSI problems, such an estimator is a function of the kernel hyperparameter, the parameters characterizing the input and the noise variance. All these parameters need to be estimated from data. In this paper, using empirical Bayes arguments, we estimate such parameters by maximizing the marginal likelihood of the output measurements, obtained by integrating out the dependence on the system. In order to solve the related optimization problem, which is highly non-convex, involving a large number of variables, we propose a novel iterative solution scheme based on the Expectation-Maximization (EM) method (Dempster et al. [1977]). We show that each iteration of such a scheme consists of a sequence of simple updates which can be performed using little computational effort. Notably, our method is completely automatic, since the user is not required to tune any kind of parameter. This is in contrast with the BSI methods recently proposed in Ohlsson et al. [2014], Ahmed et al. [2014], where, although the identification process is based on the solution of a convex problem, the user is required to select some regularization parameters and the model

* This work was supported by the European Research Council under the advanced grant LEARN, contract 267381 and by the Swedish Research Council under contract 621-2009-4017.

Copyright © IFAC 2015
2405-8963 © 2015, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
Peer review under responsibility of International Federation of Automatic Control.
10.1016/j.ifacol.2015.12.172


order. The method derived in this paper follows the same approach used in Risuleo et al. [2015], where a method for Hammerstein system identification is described. The paper is organized as follows. In the next section, we introduce the BSI problem and we state our working assumptions. In Section 3, we give a background on kernel-based methods, while in Section 4 we describe our approach to BSI. Section 5 presents some numerical experiments, and some conclusions end the paper.

2. BLIND SYSTEM IDENTIFICATION

We consider a SISO linear time-invariant discrete-time dynamic system (see Figure 1)

    y_t = Σ_{i=0}^{+∞} g_i u_{t−i} + v_t ,    (1)

where {g_t}_{t=0}^{+∞} is a strictly causal transfer function (i.e., g_0 = 0) representing the dynamics of the system, driven by the input u_t. The measurements of the output y_t are corrupted by the process v_t, which is zero-mean white Gaussian noise with unknown variance σ^2. For the sake of simplicity, we will also hereby assume that the system is at rest until t = 0.

Fig. 1. Block scheme of the system identification scenario.

We assume that N samples of the output measurements are collected, and denote them by {y_t}_{t=1}^N. The input u(t) is not directly measurable and only some information about it is available. More specifically, we assume we know that the input, restricted to the N time instants {u_t}_{t=0}^{N−1}, belongs to a certain subspace of R^N and thus can be written as

    u = Hx ,    (2)

where u = [u_0 ... u_{N−1}]^T. Here, H ∈ R^{N×p} is a known matrix with full column rank and x ∈ R^p, p ≤ N, is an unknown vector characterizing the evolution of u(t). Below we report two examples of inputs generated in this way.

Piecewise constant inputs with known switching instants. Consider a piecewise constant input signal u(t) with known switching instants T_1, T_2, ..., T_p, with T_p = N. The levels the input takes in between the switching instants are unknown and collected in the vector x. Then the input signal can be expressed as u = Hx with

    H = diag{1_{T_1}, 1_{T_2−T_1}, ..., 1_{T_p−T_{p−1}}} ,  H ∈ R^{N×p} ,    (3)

where 1_m denotes a column vector of length m with all entries equal to 1. The vector x needs to be estimated from output data. Applications of BSI with piecewise constant inputs are found in room occupancy estimation (Ebadat et al. [2013]) and nonintrusive appliance load monitoring (NIALM) (Hart [1992], Dong et al. [2013]).

Combination of known sinusoids. Assume that u is composed by the sum of p sinusoids with unknown amplitudes and known frequencies ω_1, ..., ω_p. Then in this case we have

    H = [ sin(ω_1)   ...  sin(ω_p)
            ⋮               ⋮
          sin(Nω_1)  ...  sin(Nω_p) ] ,    (4)

with ω_1, ..., ω_p such that H is full column rank. The vector x represents the amplitudes of the sinusoids. Applications of this setting are found in blind channel estimation (Ahmed et al. [2014]).

2.1 Problem statement

We state our BSI problem as the problem of obtaining an estimate of the impulse response g_t for n time instants, namely {g_t}_{t=1}^n, given {y_t}_{t=1}^N and H. Recall that, by choosing n sufficiently large, these samples can be used to approximate g_t with arbitrary accuracy (Ljung and Wahlberg [1992]). To achieve our goal we will need to estimate the input u = [u_0 ... u_{N−1}]^T; hence, we might also see our problem as a blind deconvolution problem.

2.2 Identifiability issues

It is well-known that BSI problems are not completely solvable (see e.g. Abed-Meraim et al. [1997], Gustafsson and Wahlberg [1995]). This is because the system and the input can be determined only up to a scaling factor, in the sense that every pair (αu, (1/α)g), α ∈ R, can describe the output dynamics equally well. Hence, we shall consider our BSI problem as the problem of determining the system and the input up to a scaling factor. Another possible way out of this issue is to assume that ‖g‖_2 or ‖g‖_1 is known (Bai and Li [2004]).

3. KERNEL-BASED SYSTEM IDENTIFICATION

In this section we briefly review the kernel-based approach introduced in Pillonetto and De Nicolao [2010], Pillonetto et al. [2011] and show how to readapt it to the BSI problem. Let us first introduce the following vector notation

    u := [u_0 ... u_{N−1}]^T ,  y := [y_1 ... y_N]^T ,  g := [g_1 ... g_n]^T ,  v := [v_1 ... v_N]^T ,

and the operator T_n(·) that, given a vector of length N, maps it to an N × n Toeplitz matrix, e.g.

    T_n(u) = [ u_0      0        ...  0
               u_1      u_0      ...  0
                ⋮        ⋮             ⋮
               u_{N−2}  u_{N−3}  ...  u_{N−n−1}
               u_{N−1}  u_{N−2}  ...  u_{N−n}   ] ∈ R^{N×n} .

We shall reserve the symbol U for T_n(u). Then, the input-output relation for the available samples can be written

    y = Ug + v .    (5)

In this paper we adopt a Bayesian approach to the BSI problem. Following a Gaussian process regression approach (Rasmussen and Williams [2006]), we model the impulse response as follows:

    g ∼ N(0, λK_β) .    (6)
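The constructions (2), (3) and (5) can be sketched numerically as follows; this is a minimal illustration assuming numpy, where the switching instants, the levels x and the impulse response g are invented stand-ins, not values from the paper.

```python
import numpy as np

def toeplitz_n(u, n):
    """T_n(u): map a length-N vector to the N-by-n Toeplitz matrix
    with entries T[i, j] = u[i - j] for i >= j, zero otherwise."""
    N = len(u)
    T = np.zeros((N, n))
    for j in range(n):
        T[j:, j] = u[:N - j]
    return T

# Piecewise-constant input, eq. (3): H = diag{1_{T1}, 1_{T2-T1}, ...}.
# Switching instants and levels below are illustrative only.
N, n = 200, 50
switches = [50, 120, 200]          # T_1, T_2, T_3 = N
x = np.array([1.0, -0.5, 2.0])     # unknown levels, to be estimated
H = np.zeros((N, len(switches)))
start = 0
for k, Tk in enumerate(switches):
    H[start:Tk, k] = 1.0
    start = Tk

u = H @ x                          # eq. (2)
g = 0.8 ** np.arange(1, n + 1)     # an illustrative BIBO-stable impulse response
U = toeplitz_n(u, n)
rng = np.random.default_rng(0)
y = U @ g + 0.1 * rng.standard_normal(N)   # eq. (5)
```

Note that U @ g is the truncated convolution of u and g, which is why the Toeplitz operator linearizes the problem in both u and g.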

2015 IFAC SYSID 468 October 19-21, 2015. Beijing, China

Giulio Bottegal et al. / IFAC-PapersOnLine 48-28 (2015) 466–471

Here K_β is a covariance matrix whose structure depends on a shaping parameter β, and λ ≥ 0 is a scaling factor, which regulates the amplitude of the realizations from (6). Given the identifiability issue described in Section 2.2, λ can be arbitrarily set to 1. In the context of Gaussian regression, K_β is usually called a kernel and its structure is crucial in imposing properties on the realizations drawn from (6). An effective choice of kernel for system identification purposes is given by the so-called stable spline kernels (Pillonetto and De Nicolao [2010], Pillonetto et al. [2011]). In particular, in this paper we adopt the first-order stable spline kernel (or TC kernel in Chen et al. [2012]), which is defined as

    {K_β}_{i,j} := β^{max(i,j)} ,    (7)

where β is a scalar in the interval [0, 1). Such a parameter regulates the decay velocity of the generated impulse responses.

Recall the assumption introduced in Section 2 on the Gaussianity of the noise. Due to this assumption, the joint distribution of the vectors y and g is Gaussian, provided that the vector x (and hence the input u), the noise variance σ^2 and the parameter β are given. Let us introduce the vector

    θ := [x^T σ^2 β]^T ∈ R^{p+2} ,    (8)

which we shall call the hyperparameter vector. Then we can write

    p( [y; g] | θ ) = N( [0; 0] , [ Σ_y   Σ_yg
                                    Σ_gy  K_β  ] ) ,    (9)

where Σ_yg = Σ_gy^T = U K_β and Σ_y = U K_β U^T + σ^2 I. It follows that the posterior distribution of g given y (and θ) is Gaussian, namely

    p(g|y, θ) = N(Cy, P) ,    (10)

where

    P = ( U^T U / σ^2 + K_β^{−1} )^{−1} ,   C = P U^T / σ^2 .    (11)

From (10), the impulse response estimator can be derived as the Bayesian estimator (Anderson and Moore [1979])

    ĝ = E[g|y, θ] = Cy .    (12)
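The TC kernel (7) and the posterior quantities (10)-(12) can be sketched as follows; this is a numpy illustration with an invented white-noise input and toy system, and the helper names (`tc_kernel`, `posterior`) are ours, not the paper's.

```python
import numpy as np

def tc_kernel(n, beta):
    """First-order stable spline (TC) kernel, eq. (7):
    {K_beta}_{i,j} = beta**max(i, j) with 1-based indices."""
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

def posterior(y, U, beta, sigma2):
    """Posterior mean and covariance of g given y, eqs. (10)-(12),
    with the scale factor lambda fixed to 1."""
    n = U.shape[1]
    K = tc_kernel(n, beta)
    P = np.linalg.inv(U.T @ U / sigma2 + np.linalg.inv(K))  # eq. (11)
    C = P @ U.T / sigma2                                    # eq. (11)
    g_hat = C @ y                                           # eq. (12)
    return g_hat, P

# Toy usage: known (non-blind) input, just to exercise the estimator.
rng = np.random.default_rng(1)
N, n = 120, 30
u = rng.standard_normal(N)
U = np.column_stack([np.r_[np.zeros(j), u[:N - j]] for j in range(n)])
g_true = 0.7 ** np.arange(1, n + 1)
y = U @ g_true + 0.05 * rng.standard_normal(N)
g_hat, P = posterior(y, U, beta=0.7, sigma2=0.05 ** 2)
```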

Clearly, such an estimator is a function of θ, which needs to be determined from the available data y before performing the estimation of g. Thus, the BSI algorithm we propose in this paper consists of the following steps.

(1) Estimate the hyperparameter vector θ.
(2) Obtain ĝ by means of (12).

In the next section, we discuss how to efficiently compute the first step of the algorithm.

4. ESTIMATION OF THE HYPERPARAMETER VECTOR

An effective approach to choose the hyperparameter vector characterizing the impulse response estimator (12) relies on empirical Bayes arguments (Maritz and Lwin [1989]). More precisely, since y and g are jointly Gaussian, an efficient method to choose θ is given by maximization of the marginal likelihood (Pillonetto and Chiuso [2014]), which is obtained by integrating out g from the joint probability density of (y, g). Hence, an estimate of θ can be computed as follows:

    θ̂ = arg max_θ log p(y|θ) .    (13)

Solving (13) in that form can be hard, because it is a nonlinear and non-convex problem involving a possibly large number (p + 2) of decision variables. For this reason, we propose an iterative solution scheme which resorts to the EM method. To this end, we define the complete likelihood

    L(y, g|θ) := log p(y, g|θ) ,    (14)

which depends also on the missing data g. Then, the EM method provides θ̂ by iterating the following steps:

(E-step) Given an estimate θ̂^k after the k-th iteration of the scheme, compute

    Q(θ, θ̂^k) := E_{p(g|y,θ̂^k)}[L(y, g|θ)] ;    (15)

(M-step) Compute

    θ̂^{k+1} = arg max_θ Q(θ, θ̂^k) .    (16)

The iteration of these steps is guaranteed to converge to a (local or global) maximum of (13) (McLachlan and Krishnan [2007]), and the iterations can be stopped if ‖θ̂^{k+1} − θ̂^k‖_2 is below a given threshold.

Assume that, at iteration k + 1 of the EM scheme, the estimate θ̂^k of θ is available. Using the current estimate of the hyperparameter vector, we construct the matrices Ĉ^k and P̂^k using (11) and, accordingly, we denote by ĝ^k the estimate of g computed using (12), i.e. ĝ^k = Ĉ^k y, and the linear prediction of y as ŷ^k = U ĝ^k. Furthermore, let us define

    Â^k = H^T R^T ( (P̂^k + ĝ^k ĝ^{k,T}) ⊗ I_N ) R H ,
    b̂^k = H^T T_N(ĝ^k)^T y ,    (17)

where R ∈ R^{Nn×N} is a matrix such that, for any u ∈ R^N:

    Ru = vec(T_n(u)) .    (18)

Having introduced this notation, we can state the following theorem, which provides a set of update rules to obtain the hyperparameter vector estimate θ̂^{k+1}.

Theorem 1. Let θ̂^k be the estimate of the hyperparameter vector after the k-th iteration of the EM scheme. Then

    θ̂^{k+1} = [x̂^{k+1,T} σ̂^{2,k+1} β̂^{k+1}]^T    (19)

can be obtained performing the following operations:

• The input estimate is updated computing

    x̂^{k+1} = (Â^k)^{−1} b̂^k ;    (20)

• The noise variance is updated computing

    σ̂^{2,k+1} = (1/N) [ ‖y − ŷ^k‖_2^2 + Tr( Û^{k+1} P̂^k Û^{k+1,T} ) ] ,    (21)

where Û^{k+1} denotes the Toeplitz matrix of the sequence û^{k+1} = H x̂^{k+1};

• The kernel shaping parameter is updated solving

    β̂^{k+1} = arg min_{β ∈ [0,1)} Q(β, θ̂^k) ,    (22)

where

    Q(β, θ̂^k) = log det K_β + Tr( K_β^{−1} (P̂^k + ĝ^k ĝ^{k,T}) ) .    (23)
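The three updates of Theorem 1 can be sketched in numpy as below. This is a schematic illustration under stated assumptions: the helper names (`vec_R`, `toeplitz_full`, `m_step`) are ours, the β update uses a plain grid search, and for simplicity the residual in the noise-variance update is computed with the freshly updated input matrix.

```python
import numpy as np

def vec_R(N, n):
    """R in R^{Nn x N} with R u = vec(T_n(u)) (column-major vec), eq. (18)."""
    R = np.zeros((N * n, N))
    for j in range(n):              # column j of T_n(u) holds u shifted down by j
        for i in range(j, N):
            R[j * N + i, i - j] = 1.0
    return R

def toeplitz_full(a, N):
    """N x N lower-triangular Toeplitz (convolution) matrix of a (length <= N)."""
    T = np.zeros((N, N))
    for j in range(N):
        m = min(len(a), N - j)
        T[j:j + m, j] = a[:m]
    return T

def tc_kernel(n, beta):
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

def m_step(y, H, g_hat, P, n, beta_grid):
    """One M-step, eqs. (20)-(22), given the E-step quantities g_hat, P."""
    N = len(y)
    S = P + np.outer(g_hat, g_hat)                 # E[g g^T | y, theta_k]
    R = vec_R(N, n)
    A = H.T @ R.T @ np.kron(S, np.eye(N)) @ R @ H  # A-hat, eq. (17)
    b = H.T @ toeplitz_full(g_hat, N).T @ y        # b-hat, eq. (17)
    x_new = np.linalg.solve(A, b)                  # eq. (20)
    U_new = toeplitz_full(H @ x_new, N)[:, :n]     # Toeplitz of the updated input
    sigma2_new = (np.sum((y - U_new @ g_hat) ** 2)
                  + np.trace(U_new @ P @ U_new.T)) / N   # eq. (21)
    def Q(beta):                                   # eq. (23)
        K = tc_kernel(n, beta)
        return np.linalg.slogdet(K)[1] + np.trace(np.linalg.solve(K, S))
    beta_new = min(beta_grid, key=Q)               # eq. (22), grid search
    return x_new, sigma2_new, beta_new
```

The identity behind the x-update is that ‖Ug‖^2 = u^T R^T (gg^T ⊗ I_N) R u, so taking the posterior expectation of gg^T yields the quadratic form Â^k.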


Hence, the maximization problem (13) reduces to a sequence of very simple optimization problems. In fact, at each iteration of the EM algorithm, the input can be estimated by computing a simple update rule available in closed-form. The same holds for the noise variance, whereas the update of the kernel hyperparameter β does not admit any closed-form expression. However, it can be retrieved by solving a very simple scalar optimization problem, which can be solved efficiently by grid search, since the domain of β is the interval [0, 1). It remains to establish a way to set up the initial estimate θ̂^0 for the EM method. This can be done by just randomly choosing the entries of such a vector, keeping the constraints β̂^0 ∈ [0, 1), σ̂^{2,0} > 0. Below, we provide our BSI algorithm.

Algorithm: Bayesian kernel-based EM Blind System Identification

Input: {y_t}_{t=1}^N , H
Output: {ĝ_t}_{t=1}^n , {û_t}_{t=1}^N

(1) Initialization: randomly set θ̂^0 = [ x̂^{0,T}  σ̂^{2,0}  β̂^0 ]
(2) Repeat until convergence:
    (a) E-step: update P̂^k, Ĉ^k from (11) and ĝ^k from (12);
    (b) M-step: update the parameters:
        • x̂^{k+1} from (20);
        • σ̂^{2,k+1} from (21);
        • β̂^{k+1} from (22);
(3) Compute {ĝ_t}_{t=1}^n from (12) and {û_t}_{t=1}^N = H x̂.
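The steps above can be sketched in code as follows. This is a hedged sketch, not the authors' implementation: equations (11)-(12) are not reproduced in this excerpt, so the E-step below uses the standard Gaussian-posterior expressions P̂ = (U^T U/σ² + K_β^{-1})^{-1} and ĝ = P̂ U^T y/σ², and the kernel is assumed to be the first-order stable spline kernel K_β(i, j) = β^{max(i,j)}; all function and variable names are our own choices.

```python
import numpy as np

def conv_matrix(v, n_cols, n_rows):
    """Toeplitz convolution matrix T with T[i, j] = v[i - j] (zero outside range)."""
    T = np.zeros((n_rows, n_cols))
    for j in range(n_cols):
        m = min(n_rows - j, len(v))
        T[j:j + m, j] = v[:m]
    return T

def ss_kernel(n, beta):
    # Assumed first-order stable spline kernel: K[i, j] = beta ** max(i, j)
    idx = np.arange(1, n + 1)
    return beta ** np.maximum.outer(idx, idx)

def em_blind_sysid(y, H, n, n_iter=20, seed=0):
    """Sketch of the EM scheme: assumed E-step for (11)-(12), M-step per (20)-(22)."""
    N = y.size
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(H.shape[1])       # random initialization of theta^0
    sigma2, beta = 1.0, 0.5
    # R such that R u = vec(T_n(u)), built from unit vectors (eq. 18)
    R = np.column_stack([conv_matrix(e, n, N).ravel(order="F") for e in np.eye(N)])
    for _ in range(n_iter):
        U = conv_matrix(H @ x, n, N)
        K = ss_kernel(n, beta)
        # E-step: posterior covariance P and mean g of the impulse response
        P = np.linalg.inv(U.T @ U / sigma2 + np.linalg.inv(K))
        g = P @ U.T @ y / sigma2
        S = P + np.outer(g, g)
        # M-step, input update (20): x = A^{-1} b, with A and b as in (17)
        A = H.T @ R.T @ np.kron(S, np.eye(N)) @ R @ H
        b = H.T @ conv_matrix(g, N, N).T @ y
        x = np.linalg.solve(A, b)
        # noise-variance update (21)
        U = conv_matrix(H @ x, n, N)
        sigma2 = (np.sum((y - U @ g) ** 2) + np.trace(U @ P @ U.T)) / N
        # kernel-hyperparameter update (22): scalar grid search over [0, 1)
        def Q(b_):
            Kb = ss_kernel(n, b_)
            return np.linalg.slogdet(Kb)[1] + np.trace(np.linalg.solve(Kb, S))
        beta = min(np.linspace(0.05, 0.95, 19), key=Q)
    return g, H @ x, sigma2, beta
```

For instance, with a piecewise-constant input subspace H = I_p ⊗ 1, a few iterations already return finite estimates of g and u = Hx. Note that, as in any blind problem, g and the input are identifiable only up to a common scaling factor.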


5. NUMERICAL EXPERIMENTS

We test the proposed BSI method by means of Monte Carlo experiments. Specifically, we perform 6 groups of simulations where, for each group, 100 random systems and input/output trajectories are generated. The random systems are generated by picking 20 zeros and 20 poles with random magnitude and phase; the zero magnitudes are less than or equal to 0.95 and the pole magnitudes no larger than 0.92. The inputs are piecewise constant signals whose number of switching instants is p = 10, 20, 30, 40, 50, 60, depending on the group of experiments. We generate 200 input/output samples per experiment. The output is corrupted by random noise whose variance is set so that σ² = var(Ug)/10, i.e. the noise variance is ten times smaller than the variance of the noiseless output. The goal of the experiments is to estimate n = 50 samples of the generated impulse responses. We compare the following three estimators.

• B-KB. This is the proposed Bayesian kernel-based BSI method, which estimates the input by marginal likelihood maximization using an EM-based scheme. The convergence criterion of the EM method is ‖θ̂^{k+1} − θ̂^k‖₂ < 10⁻³.

• NB-LS. This is an impulse response estimator based on the least squares criterion. Here we assume that the input is known, so the only quantity to be estimated is the system. Hence, this corresponds to an unbiased FIR estimator (Ljung [1999]).

• NB-KB. This is the Bayesian kernel-based system identification method introduced in Pillonetto and De Nicolao [2010] and revisited in Chen et al. [2012]. Like NB-LS, this method has knowledge of the input and in fact corresponds to the estimator B-KB when x is known and not estimated.

The performance of the estimators is evaluated by means of the output fitting score

    FIT = 1 − ‖Û_i ĝ_i − U_i g_i‖₂ / ‖U_i g_i − mean(U_i g_i)‖₂ ,           (24)

where, at the i-th Monte Carlo run, U_i and Û_i are the Toeplitz matrices of the true and estimated inputs (Û_i = U_i if the method does not need to estimate x), g_i and ĝ_i are the true and estimated systems, and mean(U_i g_i) is the sample mean of U_i g_i.

Figure 2 shows the results of the six groups of experiments. As expected, the estimator NB-KB, which has access to the true input, gives the best performance for all values of p and is in fact independent of that value. Surprisingly, for p = 10 and p = 20, the proposed BSI method outperforms the least squares estimator that knows the true input. An example of one such Monte Carlo experiment is reported in Figure 3. As one might expect, the performance degrades as p increases, since the blind estimator has to estimate more parameters. Figure 4 shows the median of the fitting score of each group of experiments as a function of p; the degradation appears to be approximately linear in p.

Fig. 3. Example of one Monte Carlo realization with p = 20. Top panel: true vs estimated (normalized) input. Middle panel: true vs estimated (normalized) impulse response. Bottom panel: true vs predicted output.
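One Monte Carlo trial and the fit score (24) can be reproduced in outline as follows. This is a hedged sketch: the conjugate-pair pole/zero placement, the switching-instant generation, and all helper names are our assumptions; only the stated constraints (zero magnitudes ≤ 0.95, pole magnitudes ≤ 0.92, piecewise-constant inputs, and σ² = var(Ug)/10) are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_system(n_imp, n_pairs=10, zmax=0.95, pmax=0.92):
    """Random transfer function with conjugate zero/pole pairs, returned as an impulse response."""
    def conj_pairs(rmax):
        c = rmax * rng.uniform(size=n_pairs) * np.exp(1j * rng.uniform(0, np.pi, n_pairs))
        return np.concatenate([c, c.conj()])
    b = np.real(np.poly(conj_pairs(zmax)))    # 20 zeros, magnitude <= 0.95
    a = np.real(np.poly(conj_pairs(pmax)))    # 20 poles, magnitude <= 0.92 (stable)
    g = np.zeros(n_imp)                       # impulse response by long division:
    for t in range(n_imp):                    # g[t] = b[t] - sum_k a[k] g[t-k]
        k = min(t, len(a) - 1)
        g[t] = (b[t] if t < len(b) else 0.0) - np.dot(a[1:k + 1], g[t - k:t][::-1])
    return g

def piecewise_constant_input(N, p):
    """Piecewise-constant signal with p switching instants."""
    switches = np.sort(rng.choice(np.arange(1, N), size=p, replace=False))
    bounds = np.concatenate([[0], switches, [N]])
    u = np.zeros(N)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        u[lo:hi] = rng.standard_normal()
    return u

def fit_score(yhat, ytrue):
    # Output fitting score (24)
    return 1 - np.linalg.norm(yhat - ytrue) / np.linalg.norm(ytrue - ytrue.mean())

N, n, p = 200, 50, 20
g = random_system(n)
u = piecewise_constant_input(N, p)
y0 = np.convolve(u, g)[:N]                    # noiseless output U g
y = y0 + rng.standard_normal(N) * np.sqrt(np.var(y0) / 10)
print(f"fit of the noisy output against the noiseless one: {fit_score(y, y0):.2f}")
```

With a signal-to-noise ratio of 10, the raw noisy output already scores a fit of roughly 0.7 against the noiseless one, which gives a sense of the scale of the values reported in Figure 2.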


Fig. 2. Results of the Monte Carlo simulations for different values of p, namely the number of different levels in the input signals. Each panel (p = 10, 20, 30, 40, 50, 60) reports the fit scores obtained by the estimators B-KB, NB-LS, and NB-KB.

Fig. 4. Median of the fitting score for each group of Monte Carlo simulations as a function of p.

6. CONCLUSIONS

In this paper we have proposed a novel blind system identification algorithm. Under a Gaussian regression framework, we have modeled the impulse response of the unknown system as the realization of a Gaussian process, choosing the stable spline kernel as its covariance. We have assumed that the unknown input belongs to a known subspace of the input space. The estimation of the input, together with the kernel hyperparameter and the noise variance, has been performed using an empirical Bayes approach. We have solved the related maximization problem with the help of the EM method, obtaining a set of update rules for the parameters that is simple and elegant and permits fast computation of the estimates of the system and the input. Numerical experiments have shown very promising results.

We plan to extend the current method in two ways. First, a wider class of system models, such as Box-Jenkins models, will be considered. We shall also attempt to remove the assumption that the input belongs to a known subspace by adopting suitable Bayesian models.

Appendix A. PROOF OF THEOREM 1

First note that p(y, g|θ) = p(y|g, θ) p(g|θ). Hence we can write the complete likelihood as

    L(y, g|θ) = log p(y|g, θ) + log p(g|θ)                                  (A.1)

and so

    L(y, g|θ) = −(N/2) log σ² − (1/2σ²) ‖y − U g‖₂² − ½ log det K_β − ½ g^T K_β^{-1} g
              = −(N/2) log σ² − (1/2σ²) ( y^T y + g^T U^T U g − 2 y^T U g )
                − ½ log det K_β − ½ g^T K_β^{-1} g .

We now take the expectation of this expression with respect to the random variable g|y, θ̂^k, obtaining the following components:

    (a) : E[ −(N/2) log σ² ] = −(N/2) log σ²
    (b) : E[ −(1/2σ²) y^T y ] = −(1/2σ²) y^T y
    (c) : E[ −(1/2σ²) g^T U^T U g ] = Tr[ −(1/2σ²) U^T U (P̂^k + ĝ^k ĝ^{k,T}) ]
    (d) : E[ (1/σ²) y^T U g ] = (1/σ²) y^T U ĝ^k
    (e) : E[ −½ log det K_β ] = −½ log det K_β
    (f) : E[ −½ g^T K_β^{-1} g ] = −½ Tr[ K_β^{-1} (P̂^k + ĝ^k ĝ^{k,T}) ]

It follows that Q(θ, θ̂^k) is the sum of the elements obtained above. By inspecting the structure of Q(θ, θ̂^k), it can be seen that the function splits into two independent terms, namely

    Q(θ, θ̂^k) = Q_1(x, σ², θ̂^k) + Q_β(β, θ̂^k) ,                           (A.2)

where

    Q_1(x, σ², θ̂^k) = (a) + (b) + (c) + (d)                                 (A.3)

is a function of x and σ², while

    Q_β(β, θ̂^k) = (e) + (f)                                                 (A.4)

depends only on β and corresponds to (23). We now address the optimization of (A.3). To this end we write

    Q_1(x, σ², θ̂^k) = (1/σ²) Q_x(x, θ̂^k) + Q_{σ²}(σ², θ̂^k)                (A.5)
                     = (1/σ²) [ −½ Tr( U^T U (P̂^k + ĝ^k ĝ^{k,T}) ) + y^T U ĝ^k ]
                       − (N/2) log σ² − (1/2σ²) y^T y .

This means that the optimization of Q_1 can be carried out first with respect to x, optimizing only the term Q_x, which is independent of σ² and can be written in the quadratic form

    Q_x(x, θ̂^k) = −½ x^T Â^k x + b̂^{k,T} x .                               (A.6)

To see this, first note that, for all v_1 ∈ R^n, v_2 ∈ R^m,

    T_m(v_1) v_2 = T_n(v_2) v_1 .                                            (A.7)

Recalling (18), we can write

    −½ Tr[ U^T U (P̂^k + ĝ^k ĝ^{k,T}) ]
        = −½ vec(U)^T ( (P̂^k + ĝ^k ĝ^{k,T}) ⊗ I_N ) vec(U)
        = −½ u^T R^T ( (P̂^k + ĝ^k ĝ^{k,T}) ⊗ I_N ) R u                      (A.8)
        = −½ x^T H^T R^T ( (P̂^k + ĝ^k ĝ^{k,T}) ⊗ I_N ) R H x ,

where the matrix in the middle corresponds to Â^k defined in (17). For the linear term we find

    y^T U ĝ^k = y^T T_N(ĝ^k) u = y^T T_N(ĝ^k) H x ,                         (A.9)

so that the term b̂^{k,T} in (17) is retrieved and the maximizer x̂^{k+1} is as in (20). Plugging x̂^{k+1} back into (A.3) and maximizing with respect to σ², we readily find σ̂^{2,k+1} as in (21). This concludes the proof.

REFERENCES

K. Abed-Meraim, W. Qiu, and Y. Hua. Blind system identification. Proc. of the IEEE, 85(8):1310–1322, 1997.
A. Ahmed, B. Recht, and J. Romberg. Blind deconvolution using convex programming. IEEE Trans. on Information Theory, 60(3):1711–1732, March 2014.
B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice-Hall, Englewood Cliffs, N.J., USA, 1979.
G.R. Ayers and J.C. Dainty. Iterative blind deconvolution method and its applications. Optics Letters, 13(7):547–549, 1988.
E.W. Bai and D. Li. Convergence of the iterative Hammerstein system identification algorithm. IEEE Trans. on Automatic Control, 49(11):1929–1940, 2004.
G. Bottegal and G. Pillonetto. Regularized spectrum estimation using stable spline kernels. Automatica, 49(11):3199–3209, 2013.
T. Chen, H. Ohlsson, and L. Ljung. On the estimation of transfer functions, regularizations and Gaussian processes - revisited. Automatica, 48(8):1525–1535, 2012.
A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38, 1977.
R. Dong, L. Ratliff, H. Ohlsson, and S.S. Sastry. A dynamical systems approach to energy disaggregation. In Proc. of the IEEE Conference on Decision and Control (CDC), pages 6335–6340, 2013.
A. Ebadat, G. Bottegal, D. Varagnolo, B. Wahlberg, and K.H. Johansson. Estimation of building occupancy levels through environmental signals deconvolution. In Proc. of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings, pages 1–8, 2013.
F. Gustafsson and B. Wahlberg. Blind equalization by direct examination of the input sequences. IEEE Trans. on Communications, 43(7):2213–2222, 1995.
G.W. Hart. Nonintrusive appliance load monitoring. Proc. of the IEEE, 80(12):1870–1891, 1992.
L. Ljung. System Identification: Theory for the User. Prentice Hall, 1999.
L. Ljung and B. Wahlberg. Asymptotic properties of the least-squares method for estimating transfer functions and disturbance spectra. Advances in Applied Probability, pages 412–440, 1992.
J.S. Maritz and T. Lwin. Empirical Bayes Methods. Chapman and Hall, 1989.
D.B. McCombie, A.T. Reisner, and H.H. Asada. Laguerre-model blind system identification: cardiovascular dynamics estimated from multiple peripheral circulatory signals. IEEE Trans. on Biomedical Engineering, 52(11):1889–1901, 2005.
G. McLachlan and T. Krishnan. The EM Algorithm and Extensions, volume 382. John Wiley & Sons, 2007.
E. Moulines, P. Duhamel, J.F. Cardoso, and S. Mayrargue. Subspace methods for the blind identification of multichannel FIR filters. IEEE Trans. on Signal Processing, 43(2):516–525, 1995.
H. Ohlsson, L.J. Ratliff, R. Dong, and S.S. Sastry. Blind identification via lifting. In Proc. of the 19th IFAC World Congress, 2014.
G. Pillonetto and A. Chiuso. Tuning complexity in kernel-based linear system identification: The robustness of the marginal likelihood estimator. In Proc. of the European Control Conference (ECC), pages 2386–2391, 2014.
G. Pillonetto and G. De Nicolao. A new kernel-based approach for linear system identification. Automatica, 46(1):81–93, 2010.
G. Pillonetto, A. Chiuso, and G. De Nicolao. Prediction error identification of linear systems: a nonparametric Gaussian regression approach. Automatica, 47(2):291–305, 2011.
G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung. Kernel methods in system identification, machine learning and function estimation: A survey. Automatica, 50(3):657–682, 2014.
C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
R.S. Risuleo, G. Bottegal, and H. Hjalmarsson. A kernel-based approach to Hammerstein system identification. In Proc. of the 16th IFAC Symposium on System Identification, 2015.
L. Tong, R. Liu, V.C. Soon, and Y.F. Huang. Indeterminacy and identifiability of blind identification. IEEE Trans. on Circuits and Systems, 38(5):499–509, 1991.