A variational approach for the online dual estimation of spatiotemporal systems governed by the IDE
Proceedings of the 18th World Congress of the International Federation of Automatic Control, Milano (Italy), August 28 - September 2, 2011

A. Zammit Mangion ∗, G. Sanguinetti ∗∗, V. Kadirkamanathan ∗

∗ Department of Automatic Control and Systems Engineering, University of Sheffield, UK (e-mails: [email protected], [email protected])
∗∗ School of Informatics, University of Edinburgh, UK (e-mail: [email protected])

Abstract: A dual variational Bayes filter for state and parameter estimation in IDE-based spatiotemporal dynamic systems is developed. Recursive updates are obtained from a restricted variational Bayesian perspective, using a dual filtering formulation in which the parameters are allowed to evolve in time. The added benefit over conventional point-estimate filters is that parameter distributions are readily available to take advantage of in the design of complex experiments or in adaptive control scenarios. The dual filter is evaluated in a simulation study and is seen to perform favourably when compared to a standard SMC approach.

Keywords: Variational Bayes, integro-difference equation, spatiotemporal modelling, system identification.

1. INTRODUCTION

The stochastic integro-difference equation (IDE) is a convenient and practical representation of spatiotemporal systems which are of considerable interest to the engineering community (see Scerri et al., 2009, for a list of applications). Accurate and reliable inference of the latent data and parameters associated with these models is required when making use of such representations. The estimation is generally conditioned on some observation process from which measurements are drawn. The inference problem, as with any dynamic model, may be tackled by two broad classes of estimators: those in which the data is considered as one batch, and those in which the data is treated sequentially. The former, whilst elegant from a statistical viewpoint, may be demanding in terms of memory requirements and computational speed (Cappé and Moulines, 2009). Moreover, algorithms in this class are less suited to online scenarios where data arrives in real time.
Sequential estimators, on the other hand, by nature of their implementation, require less computational resources (Krishnamurthy and Moore, 1993). They are ideal for online scenarios and may be adapted to situations where parameters are slowly changing (Weinstein et al., 1990). In the context of IDE-based spatiotemporal systems, this corresponds to a gradual alteration in the shape of the underlying connectivity kernel and resulting behavioural shifts in the field dynamics.

A dual filter attempts to estimate, online, two distinct sets of variables. In the context of dynamic systems, these are a sequence of latent states and a set of unknown static parameters, respectively. In cases where the uncertainty

over the parameters is negligible, the dual Kalman filtering technique of Wan and Nelson (2001), which considers two (standard, extended or unscented) Kalman filters running alternately, is accurate, fast and memory efficient. Online expectation-maximisation (EM) filters (Andrieu and Doucet, 2003) provide better performance since they make use of statistics over the latent data (in an E-step) to find a maximum-likelihood point estimate for the parameters (in the following M-step). At the other end of the spectrum, sequential Monte Carlo (SMC) filters, such as those of Kitagawa (1998) and Storvik (2002), are often employed when an estimate of the full posterior over the parameters is required. While these methods can often accurately approximate the posterior distribution, their computational cost may easily prove prohibitive in the study of spatiotemporal systems, which usually involve a high combined state-space and parameter-space dimensionality.

Dual variational Bayesian (VB) filters attempt to find a compromise between parameter point-estimate filters and online Monte Carlo filters such as SMC filters (Šmídl and Quinn, 2005). The key advantage of VB filters over conventional point-estimate filters is that they provide distributions over the unknown parameters, which could be useful in decision making or in an adaptive control scenario. In its elementary form, VB is not a suitable tool for sequential estimation, since, to remain faithful to the variational paradigm, all distributions estimated at previous time instants need to be re-estimated under the new, updated distributions given the new data points. However, a new class of VB techniques has recently been developed, coined restricted VB (RVB) by Šmídl and Quinn (2006), which can be exploited for the efficient


10.3182/20110828-6-IT-1002.02459


online estimation of dynamic systems. In this work we thus take advantage of RVB for the practical online inference of spatiotemporal systems. We show that, by additionally constraining the variational parameter posterior, one can obtain a recursive algorithm in which only the last two variational posteriors of the states and parameters, respectively, are coupled. The approximate inference technique is fast and memory efficient, and readily provides a measure of parameter uncertainty. The remainder of the paper is organised as follows: section 2 outlines the dynamic model considered and formulates the VB framework for this work, section 3 applies the dual filter to IDE-based spatiotemporal systems and section 4 provides a simulation example and a comparative study with a standard SMC approach. We conclude in section 5 by summarising the work.

2. RECURSIVE INFERENCE FRAMEWORK

2.1 Dynamic model

In view of section 3, we consider the following dynamic model,

    x_k = F(θ_k, x_{k-1}) + w_{k-1},
    y_k = G(θ_k, x_k) + v_k,                                                    (1)

where x_k ∈ R^n and y_k ∈ R^m are the state and observation vectors at time k respectively, with n, m ∈ Z^+. F(·) and G(·) are smooth functions, and w_k ∈ R^n and v_k ∈ R^m are additive noise terms distributed as w_k ∼ N_{w_k}(0, Q) and v_k ∼ N_{v_k}(0, R), where Q ∈ R^{n×n} and R ∈ R^{m×m}. N_x(µ, Σ) denotes the normal distribution of x with mean µ and covariance matrix Σ. The states x_k are treated as latent variables, and the unknown parameter vector θ_k ∈ R^d, d ∈ Z^+, is assumed to be artificially evolved so that

    θ_k = θ_{k-1} + e_{k-1},                                                    (2)

where e_{k-1} ∈ R^d is additive white Gaussian noise with time-varying covariance Σ_{e_{k-1}} ∈ R^{d×d}.

2.2 The marginal likelihood and factorisation of the joint posterior distribution

Let X_k and Y_k be the set of states and the observed data set up to time k respectively, i.e. X_k = {x_i}_{i=0}^k and Y_k = {y_i}_{i=1}^k. Similarly, let Θ_k = {θ_i}_{i=1}^k. The variational framework is set up by considering the marginal log likelihood, which for any arbitrary distribution p̃(X_k, Θ_k) is given by

    ln p(Y_k) = ln ∫ dΘ_k ∫ dX_k p(X_k, Y_k, Θ_k)
              ≥ ∫ dΘ_k ∫ dX_k p̃(X_k, Θ_k) ln [ p(X_k, Y_k, Θ_k) / p̃(X_k, Θ_k) ] = F,

where the lower bound F is obtained by application of Jensen's inequality. The free energy F is clearly maximised by setting p̃(X_k, Θ_k) = p(X_k, Θ_k | Y_k), for which the above inequality becomes an equality. However, this optimal joint posterior is generally intractable, necessitating approximations. As in Attias (1999), we choose a factorised approximation which preserves the conditional dependency between the states but which assumes conditional independence between the states and the parameters. Moreover, the full joint distribution of the parameters is also factorised over time, so that

    p̃(X_k, Θ_k) ≈ p̃(X_k | Y_k) ∏_{j=1}^{k} p̃(θ_j | Y_k) = p̃(X_k | Y_k) p̃(Θ_k),   (3)

where p̃(Θ_k) is the product of the variational posterior distributions conditioned on data up to the current time instant k. Expressions for the variational state posterior distribution p̃(X_k | Y_k) and the variational parameter posterior distribution at the final time instant p̃(θ_k | Y_k) which maximise F can be found by taking the functional derivatives of F with respect to these distributions and setting them equal to zero. The maximisers are given by (Šmídl and Quinn, 2005)

    p̃(X_k | Y_k) ∝ exp(E_{p̃(Θ_k)}[ln p(X_k, Y_k, Θ_k)]),                        (4a)

    p̃(θ_k | Y_k) ∝ exp(E_{p̃(X_k|Y_k) p̃(Θ_{k-1})}[ln p(X_k, Y_k, Θ_k)]),         (4b)

where E_{p(x)}[·] denotes the expectation operator with respect to the distribution p(x). Since p̃(X_k | Y_k) needs to be re-estimated under the updated sequence of parameter distributions {p̃(θ_j | Y_k)}_{j=1}^{k}, without any further restrictions it is very difficult to find a recursive solution to (4a) and (4b) for each new data point. We therefore propose to further restrict the variational posteriors so that they are conditioned only on data up to the time instant at which they were estimated, so that p̃(Θ_k) is now given as

    p̃(Θ_k) = ∏_{j=1}^{k} p̃(θ_j | Y_j).                                          (5)

At each time step, the distributions p̃(X_k | Y_k) and p̃(θ_k | Y_k) are the usual variational posteriors, whilst {p̃(θ_j | Y_j)}_{j=1}^{k-1} are the restricted variational posteriors. In practice, each p̃(θ_j | Y_j) can be further factorised into l conditionally independent distributions, i.e. p̃(θ_j | Y_j) = ∏_{i=1}^{l} p̃(θ_j^i | Y_j). Without any loss of generality, we proceed with the unfactorised parameter distribution (l = 1); the results can be easily extended to the factorised case. The dual filtering problem under study is to find the variational state marginal p̃(x_k | Y_k) and the variational parameter posterior p̃(θ_k | Y_k) at each time step. We show how this is possible with the chosen factorisations and restrictions by deriving a Bayesian recursive estimator in variational form.

2.3 The recursive estimator

Referring to Theorem 1 in Zammit Mangion et al. (in press, 2011), from (4a) the variational approximation of the state marginal is given by

    p̃(x_k | Y_k) ∝ ∫ dX_{k-1} exp(E_{p̃(Θ_k)} ln p(X_k, Θ_k, Y_k))
                 = ∫ dX_{k-1} exp(E_{p̃(Θ_k)} ln p(X_{k-1}, Θ_k, Y_{k-1}))
                   × exp(E_{p̃(Θ_k)} ln p(x_k | x_{k-1}, θ_k) p(y_k | x_k, θ_k)).   (6)

Restriction (5) ensures that the distributions of the parameters do not need to be recomputed at each time step



using the most recent available data. In particular, one has that

    p̃(x_{k-1} | Y_{k-1}) ∝ ∫ dX_{k-2} exp(E_{p̃(Θ_k)} ln p(X_{k-1}, Θ_k, Y_{k-1})).

This in turn establishes the recursion

    p̃(x_k | Y_k) ∝ ∫ dx_{k-1} p̃(x_{k-1} | Y_{k-1})
                   × exp(E_{p̃(θ_k|Y_k)} ln p(x_k | x_{k-1}, θ_k) p(y_k | x_k, θ_k)).   (7)

From (4b), the required parameter variational posterior at each time step is obtained as

    p̃(θ_k | Y_k) ∝ exp(E_{p̃(θ_{k-1}|Y_{k-1})} ln p(θ_k | θ_{k-1}))
                   × exp(E_{p̃(X_k|Y_k)} ln p(x_k | x_{k-1}, θ_k) p(y_k | x_k, θ_k)).   (8)
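To make the objects of recursions (7) and (8) concrete, the sketch below simulates the generative model (1)-(2) for a linear instance F(θ, x) = A(θ)x and G(θ, x) = Cx. The basis matrices V_i, the dimensions and all noise levels are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, K = 4, 3, 2, 100            # illustrative dimensions and horizon

# A hypothetical transition matrix that is linear in the parameters theta
V = [0.5 * np.eye(n), 0.3 * np.eye(n)]
def A_of(theta):
    return sum(t * Vi for t, Vi in zip(theta, V))

C = rng.standard_normal((m, n))      # linear observation map G(theta, x) = C x
Q = 0.01 * np.eye(n)                 # state noise covariance, w_k ~ N(0, Q)
R = 0.10 * np.eye(m)                 # observation noise covariance, v_k ~ N(0, R)
Sigma_e = 1e-4 * np.eye(d)           # slow artificial parameter drift, as in (2)

x, theta = np.zeros(n), np.array([1.0, 0.5])
X, Y = [], []
for k in range(K):
    theta = theta + rng.multivariate_normal(np.zeros(d), Sigma_e)   # (2)
    x = A_of(theta) @ x + rng.multivariate_normal(np.zeros(n), Q)   # (1), state
    y = C @ x + rng.multivariate_normal(np.zeros(m), R)             # (1), observation
    X.append(x)
    Y.append(y)
```

A dual filter of the kind derived above consumes only the observation sequence Y; the trajectories X and theta are retained here only for reference.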

The properties exhibited by the recursions (7) and (8) are very desirable in an online context. Computation of the new state marginal posterior distribution requires knowledge of only the previous state marginal and the current parameter posterior distributions. Moreover, computation of the current parameter posterior distribution only requires knowledge of the parameter posterior distribution of the previous time step, the current state posterior marginal distribution and one smoothed state marginal distribution of the previous time step.

Remark 1. Equations (7) and (8) are evidently coupled, and iterating between them gives the required solutions. It is important to note that these iterations (iterative VB, or IVB) are only needed for two posterior distributions (l + 1 if the parameter distribution is further factorised) and that no backward propagation is required. On the other hand, if computation time is insufficient, the number of iterations can be limited or the iterations omitted completely. This gives the designer a trade-off between computational time and desired distributional accuracy.

3. THE VB DUAL FILTER FOR IDE BASED SPATIOTEMPORAL SYSTEMS

The key motivation for this work is the online kernel estimation of spatiotemporal IDE models such as those given in Dewar et al. (2009) and Scerri et al. (2009). A readily available measure of uncertainty over the parameters, as provided by standard Bayesian approaches, is usually a requirement when studying such systems. If the spatiotemporal field under study exhibits fast dynamics, say on the order of a few seconds or minutes, Monte Carlo methods may be too slow to execute within a given time interval. The dual VB filter, which can readily give the required spatially distributed parameter uncertainty estimates, is thus ideal for the problem at hand. The homogeneous IDE in one spatial dimension¹ is given as

    z_k(s) = ∫_Ω k_I(s − r) z_{k-1}(r) dr + e_k(s),                              (9)

where k_I : R → R is a homogeneous redistribution kernel. The spatiotemporal random field s ↦ z_k(s) evolves over space s ∈ Ω ⊂ R, and its random constituent is characterised by a temporally uncorrelated random noise function s ↦ e_k(s) equipped with some spatial, symmetric covariance operator Q on L²(Ω). In particular we let Qf = ∫_Ω k_Q(s − r) f(r) dr, and we assume k_Q : R → R is Gaussian. The IDE can be used to model highly complex spatiotemporal patterns; for instance, in section 4 we will attempt to identify the kernel of the system under which the field of Fig. 1 is generated.

¹ This restriction in spatial dimension is taken to facilitate the exposition and does not constitute any loss of generality.

Fig. 1. Space-time field generated by the evolution IDE of (9). All simulation parameters as given in section 4.

As in Dewar et al. (2009) we decompose the field and the kernel using linearly independent Gaussian basis functions, so that

    z_k(s) ≈ φ(s)^T x_k,    k_I(s − r) ≈ θ^T ψ(s − r).                           (10)
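The evolution (9) and the decomposition (10) can be sketched numerically as follows; the grid resolution, kernel shape and basis settings below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Discretise Omega = [0, 5]
S = np.linspace(0.0, 5.0, 201)
ds = S[1] - S[0]

def gauss(u, var):
    return np.exp(-u**2 / (2.0 * var))

# --- Simulate the IDE (9) by quadrature on the grid ---
K_I = 0.7 * gauss(S[:, None] - S[None, :], var=0.05)   # k_I(s - r) on the grid
rng = np.random.default_rng(1)
z = rng.standard_normal(len(S))                        # initial field
for k in range(50):
    e_k = 0.1 * rng.standard_normal(len(S))            # crude white surrogate for e_k(s)
    z = K_I @ z * ds + e_k                             # z_k = int k_I(s-r) z_{k-1}(r) dr + e_k

# --- Decompose the field onto Gaussian basis functions as in (10) ---
centres = np.linspace(0.0, 5.0, 9)
Phi = gauss(S[:, None] - centres[None, :], var=0.3)    # columns phi_i(s)
Psi_x = Phi.T @ Phi * ds                               # Gram matrix (<phi_i, phi_j>)
x_w, *_ = np.linalg.lstsq(Phi, z, rcond=None)          # weights x with z(s) ~ phi(s)^T x
z_hat = Phi @ x_w                                      # reconstructed field
```

The kernel amplitude is chosen so that the integral operator is contractive and the simulated field remains bounded; the least-squares projection stands in for the inner-product projections used in the paper.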

The state vector x_k ∈ R^n weights the field basis functions at a given time k to reconstruct the field; the state dimensionality n is the number of basis functions used for the reconstruction, and φ(s) = (φ_i(s))_{i=1}^n. Similarly, the parameter vector weights the kernel basis functions ψ(s) = (ψ_i(s))_{i=1}^d. It is thus possible to project the IDE onto a linear dynamical system (LDS) of the form (Dewar et al., 2009)

    x_k = A(θ_k) x_{k-1} + w_{k-1},
    y_k = C_k x_k + v_k,                                                         (11)

where θ_k has been assumed to evolve as in (2) and A(θ_k) := A_k ∈ R^{n×n}. The observation equation at k is defined by the matrix C_k ∈ R^{m×n}, which can be used to describe the motion of the sensors, the measurements of which are corrupted by additive white Gaussian noise. To keep the exposition simple, in this example we keep C_k fixed so that C_k := C ∀ k. Online optimal formulations of C_k based on the state and parameter uncertainty measures are the subject of future work.

It can be shown, using conventional techniques, that for the dynamic model (11), from (7), p̃(x_k | Y_k) = N_{x_k}(µ_k, Σ_k), where

    Σ_k^{-1} = Q^{-1} + C^T R^{-1} C − Q^{-1} E_{p̃(θ_k|Y_k)}[A_k] Λ E_{p̃(θ_k|Y_k)}[A_k]^T Q^{-1},   (12a)

    µ_k = Σ_k C^T R^{-1} y_k + Σ_k Q^{-1} E_{p̃(θ_k|Y_k)}[A_k] Λ Σ_{k-1}^{-1} µ_{k-1},                 (12b)

and

    Λ = (Σ_{k-1}^{-1} + E_{p̃(θ_k|Y_k)}[A_k^T Q^{-1} A_k])^{-1},                                       (13)

with µ_k ∈ R^n and Σ_k ∈ R^{n×n}. Note that this update makes use of parameter second moments in addition to point estimates and that, unlike in the full state marginal factorisation case (see Šmídl and Quinn, 2005, remark 7.8), the state covariance is dynamic.

From Theorem 1 in Dewar et al. (2009) we have that A_k = Ψ_x^{-1} Ψ_{θ_k}, where Ψ_x = (⟨φ_i, φ_j⟩)_{i,j=1}^n and Ψ_{θ_k} = (⟨φ_i, θ_k^T Φ_{:,j}⟩)_{i,j=1}^n = (θ_k^T ⟨φ_i, Φ_{:,j}⟩)_{i,j=1}^n, and each vector Φ_{:,·} = (ψ_i ∗ φ_·)_{i=1}^d, where φ and ψ are the basis functions used to decompose the field and the kernel respectively as in (10), ⟨·, ·⟩ denotes the inner product and ∗ is the convolution operator. The notation (·_i ·_j) is used to describe a matrix with the associated quantities at the (i, j)th entry, and the subscript :, j denotes all the entries in column j.

The projected field noise is given by

    w_k = Ψ_x^{-1} ∫_Ω φ(s) e_k(s) ds,

which, with e_k being white in discrete time with covariance σ_w² and equipped with a covariance operator Q, has the projected covariance

    Q = T_s σ_w² Ψ_x^{-1} Q_n Ψ_x^{-1},

where T_s is the data sampling interval and each element in Q_n is the inner product Q_n = (⟨φ_i, Qφ_j⟩)_{i,j=1}^n. This quantity can be computed off-line for a known covariance operator Q.

The variance of e_{k-1} in the parameter evolution model (2) is a design choice. One common method is to make use of a forgetting factor λ to provide an exponentially decaying weight on past information. In this case, analogous to Wan and Nelson (2001), we let

    Σ_{e_{k-1}} = (λ^{-1} − 1) Σ_{θ_{k-1}},

where Σ_{θ_{k-1}} ∈ R^{d×d} is the covariance of the variational posterior of θ_{k-1}, i.e. p̃(θ_{k-1} | Y_{k-1}).

For a given θ_k, A_k is defined and hence so are its first and second moments required in (12a), (12b) and (13), which are found through the variational parameter posterior. From (8) this distribution is given by

    p̃(θ_k | Y_k) ∝ p̃(θ_k | θ_{k-1})
                   × exp(−½ E_{p̃(X_k)}[(x_k − A_k x_{k-1})^T Q^{-1} (x_k − A_k x_{k-1})]),            (14)

where p̃(θ_k | θ_{k-1}) = exp(E_{p̃(θ_{k-1}|Y_{k-1})} ln p(θ_k | θ_{k-1})). A_k can be expressed explicitly in terms of θ_k by defining V_h = (⟨φ_i, Φ_{j,h}⟩)_{i,j=1}^n, so that A_k = Ψ_x^{-1} Σ_{i=1}^d θ_{k,i} V_i. Upon substitution and application of the trace operator tr(·) in (14) we have that

    p̃(θ_k | Y_k) ∝ p̃(θ_k | θ_{k-1}) exp( tr( S_k [Σ_{i=1}^d θ_{k,i} V_i]^T Ψ_x^{-1} Q^{-1} )
                   − ½ tr( W_{k-1} [Σ_{i=1}^d θ_{k,i} V_i]^T Ψ_x^{-1} Q^{-1} Ψ_x^{-1} [Σ_{j=1}^d θ_{k,j} V_j] ) ),

where the matrices W_{k-1} = E_{p̃(X_k|Y_k)}[x_{k-1} x_{k-1}^T] and S_k = E_{p̃(X_k|Y_k)}[x_k x_{k-1}^T]. In order to obtain W_{k-1}, the distribution of the smoothed state may be evaluated by marginalising out x_k and X_{k-2} from the approximate joint. To evaluate S_k, Schur complements on the joint of x_k and x_{k-1} can be used to find the corresponding second moment. The interested reader is referred to Beal (2003) for details. The above expression can be further reduced to

    p̃(θ_k | Y_k) ∝ p̃(θ_k | θ_{k-1}) exp( θ_k^T ν_k − ½ θ_k^T Υ_k θ_k ),                             (15)

where

    ν_k = (tr(S_k V_i^T Ψ_x^{-1} Q^{-1}))_{i=1}^d,
    Υ_k = (tr(W_{k-1} V_i^T Ψ_x^{-1} Q^{-1} Ψ_x^{-1} V_j))_{i,j=1}^d,

which are similar to the same quantities given in Dewar et al. (2009) in the offline EM case. The final step is to compute (15) to give the mean and variance of the estimated parameters as

    Σ_{θ_k} = (λ Σ_{θ_{k-1}}^{-1} + Υ_k)^{-1},                                                        (16a)

    µ_{θ_k} = Σ_{θ_k} (λ Σ_{θ_{k-1}}^{-1} µ_{θ_{k-1}} + ν_k).                                         (16b)

The required second moment in (13) can now be computed from (16a) and (16b) as

    E_{p̃(θ_k|Y_k)}[A_k^T Q^{-1} A_k]
        = E_{p̃(θ_k|Y_k)}[ (Σ_{i=1}^d θ_{k,i} V_i)^T Ψ_x^{-1} Q^{-1} Ψ_x^{-1} (Σ_{j=1}^d θ_{k,j} V_j) ],

and after rearranging it can be shown that this is given by

    E_{p̃(θ_k|Y_k)}[A_k^T Q^{-1} A_k] = Σ_{i,j=1}^d (E_{p̃(θ_k|Y_k)}[θ_k θ_k^T])_{i,j} [V_i^T Ψ_x^{-1} Q^{-1} Ψ_x^{-1} V_j],   (17)

where the second moment of the parameter vector is easily found through the relation

    E_{p̃(θ_k|Y_k)}[θ_k θ_k^T] = Σ_{θ_k} + µ_{θ_k} µ_{θ_k}^T.

As seen from (16a), a choice of λ = 1 ensures convergence in parameter estimation on account of the monotonically increasing precision (Υ_k is positive definite by the same reasoning as Lemma 3 in Dewar et al. (2009)). In practice this strict restriction can be lifted in exchange for more sophisticated learning strategies which make use of λ ∈ (0, 1]. From the derived statistics, the distributed mean and variance of the kernel can be found using conventional statistical techniques. As alluded to in Remark 1, iterations between (12a), (12b) and (16a), (16b) may be required at each time step for increased distributional accuracy. A summary of the algorithm is given in Algorithm 1.

4. RESULTS

The aim of the spatiotemporal identification problem is to estimate the field through x_k and the mixing kernel



Algorithm 1 The online dual VB algorithm for the IDE-based spatiotemporal system

Start: Initialise the priors p(θ_0) ∼ N_{θ_0}(0, κ_θ^{-1} I) and p(x_0) ∼ N_{x_0}(0, κ_x^{-1} I), where the precisions κ_θ and κ_x are small.

for k = 1 to K
    set E_{p̃(θ_k|Y_k)}[A_k] = E_{p̃(θ_{k-1}|Y_{k-1})}[A_{k-1}].
    set E_{p̃(θ_k|Y_k)}[A_k^T Q^{-1} A_k] = E_{p̃(θ_{k-1}|Y_{k-1})}[A_{k-1}^T Q^{-1} A_{k-1}].
    while (not converged)*
        Find Λ from (13).
        Find Σ_k from (12a).
        Find µ_k from (12b).
        Find W_{k-1} = E_{p̃(X_k|Y_k)}[x_{k-1} x_{k-1}^T].
        Find S_k = E_{p̃(X_k|Y_k)}[x_k x_{k-1}^T].
        Find Σ_{θ_k} from (16a).
        Find µ_{θ_k} from (16b).
        Update E_{p̃(θ_k|Y_k)}[A_k] = Ψ_x^{-1} Σ_{i=1}^d µ_{θ_k,i} V_i
            and E_{p̃(θ_k|Y_k)}[A_k^T Q^{-1} A_k] from (17).
    Return µ_k, Σ_k, µ_{θ_k} and Σ_{θ_k}.
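The structure of Algorithm 1 can be illustrated with the following scalar (n = d = 1) toy, in which Ψ_x = V_1 = 1 so that A_k = θ_k and all the updates reduce to scalar arithmetic. The smoothed moments W_{k-1} and S_k are replaced here by crude filtered-moment stand-ins, and the model, noise levels and iteration count are all illustrative assumptions, so this is a sketch of the update pattern rather than the full algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, Q, R, K = 0.8, 0.1, 0.05, 300

# Simulate x_k = theta x_{k-1} + w_k, y_k = x_k + v_k
x, ys = 0.0, []
for _ in range(K):
    x = theta_true * x + rng.normal(0.0, np.sqrt(Q))
    ys.append(x + rng.normal(0.0, np.sqrt(R)))

lam = 1.0                           # lambda = 1: static parameter, growing precision
mu, Sig = 0.0, 1.0                  # state prior moments
mu_t, Sig_t = 0.0, 100.0            # parameter prior moments
for y in ys:
    mu_p, Sig_p = mu, Sig           # state posterior of step k-1
    mu_tp, Sig_tp = mu_t, Sig_t     # parameter posterior of step k-1, fixed below
    for _ in range(3):              # IVB iterations between the coupled updates
        E_th2 = Sig_t + mu_t**2                                         # E[theta^2]
        Lam = 1.0 / (1.0 / Sig_p + E_th2 / Q)                           # (13)
        Sig = 1.0 / (1.0 / Q + 1.0 / R - (mu_t / Q) * Lam * (mu_t / Q)) # (12a)
        mu = Sig * (y / R + (mu_t / Q) * Lam * mu_p / Sig_p)            # (12b)
        W = Sig_p + mu_p**2         # filtered stand-in for W_{k-1} = E[x_{k-1}^2]
        S = mu * mu_p               # filtered stand-in for S_k = E[x_k x_{k-1}]
        nu, Ups = S / Q, W / Q      # (15), with V_1 = Psi_x = 1
        Sig_t = 1.0 / (lam / Sig_tp + Ups)                              # (16a)
        mu_t = Sig_t * (lam * mu_tp / Sig_tp + nu)                      # (16b)
```

After the loop, mu_t sits in the rough vicinity of theta_true (biased low by the filtered-moment approximation), while Sig_t shrinks monotonically as the precision accumulates, in line with the λ = 1 discussion above.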

* Convergence is quick in the presence of relatively low observation noise. To the detriment of distributional accuracy, the loop may be run through once only, should computational time become an issue.

k_I(s − r) through θ. The simulation was carried out over a domain Ω = {s; s ∈ [0, 5]} and the system was observed using 35 static sensors equally spaced in the spatial domain (m = 35). The covariance operator considered was an unnormalised Gaussian kernel of variance 0.01 and unit amplitude. The spatial domain was discretised using a set of 25 Gaussian basis functions weighted at each time point by the state vector x_k (n = 25). The kernel for the problem at hand was assumed to be composed of three Gaussian functions centred at [0.2, 0, −0.3] with variances [0.1, 0.1, 0.5] and amplitudes θ = [2, −1, −1]. Data was generated for 50 samples and observed in additive white Gaussian noise of covariance 0.2I. The initial state and parameter prior distributions were set to N_{x_0}(0, 100I) and N_{θ_0}(0, 100I), and we made use of a forgetting factor λ = 0.99 to manage the time-varying covariance in (2).

The estimated kernel for a sample dataset at the final time point K = 50 is shown in Fig. 2. The true value of the kernel is seen to lie consistently within the confidence limits. To study the potential of the proposed algorithm, we compared the temporal mean trajectories of the estimated variational posteriors to those given by a Rao-Blackwellised particle filter (RBPF) (see Doucet et al., 2000; Schön et al., 2005). By exploiting the bilinear structure of the decomposed IDE, Rao-Blackwellisation was used to marginalise out the states using the standard Kalman filter, resulting in a dimensionality reduction of the space to be sampled from 28 to 3.

The benefit of the algorithm proposed in this work over RBPFs is in the distributional accuracy vs. computational requirement trade-off. As seen in Table 1, the time required by the RBPF with N = 2000 particles, the minimum

Fig. 2. Graphical comparison between the true kernel (solid line) and the kernel as estimated by the dual VB filter (marked, bold line) at the final time instant. The 95% confidence intervals are given by the dashed lines.

Table 1. Computation time required for the online parameter inference of the IDE kernel of (9) using the RBPF and the dual VB filter

    Filter employed    | Time required
    -------------------|--------------
    RBPF (N = 5000)    | ≈ 25 min
    RBPF (N = 2000)    | ≈ 10 min
    RBPF (N = 500)     | 2-3 min
    dual VB filter     | 10-20 s

Note: Simulations were carried out on an Intel Core™ 2 Duo E8400 @ 3.00 GHz personal computer with 2 GB of RAM.

deemed necessary to obtain adequate performance in this particular scenario, was several times more than that required by the dual VB filter. With N = 500, although still on average an order of magnitude slower than the dual VB filter, the RBPF frequently converged to erroneous estimates (see Fig. 3). A two-sample t-test confirmed that there was no significant difference between the distribution means of the VB filter and the RBPF with 5000 particles at the final time point over 50 Monte Carlo runs. In addition, the true mean absolute errors lay within the estimated two-sigma boundaries for both filters over the 50 runs, and were of the same order of magnitude (10^{-1}). This result suggests that, at least for LDSs, the proposed dual filter may prove very useful where computational time starts to become an issue, even in application domains other than spatiotemporal system identification. Since the RBPF has recently been used for the inference of several LDSs, for instance in vehicle modelling (Li et al., 2007) and audio engineering (Andrieu and Godsill, 2000), the potential implications of this result are considerable.
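The form of the significance check used above can be sketched as a Welch two-sample t-test; the figures below are synthetic stand-ins for the final-time posterior means of one kernel parameter over the 50 Monte Carlo runs, not the paper's data.

```python
import numpy as np

# Hypothetical final-time posterior means from the two filters over 50 runs
rng = np.random.default_rng(4)
vb_means = 2.0 + 0.1 * rng.standard_normal(50)
rbpf_means = 2.0 + 0.1 * rng.standard_normal(50)

# Welch's two-sample t statistic for H0: equal means
m1, m2 = vb_means.mean(), rbpf_means.mean()
v1, v2 = vb_means.var(ddof=1), rbpf_means.var(ddof=1)
t_stat = (m1 - m2) / np.sqrt(v1 / len(vb_means) + v2 / len(rbpf_means))
```

With roughly 98 degrees of freedom, a |t_stat| below about 1.98 fails to reject the hypothesis of equal means at the 5% level.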



Fig. 3. (left) Posterior mean of θ_k = [θ_{k,0}, θ_{k,1}, θ_{k,2}] at every time instant using the dual VB filter. The cross, square and circle denote the mean estimates of θ_{k,0}, θ_{k,1} and θ_{k,2}, in that order. The true parameter vector, represented by the level unmarked lines, is [2, −1, −1]. Confidence intervals are omitted for the sake of clarity. (centre) Same as the left panel but for a RBPF with N = 5000 particles. (right) Same as the left panel but for a RBPF with N = 500.

5. CONCLUSION

In this work, variational Bayes is used to derive a dual filter for the online estimation of the stochastic IDE. Recursion is obtained through an appropriate choice of restricted variational posteriors, so that only the distributions at the final time instant are coupled. The algorithm is both fast and memory efficient and is seen to give coherent estimates of the unknown parameters. The developed filter is a required development for a new class of problems associated with the study of spatiotemporal systems, where the age-old problem of sensor planning is given a new perspective by the online provision of both field and governing-system parameter uncertainty. It is thus envisioned that the dual VB filter, together with the uncertainty measures it provides, may prove very useful in providing sophisticated online sensor planning strategies for the identification of such systems.

REFERENCES

C. Andrieu and A. Doucet. Online expectation-maximization type algorithms for parameter estimation in general state space models. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, 2003.
C. Andrieu and S. J. Godsill. A particle filter for model based audio source separation. In Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, 2000.
H. Attias. Inferring parameters and structure of latent variable models by variational Bayes. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 21-30, Stockholm, Sweden, 1999.
M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, University College London, London, UK, 2003.
O. Cappé and E. Moulines. On-line expectation-maximization algorithm for latent data models. Journal of the Royal Statistical Society Series B - Statistical Methodology, 71(3):593-613, Jun. 2009.
M. Dewar, K. Scerri, and V. Kadirkamanathan. Data-driven spatio-temporal modeling using the integro-difference equation. IEEE Transactions on Signal Processing, 57(1):83-91, Jan. 2009.
A. Doucet, N. De Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filters for dynamic Bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Stanford, California, USA, 2000.
G. Kitagawa. A self-organizing state-space model. Journal of the American Statistical Association, 93(443):1203-1215, Sep. 1998.
V. Krishnamurthy and J. B. Moore. On-line estimation of hidden Markov model parameters based on the Kullback-Leibler information measure. IEEE Transactions on Signal Processing, 41(8):2557-2573, Aug. 1993.
P. Li, R. Goodall, P. Weston, C. S. Ling, C. Goodman, and C. Roberts. Estimation of railway vehicle suspension parameters for condition monitoring. Control Engineering Practice, 15(1):43-55, Jan. 2007.
K. Scerri, M. Dewar, and V. Kadirkamanathan. Estimation and model selection for an IDE-based spatio-temporal model. IEEE Transactions on Signal Processing, 57(2):482-492, Feb. 2009.
T. Schön, F. Gustafsson, and P.-J. Nordlund. Marginalized particle filters for mixed linear/nonlinear state-space models. IEEE Transactions on Signal Processing, 53(7):2279-2289, 2005.
V. Šmídl and A. Quinn. The Variational Bayes Method in Signal Processing. Springer Verlag, New York, 2005.
V. Šmídl and A. Quinn. The restricted variational Bayes approximation in Bayesian filtering. In Proceedings of the IEEE Nonlinear Statistical Signal Processing Workshop, Cambridge, UK, 2006.
G. Storvik. Particle filters for state-space models with the presence of unknown static parameters. IEEE Transactions on Signal Processing, 50(2):281-289, Feb. 2002.
E. A. Wan and A. T. Nelson. Dual extended Kalman filter methods. In S. Haykin, editor, Kalman Filtering and Neural Networks, pages 123-173. John Wiley & Sons, Inc., New York, USA, 2001.
E. Weinstein, M. Feder, and A. V. Oppenheim. Sequential algorithms for parameter estimation based on the Kullback-Leibler information measure. IEEE Transactions on Acoustics Speech and Signal Processing, 38(9):1652-1654, 1990.
A. Zammit Mangion, K. Yuan, V. Kadirkamanathan, M. Niranjan, and G. Sanguinetti. Online variational inference for state-space models with point process observations. Neural Computation, in press, 2011.
