Proceedings of the 18th World Congress of the International Federation of Automatic Control, Milano (Italy), August 28 - September 2, 2011

Application of Unscented Transformation in Nonlinear System Identification ⋆

Matej Gašperin, Đani Juričić

Jožef Stefan Institute, Jamova 39, Ljubljana, Slovenia (e-mail: [email protected]).

Abstract: This article addresses the problem of identifying nonlinear dynamical state-space models from input and output data. The problem is tackled by using the Expectation-Maximization (EM) algorithm to calculate the Maximum Likelihood (ML) estimates of the model parameters. The novelty of the presented algorithm lies in an efficient employment of the Unscented Transformation (UT) in the expectation step, which lowers the number of required computations. This property enables the algorithm to cope with high-dimensional models without a significant increase in computational load. The overall performance of the algorithm is demonstrated using both numerical and real data examples.

Keywords: Nonlinear System Identification; Maximum Likelihood Estimate; Unscented Transformation; Expectation-Maximization

1. INTRODUCTION

System identification of nonlinear dynamic systems is recognized as a challenging problem (Ninnes [2009]). The attention that the field has received in the past decades has resulted in a broad spectrum of approaches (Ljung [2008]). In this paper, we follow the ideas laid down by Gibson and Ninness [2005] and Schön et al. [2011], which are to employ the Expectation-Maximization (EM) algorithm for obtaining the Maximum-Likelihood (ML) estimate of the model parameters. The novelty in our work comes from the use of approximation techniques, namely the Unscented Transformation, to compute the required conditional expectations. The method presented in this paper can be applied to the identification of a very general class of systems, namely those that can be represented in state-space form. The advantage of this formulation is that almost any system, usually described by a set of differential equations, can be transformed to the state-space representation.

In mathematical terms, the state-space model representation is of the following form:

x_{t+1} = f(x_t, u_t, θ) + w_t
y_t = g(x_t, u_t, θ) + v_t        (1)

Here x_t ∈ R^n is the state vector, y_t ∈ R^m is the system output or measurement vector, u_t ∈ R^k denotes the measured system input, and f(·) and g(·) are nonlinear functions parameterized by the parameter vector θ. Finally, w_t and v_t are zero-mean, mutually independent stochastic processes, represented by probability density functions (pdfs) p_w(·) and p_v(·). The pdfs are assumed to be normal, with unknown covariance matrices that can be included in θ if necessary.
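To make the developments concrete, the following minimal sketch shows one way a model of the form (1) can be represented in code; the function names and the NumPy-based interface are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# A model of the form (1): x_{t+1} = f(x_t, u_t, theta) + w_t,
#                          y_t     = g(x_t, u_t, theta) + v_t.
# Hypothetical interface: f and g are plain functions of the state,
# input and parameter vector; Q and R are the noise covariances.

def simulate(f, g, theta, x0, U, Q, R, rng=None):
    """Simulate T steps of the state-space model (1)."""
    rng = np.random.default_rng() if rng is None else rng
    T, n, m = len(U), len(x0), np.atleast_2d(R).shape[0]
    X, Y = np.zeros((T, n)), np.zeros((T, m))
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        X[t] = x
        Y[t] = g(x, U[t], theta) + rng.multivariate_normal(np.zeros(m), np.atleast_2d(R))
        x = f(x, U[t], theta) + rng.multivariate_normal(np.zeros(n), np.atleast_2d(Q))
    return X, Y
```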

Different estimators for the unknown model parameters can be constructed. This paper focuses on Maximum-Likelihood (ML) estimators, due to their attractive properties, e.g. asymptotic unbiasedness and asymptotic minimum variance (Dekking et al. [2005]), as well as their general acceptance. This is a well-established approach, especially in the case of linear dynamical systems, where different gradient-based search techniques are commonly used to find the maximum of the data likelihood function (Ljung [1999]). An interesting alternative, presented in Gibson and Ninness [2005] and Shumway and Stoffer [2005], is to adopt the Expectation-Maximization (EM) algorithm to compute the ML estimate of the model parameters.

The EM algorithm has so far been successfully applied to system identification in different settings (McLachlan and Krishnan [2008]). In the case of linear models with additive Gaussian noise, it has been shown that a closed-form expression for the ML estimate of the model parameters can be derived (Shumway and Stoffer [2005]). This result has also been extended to bilinear systems by Gibson and Ninness [2005]. Furthermore, Andrieu et al. [2004] have considered on-line estimation, using the EM algorithm and split data.

On the other hand, nonlinear models pose a far greater estimation problem. The difficulty arises from the fact that a nonlinear transformation of a random variable may result in a posterior probability density function that cannot be described by a finite number of moments. Exact solutions to such problems exist only for a limited class of nonlinearities, such as the Beneš type (Elliott and Haykin [2010]). Therefore, any identification algorithm for general nonlinear models has to rely on approximations.

⋆ Support from the Slovenian Research Agency through the Research Programme P2-00001 is gratefully acknowledged.

A versatile algorithm capable of solving this problem has recently been proposed by Wills et al. [2008] and Schön et al. [2011]. The approach is based on the so-called particle smoother and sequential Monte Carlo methods, and therefore imposes no prior constraints on the model structure and noise distributions. Although versatile, the particle-based approach may suffer from relatively large computational complexity, as a relatively large number of particles might need to be propagated through the nonlinear functions in order to achieve sufficient accuracy. To alleviate the problem of computational load, we introduce an algorithm based on the Unscented Transformation (Julier [2002], van der Merwe [2004]). The approximations are used to solve the smoothing problem and also to approximate the conditional expectations of a nonlinear function. However, the introduction of approximation methods such as the unscented transformation introduces an error in the estimate, which depends on the properties of the nonlinearities in the model (van der Merwe [2004]). The developed algorithm will therefore be tested under different setups, which demonstrate its efficiency and also highlight its limitations.

2. STATE-SPACE SYSTEM IDENTIFICATION

In this section we address the following problem: given a realization Y_T = {y_1, y_2, ..., y_T} of the stochastic process represented by a state-space model (1), with known structure of the nonlinear functions f(·) and g(·), our aim is to find the ML estimate of the unknown parameter vector θ. Without loss of generality, the algorithm will be derived under the assumption that the model has no measurable inputs.

2.1 Maximum likelihood estimate

Optimal values of the model parameters can be obtained in different manners, depending on the definition of optimality. One of them is the Maximum Likelihood (ML) criterion, which can be understood as the selection of the parameter values θ = θ* at which the data likelihood function p_θ(Y_T) attains its maximum. This estimate of θ is known as the maximum likelihood (ML) estimate. In most cases, however, it proves to be numerically more efficient to search for the optimum of the log-likelihood function of the parameters,

L_θ(Y_T) ≜ log p_θ(Y_T).        (2)

Since log is a strictly increasing function of its argument, the value of θ that maximizes L_θ(Y_T) also maximizes p_θ(Y_T). However, when the system output also depends on the vector of system states X_T = {x_1, x_2, ..., x_T}, direct maximization of the log-likelihood function is not possible if no information about the state vector is available. Classical maximum likelihood methods do not provide a solution to the joint state and parameter estimation problem considered in this paper.

In order to evaluate the measured-data likelihood function L_θ(Y_T), it is clear that, besides maximization of the function with respect to θ, an estimate of the state sequence X_T is also required. The solution to this problem was introduced by Dempster et al. [1977]: the Expectation-Maximization algorithm was derived to solve the ML estimation problem with incomplete or missing data. If the states X_T are treated as missing data, this algorithm can be successfully deployed to solve the described system identification problem.

2.2 Expectation-Maximization algorithm

The EM algorithm solves the problem of simultaneously estimating the system states and model parameters by alternating between two steps. First, it approximates the likelihood function with its expected value over the missing data (E-step); second, it maximizes this function with respect to θ (M-step). A short overview of the algorithm is presented here, while a more detailed explanation can be found in Haykin [2001] and Gibson and Ninness [2005].

To include the state vector in the likelihood function, we can use the multiplication rule (Dekking et al. [2005]) to derive the relation between the classical likelihood function and the joint likelihood function of the measurement data and the system states:

p_θ(Y_T, X_T) = p_θ(X_T | Y_T) p_θ(Y_T)        (3)

Taking logarithms of (3) and rearranging, we can write the following relation between the classical and the complete-data log-likelihood function:

log p_θ(Y_T) = log p_θ(X_T, Y_T) − log p_θ(X_T | Y_T)        (4)

Taking the expectation of (4) with respect to the probability density function p_{θ_k}(X_T | Y_T), where θ_k is any fixed value of the parameter vector, we obtain

E_{θ_k}{log p_θ(Y_T)} = log p_θ(Y_T)
  = ∫ log p_θ(X_T, Y_T) p_{θ_k}(X_T | Y_T) dX_T − ∫ log p_θ(X_T | Y_T) p_{θ_k}(X_T | Y_T) dX_T        (5)

or

L_θ(Y_T) = Q(θ, θ_k) − ∫ log p_θ(X_T | Y_T) p_{θ_k}(X_T | Y_T) dX_T        (6)

where the first term on the right-hand side defines the function Q(θ, θ_k). It has been shown by Gibson and Ninness [2005] and Shumway and Stoffer [2005] that increasing the value of Q(θ, θ_k) also increases the value of the likelihood function L_θ(Y_T). Indeed, using the definition from (6) we can write

L_θ(Y_T) − L_{θ_k}(Y_T) = Q(θ, θ_k) − Q(θ_k, θ_k) + ∫ log [ p_{θ_k}(X_T | Y_T) / p_θ(X_T | Y_T) ] p_{θ_k}(X_T | Y_T) dX_T        (7)

where the last term in (7) is a Kullback-Leibler divergence and is by definition non-negative. Therefore,

L_θ(Y_T) − L_{θ_k}(Y_T) ≥ Q(θ, θ_k) − Q(θ_k, θ_k)        (8)

from which it follows that increasing the value of Q(θ, θ_k) will in fact increase the value of L_θ(Y_T). The EM procedure can be summarized as follows:

EM Algorithm

(1) Begin with an initial parameter vector estimate θ_0.
(2) (E-step) Form the expected value of log p_θ(X_T, Y_T) over the unknown system states X_T, based on the current parameter estimate θ_k and the system output Y_T:

Q(θ, θ_k) = ∫ log p_θ(X_T, Y_T) p_{θ_k}(X_T | Y_T) dX_T        (9)


(3) (M-step) Find a new estimate θ_{k+1} by maximizing Q(θ, θ_k) with respect to θ:

θ_{k+1} = arg max_θ Q(θ, θ_k)        (10)

(4) Repeat the procedure until the convergence criteria are met.
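The complete iteration can be summarized in code. The following is a minimal schematic sketch, assuming hypothetical e_step and m_step routines that implement (9) and (10); the names and the convergence test are illustrative, not from the paper.

```python
import numpy as np

def em_identify(Y, theta0, e_step, m_step, max_iter=100, tol=1e-6):
    """Schematic EM iteration: alternate the E-step (9) and M-step (10).

    e_step(Y, theta_k) -> Q, a callable theta -> Q(theta, theta_k)
    m_step(Q, theta_k) -> theta_{k+1}, a maximizer of Q
    """
    theta = np.asarray(theta0, dtype=float)
    for k in range(max_iter):
        Q = e_step(Y, theta)          # E-step: build Q(., theta_k) from smoothed states
        theta_new = m_step(Q, theta)  # M-step: theta_{k+1} = argmax_theta Q(theta, theta_k)
        if np.linalg.norm(theta_new - theta) < tol * (1 + np.linalg.norm(theta)):
            return theta_new, k + 1
        theta = theta_new
    return theta, max_iter
```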

3. THE E-STEP: COMPUTING Q(θ, θ_k)

The first part of the algorithm is the Expectation step (E-step), which calculates the expected value of the complete-data log-likelihood function,

Q(θ, θ_k) = ∫ log p_θ(X_T, Y_T) p_{θ_k}(X_T | Y_T) dX_T = E_{θ_k}{log p_θ(X_T, Y_T)}.        (11)

The problem can be divided into two sub-problems. The first is to find the probability density function p_{θ_k}(X_T | Y_T), where θ_k is a fixed value of the parameter vector; this corresponds directly to solving the well-known smoothing problem (Sarkka [2008]). The second concerns calculating the expected value of the function over the distribution of the system states.

3.1 Discrete-time optimal smoothing

The problem of optimal discrete-time filtering and smoothing is well elaborated in the literature (Haykin [2001], Pillonetto and Bell [2008]). The algorithms are based either on the Kalman filtering framework (Extended Kalman Filter, Unscented Kalman Filter) or on Monte Carlo methods (particle filtering). The setup described in this work does not impose any explicit prior requirements on the algorithm selection. In our work, we consider the forward-backward or Rauch-Tung-Striebel (RTS) type implementation of the smoother algorithms. The basic characteristic of RTS-type algorithms is that they are composed of a forward filtering pass and a backward smoothing pass. From a Bayesian viewpoint, the two passes can be described as follows:

1) Forward filtering equations:

Prediction step:
p_{θ_k}(x_t | Y_{t−1}) = ∫ p_{θ_k}(x_t | x_{t−1}) p_{θ_k}(x_{t−1} | Y_{t−1}) dx_{t−1}        (12)

Update step:
p_{θ_k}(x_t | Y_t) = p_{θ_k}(y_t | x_t) p_{θ_k}(x_t | Y_{t−1}) / p_{θ_k}(y_t | Y_{t−1})        (13)

2) Backward smoothing equations:

Joint smoothed density:
p_{θ_k}(x_{t+1}, x_t | Y_T) = p_{θ_k}(x_{t+1} | Y_T) p_{θ_k}(x_{t+1} | x_t) p_{θ_k}(x_t | Y_t) / p_{θ_k}(x_{t+1} | Y_t)        (14)

Smoothed density:
p_{θ_k}(x_t | Y_T) = ∫ p_{θ_k}(x_{t+1}, x_t | Y_T) dx_{t+1}        (15)

In the case of nonlinear systems, closed-form expressions for these recursions cannot be derived; therefore, practical smoother algorithms rely on various approximation methods. In the presented work, we use the Unscented RTS Smoother (Sarkka [2008]), which uses the Unscented Transformation to approximate the statistics of a random variable that undergoes a nonlinear transformation, using a set of deterministically selected points (called sigma points). The points are selected in such a way that they correctly capture the posterior distribution up to the second term of the Taylor series expansion of the nonlinear function.

After the forward and backward runs of the smoother we obtain approximations of the state distributions p_{θ_k}(x_t | Y_T) and the joint state distributions p_{θ_k}(x_{t+1}, x_t | Y_T) at the current value of the parameter vector θ_k, for time indices t ∈ {1, 2, ..., T}.
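As a concrete illustration, the following sketch shows the Gaussian-approximation form of the backward pass behind (14)-(15) used by RTS-type smoothers: given filtered and predicted moments from the forward pass, together with the state/prediction cross-covariance (for the unscented smoother, computed from sigma points as in Sarkka [2008]), the smoothed moments follow from a simple recursion. Variable names are illustrative.

```python
import numpy as np

def rts_backward(mf, Pf, mp, Pp, C):
    """Backward smoothing pass under a Gaussian approximation.

    mf[t], Pf[t] : filtered mean/covariance of p(x_t | Y_t)
    mp[t], Pp[t] : predicted mean/covariance of p(x_{t+1} | Y_t)
    C[t]         : cross-covariance Cov(x_t, x_{t+1} | Y_t)
                   (for the UT smoother, computed from the sigma points)
    Returns smoothed means/covariances approximating p(x_t | Y_T).
    """
    T = len(mf)
    ms, Ps = [None] * T, [None] * T
    ms[-1], Ps[-1] = mf[-1], Pf[-1]           # p(x_T | Y_T) is already smoothed
    for t in range(T - 2, -1, -1):
        G = C[t] @ np.linalg.inv(Pp[t])       # smoother gain
        ms[t] = mf[t] + G @ (ms[t + 1] - mp[t])
        Ps[t] = Pf[t] + G @ (Ps[t + 1] - Pp[t]) @ G.T
    return ms, Ps
```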

3.2 Novel result: approximating the Q(θ, θ_k) function by UT

Given the distribution of the state vector with respect to a parameter estimate θ_k, the next step is to calculate the expected value in (11). For most nonlinear systems, however, no analytical expression for the integral can be derived, so most approaches rely either on gradient-based approximations or on Monte Carlo techniques. The main idea of Monte Carlo based methods can be presented by the following equation for a nonlinear function f(·):

E{f(x)} = ∫ f(x) p(x) dx ≈ (1/N) Σ_{i=1}^{N} f(x_i)        (16)

where the x_i are i.i.d. samples from the probability distribution p(x). In our case, this corresponds to sampling from the state probability density function p_{θ_k}(X_T | Y_T). Unfortunately, this type of sampling is very inefficient for large data sets (Gopaluni [2008]), as the number of samples N needs to be relatively large, which in consequence drastically increases the computational load of the algorithm. To overcome this issue, we propose an approximation based on the Unscented Transformation (Julier [2002], van der Merwe [2004]), an approximate method for calculating the statistics of a random variable that undergoes a nonlinear transformation.

3.3 Unscented Transformation

The main idea behind the UT is that it is easier to approximate a distribution than an arbitrary nonlinear function. Let x be an n-dimensional random variable with probability density function p(x). First, select a set of weighted samples {X_i, w_i} that correctly capture the first two moments (mean x̄ and covariance P_x) of the random variable x. The selection scheme proposed by Julier [2002] that satisfies this requirement is as follows:

X_0 = x̄
X_i = x̄ + ( √((n + κ) P_x) )_i,        i = 1, ..., n
X_{i+n} = x̄ − ( √((n + κ) P_x) )_i,    i = 1, ..., n        (17)

and the corresponding weights as

w_0 = κ / (n + κ)
w_i = 1 / (2(n + κ)),        i = 1, ..., 2n        (18)

where w_i is the weight associated with the i-th sigma point and satisfies Σ_{i=0}^{2n} w_i = 1, ( √((n + κ) P_x) )_i is the i-th column of the matrix square root of the weighted covariance matrix, and κ is a scaling parameter. Each sigma point X_i is then propagated through the nonlinear function

Y_i = f(X_i),        i = 0, ..., 2n        (19)

and the approximated mean and covariance of y are obtained as follows:

ȳ ≈ Σ_{i=0}^{2n} w_i Y_i
P_y ≈ Σ_{i=0}^{2n} w_i (Y_i − ȳ)(Y_i − ȳ)′        (20)
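A minimal NumPy sketch of (17)-(20) is given below; it assumes the Julier selection scheme with scaling parameter kappa and uses a Cholesky factor as the matrix square root, which is one common choice.

```python
import numpy as np

def unscented_transform(f, x_mean, P_x, kappa=0.0):
    """Approximate mean and covariance of y = f(x) via the UT, eqs. (17)-(20)."""
    n = len(x_mean)
    S = np.linalg.cholesky((n + kappa) * P_x)   # matrix square root, one common choice
    # Sigma points (17): the mean, plus/minus the columns of S
    X = np.vstack([x_mean,
                   x_mean + S.T,
                   x_mean - S.T])               # shape (2n+1, n)
    # Weights (18)
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    # Propagate each sigma point through the nonlinearity (19)
    Y = np.array([f(xi) for xi in X])
    # Posterior statistics (20)
    y_mean = w @ Y
    D = Y - y_mean
    P_y = (w[:, None] * D).T @ D
    return y_mean, P_y

# Example: statistics of y = x^2 for x ~ N(1, 0.1); the UT mean 1.1
# matches the exact value E[x^2] = 1 + 0.1 for this quadratic case.
m, P = unscented_transform(lambda x: x**2, np.array([1.0]), np.array([[0.1]]), kappa=2.0)
```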

The posterior mean and covariance obtained using this approximation are accurate to the second order of the Taylor series expansion of f(x) for any nonlinear function (van der Merwe [2004]). Depending on the type of nonlinearity, however, residual errors can still prevail and may in some cases be relatively large (Hendeby and Gustafsson [2008]).

In our identification problem, the nonlinear function reads

log p_θ(X_T, Y_T) = log p_θ(x_1) + Σ_{t=1}^{T−1} log p_θ(x_{t+1} | x_t) + Σ_{t=1}^{T} log p_θ(y_t | x_t),        (21)

which follows by expanding the joint density using the Markov property of the model and Bayes' rule. The function for the E-step of the EM procedure, Q(θ, θ_k) in (11), can be viewed as the expected value, or first moment, of the logarithm of the posterior distribution p_θ(X_T, Y_T). To approximate the value of Q(θ, θ_k) for a given parameter estimate θ_k, we select sets of points X_t^i along with corresponding weights w_t^i, based on the estimated distributions p_{θ_k}(X_T | Y_T). Taking expectations of (21) yields

Q(θ, θ_k) = E_{p_{θ_k}(X_T | Y_T)}{log p_θ(X_T, Y_T)}
  = ∫ log p_θ(x_1) p_{θ_k}(x_1 | Y_T) dx_1
  + Σ_{t=1}^{T−1} ∫∫ log p_θ(x_{t+1} | x_t) p_{θ_k}(x_{t+1}, x_t | Y_T) dx_{t+1} dx_t
  + Σ_{t=1}^{T} ∫ log p_θ(y_t | x_t) p_{θ_k}(x_t | Y_T) dx_t        (22)

which can be approximated using the unscented transformation, based on (20), as

Q(θ, θ_k) ≈ Σ_{i=0}^{2n} w_1^i log p_θ(X_1^i)
  + Σ_{t=1}^{T−1} Σ_{j=0}^{4n} w_t^j log p_θ(X_{t+1}^j | X_t^j)
  + Σ_{t=1}^{T} Σ_{i=0}^{2n} w_t^i log p_θ(y_t | X_t^i)        (23)

Here the X_1^i are the sigma points representing the distribution p_{θ_k}(x_1 | Y_T); similarly, the pairs [X_{t+1}^j, X_t^j] represent the joint distributions p_{θ_k}(x_{t+1}, x_t | Y_T), and the X_t^i represent p_{θ_k}(x_t | Y_T). The resulting expression (23) is an approximation of the function Q(θ, θ_k) in (22).

The number of calculations required for approximating Q(θ, θ_k) depends only on the dimension n of the state vector: at every time step t only a fixed number of computations is required, so the computational complexity per step is O(4n + 1). Compared to the computational complexity of the Monte Carlo based approximation, which is O(M), where M is the number of samples used and M ∝ n², this represents a significant decrease in complexity, especially as the dimension n of the problem increases.
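Under the same illustrative conventions as before, a sketch of (23) could look as follows; logp_x1, logp_trans and logp_meas stand for the Gaussian log-densities implied by model (1), and the sigma points and weights are assumed to come from the unscented RTS smoother. All names are assumptions made for illustration.

```python
def approx_Q(theta, Y, sp1, w1, sp_joint, w_joint, sp_marg, w_marg,
             logp_x1, logp_trans, logp_meas):
    """Approximate Q(theta, theta_k) as in (23).

    sp1, w1            : sigma points/weights for p(x_1 | Y_T)
    sp_joint, w_joint  : per-t sigma points/weights for p(x_{t+1}, x_t | Y_T),
                         each point given as a pair (x_next, x_curr)
    sp_marg, w_marg    : per-t sigma points/weights for p(x_t | Y_T)
    """
    # Initial-state term of (23)
    Q = sum(w * logp_x1(x, theta) for x, w in zip(sp1, w1))
    # Transition terms: one joint sigma-point set per t = 1..T-1
    for pts, ws in zip(sp_joint, w_joint):
        Q += sum(w * logp_trans(x_next, x_curr, theta)
                 for (x_next, x_curr), w in zip(pts, ws))
    # Measurement terms: one marginal sigma-point set per t (0-based index here)
    for t, (pts, ws) in enumerate(zip(sp_marg, w_marg)):
        Q += sum(w * logp_meas(Y[t], x, theta) for x, w in zip(pts, ws))
    return Q
```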

4. EXAMPLES

In this section, the performance of the algorithm is presented using a simple numerical example and a more demanding high-order model.

4.1 Nonlinear System

The model, taken here as a general nonlinear example, is adopted from previous studies of similar algorithms (Schön et al. [2011]) and has been chosen because it is acknowledged as a challenging estimation problem (Doucet et al. [2000]). It is described by the following nonlinear system:

x_{t+1} = a x_t + b x_t / (1 + x_t²) + c cos(1.2 t) + w_t
y_t = d x_t² + v_t
[w_t, v_t]′ ~ N( [0, 0]′, [[Q, 0], [0, R]] )        (24)

The true parameter values are chosen as follows:

θ = [a, b, c, d, Q, R] = [0.5, 2, 8, 0.05, 0, 0.001]        (25)

To analyze the performance of the algorithm, a Monte Carlo analysis has been performed, using 100 different data realizations Y_T, each of length 100 samples. At the beginning of every estimation run, the initial value of the parameter vector θ was drawn from a uniform distribution on an interval of ±50% around the corresponding true parameter values.
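For reference, a data-generation sketch for the benchmark (24)-(25) might look as follows; the seeding and array layout are illustrative choices.

```python
import numpy as np

def simulate_benchmark(T=100, theta=(0.5, 2.0, 8.0, 0.05, 0.0, 0.001), seed=0):
    """Generate one realization of the benchmark system (24)-(25)."""
    a, b, c, d, Q, R = theta
    rng = np.random.default_rng(seed)
    x = np.zeros(T + 1)
    y = np.zeros(T)
    for t in range(T):
        y[t] = d * x[t] ** 2 + rng.normal(0.0, np.sqrt(R))
        x[t + 1] = (a * x[t] + b * x[t] / (1 + x[t] ** 2)
                    + c * np.cos(1.2 * t) + rng.normal(0.0, np.sqrt(Q)))
    return x[:T], y

# 100 Monte Carlo realizations, as in the study
datasets = [simulate_benchmark(seed=i) for i in range(100)]
```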


The resulting estimates of parameters a and b as a function of iteration number are shown in Figure 1, and the final estimates of all the parameters are given in Table 1.

Fig. 1. Parameter estimates as a function of iteration number (top: parameter a, bottom: parameter b, black: true value).

Table 1. The statistics (mean and deviation) of the estimates obtained on a set of N = 100 Monte Carlo realizations of Y_T

Parameter   True    Estimated
a           0.5     0.503 ± 0.005
b           2       2.02  ± 0.12
c           8       8.35  ± 0.31
d           0.05    0.046 ± 0.006
Q           0       0.03  ± 0.006
R           0.01    0.006 ± 0.002

Values of Q(θ, θ_k) as a function of iteration number are shown in Figure 2.

Fig. 2. Q(θ, θ_k) function values as a function of iteration number.

The presented results show that the algorithm converged in every run, independently of the initial parameter vector (within the selected interval) and the system output realization. However, the parameter values do not converge to the true values, even if the number of iterations is increased, which is a consequence of the approximations in the algorithm. The magnitude of the approximation error depends on the characteristics of the nonlinear functions, and the exact relation between this error and the function properties is still an open research problem. Therefore, further studies of the algorithm behavior are needed to reach a deeper understanding of its performance and convergence conditions.

4.2 High-order System

A major advantage of the algorithm is that, when the dimension n of the state vector increases, the number of required sigma points grows only as 2n + 1. The model used to illustrate this is a reduced nonlinear model of a waste-water treatment plant (Stare et al. [2006]), based on the mass balances for ammonia nitrogen in the reactors of the actual plant. The discrete-time model is of the following form:

x_{t+1} = (I − T_S D) x_t + T_S (F + ρ) + w_t
y_t = H x_t + v_t        (26)

where x is the state vector and y is the model output, and w_t and v_t are mutually independent stochastic processes with covariance matrices Q and R, respectively. Finally, D is the dilution rate matrix, F is the feed rate vector, ρ is the ammonia nitrogen reaction rate vector and H is the output vector. These vectors and matrices are defined as follows:

x = [S_NH2  S_NH3  S_NH4  S_NH5]′
F = [D_in S_NHin  0  0]′
H = [0  0  1  0]
ρ = [0  ρ_3  ρ_4  0]′
D = [[ D12    0     0     0  ],
     [−D3    D3     0     0  ],
     [  0   −D4    D4     0  ],
     [  0     0   −D5    D5  ]]        (27)

with

D_in = Q_in / V12,  D12 = (Q_in + Q_r) / V12,  D2 = (Q_in + Q_r) / V2,
D3 = (Q_in + Q_r) / V3,  D4 = (Q_in + Q_r) / V4,  D_r = Q_r / V1

ρ_i = r_NH Θ^{T−20°C} · S_NHi / (K_NH + S_NHi) · 1 / (1 + e^{−k1 S_O2,i + k2})
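A sketch of the discrete-time update (26)-(27) is given below, with the unknown parameters θ = [r_NH, K_NH, k1, k2] entering through the reaction rates; the constants (temperature, temperature coefficient, sampling time) are placeholders for the known process values from Stare et al. [2006], not values taken from the paper.

```python
import numpy as np

def reaction_rate(S_NH, S_O2, theta, Theta=1.05, T_cels=15.0):
    """Ammonia reaction rate rho_i from (27); Theta and T_cels are placeholder constants."""
    r_NH, K_NH, k1, k2 = theta
    return (r_NH * Theta ** (T_cels - 20.0)
            * S_NH / (K_NH + S_NH)
            * 1.0 / (1.0 + np.exp(-k1 * S_O2 + k2)))

def step(x, theta, D, F, S_O2, Ts=0.01):
    """One noise-free step of model (26): x_{t+1} = (I - Ts*D) x_t + Ts*(F + rho)."""
    # rho = [0, rho_3, rho_4, 0]: rates act on reactors 3 and 4 (states x[1], x[2])
    rho = np.array([0.0,
                    reaction_rate(x[1], S_O2[1], theta),
                    reaction_rate(x[2], S_O2[2], theta),
                    0.0])
    return (np.eye(4) - Ts * D) @ x + Ts * (F + rho)
```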

The vector of the unknown model parameters is θ = [r_NH, K_NH, k1, k2], while the process and measurement noise covariances have been set to Q = 0.1 and R = 0.01. The values of the other parameters are known process constants and can be found in the work by Stare et al. [2006]. The estimation procedure consisted of 20 iterations and was repeated 25 times with different initial parameter values selected from uniform distributions on the intervals r_NH0 ∈ [200, 400], K_NH0 ∈ [4, 6], k1_0 ∈ [0.5, 1] and k2_0 ∈ [2.5, 4]. Figure 3 shows the parameter estimates as a function of iteration number; Table 2 summarizes the results in terms of mean parameter estimate values.

Fig. 3. Model parameter estimates K_NH (top) and k1 (bottom) as a function of iteration number.

The model output, using the estimated parameter values, is shown in Figure 4, along with the 95% confidence interval.

Table 2. Mean values and variance of the final parameter estimates

Parameter   ML estimate mean   ML estimate variance
rNH         386                106.3
KNH         1.52               0.29
k1          0.36               0.04
k2          5.33               0.43

Fig. 4. Model output mean and 95% confidence interval along with the measured output (NH4−N concentration in the last aerobic reactor, in mg/l, versus time in days; legend: confidence interval, model mean value, process measurement).

The results suggest that the model with the estimated parameter values is able to correctly capture the process dynamics. The remaining errors are partially a consequence of the approximations used in the estimation procedure. However, different estimation procedures provide similar results (Stare et al. [2006], Azman and Kocijan [2007]), which suggests that the real plant includes additional dynamics that are not modeled in the simplified nonlinear model considered in this example. Finally, it is important to note that, in this case, only 17 sigma points X_i have been used to estimate the statistics of the posterior distributions involved in the evaluation of (23). As the value of (23) has to be evaluated many times during the maximization step, this feature can be a crucial advantage of the presented algorithm, especially when considering real-life applications.

5. CONCLUSION

The contribution of this paper is a novel approximation-based EM procedure for nonlinear system identification. The main advantage of the algorithm is that it can successfully deal with high-order systems due to the use of computationally efficient approximations. The overall robustness and accuracy of the estimates obtained with this type of approximation is generally lower than that of approaches based on particle filters, e.g. in Schön et al. [2011]. However, when the nonlinearities in the system are not extremely severe, this method is a valuable tool, especially in applications where the computational load needs to be kept low. Further investigation into the tuning of the UT design parameters would hopefully contribute to the reduction of the estimation error.

REFERENCES

C. Andrieu, A. Doucet, S. S. Singh, and V. B. Tadic. Particle methods for change detection, system identification, and control. Proceedings of the IEEE, 92:423-438, 2004.
K. Azman and J. Kocijan. Application of Gaussian processes for black-box modelling of biosystems. ISA Transactions, 46(4):443-457, 2007.
F. M. Dekking, C. Kraaikamp, H. P. Lopuhaa, and L. E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.
A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197-208, 2000.
R. J. Elliott and S. Haykin. A Zakai equation derivation of the extended Kalman filter. Automatica, 46(3):620-624, 2010.
S. Gibson and B. Ninness. Robust maximum-likelihood estimation of multivariable dynamic systems. Automatica, 41:1667-1682, 2005.
R. B. Gopaluni. Identification of nonlinear processes with known model structure under missing observations. In Proceedings of the 17th IFAC World Congress, Seoul, Korea, July 6-11, 2008.
S. Haykin, editor. Kalman Filtering and Neural Networks. Wiley Interscience, 2001.
G. Hendeby and F. Gustafsson. On nonlinear transformations of stochastic variables and its application to nonlinear filtering. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2008.
S. J. Julier. The scaled unscented transformation. In Proceedings of the 2002 American Control Conference, 2002.
L. Ljung. System Identification: Theory for the User (2nd Edition). Prentice Hall, 1999.
L. Ljung. Perspectives on system identification. In Proceedings of the 17th IFAC World Congress, Seoul, South Korea, 2008.
G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions (2nd Edition). John Wiley and Sons, 2008.
B. Ninnes. Some system identification challenges and approaches. In Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, July 6-8, 2009.
G. Pillonetto and B. M. Bell. Optimal smoothing of nonlinear dynamic systems via Monte Carlo Markov chains. Automatica, 44(7):1676-1685, 2008.
S. Sarkka. Unscented Rauch-Tung-Striebel smoother. IEEE Transactions on Automatic Control, 53:845-849, 2008.
T. B. Schön, A. Wills, and B. Ninness. System identification of nonlinear state-space models. Automatica, 47(1):39-49, 2011.
R. H. Shumway and D. S. Stoffer. Time Series Analysis and Its Applications with R Examples (2nd Edition). Springer, 2005.
A. Stare, D. Vrečko, and N. Hvala. Modeling, identification, and validation of models for predictive ammonia control in a wastewater treatment plant - a case study. ISA Transactions, 45:159-174, 2006.
R. van der Merwe. Sigma-Point Kalman Filters for Probabilistic Inference in Dynamic State-Space Models. PhD thesis, Oregon Health & Science University, 2004.
A. Wills, T. B. Schön, and B. Ninness. Parameter estimation for discrete-time nonlinear systems using EM. In Proceedings of the 17th IFAC World Congress, Seoul, Korea, July 6-11, 2008.