Tuned iterated filtering


Erik Lindström

Division of Mathematical Statistics, Centre for Mathematical Sciences, Lund University, Box 118, SE-22100 Lund, Sweden

Article history: Received 2 November 2012; Received in revised form 14 March 2013; Accepted 16 May 2013; Available online 24 May 2013.

MSC: 60J05; 62M05; 62M10.

Keywords: Hidden Markov models; Sequential Monte Carlo methods; Maximum likelihood estimation.

Abstract

Iterated filtering is an algorithm for estimating parameters in partially observed Markov process (POMP) models. The real-world performance of the algorithm depends on several tuning parameters. We propose a simple method for optimizing the parameter governing the joint dynamics of the hidden parameter process (called the Σ matrix). The tuning is implemented using a fixed-lag sequential Monte Carlo expectation–maximization (EM) algorithm. We introduce two different versions of the tuning parameter: the approximately estimated Σ matrix, and a normalized version of the same matrix. Our simulations show that the normalized matrix outperforms the standard iterated filter in finite samples, while the naive version does more harm than good.

1. Introduction

Iterated filtering (IF), see Ionides et al. (2006, 2011), is a popular plug-and-play method for inference in partially observed Markov process models; see Bretó et al. (2009) and He et al. (2010). Plug-and-play methods require only simulations from the model and the ability to relate those simulations to measurements. Applications include cholera transmission, see King et al. (2008), measles, see He et al. (2010), the role of climate in malaria transmission, see Laneri et al. (2010), and finance, see Bhadra (2010). The method is implemented in the open-source R package pomp.¹

A number of design parameters must be initialized prior to applying the iterated filtering algorithm to data. Some of the design parameters were linked in Lindström et al. (2012). This paper studies optimization of the joint dynamics of the hidden parameters, as expressed by the Σ matrix; see Eq. (1). The asymptotic rate of convergence does not depend on Σ, but the practical performance, in terms of the covariance of the estimates, often does; cf. Fabian (1978). The Σ matrix is often chosen as a diagonal matrix; cf. Ionides et al. (2006). This choice is robust, but it will not utilize any dependence structure in the model.

We introduce a simple (essentially free) method for finding a near-optimal value for the full Σ matrix. It is also possible to recover the near-optimal diagonal matrix within the proposed framework. The matrix is obtained using a fixed-lag sequential Monte Carlo (SMC) expectation–maximization (EM) method; see Olsson et al. (2008). Two closely related approximations are derived, with vastly different qualitative properties.

The paper is organized as follows. Section 2 reviews the iterated filtering algorithm and presents the estimators of the Σ matrix, Section 3 evaluates these estimators when applying the iterated filtering algorithm to a simple model, and Section 4 concludes.

E-mail address: [email protected].
¹ Available on CRAN at http://www.r-project.org.


2. Methods for partially observed Markov process models

Partially observed Markov processes are a class of processes satisfying the following conditions. Let {X_k} be a hidden X-valued Markov process with initial density f_{X_0}(x_0; θ) and transition density f_{X_k|X_{k−1}}(x_k|x_{k−1}; θ). The Y-valued observation process {Y_k} depends on the hidden process at time k through the observation density f_{Y_k|X_k}(y_k|x_k; θ), and it is assumed that Y_k|X_k is independent of all other observations. Finally, the model is governed by the parameter vector θ.

Iterated filtering is distantly related to the ''combined state and parameter'' estimation method that is popular in engineering, see Ljung (1979), statistics, see Kitagawa (1998), finance, see Lindström et al. (2008), and hydrology, see Evensen (2009). The idea is to augment the latent {X_k} process with the parameters, given vanishing dynamics. Let κ(·) be a density with compact support, and let {ζ_k} be independent draws from κ(·) such that

E[ζ_k] = 0,  Var[ζ_k] = Σ,  ∀ k.  (1)
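As a concrete example (our construction, not one given in the paper), components drawn i.i.d. uniformly on (−√3, √3) have mean zero, unit variance and compact support, and colouring them with a Cholesky factor of Σ yields the required moments:

```python
import numpy as np

def draw_zeta(Sigma, rng):
    """One perturbation with E[zeta] = 0, Var[zeta] = Sigma and compact
    support: uniform components coloured by a Cholesky factor of Sigma."""
    L = np.linalg.cholesky(Sigma)            # requires Sigma positive definite
    u = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=Sigma.shape[0])
    return L @ u
```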

The time-varying parameter dynamics is then given by

Θ_0 = θ + τ ζ_0,  Θ_n = Θ_{n−1} + σ ζ_n,  (2)

where τ and σ are small positive numbers. The joint distribution of the augmented model is given by

g_{X_{0:N},Θ_{0:N},Y_{1:N}}(x_{0:N}, θ_{0:N}, y_{1:N}; θ, σ, τ, Σ) = f_{X_{0:N},Y_{1:N}}(x_{0:N}, y_{1:N}; θ_{0:N}) g_{Θ_{0:N}}(θ_{0:N}; θ, σ, τ, Σ),  (3)

where f_{X_{0:N},Y_{1:N}} is the joint distribution of {X, Y} conditional on the parameters and g_{Θ_{0:N}} is the distribution of the parameters. Define the conditional mean and covariance, computed with respect to the joint distribution, of the parameter process as

θ_n^F = E_{θ,τ,σ,Σ}[Θ_n | Y_{1:n}],  (4)

V_n^P = Cov_{θ,τ,σ,Σ}[Θ_n | Y_{1:n−1}].  (5)

It was shown in Ionides et al. (2011) that the score function can be approximated using these moments.

Theorem 2.1 (Theorem 3 in Ionides et al., 2011). Let K_1 be a compact subset of R^p, C_1 be a constant, τ be sufficiently small, and lim_{τ→0} σ(τ)/τ = 0. It then holds that

sup_{θ∈K_1} | ∑_{n=1}^N (V_n^P)^{−1} (θ_n^F − θ_{n−1}^F) − ∇_θ log f_{Y_{1:N}}(y_{1:N}; θ) | ≤ C_1 (τ + σ²/τ²).  (6)

These moments are not known for most models, but they can be accurately approximated using SMC (particle filter) methods. Let θ̃_n^F and Ṽ_n^P be the empirical versions of Eqs. (4) and (5) computed using J particles. The score can still be accurately approximated using θ̃_n^F and Ṽ_n^P as long as a sufficient number of particles is used; cf. Theorem 4 in Ionides et al. (2011).

The iterated filtering algorithm is a stochastic approximation algorithm, in which the biased and noisy approximation of the score function is used to iteratively update the estimate of the parameters, eventually arriving at the maximum likelihood estimate (MLE). We use iteration index m; i.e., θ̃_n^{F,m} is the SMC estimate of θ_n^F during iteration m.

Theorem 2.2 (Theorem 5 in Ionides et al., 2011). Let {a_m}, {τ_m}, {σ_m} and {J_m} be positive sequences such that τ_m → 0, σ_m τ_m^{−1} → 0, τ_m J_m → ∞, ∑_{m=1}^∞ a_m = ∞ and ∑_{m=1}^∞ a_m² J_m^{−1} τ_m^{−2} < ∞, and define θ̂_m according to

θ̂_{m+1} = θ̂_m + a_m ∑_{n=1}^N (Ṽ_n^{P,m})^{−1} (θ̃_n^{F,m} − θ̃_{n−1}^{F,m}).  (7)

The estimate will then converge to the MLE with probability 1: θ̂_m → θ⋆ a.s.
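To make the recursion concrete, here is a minimal Python sketch of the algorithm (our own illustration; the reference implementation is the R package pomp). The model-specific functions init_state, simulate_step and obs_density are hypothetical placeholders the user must supply; the sketch uses Gaussian perturbations rather than a compactly supported κ, a fixed particle number J (Theorem 2.2 requires J_m → ∞), and a diagonal approximation of Ṽ_n^P.

```python
import numpy as np

def iterated_filtering(y, theta0, init_state, simulate_step, obs_density,
                       Sigma, M=200, J=1000, seed=0):
    """Sketch of the iterated filter: perturb the parameters with a random
    walk of variance sigma_m^2 * Sigma and apply update (7)."""
    rng = np.random.default_rng(seed)
    N, p = len(y), len(theta0)
    L = np.linalg.cholesky(Sigma)
    theta_hat = np.array(theta0, dtype=float)
    for m in range(1, M + 1):
        a_m, tau2, sig2 = 1.0 / m, 1.0 / m, m ** -1.3    # gain sequences
        # initial swarm: Theta_0 = theta_hat + tau * zeta_0
        Theta = theta_hat + np.sqrt(tau2) * rng.standard_normal((J, p)) @ L.T
        X = init_state(Theta, rng)            # J initial hidden states
        theta_F_prev = Theta.mean(axis=0)     # empirical theta_0^F
        score = np.zeros(p)
        for n in range(N):
            # random walk step: Theta_n = Theta_{n-1} + sigma * zeta_n
            Theta = Theta + np.sqrt(sig2) * rng.standard_normal((J, p)) @ L.T
            V_P = np.maximum(Theta.var(axis=0), 1e-12)   # prediction variance
            X = simulate_step(X, Theta, rng)  # propagate hidden states
            w = obs_density(y[n], X, Theta)   # measurement weights
            w = w / w.sum()
            theta_F = w @ Theta               # filtered parameter mean
            score += (theta_F - theta_F_prev) / V_P      # term in Eq. (7)
            theta_F_prev = theta_F
            idx = rng.choice(J, size=J, p=w)  # multinomial resampling
            X, Theta = X[idx], Theta[idx]
        theta_hat = theta_hat + a_m * score   # stochastic approximation step
    return theta_hat
```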

Ionides et al. (2011) noted that choosing a_m = m^{−1}, τ_m² = m^{−1}, σ_m² = m^{−(1+δ)}, and J_m = m^{1/2+δ}, where δ > 0, satisfies the conditions in Theorem 2.2.

2.1. Tuning the Σ matrix

Maximum likelihood methods can be used to estimate the Σ parameter, as the joint distribution is known; cf. Eq. (3). An integral part of the iterated filtering algorithm is applying SMC filters for estimating properties of the latent processes, including the joint posterior density. However, Olsson et al. (2008) found that the naive approximation of the posterior distribution degenerates quickly, and argued that a fixed-lag approximation

p(x_k | y_{1:N}) ≈ p(x_k | y_{1:min(k+L,N)})  (8)

is less variable. They also showed how this approximation can be used to estimate parameters within the EM algorithm. The resulting algorithm is computationally attractive, but the estimates are often slightly biased; cf. Kantas et al. (2009).
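The bookkeeping behind Eq. (8) can be sketched in a few lines: store the resampling ancestries of a bootstrap particle filter and, to smooth the state at time k, trace the particles alive at time min(k + L, N) back to their time-k ancestors. This is our own illustration of the idea, not the exact scheme of Olsson et al. (2008):

```python
import numpy as np

def fixed_lag_ancestors(ancestry, k, lag):
    """ancestry[t][j] = index of the time-(t-1) parent of particle j at
    time t (recorded at each resampling step of the filter). Returns the
    time-k indices that are ancestors of the particles alive at time
    min(k + lag, T - 1), as used in the fixed-lag approximation (8)."""
    t_end = min(k + lag, len(ancestry) - 1)
    idx = np.arange(len(ancestry[t_end]))   # particles alive at t_end
    for t in range(t_end, k, -1):           # walk the genealogy backwards
        idx = ancestry[t][idx]
    return idx
```

Averaging a function of the stored time-k particles over these indices then approximates the fixed-lag smoothing expectation E[φ(X_k) | Y_{1:min(k+L,N)}].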


Fig. 1. Norm of the Σ̂′_m matrix when no normalization is used.

Assume that we start from a positive definite matrix Σ̂_0 (which could be an identity matrix), which is used in the very first iteration. We also assume that the latent parameter vector evolves according to a Gaussian random walk, g_{Θ_n|Θ_{n−1}}(θ_n|θ_{n−1}; Σ) = N(θ_n; θ_{n−1}, σ_m² Σ); cf. Eq. (2). An updated Σ matrix can then be obtained using the EM algorithm. The intermediate quantity is computed in the E-step (using the fixed-lag output from the SMC filter)

Q(Σ, Σ̂_{m−1}) = E[ ∑_{n=1}^N log g_{Θ_n|Θ_{n−1}}(Θ_n|Θ_{n−1}; Σ) | Σ̂_{m−1}, Y_{1:N} ],  (9)

and this is maximized in the M-step. The matrix maximizing the intermediate quantity is given by

Σ̂′_m = arg max_Σ Q(Σ, Σ̂_{m−1}) = (1/N) E[ (Θ_1 − Θ_0)(Θ_1 − Θ_0)^T / (τ_m² + σ_m²) + ∑_{n=2}^N (Θ_n − Θ_{n−1})(Θ_n − Θ_{n−1})^T / σ_m² | Σ̂_{m−1}, Y_{1:N} ].  (10)
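In a particle implementation, the expectation in Eq. (10) becomes an average over smoothed parameter trajectories. A minimal sketch, assuming (for simplicity) that Theta_paths holds J equally weighted fixed-lag trajectories:

```python
import numpy as np

def m_step_sigma(Theta_paths, tau2_m, sig2_m):
    """Eq. (10): Theta_paths has shape (J, N + 1, p), holding J equally
    weighted smoothed trajectories Theta_0, ..., Theta_N."""
    J, Np1, p = Theta_paths.shape
    d = np.diff(Theta_paths, axis=1)        # increments Theta_n - Theta_{n-1}
    S = np.einsum('ji,jk->ik', d[:, 0, :], d[:, 0, :]) / (tau2_m + sig2_m)
    S += np.einsum('jni,jnk->ik', d[:, 1:, :], d[:, 1:, :]) / sig2_m
    return S / ((Np1 - 1) * J)              # divide by N; average over particles
```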

We propose two closely related tuning strategies.

• The naive tuning strategy uses Σ̂′_m instead of Σ̂_0 or Σ̂′_{m−1}. However, the norm of Σ̂′_m is in general different from the norm of Σ̂_0; see Fig. 1 (it grows approximately at an exponential rate). The tuned parameter will then interfere with the τ_m² and σ_m² parameters and eventually violate the conditions for convergence in the stochastic approximation; cf. Theorem 2.2. The naive tuning only made things worse in our simulations!

• An alternative that utilizes the structure of the Σ̂′_m matrix is obtained by normalizing the matrix according to Σ̂″_m = Σ̂′_m · ∥Σ̂_0∥₂ / ∥Σ̂′_m∥₂. The normalized estimator will redistribute particles according to the structure of the Σ̂′_m matrix but will not interfere with τ_m² or σ_m². We refer to this strategy as the tuned iterated filter.

Remark 2.1. There is a non-zero probability (due to particle depletion) that the estimated covariance matrix Σ̂″_m does not have full rank. That would cause numerical problems, and would not allow the algorithm to explore the parameter space efficiently. This problem is avoided by introducing a regularized version (we used ε = 10⁻² in our simulation study) that ensures persistent excitation of all parameters:

Σ̂_m = (1 − ε) Σ̂″_m + ε Σ̂_0,  (11)

where Σ̂_0 is the Σ matrix used in the first iteration.
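The normalization and regularization steps amount to a few lines; a sketch, reading ∥·∥₂ as the spectral norm (our assumption) and using ε = 10⁻²:

```python
import numpy as np

def tune_sigma(Sigma_prime, Sigma_0, eps=1e-2):
    """Normalize Sigma'_m to the norm of Sigma_0, then regularize, Eq. (11)."""
    spec = lambda A: np.linalg.norm(A, 2)   # largest singular value
    Sigma_dprime = Sigma_prime * spec(Sigma_0) / spec(Sigma_prime)
    return (1.0 - eps) * Sigma_dprime + eps * Sigma_0
```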

3. Simulation study

We have studied a partially observed linear Gaussian model, given by

Y_k = X_k + η_k,  (12)

X_k = a X_{k−1} + σ w_k,  (13)

where {η_k} and {w_k} are sequences of pairwise independent standard Gaussian random variables. The a and σ parameters are implicitly related through the unconditional variance, Var(X) = σ²/(1 − a²), in the sense that a change in one parameter must be compensated by a change in the other if the unconditional variance of the model is to be preserved. We expect that a tuned Σ matrix will be able to utilize this relation to speed up the convergence by reducing SMC noise.
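A short sketch of the test model, checking the variance relation numerically on a long simulated path:

```python
import numpy as np

def simulate_lg(a=0.7, sigma=2.0, N=200, seed=1):
    """Simulate the linear Gaussian model (12)-(13)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(N + 1)                      # X_0 = 0
    for k in range(1, N + 1):
        x[k] = a * x[k - 1] + sigma * rng.standard_normal()
    y = x[1:] + rng.standard_normal(N)       # Y_k = X_k + eta_k
    return x[1:], y

x, _ = simulate_lg(N=200_000)
print(np.var(x), 2.0 ** 2 / (1 - 0.7 ** 2))  # both close to 7.84
```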


Fig. 2. Difference between the estimate and the MLE for the a and σ parameters as a function of the number of iterations. Note the log scale on the x-axis.

We simulated N_sim = 100 independent realizations, each consisting of N_obs = 200 observations. The true parameters used were a = 0.7 and σ = 2. The filter was iterated M = 200 times, and the filter parameters used were a_m = m^{−1}, τ_m² = m^{−1}, σ_m² = m^{−1.3}, and Σ = I_2. The number of particles was increased according to J_m = ⌊J_0 exp(−m) + J_1 + J_2 m⌋, where J_0 = 2000, J_1 = 500, and J_2 = 10; cf. Lindström et al. (2012). Fig. 2 presents the difference between the MLE and estimates from the iterated filter (IF) and the (normalized) tuned iterated filter (using Σ̂_m).

The tuned iterated filter outperforms the iterated filter when comparing the estimates of the a parameter (faster and more robust convergence: ≈30 iterations with the tuned iterated filter are equivalent to 64 with the standard method, and ≈100 iterations with the tuned filter are equivalent to 200 with the standard method), while the distribution of the σ estimates is similar (actually slightly worse).

4. Conclusion

We introduce a simple tuning procedure for the iterated filtering algorithm. The additional computational cost is negligible, while the practical gain (speed and robustness) in our study is substantial.

Acknowledgement

The author would like to thank the referees for their careful review of the manuscript.

References

Bhadra, A., 2010. Discussion of ''Particle Markov Chain Monte Carlo methods'' by Andrieu, C., Doucet, A., and Holenstein, R. Journal of the Royal Statistical Society, Series B 72, 314–315.
Bretó, C., He, D., Ionides, E., King, A., 2009. Time series analysis via mechanistic models. The Annals of Applied Statistics 3 (1), 319–348.
Evensen, G., 2009. The ensemble Kalman filter for combined state and parameter estimation. IEEE Control Systems Magazine 29 (3), 83–104.
Fabian, V., 1978. On asymptotically efficient recursive estimation. The Annals of Statistics 6 (4), 854–866.
He, D., Ionides, E., King, A., 2010. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. Journal of the Royal Society Interface 7 (43), 271–283.
Ionides, E.L., Bhadra, A., Atchade, Y.F., King, A.A., 2011. Iterated filtering. The Annals of Statistics 39, 1776–1802.
Ionides, E., Bretó, C., King, A., 2006. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences 103 (49), 18438–18443.
Kantas, N., Doucet, A., Singh, S., Maciejowski, J., 2009. An overview of sequential Monte Carlo methods for parameter estimation in general state-space models. In: Proceedings IFAC System Identification.
King, A., Ionides, E., Pascual, M., Bouma, M., 2008. Inapparent infections and cholera dynamics. Nature 454 (7206), 877–880.
Kitagawa, G., 1998. A self-organizing state-space model. Journal of the American Statistical Association 93 (443), 1203–1215.
Laneri, K., Bhadra, A., Ionides, E., Bouma, M., Dhiman, R., Yadav, R., Pascual, M., 2010. Forcing versus feedback: epidemic malaria and monsoon rains in northwest India. PLoS Computational Biology 6 (9), e1000898.
Lindström, E., Ionides, E., Frydendall, J., Madsen, H., 2012. Efficient iterated filtering. In: Proceedings IFAC System Identification, Vol. 16, pp. 1785–1790.
Lindström, E., Ströjby, J., Brodén, M., Wiktorsson, M., Holst, J., 2008. Sequential calibration of options. Computational Statistics & Data Analysis 52 (6), 2877–2891.
Ljung, L., 1979. Asymptotic behavior of the extended Kalman filter as a parameter estimator for linear systems. IEEE Transactions on Automatic Control 24 (1), 36–50.
Olsson, J., Cappé, O., Douc, R., Moulines, E., 2008. Sequential Monte Carlo smoothing with application to parameter estimation in nonlinear state space models. Bernoulli 14 (1), 155–179.