Available online at www.sciencedirect.com
ScienceDirect Stochastic Processes and their Applications 125 (2015) 2643–2673 www.elsevier.com/locate/spa
Generalised particle filters with Gaussian mixtures D. Crisan a , K. Li b,∗ a Department of Mathematics, Imperial College London, 180 Queen’s Gate, London, SW7 2AZ, UK b Department of Mathematics, Uppsala University, Box 480, Uppsala, 75106, Sweden
Received 14 June 2013; received in revised form 25 January 2015; accepted 26 January 2015 Available online 2 February 2015
Abstract Stochastic filtering is defined as the estimation of a partially observed dynamical system. Approximating the solution of the filtering problem with Gaussian mixtures has been a very popular method since the 1970s. Despite nearly fifty years of development, the existing work is based on the success of the numerical implementation and is not theoretically justified. This paper fills this gap and contains a rigorous analysis of a new Gaussian mixture approximation to the solution of the filtering problem. We deduce the L 2 -convergence rate for the approximating system and show some numerical examples to test the new algorithm. c 2015 Elsevier B.V. All rights reserved. ⃝
MSC: 60F25; 60G35; 60G57; 60H35; 93E11 Keywords: Stochastic partial differential equation; Nonlinear filtering; Zakai equation; Particle filters; Gaussian mixtures; L 2 -convergence
1. Introduction The stochastic filtering problem deals with the estimation of an evolving dynamical system, called the signal, based on partial observations and a priori stochastic model. The signal is modelled by a stochastic process denoted by X = {X t , t ≥ 0}, defined on a probability space (Ω , F, P). The signal process is not available to observe directly; instead, a partial observation is obtained and it is modelled by a process Y = {Yt , t ≥ 0}. The information available from the observation up to time t is defined as the filtration Y = {Yt , t ≥ 0} generated by the ob∗ Corresponding author.
E-mail addresses:
[email protected] (D. Crisan),
[email protected] (K. Li). http://dx.doi.org/10.1016/j.spa.2015.01.008 c 2015 Elsevier B.V. All rights reserved. 0304-4149/⃝
2644
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
servation process Y . In this setting, we want to compute πt — the conditional distribution of X t given Yt . The analytical solutions to the filtering problem are rarely available, and there are only few exceptions such as the Kalman–Bucy filter and the Beneˇs filter (see, e.g. Chapter 6 in [4]). Therefore numerical algorithms for solving the filtering equations are required. The description of a numerical approximation for πt should contain the following three parts: the class of approximations; the law of evolution of the approximation; and the method of measuring the approximating error. Generalised particle filters with Gaussian mixtures is a numerical scheme to approximate the solution of the filtering problem, and it is a natural generalisation of the classic particle filters (or sequential Monte Carlo methods) in the sense that the Dirac delta measures used in the classic particle approximations are replaced by mixtures of Gaussian measures. In other words, Gaussian mixture approximations are algorithms that approximate πt with random measures of the form ai (t)Γvi (t),ω j (t) , i
where Γv j (t),ω j (t) is the Gaussian measure with mean v j (t) and covariance matrix ω j (t). The evolution of the weights, the mean and the covariance matrices satisfy certain stochastic differential equations which are numerically solvable. As time increases, typically the trajectories of a large number of particles diverge from the signal’s trajectory; with only a small number remaining close to the signal. The weights of the diverging particles decrease rapidly, therefore contributing very little to the approximating system, and causing the approximation to converge very slowly to the conditional distribution. In order to tackle this so-called sample degeneracy phenomenon, a correction procedure is added. At correction times, each particle is replaced by a random number of offspring. Redundant particles are abandoned and only the particles contributing significantly to the system (i.e. with large weights) are carried forward; so that the most probable region of the trajectory of the signal process X will be more thoroughly explored. This correction mechanism is also called branching or resampling. Currently the multinomial branching algorithm and the tree based branching algorithm (TBBA) are two approaches for the correction step. The idea of using Gaussian mixtures in the context of Bayesian estimation can be traced back to Alspach and Sorenson [1,32], and Tam and Moore [34]. Later in the work by Anderson and Moore [2], Gaussian mixtures are used in combination with extended Kalman filters to produce an empirical approximation to the discrete time filtering problem. The research on this area became more active since 2000s. Recently in Doucet et al. [15], these methods are revisited to construct a method which uses Rao-Blackwellisation in order to take advantage of the analytic structure present in some important classes of state-space models. Recent advances in this direction are contained in [3,14,11]. In Chen and Liu [6], a random mixture of Gaussian distributions, called mixture Kalman filters, are used to approximate a target distribution again based on the explicit linear filter. Further work on this topic is contained in [18,29,33,35]. Gustafsson et al. [20] describe a general framework for a number of applications, which are implemented using the idea of Gaussian particle filters. Further development in this direction can be found in [13,17,19,26]. For more recent work related to Gaussian mixture approximations, see Kotecha and Djuri´c [24,25], Le Gland et al. [27], Reich [30], Flament et al. [16], Van der Merwe and Wan [12], Carmi et al. [5], Iglesias, Law and Stuart [21], and Crisan and Li [7]. 1.1. Contribution of the paper In this paper we construct a new approximation to the conditional distribution of the signal π = {πt : t ≥ 0} that consists of a mixture of Gaussian measures. In contrast with the exist-
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2645
ing work, the approximation is not based on the (Extended) Kalman filter. The approximation is analysed theoretically: we obtain the rate of convergence of the approximation to πt as the number of Gaussian measures increases. We are not aware of other theoretically justified algorithms of this kind. In particular, if π n = {πtn : t ≥ 0} is the constructed approximation of πt , where n is the number of Gaussian measures, we prove the following result. Theorem. For any T ≥ 0 and m ≥ 6, there exists a constant C(T, ϕ) independent of n, such that for any test function ϕ ∈ Cbm+2 (Rd ), n C(T, ϕ) . (1.1) E sup πt (ϕ) − πt (ϕ) ≤ √ n t∈[0,T ] Additional L p result can also be covered. The proof of this result is harder than the proof of the convergence of approximating measures under the classic particle filters. The technical difficulty arises from the fact that the covariance matrix of the component Gaussian measure is random and as a result we cannot use a standard approach such as that contained in the proof of the convergence of the classic particle filters by using the dual of the conditional distribution process (see, e.g. Chapter 7 in [4]). We deal with this difficulty by adopting the ideas in [9], namely we show a variation of Theorem 8 in [9] (see Theorem 4.8), which is an abstract convergence criterion for signed measure-valued process. We then verify the conditions of Theorem 4.8 hold in our case so that it can be applied to prove the convergence of the Gaussian mixture approximating measure. We also make use of the Beneˇs filter as an example to numerically compare the performances of the Gaussian mixture approximation and the classic particle approximation. The implementation shows that generally Gaussian mixtures approximation performs better than particle approximations when the number of particles/Gaussian mixtures is relatively small. The following is a summary of the contents of the paper. In Section 2, we review the central results of stochastic filtering theory.1 The filtering framework is introduced first, with the focus on the problems where the signal X and observation Y are diffusion processes and the filtering equations are presented. Section 3 contains the description of the generalised particle filters with Gaussian mixtures. These approximations use mixtures of Gaussian measures which will be set out, with the aim of estimating the solutions to the Zakai and the Kushner–Stratonovich equations. The Tree Based Branching Algorithm (TBBA) and Multinomial branching algorithm are discussed as possible correction mechanisms. Section 4 contains the main results of the thesis, which is the law of large numbers theorem associated to the approximating system. In this section, the evolution equations of the approximating systems introduced in the previous chapter are derived. It is shown that, under certain conditions, the unnormalised and normalised versions of the approximations of the conditional distribution converge to the solutions of the Zakai equation and the Kushner–Stratonovich equation, respectively. Section 5 contains a numerical example to compare the approximations by using the classic particle filters and the Gaussian mixtures. This paper is concluded with an Appendix which contains some additional proofs. 1 For a short historical account of the development of stochastic filtering problem, see, for example, Section 1.3 of [4].
2646
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
1.2. Notations • (Ω , F, P) — probability triple consisting of a sample space Ω , the σ -algebra F which is the set of all measurable events, and the probability measure P. • (Ft )t≥0 — a filtration, an increasing family of sub-σ -algebras of F; Fs ⊂ Ft , 0 ≤ s ≤ t. • Rd — the d-dimensional Euclidean space. • Rd — the one-point compactification of Rd formed by adding a single point at infinity to Rd . • (S, B(S)) — the state space of the signal. Normally S is taken as a complete separable space, and B(S) is the associated Borel σ -algebra, that is, the σ -algebra generated by the open sets in S. • B(S) — the space of bounded B(S)-measurable functions from S to R. • P (S) — the family of Borel probability measures on space S. • Cb (Rd ) — the space of bounded continuous functions on Rd . • Cbm (Rd ) — the space of bounded continuous functions on Rd with bounded derivatives up to order m. • C0m (Rd ) — the space of continuous functions on Rd , vanishing at infinity with continuous partial derivatives up to order m. p d 2 • ∥ · ∥ — the Euclidean norm for a d × p matrix a, ∥a∥ = i=1 j=1 ai j . • ∥ · ∥∞ — the supremum norm for ϕ : Rd → R: ∥ϕ∥∞ = sup x∈Rd ∥ϕ(x)∥. • ∥ · ∥m,∞ — the norm such that for ϕ on Rd , ∥ϕ∥m,∞ = |α|≤m supx∈Rd |Dα ϕ(x)|, where α = (α 1 , . . . , α d ) is a multi-index and Dα = (∂1 )α1 · · · (∂d )αd . • M F (Rd ) — the set of finite measures on Rd . • M F (Rd ) — the set of finite measures on Rd . • DM F (Rd ) [0, T ] — the space of c`adl`ag functions (or right continuous functions with left limits) f : [0, T ] → M F (Rd ). • DM F (Rd ) [0, ∞) — the space of c`adl`ag functions (or right continuous functions with left limits) f : [0, ∞) → M F (Rd ). 2. The filtering problem and key result Let (Ω , F, P) be a probability space together with a filtration (Ft )t≥0 which satisfies the usual conditions. On (Ω , F, P) we consider a Ft -adapted process X = {X t ; t ≥ 0} taking value on d be the solution of a d-dimensional stochastic differential Rd . To be specific, let X = (X i )i=1 p equation driven by a p-dimensional Brownian motion V = (V j ) j=1 : X ti
=
X 0i
+ 0
t
f (X s )ds + i
p
t
j
σ i j (X s )d Vs ,
i = 1, . . . , d.
(2.1)
j=1 0
d We assume that both f = ( f i )i=1 : Rd → Rd and σ = (σ i j )i=1,...,d; j=1,..., p : Rd → Rd× p are globally Lipschitz. Under the globally Lipschitz condition, (2.1) has a unique solution (Theorem 5.2.9 in [23]). The infinitesimal generator A associated with the process X is the second-order differential operator
A=
d i=1
fi
d d ∂ ∂2 ij + a , ∂xi ∂xi ∂x j i=1 j=1
(2.2)
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2647
where a = (a i j )i, j=1,...,d : Rd → Rd×d is the matrix-valued function defined as a = 12 σ ⊤ σ . We denote by D(A) the domain of A. m Let h = (h i )i=1 : Rd → Rm be a bounded measurable function. Let W be a standard Ft adapted m-dimensional Brownian motion on (Ω , F, P) independent of X , and Y be the process which satisfies the following evolution equation t Yt = Y0 + h(X s )ds + Wt . (2.3) 0
This process Y = {Yt ; t ≥ 0} is called the observation process. Let {Yt , t ≥ 0} be the usual augmentation of the filtration associated with the process Y , viz Yt = σ (Ys , s ∈ [0, t]) ∨ N . As stated in the introduction, the filtering problem consists in determining the conditional distribution πt of the signal X at time t given the information accumulated from observing Y in the interval [0, t]; that is, for ϕ ∈ B(Rd ), πt (ϕ) = ϕ(x)πt (d x) = E[ϕ(X t ) | Yt ]. (2.4) S
Throughout this paper we will assume that the coefficients f i , σ i j and h i are all bounded and Lipschitz; f i and σ i j are six times differentiable, and h i is twice differentiable. In other words, we assume that f i , σ i j ∈ Cb6 (Rd ) and h i ∈ Cb2 (Rd ). Let P˜ be a new probability measure on Ω , under which the process Y is a Brownian motion. To be specific, let Z = {Z t , t ≥ 0} be the process defined by m t m t 1 i i i 2 Z t = exp − h (X s )d Ws − h (X s ) ds , t ≥ 0; (2.5) 2 i=1 0 i=1 0 and we introduce a probability measure P˜ t on Ft by specifying its Radon–Nikodym derivative ˜ with respect to P to be given by Z t . We finally define a probability measure P which is equivalent to P on 0≤t<∞ Ft . Then we have the following Kallianpur–Striebel formula (see [22]) πt (ϕ) =
ρt (ϕ) ρt (1)
˜ P(P)-a.s. for ϕ ∈ B(Rd );
(2.6)
where ρt is an Yt -adapted measure-valued process satisfying the following Zakai Equation (see [36]). t t ˜ ρs (Aϕ)ds + ρs (ϕh ⊤ )dYs , P-a.s. ∀t ≥ 0 (2.7) ρt (ϕ) = π0 (ϕ) + 0
0
for any ϕ ∈ D(A). Also the process ρ = {ρt ; t ≥ 0} is called the unnormalised conditional distribution of the signal. The equation satisfied by π is called the Kushner–Stratonovich equation. In the following we will analyse the generalised particle filters with Gaussian mixtures, and we will give detailed analysis in the following sections. We denote by π n = {πtn ; t ≥ 0} the approximating measure of the solution of the filtering problem, where n is the number of Gaussian measures in the approximating system. Then the main convergence of π n is stated as follows:
2648
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Theorem 2.1. For any T > 0 and m ≥ 6, there exists a constant C(T ) independent of n, such that for any test function ϕ ∈ Cbm+2 (Rd ) ) ˜E sup |πtn (ϕ) − πt (ϕ)| ≤ C(T (2.8) √ ∥ϕ∥m+2,∞ . n t∈[0,T ]
3. The approximating system with Gaussian mixtures For ease of notations, we assume, hereinafter, that the state space of the signal is onedimensional. The approximating algorithm discussed in this section, together with the L 2 convergence analysis in Section 4 is done under this assumption. We note that all the results hereinafter can be extended without significant technical difficulties to the multi-dimensional case. Firstly, we let ∆ = {0 = δ0 < δ1 < · · · < δ N = T } be an equidistant partition of the interval [0, T ] with equal length, with δi = iδ, i = 1, . . . , N ; and N = Tδ . The approximating algorithm is then introduced as follows. Initialisation: At time zero, the particle system consists of n Gaussian measures all with equal weights 1/n, initial means v nj (0), and initial variances ωnj (0), for j = 1, . . . , n; denoted by Γv nj (0),ωnj (0) . The approximation of π0 has the form π0n ,
n 1 Γv n (0),ωnj (0) . n j=1 j
(3.1)
Recursion: During the interval t ∈ [iδ, (i + 1)δ), i = 1, . . . , N , the approximation ρ n of the unnormalised conditional distribution ρ will take the form πtn ,
n
a¯ nj (t)Γv nj (t),ωnj (t) ,
(3.2)
j=1
where v nj (t) denotes the mean and ωnj (t) denotes the variance of the Gaussian measure Γv nj (t),ωnj (t) , and a nj (t) is the (unnormalised) weight of the particle, and a¯ nj (t) =
a nj (t) n akn (t)
k=1
is the normalised weight. Obviously, each particle is characterised by the triple process (a nj , v nj , ωnj ) which is chosen to evolve as t n a (t) = 1 + a nj (s)h(v nj (s))dYs , j iδ t t √ ( j) v nj (t) = v nj (iδ) + f v nj (s) ds + 1 − α σ v nj (s) d Vs , iδ iδ t 2 n ωn (t) = α β + σ v (s) ds , j j iδ
(3.3)
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2649
where {V ( j) }nj=1 are mutually independent Brownian motions and independent of Y . The parameter α is a real number in the interval [0, 1]. For α = 0 we recover the classic particle approximation (see, for example, Chapter 9 in [4]); for α = 1 the means of Gaussian measures evolve deterministically (the stochastic term is eliminated). The parameter β is a positive real √ number. We define the smoothing parameter to be ζ = αβ, and it ensures that the approximating measure has smooth density at the branching time. Correction: at the end of the interval [iδ, (i + 1)δ), immediately prior to the correction step, each Gaussian measure is replaced by a random number of offspring, which are Gaussian measures with mean X nj ((i + 1)δ) and variance αβ, where the mean X nj ((i + 1)δ) is a normally distributed random variable, i.e. X nj ((i + 1)δ) ∼ N v nj ((i + 1)δ− ), ωnj ((i + 1)δ− ) , j = 1, . . . , n. n,(i+1)δ
We denote by o j the number of “offspring” produced by jth generalised particle. The total number of offspring is fixed to be n at each correcting event. After correction all the particles are re-indexed from 1 to n and all of the unnormalised weights are re-initialised back to 1; and the particles evolve following (3.3) again. The recursion is repeated N times until we reach the terminal time T , where we obtain the approximation πTn of πT . Before discussing how the correction step is actually carried out, we give here a brief explanation why we should introduce it. As time increases, the unnormalised weights of the majority of the particles decrease to zero, with only few becoming very large (or equivalently, the normalised weights of the majority of the particles decrease to zero, with only few becoming close to one), this phenomenon is called the sample degeneracy. As a consequence, only a small number of particles contribute significantly to the approximations, and therefore a large number of particles are needed in order to obtain the required accuracy; in other words, the convergence of this approximation is very slow. In order to solve this, a correction procedure is used which culls particles with small weights and multiplies particles with large weights. The resampling depends both on the weights of the particles and the observation data, and by doing this particles with small weights (and hence their trajectories are far from the signal) are not carried forward and therefore the more likely region where the signal might be can be explored. In the following we discuss two correction mechanisms. The first one uses the so called Tree Based Branching Algorithm (TBBA) and the second one is based on the Multinomial Resampling to determine the number of offspring {onj }nj=1 . The Tree Based Branching Algorithm (see Chapter 9 in [4]) produces offspring {onj } with distribution n,(i+1)δ n,(i+1)δ with prob. 1 − {n a¯ j } n a ¯ j n,(i+1)δ (3.4) oj = n a¯ n,(i+1)δ + 1 with prob. {n a¯ n,(i+1)δ }; j j n,(i+1)δ
where a¯ j is the value of the Gaussian particle’s weight immediately prior to the branching, in other words, n,(i+1)δ
a¯ j
= a¯ nj ((i + 1)δ−) =
lim
t↗(i+1)δ
a¯ nj (t).
If F(i+1)δ− is the σ -algebra of events up to time (i + 1)δ, i.e. F(i+1)δ− = σ (Fs : s < (i + 1)δ), then we have the following proposition (see Chapter 9 in [4]).
2650
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Proposition 3.1. The random variables {onj }nj=1 defined in (3.4) have the following properties n,(i+1)δ n,(i+1)δ E o j′ |F(i+1)δ− = n a¯ j ′ , 2 n,(i+1)δ n,(i+1)δ F(i+1)δ− = n a¯ n,(i+1)δ 1 − n a¯ n,(i+1)δ . E oj − n a¯ j j j n,(i+1)δ
Remark 3.2. The random variables o j
defined in (3.4) have conditional minimal variance n,(i+1)δ
in the set of all integer-valued random variables ξ satisfying E[ξ |F(i+1)δ− ] = n a¯ j . This property is important as it is the variance of onj that influences the speed of the corresponding algorithm (see Exercise 9.1 in [4]). In addition the TBBA keeps the number of particles in the system constant at n; that is, for each i, n
n,(i+1)δ
oj
(3.5)
= n.
j=1
The TBBA is, for example, discussed in Section 9.2.1 in [4] to ensure (3.5) is satisfied, and by Proposition 9.3 in [4] we know that the distribution of onj satisfies (3.4) and Proposition 3.1. If multinomial resampling is used (see, for example, [9]), then the offspring distribution is determined by the multinomial distribution O(i+1)δ = Multinomial(n, a¯ 1n ((i + 1)δ− ), . . . , a¯ nn ((i + 1)δ− )), i.e. ( j) n,(i+1)δ P O(i+1)δ = o j , j = 1, . . . , n =
n on,(i+1)δ j (3.6) a¯ nj ((i + i)δ−)
n! n j=1
n,(i+1)δ
oj
!
j=1
n,(i+1)δ with nj=1 o j = n. By properties of the multinomial distribution, we have the following result (see, for example, [28]). ( j) n,(i+1)δ n has a multinomial distriProposition 3.3. At branching time (i + 1)δ, O(i+1)δ = o j j=1
bution, then the conditional mean is proportional to the normalised weights of their parents: n,(i+1)δ n,(i+1)δ E˜ o j F(i+1)δ− = n a¯ j ′ (3.7) for 1 ≤ j ≤ n; and the condition variance and covariance satisfy n,(i+1)δ n,(i+1)δ n,(i+1)δ n,(i+1)δ E˜ ol − n a¯ l oj − n a¯ j F(i+1)δ− n,(i+1)δ n a¯ n,(i+1)δ 1 − a¯ j , l= j j = n,(i+1)δ n,(i+1)δ −n a¯ l a¯ j , l ̸= j for 1 ≤ l, j ≤ n.
(3.8)
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2651
The multinomial resampling algorithm essentially states that, at branching times, we sample n times (with replacement) from the population of Gaussian random variables X nj ((i +1)δ) (with means v nj ((i + 1)δ− ) and variances ωnj ((i + 1)δ− )), j = 1, . . . , n according to the multinomial probability distribution given by the corresponding normalised weights a¯ nj ((i + 1)δ− ), j = n,(i+1)δ
1, . . . , n. Therefore, by definition of multinomial distribution, o j n,(i+1)δ X nj ((i +1)δ) is chosen at time (i +1)δ; that is to say, o j
is the number of times
is the number of offspring produced
by this Gaussian random variable. 4. Convergence analysis In this section we deduce the evolution equation of the approximating measure ρ n for the generalised particle filters with Gaussian mixtures, and show its convergence to the target measure ρ – the solution of the Zakai equation, as well as the convergence of π n to π — the solution of the Kushner–Stratonovich equation. The correction mechanism for the generalised particle system involves either the use of the Tree Based Branching Algorithm (TBBA) or the multinomial resampling algorithm. These will be investigated in Sections 4.2 and 4.3 respectively. 4.1. Evolution equation for ρ n We firstly define the process ξ n = {ξtn ; t ≥ 0} by [t/δ] n n 1 1 n,iδ n n a a (t) . ξt , n j=1 j n j=1 j i=1 Then ξ n is a martingale and by Exercise 9.10 in [4] we know for any t ≥ 0 and p ≥ 1, there t, p t, p exist two constants c1 and c2 which depend only on t, p, and maxk=1,...,m ∥h k ∥0,∞ , such that t, p (4.1) sup sup E˜ (ξsn ) p ≤ c1 , n≥0 s∈[0,t]
and t, p max sup sup E˜ (ξsn a nj (s)) p ≤ c2 .
j=1,...,n n≥0 s∈[0,t]
(4.2)
We use the martingale ξ n to linearise π n in order to make it easier to analyse its convergence. Let ρ n = {ρtn ; t ≥ 0} be the measure-valued process defined by ρtn , ξtn πtn =
n n ξ[t/δ]δ
n
a nj (t)Γv nj (t),ωnj (t) ,
(4.3)
j=1
where Γv nj (t),ωnj (t) is the Gaussian measure with mean v nj (t) and variance ωnj (t). We will show the convergence of ρ n to ρ as the number of generalised particles n increases. The following proposition describes the evolution equation satisfied by the approximating sequence ρ n = {ρtn ; t ≥ 0} constructed using the algorithm described in the previous section. As discussed in Section 3, the approximation algorithm is constructed for the case where the state space of the signal process X is R. We adopt this assumption in this and the following sections. We first introduce the following notations:
2652
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673 n 1 Rs, j (ϕ) = ω j (s)
1 α ( f ϕ ′′′ )(v nj (s)) + (σ ϕ (4) )(v nj (s)) 2 4
(4)
+ 2ασ 2 (v nj (s))I4, j (ϕ) − I j (Aϕ) ασ 2 (v nj (s)) (5) I5, j (ϕ) + (ωnj (s))2 f (v nj (s))I4, j (ϕ) + 2 ωnj (s) 1−α 2 n (6) + σ (v j (s))I4, j (ϕ) , 2 1 (4) 2 n n ′′ n Rs, (ϕ) = ω (s) h(v (s))ϕ (v (s)) − I (hϕ) + (ωnj (s))2 h(v nj (s))I4, j (ϕ), j j j j j 2 √ 1 3 1 − α σ (v nj (s))ϕ ′ (v nj (s)) + ωnj (s)σ (v nj (s))ϕ ′′′ (v nj (s)) Rs, (ϕ) = j 2 (5) + (ωnj (s))2 σ (v nj (s))I4, j (ϕ) ; −y 2
(4.4) (4.5)
(4.6)
(1 − u)3 dudy, for k = 4, 5, 6; ϕ (k) v nj (s) + uy ωnj (s) 6 R 0 5 −y 2 1 u(1 − u)3 y e 2 I5, j (ϕ) = dudy; ϕ (5) v nj (s) + uy ωnj (s) √ 6 2π 0 R 2 −y 2 1 y e 2 I j (ψ) = ψ ′′ v nj (s) + uy ωnj (s) (1 − u)dudy, for ψ = Aϕ, hϕ. √ 2π 0 R
(k) I4, j (ϕ)
y4e 2 √ 2π
=
1
Proposition 4.1. The measure-valued process ρ n = {ρtn : t ≥ 0} satisfies the following evolution equation: t t n,ϕ n,ϕ n n n (4.7) ρt (ϕ) = ρ0 (ϕ) + ρs (Aϕ)ds + ρsn (hϕ)dYs + M[t/δ] + Bt 0
Cbm (R)
for any ϕ ∈ discrete process n,ϕ
M[t/δ] =
1 n
0
n,ϕ
and t ∈ [0, T ] with m ≥ 6. In (4.7), M n,ϕ = {Mi
[t/δ] i=0
ξiδn
n
on,iδ j
R
j=1
−
− n a¯ nj (iδ−)
R
(x−X nj (iδ))2
e ϕ(x) −
e ϕ(x)
, i > 0 and i ∈ N} is the
2ζ 2
dx
2π ζ 2
(x−v nj (iδ−))2 2ωnj (iδ−)
2π ωnj (iδ−)
(4.8)
dx n,ϕ
where X nj (iδ) ∼ N (v nj (iδ−), ωnj (iδ−)) is a Gaussian random variable. Also in (4.7), Bt following process: n t 1 ( j) n,ϕ n 1 2 3 Bt = ξ[s/δ]δ a nj (s) Rs, (ϕ)ds + R (ϕ)dY + R (ϕ)d V . s s j s, j s, j n j=1 0
is the
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Proof. For any ϕ ∈ Cbm (R) and t ∈ [iδ, (i + 1)δ), we have from (4.3) that 2 n 1 ξiδn y n n n n ρt (ϕ) = a (t) ϕ v j (t) + y ω j (t) √ exp − dy, n j=1 j 2 2π R
2653
(4.9)
with similar formulas for Aϕ and hϕ. We have the following Taylor expansions p−1 k 2 k y (ωnj (t)) 2 ψ (k) (v nj (t)) ψ v nj (t) + y ωnj (t) = k! k=0 p 1 1 (4.10) + y 2 p ωnj (t) ψ (2 p) v nj (t) + uy ωnj (t) (1 − u)2 p−1 du, 0 (2 p)! where ψ can be ϕ, Aϕ, or hϕ. By applying (4.10) (for p = 2 and p = 1) to (4.9) and the similar identities for Aϕ and hϕ, noting the fact that for any k ≥ 1 and k ∈ N, 2 2 k 1 1 y y y 2k−1 √ exp − y 2k √ exp − dy = 0, dy = (2 j − 1), 2 2 2π 2π R R j=1
we obtain that n ξiδn 1 a nj (t) ϕ(v nj (t)) + ωnj (t)ϕ ′′ (v nj (t)) n j=1 2 n n 2 (4) ξ n a j (t) ωnj (t) I4, j (ϕ); + iδ n j=1
ρtn (ϕ) =
(4.11)
ρtn (Aϕ) =
n n ξn ξiδn a nj (t) (Aϕ) (v nj (t)) + iδ a n (t)ωnj (t)I j (Aϕ); n j=1 n j=1 j
(4.12)
ρtn (hϕ) =
n n ξn ξiδn a nj (t) (hϕ) (v nj (t)) + iδ a n (t)ωnj (t)I j (hϕ). n j=1 n j=1 j
(4.13)
Next we apply Itˆo’s formula to Eq. (4.11), with the particles satisfying Eqs. (3.3). After substituting (4.12) and (4.13), we have, for t ∈ [iδ, (i + 1)δ), t t n ρtn (ϕ) = ρiδ (ϕ) + ρsn (Aϕ)ds + ρsn (hϕ)dYs iδ
t
+ iδ
iδ
n 1 ( j) 1 2 3 ξiδn a nj (s) Rs, . j (ϕ)ds + Rs, j (ϕ)dYs + Rs, j (ϕ)d Vs n j=1
(4.14)
Let Fiδ− = σ (Fs , 0 ≤ s < iδ) be the σ -algebra of the events up to time iδ (the time of the n = lim n 2 ith-branching) and ρiδ− t↗iδ ρt . For any t ≥ 0, we have ρtn (ϕ) = ρ0n (ϕ) +
[t/δ]
n n (ρiδ (ϕ) − ρiδ− (ϕ)) +
i=1
[t/δ]
n (ρiδ− (ϕ)
i=1
n n − ρ(i−1)δ (ϕ)) + (ρtn (ϕ) − ρ[t/δ]δ (ϕ)).
2 We use the standard convention 0
k=1 = 0.
(4.15)
2654
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
At the ith correction event, each Gaussian measure is replaced by a random number (on,iδ j ) of offspring. Each offspring is a Gaussian measure with mean X nj (iδ) and variance αβ, where X nj (iδ) ∼ N (v nj (iδ−), ωnj (iδ−)). The weights of the offspring generalised particles are reinitialised to 1, i.e. a nj (iδ) = 1; hence a¯ nj (iδ) = 1/n. So n πiδ =
1 n
n
on,iδ j Γ X n (iδ),ζ 2 ,
and
j
j=1
n πiδ (ϕ) =
1 n
n
on,iδ j
j=1
R
−
(x−X nj (iδ))2
e ϕ(x)
2ζ 2
2π ζ 2
d x.
Before the correction event, we have 2 n 1 y n n n n n ρiδ− (ϕ) = ξiδ a¯ j (iδ−) ϕ v j (iδ−) + y ω j (iδ−) √ exp − dy. 2 2π R j=1 We then obtain [t/δ]
n,ϕ
Mt/δ ,
n n ρiδ (ϕ) − ρiδ− (ϕ)
i=0
1 n
=
[t/δ]
ξiδn
n
on,iδ j
R
j=1
i=0
−
e ϕ(x) −
− n a¯ nj (iδ−)
R
(x−X nj (iδ))2
e ϕ(x)
2ζ 2
dx
2π ζ 2
(x−v nj (iδ−))2 2ωnj (iδ−)
2π ωnj (iδ−)
dx .
(4.16)
For t ∈ [(i − 1)δ, iδ), for i = 1, 2, . . . , [t/δ], using (4.14), we obtain that [t/δ]
n n n (ρiδ− (ϕ) − ρ(i−1)δ (ϕ)) + (ρtn (ϕ) − ρiδ (ϕ))
i=1
= ρ0n (ϕ) + + 0
t
0
t
ρsn (Aϕ)ds +
0
t
ρsn (hϕ)dYs
n 1 ( j) n 1 2 3 ξ[s/δ]δ a nj (s) Rs, (ϕ)ds + R (ϕ)dY + R (ϕ)d V . s s j s, j s, j n j=1
Finally, (4.16) and (4.17) imply (4.7), which completes the proof.
(4.17)
Corollary 4.2. Under the same assumption as in Proposition 4.1, if we further assume that α = 0 in (3.3), we recover the classic particle filters with mixture of Dirac measures. 4.2. Convergence results for generalised particle filters using the TBBA In order to investigate the convergence of the approximating measure ρ n , we consider the mild form of the Zakai equation. One should note that the proof of the convergence in [4] using t,ϕ t,ϕ the dual, ψs , of the measure-valued process ρ does not work for our model. ψs is measurable 2 (ψ t,ϕ ); howt with respect to the backward filtration Ys = σ (Yt − Yr , r ∈ [s, t]), and so is Rs, s j t 2 t,ϕ 2 (ψ t,ϕ ) is measurable with respect to the ever, the Itˆo’s integral 0 Rs, j (ψs )dYs requires Rs, s j
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2655
forward filtration Ys = σ (Yr , r ∈ [0, s]). This leads to an anticipative integration which cannot be tackled in a standard manner. Another approach is therefore required. Markov semigroups were used in [9] to obtain relevant bounds on the error which in turn enables us to discuss the convergence rate. In the following this idea will be discussed in some details. We introduce first the Zakai equation for generic time-inhomogeneous test functions. Let ϕ˜ : [0, ∞) × R → R be a bounded measurable function with continuous bounded derivatives up to order m (m ≥ 6). Then for ϕ, ˜ the time-inhomogeneous Zakai equation can be written as ∂ ϕ˜ + Aϕ˜ dt + ρt (h ϕ)dY ˜ (4.18) dρt (ϕ) ˜ = ρt t. ∂t Now fix s > 0 and define a specific test function as ϕ(t, ˜ x) = Ps−t ϕ(x),
t ∈ [0, s]
where (Pr )r ≥0 is the Markov semigroup whose infinitesimal generator is the operator A and ϕ is the single variable function which does not depend on t. It follows by the properties of semigroup that ∂ ϕ˜ = −A Ps−t ϕ, ∂t therefore (4.18) becomes dρt (Ps−t ϕ) = ρt (h Ps−t ϕ)dYt , and the integration form is ρt (Ps−t ϕ) = ρ0 (Ps ϕ) +
t
ρr (h Ps−r ϕ)dYr .
0
Similarly for ρtn (ϕ), from (4.7) for t ∈ [0, s] we obtain t n,Pϕ n,Pϕ ρtn (Ps−t ϕ) = ρ0n (Ps ϕ) + ρrn (h Ps−r ϕ)dYr + M[t/δ] + Bt
(4.19)
0
and the error of the approximation has the representation t (ρtn − ρt )(Ps−t ϕ) = (ρ0n − ρ0 )(Ps ϕ) + (ρrn − ρr )(h Ps−r ϕ)dYr 0
+
n,Pϕ M[t/δ]
+
n,Pϕ Bt ,
(4.20)
where n,Pϕ
• M[t/δ] is the same as in (4.8) in Proposition 4.1, except that ϕ is replaced by Ps−t ϕ. n,Pϕ
• Bt is the same as in Proposition 4.1, except that the integration variable s is replaced 1 (ϕ), R 2 (ϕ) and R 3 (ϕ) are replaced by R 1 (P by r , and the three coefficients Rs, j s, j s, j r, j s−r ϕ), 3 2 Rr, j (Ps−r ϕ) and Rr, j (Ps−r ϕ), respectively. In order to prove the convergence of the approximating measures ρtn to the actual measure ρt , we need to control all the terms on the right hand side of (4.20). Now we will discuss each of them respectively in the following lemmas.
2656
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Lemma 4.3. There exists a constant c( p) independent of n such that for any p ≥ 1 and ϕ ∈ Cb (R), we have c( p) E˜ (ρ0n (Ps ϕ) − ρ0 (Ps ϕ)) p ≤ p/2 ∥ϕ∥ p , t ∈ [0, T ]. n Proof. Note that ρ0n (Ps ϕ) − ρ0 (Ps ϕ) = π0n (Ps ϕ) − π0 (Ps ϕ), and also note that π0n (Ps ϕ) − π0 (Ps ϕ) =
n 1 ξj, n j=1
where ξ j , (Ps ϕ(v nj (0)) − π0 (Ps ϕ), j = 1, . . . , n are independent identically distributed random variables with mean 0, therefore √ by the Marcinkiewicz–Zygmund inequality (see [31]), there exists a constant c = c( p) ≤ (3 2) p p p/2 such that p c( p) E˜ ρ0n (Ps ϕ) − ρ0 (Ps ϕ) ≤ p/2 ∥ϕ∥ p , n which completes the proof. Lemma 4.4. For any T > 0, there exists a constant c1 (T ) independent of n such that for any ϕ ∈ Cb6 (R), 2 n t 1 E˜ ξ n a n (r )Rr,1 j (Ps−r ϕ)dr ≤ c1 (T )(αδ)2 ∥ϕ∥26,∞ . n j=1 0 [r/δ]δ j Proof. For αδ ≤ 1 we have from (4.4) that |Rr,1 j (Ps−r ϕ)| ≤ C R 1 αδ∥ϕ∥6,∞ , where C R 1 = C R 1 ( f, σ, ϕ) is a constant depending on the upper bounds of f, σ , and ϕ. Then by Jensen’s inequality, Fubini’s theorem and (4.2), we have 2 n t 1 ξ n a n (r )Rr,1 j (Ps−r ϕ)dr E˜ n j=1 0 [r/δ]δ j t n 2 1 2 n n 2 2 ˜ tC 1 (αδ) ∥ϕ∥6,∞ E ξ[r/δ]δ a j (r ) dr ≤ T 2 C 2R 1 c2T,2 (αδ)2 ∥ϕ∥26,∞ . ≤ n j=1 R 0 The result follows by letting c1 (T ) = T 2 C 2R 1 c2T,2 .
Lemma 4.5. For any T > 0, there exists a constant c2 (T ) independent of n such that for any ϕ ∈ Cb4 (R), 2 n t 1 ξ n a n (r )Rr,2 j (Ps−r ϕ)dYr ≤ c2 (T )(αδ)2 ∥ϕ∥24,∞ . E˜ n j=1 0 [r/δ]δ j Proof. From (4.5), we have for αδ ≤ 1 Rr,2 j (Ps−r ϕ) ≤ C R 2 αδ∥ϕ∥4,∞ ,
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2657
where C R 2 = C R 2 ( f, σ, h, ϕ) is a constant depending on the upper bounds of f, σ, h, and ϕ. Then Burkholder–Davis–Gundy and Jensen’s inequalities, Fubini’s theorem, and (4.2) yield 2 n t 1 ξ n a n (r )Rr,2 j (Ps−r ϕ)dYr E˜ n j=1 0 [r/δ]δ j t n n 1 2 ˜ 2 2 n n n n 2 2 ˜ ˜ ≤ 2 C R 2 C(αδ) ∥ϕ∥4,∞ dr E (ξ[r/δ]δ a j (r )) E (ξ[r/δ]δ ak (r )) n 0 j=1 k=1 ˜ t,2 (αδ)2 ∥ϕ∥24,∞ , ≤ T C 2R 2 Cc 2 ˜ t,2 . and the result follows by letting c2 (T ) = T C 2R 2 Cc 2
Lemma 4.6. For any T > 0, there exists a constant c3 (T ) independent of n such that for any ϕ ∈ Cb5 (R), 2 n t 1 ( j) ≤ c3 (T ) ∥ϕ∥25,∞ . E˜ ξ n a n (r )Rr,3 j (Ps−r ϕ)d Vr n j=1 0 [r/δ]δ j n Proof. From (4.6), we have for αδ ≤ 1 Rr,3 j (Ps−r ϕ) ≤ (Cˆ R 3 + C R 3 αδ)∥ϕ∥5,∞ , where Cˆ R 3 and C R 3 are constants depending on the upper bounds of f, σ , and ϕ. Then by Burkholder–Davis–Gundy and Jensen’s inequalities, Fubini’s theorem, and (4.2), and noticing the fact that {V ( j) }nj=1 are mutually independent Brownian motions, we have 2 n t 1 ( j) E˜ ξ n a n (r )Rr,3 j (Ps−r ϕ)d Vr n j=1 0 [r/δ]δ j t n 1 ˆ ( j) 2 2 n n ˜ ˜ CE ξ[r/δ]δ a j (r )d Vr ≤ 2 (C R 3 + C R 3 αδ) ∥ϕ∥5,∞ n 0 t j=1 1 ˜ ˆ T C(C R 3 + C R 3 αδ)2 c2t,2 ∥ϕ∥25,∞ , n ˜ Cˆ R 3 + C R 3 αδ)2 ct,2 . and the result follows by letting c3 (T ) = T C( 2 ≤
Recalling Proposition 4.1 and the semigroup operator P, we can decompose M n,ϕ in the following way n,ϕ
M[t/δ] =
1 n
[t/δ] i=0
ξiδn
n
on,iδ j
R
j=1
−
e ϕ(x) −
− n a¯ nj (iδ−)
R
n,ϕ
n,ϕ
e ϕ(x) n,ϕ
(x−X nj (iδ))2 2ζ 2
(x−v nj (iδ−))2 2ωnj (iδ−)
2π ωnj (iδ−)
, A[t/δ] + D[t/δ] + G [t/δ] ,
dx
2π ζ 2 dx
2658
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
where X nj (iδ) ∼ N v nj (iδ−), ωnj (iδ−) is a Gaussian distributed random variable, and n,ϕ
[t/δ] n 1 n ξiδ on,iδ − n a¯ nj (iδ−) ϕ(X nj (iδ)) ; j n i=1 j=1 −y 2 [t/δ] n 2 e 1 on,iδ ϕ X nj (iδ) + yζ √ ξn dy − ϕ(X nj (iδ)) ; = j n i=1 iδ j=1 2π R
A[t/δ] =
n,ϕ
D[t/δ] n,ϕ
G [t/δ] =
=
(4.21)
(4.22)
[t/δ] n 1 n n a¯ nj (iδ−) ξiδ n i=1 j=1 2 e− y2 × ϕ(X nj (iδ)) − ϕ v nj (iδ−) + y ωnj (iδ−) √ dy 2π R [t/δ] n
ξiδn a¯ nj (iδ−) ϕ(X nj (kδ)) − E˜ ϕ(X nj (kδ)) .
(4.23)
i=1 j=1
Then we have the following lemma: n,ϕ
Lemma 4.7. For any T > 0, and for any ϕ ∈ Cb2 (R), we have the following bounds for A[t/δ] , n,ϕ n,ϕ D[t/δ] and B[t/δ] : c (T ) 4 n,ϕ E˜ |A[t/δ] |2 ≤ √ ∥ϕ∥20,∞ , n δ n,ϕ E˜ |D[t/δ] |2 ≤ c5 (T )ζ 4 ∥ϕ∥22,∞ , c (T )α 6 n,ϕ E˜ |G [t/δ] |2 ≤ ∥ϕ∥21,∞ ; (4.24) n where c4 (T ), c5 (T ) and c6 (T ) are constants independent of n. Proof. See Appendix A.
The following theorem, which is a variation of Theorem 8 in [9], establishes the convergence of finite signed measure valued processes and allows us to use the bounds obtained from the k : C m (Rd ) → above Lemmas to get the convergence results of ρtn . We denote by as , as,r b m d Cb (R ) two bounded linear operators with bounds c and Ck (k = 1, . . . , K ) respectively, k (ϕ)∥ i.e., ∥as (ϕ)∥m,∞ ≤ c∥ϕ∥m,∞ and ∥as,r m,∞ ≤ C k ∥ϕ∥m,∞ . Theorem 4.8. Let µn = {µnt : t ≥ 0} be a signed measure-valued process such that for any ϕ ∈ Cbm (Rd ), m ≥ 6, any fixed L ≥ 1 and fixed s > t, we have µnt (ϕ) = µn0 (as (ϕ)) +
L l=1
n,ϕ
Rt,l +
K k=1 0
t
k µrn (as,r (ϕ))d Wrk ,
(4.25)
K is a K -dimensional Brownian motion. If for any T > 0 there exist constants where W = (W k )k=1 γ0 , γ1 , . . . , γ L such that for t ∈ [0, T ], p ≥ 2 and ql > 0 (l = 0, 1, . . . , L), γ0 γl p p n,ϕ E˜ |µn0 (as (ϕ))| p ≤ q ∥ϕ∥m,∞ , E˜ |Rt,l | p ≤ q ∥ϕ∥m,∞ , l = 1, . . . , L . (4.26) n 0 n l
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Then for any t ∈ [0, T ], we have ct p E˜ |µnt (ϕ)| p ≤ q ∥ϕ∥m,∞ , n where ct is a constant independent of n and q = min(q0 , q1 , . . . , q L ). Proof. See Appendix B.
2659
(4.27)
Applying the bounds in Lemmas 4.3–4.7, one obtains the rate of convergence of the approximation in terms of the three parameters n, δ and α, stated in the following theorem. Theorem 4.9. For any T ≥ 0, m ≥ 6, there exists a constant c7 (T ) independent of n, δ, α or ζ such that for any ϕ ∈ Cbm (R), we have for t ∈ [0, T ] E˜ (ρtn (ϕ) − ρt (ϕ))2 ≤ c7 (T )∥ϕ∥2m,∞ c(n, δ, α, ζ ), (4.28) where c(n, δ, α, ζ ) = max
1 1 α , (αδ)2 , √ , ζ 4 , . n n n δ
In what follows, we will discuss c(n, δ, α, ζ ) to obtain the L 2 -convergence rate of the approximation process ρtn . When α = 0 in (3.3), the component Gaussian measures have null covariance matrices, in other words they are Dirac measures. In this case ρ n is nothing other than the classic particle filter (see, for example, [4]). In this case several terms in c(n, δ, α, ζ ) coming from the covariance term disappear. The rate of convergence c(n, δ, 0) becomes: 1 1 c(n, δ, 0) = max , √ . n n δ Obviously the fastest rate is obtained when δ is a fixed constant independent of n. The L 2 convergence rate will be in this case of order 1/n, which coincides with the results in [4]. When α ∈ (0, 1], the rate of convergence can deteriorate. First of all let us observe that we still need to choose δ to be a fixed constant independent of n. Then the convergence depends on the simpler coefficient c(n, α) given by 1 2 4 α c(n, α, ζ ) = max , α , ζ , . n n √ In this case we need to choose α = √1n (or of order 1/ n) and β to be a fixed constant independent of n to ensure ζ is of order n −1/4 and the optimal rate of convergence, which equals 1/n. This discussion therefore leads to the following convergence result: Corollary 4.10. For any T ≥ 0, m ≥ 6, there exists constant c8 (T ) independent of n, such that for any ϕ ∈ Cbm (R), t ∈ [0, T ] and α ∝ √1n (defined in (3.3)), we have c (T ) 8 E˜ (ρtn (ϕ) − ρt (ϕ))2 ≤ ∥ϕ∥2m,∞ . n For the normalised approximating measure π n , we have the following main result.
(4.29)
2660
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Theorem 4.11. For any T ≥ 0, m ≥ 6, there exists a constant c9 (T ) independent of n such that for α ∝ √1n and ϕ ∈ Cbm (R), we have c9 (T ) E˜ |πtn (ϕ) − πt (ϕ)| ≤ √ ∥ϕ∥m,∞ , n
t ∈ [0, T ].
(4.30)
Proof. Observe that ρtn (ϕ) = ρtn (1)πtn (ϕ), we then have πtn (ϕ) − πt (ϕ) = ρtn (ϕ) − ρt (ϕ) (ρt (1))−1 − πtn (ϕ) ρtn (1) − ρt (1) (ρt (1))−1 . Use the fact that m t , E˜ (ρt (1))−2 < ∞ (see Exercise 9.16 of [4] for details), and by Cauchy–Schwarz inequality: E˜ |πtn (ϕ) − πt (ϕ)| n n 2 2 ˜ ˜ ≤ mt E (ρt (ϕ) − ρt (ϕ)) + ∥ϕ∥0,∞ E (ρt (1) − ρt (1)) , (4.31) and the result follows by applying Corollary 4.10 to the two expectations of the right hand side of (4.31). A stronger convergence result for ρtn and πtn will be proved in the following two propositions, from which we can see that their convergences are uniform in time t. Proposition 4.12. For any T ≥ 0, m ≥ 6, there exists a constant c10 (T ) independent of n such that for any ϕ ∈ Cbm+2 (R), ˜E sup (ρtn (ϕ) − ρt (ϕ))2 ≤ c10 (T ) ∥ϕ∥2 (4.32) m+2,∞ . n t∈[0,T ] Proof. By Proposition 4.1 and the fact that ρt (ϕ) satisfies Zakai equation, we have ρtn (ϕ) − ρt (ϕ) = (π0n (ϕ) − π0 (ϕ)) t t n,ϕ n + (ρs (Aϕ) − ρs (Aϕ))ds + (ρsn (hϕ) − ρs (hϕ))dYs + M[t/δ] 0
0
n ∞ (i+1)δ∧t 1 ( j) 1 2 3 + ξiδn a nj (s) Rs, . j (ϕ)ds + Rs, j (ϕ)dYs + Rs, j (ϕ)d Vs n j=1 i=0 iδ∧t
By Lemmas 4.4–4.6, we know that, 2 n ∞ (i+1)δ∧t 1 1 ≤ c1 (T )(αδ)2 ∥ϕ∥26,∞ ; E˜ sup ξiδn a nj (r )Rs, j (ϕ)ds t∈[0,T ] n j=1 i=0 iδ∧t 2 n ∞ (i+1)δ∧t 1 2 ≤ c2 (T )(αδ)2 ∥ϕ∥24,∞ ; E˜ sup ξiδn a nj (r )Rs, j (ϕ)dYs t∈[0,T ] n j=1 i=0 iδ∧t 2 n ∞ (i+1)δ∧t 1 ( j) 3 ≤ c3 (T ) ∥ϕ∥25,∞ . E˜ sup ξiδn a nj (r )Rs, j (ϕ)d Vs n t∈[0,T ] n j=1 i=0 iδ∧t
(4.33)
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2661
By Doob’s maximal inequality and Lemma 4.7 2 4c ([T /δ]) M n,ϕ n,ϕ E˜ max (Mi )2 ≤ 4E˜ M[T /δ] ≤ ∥ϕ∥21,∞ . i=1,...,[T /δ] n Now we only need to bound the first three terms on the right-hand side of (4.33). For the first term, using the mutual independence of the initial locations of the particles v nj (0), 1 1 2 2 n 2 ˜ π (ϕ ) − π (ϕ) ≤ ∥ϕ∥22,∞ . E[(π (ϕ) − π (ϕ)) ] = 0 0 0 0 n n For the second term, by Jensen’s inequality and Fubini’s Theorem, together with Corollary 4.10, we have t 2 T n ˜E sup E˜ (ρsn (Aϕ) − ρs (Aϕ))2 ds (ρs (Aϕ) − ρs (Aϕ))ds ≤T t∈[0,T ]
0
0
c8 (T )T 2 ∥ϕ∥2m+2,∞ . n For the third term, similarly, by Burkholder–Davis–Gundy inequality and Fubini’s Theorem, together with Corollary 4.10, we have t 2 T n n 2 ˜E sup ˜ ˜ (ρs (hϕ) − ρs (hϕ))dYs ≤ CE (ρs (hϕ) − ρs (hϕ)) ds ≤
t∈[0,T ]
0
0
≤ The above obtained bounds together imply (4.32).
˜ 8 (T )T ∥h∥2 Cc 0,∞ n
∥ϕ∥2m,∞ .
Similar to the proof of Theorem 4.11, we can show the following proposition. Proposition 4.13. For any T ≥ 0, m ≥ 6, there exists a constant c11 (T ) independent of n such that for any ϕ ∈ Cbm+2 (R), n c11 (T ) E˜ sup πt (ϕ) − πt (ϕ) ≤ √ ∥ϕ∥m+2,∞ . (4.34) n t∈[0,T ] Remark 4.14. The fact that the optimal value for α decreases with n is not surprising. As the number of particles increases, the quantisation of the posterior distribution becomes finer and finer. Therefore, asymptotically, the position and the weight of the particle provide sufficient information to obtain a good approximation. In other words, asymptotically the classic particle filter is optimal. Remark 4.15. As in this paper, the quality of the approximation (the error) is evaluated in weak sense, then a raw empirical probability distribution (as produced by a classical particle filter) would be enough, and no further improvement in terms of the convergence rate can be expected from a generalised particle filter using Gaussian mixtures. This observation is reflected in the argument below Theorem 4.9. To be specific, we see, for the convergence rate obtained in Theorem 4.9, that 1 1 1 1 2 4 α c(n, δ, α, ζ ) = max , (αδ) , √ , ζ , ≥ max , √ = c(n, δ, 0, 0), n n n n δ n δ
2662
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
which shows that the performance (or the convergence rate) of the simple design α = ζ = 0 (using classical particle filters) could not be improved. Remark 4.16. The approximations ρtn and πtn have smooth densities with respect to the Lebesgue measure, and that makes it possible to study various properties of the density of ρt from its approximation ρtn . However, since the approximation is evaluated in weak sense, it is not sufficient to ensure that various properties of the target density could be accurately evaluated using the approximating density.3 In principle, this would require a stronger notion of convergence. Indeed, it is possible to directly study the convergence of the density ptn of ρtn to the density pt of ρt . Recalling from Chapter 7 in [4], the density pt of ρt exists and satisfies the following equation: t t pt (x) = p0 (x) + A∗ ps (x)ds + h(x) ps (x)dYs (4.35) 0
0
in a suitably chosen function space, where A∗ is the adjoint of the operator A. The expression of the density ptn of ρtn can also be known from its definition (4.3). It is then possible to write the equation satisfied by ptn and conduct the convergence study of the density in a similar way as in the convergence analysis for the approximating measure. This also enables us to study the convergence of the approximating measure ρtn to ρt under the total variation norm, since the total variation distance between ρtn and ρt can be expressed in terms of the L 1 -norm of the difference between the two densities. In addition, if the quality of the approximation would be evaluated in this strong sense, one would expect that the best rate of convergence would be attained for nonzero values of α and ζ . However, we should note that we cannot get rid of the integral with respect to the Lebesgue measure in the same way as we did for the approximating measure. In other words, because of the presence of A∗ , the analogue of (4.19) for the density ptn cannot be obtained by using the exact same approach. This would require a separate study and we will leave it as future work. ˜ however, it is So far, the convergence results and L 2 -error are obtained under probability P; more natural to investigate these results under the original probability P. The following propositions state the convergence results under P. Proposition 4.17. For any T ≥ 0, m ≥ 6, there exists constant c12 (T ) independent of n, such that for any ϕ ∈ Cbm (R), t ∈ [0, T ] and α ∝ √1n (defined in (3.3)), we have c12 (T ) E |ρtn (ϕ) − ρt (ϕ)| ≤ √ ∥ϕ∥m,∞ . n ˜ we know that Proof. Recalling the derivation of the new probability P, t t 1 h 2 (X s )ds (t ≥ 0) Z˜ t = exp h(X s )dYs − 2 0 0 is an Ft -adapted martingale under P˜ and dP ˜ = Z t t ≥ 0. d P˜ F t
3 The authors would like to thank the reviewer for the valuable comments on this issue.
(4.36)
(4.37)
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2663
Therefore E |ρtn (ϕ) − ρt (ϕ)| = E˜ |ρtn (ϕ) − ρt (ϕ)| Z˜ t c (T )c n 8 z˜ ≤ E˜ |ρt (ϕ) − ρt (ϕ)|2 E˜ ( Z˜ t )2 ≤ ∥ϕ∥m,∞ . n The result follows by letting c12 (T ) = c8 (T )cz˜ . Proposition 4.18. For any T ≥ 0, m ≥ 6, there exists constant c13 (T ) independent of n, such that for any ϕ ∈ Cbm (R), t ∈ [0, T ] and α ∝ √1n (defined in (3.3)), we have c13 (T ) E |πtn (ϕ) − πt (ϕ)| ≤ √ ∥ϕ∥m,∞ . n
(4.38)
Proof. Notice that the Radon–Nikodym derivative restricted to the σ -algebra Yt can be written as dP ˜ = E[ Z t |Yt ] = ρt (1), d P˜ Yt where ρt (1) is the normalising factor of the unnormalised filter. Hence n ρ (ϕ) ρt (ϕ) − E |πtn (ϕ) − πt (ϕ)| = E˜ ρt (1) tn ρt (1) ρt (1) n n ρt (ϕ) n ˜ ˜ ≤ E |ρt (ϕ) − ρt (ϕ)| + E n |ρ (1) − ρt (1)| ρt (1) t n n ≤ E˜ |ρt (ϕ) − ρt (ϕ)| + ∥ϕ∥0,∞ E˜ |ρt (1) − ρt (1)| n 2 ˜ ≤ E (ρt (ϕ) − ρt (ϕ)) + ∥ϕ∥0,∞ E˜ (ρtn (1) − ρt (1))2 . (4.39) The proof is then completed by applying Corollary 4.10 to (4.39).
Remark 4.19. If the correction mechanism is done using the Tree Based Branching Algorithm (TBBA), L p -convergence of ρ n to ρ cannot be generally obtained. This is because, in general, n,ϕ ˜ As a result, one can only obtain we do not have a control on the pth moment of M[t/δ] under P. L 1 -convergence result for ρ n under the original probability P. 4.3. Convergence results using the multinomial branching algorithm In this section we show the convergence result for the case where the resampling is conducted by using Multinomial branching algorithm. The results in this section can be obtained in a similar manner as previous section, therefore the theorems in this section are only stated without proof. See [28] for detailed discussion and proofs of these results. Theorem 4.20. For any T ≥ 0, m ≥ 6, there exist constants c14 (T ) and c15 (T ) independent of n, such that for any ϕ ∈ Cbm (R), t ∈ [0, T ] and α ∝ √1n (defined in (3.3)), we have c (T ) 14 E˜ (ρtn (ϕ) − ρt (ϕ))2 ≤ ∥ϕ∥2m,∞ ; n c15 (T ) E˜ |πtn (ϕ) − πt (ϕ)| ≤ √ ∥ϕ∥m,∞ . n
(4.40) (4.41)
2664
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
In contrast with the discussion under the TBBA, one can show that, if the correction is done using the multinomial branching algorithm, L p -convergence result for ρ n and π n can be obtained for any p ≥ 2, namely we have the following theorem. Theorem 4.21. For any T ≥ 0, m ≥ 6, there exists constants c16 (T ) and c17 (T ) independent of n, such that for any ϕ ∈ Cbm (R), t ∈ [0, T ] and α ∝ √1n (defined in (3.3)), we have c16 (T ) p/2 E˜ |ρtn (ϕ) − ρt (ϕ)| p ≤ p/2 ∥ϕ∥m,∞ ; n c17 (T ) p/2 E˜ |πtn (ϕ) − πt (ϕ)| p ≤ p/2 ∥ϕ∥m,∞ . n
(4.42) (4.43)
5. A numerical example In this section, we present some numerical experiments to test the performance of the approximations with mixture of Gaussian measures. The model chosen in this case is the Beneˇs filter. This is a stochastic filtering problem with nonlinear dynamics for the signal process and linear dynamics for the observation process, with an analytical finite dimensional solution. The main reason for choosing this model is that it has a sufficient nonlinear behaviour to make it interesting, and more importantly, still has a closed form for its solution. 5.1. The model and its exact solution We assume that both the signal and the observation are one-dimensional. The dynamics of the signal X is given by t Xt = X0 + f (X s )ds + σ Vt , (5.1) 0
where f (x) = µσ tanh(µx/σ ). We further assume that the observation Y satisfies t Yt = h(X s )ds + Wt ,
(5.2)
0
where W is a standard Brownian motion independent of V , and h(x) = h 1 x + h 2 . We also assume that X 0 , µ, h 1 , h 2 ∈ R and σ > 0. Then from [10] we know that the conditional law of X t given Yt , σ (Ys , 0 ≤ s ≤ t) has the exact expression of a weight mixture of two Gaussian distributions. In other words, the conditional distribution πt of X t is πt = w+ N (A+ /(2Bt ), 1/(2Bt )) + w− N (A− /(2Bt ), 1/(2Bt )), where N (µ, σ 2 ) is the normal distribution with mean µ and variance σ 2 , and + 2 − 2 2 w± , exp (A± t ) /(4Bt ) / exp (At ) /(4Bt ) + exp (At ) /(4Bt ) A± t ,±
µ h1 X 0 + h2 h2 + h 1 Ψt + − coth(h 1 σ t) σ σ sinh(h 1 σ t) σ
h1 coth(h 1 σ t) 2σ t sinh(h 1 σ s) Ψt , dYs . 0 sinh(h 1 σ t) Bt ,
2665
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
We can, however, only observe Y at a finite partition Π m,T = {0 = t0 < t1 < · · · < tm−1 = T } of [0, T ] in practice; thus we approximate the integral in the definition of Ψt by Ψt ≈
i−1 sinh(h 1 σ tk+1 ) k=0
sinh(h 1 σ ti )
(Ytk+1 − Ytk ),
for ti ∈ Π n,T .
5.2. Numerical simulation results We set values for the parameters µ, σ, h 1 , h 2 , X 0 and T as follows: µ = 0.3,
h 1 = 0.8,
h 2 = 0.0,
σ = 1.0,
X 0 = 0.0,
T = 10.0;
and then we compute one realisation for X t and one realisation for Yt respectively using the Euler scheme with an equidistant partition Π m,T = {ti = mi T }i=0,...,m with m = 106 . The realisation for Yt is then fixed and will act as the given observation path. All the simulations will be done assuming that we are given the previously obtained Yt . With this previously simulated discrete path of Y , we can then approximate Ψt and consequently compute the values of A± t , Bt and wt± ; so that we can compute the conditional law of X t given Yt . At the branching time, we use the Tree Based Branching Algorithm. We will look at the convergence of the Gaussian mixture approximation and the classic particle filters in terms of the number of time steps in the partition and the number of particles respectively. We note that for the test function ϕ(x) = x, the Gaussian mixture approximation gives 2 n n 1 y n n n n πt (ϕ) = a¯ j (t) v j (t) + y ω j (t) √ exp − dy = a¯ nj (t)v nj (t). 2 2π R j=1 j=1 This is almost the same result as the classic particle filters, except that the evolution equations satisfied by v nj (t)s are slightly different in two cases (see Eq. (9.4) in [4] and Eq. (3.3) for details). It is therefore more interesting to look at πT (ϕ) for ϕ(x) = x 2 and ϕ(x) = x 3 , that is, the second and third moments of the system at time T given the observation Y up to time T . To be specific, we estimate πT (ϕ) by πTn (ϕ) with the number of particles (of Gaussian generalised particles) n = 40,000 and we choose various values for the number of time steps m in the partition. We compute πTn (ϕ) using classic particles and mixture of Gaussian measures respectively. Instead of the absolute error πTn (ϕ) − πT (ϕ), we consider the relative error n π (ϕ) − πT (ϕ) T . |πT (ϕ)| The convergence of both methods as the number of discretisation time steps m increases can be seen from the following Fig. 1, and for large number of time steps the Gaussian mixture approximation performs slightly better than the classic particle filters. In the following we fix the number of the discretisation time steps m = 100 and vary the number of (generalised) particles n in the approximating system. From Figs. 2 and 3 we can see the convergence of both approximations with the increase of the number of (generalised) particles. It can be seen (from the right hand side of Figs. 2 and 3) that Gaussian mixture approximation performs better than the classic particle filters when the number of (generalised) particles n is small. This is because by using the Gaussian mixture approximation, each (generalised) particle carries more information about the signal (from its
2666
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
Fig. 1. Relative errors with time steps for ϕ(x) = x 2 (left) & ϕ(x) = x 3 (right).
Fig. 2. Relative errors with different number of (generalised) particles for ϕ(x) = x 2 .
Fig. 3. Relative errors with different number of (generalised) particles for ϕ(x) = x 3 .
D. Crisan, K. Li / Stochastic Processes and their Applications 125 (2015) 2643–2673
2667
As the number of (generalised) particles increases, we can see (from the left-hand side of Figs. 2 and 3) that the Gaussian mixture approximation converges faster than the classic particle filter, and a good approximation is obtained for both methods with $10^4$ particles. There is no significant improvement if we increase the number $n$ of (generalised) particles further in either approximating system. It can also be seen that, although the error decreases as $n$ increases, it does not decrease monotonically; instead there is some fluctuation. This fluctuation comes from the additional randomness introduced by the correction (branching) steps and the time discretisation, which is not taken into account in the theoretical convergence analysis of this paper.

6. Conclusions

In this paper we analyse a class of approximations to the posterior distribution in a continuous-time framework. In particular, we investigate in detail the case where Gaussian mixtures are used to approximate the posterior distribution, and the $L^2$-convergence rate of such an approximation is obtained. The method is a natural extension of the classic particle filters. In general, the approximating measure has a smooth density with respect to the Lebesgue measure, which is not the case for the classic particle filters; however, a stronger notion of convergence is generally needed in order to study properties of the density of $\rho_t$ through its approximation $\rho^n_t$. Furthermore, for a small number of particles, the Gaussian mixture particle filters also perform better.

It can also be seen that the asymptotic behaviour ($n\to\infty$) of the Gaussian mixture approximation is similar to that of the classic particle filters, which is not surprising: as the number of (generalised) particles increases, the quantisation of the posterior distribution becomes finer and finer, so that, asymptotically, the positions and the weights of the particles provide sufficient information to obtain a good approximation. Apart from the $L^2$-convergence rate, a central limit type result can also be obtained for such Gaussian mixture approximations; a comprehensive study can be found in [28], see also [8]. In addition, a separate work on the convergence of the density of $\rho^n$ is in preparation.

Appendix A. Proof of Lemma 4.7

Without loss of generality, we choose the test function $\varphi$ to be non-negative (since we can write $\varphi=\varphi^+-\varphi^-$). Since the random variables $\{o^{n,i\delta}_j,\ j=1,\dots,n\}$ are negatively correlated (see Proposition 9.3 in [4]), it follows that
\begin{align*}
\tilde{\mathbb{E}}\Bigg[\Bigg(\frac{\xi^n_{i\delta}}{n}\sum_{j=1}^{n}\big(o^{n,i\delta}_j-n\bar a^n_j(i\delta-)\big)\varphi\big(X^n_j(i\delta)\big)\Bigg)^2\,\Bigg|\,\mathcal Y_{i\delta-}\vee\mathcal F_{i\delta-}\Bigg]
&\le \frac{(\xi^n_{i\delta})^2}{n^2}\,\|\varphi\|^2_{0,\infty}\sum_{j=1}^{n}n\bar a^n_j(i\delta-)\big(1-n\bar a^n_j(i\delta-)\big)\\
&\le \frac{(\xi^n_{i\delta})^2}{n^2}\,\|\varphi\|^2_{0,\infty}\sum_{j=1}^{n}\big|1-n\bar a^n_j(i\delta-)\big|\\
&\le \frac{(\xi^n_{i\delta})^2}{n^2}\,\|\varphi\|^2_{0,\infty}\,nC_\delta\sqrt{\delta}
=C_\delta\sqrt{\delta}\,\frac{(\xi^n_{i\delta})^2}{n}\,\|\varphi\|^2_{0,\infty}.
\end{align*}
By taking the expectation on both sides, we have
\[
\tilde{\mathbb{E}}\Bigg[\Bigg(\frac{\xi^n_{i\delta}}{n}\sum_{j=1}^{n}\big(o^{n,i\delta}_j-n\bar a^n_j(i\delta-)\big)\varphi\big(X^n_j(i\delta)\big)\Bigg)^2\Bigg]
\le \frac{C_\delta c_1^{t,2}}{n}\sqrt{\delta}\,\|\varphi\|^2_{0,\infty}.
\]
Therefore
\begin{equation}
\tilde{\mathbb{E}}\Big[\big(A^{n,\varphi}_{[t/\delta]}\big)^2\Big]
\le \sum_{i=1}^{[t/\delta]}\frac{C_\delta c_1^{t,2}}{n}\sqrt{\delta}\,\|\varphi\|^2_{0,\infty}
\le \frac{[t/\delta]\,C_\delta c_1^{t,2}}{n}\sqrt{\delta}\,\|\varphi\|^2_{0,\infty}
\le \frac{t\,C_\delta c_1^{t,2}}{\sqrt{\delta}\,n}\,\|\varphi\|^2_{0,\infty}
\tag{A.1}
\end{equation}
if we let $c_4(T)=TC_\delta c_1^{T,2}$.

For $D^{n,\varphi}_{[t/\delta]}$, first note that
\begin{equation}
\int_{\mathbb R}\varphi\big(X^n_j(i\delta)+y\zeta\big)\frac{e^{-y^2/2}}{\sqrt{2\pi}}\,\mathrm{d}y-\varphi\big(X^n_j(i\delta)\big)
=\frac{\zeta^2}{2}\varphi''\big(X^n_j(i\delta)\big)+O\big((\alpha\beta)^2\big);
\tag{A.2}
\end{equation}
it is then clear that we only need to show
\begin{equation}
\tilde{\mathbb{E}}\Bigg[\Bigg(\sum_{i=1}^{[t/\delta]}\frac{\xi^n_{i\delta}}{n}\sum_{j=1}^{n}o^{n,i\delta}_j\,\zeta^2\varphi''\big(X^n_j(i\delta)\big)\Bigg)^2\Bigg]
\le c_5(T)\,\zeta^4\,\|\varphi\|^2_{2,\infty}.
\tag{A.3}
\end{equation}
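For the reader's convenience (this short derivation is supplied here and is not spelled out in the original text), the identity (A.2) is simply the second-order Taylor expansion of $\varphi$ integrated against a standard Gaussian:
% supplementary worked step for (A.2); R denotes the Taylor remainder
\begin{align*}
\int_{\mathbb R}\varphi(x+y\zeta)\frac{e^{-y^2/2}}{\sqrt{2\pi}}\,\mathrm{d}y
 &=\int_{\mathbb R}\Big(\varphi(x)+y\zeta\,\varphi'(x)+\tfrac{y^2\zeta^2}{2}\varphi''(x)+R(x,y\zeta)\Big)\frac{e^{-y^2/2}}{\sqrt{2\pi}}\,\mathrm{d}y\\
 &=\varphi(x)+\frac{\zeta^2}{2}\varphi''(x)+\int_{\mathbb R}R(x,y\zeta)\frac{e^{-y^2/2}}{\sqrt{2\pi}}\,\mathrm{d}y,
\end{align*}
% since \int y e^{-y^2/2} dy = 0 and \int y^2 e^{-y^2/2}/\sqrt{2\pi}\, dy = 1;
% the last integral is the remainder term written as O((\alpha\beta)^2) in (A.2).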
Observe that
\begin{align}
\tilde{\mathbb{E}}\Bigg[\Bigg(\frac{\xi^n_{i\delta}}{n}\sum_{j=1}^{n}o^{n,i\delta}_j\,\zeta^2\varphi''\big(X^n_j(i\delta)\big)\Bigg)^2\,\Bigg|\,\mathcal F_{i\delta-}\Bigg]
&\le \frac{(\xi^n_{i\delta})^2\zeta^4}{n^2}\,\|\varphi\|^2_{2,\infty}\,
\tilde{\mathbb{E}}\Bigg[\Bigg(\sum_{j=1}^{n}\big|o^{n,i\delta}_j-n\bar a^n_j(i\delta-)\big|+n\bar a^n_j(i\delta-)\Bigg)^2\,\Bigg|\,\mathcal F_{i\delta-}\Bigg]\nonumber\\
&\le \frac{2(\xi^n_{i\delta})^2\zeta^4}{n^2}\,\|\varphi\|^2_{2,\infty}
\Bigg[\tilde{\mathbb{E}}\Bigg[\Bigg(\sum_{j=1}^{n}\big(o^{n,i\delta}_j-n\bar a^n_j(i\delta-)\big)\Bigg)^2\,\Bigg|\,\mathcal F_{i\delta-}\Bigg]
+\Bigg(\sum_{j=1}^{n}n\bar a^n_j(i\delta-)\Bigg)^2\Bigg].
\tag{A.4}
\end{align}
For the Tree Based Branching Algorithm (TBBA), by Proposition 3.1,
\begin{equation}
\tilde{\mathbb{E}}\Big[\big(o^{n,i\delta}_j-n\bar a^n_j(i\delta-)\big)^2\,\Big|\,\mathcal F_{i\delta-}\Big]
\le n\bar a^n_j(i\delta-)\big(1-n\bar a^n_j(i\delta-)\big)\le\frac14.
\tag{A.5}
\end{equation}
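The bound (A.5) reflects the way the TBBA assigns offspring numbers. The following sketch is illustrative only: it shows the per-particle marginal rule and makes the unbiasedness and the variance bound explicit, but it ignores the coupling used by the actual TBBA (see Proposition 3.1 and [4]), which additionally keeps the total number of particles equal to $n$.

```python
import numpy as np

def tbba_marginal_offspring(weight, n, rng):
    """Marginal offspring rule behind (A.5): for a normalised weight a_bar, the
    particle receives either floor(n*a_bar) or floor(n*a_bar)+1 offspring,
    randomised so that the conditional mean is exactly n*a_bar."""
    target = n * weight                  # n * a_bar_j(i delta-)
    low = np.floor(target)
    frac = target - low                  # fractional part in [0, 1)
    o = low + (rng.random() < frac)      # E[o | weights] = target
    return int(o)

# The conditional variance of o is frac*(1-frac) <= 1/4; when n*a_bar <= 1 this
# equals n*a_bar*(1 - n*a_bar), which is the quantity appearing in (A.5).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, a_bar = 100, 0.0123
    samples = [tbba_marginal_offspring(a_bar, n, rng) for _ in range(100000)]
    print(np.mean(samples), n * a_bar)   # empirical mean close to 1.23
    print(np.var(samples))               # at most 0.25
```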
Taking the expectation on both sides of (A.4) and using (A.5), we have
\begin{align*}
\tilde{\mathbb{E}}\Bigg[\Bigg(\frac{\xi^n_{i\delta}}{n}\sum_{j=1}^{n}o^{n,i\delta}_j\,\zeta^2\varphi''\big(X^n_j(i\delta)\big)\Bigg)^2\Bigg]
&\le \frac{2\|\varphi\|^2_{2,\infty}\zeta^4}{n^2}\,\frac{n}{4}\,\tilde{\mathbb{E}}\big[(\xi^n_{i\delta})^2\big]
+\frac{2\|\varphi\|^2_{2,\infty}\zeta^4}{n^2}\,\tilde{\mathbb{E}}\Bigg[(\xi^n_{i\delta})^2\Bigg(\sum_{j=1}^{n}n\bar a^n_j(i\delta-)\Bigg)^2\Bigg]\\
&\le \frac{\|\varphi\|^2_{2,\infty}\zeta^4}{2n}\,\tilde{\mathbb{E}}\big[(\xi^n_{i\delta})^2\big]
+2\|\varphi\|^2_{2,\infty}\zeta^4\,\tilde{\mathbb{E}}\big[(\xi^n_{i\delta})^4\big]^{1/2}
\sum_{l=1}^{n}\sum_{j=1}^{n}\tilde{\mathbb{E}}\big[\bar a^n_l(i\delta-)^2\big]^{1/2}\tilde{\mathbb{E}}\big[\bar a^n_j(i\delta-)^2\big]^{1/2}\\
&\le \frac{\|\varphi\|^2_{2,\infty}\zeta^4}{2n}
+2\|\varphi\|^2_{2,\infty}\zeta^4\,c_1^{t,4}e^{c_2 t}
\;\le\;\bar c_T\,\zeta^4\|\varphi\|^2_{2,\infty},
\end{align*}
where $\bar c_T=\tfrac12+2c_1^{t,4}e^{c_2 t}$. Therefore, we obtain
\begin{equation}
\tilde{\mathbb{E}}\Bigg[\Bigg(\sum_{i=1}^{[t/\delta]}\frac{\xi^n_{i\delta}}{n}\sum_{j=1}^{n}o^{n,i\delta}_j\,\zeta^2\varphi''\big(X^n_j(i\delta)\big)\Bigg)^2\Bigg]
\le [t/\delta]\sum_{i=1}^{[t/\delta]}\bar c_T\,\zeta^4\|\varphi\|^2_{2,\infty}
\le c_5(T)\,\zeta^4\,\|\varphi\|^2_{2,\infty}
\tag{A.6}
\end{equation}
if we let $c_5(T)=T^2\bar c_T/\delta^2$.

As for $G^{n,\varphi}_{[t/\delta]}$, first note that $X^n_j(i\delta)\sim N\big(v^n_j(i\delta),\omega^n_j(i\delta)\big)$ and that the $X^n_j$ are mutually independent ($j=1,\dots,n$); then we have
\begin{align*}
\tilde{\mathbb{E}}\Bigg[\Bigg(\xi^n_{i\delta}\sum_{j=1}^{n}\bar a^n_j(i\delta-)\Big(\varphi\big(X^n_j(k\delta)\big)-\tilde{\mathbb{E}}\big[\varphi\big(X^n_j(k\delta)\big)\big]\Big)\Bigg)^2\,\Bigg|\,\mathcal Y_{i\delta-}\Bigg]
&\le \sum_{j=1}^{n}\big(\xi^n_{i\delta}\bar a^n_j(i\delta-)\big)^2\,4\|\varphi'\|^2_{0,\infty}\,
\tilde{\mathbb{E}}\Big[\big(X^n_j(k\delta)-\tilde{\mathbb{E}}\big[X^n_j(k\delta)\big]\big)^2\Big]\\
&\le 4\sum_{j=1}^{n}\big(\xi^n_{i\delta}\bar a^n_j(i\delta-)\big)^2\,\|\varphi\|^2_{1,\infty}\,\|\sigma\|^2_{0,\infty}\,\alpha\delta
\;=:\;c_\sigma\,\alpha\delta\,\|\varphi\|^2_{1,\infty}\sum_{j=1}^{n}\big(\xi^n_{i\delta}\bar a^n_j(i\delta-)\big)^2.
\end{align*}
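For completeness (this elementary argument is supplied here for the reader and is not spelled out in the original text), the factor $4\|\varphi'\|^2_{0,\infty}$ in the first inequality above can be obtained as follows:
% supplementary worked step: E[(phi(X) - E phi(X))^2] <= 4 ||phi'||^2 E[(X - E X)^2]
\begin{align*}
\big(\varphi(X)-\tilde{\mathbb{E}}[\varphi(X)]\big)^2
 &\le 2\big(\varphi(X)-\varphi(\tilde{\mathbb{E}}X)\big)^2
   +2\big(\varphi(\tilde{\mathbb{E}}X)-\tilde{\mathbb{E}}[\varphi(X)]\big)^2\\
 &\le 2\|\varphi'\|^2_{0,\infty}\big(X-\tilde{\mathbb{E}}X\big)^2
   +2\|\varphi'\|^2_{0,\infty}\big(\tilde{\mathbb{E}}\big|X-\tilde{\mathbb{E}}X\big|\big)^2,
\end{align*}
% taking expectations and using Jensen's inequality on the second term
% gives the stated bound with the constant 4.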
We know from the proof of Lemma 4.4 in [9] that, for any $p>0$, $\tilde{\mathbb{E}}\big[\bar a^n_j(t)^p\big]\le\frac{1}{n^p}\exp(c_p t)$; then by taking the expectation on both sides, we have
\begin{align*}
\tilde{\mathbb{E}}\Bigg[\Bigg(\xi^n_{i\delta}\sum_{j=1}^{n}\bar a^n_j(i\delta-)\Big(\varphi\big(X^n_j(k\delta)\big)-\tilde{\mathbb{E}}\big[\varphi\big(X^n_j(k\delta)\big)\big]\Big)\Bigg)^2\Bigg]
&\le c_\sigma\,\alpha\delta\,\|\varphi\|^2_{1,\infty}\sum_{j=1}^{n}\tilde{\mathbb{E}}\Big[\big(\xi^n_{i\delta}\bar a^n_j(i\delta-)\big)^2\Big]\\
&\le c_\sigma\,\alpha\delta\,\|\varphi\|^2_{1,\infty}\sum_{j=1}^{n}\tilde{\mathbb{E}}\big[(\xi^n_{i\delta})^4\big]^{1/2}\,\tilde{\mathbb{E}}\big[\bar a^n_j(i\delta-)^4\big]^{1/2}\\
&\le c_\sigma\,\alpha\delta\,\|\varphi\|^2_{1,\infty}\,\frac{1}{n^2}\sum_{j=1}^{n}c_1^{t,4}\exp(c_4 t)
\;=\;\frac{c_\sigma\,c_1^{t,4}\exp(c_4 t)}{n}\,\alpha\delta\,\|\varphi\|^2_{1,\infty}.
\end{align*}
Finally we have
\begin{equation}
\tilde{\mathbb{E}}\Big[\big(G^{n,\varphi}_{[t/\delta]}\big)^2\Big]
\le \sum_{i=1}^{[t/\delta]}\frac{c_\sigma\,c_1^{t,4}\exp(c_4 t)}{n}\,\alpha\delta\,\|\varphi\|^2_{1,\infty}
\le \frac{t\,c_\sigma\,c_1^{t,4}\exp(c_4 t)}{n}\,\alpha\,\|\varphi\|^2_{1,\infty}.
\tag{A.7}
\end{equation}
The result follows by letting $c_6(T)=Tc_\sigma c_1^{T,4}\exp(c_4 T)$.

Appendix B. Proof of Theorem 4.8

The proof is standard and we present here its principal steps; further details can be found in Section A.2 in [28].

We first show that, for any $t\in[0,T]$, $\|\mu^n_t(1)\|_p^p=\tilde{\mathbb{E}}\big[|\mu^n_t(1)|^p\big]<\infty$. Observe that for $\varphi=1$ and $t\in[0,T]$
\[
\mu^n_t(1)=\mu^n_0\big(a_{s,t}(1)\big)+\sum_{l=1}^{L}R^{n,1}_{t,l}+\sum_{k=1}^{K}\int_0^t\mu^n_r\big(a^k_{s,r}(1)\big)\,\mathrm{d}W^k_r;
\]
then, by the Minkowski, Burkholder–Davis–Gundy and Jensen inequalities, we have that
\begin{equation}
\|\mu^n_t(1)\|_p^p
\le 2^{p-1}(L+1)^p\frac{\gamma}{n^q}
+2^{p-1}K^{p-1}\tilde K\,t^{p/2-1}C^p\int_0^t\|\mu^n_r(1)\|_p^p\,\mathrm{d}r,
\tag{B.1}
\end{equation}
where $C=\max(C_1,\dots,C_K)$. Then from Gronwall's inequality we have
\[
\tilde{\mathbb{E}}\big[|\mu^n_s(1)|^p\big]=\|\mu^n_s(1)\|_p^p\le D<\infty.
\]
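For concreteness (the explicit form of $D$ is not stated in the text, and what follows is only one admissible choice, assuming $p\ge 2$ so that the coefficient $t^{p/2-1}$ in (B.1) is non-decreasing on $[0,T]$), Gronwall's inequality applied to (B.1) gives
% one admissible explicit constant: if f(t) <= a + b(T) \int_0^t f(r) dr on [0,T]
% with a = 2^{p-1}(L+1)^p \gamma / n^q and b(T) = 2^{p-1} K^{p-1} \tilde{K} T^{p/2-1} C^p,
% then f(t) <= a e^{b(T) t}, so one may take
\[
D \;=\; 2^{p-1}(L+1)^p\,\frac{\gamma}{n^q}\,
\exp\!\big(2^{p-1}K^{p-1}\tilde K\,C^{p}\,T^{p/2}\big).
\]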
Using a similar approach, with 1 replaced by $\varphi$, we obtain
\[
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}(L+1)^p\frac{\gamma}{n^q}\,\|\varphi\|^p_{m,\infty}
+2^{p-1}K^{p-1}\tilde K\,t^{p/2-1}\sum_{k=1}^{K}\int_0^t\tilde{\mathbb{E}}\big[\big|\mu^n_r\big(a^k_{s,r}(\varphi)\big)\big|^p\big]\,\mathrm{d}r.
\]
Now denote
\begin{equation}
A^k_{s,t}\triangleq\int_0^t\tilde{\mathbb{E}}\big[\big|\mu^n_r\big(a^k_{s,r}(\varphi)\big)\big|^p\big]\,\mathrm{d}r
=\int_0^t\big\|\mu^n_r\big(a^k_{s,r}(\varphi)\big)\big\|_p^p\,\mathrm{d}r
\tag{B.2}
\end{equation}
and $\Delta=(L+1)^p\frac{\gamma}{n^q}$; we then have
\begin{equation}
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{p-1}K^{p-1}\tilde K\,t^{p/2-1}\sum_{k=1}^{K}A^k_{s,t}.
\tag{B.3}
\end{equation}
Similarly, we have that
\begin{equation}
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{p-1}K^{p}\tilde K\,t^{p/2}C^p D\,\|\varphi\|^p_{m,\infty}.
\tag{B.4}
\end{equation}
Replacing $\varphi$ by $a^k_{s,r}(\varphi)$ in (B.4), we get that
\begin{align}
\big\|\mu^n_r\big(a^k_{s,r}(\varphi)\big)\big\|_p^p
&\le 2^{p-1}\Delta\big\|a^k_{s,r}(\varphi)\big\|^p_{m,\infty}
+2^{p-1}K^{p}\tilde K\,r^{p/2}C^p D\,\big\|a^k_{s,r}(\varphi)\big\|^p_{m,\infty}\nonumber\\
&\le 2^{p-1}\Delta C^p\|\varphi\|^p_{m,\infty}
+2^{p-1}K^{p}\tilde K\,r^{p/2}C^{2p}D\,\|\varphi\|^p_{m,\infty}.
\tag{B.5}
\end{align}
Substituting into (B.2) and denoting $\kappa=p/2$, we have, for $k=1,\dots,K$,
\begin{equation}
A^k_{s,t}\le 2^{p-1}\Delta C^p t\,\|\varphi\|^p_{m,\infty}
+2^{p-1}K^{p}\tilde K\,C^{2p}D\,\frac{t^{\kappa+1}}{\kappa+1}\,\|\varphi\|^p_{m,\infty}
\tag{B.6}
\end{equation}
and (B.3) becomes
\begin{equation}
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{p}t^{\kappa}\Delta\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{2p}\tilde K^{2}C^{2p}D\,\frac{t^{2\kappa}}{\kappa+1}\,\|\varphi\|^p_{m,\infty}.
\tag{B.7}
\end{equation}
Repeating what was done in (B.5) and (B.6), and using (B.7), we have that
\[
A^k_{s,t}\le 2^{p-1}C^p\Delta\,t\,\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{2p}\Delta\,\frac{t^{\kappa+1}}{\kappa+1}\,\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{2p}\tilde K^{2}C^{3p}D\,\frac{t^{2\kappa+1}}{(\kappa+1)(2\kappa+1)}\,\|\varphi\|^p_{m,\infty};
\]
and then (B.3) becomes
\[
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{p}t^{\kappa}\Delta\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{2p}\tilde K^{2}C^{2p}\frac{t^{2\kappa}}{\kappa+1}\,\Delta\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{3p}\tilde K^{3}C^{3p}D\,\frac{t^{3\kappa}}{(\kappa+1)(2\kappa+1)}\,\|\varphi\|^p_{m,\infty}.
\]
Repeating the iteration once more, we have that
\[
A^k_{s,t}\le 2^{p-1}C^p\Delta\,t\,\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{2p}\Delta\,\frac{t^{\kappa+1}}{\kappa+1}\,\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{2p}\tilde K^{2}C^{3p}\Delta\,\frac{t^{2\kappa+1}}{(\kappa+1)(2\kappa+1)}\,\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{3p}\tilde K^{3}C^{4p}D\,\frac{t^{3\kappa+1}}{(\kappa+1)(2\kappa+1)(3\kappa+1)}\,\|\varphi\|^p_{m,\infty};
\]
and that
\[
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{p}t^{\kappa}\Delta\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{2p}\tilde K^{2}C^{2p}\frac{t^{2\kappa}}{\kappa+1}\,\Delta\|\varphi\|^p_{m,\infty}
+2^{4(p-1)}K^{3p}\tilde K^{3}C^{3p}\frac{t^{3\kappa}}{(\kappa+1)(2\kappa+1)}\,\Delta\|\varphi\|^p_{m,\infty}
+2^{4(p-1)}K^{4p}\tilde K^{4}C^{4p}D\,\frac{t^{4\kappa}}{(\kappa+1)(2\kappa+1)(3\kappa+1)}\,\|\varphi\|^p_{m,\infty}.
\]
In general, after the $k$th iteration we have that
\[
\|\mu^n_t(\varphi)\|_p^p
\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{p}t^{\kappa}\Delta\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{2p}\tilde K^{2}C^{2p}\frac{t^{2\kappa}}{\kappa+1}\,\Delta\|\varphi\|^p_{m,\infty}
+\cdots
+\frac{2^{k(p-1)}K^{(k-1)p}\tilde K^{k-1}C^{(k-1)p}t^{(k-1)\kappa}}{(\kappa+1)(2\kappa+1)\cdots((k-2)\kappa+1)}\,\Delta\|\varphi\|^p_{m,\infty}
+2^{k(p-1)}K^{kp}\tilde K^{k}C^{kp}D\,\frac{t^{k\kappa}}{(\kappa+1)(2\kappa+1)\cdots((k-1)\kappa+1)}\,\|\varphi\|^p_{m,\infty}.
\]
Letting $k\to\infty$, and using the convention that $\prod_{j=0}^{-1}(j\kappa+1)=1$, we get that
\begin{align*}
\tilde{\mathbb{E}}\big[|\mu^n_t(\varphi)|^p\big]=\|\mu^n_t(\varphi)\|_p^p
&\le 2^{p-1}\Delta\|\varphi\|^p_{m,\infty}
+2^{2(p-1)}K^{p}\tilde K\,C^{p}t^{\kappa}\Delta\|\varphi\|^p_{m,\infty}
+2^{3(p-1)}K^{2p}\tilde K^{2}C^{2p}\frac{t^{2\kappa}}{\kappa+1}\,\Delta\|\varphi\|^p_{m,\infty}+\cdots\\
&\quad+\frac{2^{k(p-1)}K^{(k-1)p}\tilde K^{k-1}C^{(k-1)p}t^{(k-1)\kappa}}{(\kappa+1)(2\kappa+1)\cdots((k-2)\kappa+1)}\,\Delta\|\varphi\|^p_{m,\infty}+\cdots\\
&=2^{p-1}(\alpha+1)^p\frac{\gamma}{n^q}\sum_{k=1}^{\infty}
2^{(k-1)(p-1)}K^{(k-1)p}\tilde K^{k-1}C^{(k-1)p}\,\frac{t^{(k-1)\kappa}}{\prod_{j=0}^{k-2}(j\kappa+1)}\,\|\varphi\|^p_{m,\infty}.
\end{align*}
Let
\[
\eta_{t,k}=2^{(k-1)(p-1)}K^{(k-1)p}\tilde K^{k-1}C^{(k-1)p}\,\frac{t^{(k-1)\kappa}}{\prod_{j=0}^{k-2}(j\kappa+1)};
\]
we know that $\xi_t\triangleq\sum_{k=1}^{\infty}\eta_{t,k}$ exists by the ratio test, since
\[
\lim_{k\to\infty}\frac{\eta_{t,k+1}}{\eta_{t,k}}
=\lim_{k\to\infty}2^{p-1}K^{p}\tilde K\,C^{p}\,\frac{t^{\kappa}}{(k-1)\kappa+1}=0<1.
\]
Finally the result (4.27) follows by setting $c_t=2^{p-1}(\alpha+1)^p\gamma\,\xi_t$.

References

[1] D.L. Alspach, H.W. Sorenson, Nonlinear Bayesian estimation using Gaussian sum approximations, IEEE Trans. Automat. Control AC-17 (1972) 439–448.
[2] B.D.O. Anderson, J.B. Moore, Optimal Filtering, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1979.
[3] C. Andrieu, A. Doucet, Particle filtering for partially observed Gaussian state space models, J. R. Stat. Soc. Ser. B 64 (4) (2002) 827–836.
[4] A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, in: Stochastic Modelling and Applied Probability, vol. 60, Springer, New York, 2009.
[5] A. Carmi, F. Septier, S. Godsill, The Gaussian mixture MCMC particle algorithm for dynamic cluster tracking, in: Proc. 12th Int. Conf. on Information Fusion, 2009, pp. 1179–1186.
[6] R. Chen, J. Liu, Mixture Kalman filters, J. R. Stat. Soc. Ser. B 62 (3) (2000) 493–508.
[7] D. Crisan, K. Li, Generalised particle filters with Gaussian measures, in: Proc. 19th European Signal Processing Conf., 2011, pp. 659–663.
[8] D. Crisan, K. Li, A central limit type theorem for Gaussian mixture approximations to the nonlinear filtering problem, 2014, submitted for publication. arXiv:1401.6592.
[9] D. Crisan, O. Obanubi, Particle filters with random resampling times, Stochastic Process. Appl. 122 (2012) 1332–1368.
[10] D. Crisan, S. Ortiz-Latorre, A Kusuoka–Lyons–Victoir particle filter, Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 469 (2013).
[11] N. de Freitas, Rao–Blackwellised particle filtering for fault diagnosis, in: IEEE Aerospace Conf. Proc., vol. 4, 2002, pp. 1767–1772.
[12] R. van der Merwe, E. Wan, Gaussian mixture sigma-point particle filters for sequential probabilistic inference in dynamic state-space models, in: Proc. ICASSP, 2003, pp. 701–704.
[13] P. Djurić, M. Vemula, M. Bugallo, Target tracking by particle filtering in binary sensor networks, IEEE Trans. Signal Process. 56 (6) (2008) 2229–2238.
[14] A. Doucet, N. de Freitas, K. Murphy, S. Russell, Rao–Blackwellised particle filtering for dynamic Bayesian networks, in: Proc. of the Sixteenth Conf. on Uncertainty in Artificial Intelligence, 2000, pp. 176–183.
[15] A. Doucet, S. Godsill, C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering, Stat. Comput. 10 (2000) 197–208.
[16] M. Flament, G. Fleury, M. Davoust, Particle filter and Gaussian-mixture filter efficiency evaluation for terrain-aided navigation, in: Proc. EUSIPCO, 2004, pp. 605–608.
[17] D. Fox, Adapting the sample size in particle filters through KLD-sampling, Int. J. Robot. Res. 22 (12) (2003) 985–1003.
[18] D. Guo, X. Wang, R. Chen, New sequential Monte Carlo methods for nonlinear dynamic systems, Stat. Comput. 15 (2005) 135–147.
[19] F. Gustafsson, Particle filter theory and practice with positioning applications, IEEE Aerosp. Electron. Syst. Mag. 25 (7) (2010) 53–82.
[20] F. Gustafsson, et al., Particle filters for positioning, navigation, and tracking, IEEE Trans. Signal Process. 50 (2) (2002) 425–437.
[21] M. Iglesias, K. Law, A. Stuart, Evaluation of Gaussian approximations for data assimilation in reservoir models, Comput. Geosci. (2013) 1–35.
[22] G. Kallianpur, R. Karandikar, White noise calculus and nonlinear filtering theory, Ann. Probab. 13 (4) (1985) 1033–1107.
[23] I. Karatzas, S. Shreve, Brownian Motion and Stochastic Calculus, second ed., Springer-Verlag, New York, 1991.
[24] J. Kotecha, P. Djurić, Gaussian particle filtering, IEEE Trans. Signal Process. 51 (2003) 2593–2602.
[25] J. Kotecha, P. Djurić, Gaussian sum particle filtering, IEEE Trans. Signal Process. 51 (2003) 2602–2612.
[26] C. Kwok, D. Fox, M. Meila, Real-time particle filters, Proc. IEEE 92 (3) (2004) 469–484.
[27] F. Le Gland, V. Monbet, V. Tran, Large sample asymptotics for the ensemble Kalman filter, in: Handbook on Nonlinear Filtering, Oxford University Press, 2010.
[28] K. Li, Generalised particle filters (Ph.D. thesis), Imperial College London, UK, 2013.
[29] M. Morelande, S. Challa, Manoeuvring target tracking in clutter using particle filters, IEEE Trans. Aerosp. Electron. Syst. 41 (1) (2005) 252–270.
[30] S. Reich, A Gaussian mixture ensemble transform filter, 2011, arXiv:1102.3089.
[31] Y. Ren, H. Liang, On the best constant in Marcinkiewicz–Zygmund inequality, Statist. Probab. Lett. 53 (3) (2001) 227–233.
[32] H. Sorenson, D. Alspach, Recursive Bayesian estimation using Gaussian sums, Automatica 7 (1971) 465–479.
[33] X. Sun, L. Munoz, R. Horowitz, Mixture Kalman filter based highway congestion mode and vehicle density estimator and its application, in: Proc. American Control Conf., vol. 3, 2004, pp. 2098–2103.
[34] P. Tam, J. Moore, A Gaussian sum approach to phase and frequency estimation, IEEE Trans. Commun. COM-25 (1977) 935–942.
[35] W. Wu, M. Black, D. Mumford, Y. Gao, E. Bienenstock, J. Donoghue, Modeling and decoding motor cortical activity using a switching Kalman filter, IEEE Trans. Biomed. Eng. 51 (6) (2004) 933–942.
[36] M. Zakai, On the optimal filtering of diffusion processes, Z. Wahrscheinlichkeitstheor. Verwandte Geb. 11 (1969) 230–243.