
Nonlinear filtering in unknown measurement noise and target tracking system by variational Bayesian inference

Xingkai Yu a, Jianxun Li a,*, Jian Xu b

a Department of Automation, School of Electronic Information and Electric Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai, China
b Department 8th, Nanjing Research Institute of Electronic Engineering, Nanjing 210014, China

Article info

Article history: Received 2 February 2018; Received in revised form 31 August 2018; Accepted 31 August 2018; Available online xxxx.

Keywords: Nonlinear filtering; Target tracking system; Probability density function; Variational Bayesian inference; Iterative algorithm; Unknown measurement noise

Abstract

This paper considers a class of nonlinear filtering algorithms based on variational Bayesian (VB) theory to settle the unknown measurement noise problem in target tracking systems. When the unknown measurement noise is conditionally independent of the states, the variational idea converts the estimation of the state's probability density function into the approximation of two probability density functions, one for the unknown noise and one for the nonlinear states. An iterative algorithm is then established that jointly estimates the state and the unknown measurement noise using variational Bayesian inference, so the unknown measurement noise can be estimated as a hidden state. A convergence result for the proposed nonlinear probability density function approximation algorithm is also given. Simulation results on typical examples show that the proposed VB based methods outperform the classic algorithms in target tracking problems. © 2018 Elsevier Masson SAS. All rights reserved.

* Corresponding author. E-mail address: [email protected] (J. Li).
https://doi.org/10.1016/j.ast.2018.08.043
1270-9638/© 2018 Elsevier Masson SAS. All rights reserved.


1. Introduction


The study of systems with unknown measurement noise has recently received much attention due to its broad application in aeronautic and astronautic engineering, and many algorithms have been proposed to address the problem. Generally, measurement noise whose statistical information is entirely unknown, or partially missing (such as the mean, variance, or type of probability density function), is called unknown measurement noise; this paper discusses the case where no information on the measurement noise is available. Existing methods generally settle an unknown mean of the measurement noise [1,2] or an unknown variance [3-7], and address non-Gaussian measurement noise using different types of distribution [8-11]. These works mainly focus on linear systems and parameter estimation. For example, S. Sarkka et al. have addressed this problem in linear systems [12] and nonlinear systems [13]; their work focuses on parameter estimation, such as an unknown variance of the measurement noise [12,14]. Even from the perspective of probability density function (pdf) approximation, these methods apply different distributions (Gaussian sum [11], Student-t [9], Wishart distribution [10], and so on) to model the unknown measurement noise, which turns the problem into a parameter estimation problem with prior distribution information. In real environments and in nonlinear systems, however, we do not have enough prior information, only a small amount of information on the nonlinear model. This paper therefore proposes a variational method for nonlinear systems that uses variational Bayesian (VB) inference to approximate the pdf of the measurement noise. We would like to emphasize that this problem is considered in the context of nonlinear tracking applications.

Optimal estimation for nonlinear systems poses a challenge, since it needs an accurate description of the conditional probability density function, which in general requires an infinite number of parameters [15]. Variational Bayesian inference [16,17] is a powerful tool for machine learning of probabilistic models: it is more accurate than traditional point estimates (maximum a posteriori, least squares, maximum likelihood), yet still very fast compared with sampling methods and Rao-Blackwellized approaches [18]. Moreover, it has a closed-form solution and a criterion that guarantees convergence. It has been developed for a wide range of models to perform approximate posterior inference at low computational cost in comparison with sampling methods, and it is especially advantageous in latent variable models, where the number of unknown variables is often very large and point estimates tend to overfit. Uncertainty of the unknown variables is taken into account by estimating their probability distribution functions.

Recently, variational inference has been used in many disciplines, such as machine learning [19,20], signal processing [21], and communications [22]. These models are more complicated, and such algorithms are not available for nonlinear state-space models [23]. Approximating the pdf of the unknown measurement noise is a natural application of variational inference rather than of other approaches: the insufficient prior statistics on the measurement noise make the problem particularly suitable for the VB approach, so this paper applies this powerful tool to tackle the problem. The main idea of the paper is to treat the measurement noise as a hidden state and to estimate it jointly with the system states. The VB technique is applied to the nonlinear estimation problem from a pdf approximation standpoint: it approximates exact Bayesian inference by replacing the true posterior with a simpler distribution, and it gives the joint distribution over states and noise. The main advantages are robust estimation and automatic model selection [9]. Deriving the proposed algorithm by VB inference makes it more flexible for solving complex problems and for studying the observation noise even further.

The unknown measurement noise in target tracking is a typical application example. Target tracking [22] is one of the most challenging issues in military systems and aeronautic engineering. Target motion analysis deals with sequential state estimation of a moving object using noise-corrupted bearings-only measurements collected by a single moving observer [5]. The noise-corrupted measurements usually follow an abnormal distribution that we cannot model, which is equivalent to knowing nothing about the measurement noise. This problem is important in passive sensing using radar or sonar. The usual tools are recursive Bayesian methods, because they provide the entire posterior pdf of the state. Standard Bayesian algorithms typically assume additive zero-mean Gaussian measurement noise with known variance. However, in radar, sonar, and phasor measurements, the Gaussian assumption makes the sample mean and sample covariance matrix unreliable, and the assumption does not hold in practice. For example, an investigation conducted by Pacific Northwest National Laboratory reveals that phasor measurement noise follows a heavy-tailed, non-Gaussian distribution [24]. Moreover, autonomous target motion is a peculiar nonlinear filtering problem, because the target range remains unobservable until the observer platform carries out an appropriate maneuver. Assuming a plane geometry, one needs to estimate, from a sequence of one-dimensional bearing measurements, the time-evolving four-dimensional target state and the statistics of the measurement noise. Because non-Gaussian measurement noise of unknown type arises in these application examples, we regard the thorny observation noise as a completely unknown variable. Based on the results of relevant research, this paper focuses on nonlinear target tracking systems without measurement noise information, using a variational Bayesian technique. Since it is difficult to obtain an accurate analytical solution of the pdf, the MCMC sampling technique [25,26] is used for initialization of the iterative algorithm, so that the proposed algorithm retains both accuracy and efficiency. Convergence conditions and convergence results for the use of the sampling method are also introduced. Four typical application examples demonstrate the superior performance over classical methods in settling the unknown measurement noise problem in target tracking systems.

The structure is summarized as follows. First, the problem statement is introduced in Section 2, where the models and variational Bayesian inference are presented. Section 3 proposes the nonlinear variational Bayesian filtering algorithm with a convergence proof. Illustrative application results are given in Section 4. Section 5 concludes the paper.

2. Problem statement

2.1. Model introduction

Consider the nonlinear discrete-time stochastic system:

x_k = f(x_{k-1}) + w_{k-1},   (1)
z_k = g(x_k) + v_k,   (2)

where x_k ∈ R^{n_x} is the state vector, f(·) is the nonlinear state function, and w_k ∈ R^r denotes zero-mean Gaussian white noise with E{w_k w_j^T} = Q_k δ_{kj}, where δ_{kj} is the Kronecker delta. z_k ∈ R^{n_z} is the observation vector, g(·) is the nonlinear measurement function, and v_k ∈ R^{n_z} is the unknown measurement noise with a non-Gaussian distribution [27]. n_x, n_z, and r denote the dimensions of the related variables, respectively.

How can an unknown v_k be handled? S. Sarkka et al. have addressed such problems in linear systems [12] and nonlinear systems [13]; their work focuses on parameter estimation, such as an unknown variance of the measurement noise [12,14]. Even from the perspective of pdf approximation, these methods apply different distributions (Gaussian sum [11], Student-t [9], Wishart distribution [10], and so on) to model the unknown measurement noise, again turning the problem into parameter estimation with prior distribution information. In a target tracking application, reference [11] uses the VB technique to address state estimation of linear systems under unknown GMM measurement noise, in particular for non-maneuvering targets: v_k is unknown non-Gaussian measurement noise approximated by a finite Gaussian mixture, so the problem again becomes parameter estimation; that paper evaluates a single target moving in a 2-D space with a constant velocity (CV) model. Zhu et al. [9] settle the unknown measurement noise problem in sensor fusion for a linear target tracking system, where v_k is modeled by a Student-t distribution, so the problem also becomes parameter estimation of that distribution. Dong et al. [10] propose a variational Bayesian adaptive cubature information filter based on the Wishart distribution, considering a target that moves in a horizontal plane and undergoes a constant turn maneuver with unknown turn rate; the dynamics are modeled by a coordinated turning (CT) model, and the measurement consists of range and bearing, so the observation model is nonlinear. Such nonlinear observation models are common in target tracking systems, so it is meaningful that this paper considers the problem in nonlinear tracking systems. We will not settle the unknown measurement noise by parameter estimation: we solve the problem directly from the probability density function of the noise, using variational Bayesian inference to approximate the pdf of the measurement noise. We emphasize again that we study this problem in nonlinear target tracking systems. In aeronautic and astronautic engineering, when f and g are linear, optimal filtering algorithms can solve the problem provided that v_k is zero-mean Gaussian white noise. Under realistic conditions, however, f and g are nonlinear; in particular, in range and angle target tracking, g(·) is a nonlinear measurement function. Moreover, the measurement noise is unknown and obeys a non-Gaussian or arbitrarily complex distribution. To settle this difficult problem, variational Bayesian inference is considered, and the MCMC sampling technique is used here.
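To fix ideas, the following minimal Python sketch (our illustration, not code from the paper) simulates one trajectory of the model (1)-(2); the scalar functions f and g and the heavy-tailed noise are placeholder choices standing in for the unknown, non-Gaussian v_k.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # placeholder nonlinear state function (an assumption for illustration)
    return 0.5 * x + 25.0 * x / (1.0 + x ** 2)

def g(x):
    # placeholder nonlinear measurement function
    return 0.05 * x ** 2

T, Q = 50, 1.0
x = np.zeros(T)
z = np.zeros(T)
for k in range(1, T):
    x[k] = f(x[k - 1]) + rng.normal(0.0, np.sqrt(Q))   # w_{k-1} ~ N(0, Q_k)
    # v_k is unknown and non-Gaussian; a heavy-tailed Student-t stands in here
    z[k] = g(x[k]) + 0.5 * rng.standard_t(df=3)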


2.2. Variational Bayesian approach

In system (1)-(2), z represents the observed data vector, while x and v denote the state and the unknown hidden variables, respectively. Many problems based on variational approximation theory are introduced into traditional Bayesian inference and the iterative EM estimation algorithm. The overall idea is to avoid the complex multidimensional integration of the marginal likelihood function p(z) in high-dimensional multivariate environments: prior information p(x, v) on the parameters is considered, and a factored distribution is used to approximate the true posterior distribution p(x, v | z). The variational Bayesian method obtains the model parameters by maximizing a lower bound of the log marginal likelihood (the objective function) over the variational parameters. Moreover, mean-field theory [28] from statistical physics is applied so that the joint probability distribution is approximated by the product of the marginal probability distribution functions of the individual variables. This converts the joint estimation of multiple variables into an iterative estimation of their marginal distributions, which significantly reduces the computational complexity and improves efficiency. The log marginal likelihood of the Bayesian model can be expressed equivalently as

ln p(z) = ∬ q(x, v) ln [p(z, x, v) / q(x, v)] dx dv + ∬ q(x, v) ln [q(x, v) / p(x, v | z)] dx dv = F(q(x, v)) + KL(q(x, v) ‖ p(x, v | z)),   (3)

where KL(q(x, v) ‖ p(x, v | z)) ≥ 0 denotes the Kullback-Leibler (KL) divergence between the chosen distribution q(x, v) and the true posterior distribution p(x, v | z), and F(q(x, v)) denotes the variational free energy. Equality holds when q(x, v) and p(x, v | z) coincide; the divergence is then minimal and F(q(x, v)) is maximal. Variational Bayesian learning maximizes F(q(x, v)) by iterating on q(x, v).

The study of neural networks can be translated into the problem of approximating a function by an iterative neural network, which can be interpreted mathematically as representing a multivariate function by the composition of functions of one variable. This is Hilbert's 13th problem, which asked whether every continuous multivariate function can be written as a superposition of continuous functions of two variables. Kolmogorov and Arnold showed that every continuous multivariate function can be represented, in a universal form, as a superposition of continuous univariate functions and addition, thus solving the problem positively. In Kolmogorov's representation, only one univariate function (the outer function) depends on the multivariate function to be represented, while all the other univariate functions (the inner functions) are independent of it. This greatly inspired research on the representation and superposition of functions using Kolmogorov's superposition theorem (KST) [29,30], which underpins the rapid development of neural networks. According to this theorem,

lim_{t,n→∞} |q(x, v) − q(x)q(v)| = lim_{t,n→∞} |q_t(x, v) − (q_t(x_1) ⋯ q_t(x_n) ⋯ q_t(v_1) ⋯ q_t(v_n) ⋯)| = 0,

so we let q(x, v) ≈ q(x)q(v), which means that the state and noise distribution functions are treated as independent of each other; this is appropriate both in theory and in practice. These two factors are special forms of a partition into any number of blocks [31] and can be obtained through the solution of the corresponding analytical expressions,

q(x) = exp[E_{q(v)}[ln p(z, x, v)]] / ∫ exp[E_{q(v)}[ln p(z, x, v)]] dx,   (4)

where the denominator is a normalization constant, and each factor q(x_t) requires computing expectations with respect to the other factors q(x_k); the procedure is therefore similar to the iterative EM algorithm. The algorithm can be summarized as an iterative ascent on the objective function F(q(x, v)): initialize the hyperparameters of q(x, v), then in each cycle update

q^c(v) ∝ exp{E_{q(x)}[ln p(z, x, v)]},   (5)
q^c(x) ∝ exp{E_{q(v)}[ln p(z, x, v)]},   (6)

until ΔF = |F_n(q(x)) − F_{n−1}(q(x))| < th, where the threshold th is a small lower bound used to judge convergence and c represents the number of cycles. Any initial distribution q_0(x) can be chosen, and convergence of q_∞(x), q_∞(v) can be proven. References [21,32] provide a powerful tool for approximating probability (density) functions (pdfs) that exhibit a separable form,

ln p(z, x, v) = ρ(x, z)^T h(v, z), or p(z, x, v) = ρ(x, z) h(v, z),   (7)

where ρ(x, z) and h(v, z) denote finite-dimensional vectors. The marginal probability density functions become

q_n(θ) ∝ exp{E_{q(x)}[ln p(z, x, v)]},   (8)

and similarly for q(v), with p(z, x, v) = ρ(x, z) ĥ(v, z). In (7), ĥ(v, z) ≡ E_{q(x)}[h(·)] are the necessary moments of q(x), and (x, v) denotes a chosen partition θ = (x, v); in this paper, θ denotes the joint variables to be estimated. Convergence of the iterative algorithm has been proven [17]. The iterations are:

q̃_c(v | z) ∝ exp{E_{q̃_{c−1}(x|z)}[ρ(x, z)] h(v, z)},   (9)
q̃_c(x | z) ∝ exp{ρ(x, z) E_{q̃_c(v|z)}[h(v, z)]}.   (10)

The principle of variational Bayesian inference is thus the joint estimation of the states and the hidden variables (the unknown measurement noise), from which an iterative algorithm can be derived. This is a guide to nonlinear filtering for unknown measurement noise problems; next, we apply it to solve the stated problem.
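As a concrete, hedged illustration of the alternating updates (5)-(6), consider our own toy example (not from the paper): i.i.d. data z_i ~ N(μ, λ^{-1}) with a factored posterior q(μ)q(λ). The coordinate-ascent updates have closed form and are iterated until the free-energy change is negligible.

import numpy as np

z = np.random.default_rng(1).normal(2.0, 1.5, size=200)   # observed data
N, zbar = z.size, z.mean()

# q(mu) = N(m, s2), q(lam) = Gamma(a, b); rough initial hyperparameters
m, s2, a, b = 0.0, 1.0, 1.0, 1.0
for _ in range(50):
    E_lam = a / b                        # E_q[lambda]
    # update q(mu) given E_q[lambda]     (analogue of (6))
    s2 = 1.0 / (N * E_lam)
    m = zbar
    # update q(lam) given the moments of q(mu)   (analogue of (5))
    a = 1.0 + N / 2.0
    b = 1.0 + 0.5 * np.sum((z - m) ** 2) + 0.5 * N * s2
print(m, a / b)   # approximate posterior mean and precision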


3. Proposed approach: nonlinear variational Bayesian filtering algorithm


3.1. Proposed algorithm


There are many cases of unknown observation noise; in the general case, we only know that the noise and the state are independent of each other. Since the integrals involved in calculating the probability density function are too complex for nonlinear systems, it is difficult to obtain an accurate analytical solution, and the MCMC sampling method is therefore used to initialize the iterative algorithm. For many nonlinear models (ρ, h in (7)), the VB marginal probability density no longer has a standard form, so the VB moments are difficult to estimate. Since the free-form approximation requires it, these non-standard VB marginal probability densities [33] must be replaced with tractable ones; the form q̃(v | z) can be replaced by an alternative one. These VB constrained approximation methods are introduced in the following.


Variational constraints [34] can be used to ensure the realization of the VB marginal function. An important constraint is the assumption that the distribution belongs to an extended exponential family. In a real-time environment and in a Bayesian context, however, this constraint is too restrictive, because the distribution in the marginal likelihood function needs to remain invariant. Here, we relax this hypothesis and propose a solution method: we use a fixed, easily handled q(v | z) instead of q̃(v | z). Then, according to equation (10), q̃(x | z) ∝ exp{E_{q(v|z)}[ln q(x, v, z)]} can be obtained; since q(v | z) is known, the original algorithm loops can be avoided. Many probability distribution approximations are special forms of this approach: certainty equivalence, q = δ(v − v̂) for some fixed v̂, which yields q(x | v̂, z); and quasi-Bayesian estimation, q = q(v). If h is linear, the two examples above are equivalent to choosing v̂ = E_{q(v|z)}[v]. For the nonlinear case, we combine the VB method with a sampling method for probability density approximation; the sequential Monte Carlo method is not introduced further here. We have the partition θ = [x, v]; therefore, q̃(θ_t | z_t) = q̃(x_{t−1} | z_{t−1}) q̃(v_{t−1} | z_{t−1}), similar to the marginalized particle filtering method [35]. The marginal likelihood function can be handled by

q(x_t, v_t | v_{t−1}, z) ∝ ∫ q(z_t | θ_t) q(θ_t | θ_{t−1}) dv_{t−1}.   (11)

Using the VB approximation,

q(x_t, v_t | v_{t−1}, z_t) ≈ q̃(x_t, v_t | v_{t−1}, z_t) = q̃(x_t | v_{t−1}, z_t) q̃(v_t | v_{t−1}, z_t).   (12)

Then, q̃(x_t | v_{t−1}, z_t) and q̃(v_t | v_{t−1}, z_t) are available. q̃(v_t | v_{t−1}, z_t) can be projected onto a weighted empirical approximation, similar to sequential sampling methods [34],

q̃(v_t | v_{t−1}, z_t) ≡ q_δ(v_t | v_{t−1}, z_t) = Σ_{i=1}^{n} ω_t^i δ(v_t − v_t^i),   (13)

where v_t^i ∼ q̇(·) and the weights are calculated by definition. The VB moments are now available:

E_{q_δ(v_t | v_{t−1}, z_t)}[h(v_t | v_{t−1}, z_t)] = Σ_{i=1}^{n} ω_t^i h(v_t^i, v_{t−1}, z_t).   (14)

Note that the terms in (13) and (14) are conditioned on v_{t−1}. The nonlinear VB based algorithm is now organized as follows.

Algorithm 1: Nonlinear filtering algorithm in unknown measurement noise
1. Initialization: draw random samples v_t^i from q̇(v_t | v_{t−1}^i, z_t).
2. Calculate the moments E_{q̃(x_t | v_{t−1}, z_t)}[ρ(x_t, v_{t−1}, z_t)] ≡ ρ̂(x_t).
3. Then, get q̃(v_t | v_{t−1}, z_t) ∝ exp{ρ̂(x_t) h(v_t, v_{t−1}, z_t)}.
4. Evaluate the weights ω_t^i ∝ q̃(v_t^i | v_{t−1}, z_t) / q̇(v_t^i | v_{t−1}, z_t).
5. Calculate the moments using (11) and (14), E_{q_e(v_t | v_{t−1}, z_t)}[h(v_t, v_{t−1}, z_t)] ≡ ĥ(v_t), and get q̃(x_t | v_{t−1}, z_t) ∝ exp{ρ(x_t, v_{t−1}, z_t) ĥ(x_t)}.
6. Judge the convergence of the algorithm; if not converged, go back to step 2.
7. Generate the empirical marginal q_e(v_t | z_t), calculated via step 2, with ω_t^i = ω_t^i ω_{t−1}^i.
8. Using the samples v_t^i from step 1 and the (marginal) weights ω_t^i ∝ q̃(v_t^i | v_{t−1}, z_t) / q̇(v_t^i | v_{t−1}, z_t), generate the marginal q̃(x_t | z_t) ∝ exp{E_{q_e(v_{t−1} | z_{t−1})}[ρ(x_t, v_{t−1}, z_t) E_{q_e(v_t | z_t)} h(v_t, v_{t−1}, z_t)]}.
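A schematic Python rendering of one time step of Algorithm 1 follows (our sketch; the statistics rho_hat and h, the proposal, and the scalar state moment are problem-specific stubs, and the proposal density is canceled only schematically in the weights).

import numpy as np

rng = np.random.default_rng(2)

def rho_hat(x_moment, z):
    # stub for E_{q(x)}[rho(x, z)] of step 2 (an assumed form)
    return np.array([x_moment, -0.5])

def h(v, z):
    # stub for the statistic h(v, z) in (7)
    return np.array([v, v * v])

def vb_step(z, v_prev, n=500, iters=20, tol=1e-6):
    v = v_prev + rng.normal(0.0, 1.0, size=n)          # step 1: v^i ~ qdot
    x_moment, F_old = 0.0, -np.inf
    for _ in range(iters):
        r = rho_hat(x_moment, z)                       # step 2
        logq = np.array([r @ h(vi, z) for vi in v])    # step 3: log qtilde(v^i)
        w = np.exp(logq - logq.max())                  # step 4: w ~ qtilde/qdot
        w /= w.sum()
        h_hat = sum(wi * h(vi, z) for wi, vi in zip(w, v))   # step 5, eq. (14)
        x_moment = z - h_hat[0]                        # stub state update
        F = float(logq @ w)                            # crude free-energy proxy
        if abs(F - F_old) < tol:                       # step 6: convergence
            break
        F_old = F
    return x_moment, v, w                              # step 7: empirical marginal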

Then, the constrained method is organized as follows. We use a fixed, easily handled, tractable q(v | z) instead of q̃(v | z). According to (14), q(v_t | v_{t−1}, z_t) can be calculated point by point, and q̃(v_t | v_{t−1}, z_t) is replaced by the given q(v_t | v_{t−1}, z_t). The constrained algorithm is summarized as follows.

Algorithm 2: Constrained nonlinear filtering algorithm
1. Initialization: draw samples v_t^i from q̇(v_t | v_{t−1}^i, z_t).
2. Evaluate the weights ω_t^i ∝ q(v_t^i | v_{t−1}, z_t) / q̇(v_t^i | v_{t−1}, z_t).
3. Evaluate the moments E_{q_e(v_t | v_{t−1}, z_t)}[h(v_t, v_{t−1}, z_t)] by (14).
4. Generate q̃(x_t | z_t) using q̃(x_t | z_t) ∝ exp{E_{q_e(v_{t−1} | z_{t−1})}[ρ(x_t, v_{t−1}, z_t) E_{q_e(v_t | z_t)} h(v_t, v_{t−1}, z_t)]}, and q_e(v_t | z_t) using ω_t^i ∝ q̃(v_t^i | v_{t−1}, z_t) / q̇(v_t^i | v_{t−1}, z_t). This avoids the iterative process.

3.2. VB based nonlinear filters

This subsection introduces the VB based nonlinear filters used for nonlinear tracking problems in nonlinear systems with unknown measurement noise; they will be used in the following examples. The same design ideas yield VB-CKF and VB-GHKF, which are also used in the application examples below.

Algorithm 3: VB-EKF
1. Initialize α_0, β_0, m_0 and P_0.
2. Predict the state and the noise parameters:
m_k^− = f(m_{k−1}), P_k^− = F(m_{k−1}) P_{k−1} F^T(m_{k−1}) + Q_{k−1},
α_{k,i}^− = ρ α_{k−1,i}, β_{k,i}^− = ρ β_{k−1,i}, i = 1, ..., d.
3. Update the noise statistics and the state:
α_{k,i} = α_{k,i}^− + 1/2, r̂_k = E[r_k] = diag(E[σ_{k,1}^2], ..., E[σ_{k,d}^2]), E[σ_{k,i}^2] = β_{k,i}^− / α_{k,i}^−, i = 1, ..., d,
m_k = m_k^− + P_k^− H^T(m_k^−) [H(m_k^−) P_k^− H^T(m_k^−) + r̂_k]^{−1} (z_k − g(m_k^−)),
P_k = P_k^− − P_k^− H^T(m_k^−) [H(m_k^−) P_k^− H^T(m_k^−) + r̂_k]^{−1} H(m_k^−) P_k^−,
β_{k,i} = β_{k,i}^− + (1/2) E_{x_k}[(z_k − g(x_k))_i^2], with E_{x_k}[(z_k − g(x_k))_i^2] = (z_k − g(m_k^−))_i^2 + [H_k P_k H_k^T]_{ii}.
4. Here F(m_k) = ∂f(x_k)/∂x_k |_{x_k = m_k} and H(m_k^−) = ∂g(x_k^−)/∂x_k^− |_{x_k^− = m_k^−}.

Algorithm 4: UKF
1. Prediction:
2. χ_{i,k} = f(χ_{i,k−1}), x̂_k^− = Σ_{i=0}^{2n} ω_i χ_{i,k},
3. P_k^− = Σ_{i=0}^{2n} ω_i (χ_{i,k} − x̂_k^−)(χ_{i,k} − x̂_k^−)^T + Q_{k−1},
4. where the sigma points χ_{i,k} and the weights ω_i are given by
5. χ_{0,k−1} = x̂_{k−1}; χ_{i,k−1} = x̂_{k−1} + (√((n+κ) P_{k−1}))_i, i = 1, ..., n; χ_{i,k−1} = x̂_{k−1} − (√((n+κ) P_{k−1}))_i, i = n+1, ..., 2n;
6. ω_0 = κ/(n+κ), ω_i = 1/(2(n+κ)), i = 1, ..., 2n, where κ is a scaling parameter that determines the spread of the sigma points around the mean.
7. Update:
8. μ_k = Σ_{i=0}^{2n} ω_i h(χ_{i,k}), T_k = Σ_{i=0}^{2n} ω_i (h(χ_{i,k}) − μ_k)(h(χ_{i,k}) − μ_k)^T + R_k,
9. C_k = Σ_{i=0}^{2n} ω_i (χ_{i,k} − x̂_k^−)(h(χ_{i,k}) − μ_k)^T, K_k = C_k T_k^{−1},
10. x̂_k = x̂_k^− + K_k (z_k − μ_k), P_k = P_k^− − K_k T_k K_k^T.

Algorithm 5: VB-UKF
1. Prediction:
2. χ_{i,k} = f(χ_{i,k−1}), x̂_k^− = Σ_{i=0}^{2n} ω_i χ_{i,k}, P_k^− = Σ_{i=0}^{2n} ω_i (χ_{i,k} − x̂_k^−)(χ_{i,k} − x̂_k^−)^T + Q_{k−1},
υ_k^− = ρ(υ_{k−1} − n − 1) + n + 1, V_k^− = B V_{k−1} B^T, where ρ is a real number with 0 < ρ ≤ 1 and B is a matrix with 0 < |B| ≤ 1; a reasonable choice is B = √ρ I.
3. Update:
4. The update of the VB-UKF utilizes an iterated filtering framework. First set x_k^{(0)} = x̂_k^−, P_k^{(0)} = P_k^−, V_k^{(0)} = V_k^−, υ_k = 1 + υ_k^−, and compute
μ_k = Σ_{i=0}^{2n} ω_i h(χ_{i,k}), T̃_k = Σ_{i=0}^{2n} ω_i (h(χ_{i,k}) − μ_k)(h(χ_{i,k}) − μ_k)^T, C_k = Σ_{i=0}^{2n} ω_i (χ_{i,k} − x̂_k^−)(h(χ_{i,k}) − μ_k)^T.
5. For j = 1 : N, iterate the following (N denotes the number of iterations):
T_k^{(j+1)} = T̃_k + (υ_k − n − 1)^{−1} V_k^{(j)}, K_k^{(j+1)} = C_k (T_k^{(j+1)})^{−1},
x_k^{(j+1)} = x̂_k^− + K_k^{(j+1)} (z_k − μ_k), P_k^{(j+1)} = P_k^− − K_k^{(j+1)} T_k^{(j+1)} (K_k^{(j+1)})^T,
V_k^{(j+1)} = V_k^− + Σ_{i=0}^{2n} ω_i (z_k − h(χ_{i,k}^{(j)}))(z_k − h(χ_{i,k}^{(j)}))^T.
End for. Set x̂_k = x_k^{(N)}, P_k = P_k^{(N)}, V_k = V_k^{(N)}.
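A condensed Python sketch (ours) of the VB-EKF update of Algorithm 3 above, for a scalar measurement and under the assumption of an Inverse-Gamma factor on the measurement-noise variance:

import numpy as np

def vb_ekf_update(m_pred, P_pred, z, g, g_grad, alpha_prev, beta_prev, iters=10):
    # One VB-EKF measurement update (a sketch of Algorithm 3); g_grad is
    # assumed to return the gradient of g as a 1-D array.
    alpha = alpha_prev + 0.5                  # alpha_k = alpha_k^- + 1/2
    beta = beta_prev
    m, P = m_pred, P_pred
    for _ in range(iters):
        r_hat = beta / alpha                  # E[sigma_k^2] = beta / alpha
        H = g_grad(m_pred)
        S = H @ P_pred @ H + r_hat            # innovation variance
        K = P_pred @ H / S                    # Kalman gain
        m = m_pred + K * (z - g(m_pred))
        P = P_pred - np.outer(K, K) * S
        resid = z - g(m)
        beta = beta_prev + 0.5 * (resid ** 2 + H @ P @ H)
    return m, P, alpha, beta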

3.3. Convergence results

This section gives the convergence proof of Algorithms 1 and 2, which is needed because of the introduction of particles. Subsection 3.3.3 presents a convergence result using bounded test functions in the mean square error sense, and Subsection 3.3.4 proposes a general convergence result using unbounded test functions. Subsections 3.3.1 and 3.3.2 give preliminaries and some related results.

3.3.1. Notation and preliminaries

Let C_b(R^{n_x}) be the set of all continuous bounded functions on R^{n_x}, with ‖ϕ‖ = sup_{x ∈ R^{n_x}} |ϕ(x)|. Assume that E = P(R^{n_x}) is the set of probability measures over this space. If (μ_N)_{N=1}^∞ is a sequence of probability measures in this topology, we say that μ_N converges weakly to μ ∈ P(R^{n_x}) and write lim_{N→∞} μ_N = μ.

Definition 1. Let (Ω, F, P) be a probability space and β(P) be the Borel σ-algebra on R^{n_x}. x = (x, v) = {x_t, t ∈ N} is a Markov process and f(dx_t | x_{t−1}) denotes its probability transition kernel. The state processes are conditionally independent of the measurements z = {z_t, t ∈ N∖{0}}. The marginal distribution g(dz_t | x_t) and the densities f(dx_t | x_{t−1}) admit densities with respect to the Lebesgue measure. The joint and marginal filtering distributions are π_{t|t}(dx_t) and π_{0:t|t}(dx_{0:t}). For a Markov transition kernel K, a measure υ, and a function ϕ : R^{n_x} → R, we write (υ, ϕ) = ∫ ϕ dυ, Kυ(A) = ∫ υ(dx) K(A | x), and Kϕ(x) = ∫ K(dz | x) ϕ(z). In this notation, the recurrence formula implies, for any function: Prediction: (π_{t|t−1}, ϕ) = (π_{t−1|t−1}, fϕ); Update: (π_{t|t}, ϕ) = (π_{t|t−1}, g)^{−1} (π_{t|t−1}, ϕg).

Definition 2. Let (a_t)_{t=1}^∞ and (b_t)_{t=1}^∞ be two sequences of continuous functions a_t, b_t : E → E on a metric space (E, d). In addition, on this space, let k_t, k_{1:t} be two other sequences of functions defined as k_t ≜ a_t ∘ b_t and k_{1:t} ≜ k_t ∘ k_{t−1} ∘ ⋯ ∘ k_1, where "∘" denotes the composition of functions. c^N is the map that produces a random sample of size N from a measure. The stronger type of convergence we need is uniform convergence to the identity function: c^N converges uniformly to the identity function i if, by definition, for all ε > 0 there exists N(ε) such that d(c^N(e), i(e)) < ε for all N ≥ N(ε).

Definition 3. The Euclidean space R^{n_x} is endowed with the topology of weak convergence. Let A = {ϕ_i}_{i>0} ⊂ C_b(R^{n_x}) be a countable subset of continuous bounded functions that completely determines the convergence. With this countable subset, the distance on P(R^{n_x}) that generates the weak topology is defined as

d(μ, ν) = Σ_i |(μ, ϕ_i) − (ν, ϕ_i)| / (2^i ‖ϕ_i‖).

Remark 1. If, for any ϕ ∈ C_b(R^{n_x}), lim_{N→∞} (μ_N, ϕ) = (μ, ϕ), then lim_{N→∞} μ_N = μ weakly; that is, ∀ϕ_i ∈ A, lim_{N→∞} μ_N = μ weakly ⇔ lim_{N→∞} (μ_N, ϕ_i) = (μ, ϕ_i). It is easy to prove that lim_{N→∞} μ_N = μ weakly ⇔ lim_{N→∞} d(μ_N, μ) = 0. Obviously, d depends on the choice of the countable subset A; d generates the weak topology on P(R^{n_x}), but the topology itself is independent of A.

3.3.2. Application to optimal filtering

Definition 4. Define b_t : P(R^{n_x}) → P(R^{n_x}) to be the mapping such that, for arbitrary ν ∈ P(R^{n_x}),

b_t(ν)(dx_t) = νf(dx_t) = ∫_{R^{n_x}} f(dx_t | dx_{t−1}) ν(dx_{t−1}).

Hence, for ϕ ∈ C_b(R^{n_x}), (b_t(ν), ϕ) = ∬ ϕ(x_t) f(dx_t | dx_{t−1}) ν(dx_{t−1}) = (ν, fϕ), and we have π_{t|t−1} = b_t(π_{t−1|t−1}).

In the context of filtering, when a new measurement is considered, this implies that a slight variation will not result in a big variation of the conditional distribution of the state. Mathematically, one way to ensure this is to assume that g(z_t | ·) is a continuous, bounded, strictly positive function: g(z_t | ·) ∈ C_b(R^{n_x}), g(z_t | x_t) > 0, ∀x_t ∈ R^{n_x}. The positivity assumption is necessary to ensure that (ν, g) is never 0, which would cause problems when it is used as a divisor.

Definition 5. Let a_t : P(R^{n_x}) → P(R^{n_x}) be the mapping such that, for ∀ν ∈ P(R^{n_x}), a_t(ν) is the probability measure defined by (a_t(ν), ϕ) = (ν, g)^{−1} (ν, ϕg) for all ϕ ∈ C_b(R^{n_x}). Then π_{t|t} = a_t(π_{t|t−1}) = a_t ∘ b_t(π_{t−1|t−1}), and a_t can be assumed continuous. Indeed, if the positivity assumption holds, then for all test functions, lim_{N→∞} ν_N = ν implies

lim_{N→∞} (a_t(ν_N), ϕ) = lim_{N→∞} (ν_N, ϕg) / lim_{N→∞} (ν_N, g) = (ν, ϕg) / (ν, g) = (a_t(ν), ϕ).

Therefore lim_{N→∞} a_t(ν_N) = a_t(ν) and a_t is continuous.

Let c^{N,ω}, N > 0, ω ∈ Ω, be the random perturbation defined, for all ν ∈ P(R^{n_x}), by c^{N,ω}(ν) = (1/N) Σ_{j=1}^{N} δ_{V_j(ω)}, where V_j : Ω → R^d are i.i.d. random variables with common distribution ν.

Lemma 1. If c^{N,ω} is defined as above, then for almost all ω it satisfies lim_{N→∞} e_N = e ⇒ lim_{N→∞} c^N(e_N) = e.

Proof. Please see Appendix A for details. □

Theorem 1. Assume that the likelihood function g is bounded, continuous, and strictly positive, and that the transition kernel f is Feller. Then lim_{N→∞} π_{t|t}^N = π_{t|t}.

Proof. This result is based on Lemma 1: since lim_{N→∞} μ_N = μ, we have

lim_{N→∞} π_{t|t}^N = lim_{N→∞} k_{1:t}^N(μ_N) = k_{1:t}(μ) = π_{t|t}. □

Let us consider the resampling method. The algorithm has the form k_t^N = c̄^N ∘ a_t ∘ c^N ∘ b_t, where c̄^N is the perturbation introduced by the resampling step. In this case, we need to apply to c̄^N the same condition as to c^N. To ensure that the resampling procedure satisfies the required condition, one verifies that it satisfies

E[{(c̄^N(ν), ϕ) − (ν, ϕ)}²] ≤ C ‖ϕ‖² / N

for all arbitrary bounded functions. If this is not possible, one can ask for E[{(c̄^N(ν), ϕ) − (ν, ϕ)}²] ≤ C; however, in this case one has to take a subsequence (N_k) of π_{t|t}^N so that Σ_k 1/N_k < ∞. Then, for any ϕ ∈ B(R^{n_x}),

E[{(π̃_{t|t}^{N_k}, ϕ) − (π_{t|t}, ϕ)}²] ≤ c̃_{t|t} ‖ϕ‖² / N_k.

3.3.3. Convergence result using bounded test functions in the mean square error sense

Now we consider convergence in the mean square error sense; conditions ensuring the weak convergence of the empirical distributions have been provided above. Let us still assume that E = P(R^{n_x}) and use the following notion: if (μ_N^ω)_{N=1}^∞ is a random sequence of probability measures, and ϕ belongs to the set B(R^{n_x}) of Borel bounded measurable functions on R^{n_x}, then μ_N^ω converges to μ ∈ P(R^{n_x}) if lim_{N→∞} E[((μ_N, ϕ) − (μ, ϕ))²] = 0, instead of using weak convergence.

Remark 2. This result holds for π_{t|t}^N and also implies that the rate of convergence tends toward zero. The bound is proportional to 1/N and is independent of the state dimension.

Now we give a short proof of the convergence of the corresponding algorithm described in Section 3, under the following assumptions.

Simple convergence

Assumption 1. The transition density f and the measurement density g(z_t | ·) are bounded functions, i.e., ‖f‖ < ∞ and ‖g‖ < ∞.

The following lemmas essentially state that, at each step of the particle filtering algorithm, the approximation admits a mean square error of order 1/N.

Lemma 2. Assume that, for any ϕ ∈ B(R^{n_x}),

E[{(π_{t−1|t−1}^N, ϕ) − (π_{t−1|t−1}, ϕ)}²] ≤ c_{t−1|t−1} ‖ϕ‖² / N.

Then, for any ϕ ∈ B(R^{n_x}),

E[{(π_{t|t−1}^N, ϕ) − (π_{t|t−1}, ϕ)}²] ≤ c_{t|t−1} ‖ϕ‖² / N

after Step 1 of the algorithm.

Proof. Please see Appendix C for details. □

Lemma 3. Assume that, for any ϕ ∈ B(R^{n_x}),

E[{(π_{t|t−1}^N, ϕ) − (π_{t|t−1}, ϕ)}²] ≤ c_{t|t−1} ‖ϕ‖² / N.

Then, for any ϕ ∈ B(R^{n_x}),

E[{(π̃_{t|t}^N, ϕ) − (π_{t|t}, ϕ)}²] ≤ c̃_{t|t} ‖ϕ‖² / N

after Step 2 of the algorithm.

Proof. Please see Appendix D for details. □

Lemma 4. Assume that, for any ϕ ∈ B(R^{n_x}),

E[{(π̃_{t|t}^N, ϕ) − (π_{t|t}, ϕ)}²] ≤ c̃_{t|t} ‖ϕ‖² / N.

Then there exists a constant c_{t|t} such that, for any ϕ ∈ B(R^{n_x}),

E[{(π_{t|t}^N, ϕ) − (π_{t|t}, ϕ)}²] ≤ c_{t|t} ‖ϕ‖² / N.

Theorem 2. Under Assumption 1, for all t ≥ 0 there exists c_{t|t}, independent of N, such that for ∀ϕ ∈ B(R^{n_x}),

E[{(π_{t|t}^N, ϕ) − (π_{t|t}, ϕ)}²] ≤ c_{t|t} ‖ϕ‖² / N.

Proof. Please see Appendix B for details. □

Moreover, this method overcomes the problem of dimensionality, as the rate of convergence is independent of the state dimension. However, to ensure a given precision of the mean square error, the number of particles also depends on c_{t|t}, which can depend on n_x. Note that the result applies only to bounded functions; we discuss the more general unbounded case in the following.

Remark 3. Under these conditions, for any ϕ ∈ B(R^{n_x}), (π_{t|t}^N, ϕ) converges toward (π_{t|t}, ϕ) in the mean square error sense, as Theorem 2 ensures. However, we have not paid attention to the growth of the sequence c_{t|t}; an increasingly large number of particles may be needed as time increases in order to ensure a given precision of the estimate. In other words, we need the assumption that any error is forgotten (exponentially) with time; then c_{t|t} does not increase over time.

Uniform convergence

Assumption 2. For any x_{t−1} ∈ R^{n_x}, there exist a positive measure λ and ε > 0 such that ε λ(dx_t) ≤ g(z_t | x_t) f(dx_t | x_{t−1}) ≤ ε^{−1} λ(dx_t).

This assumption implies that the kernel function depends only very weakly on the past record.

Assumption 3. One has

γ_t = sup_{x ∈ R^{n_x}} g(z_t | x_t) / inf_{μ ∈ P(R^{n_x})} (μf, g(z_t | ·)) < γ < ∞.

Then, under Assumptions 2 and 3, uniform convergence holds.

Theorem 3. There exists a constant c(ε), independent of N, such that for all t ≥ 0 and any ϕ ∈ B(R^{n_x}),

E[{(π_{t|t}^N, ϕ) − (π_{t|t}, ϕ)}²] ≤ c(ε) ‖ϕ‖² / N.

Remark 4. This result approximately means that when the "true" optimal filter is quickly identified, the uniform convergence of the proposed algorithm is ensured. On the other hand, when the optimal algorithm has a "long memory", there will be an accumulation of errors over time that prevents uniform convergence.
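The 1/N rate of Theorem 2 can be checked empirically. The toy sketch below (our illustration, not from the paper) estimates E[((μ_N, ϕ) − (μ, ϕ))²] for a bounded test function under i.i.d. sampling, where the bound predicts decay proportional to 1/N.

import numpy as np

rng = np.random.default_rng(3)
phi = np.tanh                 # a bounded test function, ||phi|| <= 1
true_val = np.mean(phi(rng.normal(size=10**6)))   # reference value of (mu, phi)

for N in (100, 1000, 10000):
    errs = [(np.mean(phi(rng.normal(size=N))) - true_val) ** 2
            for _ in range(200)]
    print(N, np.mean(errs))   # empirical MSE, roughly C / N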

3.3.4. General convergence result for unbounded test functions

Let υ be a measure and φ be a measurable function; we write (υ, φ) ≜ ∫ φ dυ and fφ(x) ≜ ∫ f(dz | x) φ(z). Let π_{t|t−1} denote the measure corresponding to the probability density p(x_t | z_{1:t−1}) and π_{t|t} the measure corresponding to the density p(x_t | z_{1:t}). With this notation, the Bayesian filtering equations can be organized as

(π_{t|t−1}, φ) = (π_{t−1|t−1}, fφ), (π_{t|t}, φ) = (π_{t|t−1}, φg) / (π_{t|t−1}, g).   (15)

First of all, to prove the convergence, we impose the following assumptions.

Assumption 4. The separable VB approximation of states and noise, q(x_t, v_t | z_{1:n}) = q(x_t | z_{1:n}) q(v_t | z_{1:n}), is equivalent to the joint filtering pdf of state and noise, p(x_t, v_t | z_{1:n}).

Assumption 5. For any given y_{1:s} we have (π_{s|s−1}, g_s) > γ_s > 0, where s = 1, ..., t.

Assumption 6. The density f and measurement density g are bounded, that is, ‖f‖ ≤ c_f and ‖g‖ ≤ c_g, where c_f, c_g are constants and ‖·‖ denotes the supremum norm.

Assumption 7. The test function φ(·) satisfies sup_{x_s} |φ(·)|^p g(z_s | x_s) < C(y_{1:s}).

Assumption 8. The potentially unbounded importance weights [36] are defined as

ω_t(x_t, x_{t−1}) = g(z_t | x_t) f(x_t | x_{t−1}) / q_I(x_t | x_{t−1}, z_t).   (16)

Assumption 9. The r-th order moment E[(ω_t(x_t, x_{t−1}))^r | x_{t−1}] is finite, where 0 < r ≤ p and the expectation is over q_I(·).

We now give the following convergence theorem.

Theorem 4. Consider the variational Bayesian filter and suppose that Assumptions 4-9 are satisfied. Then, for any φ ∈ L_t^p(g) with p ≥ 2 and 0 < r ≤ p, there exists a constant c_{t|t}, independent of sufficiently large N, such that

E[|(π_{t|t}^N, φ) − (π_{t|t}, φ)|^p] ≤ C_{t|t} ‖φ‖_{t,p}^p / N^{p − p/r},

which implies that the algorithm will not run into an infinite loop. Here L_t^p(g) is the class of functions satisfying Assumption 8, and ‖φ‖_{t,p} ≜ max{1, (π_{s|s}, |φ|^p)^{1/p}}, s = 0, 1, ..., t [37,38].

Proof. Please see Appendix E for details. □

4. Tracking simulation examples

4.1. Bearings Only Tracking

First, we review a classical filtering application [39], in which we track a moving object with sensors that measure only the bearings of the object with respect to the positions of the sensors. There is one moving target in the scene and two angular sensors tracking it. Solving this problem is important, because more general multiple target tracking problems can often be partitioned into sub-problems in which single targets are tracked separately at a time [40]. The state of the target at time step k consists of the position in two-dimensional Cartesian coordinates, x_k and y_k, and the velocities toward those coordinate axes, ẋ_k and ẏ_k. Thus, the state vector can be expressed as X_k = (x_k, y_k, ẋ_k, ẏ_k)^T. The dynamics of the target are modeled as a linear, discretized Wiener velocity model [39],

X_k = [[1, 0, Δt, 0], [0, 1, 0, Δt], [0, 0, 1, 0], [0, 0, 0, 1]] X_{k−1} + w_{k−1},   (17)

where w_{k−1} is Gaussian process noise with zero mean and covariance

E[w_{k−1} w_{k−1}^T] = [[Δt³/3, 0, Δt²/2, 0], [0, Δt³/3, 0, Δt²/2], [Δt²/2, 0, Δt, 0], [0, Δt²/2, 0, Δt]] q,

where q is the spectral density of the noise, set to q = 0.1 in the simulations. The measurement model for sensor i is defined as

θ_k^i = arctan((y_k − s_y^i) / (x_k − s_x^i)) + v_k^i,   (18)

where (s_x^i, s_y^i) is the position of sensor i. Our method does not know the real v_k^i ∼ N(0, σ²), with σ = 0.05 radians; for comparison, the other approaches are set up to know this real situation. The following methods are used to demonstrate the performance:

• EKF 1 and 2: first and second order extended Kalman filter [41].
• UKF: unscented Kalman filter [42-44].
• GHKF: Gauss-Hermite Kalman filter [45].
• CKF: Cubature Kalman filter [46].
• EM: Expectation Maximization algorithm [47].
• MCMC: Markov chain Monte Carlo sampling method [48].
• VB-PF: Algorithm 1.

In Fig. 1, we have plotted one realization of the measurements (in radians) obtained from both sensors. The sensors are placed at (s_x^1, s_y^1) = (−1, −2) and (s_x^2, s_y^2) = (1, 1). We do not list the program code for the measurement function and its derivatives here, as they are straightforward to implement. The target starts with state X_0 = (0, 0, 1, 0)^T, and in the estimation we set the prior distribution for the state to X_0 ∼ N(0, P_0), where P_0 = diag(0.1, 0.1, 10, 10), which basically means that we are fairly certain about the target's origin, but very uncertain about the velocity. In the simulations, we also give the target a slightly randomized acceleration, so that it achieves a curved trajectory that is approximately the same in different simulations. The trajectory and its estimates can be seen in Figs. 2, 3, 4 and 5. As can be seen from the figures, the VB based method and the MCMC approach give almost identical results, while the estimates of EKF, UKF, CKF, and GHKF are clearly worse. Especially at the beginning of the trajectory, these filters have great difficulty getting on the right track, which is due to the relatively big uncertainty in the starting velocity; after that, the estimates are fairly similar. See Fig. 6.
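For reference, a minimal Python sketch (ours, under the stated parameter values) of the discretized Wiener velocity transition (17) and the bearing measurement (18):

import numpy as np

def cv_matrices(dt, q):
    # Transition matrix and process noise covariance of model (17)
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.array([[dt**3/3, 0, dt**2/2, 0],
                      [0, dt**3/3, 0, dt**2/2],
                      [dt**2/2, 0, dt, 0],
                      [0, dt**2/2, 0, dt]])
    return A, Q

def bearing(X, sensor):
    # Measurement function (18); arctan2 is the quadrant-safe form of arctan
    sx, sy = sensor
    return np.arctan2(X[1] - sy, X[0] - sx)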


[Fig. 1. Measurements from sensors (in radians) in the bearings only tracking problem. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)]
[Fig. 2. Filtering results of first order EKF and related methods.]
[Fig. 3. Filtering results of second order EKF and related methods.]
[Fig. 4. Filtering results of UKF and related methods.]
[Fig. 5. Filtering results of CKF and related methods.]
[Fig. 6. Filtering results of GHKF (degree 3) and related methods.]

Table 1
RMSEs of estimating the position in the Bearings Only Tracking problem over 1000 Monte Carlo runs.

Method:  EKF1   VB-EKF1  VB-EM  EKF2   VB-EKF2  VB-PF  UKF    VB-UKF  MCMC   CKF    VB-CKF  GHKF   VB-GHKF
RMSE:    0.114  0.054    0.052  0.202  0.074    0.043  0.113  0.055   0.052  0.107  0.053   0.108  0.053

In Table 1, we have listed the root mean square errors (means of the position errors) of all tested methods over 1000 Monte Carlo runs. The numbers confirm the previous observations: the VB based methods give almost identical performances. Had the prior distribution for the starting velocity been more accurate, the performance difference between the VB based methods and the other methods would have been smaller, but still noticeable.

4.2. Reentry Vehicle Tracking

Next, we review a challenging filtering problem, which was used in [49,50] to demonstrate the performance of the UKF. This example concerns a reentry tracking problem, where radar is used to track a space vehicle that enters the atmosphere at high altitude and very high speed. Fig. 7 shows a sample trajectory of the vehicle with respect to the earth and the radar. The dynamics of the vehicle are affected by three kinds of forces: aerodynamic drag, which is a function of vehicle speed and has highly nonlinear variations with altitude; gravity, which accelerates the vehicle toward the center of the earth; and random buffeting terms. The state space in this model consists of the vehicle's position (x_1 and x_2), its velocity (x_3 and x_4), and a parameter of its aerodynamic properties (x_5). The dynamics in the discrete-time case are defined as [50]

x_1(k+1) = x_1(k) + Δt x_3(k),   (19)
x_2(k+1) = x_2(k) + Δt x_4(k),   (20)
x_3(k+1) = x_3(k) + Δt (D(k) x_3(k) + G(k) x_1(k)) + w_1(k),   (21)
x_4(k+1) = x_4(k) + Δt (D(k) x_4(k) + G(k) x_2(k)) + w_2(k),   (22)
x_5(k+1) = x_5(k) + w_3(k),   (23)

where w(k) is the process noise vector, D(k) the drag-related force term, and G(k) the gravity-related force term. The force terms are given by

D(k) = β(k) exp{[R_0 − R(k)] / H_0} V(k), G(k) = −Gm_0 / R³(k), β(k) = β_0 exp(x_5(k)),

where R(k) = √(x_1²(k) + x_2²(k)) is the vehicle's distance from the center of the earth and V(k) = √(x_3²(k) + x_4²(k)) is the speed of the vehicle. The constants in the previous definition were set to β_0 = −0.59783, H_0 = 13.406, Gm_0 = 3.9860 × 10⁵, and R_0 = 6374. The step size between time steps was set to Δt = 0.1 s. Note that this might be too simple an approach in real-world applications, due to the high nonlinearities in the dynamics, so a more advanced integration scheme might be preferable. The discrete process noise covariance in the simulations was set to

Q(k) = [[2.4064 × 10⁻⁵, 0, 0], [0, 2.4064 × 10⁻⁵, 0], [0, 0, 0]].

The lower right element of the covariance was initially set to zero, but later changed to 10⁻⁶ to increase filter stability. In the simulations, the initial states were drawn from a multivariate Gaussian with mean and covariance

m_0 = (6500.4, 349.14, −1.8093, −6.7967, 0.6932)^T, P_0 = diag(10⁻⁶, 10⁻⁶, 10⁻⁶, 10⁻⁶, 0).

The prior distribution for the state was set to a multivariate Gaussian with mean and covariance

m_0 = (6500.4, 349.14, −1.8093, −6.7967, 0)^T, P_0 = diag(10⁻⁶, 10⁻⁶, 10⁻⁶, 10⁻⁶, 1),

that is, the vehicle's aerodynamic properties were not known precisely beforehand.

[Fig. 7. Sample vehicle trajectory, earth, and position of the radar in the Reentry Vehicle Tracking problem.]

The radar, located at (s_x, s_y) = (R_0, 0), is used to measure the range r_k and bearing θ_k in relation to the vehicle at time step k. Thus, the measurement model can be written as

r_k = √((x_1(k) − s_x)² + (x_2(k) − s_y)²) + q_1(k),   (24)
θ_k = tan⁻¹((x_2(k) − s_y) / (x_1(k) − s_x)) + q_2(k),   (25)

where the measurement noise processes q_1(k) and q_2(k) are Gaussian with zero means and standard deviations σ_r = 10⁻³ km and σ_θ = 0.17 rad, respectively. The proposed VB based methods do not know the distribution or the parameters of the measurement noise; for brevity, we allow the contrast algorithms to know this prior information, so our VB based algorithms compete under unequal, inferior conditions. In Table 2, we have listed the RMS errors and standard deviations of the position estimates for the tested methods, which were

• EKF: extended Kalman filter.
• UKF: unscented Kalman filter.
• GHKF: Gauss-Hermite Kalman filter.
• CKF: Cubature Kalman filter.
• EM: Expectation Maximization algorithm.
• MCMC: Markov chain Monte Carlo sampling method.
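Before the results, a Python sketch (ours) of how the reentry dynamics (19)-(23) would be stepped in simulation, using the constants stated above:

import numpy as np

B0, H0, GM0, R0 = -0.59783, 13.406, 3.9860e5, 6374.0

def reentry_step(x, dt, w):
    # Discrete-time reentry dynamics (19)-(23); w = (w1, w2, w3)
    R = np.hypot(x[0], x[1])                 # distance from earth's center
    V = np.hypot(x[2], x[3])                 # speed of the vehicle
    beta = B0 * np.exp(x[4])                 # beta(k) = beta_0 exp(x5)
    D = beta * np.exp((R0 - R) / H0) * V     # drag-related force term
    G = -GM0 / R**3                          # gravity-related force term
    return np.array([
        x[0] + dt * x[2],
        x[1] + dt * x[3],
        x[2] + dt * (D * x[2] + G * x[0]) + w[0],
        x[3] + dt * (D * x[3] + G * x[1]) + w[1],
        x[4] + w[2]])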

Table 2
Average RMSEs of estimating the position in the Reentry Vehicle Tracking problem over 100 Monte Carlo runs.

Method:  EKF     VB-EKF  UKF     VB-UKF  MCMC    VB-PF   CKF     VB-CKF  GHKF    VB-GHKF
RMSE:    0.0084  0.0054  0.0084  0.0054  0.0044  0.0043  0.0083  0.0044  0.0084  0.0045

An extended forward-backward smoother was also tested, but in many cases it produced divergent estimates, so it was left out of the comparison. The second order EKF was also left out, because evaluating the Hessians would have taken too much work, considering that the estimates might have become even worse. From the error estimates, we can see that the VB based methods give almost identical performances, although in the article [50] the UKF was clearly superior to the EKF. The reason for this might be that numerical approximations were used there for calculating the Jacobians in the EKF, rather than calculating the derivatives in closed form, as was done in this demonstration. In Fig. 8, we have plotted the root mean square errors and variances in estimating x_1, x_3 and x_5 with EKF and VB-EKF over 100 Monte Carlo runs; Fig. 9 shows the performance of the CKF and the proposed algorithm. They show that the VB based method always gives better estimates for positions and velocities, but for x_5 the errors are practically the same after 45 seconds. Both methods are also pessimistic in estimating x_5, because the variances are bigger than the true errors. Figures for x_2 and x_4 are not shown, because they are very similar to those of x_1 and x_3. Likewise, with CKF and VB-CKF the resulting figures were practically identical and are therefore left out. The proposed algorithm and the MCMC method have almost the same performance as VB-CKF. Among these VB based algorithms, the performance is similar and there is no obvious improvement, which may be caused by the model and parameter settings. See Fig. 10.

[Fig. 8. RMSEs and standard deviations in estimating x_1, x_3 and x_5 using EKF and VB-EKF over 100 Monte Carlo runs.]
[Fig. 9. RMSEs and standard deviations in estimating x_1, x_3 and x_5 using CKF and the proposed method over 100 Monte Carlo runs.]
[Fig. 10. Filtering result with the proposed algorithm.]

4.3. Tracking in Multiple Model Systems: Bearings Only Tracking of a Maneuvering Target

112 113 114

In many practical scenarios, it is reasonable to assume that the system's model can change through time. For example, a fighter airplane that normally flies with stable flight dynamics might commence rapid maneuvers when approached by a hostile missile, or a radar can have a different SNR in some regions of space than in others, and so on. Such varying system characteristics are hard to describe with only one model, so the estimation should somehow take into account the possibility that the system's model might change. A common way of modeling a turning object is the coordinated turn model [39]: the state vector is augmented with a turn rate parameter ω, which is estimated along with the other system parameters, here the position and the velocity of the target. Thus, the joint system vector can be expressed as X_k = (x_k, y_k, ẋ_k, ẏ_k, ω_k)^T.


Fig. 10. Filtering result with the proposed algorithm.


The dynamic model for the coordinated turns is

$$X_{k+1} = \begin{pmatrix}
1 & 0 & \frac{\sin(\omega_k \Delta t)}{\omega_k} & \frac{\cos(\omega_k \Delta t)-1}{\omega_k} & 0 \\
0 & 1 & \frac{1-\cos(\omega_k \Delta t)}{\omega_k} & \frac{\sin(\omega_k \Delta t)}{\omega_k} & 0 \\
0 & 0 & \cos(\omega_k \Delta t) & -\sin(\omega_k \Delta t) & 0 \\
0 & 0 & \sin(\omega_k \Delta t) & \cos(\omega_k \Delta t) & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix} X_k + \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} v_k, \qquad (26)$$


Fig. 9. RMSEs and standard deviations in estimating x1, x3 and x5 using CKF and the proposed method over 100 Monte Carlo runs.

where $v_k \sim N(0, \sigma_\omega^2)$ is univariate white Gaussian process noise for the turn rate parameter. Despite its matrix form, this model is non-linear. The measurement model is the nonlinear bearings only measurement model of the earlier Bearings Only Tracking demonstration. The estimation problem is harder, as we must use a nonlinear version of the IMM filter update step in addition to the prediction step. We simulate the system for 200 time steps with step size $\Delta t = 0.1$.
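For illustration, the following Python sketch implements the transition function defined by model (26). It is a minimal sketch under the stated model; the small-ω branch (to avoid division by zero), the noise handling, and the default parameter values are implementation assumptions rather than details fixed in the text.

```python
import numpy as np

def ct_transition(X, dt=0.1, sigma_omega=0.1, rng=None):
    """One step of the coordinated turn model (26) for X = [x, y, vx, vy, w]."""
    x, y, vx, vy, w = X
    if abs(w) > 1e-9:
        s, c = np.sin(w * dt), np.cos(w * dt)
        x_new = x + (s / w) * vx + ((c - 1.0) / w) * vy
        y_new = y + ((1.0 - c) / w) * vx + (s / w) * vy
        vx_new = c * vx - s * vy
        vy_new = s * vx + c * vy
    else:  # straight-line limit as the turn rate tends to zero
        x_new, y_new = x + vx * dt, y + vy * dt
        vx_new, vy_new = vx, vy
    # turn rate random walk driven by white Gaussian noise v_k
    w_new = w + (sigma_omega * rng.normal() if rng is not None else 0.0)
    return np.array([x_new, y_new, vx_new, vy_new, w_new])
```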

The movement of the object is produced as follows:

• The object starts from the origin with velocity (ẋ, ẏ) = (1, 0).
• At 4 s, the object starts to turn left with rate ω = 1.
• At 9 s, the object stops turning and moves straight for 2 seconds with a constant total velocity of one.
• At 11 s, the object starts to turn right with rate ω = −1.
• At 16 s, the object stops turning and moves straight for 4 seconds with the same velocity.


The sensors producing the angle measurements are located at $(s_x^1, s_y^1) = (-0.5, -3.5)$, $(s_x^2, s_y^2) = (-0.5, 3.5)$, $(s_x^3, s_y^3) = (7, -3.5)$ and $(s_x^4, s_y^4) = (7, 3.5)$.
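A minimal Python sketch of this bearings only measurement model follows, assuming the four sensor positions above and the σ = 0.1 rad measurement noise level used in this demonstration; the function name and interface are illustrative.

```python
import numpy as np

# Sensor positions from the demonstration.
SENSORS = np.array([[-0.5, -3.5], [-0.5, 3.5], [7.0, -3.5], [7.0, 3.5]])

def bearings(state, sensors=SENSORS, sigma=0.1, rng=None):
    """Bearing (angle) from each sensor to the target at position (x, y),
    with optional additive Gaussian noise of standard deviation sigma rad."""
    x, y = state[0], state[1]
    theta = np.arctan2(y - sensors[:, 1], x - sensors[:, 0])
    if rng is not None:
        theta = theta + sigma * rng.normal(size=theta.shape)
    return theta
```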


The trajectory and the sensor positions are shown in Fig. 11. The resulting trajectory is plotted in Fig. 12. The standard deviation of the measurements is set to σ = 0.1 radians, which is relatively high. In the estimation, we use the following models:


1. Standard Wiener process velocity model with process noise variance q1 = 0.05, whose purpose is to model the relatively slow turns. This model is linear, so we can use the standard Kalman filter for estimation.


Fig. 11. Object’s trajectory and positions of the sensors in Bearings Only Tracking of a Maneuvering Target demonstration.


Fig. 12. Filtered estimates of object’s position using all the tested methods in Bearings Only Tracking of a Maneuvering Target demonstration.

2. A combination of the Wiener process velocity model and the coordinated turn model described above. The variance of the process noise for the velocity model is set to q2 = 0.01 and for the turning rate parameter in the turning model to qω = 0.15. The estimation is now done with both the EKF and UKF based IMM filters, as the turning model is non-linear; since the bearings only measurement model is also non-linear, the corresponding nonlinear IMM filter update step is used as well. In both cases the model transition probability matrix is set to
$$\Phi = \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix},$$
the prior model probabilities are $\mu_0 = [\,0.9 \;\; 0.1\,]$, and the prior turn rate of the turning model is set to ω = 0. (A minimal sketch of the corresponding model-probability recursion is given after this list.)
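The following Python sketch combines the two ingredients of this setup: the discretized Wiener process velocity model matrices and the IMM model-probability recursion driven by the transition matrix above. It is a hedged sketch of these standard constructions, not the authors' implementation; only the model-probability bookkeeping is shown, and the per-model mixing of state estimates is omitted.

```python
import numpy as np

def wiener_velocity_model(dt=0.1, q=0.05):
    """Discretized Wiener process (constant) velocity model for [x, y, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.array([[dt**3 / 3, 0, dt**2 / 2, 0],
                      [0, dt**3 / 3, 0, dt**2 / 2],
                      [dt**2 / 2, 0, dt, 0],
                      [0, dt**2 / 2, 0, dt]])
    return F, Q

PHI = np.array([[0.9, 0.1],
                [0.1, 0.9]])          # model transition probability matrix
mu0 = np.array([0.9, 0.1])            # prior model probabilities

def imm_model_probabilities(mu_prev, likelihoods, Phi=PHI):
    """One IMM model-probability update: propagate through the Markov chain
    (Phi[i, j] = P(model i -> model j)), reweight by each model's measurement
    likelihood, and renormalize."""
    mu_pred = Phi.T @ mu_prev
    mu_post = mu_pred * likelihoods
    return mu_post / mu_post.sum()
```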


The position of the object is estimated with the following methods:


• EKF and UKF [42]: using the same Wiener process velocity model.
• IMM-EKF [39] and VB-EKF: using the same model as the EKF.
• IMM-UKF and VB-UKF [43]: UKF based IMM filter using the same combination of the Wiener process velocity model and the coordinated turn model.
• MCMC method [48] and Algorithm 1: using 200 particles.

Fig. 13. Estimates for model 2's probability in Bearings Only Tracking of a Maneuvering Target demonstration.

Fig. 14. Estimates for the turn rate parameter ωk in Bearings Only Tracking of a Maneuvering Target demonstration.

A sample of the trajectory estimates is plotted in Fig. 12. The IMM estimates are clearly less accurate than those of the VB based methods. In Fig. 13, we have plotted the estimates of model 2's probability for IMM-EKF, IMM-UKF, VB-EKF and VB-UKF. The turn rate estimates of the VB methods, plotted in Fig. 14, are more accurate than those of the IMM based methods. In Table 3, we have listed the average mean square errors of the position estimates over 100 Monte Carlo runs. It can be observed that the estimates of EKF and UKF are identical in practice, which is to be expected from the Bearings Only Tracking demonstration. The difference between IMM-UKF and IMM-EKF has grown in favor of IMM-UKF, whereas the accuracy of IMM-EKF is now closer to those of EKF and UKF. On the other hand, the VB based estimates of VB-UKF and VB-EKF are still very close to one another, and are considerably better than the estimates of EKF and UKF. It should be noted that the performance of each tested method could be tuned by optimizing their parameters more carefully, so the performance differences could change radically.


Table 3
Average MSEs of estimating the position in the Bearings Only Tracking of a Maneuvering Target example over 100 Monte Carlo runs.

Method:  EKF      UKF     IMM-EKF  VB-EKF  IMM-UKF  VB-UKF  MCMC    Proposed method
MSE:     0.06064  0.0145  0.0609   0.0144  0.0544   0.0094  0.0041  0.0044


Still, it is clear that the VB filter actually works with nonlinear dynamic and measurement models as well, and should be considered a standard estimation method for multiple model systems. Moreover, one should prefer the VB based methods to the IMM based methods, as their performance is clearly better and their implementation is easier.

4.4. Speech signal tracking in Multiplicative Hidden Markov Model

HMMs have been widely used in language recognition, natural language processing, bioinformatics and other tracking fields. Variational inference techniques for HMMs have been studied for a long time [19,32]. In the reference methods, the time-invariant transition matrix and the hidden parameter are inferred by iterative forward and backward passes; in this paper, only the backward process is replaced by the time update step. We expand the traditional model to allow a time-variant transition matrix modeled as a Dirichlet stochastic process, which induces an analytically intractable marginal function.

Consider an HMM with the following conditions: i) a first-order Markov chain on the unobserved discrete variable $x_t$, with $c$ possible states; ii) a set of $c$ known class-conditional observation models; iii) for convenience, no noise is considered in system (1)–(2). For analytical convenience, $\theta_t$ is modeled as a random walk process. Let $C_c(i) = [\delta(i-1), \delta(i-2), \ldots, \delta(i-c)]^T$ denote a $c$-dimensional elementary basis vector, so that $x_t \in \{C_c(1), C_c(2), \ldots, C_c(c)\}$ enumerates the possible states of $x_t$. The transition probability $\Pr(x_t = C_c(i)\,|\,x_{t-1} = C_c(j)) = t_{i,j,t}$ denotes a transition from the $j$th to the $i$th state, where $0 < t_{i,j,t} < 1$, $i, j \in \{1, 2, \ldots, c\}$. These probabilities are aggregated into the transition probability matrix $\theta_t$, whose column sums are unity. The Bayesian filtering model is consistent with the assumptions above. The following dynamic system model, with unknown multiplicative noise $\theta_t$ conditioned on the multivariate hidden variable $x_t$, is based on the model examined in references [19,32]:

$$f(\theta_t|\theta_{t-1}) = \mathrm{Di}(k\theta_{t-1}), \qquad (27)$$
$$f(x_t|x_{t-1}, \theta_t) = \mathrm{Mu}(\theta_t x_{t-1}), \qquad (28)$$
$$f(z_t|x_t) = f_1(z_t)^{x_{1,t}} \times \cdots \times f_c(z_t)^{x_{c,t}}. \qquad (29)$$

The model is based on system (1)–(2) without noises. The scalar parameter $k$ controls the dependence between $\theta_t$ and $\theta_{t-1}$; $\mathrm{Di}$ denotes the Dirichlet distribution and $\mathrm{Mu}$ the multinomial distribution, while $f_i(z_t)$ denotes the $i$th known class-conditional PDF, $i = 1, 2, \ldots, c$. The unknown state $(\theta_t, x_t)$ cannot be estimated in exact solution form, which is computationally intractable; instead, we use the proposed method to settle this problem. First of all, we ensure the conditional independence of the factorization $f(x_t, \theta_t|D_t) \approx \tilde{f}(x_t|D_t)\tilde{f}(\theta_t|D_t)$. The logarithm of the evolution model (27) and (28) is

$$\ln f(x_t, \theta_t|x_{t-1}, \theta_{t-1}) \propto \mathrm{tr}\big[\ln(\theta_t)^T (k\theta_{t-1} - I_{c,c})\big] + x_t^T \ln(\theta_t)\, x_{t-1} - \sum_{i=1}^{c}\sum_{j=1}^{c} \ln\Gamma(k\, t_{i,j,t-1}),$$

where $I_{c,c}$ denotes a matrix of ones. Under the definition

$$\lambda_t \stackrel{\mathrm{def}}{=} [\ln f_1(z_t), \ln f_2(z_t), \ldots, \ln f_c(z_t)]^T,$$

the observation equation (29) now is

$$f(z_t|x_t) = \exp(\lambda_t^T x_t). \qquad (30)$$

Marginal functions are derived as

$$\tilde{f}(x_t|z_t) = \mathrm{Mu}(\alpha_t), \qquad \tilde{f}(\theta_t|z_t) = \mathrm{Di}(\varepsilon_t), \qquad (31)$$
$$\tilde{f}(x_{t-1}|z_t) = \mathrm{Mu}(\beta_t), \qquad \tilde{f}(\theta_{t-1}|z_t) \propto \exp(\Psi(\theta_{t-1})), \qquad (32)$$

with

$$\alpha_t = \exp\big(\lambda_t + \ln(\hat{\theta}_t)\,\hat{x}_{t-1}\big), \qquad \varepsilon_t = k\hat{\theta}_{t-1} + \hat{x}_t \hat{x}_{t-1}^T, \qquad \beta_t = \alpha_{t-1} \circ \exp\big(\ln(\hat{\theta}_t)^T \hat{x}_t\big),$$
$$\Psi(\theta_{t-1}) = \mathrm{tr}\big(k \ln(\hat{\theta}_t)^T(\theta_{t-1} - I_{c,c})\big) - \sum_{i=1}^{c}\sum_{j=1}^{c} \ln\Gamma(k\, t_{i,j,t-1}).$$

The necessary conditions are $\hat{\theta}_t \propto \alpha_t$, $\hat{\theta}_{t-1} \propto \beta_t$, $\ln(\hat{t}_{i,j,t-1}) = \phi(\varepsilon_{i,j,t}) - \phi(I_{c,1}^T \varepsilon_t I_{c,1})$. The elements of both moment parameters are probabilities, and so their sum in each case is unity. Here $\phi$ denotes the digamma function, $\varepsilon_{i,j,t}$ is an element of $\varepsilon_t$, and $\ln(\hat{t}_{i,j,t-1})$ is an element of the matrix $\ln\hat{\theta}_t$. Since $\Psi(\theta_{t-1})$ is a nonlinear function, no solvable form for $\tilde{f}(\theta_{t-1}|z_t)$ can be achieved, and so $\theta_{t-1}$ requires numerical integration. To avoid this, we use a fixed Dirichlet distribution, $\bar{f}(\theta_{t-1}|z_t) = \tilde{f}(\theta_{t-1}|z_t) = \mathrm{Di}_{\theta_{t-1}}(\varepsilon_{t-1})$. Furthermore, the evaluation of the digamma function is computationally expensive; a significant alternative is implemented by using the constraint $f(\theta_t|z_t) = \delta(\theta_t - \hat{\theta}_t)$.

In the simulation study, we set $c = 2$ with conditionally beta-distributed scalar observations, $f(z_t|x_t) = \beta(x_{1,t}+1, x_{2,t}+1)$. Then $z_t$ is a probability sequence of soft-bits, depending on a Markov chain with states $\{1, 0\}$. Exact inference for this model via Bayesian filtering is not feasible, since the required summation over the states leads to exponential growth. We settle this problem via the VB based method. In order to improve the performance of this VB based approximation, a simple particle filtering approach is also considered, with the importance function $\pi(\theta_t, x_t|D_t) = \mathrm{Di}_{T_t}(k\hat{\theta}_{t-1} + C)\,\mathrm{Mu}_{x_t}(\hat{\theta}_{t-1}\hat{x}_{t-1})$. The performance was assessed via the mean square error (MSE), the total square error of the state between the simulated and estimated state:

$$\mathrm{MSE} = \frac{1}{t}\sum_{\tau=1}^{t}\Big[\big\|x_\tau - \hat{x}_\tau\big\|^2 + \big\|\theta_\tau - \hat{\theta}_\tau\big\|^2\Big].$$

These were evaluated by using each of the following distributional approximations:

• Proposed method: Algorithm 1,
• EM: Expectation Maximization Algorithm,
• MPF: marginalized particle filter in reference [35],
• Bayesian UKF in [44],
• Constrained proposed method: Algorithm 2,
• Approach in [32].

The threshold state was derived from the soft-bits through the thresholding method, and $x_t$ was inferred by thresholding:

$$\hat{x}_t = \mathrm{round}(y_t) = \begin{cases} 1 & \text{if } y_t > 0.5, \\ 0 & \text{if } y_t < 0.5. \end{cases}$$
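To make the recursion concrete, the following Python sketch implements one VB filtering step of the updates above for c = 2, using the fixed-Dirichlet simplification from the text and the digamma function for the expected log transition probabilities. The beta soft-bit likelihood, the initialization, and the fixed number of inner iterations are illustrative assumptions, not values fixed by the paper.

```python
import numpy as np
from scipy.special import digamma

def vb_hmm_step(z_t, x_prev, eps_prev, k=10.0, n_iter=3):
    """One VB filtering step for the Dirichlet/multinomial HMM (c = 2).

    z_t: soft-bit observation in (0, 1); x_prev: previous state estimate
    (length-2 probability vector); eps_prev: Dirichlet parameters (2 x 2).
    """
    # log class-conditional likelihoods lambda_t for the beta soft-bit model
    lam = np.array([np.log(z_t), np.log(1.0 - z_t)])
    theta_prev = eps_prev / eps_prev.sum(axis=0, keepdims=True)  # mean of Di
    eps = k * theta_prev + np.outer(x_prev, x_prev)              # initial eps_t
    for _ in range(n_iter):                                      # coupled VB updates
        # E[ln theta_t] under Di(eps_t): digamma minus column-total digamma
        ln_theta = digamma(eps) - digamma(eps.sum(axis=0, keepdims=True))
        alpha = np.exp(lam + ln_theta @ x_prev)                  # alpha_t (unnormalized)
        x_hat = alpha / alpha.sum()                              # state factor x_hat_t
        eps = k * theta_prev + np.outer(x_hat, x_prev)           # eps_t = k*theta + x_t x_{t-1}^T
    return x_hat, eps

# Hypothetical usage on one soft-bit, followed by hard thresholding:
x_hat, eps = vb_hmm_step(0.8, np.array([0.5, 0.5]), np.ones((2, 2)))
bit = 1 if x_hat[0] > 0.5 else 0
```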


Fig. 15. Monte Carlo study of the MSE performance of the related methods for state and uncertain parameter estimation in the HMM model; the execution time is also plotted. (Left: rapid variation of the uncertain parameter θt (k = 10); right: slow variation of the uncertain parameter θt (k = 100).)


Fig. 16. Comparative MSE performance of the reference methods for state and uncertain parameter estimation in the model for each setting in two different Monte Carlo simulations (i.e. number of particles or iterations). (Left: from iteration 2 to iteration 50; right: from iteration 50 to iteration 100.)

The Monte Carlo study was executed with the results shown in the figures. Fig. 15 displays the results of the comparative variation of the uncertain parameter with the state in a 100-run Monte Carlo study, from two-step to forty-step iterations. The execution time of each inference algorithm in MATLAB is also plotted in Fig. 15. The MPF, EM, Bayesian UKF, VB based approaches and the proposed approximations can be tuned, and the computational cost of all the reference approaches increases linearly with the adjustable parameters. The other algorithms cannot be tuned, and their execution times are relatively long. The performance of the proposed algorithms does not improve when the number of iterations is greater than three; however, the performance of the MPF and Bayesian UKF algorithms improves unsteadily with an increasing number of particles or iterations. This means that the proposed approximations are superior in computational cost and stability. Therefore, if the execution time is constrained to be less than 5 s, the proposed algorithm outperforms the reference algorithms in these simulations. This emphasizes once again the fact that the proposed algorithm for systems with an uncertain parameter is more attractive due to its low computational cost, strong stability, and high accuracy.

All of the inference methods are compared to the proposed algorithms with respect to the number of particles or iterations in two different Monte Carlo experiments, and the results are shown in Fig. 16. As shown, with an increasing number of iterations the performance of the proposed VB-based approach improves significantly, while that of the other methods improves relatively slowly. The clear performance improvement of the proposed method in all cases is due to the propagation of second-order moments. The proposed method is also faster than the related algorithms under the same number of iterations. Hence, the proposed VB-based filter is recommended when only a relatively small number of iterations can be computed. Meanwhile, the reference filters show large fluctuations, whereas the proposed algorithm has a high degree of stability. Therefore, the proposed algorithm is superior to these reference algorithms.


5. Conclusion

In this paper, we apply variational Bayesian inference to settle the tracking problem for nonlinear systems with unknown measurement noise. Nonlinear stochastic approximations such as the EKF, UKF, CKF, GHKF, IMM, PF and EM are regarded as the popular standard filters in the context of nonlinear Bayesian filtering for tracking; the results, however, have shown that the nonlinear VB based estimation algorithms are more attractive than these methods in terms of computational cost and performance. In the VB approaches, the unknown noise is introduced as a variable to be estimated, while the popular approaches are assumed to know the noise initialization. Even under this disadvantageous condition, the VB based methods demonstrate superior performance in nonlinear tracking problems. The main feature of VB inference is the iterative solution of the implicit equations in order to obtain an optimal approximation. Furthermore, iterations that involve higher order moments yield even better performance in the tracking model.

Conflict of interest statement

None declared.

Acknowledgements

We would like to thank the editors and the reviewers for their constructive comments and valuable suggestions, which helped improve this paper. This work was jointly supported by the National Natural Science Foundation (61673265); the 973 project (6133190302); the Aerospace Science and Technology Innovation Fund and the Aeronautical Science Foundation of China (20140157001); and the 2015 Industry-university-research cooperation project of AVIC.

Appendix A. Proof of Lemma 2

Proof. Let $\tau_N, \tau \in P(R^{n_x})$ be such that $\lim_{N\to\infty}\tau_N = \tau$. For every $\varphi_i \in A$, using the independence of the $V_j$,

$$E\big[((c^{N,\omega}(\tau_N), \varphi_i) - (\tau_N, \varphi_i))^4\big] = \frac{1}{N^4}\, E\Big[\Big(\sum_{j=1}^{N}\big(\varphi_i(V_j) - (\tau_N, \varphi_i)\big)\Big)^4\Big]$$
$$= \frac{1}{N^4}\Big[\sum_{j=1}^{N} E\big[(\varphi_i(V_j) - (\tau_N, \varphi_i))^4\big] + 6\sum_{j_1, j_2 = 1,\; j_1 \ne j_2}^{N} E\big[(\varphi_i(V_{j_1}) - (\tau_N, \varphi_i))^2 (\varphi_i(V_{j_2}) - (\tau_N, \varphi_i))^2\big]\Big]$$
$$\le \frac{1}{N^4}\big[2^4\|\varphi_i\|^4 N + 2^4\|\varphi_i\|^4 \cdot 3N(N-1)\big] \le \frac{48\|\varphi_i\|^4}{N^2}.$$

It follows that

$$E\Big[\sum_{N=1}^{\infty}\big((c^{N,\omega}(\tau_N), \varphi_i) - (\tau_N, \varphi_i)\big)^4\Big] \le 48\|\varphi_i\|^4 \sum_{N=1}^{\infty}\frac{1}{N^2} < \infty,$$

and hence

$$\sum_{N=1}^{\infty}\big((c^{N,\omega}(\tau_N), \varphi_i) - (\tau_N, \varphi_i)\big)^4 < \infty$$

for almost all $\omega \in \Omega$, which implies

$$\lim_{N\to\infty}\,(c^{N,\omega}(\tau_N), \varphi_i) - (\tau_N, \varphi_i) = 0$$

for almost all $\omega \in \Omega$. Thus, there exists a subset $\bar{\Omega} \subset \Omega$ of full measure, $P(\bar{\Omega}) = 1$, such that $\forall \omega \in \bar{\Omega}$, $\forall i \in I$,

$$\lim_{N\to\infty}\,(c^{N,\omega}(\tau_N), \varphi_i) - (\tau_N, \varphi_i) = 0,$$

which implies, for all $\omega \in \bar{\Omega}$, that

$$\lim_{N\to\infty} d\big(c^{N,\omega}(\tau_N), \tau_N\big) = 0,$$

and hence $c^{N,\omega}$ satisfies

$$\lim_{N\to\infty} e_N = e \;\Rightarrow\; \lim_{N\to\infty} c(e_N) = e \quad \text{for all } \omega \in \bar{\Omega}.$$

All results should be regarded as being true with probability 1, for almost all $\omega \in \Omega$. □
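The law-of-large-numbers behaviour exploited in this proof is easy to check numerically. The following Python sketch is an illustrative check under assumed choices of τ (standard normal) and of the bounded test function, and is not part of the proof: it draws N i.i.d. samples V_j from τ and shows that the sampled measure applied to the test function approaches (τ, φ).

```python
import numpy as np

rng = np.random.default_rng(42)
phi = np.tanh                                   # a bounded continuous test function
ref = np.mean(phi(rng.normal(size=10_000_000)))  # reference value of (tau, phi)

for n in (10**2, 10**4, 10**6):
    v = rng.normal(size=n)                      # N i.i.d. samples V_j ~ tau
    emp = np.mean(phi(v))                       # (c^N(tau), phi): empirical measure of phi
    print(n, abs(emp - ref))                    # error decays roughly as 1/sqrt(N)
```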

Appendix B. Proof of Lemma 3

Proof. One has

$$\big|(\pi^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi)\big| \le \big|(\pi^N_{t|t-1}, \varphi) - (\pi^N_{t-1|t-1}, f\varphi)\big| + \big|(\pi^N_{t-1|t-1}, f\varphi) - (\pi_{t-1|t-1}, f\varphi)\big|.$$

Let $\mathcal{F}_{t-1}$ be the $\sigma$-field generated by $\{x^i_{t-1}\}_{i=1}^{N}$. Then

$$E\big[(\pi^N_{t|t-1}, \varphi)\,\big|\,\mathcal{F}_{t-1}\big] = (\pi^N_{t-1|t-1}, f\varphi),$$

and

$$E\big[\big((\pi^N_{t|t-1}, \varphi) - E[(\pi^N_{t|t-1}, \varphi)|\mathcal{F}_{t-1}]\big)^2\,\big|\,\mathcal{F}_{t-1}\big] = E\big[\big((\pi^N_{t|t-1}, \varphi) - (\pi^N_{t-1|t-1}, f\varphi)\big)^2\,\big|\,\mathcal{F}_{t-1}\big]$$
$$= \frac{1}{N}\Big((\pi^N_{t-1|t-1}, f\varphi^2) - (\pi^N_{t-1|t-1}, f\varphi)^2\Big) \le \frac{\|\varphi\|^2}{N}.$$

Hence, using Minkowski's inequality, one obtains

$$E\big[\{(\pi^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi)\}^2\big]^{1/2} \le E\big[\{(\pi^N_{t|t-1}, \varphi) - (\pi^N_{t-1|t-1}, f\varphi)\}^2\big]^{1/2} + E\big[\{(\pi^N_{t-1|t-1}, f\varphi) - (\pi_{t-1|t-1}, f\varphi)\}^2\big]^{1/2}$$
$$\le \frac{\|\varphi\|\sqrt{c_{t|t-1}}}{\sqrt{N}},$$

since $\|f\varphi\| \le \|\varphi\|$, where $c_{t|t-1} = \big(1 + \sqrt{c_{t-1|t-1}}\big)^2$. □

Appendix C. Proof of Lemma 4

Proof. One has

$$(\tilde{\pi}^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi) = \bigg[\frac{(\pi^N_{t|t-1}, g\varphi)}{(\pi^N_{t|t-1}, g)} - \frac{(\pi^N_{t|t-1}, g\varphi)}{(\pi_{t|t-1}, g)}\bigg] + \bigg[\frac{(\pi^N_{t|t-1}, g\varphi)}{(\pi_{t|t-1}, g)} - \frac{(\pi_{t|t-1}, g\varphi)}{(\pi_{t|t-1}, g)}\bigg],$$

where

$$\bigg|\frac{(\pi^N_{t|t-1}, g\varphi)}{(\pi^N_{t|t-1}, g)} - \frac{(\pi^N_{t|t-1}, g\varphi)}{(\pi_{t|t-1}, g)}\bigg| = \frac{(\pi^N_{t|t-1}, g\varphi)\,\big|(\pi_{t|t-1}, g) - (\pi^N_{t|t-1}, g)\big|}{(\pi^N_{t|t-1}, g)(\pi_{t|t-1}, g)} \le \frac{\|\varphi\|}{(\pi_{t|t-1}, g)}\,\big|(\pi_{t|t-1}, g) - (\pi^N_{t|t-1}, g)\big|.$$

Thus, using Minkowski's inequality,

$$E\big[\{(\tilde{\pi}^N_{t|t-1}, \varphi) - (\pi_{t|t-1}, \varphi)\}^2\big]^{1/2} \le \frac{\|\varphi\|}{(\pi_{t|t-1}, g)}\, E\big[\{(\pi_{t|t-1}, g) - (\pi^N_{t|t-1}, g)\}^2\big]^{1/2} + \frac{1}{(\pi_{t|t-1}, g)}\, E\big[\{(\pi^N_{t|t-1}, g\varphi) - (\pi_{t|t-1}, g\varphi)\}^2\big]^{1/2}$$
$$\le \frac{2\sqrt{c_{t|t-1}}\,\|g\|\,\|\varphi\|}{(\pi_{t|t-1}, g)\sqrt{N}},$$

as $\|g\| < \infty$ by assumption. □

Appendix D. Proof of Lemma 5

Proof. One has

$$(\pi^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi) = \big[(\pi^N_{t|t}, \varphi) - (\tilde{\pi}^N_{t|t}, \varphi)\big] + \big[(\tilde{\pi}^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi)\big].$$

Then, using Minkowski's inequality,

$$E\big[\{(\pi^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi)\}^2\big]^{1/2} \le E\big[\{(\pi^N_{t|t}, \varphi) - (\tilde{\pi}^N_{t|t}, \varphi)\}^2\big]^{1/2} + E\big[\{(\tilde{\pi}^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi)\}^2\big]^{1/2}.$$

Let $\mathcal{G}_t$ be the $\sigma$-field generated by $\{\tilde{x}^i_t\}_{i=1}^{N}$. The multinomial procedure is such that

$$E\big[(\pi^N_{t|t}, \varphi)\,\big|\,\mathcal{G}_t\big] = (\tilde{\pi}^N_{t|t}, \varphi) \quad \text{and} \quad E\big[\{(\pi^N_{t|t}, \varphi) - (\tilde{\pi}^N_{t|t}, \varphi)\}^2\,\big|\,\mathcal{G}_t\big] \le \frac{C}{N}\,\|\varphi\|^2,$$

giving

$$E\big[\{(\pi^N_{t|t}, \varphi) - (\pi_{t|t}, \varphi)\}^2\big]^{1/2} \le \frac{\sqrt{C} + \sqrt{\tilde{c}_{t|t}}}{\sqrt{N}}\,\|\varphi\|. \quad \Box$$
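The multinomial procedure invoked in this proof is ordinary multinomial resampling of the weighted particle set; the conditional-unbiasedness property above is what makes the first term a conditional variance. A minimal Python sketch, as an illustrative implementation rather than the paper's code:

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw N indices with probabilities given by the normalized weights.

    particles: array of shape (N, ...); weights: nonnegative array of length N.
    The resampled empirical measure is conditionally unbiased: its conditional
    expectation equals the weighted measure it was drawn from.
    """
    w = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]
```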

Appendix E. Proof of Theorem 4

According to Assumption 1, we can get that $p(x_t, v_t|y_{1:n}) \approx q(x_t|y_{1:n})\,q(v_t|y_{1:n})$, which means that the variational factors of the states are equivalent to the original full state. This is based on the Kullback–Leibler divergence [51], which can converge towards zero. The particles passed through the variational factors are therefore a special case of the total state, which cannot change the convergence, so we discuss the convergence from the perspective of the full state rather than of the separate variational factors. Here, we only prove the convergence of the prediction and update steps, as in [52]; that is, the convergence of

$$E\big[\big|(\pi^N_{t|t-1}, \phi) - (\pi_{t|t-1}, \phi)\big|^p\big] \quad \text{and} \quad E\big[(\pi^N_{t|t-1}, |\phi|^p)\big].$$

Proof. One has

$$(\pi^N_{t|t-1}, \phi) - (\pi_{t|t-1}, \phi) = \frac{(\hat{\pi}^N_{t|t-1}, \phi)}{(\hat{\pi}^N_{t|t-1}, 1)} - \frac{(\hat{\pi}_{t|t-1}, \phi)}{(\hat{\pi}_{t|t-1}, 1)}, \qquad (33)$$

where $\hat{\pi}^N_{t|t-1} = (\pi^N_{t-1|t-1}, \omega q_I)$ and $\hat{\pi}_{t|t-1} = (\pi_{t-1|t-1}, \omega q_I)$. This can be solved by using the bounds for the following terms:

$$E\big[\big|(\hat{\pi}^N_{t|t-1}, \phi) - (\hat{\pi}_{t|t-1}, \phi)\big|^p\big] \ \text{and} \ E\big[(\hat{\pi}^N_{t|t-1}, |\phi|^p)\big], \qquad E\big[\big|(\hat{\pi}^N_{t|t-1}, 1) - (\hat{\pi}_{t|t-1}, 1)\big|^p\big] \ \text{and} \ E\big[(\hat{\pi}^N_{t|t-1}, 1)\big].$$

We only study the boundedness of the former two terms; the bounds for the last two terms can be attained by setting $\phi = 1$ in the former. We denote by $\mathcal{F}_{t-1}$ the $\sigma$-algebra generated by $\{x^i_{t-1}\}$ and write

$$(\hat{\pi}^N_{t|t-1}, \phi) - (\hat{\pi}_{t|t-1}, \phi) = \Delta_1 + \Delta_2 + \Delta_3,$$

where

$$\Delta_1 = (\hat{\pi}^N_{t|t-1}, \phi) - \frac{1}{N}\sum_{i=1}^{N} E\big[\phi(\tilde{x}^i_t)\,\omega(\tilde{x}^i_t, x^i_{t-1})\,\big|\,\mathcal{F}_{t-1}\big],$$
$$\Delta_2 = \frac{1}{N}\sum_{i=1}^{N} E\big[\phi(\tilde{x}^i_t)\,\omega(\tilde{x}^i_t, x^i_{t-1})\,\big|\,\mathcal{F}_{t-1}\big] - \big(\pi^{N,\omega_t}_{t-1|t-1}, f\phi g\big),$$
$$\Delta_3 = \big(\pi^{N,\omega_t}_{t-1|t-1}, f\phi g\big) - (\hat{\pi}_{t|t-1}, \phi).$$

Let $\bar{x}^i_{t-1} \sim (\pi^{N,w_t}_{t-1|t-1}, q_I)$; then $E[\phi(\bar{x}^i_t)\,w(\bar{x}^i_t, x^i_{t-1})|\mathcal{F}_{t-1}] = (\pi^{N,w_t}_{t-1|t-1}, f\phi g)$.

Then we compute the bounds for $E[|\Delta_1|^p]$, $E[|\Delta_2|^p]$ and $E[|\Delta_3|^p]$. For $E[|\Delta_1|^p]$, we use the Lemma and the equations above to get

$$E\big[|\Delta_1|^p\,\big|\,\mathcal{F}_{t-1}\big] = \frac{1}{N^p}\, E\Big[\Big|\sum_{i=1}^{N}\Big(\phi(\tilde{x}^i_t)\,\omega(\tilde{x}^i_t, x^i_{t-1}) - E\big[\phi(\tilde{x}^i_t)\,\omega(\tilde{x}^i_t, x^i_{t-1})\,|\,\mathcal{F}_{t-1}\big]\Big)\Big|^p\,\Big|\,\mathcal{F}_{t-1}\Big]$$
$$\le \frac{2^p C(p)}{N^p}\sum_{i=1}^{N} E\big[\big|\phi(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\big|^p\,\big|\,\mathcal{F}_{t-1}\big] + \frac{2^p C(p)}{N^p}\Big(\sum_{i=1}^{N} E\big[\big|\phi(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\big|^r\,\big|\,\mathcal{F}_{t-1}\big]\Big)^{p/r}.$$

From Assumption 5, we can deduce the following result.

Lemma 5. If the $r$th moment $E[\omega^r(\bar{x}^i_t, x^i_{t-1})|\mathcal{F}_{t-1}]$ is bounded, then $E[\omega^r(\bar{x}^i_t, x^i_{t-1})|x_{t-1}]$ is bounded too. If the $r$th moment is finite, then the moments of the weights of order less than $r$ are finite too.

Proof.

$$E\big[\omega^r(\bar{x}^i_t, x_{t-1})\,\big|\,\mathcal{F}_{t-1}\big] \le \sup_{x_{0:t}} E\big[\omega^r(x^i_t, x_{t-1})\,\big|\,x_{t-1}\big] \le \sup c^r_\omega \le C_\omega.$$

The results for the weights of order less than $r$ are easily obtained from Jensen's and Hölder's inequalities. □

With the Cauchy–Schwarz inequality and the Lemma, we get

$$E\big[\big|\phi(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\big|^p\,\big|\,\mathcal{F}_{t-1}\big] \le \sqrt{E[\phi^{2p}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})|\mathcal{F}_{t-1}]}\,\sqrt{E[\omega^{2p-1}(\bar{x}^i_t, x^i_{t-1})|\mathcal{F}_{t-1}]} \le C_\omega^{(2p-1)/2}\sqrt{E[\phi^{2p}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})|\mathcal{F}_{t-1}]},$$

and similarly

$$E\big[\big|\phi(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\big|^r\,\big|\,\mathcal{F}_{t-1}\big] \le C_\omega^{(2r-1)/2}\sqrt{E[\phi^{2r}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})|\mathcal{F}_{t-1}]}.$$

But

$$\sum_{i=1}^{N} E\big[\phi^{2p}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\,\big|\,\mathcal{F}_{t-1}\big] \le N + \sum_{i=1}^{N} E\big[\phi^{2p}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\,\big|\,\mathcal{F}_{t-1}\big],$$
$$\sum_{i=1}^{N} E\big[\phi^{2r}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\,\big|\,\mathcal{F}_{t-1}\big] \le (1+N)\sum_{i=1}^{N} E\big[\phi^{2p}(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x^i_{t-1})\,\big|\,\mathcal{F}_{t-1}\big].$$

Then, without loss of generality setting $2r = p$ and collecting the constants (which depend on $C_\omega$, $C(p)$, $\varepsilon$, $\|f\|$, $\|g\|$ and $M_{t-1|t-1}$), one obtains

$$E\big[|\Delta_1|^p\,\big|\,\mathcal{F}_{t-1}\big] \le C'_1\,\frac{\phi^p_{t-1,p}}{N^{p-p/r}}.$$

For $\Delta_2$, splitting the difference of the two conditional expectations with the rejection parameter $\varepsilon$ gives

$$|\Delta_2|^p = \Big|\frac{1}{N}\sum_{i=1}^{N} E\big[\phi(\tilde{x}^i_t)\,\omega(\tilde{x}^i_t, x_{t-1})|\mathcal{F}_{t-1}\big] - \frac{1}{N}\sum_{i=1}^{N} E\big[\phi(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x_{t-1})|\mathcal{F}_{t-1}\big]\Big|^p \le \frac{\varepsilon^{p-1}}{(1-\varepsilon)^p}\cdot E\Big[\frac{1}{N}\sum_{i=1}^{N}\big|\phi(\bar{x}^i_t)\,\omega(\bar{x}^i_t, x_{t-1})\big|^p\,\Big|\,\mathcal{F}_{t-1}\Big],$$

and then, using Jensen's inequality,

$$E\big[|\Delta_2|^p\,\big|\,\mathcal{F}_{t-1}\big] \le C'_2\,\frac{\phi^p_{t-1,p}}{N^{p-p/r}}.$$

For $\Delta_3$,

$$E\big[|\Delta_3|^p\,\big|\,\mathcal{F}_{t-1}\big] \le \|f\|^p\|g\|^p\, E\big[\big|(\pi^N_{t-1|t-1}, \phi) - (\pi_{t-1|t-1}, \phi)\big|^p\big] \le \tilde{C}_{t-1|t-1}\,\frac{\|f\|^p\|g\|^p\,\phi^p_{t-1,p}}{N^{p-p/r}} \le C'_3\,\frac{\phi^p_{t-1,p}}{N^{p-p/r}}.$$

Then, using Minkowski's inequality, we get

$$E\big[\big|(\pi^N_{t|t-1}, \phi) - (\pi_{t|t-1}, \phi)\big|^p\big]^{1/p} \le E\big[|\Delta_1|^p|\mathcal{F}_{t-1}\big]^{1/p} + E\big[|\Delta_2|^p|\mathcal{F}_{t-1}\big]^{1/p} + E\big[|\Delta_3|^p|\mathcal{F}_{t-1}\big]^{1/p} \le \big(C'^{1/p}_1 + C'^{1/p}_2 + C'^{1/p}_3\big)\frac{\phi_{t-1,p}}{N^{1-1/r}} = C^{1/p}_{t|t-1}\,\frac{\phi_{t-1,p}}{N^{1-1/r}}.$$

That is,

$$E\big[\big|(\pi^N_{t|t-1}, \phi) - (\pi_{t|t-1}, \phi)\big|^p\big] \le C_{t|t-1}\,\frac{\phi^p_{t-1,p}}{N^{p-p/r}}. \qquad (34)$$

The bound for $E[(\pi^N_{t|t-1}, |\phi|^p)]$ can be derived using the same method as above, which leads to

$$E\big[(\pi^N_{t|t-1}, |\phi|^p)\big] \le M_{t|t-1}\,\phi^p_{t-1,p}.$$

In the update step, we analyze $E[|(\hat{\pi}^N_{t|t}, \phi) - (\pi_{t|t}, \phi)|^p]$ and $E[(\hat{\pi}^N_{t|t}, |\phi|^p)]$ along the lines of the prediction step. First, let us introduce the following separation:

$$(\hat{\pi}^N_{t|t}, \phi) - (\pi_{t|t}, \phi) = \frac{(\hat{\pi}^N_{t|t-1}, g\phi)}{(\hat{\pi}^N_{t|t-1}, g)} - \frac{(\pi_{t|t-1}, g\phi)}{(\pi_{t|t-1}, g)} = \bar{\Delta}_1 + \bar{\Delta}_2,$$

where

$$\bar{\Delta}_1 = \frac{(\hat{\pi}^N_{t|t-1}, g\phi)}{(\hat{\pi}^N_{t|t-1}, g)} - \frac{(\hat{\pi}^N_{t|t-1}, g\phi)}{(\pi_{t|t-1}, g)}, \qquad \bar{\Delta}_2 = \frac{(\hat{\pi}^N_{t|t-1}, g\phi)}{(\pi_{t|t-1}, g)} - \frac{(\pi_{t|t-1}, g\phi)}{(\pi_{t|t-1}, g)}.$$

By condition, we have

$$|\bar{\Delta}_1| = \frac{(\hat{\pi}^N_{t|t-1}, g\phi)}{(\hat{\pi}^N_{t|t-1}, g)(\pi_{t|t-1}, g)}\,\big|(\pi_{t|t-1}, g) - (\hat{\pi}^N_{t|t-1}, g)\big| \le \frac{\|g\phi\|}{\gamma_t\,(\pi_{t|t-1}, g)}\,\big|(\pi_{t|t-1}, g) - (\hat{\pi}^N_{t|t-1}, g)\big|.$$

Thus, by Minkowski's inequality and equation (34),

$$E\big[\big|(\hat{\pi}^N_{t|t}, \phi) - (\pi_{t|t}, \phi)\big|^p\big]^{1/p} \le E\big[|\bar{\Delta}_1|^p\big]^{1/p} + E\big[|\bar{\Delta}_2|^p\big]^{1/p} \le \frac{(\|g\phi\| + \gamma_t)\,\bar{C}^{1/p}_{t|t}}{\gamma_t\,(\pi_{t|t-1}, g)}\cdot\frac{\phi_{t-1,p}}{N^{1-1/r}} \le \bar{C}^{1/p}_{t|t}\,\frac{\phi_{t-1,p}}{N^{1-1/r}},$$

which implies

$$E\big[\big|(\hat{\pi}^N_{t|t}, \phi) - (\hat{\pi}_{t|t}, \phi)\big|^p\big] \le \bar{C}_{t|t}\,\frac{\phi^p_{t-1,p}}{N^{p-p/r}}. \qquad (35)$$

Using a separation similar to the one above together with the prediction step results,

$$E\big[\big|(\hat{\pi}^N_{t|t}, |\phi|^p) - (\pi_{t|t}, |\phi|^p)\big|\big] \le E\bigg[\Big|(\hat{\pi}^N_{t|t}, |\phi|^p) - \frac{(\hat{\pi}^N_{t|t-1}, g|\phi|^p)}{(\pi_{t|t-1}, g)}\Big|\bigg] + E\bigg[\Big|\frac{(\hat{\pi}^N_{t|t-1}, g|\phi|^p)}{(\pi_{t|t-1}, g)} - (\pi_{t|t}, |\phi|^p)\Big|\bigg] \le \frac{\bar{M}_{t|t-1}\,\|g\|\,(\|g\phi^p\| + \gamma_t)}{\gamma_t\,(\pi_{t|t-1}, g)}\cdot\phi^p_{t-1,p}.$$

Now, observing that $\phi_{t,p}$ is increasing with respect to $t$, it results in

$$E\big[(\hat{\pi}^N_{t|t}, |\phi|^p)\big] \le \frac{\bar{M}_{t|t-1}\,\|g\|\,(\|g\phi^p\| + \gamma_t)}{\gamma_t\,(\pi_{t|t-1}, g)}\,\phi^p_{t-1,p} + (\pi_{t|t}, |\phi|^p) \le \bigg(\frac{\bar{M}_{t|t-1}\,\|g\|\,(\|g\phi^p\| + \gamma_t)}{\gamma_t\,(\pi_{t|t-1}, g)} + 1\bigg)\phi^p_{t,p} \le \bar{M}_{t|t}\,\phi^p_{t,p}.$$

The initialization, update and resampling steps are treated similarly, which completes the proof. □

Appendix F. Main math symbols employed in the paper

x                                  state
z                                  measurement
w                                  system noise
v                                  measurement noise
f                                  state transition function
g                                  observational function
p, π                               probability density function
q                                  a fixed probability density function
ρ, h                               probability density function
θ                                  joint state
ω                                  weight coefficient
α_k, β_k                           algorithm parameter
m, μ                               mean
P_k, Q, T, C_k                     variance or covariance
(Ω, F, P)                          probability space
B(R^n)                             the Borel σ-algebra on R^n
υ                                  Markov transition kernel
λ                                  a measure
φ, ω(·,·), ϕ                       a function
a_t, b_t, k_t, k_{1:t}             sequences of continuous functions
c_N                                the map that determines a random sample of size N of the measure
C_b(R^{n_x})                       the set of all continuous bounded functions on R^{n_x}
A = {ϕ_i}_{i>0} ⊂ C_b(R^{n_x})     a countable subset of continuous bounded functions
d(μ, ν)                            the weak topology
C, c_f, c_g                        constant

References

[1] X. Yu, J. Li, J. Xu, Robust adaptive algorithm for nonlinear systems with unknown measurement noise and uncertain parameters by variational Bayesian inference, Int. J. Robust Nonlinear Control 28 (10) (2018) 3475–3550.
[2] X. Yu, J. Li, J. Xu, Z. Zhang, Optimal state estimation for equality-constrained systems with complex noise, Optim. Control Appl. Methods 39 (5) (2018) 1684–1701.
[3] S. Särkkä, A. Nummenmaa, Recursive noise adaptive Kalman filtering by variational Bayesian approximations, IEEE Trans. Autom. Control 54 (3) (2009) 596–600.
[4] J. Zhao, L. Mili, A framework for robust hybrid state estimation with unknown measurement noise statistics, IEEE Trans. Ind. Inform. 99 (2017) 1.
[5] B. Ristic, X. Wang, S. Arulampalam, Target motion analysis with unknown measurement noise variance, in: International Conference on Information Fusion, 2017, pp. 1–8.
[6] A. Dey, S. Sadhu, T.K. Ghoshal, Adaptive Gauss–Hermite filter for non-linear systems with unknown measurement noise covariance, IET Sci. Meas. Technol. 9 (8) (2015) 1007–1015.
[7] T. Ardeshiri, E. Özkan, U. Orguner, F. Gustafsson, Approximate Bayesian smoothing with unknown process and measurement noise covariances, IEEE Signal Process. Lett. 22 (12) (2015) 2450–2454.
[8] D. Xu, C. Shen, F. Shen, A robust particle filtering algorithm with non-Gaussian measurement noise using Student-t distribution, IEEE Signal Process. Lett. 21 (1) (2014) 30–34, https://doi.org/10.1109/LSP.2013.2289975.
[9] H. Zhu, H. Leung, Z. He, A variational Bayesian approach to robust sensor fusion based on Student-t distribution, Elsevier Science Inc., 2013.
[10] P. Dong, Z. Jing, H. Leung, K. Shen, Variational Bayesian adaptive cubature information filter based on Wishart distribution, IEEE Trans. Autom. Control 62 (11) (2017) 6051–6057.
[11] H. Zhu, H. Leung, Z. He, State estimation in unknown non-Gaussian measurement noise using variational Bayesian technique, IEEE Trans. Aerosp. Electron. Syst. 49 (4) (2013) 2601–2614.
[12] S. Sarkka, A. Nummenmaa, Recursive noise adaptive Kalman filtering by variational Bayesian approximations, IEEE Trans. Autom. Control 54 (3) (2009) 596–600.
[13] I.S. Mbalawata, S. Särkkä, M. Vihola, H. Haario, Adaptive Metropolis algorithm using variational Bayesian adaptive Kalman filter, Comput. Stat. Data Anal. 83 (2015) 101–115.
[14] M.D. Vrettas, C. Dan, M. Opper, Estimating parameters in stochastic systems: a variational Bayesian approach, Phys. D, Nonlinear Phenom. 240 (23) (2011) 1877–1900.
[15] F.E. Daum, New exact nonlinear filters, in: J.C. Spall (Ed.), Bayesian Analysis of Time Series and Dynamic Models, Marcel Dekker, New York, 1988.
[16] B. Ait-El-Fquih, I. Hoteit, A variational Bayesian multiple particle filtering scheme for large-dimensional systems, IEEE Trans. Signal Process. 64 (20) (2016) 5409–5422.
[17] M.A. Sato, Online model selection based on the variational Bayes, Neural Comput. 13 (7) (2001) 1649–1681.
[18] P. Kozierski, M. Lis, D. Horla, Robust particle filter – anti-zero bias modification, Int. J. Robust Nonlinear Control 26 (16) (2016) 3645–3661.
[19] S. Ji, B. Krishnapuram, L. Carin, Variational Bayes for continuous hidden Markov models and its application to active learning, IEEE Trans. Pattern Anal. Mach. Intell. 28 (4) (2006) 522–532.
[20] Z. Ghahramani, M.J. Beal, Propagation algorithms for variational Bayesian learning, Adv. Neural Inf. Process. Syst. 13 (2001) 507–513.
[21] V. Šmídl, A. Quinn, The Variational Bayes Method in Signal Processing, Springer Berlin Heidelberg, 2006.
[22] J. Vermaak, N.D. Lawrence, P. Perez, Variational inference for visual tracking, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, pp. 773–780.
[23] H. Valpola, J. Karhunen, An unsupervised ensemble learning method for nonlinear dynamic state-space models, Neural Comput. 14 (11) (2002) 2647–2692.
[24] S. Wang, J. Zhao, Z. Huang, R. Diao, Assessing Gaussian assumption of PMU measurement error using field data, IEEE Trans. Power Deliv. 99 (2017) 1.
[25] J.S. Liu, R. Chen, Sequential Monte Carlo methods for dynamic systems, J. Am. Stat. Assoc. 93 (443) (1998) 1032–1044.
[26] A. Doucet, J.F.G.D. Freitas, N.J. Gordon (Eds.), Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York, 2001.
[27] X. Yu, J. Li, J. Xu, Nonlinear filtering in unknown measurement noise by variational Bayesian inference, in: 20th International Conference on Information Fusion, ISIF, Xi'an, China, July 10–13, 2017, pp. 439–444.
[28] B. Wang, D.M. Titterington, Lack of consistency of mean field and variational Bayes approximations for state space models, Neural Process. Lett. 20 (3) (2004) 151–170.
[29] J. Braun, On Kolmogorov's Superposition Theorem and Its Applications, Südwestdeutscher Verlag für Hochschulschriften, 2010.
[30] X. Liu, Kolmogorov Superposition Theorem and Its Applications, PhD thesis, 2015.
[31] B. Ait-El-Fquih, I. Hoteit, A variational Bayesian multiple particle filtering scheme for large-dimensional systems, IEEE Trans. Signal Process. 64 (20) (2016) 5409–5422, https://doi.org/10.1109/TSP.2016.2580524.
[32] V. Šmídl, A. Quinn, Variational Bayesian filtering, IEEE Trans. Signal Process. 56 (10) (2008) 5020–5030.
[33] T. Schön, F. Gustafsson, P.J. Nordlund, Marginalized particle filters for mixed linear/nonlinear state-space models, IEEE Trans. Signal Process. 53 (7) (2005) 2279–2289.
[34] V. Šmídl, A. Quinn, The restricted variational Bayes approximation in Bayesian filtering, in: Nonlinear Statistical Signal Processing Workshop, 2006, pp. 224–227.
[35] T. Schön, F. Gustafsson, P.J. Nordlund, Marginalized particle filters for mixed linear/nonlinear state-space models, IEEE Trans. Signal Process. 53 (7) (2005) 2279–2289.
[36] R. Douc, E. Moulines, Limit theorems for weighted samples with applications to sequential Monte Carlo methods, in: ESAIM Proc., vol. 19, EDP Sciences, 2007, pp. 101–107.
[37] P. Del Moral, A. Guionnet, On the stability of interacting processes with applications to filtering and genetic algorithms, Ann. Inst. Henri Poincaré Probab. Stat. 37 (2001) 155–194.
[38] A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, vol. 60, Springer, 2009.
[39] Y. Bar-Shalom, X.R. Li, T. Kirubarajan, Estimation with Applications to Tracking and Navigation, John Wiley & Sons, 2001.
[40] S. Särkkä, A. Vehtari, J. Lampinen, Rao-Blackwellized particle filter for multiple target tracking, Inf. Fusion 8 (1) (2007) 2–15.
[41] S. Haykin (Ed.), Kalman Filtering and Neural Networks, John Wiley and Sons, Inc., 2002 (Chapter 5: Dual Extended Kalman Filter Methods).
[42] D. Simon, Optimal State Estimation, John Wiley & Sons, Inc., 2006 (Chapter 14: The Unscented Kalman Filter).
[43] Y. Wu, D. Hu, M. Wu, X. Hu, Unscented Kalman filtering for additive noise case: augmented versus nonaugmented, IEEE Signal Process. Lett. 12 (5) (2005) 357–360.
[44] Z. Liu, S.C. Chan, H.C. Wu, J. Wu, Bayesian unscented Kalman filter for state estimation of nonlinear and non-Gaussian systems, in: Signal Processing Conference, 2016, pp. 443–447.
[45] S. Särkkä, J. Hartikainen, On Gaussian optimal smoothing of non-linear state space models, IEEE Trans. Autom. Control 55 (8) (2010) 1938–1941.
[46] I. Arasaratnam, S. Haykin, Cubature Kalman filters, IEEE Trans. Autom. Control 54 (6) (2009) 1254–1269, https://doi.org/10.1109/TAC.2009.2019800.
[47] C.M. Bishop, Pattern Recognition and Machine Learning, Inf. Sci. Stat., Springer-Verlag New York Inc., 2006.
[48] J. Vanhatalo, A. Vehtari, MCMC Methods for MLP-Network and Gaussian Process and Stuff – A Documentation for Matlab, 2006.
[49] S.J. Julier, J.K. Uhlmann, Unscented filtering and nonlinear estimation, Proc. IEEE 92 (12) (2004) 1958, https://doi.org/10.1109/JPROC.2004.837637.
[50] S.J. Julier, J.K. Uhlmann, Unscented filtering and nonlinear estimation, Proc. IEEE 92 (3) (2004) 401–422.
[51] P. Del Moral, Mean Field Simulation for Monte Carlo Integration, CRC Press, 2013.
[52] I.S. Mbalawata, S. Sarkka, On the L4 convergence of particle filters with general importance distributions, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 8048–8052.