Physica D 241 (2012) 671–680
On a nonlinear Kalman filter with simplified divided difference approximation

X. Luo a,∗, I. Hoteit a, I.M. Moroz b

a King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
b Mathematical Institute, 24-29 St Giles', Oxford, OX1 3LB, UK
Article history: Received 26 June 2010; received in revised form 4 December 2011; accepted 5 December 2011; available online 16 December 2011. Communicated by B. Sandstede.

Keywords: Data assimilation; Ensemble Kalman filter; Divided difference approximation
Abstract

We present a new ensemble-based approach that handles nonlinearity through a simplified divided difference approximation derived from Stirling's interpolation formula, and is hence called the simplified divided difference filter (sDDF). The sDDF uses Stirling's interpolation formula to evaluate the statistics of the background ensemble during the prediction step, while at the filtering step it employs the formulae of an ensemble square root filter (EnSRF) to update the background to the analysis. In this sense, the sDDF is a hybrid of Stirling's interpolation formula and the EnSRF method, while the computational cost of the sDDF is less than that of the EnSRF. A numerical comparison between the sDDF and the EnSRF, with the ensemble transform Kalman filter (ETKF) as the representative, is conducted. The experimental results suggest that the sDDF outperforms the ETKF with a relatively large ensemble size, and is thus a good candidate for data assimilation in systems with moderate dimensions.
1. Introduction

The Kalman filter (KF) is a recursive data assimilation algorithm that yields optimal estimates in linear systems. To extend the KF algorithm to nonlinear systems, however, some approximations have to be adopted. One idea is to linearize the nonlinear system in assimilation so that the KF can be applied to the locally linear system, which leads to the extended Kalman filter (EKF) [1, ch. 8]. Alternatively, one may approximate the probability density function (pdf) of the underlying system state, rather than the governing equations, by a Gaussian pdf, despite the presence of nonlinearity. Under the Gaussianity assumption, one can use the KF to update the background mean and covariance to their analysis counterparts, while the estimation of the background statistics is handled by Monte Carlo approximation, which is the rationale behind the stochastic ensemble Kalman filter (sEnKF) [2–5], the ensemble square root filter (EnSRF) [6–9], and other similar algorithms [10–13], to name but a few. For convenience, hereafter we call such nonlinear extensions of the KF "nonlinear KFs".

In this work we consider one such extension, called the divided difference filter (DDF) [14,15], which is similar to the EKF in that, in the presence of nonlinearity, the DDF also conducts local expansions of the governing equations. However, the expansion in the DDF is carried out not via a Taylor series expansion, as in the EKF, but through Stirling's interpolation formula [15]. By adopting this formula, one avoids computing the derivatives of a nonlinear function. Instead,
one uses divided difference approximations of the derivatives in the computation, and thus circumvents some of the problems in the EKF.

For applications of the nonlinear KFs to high-dimensional systems (e.g., models in geophysics), it is often prohibitive to directly manipulate the full-rank covariance matrix of the system state, and some modifications of the standard nonlinear KFs may have to be introduced to reduce the computational cost. To this end, we propose a simplified form of the DDFs in [15], which can be considered a hybrid of the EnSRF and the simplified divided difference approximation, and is thus called the simplified DDF (sDDF). The idea of the sDDF is to incorporate a simplified divided difference approximation, derived from Stirling's interpolation formula, to evaluate the background covariance, and then to use an EnSRF algorithm to update the background statistics to their analysis counterparts. Employing Stirling's interpolation formula requires producing symmetric analysis ensembles, and the presence of symmetry in the analysis ensembles can reduce the sampling errors and spurious modes in the evaluation of the background statistics at the next assimilation cycle, as shown in the appendix of [16].

This work is organized as follows. In Section 2, we derive the simplified divided difference approximation based on Stirling's interpolation formula. Applying this approximation scheme to the prediction step of an EnSRF leads to the sDDF, as presented in Section 3. In Sections 4 and 5 we conduct experiments to compare the performance of the sDDF and the ensemble transform Kalman filter (ETKF) under various ensemble sizes. Finally, we conclude the work in Section 6.
2. Stirling's interpolation formula and its application to estimating the statistics of nonlinearly transformed random vectors

This section describes how the divided difference approximation can be derived based on Stirling's interpolation formula at the prediction step. To this end, we consider the estimation problem in the following scenario: suppose that one transforms an m-dimensional random vector x by a nonlinear function F so as to obtain a transformed m-dimensional random vector η = F(x). Our objective is to estimate the mean η̄ and covariance P_η of η, given the known mean x̄ and covariance P_x of x. In terms of ensemble filtering, one may quickly identify that this estimation problem is essentially equivalent to the estimation of the background statistics based on the propagation of an analysis ensemble from the previous assimilation cycle.

To solve the above estimation problem, one can expand F around x̄ through Stirling's interpolation polynomials [15]. For convenience of discussion, we first define some notation. We rewrite the random vector x as x = x̄ + δx, where δx corresponds to a random vector with m-dimensional zero mean 0_m and m × m covariance matrix P_x. Let S_x be an m × L square root matrix of P_x, such that P_x = S_x (S_x)^T. Then we can introduce a normalized L-dimensional random vector δe, whose mean and covariance are the L-dimensional zero vector 0_L and the L × L identity matrix I_L, respectively, i.e., E(δe) = 0_L and Cov(δe) = I_L. In addition, we define δe_i as the i-th element of δe. By this definition, one has
E(δe_i) = 0,  E(δe_i δe_j) = δ_ij  for i, j = 1, ..., L,   (1)

where δ_ij = 1 if i = j, and δ_ij = 0 otherwise. With this setting, δx = S_x δe corresponds to a random vector with zero mean 0_m and covariance matrix P_x. In [15], the authors consider the second-order Stirling's interpolation formula

η = F(x̄ + δx) ≈ F(x̄) + D_δe F(x̄) + (1/2) D²_δe F(x̄),   (2)

where D_δe and D²_δe represent the divided difference operators defined as follows:

D_δe F(x̄) = (1/h) Σ_{i=1}^{L} δe_i [P_i(h/2) N_i(h/2)] F(x̄),   (3a)

D²_δe F(x̄) = (1/h²) { Σ_{i=1}^{L} (δe_i)² [N_i(h/2)]² + Σ_{i=1}^{L} Σ_{j=1, j≠i}^{L} δe_i δe_j [P_i(h/2) N_i(h/2)] [P_j(h/2) N_j(h/2)] } F(x̄).   (3b)

In Eq. (3), P_i(h/2) and N_i(h/2) are parameterized operators satisfying

P_i(h/2) F(x̄) = (1/2) [ F(x̄ + (h/2)(S_x)_i) + F(x̄ − (h/2)(S_x)_i) ],
N_i(h/2) F(x̄) = F(x̄ + (h/2)(S_x)_i) − F(x̄ − (h/2)(S_x)_i),   (4)

with the parameter h being the interval length of interpolation, and (S_x)_i the i-th column of the square root matrix S_x of P_x. With some algebra, it can be shown that

[P_i(h/2)]² F(x̄) = (1/2) [P_i(h) + I] F(x̄);
[N_i(h/2)]² F(x̄) = 2 [P_i(h) − I] F(x̄);
[P_i(h/2) N_i(h/2)] F(x̄) = [N_i(h/2) P_i(h/2)] F(x̄) = (1/2) N_i(h) F(x̄).   (5)

Note that in the above equation, I represents the identity operator, such that I F(x̄) = F(x̄). With Eqs. (4) and (5), Eq. (3) becomes

D_δe F(x̄) = (1/(2h)) (δe)^T N(h) F(x̄),   (6a)

D²_δe F(x̄) = (2/h²) Σ_{i=1}^{L} (δe_i)² [P_i(h) − I] F(x̄) + (1/(4h²)) Σ_{i=1}^{L} Σ_{j=1, j≠i}^{L} δe_i δe_j N_i(h) N_j(h) F(x̄),   (6b)

where N(h) ≡ [N_1(h), ..., N_L(h)]^T. Note that in Eq. (6b), the summation

Σ_{i=1}^{L} Σ_{j=1, j≠i}^{L} δe_i δe_j N_i(h) N_j(h) F(x̄)

contains L(L − 1)/2 distinct terms, which may incur substantial computational cost. For instance, if L = 100, then there is a need to evaluate about 5000 such terms, which is infeasible in the context of ensemble data assimilation in large-scale systems. To reduce the computational cost, we follow [15] and discard these "cross" terms, such that Eq. (6b) reduces to

D²_δe F(x̄) ≈ (2/h²) Σ_{i=1}^{L} (δe_i)² [P_i(h) − I] F(x̄).   (7)

Since in Eqs. (6) and (7) the parameterized operators become N_i(h) and P_i(h), we need to generate some symmetric system states {X_i}_{i=0}^{2L} following the definition in Eq. (4), with

X_0 = x̄,
X_i = x̄ + h (S_x)_i,  i = 1, 2, ..., L,
X_i = x̄ − h (S_x)_{i−L},  i = L + 1, L + 2, ..., 2L.   (8)

For convenience, we call such symmetric system states {X_i}_{i=0}^{2L} sigma points, following [17]. The estimation of the mean and covariance of η is then carried out based on the transformed sigma points

Z_i ≡ F(X_i)  for i = 0, ..., 2L   (9)

of {X_i}_{i=0}^{2L}. Concretely, to estimate the mean of η, one first substitutes Eqs. (7) and (6a) into Eq. (2), and then takes the expectation on both sides of Eq. (2), so that the estimated mean

η̂ ≈ F(x̄) + (1/h²) Σ_{i=1}^{L} E[(δe_i)²] [P_i(h) − I] F(x̄)
  = ((h² − L)/h²) F(x̄) + (1/h²) Σ_{i=1}^{L} P_i(h) F(x̄)
  = ((h² − L)/h²) Z_0 + (1/(2h²)) Σ_{i=1}^{2L} Z_i,   (10)

where, in the deduction of Eq. (10): to derive the first line, the condition E(δe) = 0_L is applied, such that E(D_δe F(x̄)) = 0 when taking the expectation on the right hand side of Eq. (2); to derive the second line, the condition E[(δe_i)²] = 1 in Eq. (1) is adopted; and to derive the third line, the definitions in Eqs. (4), (8) and (9) are used.
Similarly, using Eqs. (2), (7), (10) and (6a), the estimated covariance is given by

P̂_η ≈ E[ (η − η̂)(η − η̂)^T ]
    = (1/(4h²)) Σ_{i=1}^{L} (Z_i − Z_{L+i}) (Z_i − Z_{L+i})^T
      + (1/(4h⁴)) Σ_{i=1}^{L} (E[(δe_i)⁴] − 1) (Z_i + Z_{L+i} − 2Z_0) (Z_i + Z_{L+i} − 2Z_0)^T,   (11)

where E[(δe_i)⁴] denotes the fourth-order moment of the random variable δe_i. The derivation of Eq. (11) is omitted for brevity; readers are referred to [18, Section 5.3.1.2] for the details. In particular, if E[(δe_i)⁴] = 1 for all i = 1, ..., L, then Eq. (11) reduces to

P̂_η ≈ (1/(4h²)) Σ_{i=1}^{L} (Z_i − Z_{L+i}) (Z_i − Z_{L+i})^T.   (12)

Eq. (12) is the simplified form for evaluation of the background covariance, to be used in the sDDF in the next section. The motivation to use this simplified form is based on the following observation: the m × L matrix

Ŝ_η = (1/(2h)) [Z_1 − Z_{L+1}, ..., Z_L − Z_{2L}]   (13)

is a square root of P̂_η, obtained without conducting any matrix factorization. If one treats the set of sigma points {X_i}_{i=0}^{2L} in Eq. (8) as the analysis ensemble at some assimilation cycle, and Ŝ_η as the square root matrix of the background covariance at the next cycle, then by applying the formulae in an EnSRF to update Ŝ_η, one obtains an m × L analysis square root matrix. In the spirit of Eq. (8), one can generate a new set of sigma points, again with 2L + 1 members, using the analysis square root matrix and the analysis mean. Thus, by adopting Eq. (12), no extra computational cost is needed in order to produce the same number of sigma points at different assimilation cycles.

This computational advantage is normally not available in the reduced-rank sigma point Kalman filters. For instance, in [16,18], a truncated singular value decomposition (TSVD) has to be conducted at each assimilation cycle in order to prevent the number 2L + 1 of sigma points from growing. The computational complexity of the TSVD is of the order of m²L (O(m²L)), much lower than the O(m³) of the full SVD when L ≪ m; however, for high-dimensional systems with large m, the TSVD still incurs substantial computational cost. In addition, since the simplified form does not rely on the TSVD to produce sigma points, the ranks of the analysis covariance matrices do not impose any limitation on the number of sigma points that one can produce. This is a feature not available in [16,18].

The simplified form in Eq. (12) introduces certain approximation errors. In the context of ensemble filtering, however, one may adopt the standard ensemble filtering configuration [19,20], which equips an ensemble filter with both the covariance inflation [9,21] and localization [22] techniques, to alleviate the effect of the approximation errors [23].
3. The simplified divided difference filter

We are interested in the state estimation problem in the following system:

x_k = M_{k,k−1}(x_{k−1}) + u_k,   (14a)
y_k = H_k(x_k) + v_k.   (14b)

Eqs. (14a) and (14b) represent the m_x-dimensional dynamical system and the m_y-dimensional observation system, respectively. M_{k,k−1}: R^{m_x} → R^{m_x} is the transition operator, while H_k: R^{m_x} → R^{m_y} represents the observation operator. We assume that the model error in the dynamical system Eq. (14a) is characterized by a white Gaussian random process u_k with zero mean and covariance Q_k (denoted by u_k ∼ N(u_k : 0, Q_k)), while the observation error in Eq. (14b) is described by another white Gaussian process v_k with zero mean and covariance R_k (v_k ∼ N(v_k : 0, R_k)). In addition, u_i and v_j are independent of each other for all indices i and j.

The main idea in the simplified divided difference filter (sDDF) is as follows. At the prediction step, we apply Eqs. (10) and (13) to estimate the mean and a square root of the covariance matrix of the background ensemble, respectively, while at the filtering step we adopt the update formulae in one of the EnSRFs [6–9] to update the background mean and square root matrix to their analysis counterparts, and to generate the analysis ensemble accordingly.¹ In doing this, the computational cost of the sDDF is less than that of the EnSRF given the same ensemble size, as discussed later.

3.1. Prediction step

Without loss of generality, we assume that at time index k − 1 one has an m_x-dimensional analysis mean x̂^a_{k−1} and an associated m_x × n square root matrix Ŝ^a_{k−1} of the analysis covariance P̂^a_{k−1}, such that P̂^a_{k−1} = Ŝ^a_{k−1} (Ŝ^a_{k−1})^T. Based on x̂^a_{k−1} and Ŝ^a_{k−1}, one can construct a set of sigma points X^a_{k−1} ≡ {X^a_{k−1,i} : i = 0, ..., 2n} with 2n + 1 members² in the following manner:

X^a_{k−1,0} = x̂^a_{k−1},
X^a_{k−1,i} = x̂^a_{k−1} + h_{k−1} (Ŝ^a_{k−1})_i,  i = 1, 2, ..., n,
X^a_{k−1,i} = x̂^a_{k−1} − h_{k−1} (Ŝ^a_{k−1})_{i−n},  i = n + 1, n + 2, ..., 2n,   (15)

where (Ŝ^a_{k−1})_i is the ith column of Ŝ^a_{k−1}, and h_{k−1} is the length of the interpolation interval (cf. Eq. (8)) with respect to the transition operator M_{k,k−1}. For convenience, we say that the set X^a_{k−1} of sigma points is generated with respect to the triplet (h_{k−1}, x̂^a_{k−1}, Ŝ^a_{k−1}).

If there is no dynamical noise in the dynamical system Eq. (14a), such that the covariance matrix Q_k = 0, one may let X^b_k ≡ {x^b_{k,i} : x^b_{k,i} = M_{k,k−1}(X^a_{k−1,i}), i = 0, ..., 2n} be the set of propagated sigma points. Then, applying Stirling's interpolation formula to M_{k,k−1}, with the set X^a_{k−1} of sigma points in Eq. (15) acting as the interpolation points, one can estimate the background mean x̂^b_k and covariance P̂^b_k in the spirit of Eqs. (10) and (12). Concretely, the background mean x̂^b_k is estimated by

x̂^b_k = ((h²_{k−1} − n)/h²_{k−1}) x^b_{k,0} + (1/(2h²_{k−1})) Σ_{i=1}^{2n} x^b_{k,i},   (16)

while the background covariance P̂^b_k is estimated by

P̂^b_k = (1/(4h²_{k−1})) Σ_{i=1}^{n} (x^b_{k,i} − x^b_{k,i+n}) (x^b_{k,i} − x^b_{k,i+n})^T.   (17)

¹ One may construct the hybrid of the simplified divided difference approximation and the stochastic EnKF [2–4] in a similar way, which, however, will not be investigated in this work.
² To generate an analysis ensemble with an even number of members, one may simply discard the centre X^a_{k−1,0} of the sigma points in Eq. (15) [16,17].
If there does exist dynamical noise in the dynamical system Eq. (14a), such that Q_k ≠ 0, then one may construct the propagated sigma points as X^b_k ≡ {x^b_{k,i} : x^b_{k,i} = M_{k,k−1}(X^a_{k−1,i}) + ξ_{k,i}, i = 0, ..., 2n}, where the ξ_{k,i} are some realizations of the dynamical noise u_k. In doing this, Eqs. (16) and (17) are again applicable to estimate the background mean and covariance, respectively. However, in general the sample covariance of the realizations ξ_{k,i} will underestimate the true covariance of u_k. In addition, in Section 2 we have also made some approximations in order to derive the simplified form for covariance evaluation. For these reasons, it is reasonable to provide some compensation in the covariance estimate of Eq. (17). Following the idea of covariance inflation [9,21], we introduce a non-negative covariance inflation factor δ into P̂^b_k, such that

P̂^b_k = ((1 + δ)²/(4h²_{k−1})) Σ_{i=1}^{n} (x^b_{k,i} − x^b_{k,i+n}) (x^b_{k,i} − x^b_{k,i+n})^T.   (18)

As a result, the m_x × n matrix

Ŝ^b_k = ((1 + δ)/(2h_{k−1})) [x^b_{k,1} − x^b_{k,1+n}, ..., x^b_{k,n} − x^b_{k,2n}]   (19)

is a square root matrix of P̂^b_k, such that Ŝ^b_k (Ŝ^b_k)^T = P̂^b_k.
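As an illustration of the prediction step, the following Python sketch implements Eqs. (15), (16), (18) and (19). The function name sddf_predict and the optional noise argument are our own conventions, and the added noise realizations are only one possible way to account for Q_k ≠ 0.

```python
import numpy as np

def sddf_predict(M, xa_mean, Sa, h, delta=0.0, noise=None):
    # Eq. (15): sigma points with respect to the triplet (h, xa_mean, Sa)
    n = Sa.shape[1]
    X = np.column_stack([xa_mean]
                        + [xa_mean + h * Sa[:, i] for i in range(n)]
                        + [xa_mean - h * Sa[:, i] for i in range(n)])
    # Propagate each sigma point through the transition operator M
    Xb = np.column_stack([M(X[:, i]) for i in range(2 * n + 1)])
    if noise is not None:                 # Q_k != 0: add realizations of u_k
        Xb = Xb + np.column_stack([noise() for _ in range(2 * n + 1)])
    # Eq. (16): background mean
    xb_mean = ((h**2 - n) / h**2) * Xb[:, 0] \
              + Xb[:, 1:].sum(axis=1) / (2.0 * h**2)
    # Eq. (19): inflated square root of the background covariance, Eq. (18)
    Sb = (1.0 + delta) * (Xb[:, 1:n+1] - Xb[:, n+1:]) / (2.0 * h)
    return xb_mean, Sb
```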
3.2. Filtering step

To update the background to the analysis, one first needs to compute the Kalman gain, whose evaluation involves the observation operator H_k. If H_k is nonlinear, one may apply Stirling's interpolation formula once more in the evaluation. To this end, one generates another set of sigma points X^b_k ≡ {X^b_{k,i} : i = 0, ..., 2n} with respect to the triplet (d_k, x̂^b_k, Ŝ^b_k), i.e.,

X^b_{k,0} = x̂^b_k,
X^b_{k,i} = x̂^b_k + d_k (Ŝ^b_k)_i,  i = 1, 2, ..., n,
X^b_{k,i} = x̂^b_k − d_k (Ŝ^b_k)_{i−n},  i = n + 1, n + 2, ..., 2n,   (20)

where (Ŝ^b_k)_i is the ith column of Ŝ^b_k, and d_k is the length of the interpolation interval with respect to the observation operator H_k. Let

Y^b_k ≡ {y^b_{k,i} : y^b_{k,i} = H_k(X^b_{k,i}), i = 0, ..., 2n}   (21)

be the set of forecasts of the projections of the sigma points X^b_k onto the observation space. Then, applying Eq. (12) with respect to H_k, one has the estimated innovation covariance matrix P̂^in_k, analogous to Eq. (17), given by

P̂^in_k = (1/(4d²_k)) Σ_{i=1}^{n} (y^b_{k,i} − y^b_{k,i+n}) (y^b_{k,i} − y^b_{k,i+n})^T + R_k.   (22)

In addition, one can construct an m_y × n square root

Ŝ^h_k = (1/(2d_k)) [y^b_{k,1} − y^b_{k,1+n}, ..., y^b_{k,n} − y^b_{k,2n}],   (23)

which we refer to as the projection square root matrix (with respect to the background ensemble X^b_k), such that the approximate Kalman gain K̂_k is given by

K̂_k = Ŝ^b_k (Ŝ^h_k)^T [ Ŝ^h_k (Ŝ^h_k)^T + R_k ]^{−1}.   (24)

With an incoming observation y_k, we use the following formula

x̂^a_k = x̂^b_k + K̂_k (y_k − H_k(x̂^b_k))   (25)

to update the background mean x̂^b_k in Eq. (16) to its analysis counterpart x̂^a_k. To update the background covariance P̂^b_k, we follow the idea of the ensemble transform Kalman filter (ETKF) [7] and update the m_x × n square root matrix Ŝ^b_k to an m_x × n square root matrix Ŝ^a_k of P̂^a_k. In [7], Ŝ^a_k is given by

Ŝ^a_k = Ŝ^b_k T_k,   (26)

where the transform matrix

T_k = E_k (D_k + I_n)^{−1/2}   (27)

is an n × n matrix, with I_n being the n-dimensional identity matrix, D_k the n × n diagonal matrix whose diagonal elements consist of the eigenvalues of the n × n matrix (Ŝ^h_k)^T R^{−1}_k Ŝ^h_k, and E_k the n × n matrix consisting of the corresponding eigenvectors of (Ŝ^h_k)^T R^{−1}_k Ŝ^h_k. Readers are referred to [7] for the deduction of T_k.

Given x̂^a_k and Ŝ^a_k, one may generate an analysis ensemble X^a_k = {X^a_{k,i} : i = 0, ..., 2n} with respect to the triplet (h_k, x̂^a_k, Ŝ^a_k), in the spirit of Eq. (15), where h_k is the length of the interpolation interval at the kth assimilation cycle. We note that the ensemble size of X^a_k is equal to that of X^a_{k−1} at the previous cycle, and there is no additional computational cost to achieve this.

Therefore, given an analysis ensemble X^a_{k−1} with 2n + 1 members, the computational costs of propagating X^a_{k−1} forward and computing the statistics of the resulting background ensemble will be nearly the same for both the sDDF and the ETKF. However, in the sDDF the dimensions of the square root matrices Ŝ^b_k and Ŝ^h_k are m_x × n and m_y × n, respectively (cf. Eqs. (19) and (23)), while in the ETKF the dimensions of Ŝ^b_k and Ŝ^h_k are m_x × (2n + 1) and m_y × (2n + 1) [7]. As a result, the ETKF requires (slightly) more computation in evaluating the Kalman gain K̂_k (Eq. (24)) and in conducting the SVD of the matrix (Ŝ^h_k)^T R^{−1}_k Ŝ^h_k to obtain the transform matrix T_k (Eq. (27)).
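A compact Python sketch of this filtering step is given below; it follows Eqs. (20)–(27) directly, and forms R^{−1}_k and the m_y × m_y inverse explicitly for clarity, which is affordable only when the number of observations is moderate. The function and variable names are ours, not the paper's.

```python
import numpy as np

def sddf_update(y, H, R, xb, Sb, d):
    n = Sb.shape[1]
    # Eq. (20): sigma points with respect to the triplet (d, xb, Sb)
    X = np.column_stack([xb]
                        + [xb + d * Sb[:, i] for i in range(n)]
                        + [xb - d * Sb[:, i] for i in range(n)])
    Yb = np.column_stack([H(X[:, i]) for i in range(2 * n + 1)])   # Eq. (21)
    Sh = (Yb[:, 1:n+1] - Yb[:, n+1:]) / (2.0 * d)                  # Eq. (23)
    # Eq. (24): approximate Kalman gain
    K = Sb @ Sh.T @ np.linalg.inv(Sh @ Sh.T + R)
    xa = xb + K @ (y - H(xb))                                      # Eq. (25)
    # Eq. (27): eigen-decomposition of (Sh)^T R^{-1} Sh
    Dk, Ek = np.linalg.eigh(Sh.T @ np.linalg.solve(R, Sh))
    Dk = np.maximum(Dk, 0.0)            # guard against tiny negative values
    Tk = Ek @ np.diag(1.0 / np.sqrt(Dk + 1.0))
    Sa = Sb @ Tk                                                   # Eq. (26)
    return xa, Sa
```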
4. Experimental setup

Numerical experiments were conducted to compare the performance of the sDDF and the ETKF. In this section the details of the experimental setup are first introduced; the experimental results are presented and discussed in the next section.

4.1. The system in assimilation

We use the 40-dimensional Lorenz and Emanuel model [24] (LE98 model hereafter) for illustration, whose governing equations are given by

dx_i/dt = (x_{i+1} − x_{i−2}) x_{i−1} − x_i + F,  i = 1, ..., 40.   (28)

The quadratic terms simulate advection, the linear term represents internal dissipation, and F acts as the external forcing term [25]. For consistency, we define x_{−1} = x_39, x_0 = x_40, and x_41 = x_1. In our experiments we let F = 8. We use the fourth-order Runge–Kutta method to integrate the system numerically from time 0 to 30, with a constant integration step of 0.05. The system state between times 0 and 25 is considered the transient trajectory and discarded; it is the trajectory between times 25.05 and 30 that is used for data assimilation.

We generate the synthetic observations through the following system:

y_k = [x_{k,1}, x_{k,3}, ..., x_{k,39}]^T + v_k,   (29)

where x_{k,j} (j = 1, 3, ..., 39) denotes the jth element of the state vector x_k at time instant k, and v_k follows the Gaussian distribution N(v_k : 0, I_20), with I_20 being the 20-dimensional identity matrix. The observations are made every 5 integration steps.
4.2. Measures for filter comparison

For performance comparison between the filters, we calculate the time-averaged absolute root mean squared error (absolute rmse for short). Given a set of m_x-dimensional state vectors {x_k : x_k = (x_{k,1}, ..., x_{k,m_x}), k = 0, ..., k_max}, with k_max being the maximum time index, suppose that one has an analysis x̂^a_k = (x̂^a_{k,1}, ..., x̂^a_{k,m_x}) of x_k. Then the time-averaged absolute rmse ê is defined as

ê = (1/(k_max + 1)) Σ_{k=0}^{k_max} e_k,
e_k ≡ ‖x̂^a_k − x_k‖_2 / √m_x = √( (1/m_x) Σ_{i=1}^{m_x} (x̂^a_{k,i} − x_{k,i})² ),   (30)

where ‖·‖_2 denotes the Euclidean norm in R^{m_x}, such that ‖x_k‖²_2 = Σ_{j=1}^{m_x} x²_{k,j}.

A possible problem in directly using ê as the performance measure is that ê itself may depend on some intrinsic parameters (e.g., the covariance inflation factor δ) of the filters, which might lead to contradictory conclusions at different parameter values. To avoid this problem, we adopt the following strategy: we relate a filter's best possible performance to the minimum absolute rmse ê_min, which is the minimum value of ê that the filter can achieve within the chosen ranges of the filter's intrinsic parameters. We say that filter A performs better than filter B if the minimum absolute rmse ê^A_min of filter A is less than the minimum absolute rmse ê^B_min of filter B.
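In code, the measure of Eq. (30) is essentially a one-liner; the sketch below assumes the analyses and true states are stacked row-wise in NumPy arrays, which is our own convention.

```python
import numpy as np

def absolute_rmse(xa, xt):
    # Eq. (30): xa, xt are (k_max + 1) x m_x arrays of analyses / true states
    e_k = np.linalg.norm(xa - xt, axis=1) / np.sqrt(xa.shape[1])
    return e_k.mean()
```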
4.3. Initialization of the sDDF

Suppose that at the first data assimilation cycle one is given an initial background ensemble, which is not necessarily the propagation of some symmetric system states from the previous time, as required in the sDDF. For this reason, in the first assimilation cycle one is not able to employ the formulae in the sDDF to compute the background statistics. Instead, one may use an EnSRF, e.g. the ETKF, to compute the background statistics and update them to their analysis counterparts. Once the analysis mean and analysis square root matrix are obtained, one can generate a set of sigma points accordingly, in the spirit of Eq. (15).

A further issue in the initialization involves the generation of ensembles with a specified ensemble size. Suppose that at the first assimilation cycle one is given an N-member initial background ensemble X^b_0 = (x^b_{0,1}, x^b_{0,2}, ..., x^b_{0,N}), while in the subsequent assimilation cycles one can at most afford to run (2n + 1)-member ensembles, with 2n + 1 ≤ N (the 2n-member case may follow the idea in footnote 2). To generate an ensemble with the specified size, we consider two possible scenarios.

One is that n ≤ m_x, the dimension of the system state. In this case there is relatively sparse information in the background ensemble, hence one may want to use all of the available information, rather than simply selecting n out of the N members (see the discussion of the n > m_x scenario). To this end, we choose to conduct an SVD of the sample covariance matrix P̂^b_0 of X^b_0 (with N members) to obtain the n leading eigenvalues {σ²_{0,1}, σ²_{0,2}, ..., σ²_{0,n}} and the corresponding eigenvectors {v_{0,1}, v_{0,2}, ..., v_{0,n}}. We then construct an m_x × n approximate square root matrix

Ŝ^b_0 = [σ_{0,1} v_{0,1}, σ_{0,2} v_{0,2}, ..., σ_{0,n} v_{0,n}]

of P̂^b_0. Updating Ŝ^b_0 to its analysis counterpart through the formula in the ETKF (cf. Eq. (26)), one obtains an m_x × n analysis square root matrix Ŝ^a_0. Thus, applying Eq. (15), one can generate a (2n + 1)-member analysis ensemble at the first cycle. We note that one only needs to conduct the SVD once, at the first assimilation cycle. From the second assimilation cycle onwards, one can apply the formulae in the sDDF, such that the ensemble sizes of all the background and analysis ensembles at the subsequent cycles remain 2n + 1, without conducting SVD on the background covariance P̂^b_k any more.

The other scenario is n > m_x. In this case we do not conduct SVD on P̂^b_0. Instead, we simply select n out of the N ensemble members from the set X^b_0, and follow the method in the EnSRF to compute a square root matrix Ŝ^b_0 based on these n ensemble members. After updating Ŝ^b_0 to Ŝ^a_0, an analysis ensemble with 2n + 1 members can also be generated in the spirit of Eq. (15). The subsequent procedures follow those in the sDDF, as described in Section 3.
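A sketch of the n ≤ m_x initialization is given below; it realizes the SVD-based construction of Ŝ^b_0 described above, with the sample covariance formed implicitly through the centred ensemble perturbations (an assumption of ours, since the normalization of P̂^b_0 is not spelled out in the text).

```python
import numpy as np

def initial_sqrt(Xb0, n):
    # Xb0: m_x x N initial background ensemble; returns the m_x x n matrix
    # S^b_0 = [sigma_{0,1} v_{0,1}, ..., sigma_{0,n} v_{0,n}]
    N = Xb0.shape[1]
    A = (Xb0 - Xb0.mean(axis=1, keepdims=True)) / np.sqrt(N - 1.0)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # the squared singular values of A are the leading eigenvalues of P^b_0
    return U[:, :n] * s[:n]
```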
5. Results of the experiments

5.1. Performance comparison under various ensemble sizes

We first compare the performance of the sDDF and the ETKF under various ensemble sizes. In the experiments both the covariance inflation and localization techniques are adopted.

5.1.1. Experiment with 21 ensemble members

In the first experiment we let n = 10 in the sDDF, which corresponds to an ensemble size p = 2n + 1 = 21, less than the system dimension 40. For comparison, we also let the ensemble size p = 21 in the ETKF. In the experiment, we vary the value of the covariance inflation factor δ from 0 to 0.2, with an even increment of 0.05 each time, which is denoted by δ ∈ {0 : 0.05 : 0.2}. On the other hand, we follow the setting in [26] to conduct covariance localization, which brings an additional intrinsic parameter l_c, called the length scale of covariance localization, to the filters. We increase the value of l_c from 50 to 300, with an even increment of 50 each time, denoted by l_c ∈ {50 : 50 : 300} accordingly. In addition, we set the interval lengths of interpolation (cf. Eqs. (15) and (20)) h_k = d_k = 0.01 for all k. This setting is used in all experiments unless otherwise mentioned.

To reduce statistical fluctuations, we repeat all the experiments 10 times, each time with a randomly drawn initial state vector, and with some random perturbations of the initial state as the initial background ensemble. The size of the initial background ensemble is set equal to the specified ensemble size p (= 21 in the present experiment). To initialize the sDDF, we follow the strategy in Section 4.3; the sDDF in the other experiments is initialized in a similar way.

In Fig. 1 we show the absolute rms errors ê (averaged over the 10 experiments) of both the sDDF and the ETKF as functions of δ and l_c. In this small ensemble case, the sDDF under-performs the ETKF for all values of δ and l_c. Moreover, as shown in Table 1, the minimum error is ê_min = 1.7146 in the sDDF, while it is ê_min = 1.0648 in the ETKF. The relatively poorer performance of the sDDF in this case is largely due to the fact that, given the ensemble size p = 21, the number of state vectors used in applying Eq. (17) to compute the background covariances is only 10, such that the ranks of the background (hence the analysis) covariance matrices of the sDDF are at most 9. For distinction, hereafter we use "sample size" to refer to the number of state vectors used in evaluating a background covariance, and "ensemble size" to refer to the number of state vectors contained in a background or analysis ensemble. For the sDDF, the "sample size" n is half of the "ensemble size" minus one, i.e., n = (p − 1)/2; in contrast, for the ETKF the "sample size" is equal to the "ensemble size". As a result, the ranks of the covariance matrices in the ETKF are up to p − 1 = 20 in case the ensemble members are independent of each other.
Fig. 1. Experiment with 21 ensemble members: averaged absolute rms errors (over 10 experiments) of both the sDDF and the ETKF as functions of the covariance inflation factor δ and the length scale l_c of covariance localization. (a) sDDF with 21 ensemble members; (b) ETKF with 21 ensemble members.

Table 1. Minimum absolute rms errors ê_min of the sDDF and the ETKF with different ensemble sizes (parameter values: δ ∈ {0 : 0.05 : 0.2}, l_c ∈ {50 : 50 : 300}).

Ensemble size    sDDF      ETKF
11               2.3646    1.0457
21               1.7146    1.0648
31               1.3438    1.0845
41               1.0549    1.0995
81               1.0219    1.0324
161              0.8537    1.1174
321              0.9797    1.1252
641              1.0338    1.1879
In a chaotic system like the LE98 model, though, the ensemble members normally have certain correlations with each other, so that the ranks of the covariance matrices are lower than the maximum possible value. These correlations stem from the existence of a chaotic attractor in the LE98 model, such that the trajectories of the ensemble members are attracted into the basin of attraction, which can be considered a subspace embedded in the state space R^40. In [24] it is reported that the fractal dimension [27] of the LE98 model is approximately 27, less than the dimension 40 of the state space. This indicates that there indeed exists such a chaotic attractor, and that there may exist certain correlations among the ensemble members. Therefore, even for an ensemble whose number p of members is less than the system dimension, the rank of its sample covariance may be lower than p − 1.

In Fig. 2 we plot the time series of the "effective ranks" of the sample covariances computed with various ensemble sizes. Here, the "effective rank" of an m_x × m_x sample covariance P̂ is computed as follows. Let {σ²_j, j = 1, 2, ..., m_x} be the eigenvalues of P̂ arranged in non-ascending order, i.e., σ²_j ≥ σ²_l if j < l, and let Trace(P̂) = Σ_{j=1}^{m_x} σ²_j be the trace of P̂. Then the "effective rank" k of P̂ is defined as the minimum integer such that Σ_{j=1}^{k} σ²_j ≥ T_c Trace(P̂), where T_c ≤ 1 is the cut-off threshold. We choose T_c = 0.99 throughout this work.
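The "effective rank" is straightforward to compute; a small Python sketch of our own follows, under the definition just given.

```python
import numpy as np

def effective_rank(P, Tc=0.99):
    # minimum k such that the k leading eigenvalues of the sample
    # covariance P capture a fraction Tc of Trace(P)
    evals = np.sort(np.linalg.eigvalsh(P))[::-1]   # non-ascending order
    frac = np.cumsum(evals) / evals.sum()
    return int(np.searchsorted(frac, Tc) + 1)
```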
Fig. 2. Time series of the effective ranks of the LE98 model with the ensemble sizes of (a) 11, (b) 21, (c) 31, (d) 41, (e) 81, (f) 161, (g) 321, and (h) 641.

The "effective rank" provides insight into the behaviours of the sDDF and the ETKF with various ensemble sizes. In Fig. 2(a), the "effective ranks" of the sample covariances are around 7 when the sample size is 11, which roughly depicts the situation in the sDDF with p = 21. In contrast, the "effective ranks" of the sample covariances are around 12 when the sample size is 21 (cf. Fig. 2(b)), which corresponds to the situation in the ETKF with p = 21. This shows that the "effective ranks" of the background covariances of the sDDF are substantially lower than those of the ETKF, and this difference dominates the benefits of symmetry [16] in the sDDF. As a result, in this small ensemble scenario the sDDF under-performs the ETKF.

5.1.2. Experiment with 41 ensemble members

Next we examine the case n = 20 in the sDDF, which corresponds to an ensemble size p = 2n + 1 = 41. The same ensemble size p is used in the ETKF. The covariance inflation factor δ ∈ {0 : 0.05 : 0.2}, and the length scale l_c ∈ {50 : 50 : 300}. The experiment is again repeated 10 times to reduce statistical fluctuations.

In Fig. 3 we show the absolute rms errors ê (averaged over the 10 experiments) of both the sDDF and the ETKF as functions of δ and l_c. Compared to the case n = 10 (cf. Fig. 1), the difference between the sDDF and the ETKF, in terms of ê, is narrowed. In particular, in the area around the point (δ = 0.18, l_c = 200) of Fig. 3(a), the absolute rms errors ê of the sDDF are lower than those of the ETKF in Fig. 3(b). In addition, as indicated in Table 1, the minimum error in the sDDF is ê_min = 1.0549, while in the ETKF it is ê_min = 1.0995, now higher than that of the sDDF.

Since in this case the sample size of the sDDF in estimating the background covariance matrices is 20, Fig. 2(b) provides a rough depiction of the "effective ranks" in the sDDF. On the other hand, the sample size of the ETKF in estimating the background covariance matrices is 41, and Fig. 2(d) indicates that the "effective ranks" in the ETKF are between 18 and 21. For both filters, the "effective ranks" increase with the sample sizes; consequently, their errors in estimating the background covariances are reduced. This is particularly noticeable for the sDDF: by increasing p from 21 to 41, the absolute rms errors ê in Fig. 3(a) become substantially lower than those in Fig. 1(a). On the other hand, the performance of the ETKF seems not to improve when p increases from 21 to 41, comparing Figs. 3(b) and 1(b); indeed, the minimum error ê_min = 1.0995 when p = 41 and ê_min = 1.0648 when p = 21. Similar results are also found in the experiments below, and in other works, e.g., [18,28]. This seemingly counter-intuitive phenomenon will be discussed in Section 5.1.4 with a possible explanation.
Fig. 3. Experiment with 41 ensemble members: averaged absolute rms errors (over 10 experiments) of both the sDDF and the ETKF as functions of the covariance inflation factor δ and the length scale l_c of covariance localization. (a) sDDF with 41 ensemble members; (b) ETKF with 41 ensemble members.
5.1.3. Experiment with 161 ensemble members

Now we examine the case n = 80 in the sDDF. The ensemble size is p = 2n + 1 = 161 in both the sDDF and the ETKF. Again the covariance inflation factor δ ∈ {0 : 0.05 : 0.2}, and the length scale l_c ∈ {50 : 50 : 300}. The experiment is again repeated 10 times.

In Fig. 4 we show the absolute rms errors ê (averaged over the 10 experiments) of both the sDDF and the ETKF as functions of δ and l_c.

Fig. 4. Experiment with 161 ensemble members: averaged absolute rms errors (over 10 experiments) of both the sDDF and the ETKF as functions of the covariance inflation factor δ and the length scale l_c of covariance localization. (a) sDDF with 161 ensemble members; (b) ETKF with 161 ensemble members.

Comparing Figs. 4(a) and 3(a), it is clear that the performance of the sDDF is further improved when p increases from 41 to 161. In this case, the sample size in estimating the background covariances
of the sDDF is 80; thus, roughly speaking, the "effective ranks" are between 24 and 27 according to Fig. 2(e). In addition, as reported in Table 1, the resulting minimum rms error is ê_min = 0.8537. In contrast, the performance of the ETKF again does not improve when p increases from 41 to 161. In this case, the "effective ranks" in the ETKF are between 29 and 31 according to Fig. 2(f), and in Table 1 the resulting minimum rms error is ê_min = 1.1174, much higher than that of the sDDF.

5.1.4. Further results

Additional experiments, including the cases n = 5, 15, 40, 160, 320 (corresponding to p = 2n + 1 = 11, 31, 81, 321, 641, respectively), were also conducted. For brevity, the full details are not presented. Instead, we report their minimum absolute rms errors in Table 1, together with the choices of δ and l_c in the experiments (while in all the experiments h_k = d_k = 0.01 for all k in the sDDF).

For the sDDF, the minimum absolute rmse ê_min in Table 1 decreases with the ensemble size p until p reaches 161; after that, ê_min rises as p increases further. The behaviour of the ETKF is slightly different: its ê_min first increases as p grows from 11 to 41, then reaches its global minimum at p = 81, after which ê_min rises as p grows further. Similar phenomena are also observed in the sDDF and the ETKF when neither covariance inflation nor covariance localization is adopted in the filters (not presented). This is in contrast with the intuition that, as the ensemble size increases, the performance of the sDDF or the ETKF should improve. One possible explanation of this phenomenon is that, according to [29], when assimilating a chaotic system, the majority of the analysis ensemble members of the ETKF, except for a few outliers, may spread around a specific location in the state space after the deterministic update formulae in the ETKF are applied. As a result, when the ensemble size of the ETKF increases, the filter's performance is not necessarily improved, since the specific location may be over-populated by the analysis members. This makes the majority of the subsequent trajectories of the ensemble members stay too close to a certain location in the state space, and thus results in a higher estimation error. Since deterministic update formulae similar to those in the ETKF are adopted in the sDDF, the sDDF exhibits similar behaviour to the ETKF.

The experimental results show that, for a relatively small ensemble size, the ETKF outperforms the sDDF, whereas for a relatively large ensemble size the sDDF may outperform the ETKF instead. Our explanation of this phenomenon is the following. The filters' performance is influenced by two factors: one is the approximation error of the background covariance, and the other is the symmetry of the analysis ensembles. On the one hand, the approximation error of the background covariance is smaller when the sample size is larger. On the other hand, the analytical result in [16] shows that, in the context of ensemble filtering, the presence of symmetry in the analysis ensembles can reduce the errors in estimating the background mean and covariance at the next assimilation cycle. Since the sample size in estimating a background covariance of the sDDF is only half that of the ETKF for a given ensemble size p, the sDDF suffers from larger estimation errors in computing the background covariances. However, this disadvantage is compensated, to some extent, by the benefits of symmetry in the analysis ensembles of the sDDF.
In the small ensemble scenario, the disadvantage in estimating the background covariance dominates the benefits of symmetry, so that the sDDF under-performs the ETKF. However, as the ensemble size increases, the effect of sample size deficiency in estimating the background covariances is mitigated. Indeed, as can be observed in Fig. 2, for a relatively large ensemble size, say p = 161, increasing it to p = 321 only increases the "effective rank" by 2, whereas increasing p from 11 to 161 increases the "effective rank" by about 22. As a result, in the relatively large ensemble scenario the benefits of symmetry dominate the effect of sample size deficiency, so that the sDDF outperforms the ETKF instead.
5.2. Sensitivity of the sDDF to the interval lengths h_k and d_k

To examine the sensitivity of the sDDF to the interval lengths h_k and d_k (cf. Eqs. (15) and (20)), in principle one should vary the values of h_k and d_k for each k. However, doing this is computationally intensive and impractical. Therefore, here we again choose to let h_k = h and d_k = d for all k in an assimilation run, where h and d are constants. In our experiment we let h ∈ {0.01 : 0.05 : 0.25} and d ∈ {0.01 : 0.05 : 0.25} to examine the performance of the sDDF with different combinations of h and d. The other experiment settings are as follows: n = 20 (p = 41), δ = 0.02 and l_c = 200. The experiment is again repeated 10 times to reduce statistical fluctuations.

The experimental results show that the absolute rmse ê is insensitive to the values of h and d. Indeed, the absolute rmse changes very little within the tested ranges of h and d: the minimum absolute rmse is 1.4262, and the maximum is 1.4331.

6. Conclusions

In this work we introduced an ensemble-based nonlinear Kalman filter, called the simplified divided difference filter (sDDF), which can be considered a hybrid of the ensemble square root filter (EnSRF) and the simplified divided difference approximation. With this simplified approximation scheme, the sDDF generates symmetric analysis ensembles without extra computational cost in comparison to the EnSRF.

We conducted experiments to compare the performance of the sDDF and the ensemble transform Kalman filter (ETKF), as the representative of the EnSRF, with various ensemble sizes. The results showed that the sDDF under-performed the ETKF in the small ensemble scenario, while with relatively large ensemble sizes the sDDF performed better than the ETKF. A possible explanation is the following: when the ensemble size is relatively small, the sDDF has larger estimation errors in evaluating the background covariances than the ETKF, and these errors dominate the benefits of symmetry in the sDDF, such that the sDDF exhibits larger rms errors. As the ensemble size increases, the effect of sample size deficiency in the sDDF is mitigated, and the presence of symmetry makes the sDDF perform better than the ETKF. This may happen, for example, when the ensemble size is large enough that the "effective ranks" of the background covariances are close to, or even higher than, the fractal dimension (rather than the dimension of the state space) of a chaotic system. Due to the existence of a chaotic attractor, there may then be no significant accuracy loss in estimating the background covariances of the sDDF in comparison with the estimates of the ETKF, so that the symmetry in the sDDF becomes dominant.

Applications of the sDDF may depend on the system in assimilation, in particular the dimension of the system. For a system with a moderate dimension (e.g., a network system with a few hundred or thousand nodes), such that it is affordable to run an ensemble filter with an ensemble size close to or even larger than the system dimension, the sDDF may be a better choice than the ETKF, since in such circumstances, given the same ensemble size, the computational cost of the sDDF in updating the background statistics is less than that of the ETKF (cf. Section 3.2). Moreover, the presence of symmetry in the sDDF also helps to improve the performance of the filter.
In contrast, for large-scale systems like numerical weather prediction models, it is impractical on present-day computers to run an ensemble filter with an ensemble size close to or even larger than the system dimension, and it would therefore be more appropriate to choose the conventional ETKF for data assimilation. However, with the ever increasing power of parallel computing, we anticipate that in the future it may become affordable to do so, and in such circumstances the sDDF may be a better choice than the ETKF.
Acknowledgements

We would like to thank two anonymous referees for their constructive suggestions.

References

[1] B.D.O. Anderson, J.B. Moore, Optimal Filtering, Prentice-Hall, 1979.
[2] G. Burgers, P.J. van Leeuwen, G. Evensen, On the analysis scheme in the ensemble Kalman filter, Mon. Weather Rev. 126 (1998) 1719–1724.
[3] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res. 99 (C5) (1994) 10143–10162.
[4] G. Evensen, P.J. van Leeuwen, Assimilation of Geosat altimeter data for the Agulhas Current using the ensemble Kalman filter with a quasi-geostrophic model, Mon. Weather Rev. 124 (1996) 85–96.
[5] P.L. Houtekamer, H.L. Mitchell, Data assimilation using an ensemble Kalman filter technique, Mon. Weather Rev. 126 (1998) 796–811.
[6] J.L. Anderson, An ensemble adjustment Kalman filter for data assimilation, Mon. Weather Rev. 129 (2001) 2884–2903.
[7] C.H. Bishop, B.J. Etherton, S.J. Majumdar, Adaptive sampling with the ensemble transform Kalman filter. Part I: theoretical aspects, Mon. Weather Rev. 129 (2001) 420–436.
[8] M.K. Tippett, J.L. Anderson, C.H. Bishop, T.M. Hamill, J.S. Whitaker, Ensemble square root filters, Mon. Weather Rev. 131 (2003) 1485–1490.
[9] J.S. Whitaker, T.M. Hamill, Ensemble data assimilation without perturbed observations, Mon. Weather Rev. 130 (2002) 1913–1924.
[10] J.D. Beezley, J. Mandel, Morphing ensemble Kalman filters, Tellus A 60 (2007) 131–140.
[11] I. Hoteit, D.T. Pham, J. Blum, A simplified reduced order Kalman filtering and application to altimetric data assimilation in Tropical Pacific, J. Mar. Syst. 36 (2002) 101–127.
[12] D.T. Pham, J. Verron, M.C. Roubaud, A singular evolutive extended Kalman filter for data assimilation in oceanography, J. Mar. Syst. 16 (1998) 323–340.
[13] M. Zupanski, Maximum likelihood ensemble filter: theoretical aspects, Mon. Weather Rev. 133 (2005) 1710–1726.
[14] K. Ito, K. Xiong, Gaussian filters for nonlinear filtering problems, IEEE Trans. Automat. Control 45 (2000) 910–927.
[15] M. Nørgaard, N.K. Poulsen, O. Ravn, New developments in state estimation for nonlinear systems, Automatica 36 (2000) 1627–1638.
[16] X. Luo, I.M. Moroz, Ensemble Kalman filter with the unscented transform, Physica D 238 (2009) 549–562.
[17] S. Julier, J. Uhlmann, H. Durrant-Whyte, A new method for the nonlinear transformation of means and covariances in filters and estimators, IEEE Trans. Automat. Control 45 (2000) 477–482.
[18] X. Luo, Recursive Bayesian filters for data assimilation, Ph.D. Thesis, University of Oxford, 2010. URL: http://arxiv.org/abs/0911.5630.
[19] T.M. Hamill, J.S. Whitaker, J.L. Anderson, C. Snyder, Comments on "sigma-point Kalman filter data assimilation methods for strongly nonlinear systems", J. Atmospheric Sci. 66 (2009) 3498–3500.
[20] P.J. van Leeuwen, Particle filtering in geophysical systems, Mon. Weather Rev. 137 (2009) 4089–4114.
[21] J.L. Anderson, S.L. Anderson, A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts, Mon. Weather Rev. 127 (1999) 2741–2758.
[22] T.M. Hamill, J.S. Whitaker, C. Snyder, Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter, Mon. Weather Rev. 129 (2001) 2776–2790.
[23] X. Luo, I. Hoteit, Robust ensemble filtering and its relation to covariance inflation in the ensemble Kalman filter, Mon. Weather Rev. 139 (2011) 3938–3953.
[24] E.N. Lorenz, K.A. Emanuel, Optimal sites for supplementary weather observations: simulation with a small model, J. Atmospheric Sci. 55 (1998) 399–414.
[25] E.N. Lorenz, Predictability—a problem partly solved, in: T. Palmer (Ed.), Predictability, ECMWF, Reading, UK, 1996, pp. 1–18.
[26] X. Luo, I.M. Moroz, I. Hoteit, Scaled unscented transform Gaussian sum filter: theory and application, Physica D 239 (2010) 684–701.
[27] P. Grassberger, I. Procaccia, Measuring the strangeness of strange attractors, Physica D 9 (1983) 189–208.
[28] I. Hoteit, X. Luo, D.T. Pham, Particle Kalman filtering: an optimal nonlinear framework for ensemble Kalman filters, Mon. Weather Rev. (in press). URL: http://journals.ametsoc.org/doi/pdf/10.1175/2011MWR3640.1.
[29] W. Lawson, J. Hansen, Implications of stochastic and deterministic filters as ensemble-based data assimilation methods in varying regimes of error growth, Mon. Weather Rev. 132 (8) (2004) 1966–1981.