Physica D 237 (2008) 1021–1028 www.elsevier.com/locate/physd
Unbiased ensemble square root filters

David M. Livings, Sarah L. Dance*, Nancy K. Nichols

School of Mathematics, Meteorology and Physics, University of Reading, Reading, UK

Received 29 May 2007; received in revised form 13 November 2007; accepted 3 January 2008. Available online 11 January 2008. Communicated by C.K.R.T. Jones.
Abstract

There is an established framework that describes a class of ensemble Kalman filter algorithms as square root filters (SRFs). These algorithms produce analyses by updating a state estimate and a square root of the ensemble covariance matrix: the matrix square root of the forecast covariance is post-multiplied by another matrix to give a matrix square root of the analysis covariance. The choice of post-multiplier is not unique, and it can itself be multiplied by any orthogonal matrix to give another scheme that is also a SRF. Not all filters of this type bear the desired relationship to the forecast ensemble: the analysis ensemble mean may not equal the analysis state estimate, and consequently there may be an accompanying shortfall in the spread of the analysis ensemble as expressed by the ensemble covariance matrix. This points to the need for a restricted version of the notion of an ensemble SRF, which we call an unbiased ensemble SRF. This paper provides a generic set of necessary and sufficient conditions for the scheme to be centred on the analysis state estimate (unbiased). A few of these results have already been published elsewhere in the literature; this paper brings them together with new results and provides simple proofs and examples, as well as a mathematical description of the set of unbiased ensemble SRFs. Many (but not all) published ensemble SRF algorithms satisfy the criteria that we establish. While these conditions are not a cure-all and cannot deal with independent sources of bias such as model and observation errors, they should be useful to designers of ensemble SRFs in the future.

© 2008 Elsevier B.V. All rights reserved.
PACS: 92.60.Wc; 02.50.-r

Keywords: Data assimilation; Ensemble
1. Introduction

The ensemble Kalman filter (EnKF) and its many variants [8] are fast gaining popularity in data assimilation applications such as numerical weather prediction (NWP), where the state space is too large and the models are too complex for use of longer established techniques such as the Kalman filter (KF) or its nonlinear generalization, the extended Kalman filter (EKF) [11,16]. The key idea is to use an ensemble of forecasts to calculate a state estimate and an error covariance matrix that acts as a measure of the uncertainty in the estimate. The ensemble is updated with every analysis to reflect information provided by the observations, and is evolved using the forecast model between analyses.

* Corresponding address: Department of Meteorology, University of Reading, PO Box 243, Earley Gate, Reading RG6 6BB, UK. E-mail address: [email protected] (S.L. Dance).
This paper focuses on a particular class of ensemble filter known as ensemble square root filters (SRFs). These are placed in a uniform framework by Tippett et al. [23] and include the ensemble transform Kalman filter (ETKF) of Bishop et al. [4], the ensemble adjustment Kalman filter (EAKF) of Anderson [1], and the filter of Whitaker and Hamill [25]. Here the matrix square root of the forecast covariance is post-multiplied by another matrix to give a matrix square root of the analysis covariance. Tippett et al. [23] pointed out that the choice of post-multiplier is not unique and that it can itself be multiplied by any orthogonal matrix to give another scheme that is also a SRF. This has been exploited, for example, by Evensen [9] and Leeuwenburgh et al. [17], who generate random rotations in the hope of improving the higher order moments of the ensemble statistics. Not all ensemble SRFs bear the desired relationship to the forecast ensemble: the analysis ensemble mean may not be
equal to the analysis state estimate, and consequently there may be an accompanying shortfall in the spread of the analysis ensemble as expressed by the ensemble covariance matrix. While this problem has been recognized and dealt with for particular implementations (e.g. the revised ETKF of Wang et al. [24]), and some slightly more general criteria have been developed for linear [21] and nonlinear [15] observation operators, these results are not widely known. This paper builds on the preliminary work of Livings [18] and Livings et al. [19] to bring these previously published results together with new results. We give a set of generic necessary and sufficient conditions for the ensemble to remain centred on the analysis state estimate (unbiased). These conditions provide practically useful criteria for designing unbiased ensemble SRFs. We give simple proofs of these criteria and apply our results to published filter algorithms. Additionally, a mathematical characterization of unbiased SRFs is given in terms of the set of orthogonal matrices.

The rest of this paper is organized as follows. Section 2 introduces some notation and explains the problem of bias in ensemble SRFs. The formulation is constructed in a way that makes sense for nonlinear observation operators as well as linear ones, and this generality is maintained throughout the paper. Section 3 presents a review of sufficient conditions for an ensemble SRF to be unbiased. We provide generic, simple proofs of these criteria and clarify some of the theory. The criteria should be useful to workers devising new filters within the ensemble SRF framework. Section 4 provides new results on partial converses to the theorems presented in Section 3. Section 5 examines various published filter algorithms for bias, and shows that whilst many are unbiased some are not. It also considers the effects on this type of bias of techniques such as covariance inflation and localization and the incorporation of model error, which are often used in ensemble filtering. Some concluding remarks are made in Section 6. Appendices provide details on the model used in numerical examples and on a mathematical characterization of the set of unbiased SRFs in terms of the set of orthogonal matrices.

2. Bias and ensemble filters

This section introduces some notation, gives some desired properties for the ensemble SRF, and demonstrates that some filters do not have these properties. More specifically, it shows that the ensemble SRF framework includes some filters that are not centred on the ensemble state estimate.

Let $\{x_i\}$ ($i = 1, \ldots, m$) be an $m$-member ensemble in an $n$-dimensional state space. The ensemble mean is the vector defined by

$$\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i. \tag{1}$$
The ensemble perturbation matrix is the $n \times m$ matrix defined by

$$X = \frac{1}{\sqrt{m-1}} \begin{pmatrix} x_1 - \bar{x} & x_2 - \bar{x} & \cdots & x_m - \bar{x} \end{pmatrix}. \tag{2}$$
Note that this definition incorporates the factor $1/\sqrt{m-1}$, as in e.g., [4,20]. The ensemble covariance matrix is the $n \times n$ matrix defined by

$$P = XX^T = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^T. \tag{3}$$
If the members of $\{x_i\}$ are drawn independently from the same probability distribution, then $\bar{x}$ is an unbiased estimator of the population mean and $P$ is an unbiased estimator of the population covariance matrix (e.g., [3, Chapter 5]), although in practice the estimates obtained thereby may be subject to large sampling errors. The first equality in (3) may be expressed by saying that $X$ is a matrix square root of $P$. (Note that this use of the term 'square root' is inconsistent with its most common use in the mathematical literature, where a square root of the matrix $P$ is often defined to be a matrix $X$ such that $P = X^2$; see, for example, [12, section 4.2.10]. However, the usage is well established in the engineering literature (as in [11, section 8.4]) and is also common in geophysical applications (as in Tippett et al. [23]). If $X$ is symmetric then the definitions coincide.)

Let $y$ be an observation in a $p$-dimensional observation space, let $H$ be the observation operator (possibly nonlinear) mapping state space to observation space, and let $R$ be the $p \times p$ observation error covariance matrix. Let forecast (prior) quantities be denoted by the superscript f and analysis (posterior) quantities by the superscript a. There is no need for a notation to distinguish different times because this paper is concerned purely with the analysis step, which happens all at one time.

The analysis step of an ensemble filter updates the forecast ensemble $\{x_i^f\}$ to give an analysis ensemble $\{x_i^a\}$ that will be used as the starting point for the next forecast. This ensemble update implies an update of the ensemble mean and covariance. The update step may involve the addition of some random noise terms (for example the perturbed observation filter [5,14]), but we exclude these types of filter for the purposes of this discussion. In this case, there are certain properties that the analysis ensemble might be expected to satisfy exactly. These may be written in terms of a forecast observation ensemble $\{y_i^f\}$ defined by

$$y_i^f = H(x_i^f). \tag{4}$$
Like any other ensemble, the forecast observation ensemble has an ensemble mean $\bar{y}^f$ and an ensemble perturbation matrix $Y^f$. (In the special case of a linear observation operator $H$, $\bar{y}^f = H\bar{x}^f$ and $Y^f = HX^f$. The following theory on unbiased ensembles applies to the nonlinear as well as the linear case.) Using these relationships and the relationship $P^f = X^f (X^f)^T$, we may write the desired properties for the analysis ensemble as

$$\bar{x}^a = \bar{x}^f + K(y - \bar{y}^f), \tag{5}$$

$$P^a = (X^f - KY^f)(X^f)^T, \tag{6}$$

where $K$ is the gain matrix

$$K = X^f (Y^f)^T D^{-1}, \tag{7}$$

$$D = Y^f (Y^f)^T + R. \tag{8}$$
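In code, the desired analysis statistics (5)–(8) take the following form. This is a hedged sketch: the explicit inverse of $D$ is for clarity only (an efficient implementation would factorize instead), and all names are illustrative.

```python
import numpy as np

def desired_analysis(x_bar_f, Xf, y, y_bar_f, Yf, R):
    """Desired analysis mean and covariance, Eqs. (5)-(8)."""
    D = Yf @ Yf.T + R                      # Eq. (8), p x p, positive definite
    K = Xf @ Yf.T @ np.linalg.inv(D)       # Eq. (7), n x p gain matrix
    x_bar_a = x_bar_f + K @ (y - y_bar_f)  # Eq. (5)
    Pa = (Xf - K @ Yf) @ Xf.T              # Eq. (6)
    return x_bar_a, Pa
```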
The matrix $D$ is positive definite and invertible since $R$ is a positive definite covariance matrix and the product $Y^f (Y^f)^T$ is positive semi-definite. For algorithms that satisfy the constraints (5)–(8), it can be shown that, with linear forecast and observation models, the update equations converge to those of the standard Kalman filter as $m \to \infty$. (An indication of the proof is given by Furrer and Bengtsson [10].) For nonlinear observation and forecast models, the constraints may no longer guarantee that the state estimate is equal to the mean of the posterior distribution. Furthermore, for values of $m$ that are small compared to the number of degrees of freedom of the dynamical system, sampling errors may be significant. (Furrer and Bengtsson [10] provide results on sampling issues in ensemble filtering.) Nevertheless, for nonlinear models or non-Gaussian statistics, the constraints can still prove to be useful for sub-optimal filtering (e.g. [1,25]). For the rest of this description, the bias problems caused by sampling errors and the use of nonlinear models etc. will be neglected, since the issue of inconsistent ensemble centring still applies in all of these cases.

An ensemble SRF algorithm does not compute (6) directly, but instead updates an ensemble perturbation matrix in the hope that the resulting ensemble statistics satisfy this equation, as in the following definition.

Definition 1. An ensemble square root filter [23] is an ensemble filter in which the analysis ensemble $\{x_i^a\}$ is obtained as

$$x_i^a = \tilde{x} + x_i' \tag{9}$$

for $i = 1, 2, \ldots, m$, where the state estimate $\tilde{x}$ and the perturbations $x_i'$ are defined as follows. The column $n$-vector $\tilde{x}$ satisfies

$$\tilde{x} = \bar{x}^f + K(y - \bar{y}^f), \tag{10}$$

and $K$ is the $n \times p$ gain matrix defined by (7). The column $n$-vector $x_i'$ is the $i$-th column of the $n \times m$ matrix $\sqrt{m-1}\,\tilde{X}$, where $\tilde{X}$ satisfies

$$\tilde{X} = X^f T, \tag{11}$$

$$TT^T = I - (Y^f)^T D^{-1} Y^f. \tag{12}$$

Here $T$ is an $m \times m$ matrix. There is an obvious resemblance between (10) and the right-hand side of the constraint (5) on the ensemble mean. Additionally, using (11), (12) and (7) it is easy to see that

$$\tilde{X}\tilde{X}^T = (X^f - KY^f)(X^f)^T, \tag{13}$$

which should be compared to (6). However, as $\tilde{x}$ need not be equal to the mean of the updated ensemble, $\tilde{X}$ need not have all the desired properties for $X^a$. In particular, (2) implies that the sum of the columns of an ensemble perturbation matrix must be zero ([21], eq. 29; [24], eq. 8; [15], Section 2.3.3), and this does not necessarily follow from (11) and (12), which are the only constraints on $\tilde{X}$. To see this, let $T$ be a particular solution of (12). Then a more general solution is $TU$ where $U$ is an arbitrary $m \times m$ orthogonal matrix [23]. The corresponding value of $\tilde{X}$ is

$$\tilde{X} = X^f TU. \tag{14}$$

Now let $\mathbf{1}$ be a column $m$-vector in which every element is 1; that is, $\mathbf{1} = (1, 1, \ldots, 1)^T$. Then the sum of the columns of $\tilde{X}$ can be written as

$$\tilde{X}\mathbf{1} = X^f TU\mathbf{1}. \tag{15}$$

Thus $\tilde{X}$ is a valid ensemble perturbation matrix if and only if $U\mathbf{1}$ lies in the null space¹ of $X^f T$. The vector $U\mathbf{1}$ is nonzero and can be made to point in any direction by an appropriate choice of $U$. Therefore, unless $X^f T = 0$ (in which case the analysis ensemble collapses to a point), there will be at least some choices of $U$ that give $\tilde{X}\mathbf{1} \neq 0$ and hence an $\tilde{X}$ that is invalid as an ensemble perturbation matrix.

To see the problems caused when $\tilde{X}\mathbf{1} \neq 0$, consider the analysis ensemble updated according to (9)–(12). The mean of this ensemble is

$$\bar{x}^a = \tilde{x} + \bar{x}', \tag{16}$$

where $\bar{x}'$ denotes the mean of the perturbations $x_i'$. It is clear that $\tilde{x}$ satisfies the right-hand side of the constraint (5), but, by definition in this example, $\bar{x}' = (\sqrt{m-1}/m)\,\tilde{X}\mathbf{1} \neq 0$ (because $\tilde{X}$ is invalid as an ensemble perturbation matrix). Thus the ensemble is not centred on the analysis state estimate, or to put it another way, there is a bias in the ensemble mean. Furthermore, the ensemble covariance matrix is

$$P^a = \frac{1}{m-1} \sum_{i=1}^{m} (x_i^a - \bar{x}^a)(x_i^a - \bar{x}^a)^T \tag{17}$$

$$= \frac{1}{m-1} \sum_{i=1}^{m} (x_i^a - \tilde{x})(x_i^a - \tilde{x})^T - \frac{m}{m-1}\,\bar{x}'\,\bar{x}'^T \tag{18}$$

$$= \tilde{X}\tilde{X}^T - \frac{m}{m-1}\,\bar{x}'\,\bar{x}'^T. \tag{19}$$
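The construction (9)–(15) and the resulting bias are easy to demonstrate numerically. The following sketch builds a symmetric solution $T$ of (12) for a randomly generated forecast ensemble (an illustrative assumption, with all coordinates observed and $R = I$); this symmetric choice turns out to be unbiased (see Theorem 2 below), while post-multiplying by a random orthogonal $U$ destroys the zero column sums of $\tilde{X}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 8, 4, 8

ens_f = rng.standard_normal((n, m))              # illustrative forecast ensemble
x_bar_f = ens_f.mean(axis=1)
Xf = (ens_f - x_bar_f[:, None]) / np.sqrt(m - 1)
H = np.eye(p, n)                                 # linear H: all coordinates observed
Yf = H @ Xf
R = np.eye(p)

D = Yf @ Yf.T + R                                # Eq. (8)
S = np.eye(m) - Yf.T @ np.linalg.solve(D, Yf)    # RHS of Eq. (12)
w, V = np.linalg.eigh(S)
T = V @ np.diag(np.sqrt(w)) @ V.T                # symmetric square root: T T^T = S

ones = np.ones(m)
print(np.linalg.norm(Xf @ T @ ones))             # ~1e-16: columns of X~ sum to zero

U, _ = np.linalg.qr(rng.standard_normal((m, m))) # arbitrary orthogonal U
print(np.linalg.norm(Xf @ T @ U @ ones))         # O(1): Eq. (15) nonzero, biased ensemble
```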
Although, as we saw in (13), $\tilde{X}\tilde{X}^T$ satisfies the right-hand side of the constraint (6), it is clear that $P^a$ does not satisfy this constraint when $\bar{x}' \neq 0$. In particular, the ensemble standard deviation will be too small for any coordinate in which there is also a bias in the mean. Thus the resulting analysis ensemble is not only biased but overconfident as well. Such an analysis is likely to lead to a biased and overconfident forecast. The filter will then give more weight than it should to the forecast in the next analysis step and less to the observation. This will prevent the observation from properly correcting the bias in the forecast, and the next analysis will be biased and overconfident as well. In extreme cases the filter may become increasingly overconfident until it is in effect a free-running forecast model, diverging from the truth and taking no notice of observations.

¹ The null space of an $n \times m$ matrix $M$ is the set of all column $m$-vectors $u$ such that $Mu = 0$.

An example of output from a filter algorithm where the ensemble is not centred on the analysis state estimate is shown in Fig. 1. The dynamical system is an eight-dimensional linear system, summarized in Appendix A. A linear system was chosen in order to avoid complications with other sources of bias (nonlinearity) in solutions. The filter is an ETKF [4] with ensemble size $m = 4$. The model used in the forecast step is the same as the model used to generate the true trajectory. The model noise is taken to be zero. All coordinates are observed. Observations are taken every 0.5 time units (except at time zero). The observations have synthetically generated uncorrelated errors with standard deviation 1.0. The same covariance matrix is used in generating a random initial ensemble, centred on the true initial state.

Fig. 1. Simulations with ETKF, ensemble size $m = 4$, for a linear model system with a state space of dimension $n = 8$. The upper panel shows the first component of the true solution plotted as a function of time, $t$. The middle panel shows values of the first component of $\tilde{X}\mathbf{1}$ at analysis times, plotted as ◦. The lower panel shows the first component of the output of the ETKF, plotted relative to the truth. The dotted line at zero indicates the value of a perfect solution. Three solid lines show the ensemble mean and the ensemble mean ± ensemble standard deviation. The first components of assimilated observations are shown as + with error bars indicating the observation standard deviation.

The upper panel of Fig. 1 shows the first component of the true solution. This solution is chosen to be close to a stable trajectory of the analytic system of equations, to avoid the need for excessively small time steps in the numerical scheme. The middle panel shows the values of the first component of $\tilde{X}\mathbf{1}$ at analysis times. It is clear that these values are nonzero, and we are in the situation described in the discussion around (16). The lower panel illustrates the difference between the filter and the truth. The true state (represented by zero on the vertical axis) is often outside the band defined by the ensemble mean ± ensemble standard deviation, indicating that the filter is overconfident. The behaviour of the other components of the solution (not illustrated) is similar.

3. Unbiased ensemble SRFs — sufficient conditions

Section 2 points to the need to find an additional condition to rule out filters that do not have the desired analysis ensemble statistics. This leads to the following definition.

Definition 2. An unbiased ensemble SRF is an ensemble SRF in which $\tilde{X}\mathbf{1} = 0$.

Thus, to construct an ensemble SRF satisfying the desired properties (5)–(8) we need to find a solution $T$ of the square root condition (12) and check that the matrix $\tilde{X}$ defined by (11) satisfies $\tilde{X}\mathbf{1} = 0$. It would be helpful to replace this condition on $\tilde{X}$ with some more practically useful conditions on $T$. These are given in this section. Versions of some of these theorems (or special cases thereof) have previously been published by Ott et al. [21] and Hunt et al. [15]. It is assumed throughout that $T$ satisfies the matrix square root condition (12). The first theorem gives a sufficient condition for the resulting ensemble SRF to be unbiased.

Theorem 1. If $\mathbf{1}$ is an eigenvector of $T$, then the ensemble SRF is unbiased.

Proof. By hypothesis $T\mathbf{1} = \lambda\mathbf{1}$ for some scalar $\lambda$. Therefore $\tilde{X}\mathbf{1} = X^f T\mathbf{1} = \lambda X^f \mathbf{1} = 0$. Therefore the filter is unbiased. □
An important special case is that of a symmetric $T$. This is the subject of the following theorem.

Theorem 2. If $T$ is symmetric, then the ensemble SRF is unbiased.

Proof. We need only show that if $T$ is symmetric, then $\mathbf{1}$ is an eigenvector of $T$. Since $Y^f$ is an ensemble perturbation matrix, it satisfies $Y^f\mathbf{1} = 0$. Therefore it follows from (12) that $T^2\mathbf{1} = \mathbf{1}$ for symmetric $T$. Thus $\mathbf{1}$ is an eigenvector of $T^2$. But the eigenvectors of the square of a symmetric matrix are the same as those of the original matrix. Therefore $\mathbf{1}$ is an eigenvector of $T$. □

Although not as general as Theorem 1, Theorem 2 provides a particularly simple test for unbiasedness. However, there are unbiased filters for which $T$ is not symmetric. An example of this type is given by Livings et al. [19], Section 8.2, and is related to the EAKF [1].

A recent innovation is to use random rotations (e.g. [9,17]) to try to improve the higher order moments of the ensemble statistics. The next theorem provides a condition on these post-multiplied rotation matrices. It assumes that the ensemble SRF with post-multiplier matrix $T$ is unbiased and considers the ensemble SRF with post-multiplier matrix $TU$ where $U$ is orthogonal.

Theorem 3. If the ensemble SRF with matrix $T$ is unbiased and $\mathbf{1}$ is an eigenvector of $U$, then the ensemble SRF with matrix $TU$ is unbiased.

Proof. By hypothesis $X^f T\mathbf{1} = 0$ and $U\mathbf{1} = \lambda\mathbf{1}$ for some scalar $\lambda$. Therefore $X^f TU\mathbf{1} = \lambda X^f T\mathbf{1} = 0$. Therefore the ensemble SRF with matrix $TU$ is unbiased. □

Some ensemble SRFs (e.g., the EAKF of Anderson [1]) have been presented in the pre-multiplier form $X^a = AX^f$ instead of the post-multiplier form $X^a = X^f T$. The following theorem shows that the ability to write an ensemble SRF in pre-multiplier form provides an alternative test for unbiasedness.

Theorem 4. If the analysis step (11) of an ensemble SRF can be written in the form $\tilde{X} = AX^f$, then the ensemble SRF is unbiased.
Proof. $\tilde{X}\mathbf{1} = AX^f\mathbf{1} = 0$. □
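The conditions in Theorems 1 and 2 translate into simple numerical tests. The helpers below are our own illustrative sketch, not from the paper; the tolerance is arbitrary.

```python
import numpy as np

def has_ones_eigenvector(T, tol=1e-10):
    """Theorem 1 test: does T 1 = lambda 1 hold for some scalar lambda?"""
    m = T.shape[0]
    ones = np.ones(m)
    v = T @ ones
    lam = (v @ ones) / m            # candidate eigenvalue from projection onto 1
    return bool(np.linalg.norm(v - lam * ones) < tol)

def is_symmetric(T, tol=1e-10):
    """Theorem 2 test: symmetry of T (sufficient, not necessary)."""
    return bool(np.linalg.norm(T - T.T) < tol)
```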
Some further remarks about pre-multiplier filters are given after Theorem 8.
4. Necessary conditions

This section provides some partial converses to theorems in Section 3. These results are also needed to characterize the set of unbiased SRFs in Appendix B and are used in the proof of Theorem 9 in Section 5. To provide a partial converse to Theorem 1 it is first necessary to introduce the concept of a nondegenerate ensemble perturbation matrix. The columns of an ensemble perturbation matrix $X$ are not linearly independent because there is at least one linear relation between them (they sum to zero). Thus the rank of $X$ is at most $m - 1$.

Definition 3. An $n \times m$ ensemble perturbation matrix is nondegenerate if it has rank $m - 1$.

If $n < m - 1$, the rank of the ensemble perturbation matrix is at most $n$. In this case, the ensemble perturbation matrix is never nondegenerate. More typically, $n \gg m$, so that nondegeneracy of $X$ is more probable (considering ensemble members as random draws from a multivariate probability distribution). The nondegeneracy condition is equivalent to the null space of $X$ being equal to the one-dimensional space spanned by $\mathbf{1}$. The following theorem states that the sufficient condition for unbiasedness in Theorem 1 is also a necessary condition when $X^f$ is nondegenerate.

Theorem 5. If $X^f$ is nondegenerate and the ensemble SRF is unbiased, then $\mathbf{1}$ is an eigenvector of $T$.

Proof. Because the filter is unbiased, $0 = \tilde{X}\mathbf{1} = X^f T\mathbf{1}$. Therefore $T\mathbf{1}$ is in the null space of $X^f$. Because $X^f$ is nondegenerate, this implies $T\mathbf{1} = \lambda\mathbf{1}$ for some scalar $\lambda$. Therefore $\mathbf{1}$ is an eigenvector of $T$. □

To provide a partial converse of Theorem 3, we first need to show that $T$ is invertible.

Theorem 6. Any solution of (12) is nonsingular.

Proof. Suppose that $T$ is singular. Then $TT^T$ is singular. It follows from [23, equation (15)], or may be verified by direct multiplication, that the RHS of (12) may be rewritten as

$$I - (Y^f)^T D^{-1} Y^f = \left(I + (Y^f)^T R^{-1} Y^f\right)^{-1}, \tag{20}$$

and since $I + (Y^f)^T R^{-1} Y^f$ is symmetric positive definite, so is $I - (Y^f)^T D^{-1} Y^f$. Therefore $TT^T$ is symmetric positive definite and hence nonsingular. This contradiction implies that $T$ is nonsingular. □

We are then in a position to prove the following result.

Theorem 7. If $X^f$ is nondegenerate and the ensemble SRFs with matrices $T$ and $TU$ are unbiased, then $\mathbf{1}$ is an eigenvector of $U$.

Proof. Theorem 5 implies that $\mathbf{1}$ is an eigenvector of both $T$ and $TU$. Since $T$ is invertible by Theorem 6, $\mathbf{1}$ is also an eigenvector of $T^{-1}TU = U$. □

The next theorem provides a partial converse to Theorem 4.

Theorem 8. If $X^f$ is nondegenerate and an ensemble SRF is unbiased, then the analysis step (11) of the ensemble SRF can be written in the form $\tilde{X} = AX^f$.

Proof. Let $x_i'^f$ and $x_i'$ denote the $i$th columns of $\sqrt{m-1}\,X^f$ and $\sqrt{m-1}\,\tilde{X}$ respectively. By the nondegeneracy of $X^f$ there is a linearly independent set of $m - 1$ members of $\{x_i'^f\}$. Assume (without loss of generality) that these vectors correspond to the subscripts $i = 1, \ldots, m - 1$. Because these vectors are linearly independent, they can be mapped onto an arbitrary set of $m - 1$ vectors by a linear transformation. In particular there exists a matrix $A$ such that

$$x_i' = Ax_i'^f \quad \text{for } i = 1, \ldots, m - 1. \tag{21}$$

Since $X^f\mathbf{1} = 0$ and $\tilde{X}\mathbf{1} = 0$ it follows that

$$x_m'^f = -\sum_{i=1}^{m-1} x_i'^f, \tag{22}$$

$$x_m' = -\sum_{i=1}^{m-1} x_i'. \tag{23}$$

Therefore

$$Ax_m'^f = -\sum_{i=1}^{m-1} Ax_i'^f = -\sum_{i=1}^{m-1} x_i' = x_m'. \tag{24}$$

Eqs. (21) and (24) together imply $\tilde{X} = AX^f$. □
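Identity (20) can be checked numerically. In the sketch below, $Y^f$ is a random matrix with zero column sums and $R$ is a random symmetric positive definite matrix; both constructions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 5, 4
Yf = rng.standard_normal((p, m))
Yf -= Yf.mean(axis=1, keepdims=True)           # columns sum to zero, as for any Y^f
A = rng.standard_normal((p, p))
R = A @ A.T + np.eye(p)                        # symmetric positive definite R

D = Yf @ Yf.T + R                              # Eq. (8)
lhs = np.eye(m) - Yf.T @ np.linalg.solve(D, Yf)
rhs = np.linalg.inv(np.eye(m) + Yf.T @ np.linalg.solve(R, Yf))
print(np.allclose(lhs, rhs))                   # True: Eq. (20) holds
```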
In view of Theorems 4 and 8, the reader may wonder whether a framework based on a pre-multiplier update would be preferable to the established post-multiplier framework: there would be no bias problems and it would cover most unbiased ensemble SRFs. In typical applications $n \gg m$ and hence the $n \times n$ matrix $A$ is very much larger than the $m \times m$ matrix $T$. However, neither of these matrices would normally be computed explicitly in an efficient numerical algorithm. The storage required and operation count would depend on the implementation. Some operation counts for several different algorithms are given in [23].

In Appendix B we use the results of this section to characterize the structure of the set of all unbiased ensemble SRFs in terms of the set of orthogonal matrices.

5. Applications

This section examines some published formulations of the ensemble SRF from the point of view of bias. Tippett et al. [23] place several of these in the ensemble SRF framework but without consideration of bias. Of the filters discussed by Tippett et al. [23], the EAKF [1] and the filter of Whitaker and Hamill [25] are formulated in pre-multiplier form. Hence by Theorem 4
they are unbiased. Further discussion of these algorithms is given in [19].

The original formulation of the ETKF [4] requires a more detailed examination. It computes the eigenvalue decomposition

$$(Y^f)^T R^{-1} Y^f = C\Gamma C^T, \tag{25}$$

where $\Gamma$ is an $m \times m$ real, non-negative, diagonal matrix and $C$ is an $m \times m$ orthogonal matrix. This gives a solution of (12) as

$$T = C(I + \Gamma)^{-\frac{1}{2}}. \tag{26}$$

Using this $T$ in (11) gives the original version of the ETKF. It is not obvious that such a $T$ will always yield an unbiased filter. The experiment of Fig. 1 suggests that it may not, and Theorem 9 below shows that it rarely does. This issue is addressed in the context of the ensemble generation problem in Wang et al. [24], where two revised versions of the ETKF are proposed. The first such revision [24, appendix Ca] is equivalent to post-multiplying (26) by $C^T$ to give

$$T = C(I + \Gamma)^{-\frac{1}{2}} C^T. \tag{27}$$

Since $C^T$ is orthogonal, this is a solution of (12), and since this $T$ is symmetric, using it in (11) yields an unbiased filter by Theorem 2. (A sketch of both transforms is given after Fig. 2.) The unbiasedness of this revised ETKF enables the conditions under which the original ETKF is unbiased to be clarified.

Theorem 9. If $X^f$ is nondegenerate (i.e., has rank $m - 1$; see Section 4) and the ETKF is unbiased, then $Y^f = 0$.

Proof. Since (26) may be obtained from (27) by post-multiplying by the orthogonal matrix $C$, Theorem 7 in Section 4 implies that $\mathbf{1}$ is an eigenvector of $C$. Let $C\mathbf{1} = \lambda\mathbf{1}$. Then, by (25), $\Gamma\mathbf{1} = C^T (Y^f)^T R^{-1} Y^f C\mathbf{1} = \lambda C^T (Y^f)^T R^{-1} Y^f \mathbf{1} = 0$. But $\Gamma\mathbf{1}$ is the column vector of diagonal elements of the diagonal matrix $\Gamma$. Thus $\Gamma = 0$ and, by (25) again, $(Y^f)^T R^{-1} Y^f = 0$. Since $R^{-1}$ is positive definite, it follows that $Y^f = 0$. □

In words, the condition $Y^f = 0$ for unbiasedness in Theorem 9 says that there is no observable difference between the members of the forecast ensemble. Thus an unbiased filter will be a rare occurrence for observation operators that supply useful information. When the necessary condition does occur, it follows from (25) that $\Gamma = 0$ and $C$ is arbitrary, and thus that the original ETKF reduces to $X^a = X^f C$ where $C$ is an arbitrary orthogonal matrix. Note that in this case $P^a = P^f$.

Repeating the experiment of Fig. 1 using the revised ETKF gives the results shown in Fig. 2. The values of each component of $\tilde{X}\mathbf{1}$ are very small, $O(10^{-16})$, in these simulations. The size of the ensemble spread (indicated by ± the ensemble standard deviation) is larger than in the experiments with the old ETKF in Fig. 1. The true state (represented by zero on the vertical axis) is usually inside the band defined by the ensemble mean ± ensemble standard deviation, indicating stable filter behaviour.
Fig. 2. Output of revised ETKF with ensemble size m = 4 for a linear system. Plotting conventions as in Fig. 1.
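The original transform (26) and the revised transform (27) differ by a single post-multiplication by $C^T$. The sketch below implements both from an eigendecomposition of $(Y^f)^T R^{-1} Y^f$; only the symmetric version guarantees $\tilde{X}\mathbf{1} = 0$ (Theorem 2). The function name and the dense treatment of $R$ are illustrative assumptions.

```python
import numpy as np

def etkf_transforms(Yf, R):
    """Original (26) and revised (27) ETKF post-multiplier matrices."""
    S = Yf.T @ np.linalg.solve(R, Yf)     # (Y^f)^T R^{-1} Y^f = C Gamma C^T, Eq. (25)
    gamma, C = np.linalg.eigh(S)          # eigenvalues Gamma and eigenvectors C
    scale = 1.0 / np.sqrt(1.0 + gamma)    # diagonal of (I + Gamma)^{-1/2}
    T_orig = C * scale                    # Eq. (26): C (I + Gamma)^{-1/2}; generally biased
    T_rev = (C * scale) @ C.T             # Eq. (27): symmetric, hence unbiased (Theorem 2)
    return T_orig, T_rev
```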
In practical uses of ensemble filters, small ensemble sizes often lead to issues with filter divergence or spurious long-range correlations in the forecast error covariances. Popular techniques for ameliorating these problems include covariance inflation and localization [13,14]. Covariance inflation [2] involves applying a uniform factor to the ensemble perturbations before computing the analysis, i.e., $x_i \mapsto \alpha(x_i - \bar{x}) + \bar{x}$ for an inflation factor $\alpha$; a minimal sketch is given below. It is clear that the ensemble SRF will then no longer satisfy (5)–(8). Nevertheless, the analysis ensemble will be consistent with the desired statistics if we regard the inflated forecast ensemble as the prior.

Covariance localization, as applied to ensemble SRFs by Anderson [1] and Whitaker and Hamill [25], involves a modification to the computation of the gain matrix (7) in the analysis step. As pointed out by Whitaker and Hamill [25], this causes inconsistencies between the value of $P^a$ computed using $X^a$ as in (3) and the value computed using $K$, $X^f$ and $Y^f$ as in (6). However, for an unbiased algorithm, the analysis perturbation matrix $X^a$ will still satisfy $X^a\mathbf{1} = 0$, so the ensemble will still be centred on the computed mean.
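A minimal sketch of the inflation step just described; since the perturbations are scaled about the ensemble mean, an ensemble centred on its mean stays centred.

```python
import numpy as np

def inflate(ens, alpha):
    """Multiplicative covariance inflation: x_i -> alpha*(x_i - x_bar) + x_bar."""
    x_bar = ens.mean(axis=1, keepdims=True)  # ensemble mean, unchanged by inflation
    return alpha * (ens - x_bar) + x_bar
```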
6. Summary and discussion

Since its original presentation by Evensen [7], several alternative formulations of the EnKF have been published. The ensemble SRF framework of Tippett et al. [23] encompasses several variants. Not all ensemble SRFs bear the desired KF-like relationship to the forecast ensemble. In particular, the analysis ensemble mean may not be equal to the analysis state estimate. We denote filters with this problem as biased ensemble SRFs.

The analysis ensemble statistics produced by a biased ensemble SRF are undesirable for a number of reasons beyond the fact that a biased mean tends to put the filter's best state estimate in the wrong place. Such a bias would not be too great a problem if it were accompanied by an increase in the size of the error estimate provided by the filter's covariance matrix. Users of the output would then be aware of the increased error, although they would remain unaware that part of the error is systematic rather than random. However, there is actually a decrease in the size of the error estimate rather than an increase, and the worse the bias, the worse the overconfidence of the error estimate. Such problems could ultimately lead to filter divergence.

While this problem has been previously recognized and dealt with for particular implementations (e.g. the revised ETKF of Wang et al. [24]), and some more general criteria have been developed for linear [21] and nonlinear [15] observation operators, these results are not widely known. This paper brings these previously published results together with new results to give a set of generic necessary and sufficient conditions for the ensemble to remain centred on the analysis state estimate. We provide generic, simple proofs of these criteria and clarify some of the theory. Possibly the simplest tests to use in practice state that a filter is unbiased if its post-multiplier matrix is symmetric or if it can be written in pre-multiplier form. The symmetry of the post-multiplier matrix is not a necessary condition. However, under certain nonrestrictive assumptions on $X^f$, unbiased filters can always be written in pre-multiplier form. For random-rotations formulations of the ensemble SRF, the post-multiplier orthogonal matrices should have $\mathbf{1}$ as an eigenvector. Indeed, in Appendix B we use this result to show that the set of unbiased ensemble SRFs is in one-to-one correspondence with the set of orthogonal matrices $O(1) \times O(m-1)$.

In this paper it has been assumed that the ensemble provides both a best state estimate (the ensemble mean) and a measure of the uncertainty in this estimate (the ensemble covariance matrix). However, an alternative approach is possible in which a best state estimate is maintained separately from the ensemble, which still provides the measurement of estimation error. The forecast and analysis perturbation matrices are taken relative to the forecast and analysis best state estimates rather than the ensemble means. It is then not necessary for the columns of these matrices to sum to zero, and hence there is no need to impose an unbiasedness condition in the analysis step. An example of such a filter is the maximum likelihood ensemble filter (MLEF) of Zupanski [26], in which the analysis step updates the best state estimate using 3D-Var (with a cost function that uses the forecast ensemble covariance matrix instead of a static background error covariance matrix) and updates the ensemble perturbations using an ETKF.

Finally, it must be stated that the type of bias discussed in this paper is not the only type of bias that may be encountered with an EnKF. Inconsistent ensemble statistics have been observed in formulations of the EnKF other than the original ETKF. Houtekamer and Mitchell [14] present results showing problems with a perturbed observation EnKF, and Anderson [1] discusses the issue in the context of the EAKF. The causes of the inconsistencies in these cases are not due to the issue identified here. They are probably due to the use of small ensembles [10] and to other approximations made in the course of deriving the filters. The more fundamental problems of sampling errors and model and observation biases are not addressed here. Thus, while the criteria for unbiasedness developed in this paper are not a cure-all, they should nevertheless be useful to designers of ensemble SRFs in the future.

Acknowledgements

This paper is partially based on the first author's Master's dissertation, written whilst he was a NERC-funded student in the Department of Mathematics at the University of Reading. The second author is supported in part by an RCUK Academic Fellowship and the third in part by the NERC Centre of Excellence, the Data Assimilation Research Centre (DARC).

Appendix A. A linear model

The equations of motion for the linear model used in the experiments are given by

$$\dot{z} = Az, \tag{A.1}$$
where $z$ is an $8 \times 1$ vector and $A$ is the $8 \times 8$ matrix constructed as

$$A = QJQ^T. \tag{A.2}$$

$Q$ is an $8 \times 8$ symmetric orthogonal matrix whose $i,j$-th component is given by

$$Q_{i,j} = \frac{2}{\sqrt{17}} \sin\left(\frac{2ij\pi}{17}\right). \tag{A.3}$$

$J$ is a block diagonal matrix given by

$$J = \begin{pmatrix}
\frac{1}{2} & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -2 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & -2 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -\frac{1}{2} & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & -\frac{1}{2}
\end{pmatrix}. \tag{A.4}$$

It is clear from the construction of $A$ that this matrix has complex eigenvalues with both positive and negative real parts. This system is discretized and solved using the MATLAB ode45 function. This function uses an explicit Runge–Kutta (4,5) pair with an adaptive step size [6]. The initial conditions used for the truth in the experiments in Sections 2 and 5 are given by 10 times the last column of $Q$.
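A Python sketch of this system, assuming the block entries of $J$ as printed in (A.4); SciPy's RK45 integrator is used as an analogue of MATLAB's ode45, and all names are our own.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import block_diag

i, j = np.meshgrid(np.arange(1, 9), np.arange(1, 9), indexing="ij")
Q = (2 / np.sqrt(17)) * np.sin(2 * i * j * np.pi / 17)   # Eq. (A.3), symmetric orthogonal

def block(a, b):
    """2x2 block with eigenvalues a +/- ib."""
    return np.array([[a, b], [-b, a]])

# Eq. (A.4), assuming the block entries as reconstructed above
J = block_diag(block(0.5, 1), block(1, -2), block(-1, 2), block(-0.5, 1))
A = Q @ J @ Q.T                                           # Eq. (A.2)

z0 = 10 * Q[:, -1]                                        # truth initial condition
sol = solve_ivp(lambda t, z: A @ z, (0.0, 10.0), z0, method="RK45")
```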
Appendix B. Structure of the set of unbiased ensemble SRFs

This appendix describes the set of unbiased ensemble SRFs in terms of a well-known group of matrices. To establish this result, we first need to prove some preliminaries.

Theorem B.1. If $X_1$ and $X_2$ are two $n \times m$ matrices such that $X_1 X_1^T = X_2 X_2^T$, then there exists an orthogonal matrix $U$ such that $X_2 = X_1 U$.

The proof of this theorem is omitted. It is very similar to a proof of the existence of the singular value decomposition [22, section 6.6]; alternatively see [19]. Theorem B.1 allows us to relate all SRFs to a given reference solution.

Theorem B.2. Let $T_1$ be a solution of the matrix square root condition (12). Then any solution of (12) may be uniquely expressed in the form $T = T_1 U$ where $U$ is orthogonal.

Proof. Theorem B.1 establishes the existence of an orthogonal $U$ such that $T = T_1 U$. To show the uniqueness of $U$, suppose that there exists an orthogonal matrix $U_1$ such that $T_1 U = T_1 U_1$. Since $T_1$ is nonsingular (Theorem 6), it may be cancelled from both sides of this equation to give $U = U_1$. Therefore $U$ is unique. □

Theorem B.2 implies the following description of the set of all solutions $T$ of (12) in terms of the group $O(m)$ of all $m \times m$ orthogonal matrices.

Corollary B.3. Let $T_1$ be a solution of (12). Then $U \leftrightarrow T_1 U$ defines a one-to-one correspondence between $O(m)$ and the solutions $T$ of (12).

Thus the set of all ensemble SRFs is in one-to-one correspondence with $O(m)$. Now we restrict ourselves to the set of unbiased ensemble SRFs.

Theorem B.4. Suppose that $X^f$ is nondegenerate and that $T_1$ is the post-multiplier matrix of an unbiased ensemble SRF. Then $U \leftrightarrow T_1 U$ defines a one-to-one correspondence between the subgroup of all matrices $U$ in $O(m)$ that have $\mathbf{1}$ as an eigenvector and the set of post-multiplier matrices of unbiased ensemble SRFs.

Proof. This follows from Corollary B.3 and Theorems 3 and 7. □

The subgroup in Theorem B.4 is given more concrete form by the following theorem.

Theorem B.5. There is a one-to-one correspondence between the subgroup of all matrices in $O(m)$ that have $\mathbf{1}$ as an eigenvector and the group $O(1) \times O(m-1)$.

Proof. Suppose $W$ is any $m \times m$ orthogonal matrix; then $U \leftrightarrow W^T UW$ is a one-to-one correspondence between $O(m)$ and itself. Now we choose $W$ to be an orthogonal matrix in which the first column is a scalar multiple of $\mathbf{1}$, say $\alpha\mathbf{1}$. Writing the coordinate vector $e_1 = (1, 0, \ldots, 0)^T$, we have $We_1 = \alpha\mathbf{1}$. Thus, under this correspondence, matrices $U$ that have $\mathbf{1}$ as an eigenvector correspond to matrices $V = W^T UW$ that have $e_1$ as an eigenvector. The latter matrices are those of the form

$$V = \begin{pmatrix} V_1 & 0 \\ 0 & V_2 \end{pmatrix}, \tag{B.1}$$

where $V_1$ is an element of $O(1)$ (that is to say, $V_1 = \pm 1$) and $V_2$ is an element of $O(m-1)$. This establishes the required correspondence. □

Thus, in the case of nondegenerate $X^f$, the set of all unbiased ensemble SRFs is in one-to-one correspondence with $O(1) \times O(m-1)$.
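Theorem B.5 also suggests a practical recipe for drawing the random rotations of Theorem 3: conjugate a block matrix $\mathrm{diag}(V_1, V_2)$ by an orthogonal $W$ whose first column is proportional to $\mathbf{1}$. The following sketch (our own construction, taking $V_1 = 1$) produces a random $U$ in $O(m)$ with $\mathbf{1}$ as an eigenvector.

```python
import numpy as np
from scipy.linalg import block_diag

def random_unbiased_rotation(m, rng):
    """Random U in O(m) with U 1 = 1, built as W diag(1, V2) W^T."""
    M = rng.standard_normal((m, m))
    M[:, 0] = 1.0                           # steer W's first column towards 1
    W, _ = np.linalg.qr(M)                  # first column of W is +/- 1/sqrt(m) * 1
    V2, _ = np.linalg.qr(rng.standard_normal((m - 1, m - 1)))  # random element of O(m-1)
    return W @ block_diag(1.0, V2) @ W.T

rng = np.random.default_rng(2)
U = random_unbiased_rotation(4, rng)
print(np.allclose(U @ np.ones(4), np.ones(4)))   # True: 1 is an eigenvector of U
```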
References

[1] J.L. Anderson, An ensemble adjustment Kalman filter for data assimilation, Mon. Weather Rev. 129 (2001) 2884–2903.
[2] J.L. Anderson, S.L. Anderson, A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts, Mon. Weather Rev. 127 (1999) 2741–2758.
[3] R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, John Wiley & Sons Ltd., Chichester, 1989.
[4] C.H. Bishop, B.J. Etherton, S.J. Majumdar, Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects, Mon. Weather Rev. 129 (2001) 420–436.
[5] G. Burgers, P.J. van Leeuwen, G. Evensen, Analysis scheme in the ensemble Kalman filter, Mon. Weather Rev. 126 (1998) 1719–1724.
[6] J.R. Dormand, P.J. Prince, A family of embedded Runge–Kutta formulae, J. Comput. Appl. Math. 6 (1980) 19–26.
[7] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res. 99 (1994) 10143–10162.
[8] G. Evensen, The Ensemble Kalman Filter: Theoretical formulation and practical implementation, Ocean Dyn. 53 (2003) 343–367.
[9] G. Evensen, Sampling strategies and square root analysis schemes for the EnKF, Ocean Dyn. 54 (2004) 539–560.
[10] R. Furrer, T. Bengtsson, Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants, J. Multivariate Anal. 98 (2007) 227–255.
[11] A. Gelb (Ed.), Applied Optimal Estimation, The M.I.T. Press, 1974, 374 pp.
[12] G.H. Golub, C.F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, 1996, 694 pp.
[13] T.M. Hamill, J.S. Whitaker, C. Snyder, Distance-dependent filtering of background error covariance estimates in an Ensemble Kalman Filter, Mon. Weather Rev. 129 (2001) 2776–2790.
[14] P.L. Houtekamer, H.L. Mitchell, Data assimilation using an ensemble Kalman filter technique, Mon. Weather Rev. 126 (1998) 796–811.
[15] B.R. Hunt, E.J. Kostelich, I. Szunyogh, Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter, Physica D 230 (2007) 112–126.
[16] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970, 376 pp.
[17] O. Leeuwenburgh, G. Evensen, L. Bertino, The impact of ensemble filter definition on the assimilation of temperature profiles in the tropical Pacific, Q. J. R. Meteorol. Soc. 131 (2005) 3291–3300.
[18] D.M. Livings, Aspects of the ensemble Kalman filter, Master's Thesis, University of Reading, 2005.
[19] D.M. Livings, S.L. Dance, N.K. Nichols, Unbiased ensemble square root filters, Numerical Analysis Report 06/06, University of Reading Mathematics Dept., 2006. Available from: http://www.maths.rdg.ac.uk/research/publications/publications.asp.
[20] A.C. Lorenc, The potential of the ensemble Kalman filter for NWP — a comparison with 4D-Var, Q. J. R. Meteorol. Soc. 129 (2003) 3183–3203.
[21] E. Ott, B.R. Hunt, I. Szunyogh, A.V. Zimin, E.J. Kostelich, M. Corazza, E. Kalnay, D.J. Patil, J.A. Yorke, A local ensemble Kalman filter for atmospheric data assimilation, Tellus A 56 (2004) 415–428.
[22] G.W. Stewart, Introduction to Matrix Computations, Academic Press, 1973, 442 pp.
[23] M.K. Tippett, J.L. Anderson, C.H. Bishop, T.M. Hamill, J.S. Whitaker, Ensemble square root filters, Mon. Weather Rev. 131 (2003) 1485–1490.
[24] X. Wang, C.H. Bishop, S.J. Julier, Which is better, an ensemble of positive–negative pairs or a centered spherical simplex ensemble? Mon. Weather Rev. 132 (2004) 1590–1605.
[25] J.S. Whitaker, T.M. Hamill, Ensemble data assimilation without perturbed observations, Mon. Weather Rev. 130 (2002) 1913–1924.
[26] M. Zupanski, Maximum likelihood ensemble filter: Theoretical aspects, Mon. Weather Rev. 133 (2005) 1710–1726.