Physics Letters A 330 (2004) 365–370 www.elsevier.com/locate/pla
Estimating the state of large spatio-temporally chaotic systems

E. Ott, B.R. Hunt, I. Szunyogh*, A.V. Zimin, E.J. Kostelich¹, M. Corazza, E. Kalnay, D.J. Patil, J.A. Yorke

University of Maryland, College Park, MD 20742, USA
¹ Permanent address: Department of Mathematics and Statistics, Arizona State University, Tempe, AZ 85287, USA
* Corresponding author. E-mail address: [email protected] (I. Szunyogh).
Received 5 April 2004; accepted 4 August 2004. Available online 20 August 2004. Communicated by A.R. Bishop.
Abstract

We consider the estimation of the state of a large spatio-temporally chaotic system from noisy observations and knowledge of a system model. Standard state estimation techniques using the Kalman filter approach are not computationally feasible for systems with very many effective degrees of freedom. We present and test a new technique (called a Local Ensemble Kalman Filter), generally applicable to large spatio-temporally chaotic systems for which correlations between system variables evaluated at different points become small at large separation between the points.
© 2004 Elsevier B.V. All rights reserved.

PACS: 05.45.Jn; 89.75.-k; 92.60.Wc

Keywords: Spatio-temporal chaos; Kalman filter; Weather forecasting; Chaos
Spatio-temporally chaotic systems abound in nature (e.g., the Earth’s atmosphere, oceans, engineering fluid flows, chemically reacting systems, etc.). Recently there has been much work addressing the general nature of spatio-temporal chaos [1]. One result is that for many such systems the fractal dimension of the global system attractor scales linearly with volume
of the system. Thus such spatio-temporally chaotic systems become more complex as their size increases. In this Letter we develop and illustrate a new data assimilation scheme applicable to large spatio-temporally chaotic systems. We call our technique the Local Ensemble Kalman Filter method. Data assimilation is the process by which an estimate of the system state is obtained through the use of measured data (which may be noisy and may describe the system state incompletely) and a model of the system. Such state estimation (also called analysis) is necessary for formulating initial conditions for insertion into a model
for the purpose of making predictions of the future system state (e.g., data assimilation is a key step in numerical weather forecasting). In addition, state estimation is also necessary in the control of systems and for diagnostic studies (e.g., for assessing climate change). In the case of linear system dynamics, the problem of state estimation is optimally solved by the classic Kalman filter method [2,3]; and, via linearization about orbits, it can be adapted (though it is no longer optimal) to nonlinear systems (the so-called extended Kalman filter [3]). Although the Kalman filter methodology has been extremely successful for low-dimensional systems, its straightforward application to very high-dimensional systems is problematic because it requires matrix operations on very large matrices. For example, for existing global models of the Earth's atmosphere used in weather prediction, rapid inversion of a matrix of dimension in the millions would be required. This task is many orders of magnitude beyond present computer capabilities. Furthermore, in the future even larger matrices may be involved due to increased model resolution. Thus a number of alternative techniques have been proposed [4,5]. These techniques avoid linearization about nonlinear orbits (used in an extended Kalman filter) by making use of the ensemble Kalman filter idea [4,5] (explained subsequently). Our technique is also an ensemble Kalman filter. It is based on the hypothesis (motivated by results from Ref. [6]) that, when the spatial domain of the problem is divided up into regions of moderate size, the error vectors of different forecasts (forecast minus true state) in such regions tend to lie approximately in a subspace of low dimension, even if the number of system state variables in each region is high. We expect that this hypothesis will be valid for many spatio-temporally chaotic systems. This implies that operations on only relatively low-dimensional matrices are required, and, using this, we demonstrate a method in which the analyses in each of the local regions are independent, allowing massively parallel computation to be exploited. In what follows we first present our method, and then test it on a simple spatio-temporally chaotic model system [7]. For simplicity we consider a scalar variable ξ(m, t) evaluated on a one-dimensional grid m = 1, 2, . . . , M, with periodic boundary conditions ξ(1, t) = ξ(M + 1, t). We assume that ξ(m, t) evolves via a known set
of equations
$$\frac{d\xi(m,t)}{dt} = G_m\bigl(\xi(1,t), \xi(2,t), \ldots, \xi(M,t)\bigr). \tag{1}$$
(In what follows the time variable $t$ will often be suppressed.) For each $m$ we consider the $(2\ell+1)$-dimensional vector $x_m = [\xi(m-\ell), \xi(m-\ell+1), \ldots, \xi(m+\ell)]^T$, where $\ell$ is sufficiently large compared to a correlation length (which is much less than $M$). We call $x_m$ the local vector, and we call the points $m-\ell, \ldots, m, \ldots, m+\ell$ local region $m$. Now assume that at time $t = t_0$ we have some knowledge of $x_m$ in the form of its probability distribution function $F_m(x_m)$, which we further assume can be usefully approximated as Gaussian,
$$F_m(x_m) \sim \exp\Bigl[-\tfrac{1}{2}\bigl(x_m - \bar x_m^b\bigr)^T \bigl(P_m^b\bigr)^{-1} \bigl(x_m - \bar x_m^b\bigr)\Bigr],$$
where $P_m^b$ and $\bar x_m^b$ are, respectively, the so-called background error covariance matrix and most probable state for the local vector. In our procedure $P_m^b$ and $\bar x_m^b$ are derived from a previously obtained (see below) $(k+1)$-member ensemble of (background) global fields, $\xi^{b(i)}(m)$, $i = 1, 2, \ldots, (k+1)$. These $(k+1)$ fields represent the information in $\bar x_m^b$ and $P_m^b$ via
$$\bar x_m^b = (k+1)^{-1} \sum_i x_m^{b(i)}, \tag{2}$$
$$P_m^b = X_m^b X_m^{bT}, \tag{3}$$
where $X_m^b = k^{-1/2}\bigl[\delta x_m^{b(1)} \,\big|\, \delta x_m^{b(2)} \,\big|\, \cdots \,\big|\, \delta x_m^{b(k+1)}\bigr]$, $\delta x_m^{b(i)} = x_m^{b(i)} - \bar x_m^b$, and $x_m^{b(i)}$ is the local vector corresponding to ensemble member $i$. We will typically consider $k < (2\ell+1)$. Thus the $(2\ell+1)$ by $(2\ell+1)$ matrix $P_m^b$ generally has rank $k < (2\ell+1)$. Therefore, we introduce the $k$-dimensional subspace $S_m$ orthogonal to the $[(2\ell+1)-k]$-dimensional null space of $P_m^b$. The space $S_m$ is spanned by the $k$ orthonormal eigenvectors $\{u_m^{(1)}, u_m^{(2)}, \ldots, u_m^{(k)}\}$ of $P_m^b$ whose eigenvalues are nonzero. This suggests that we reduce the dimensionality of our problem by working in the subspace $S_m$, using the basis vectors $\{u_m^{(1)}, u_m^{(2)}, \ldots, u_m^{(k)}\}$ for $S_m$. Defining a $(2\ell+1)$ by $k$ matrix $Q_m = \bigl[u_m^{(1)} \,\big|\, u_m^{(2)} \,\big|\, \cdots \,\big|\, u_m^{(k)}\bigr]$, we denote the $u_m^{(i)}$-basis representation of a $(2\ell+1)$-dimensional local vector $w_m$ projected onto $S_m$ by
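For concreteness, the construction of $\bar x_m^b$, $X_m^b$, and $Q_m$ from a background ensemble takes only a few lines of linear algebra. The following Python/NumPy sketch is ours, not the authors' code; the function and variable names are illustrative, and we obtain the nonzero-eigenvalue eigenvectors of $P_m^b$ from the left singular vectors of $X_m^b$ (one choice among several):

```python
import numpy as np

def local_background_stats(ens_local):
    """Background mean (Eq. (2)), perturbation matrix X_m^b (Eq. (3)),
    and basis Q_m for the subspace S_m.

    ens_local: (2*ell+1, k+1) array; column i is the local vector
    x_m^{b(i)} of background ensemble member i.
    """
    k = ens_local.shape[1] - 1
    xbar_b = ens_local.mean(axis=1)                   # Eq. (2)
    dXb = (ens_local - xbar_b[:, None]) / np.sqrt(k)  # so P_m^b = dXb @ dXb.T, Eq. (3)
    # The left singular vectors of X_m^b with nonzero singular values
    # are the orthonormal eigenvectors of P_m^b that span S_m.
    U, s, _ = np.linalg.svd(dXb, full_matrices=False)
    Q = U[:, :k]                                      # (2*ell+1) x k matrix Q_m
    return xbar_b, dXb, Q
```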
$\hat w_m = Q_m^T w_m$. Similarly, for a $(2\ell+1)$ by $(2\ell+1)$ symmetric matrix $M_m$, the action of this matrix restricted to $S_m$ is described by the $k$ by $k$ matrix $\hat M_m = Q_m^T M_m Q_m$.

Now suppose that we are given observational data, which for simplicity we assume to be measured at the time $t = t_0$. As an illustrative case, we assume that, within the local region $m$, $o_m \le (2\ell+1)$ of the components of $x_m$ are measured. Let $y_m = (y_m^{(1)}, y_m^{(2)}, \ldots)^T$ be the $o_m$-dimensional vector of these observations in local region $m$, and assume that the errors in these measurements are unbiased and normally distributed with $o_m$ by $o_m$ covariance matrix $R_m$. The error in the measurement can be represented as $(y_m - H_m x_m)$, where $H_m$ is an $o_m$ by $(2\ell+1)$ matrix whose $(i, m')$ element is one if $\xi$ at location $m'$ is measured and is recorded as the $i$th component of $y_m$, and is zero otherwise. With this information, we can obtain a new, tighter estimate of the probability distribution of the local vector. We denote by $x_m^a$ the random variable associated with the local vector after the observations are taken into account, and we wish to estimate the probability distribution function of $x_m^a$; i.e., obtain its mean and covariance matrix. Let
$$\Delta x_m^a = x_m^a - \bar x_m^b. \tag{4}$$
Since our background covariance matrix has null space orthogonal to $S_m$, we consider $\Delta x_m^a$ to lie in $S_m$. Representing $\Delta x_m^a$ in the $u_m^{(i)}$ basis, we have $\Delta\hat x_m^a = Q_m^T \Delta x_m^a$, $\Delta x_m^a = Q_m \Delta\hat x_m^a$. Using standard techniques [3], the Gaussian probability distribution of $\Delta\hat x_m^a$ is given by the following covariance matrix and mean:
$$\hat P_m^a = \hat P_m^b \bigl(I + \hat H_m^T R_m^{-1} \hat H_m \hat P_m^b\bigr)^{-1}, \tag{5}$$
$$\Delta\hat{\bar x}_m^a = \hat P_m^a \hat H_m^T R_m^{-1} \bigl(y_m - H_m \bar x_m^b\bigr), \tag{6}$$
where $\hat H_m = H_m Q_m$ and $\hat P_m^b = Q_m^T P_m^b Q_m$. We call this process of incorporation of the data to obtain a new state distribution "the analysis" (hence the superscript $a$). We have for the analysis mean
$$\bar x_m^a = Q_m \Delta\hat{\bar x}_m^a + \bar x_m^b. \tag{7}$$
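In the $u_m^{(i)}$ basis, the update (5)–(7) involves only $k \times k$ and $o_m \times k$ arrays. A minimal sketch, again our own and continuing the hypothetical helper above (`np.linalg.solve` stands in for the explicit inverses):

```python
def local_analysis(xbar_b, dXb, Q, H, R, y):
    """Analysis covariance and mean in S_m, Eqs. (5)-(7).

    H: o_m x (2*ell+1) observation matrix, R: o_m x o_m observation
    error covariance, y: the o_m observations in local region m.
    """
    k = Q.shape[1]
    Pb_hat = Q.T @ dXb @ dXb.T @ Q              # P_m^b restricted to S_m
    H_hat = H @ Q                               # H_hat = H_m Q_m
    A = np.eye(k) + H_hat.T @ np.linalg.solve(R, H_hat) @ Pb_hat
    Pa_hat = np.linalg.solve(A.T, Pb_hat.T).T   # Eq. (5): Pa = Pb (I + H^T R^-1 H Pb)^-1
    dxbar_a_hat = Pa_hat @ H_hat.T @ np.linalg.solve(R, y - H @ xbar_b)  # Eq. (6)
    xbar_a = Q @ dxbar_a_hat + xbar_b           # Eq. (7)
    return Pb_hat, Pa_hat, xbar_a
```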
Having obtained the analysis information, $\hat P_m^a$ and $\bar x_m^a$, we now wish to determine an ensemble of $(k+1)$ global analysis fields $\xi^{a(i)}(m, t_0)$, $i = 1, 2, \ldots, (k+1)$, where these analysis fields incorporate our current knowledge of $\hat P_m^a$ and $\bar x_m^a$. Once
this is done, we can use our model, Eq. (1), to advance each member of the analysis ensemble to the next time $t_0 + \Delta t$, thus yielding a new background ensemble, $\xi^{b(i)}(m, t_0 + \Delta t)$, representing the distribution of the system state at time $t_0 + \Delta t$ (see Eqs. (2) and (3)). This would complete the loop, allowing assimilation of data at the new time $t_0 + \Delta t$, and similarly for all subsequent times, $t = t_0 + j\Delta t$, $j = 2, 3, \ldots$. To determine the ensemble of global analysis fields, we first find an ensemble of local analysis vectors $x_m^{a(i)}$ corresponding to each local region of length $(2\ell+1)$ centered at point $m$. The global analysis ensemble could, for example, then be constructed by taking for $\xi^{a(i)}(m, t_0)$ the component of the $(2\ell+1)$-dimensional vector $x_m^{a(i)}(t_0)$ corresponding to the center of the local region $m$. Alternatively, $\xi^{a(i)}(m, t_0)$ can be constructed by averaging the results of the analysis for $\xi$ at $m$ from several overlapping local regions (e.g., in our numerical experiments below, we average over the five local regions $m \pm 2$, $m \pm 1$, and $m$).

It remains to find the $x_m^{a(i)}$. Let
$$x_m^{a(i)} = \bar x_m^a + \delta x_m^{a(i)}, \tag{8}$$
with $\delta x_m^{a(i)} = Q_m\, \delta\hat x_m^{a(i)}$. Thus we now wish to determine the $\delta\hat x_m^{a(i)}$ given $\hat P_m^a$, subject to the requirement that $\sum_i \delta\hat x_m^{a(i)} = 0$ (so that the average over $i$ of the $x_m^{a(i)}$ is $\bar x_m^a$; see (8)). Since the analysis ensemble represents the analysis covariance, we require
$$\hat P_m^a = \hat X_m^a \hat X_m^{aT}, \tag{9}$$
where $\hat X_m^a = k^{-1/2}\bigl[\delta\hat x_m^{a(1)} \,\big|\, \delta\hat x_m^{a(2)} \,\big|\, \cdots \,\big|\, \delta\hat x_m^{a(k+1)}\bigr]$. We seek a solution to Eq. (9) for the $\delta\hat x_m^{a(i)}$ in the form
$$\hat X_m^a = \hat X_m^b Y_m. \tag{10}$$
That is, we formally express each of the $\delta\hat x_m^{a(i)}$ as a linear combination of the $\delta\hat x_m^{b(i)}$. Thus from (9) and (10), we have that $Y_m$ satisfies
$$\hat P_m^a = \hat X_m^b Y_m Y_m^T \hat X_m^{bT}, \tag{11}$$
where $\hat X_m^b = Q_m^T X_m^b$. A solution to (11) is the following² (see Ref. [5] for discussion and references):

² Although the solution of (11) for $Y_m$ is not unique, Eq. (12) is the optimal solution in that it yields an ensemble of analysis perturbations $\delta x_m^{a(i)}$ that is closest to the background perturbations $\delta x_m^{b(i)}$ in an appropriate mean squared sense [5].
$$Y_m = \Bigl[I + \hat X_m^{bT} \bigl(\hat P_m^b\bigr)^{-1} \bigl(\hat P_m^a - \hat P_m^b\bigr) \bigl(\hat P_m^b\bigr)^{-1} \hat X_m^b\Bigr]^{1/2}, \tag{12}$$
where the matrix square root is defined to be positive and symmetric. (That (12) solves (11) can be verified by directly substituting it into (11) and using $\hat P_m^b = \hat X_m^b \hat X_m^{bT}$, which follows from (3).) Eq. (12) completes the specification of our Local Ensemble Kalman Filter method. We emphasize that our procedure can be generalized in several ways [5]: the spatial domain can be of higher dimension than one, the dependent variable can be a vector rather than a scalar, each measurement can be a function of several model-state variables, the dimension of $S_m$ can be taken to be less than $k$, etc.

Summarizing our method: (i) for each background ensemble member $\xi^{b(i)}(m)$, form local vectors $x_m^{b(i)}$ in each local region $m$; (ii) project the $x_m^{b(i)}$ into $S_m$, $\hat x_m^{b(i)} = Q_m^T x_m^{b(i)}$; (iii) obtain $\hat P_m^a$ and $\bar x_m^a$, and form a new local analysis ensemble $x_m^{a(i)}$ (Eqs. (8), (11) and (12)); (iv) using the $x_m^{a(i)}$, form an ensemble of global analysis fields $\xi^{a(i)}$; (v) use the model, Eq. (1), to advance $\xi^{a(i)}(m, t_0)$ to $\xi^{b(i)}(m, t_0 + \Delta t)$, and return to (i). We note that the key point is that the estimate in (iii) makes use of the forecast state probability distribution (i.e., $\bar x_m^b$ and $P_m^b$) estimated via dynamical evolution of the ensemble from the previous analysis time.
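As a rough illustration of steps (8)–(12), the symmetric square root in (12) can be evaluated by eigendecomposition. This is again a sketch under our own conventions, continuing the hypothetical helpers above, not the authors' implementation:

```python
def local_analysis_ensemble(xbar_a, dXb, Q, Pb_hat, Pa_hat):
    """Analysis ensemble of local vectors via Eqs. (8)-(12)."""
    k = Q.shape[1]
    dXb_hat = Q.T @ dXb                          # X_m^b restricted to S_m
    Pb_inv = np.linalg.inv(Pb_hat)
    C = Pb_inv @ (Pa_hat - Pb_hat) @ Pb_inv
    M = np.eye(k + 1) + dXb_hat.T @ C @ dXb_hat  # bracketed matrix in Eq. (12)
    # Positive symmetric square root Y_m; eigh applies since M is symmetric.
    evals, evecs = np.linalg.eigh((M + M.T) / 2)
    Y = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    dXa_hat = dXb_hat @ Y                        # Eq. (10)
    # Undo the k^{-1/2} column scaling to recover the perturbations, Eq. (8):
    return xbar_a[:, None] + np.sqrt(k) * (Q @ dXa_hat)
```

Note that because the columns of $\hat X_m^b$ sum to zero and the symmetric root $Y_m$ leaves the vector of ones invariant, the analysis perturbations produced this way also sum to zero, as the requirement below (8) demands.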
In order to illustrate and test our method, we consider the simple spatio-temporal model proposed in [7], Eq. (1) with
$$G_m = \bigl[\xi(m+1, t) - \xi(m-2, t)\bigr]\,\xi(m-1, t) - \xi(m, t) + 8, \tag{13}$$
with periodic boundary conditions $\xi(1, t) = \xi(M+1, t)$. For this system the attractor's information dimension calculated from Lyapunov exponents with $M = 40$ is 27.1 [7] (for larger $M$ the dimension is expected to scale linearly with $M$ [1]). In our tests we first generate the 'true state', $\xi^*(m, t)$, $m = 1, \ldots, M$, by a long model integration; we then create 'observations' at every time interval $\Delta t = 0.05$ by adding uncorrelated, normally distributed noise with variance 1.00 to the true state (this observational noise is to be compared with the value 3.61 for the time averaged rms deviation of the solution $\xi(m, t)$ of (13) from its mean).
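Eq. (13) is the model of Lorenz and Emanuel [7]. A sketch of the truth run and synthetic observations follows; the Letter does not specify the integration scheme, so the fourth-order Runge–Kutta step, the spin-up length, and the initial condition below are our assumptions:

```python
def G(xi):
    """Right-hand side of Eq. (13), periodic in m, forcing 8."""
    return (np.roll(xi, -1) - np.roll(xi, 2)) * np.roll(xi, 1) - xi + 8.0

def step(xi, dt=0.05):
    """One RK4 step over the observation interval dt = 0.05."""
    k1 = G(xi)
    k2 = G(xi + 0.5 * dt * k1)
    k3 = G(xi + 0.5 * dt * k2)
    k4 = G(xi + dt * k3)
    return xi + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

M = 40
rng = np.random.default_rng(seed=1)
truth = 8.0 + rng.standard_normal(M)    # arbitrary initial condition
for _ in range(1000):                   # long integration onto the attractor
    truth = step(truth)
obs = truth + rng.standard_normal(M)    # observation noise, variance 1.00
```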
Fig. 1. $\bar E$ vs. number of observations $s$ for $M = 40$. For the LEKF method the size of the local region is $\ell = 6$ and the ensemble size is $k + 1 = 10$. For the FEKF method $k + 1 = 40$.
Starting with observations at all $m$, the effect of a fixed, reduced observational network of size $s < M$ was studied by removing observations at randomly selected locations. The observations are assimilated at each interval $\Delta t$, and the accuracy of the analysis is measured by the time average, $\bar E$, of the rms analysis error,
$$E = \Bigl[M^{-1} \sum_m \bigl(\xi^a(m) - \xi^*(m)\bigr)^2\Bigr]^{1/2}.$$
For the sake of comparison with our Local Ensemble Kalman Filter (LEKF) results, we consider two standards: 'full' ensemble Kalman filter results (FEKF), and 'direct insertion' results (DI). The full ensemble Kalman filter may be regarded as the best achievable Kalman filter result obtainable when computer resources are not a constraint (in contrast to a case like operational weather prediction, this is indeed the situation for (13) with the $M$ values we use); it is obtained by considering the state $x(t) = [\xi(1, t), \xi(2, t), \ldots, \xi(M, t)]$ on the entire domain, rather than on a local region of size $(2\ell+1) < M$ (and by using a larger ensemble than is needed in the LEKF). In 'direct insertion' we take the analysis field at time $t_0$ (now a single field rather than an ensemble) at point $m$ to be the observation, if there is an observation at $m$; if $\xi$ at $m$ is not observed, we take the analysis at $m$ to be the background value (forecast) obtained by integrating the previous analysis via (13) from $t_0 - \Delta t$ to $t_0$. Results are shown for the time averaged error $\bar E$ versus the size $s$ of the observational network in Fig. 1 for $M = 40$. It is seen that both the LEKF method (with $\ell = 6$ and $k + 1 = 10$ ensemble members) and the FEKF method (with $k + 1 = 40$ ensemble members) are much more accurate than DI, and that, to within the width of the plotted line, the same accuracy is obtained for the LEKF and the FEKF methods.
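The DI baseline and the error measure $E$ are simple to state in code; `obs_idx` (the indices of the $s$ observed sites) is a hypothetical name of ours:

```python
def direct_insertion(forecast, obs, obs_idx):
    """DI analysis: the observation where one exists, else the forecast."""
    analysis = forecast.copy()
    analysis[obs_idx] = obs[obs_idx]
    return analysis

def rms_error(analysis, truth):
    """Analysis error E; its average over analysis times gives E-bar."""
    return np.sqrt(np.mean((analysis - truth) ** 2))
```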
The results in Fig. 1 for the LEKF and the FEKF methods are obtained using a small variance inflation; i.e., the analysis covariance was artificially inflated by adding $\epsilon I$ (where $I$ is the $k$ by $k$ unit matrix) to $\hat P_m^a$, where $\epsilon$ is small ($\epsilon = r\,[\mathrm{Tr}(\hat P_m^a)]/k$ with $r = 0.03$ for Fig. 1). Empirically, this decreases the time averaged error $\bar E$. Other ensemble Kalman filter techniques [4] also use variance inflation to compensate for a tendency to underestimate the uncertainty in the background state. This underestimation of the uncertainty may be due to nonlinearity, limited ensemble size, and/or model error (see next paragraph). For linear dynamics and no model error, the Kalman filter is optimal [2,3], and variance inflation (or deflation) could only increase $\bar E$.³

³ In the case of Fig. 1, there is no model error, and, for $k = 9$, the ensemble size is apparently sufficient (see Fig. 3), so that the improved performance with variance inflation is presumably due to nonlinearity of (13).

The discussion above and the results in Fig. 1 are for a 'perfect model' scenario: the evolution equations used for data assimilation are exactly the same as those by which the system state actually evolves. In practice, the model equations will typically only give an approximation to the dynamics of the real system. Furthermore, the real dynamics may be subject to noise of unknown characteristics.⁴ These considerations motivate us to examine a situation where (i) the true dynamics is subject to noise such that, at each time $n\Delta t$ in the integration of (13), we add to each $\xi(m, t)$ a random noise, uncorrelated in $m$, whose distribution is uniform in the interval $(-0.01, 0.01)$; and (ii) the model we use in doing the data assimilation differs from the true dynamics in that the last term in (13) is 8.4, rather than the 'true' value of 8. Due to (i) and (ii) the error will be greater than in the absence of (i) and (ii). A simple, practical way of crudely accounting for these added errors is by means of increased variance inflation, as described above. Varying $r$ to minimize $\bar E$, we obtain the results in Fig. 2. Again the FEKF and LEKF methods give almost identical results that are much better than for DI.

⁴ E.g., in numerical weather prediction, subgrid scale turbulence roughly acts like short spatial-scale noise.
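In code, this inflation is a one-line adjustment to $\hat P_m^a$ before the analysis ensemble is formed; the sketch below uses our notation, with $r = 0.03$ the value quoted above for Fig. 1:

```python
def inflate(Pa_hat, r=0.03):
    """Add eps*I to the analysis covariance, eps = r * Tr(Pa_hat) / k."""
    k = Pa_hat.shape[0]
    eps = r * np.trace(Pa_hat) / k
    return Pa_hat + eps * np.eye(k)
```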
Fig. 2. Similar to Fig. 1, but with error in the assumed model, as well as noise added to the true state evolution (see text).
Fig. 3. Dependence of $\bar E$ on ensemble size $k + 1$ for different system sizes $M$ for the perfect model scenario case with $s = M$ and with other parameters ($\ell$ and $r$) the same as in Fig. 1. (a) For the FEKF method, $k_{\min}$ varies approximately linearly with the size $M$. (b) For the LEKF method, $k_{\min} = 6$ independent of system size.
Fig. 3(a) and (b) ($s = M$, and the perfect model scenario of Fig. 1) show that, for both the LEKF and the FEKF methods, the smallest achievable error is attained when the ensemble size exceeds a minimum value, $k \ge k_{\min}$. While both methods yield approximately the same mean error for $k \ge k_{\min}$, $k_{\min}$ for the FEKF method increases approximately linearly with the system size (Fig. 3(a)), while $k_{\min}$ for the LEKF method is independent of $M$ (Fig. 3(b)). Furthermore, the FEKF method involves matrix operations on matrices of size $M$ by $M$, while the LEKF method only requires operations on smaller matrices of size $(2\ell+1)$ by $(2\ell+1)$ or $k$ by $k$, independent of $M$. Thus, for large systems, the LEKF method holds the promise of achieving accuracy approaching that of a full Kalman filter approach, but at very much reduced cost. Finally, we note that the LEKF numerical experiments reported above use $\ell = 6$ as the size of the local region. The results are almost the same for $\ell = 7$ and $\ell = 5$. However, if $\ell$ is taken to be too small, the accuracy of the procedure is found to degrade. For $\ell$ too large, good accuracy can still be obtained, but only at
the expense of increasing the ensemble size $(k + 1)$, thus degrading the computational efficiency (e.g., see Fig. 3). In general, a good choice for the local region size could be obtained experimentally by varying $\ell$ to obtain good accuracy and efficiency (this is how we chose $\ell = 6$ in our example).

In conclusion, we have introduced a new method for state estimation that has the potential of being accurate and computationally feasible for large systems. By operating in local regions of size much less than that of the full system, the method performs many independent analyses involving relatively small matrices, thus allowing for efficient, fast parallel computation.⁵

⁵ Preliminary application of our method to a real operational weather code (the National Center for Environmental Prediction MRF model) with $\sim 2 \times 10^4$ grid points on the Earth's surface and 28 height levels assimilates $3 \times 10^6$ observations in 5 minutes on a PC cluster.

Acknowledgements

This work was supported by the W.M. Keck Foundation, by a James S. McDonnell 21st Century Research Award, by the Office of Naval Research (Physics), by the Army Research Office, and by the National Science Foundation (Grant 0104087 and Grant PHYS 0098632).
References

[1] For example see: A. Torcini, A. Politi, G.P. Puccioni, G. Dalessandro, Physica D 53 (1991) 85; D.A. Egolf, H.S. Greenside, Nature 369 (1994) 129; S.M. Zoldi, H.S. Greenside, Phys. Rev. Lett. 78 (1997) 1687; E. Olbrich, R. Hegger, H. Kantz, Phys. Lett. A 244 (1998) 538; A. Pikovski, A. Politi, Nonlinearity 11 (1998) 1049.
[2] R. Kalman, Trans. Am. Soc. Mech. Eng., Ser. D: J. Basic Eng. 82 (1960) 35; R. Kalman, R. Bucy, Trans. Am. Soc. Mech. Eng., Ser. D: J. Basic Eng. 83 (1961) 95.
[3] E.g., see: C.K. Chui, G. Chen, Kalman Filtering, Springer-Verlag, Berlin, 1987.
[4] T.M. Hamill, M.J. Whitaker, C. Snyder, Mon. Weather Rev. 129 (2001) 2776; C. Kappenne, H. Rienecker, Mon. Weather Rev. 130 (2002) 2951; J.L. Anderson, Mon. Weather Rev. 129 (2001) 2884; C.H. Bishop, B.J. Etherton, S. Majumdar, Mon. Weather Rev. 129 (2001) 429; G. Evensen, J. Geophys. Res. 99 (1994) 10143; P.L. Houtekamer, H.L. Mitchell, Mon. Weather Rev. 129 (2001) 796; E. Kalnay, Atmospheric Modeling, Data Assimilation, and Predictability, Cambridge Univ. Press, 2002.
[5] E. Ott, et al., physics/0203058.
[6] D.J. Patil, et al., Phys. Rev. Lett. 86 (2001) 5878.
[7] E.N. Lorenz, K.A. Emanuel, J. Atmos. Sci. 55 (1998) 399.