Advances in Water Resources 86 (2015) 257–259
Advances in Water Resources journal homepage: www.elsevier.com/locate/advwatres
Preface
Data assimilation for improved predictions of integrated terrestrial systems
1. Introduction
Predicting states or fluxes in a terrestrial system, such as river discharge, groundwater recharge or air temperature, is done with terrestrial system models, which describe the processes in an approximate way. Terrestrial system model predictions are affected by uncertainty. Important sources of uncertainty are related to model forcings, initial conditions and boundary conditions, model parameters and the model itself. The relative importance of the different uncertainty sources varies according to the specific terrestrial compartment for which the model is built. For example, for weather prediction with atmospheric models it is believed that a dominant source of uncertainty is the initial model condition [12]. For groundwater models, on the other hand, a general assumption is that parameter uncertainty dominates the total model prediction uncertainty.
Sequential data assimilation techniques allow improving model predictions and reducing their uncertainty by correcting the predictions with measurement data. This can be done on-line with real-time measurement data. It can also be done off-line by updating model predictions with time series of historical data. Off-line data assimilation is especially interesting for estimating parameters in combination with model states, or for a reanalysis of past states. The most widely applied sequential data assimilation techniques for terrestrial system model predictions are the Ensemble Kalman Filter (EnKF) [8] and the Particle Filter (PF) [2]. EnKF provides an optimal solution for Gaussian distributed parameters, states and measurement data, whereas the PF is more flexible but computationally more expensive and provides in theory an optimal solution independent of the distribution type [22].
The historical evolution of the use of sequential data assimilation methods in terrestrial system models can be understood from compartment-specific model needs.
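To make the EnKF update mentioned above more concrete, the following minimal sketch shows a stochastic (perturbed-observation) analysis step for a single scalar observation, applied to an augmented vector that holds one state and one parameter. The function name `enkf_update`, the toy relation between state and parameter, and all numerical values are assumptions chosen for illustration only; they are not taken from any of the contributions to this special issue.

```python
import random
from statistics import mean


def enkf_update(ensemble, obs_index, obs_value, obs_var, rng):
    """Stochastic EnKF analysis step for one scalar observation.

    ensemble  : list of members, each a list of floats (states, optionally
                followed by parameters to form an augmented vector)
    obs_index : position of the observed variable within each member
    obs_value : measured value; obs_var is its error variance
    """
    n = len(ensemble)
    dim = len(ensemble[0])
    means = [mean(m[i] for m in ensemble) for i in range(dim)]
    # Sample covariance between every variable and the observed variable.
    cov_xy = [
        sum((m[i] - means[i]) * (m[obs_index] - means[obs_index])
            for m in ensemble) / (n - 1)
        for i in range(dim)
    ]
    # Kalman gain for a scalar observation: cov(x, y) / (var(y) + R).
    gain = [c / (cov_xy[obs_index] + obs_var) for c in cov_xy]
    updated = []
    for m in ensemble:
        # Each member sees its own perturbed observation, so that the
        # updated ensemble keeps a spread consistent with the gain.
        perturbed = obs_value + rng.gauss(0.0, obs_var ** 0.5)
        innovation = perturbed - m[obs_index]
        updated.append([m[i] + gain[i] * innovation for i in range(dim)])
    return updated


# Hypothetical toy problem: each member is [state h, parameter k].
rng = random.Random(0)
prior = []
for _ in range(200):
    k = rng.gauss(1.0, 0.5)            # uncertain parameter
    h = 2.0 * k + rng.gauss(0.0, 0.1)  # state that depends on the parameter
    prior.append([h, k])

posterior = enkf_update(prior, obs_index=0, obs_value=3.0, obs_var=0.01, rng=rng)

prior_mean = [mean(m[i] for m in prior) for i in range(2)]
post_mean = [mean(m[i] for m in posterior) for i in range(2)]
print("prior mean [h, k]:    ", prior_mean)
print("posterior mean [h, k]:", post_mean)
```

In this sketch only the state is observed; the parameter is corrected through its sample covariance with the observed state, which is the basic idea behind joint state and parameter updating with an augmented vector. Because the gain is linear in the innovation, the update is only optimal under the Gaussian assumptions stated above.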
Sequential data assimilation had already been used for several decades for operational weather prediction before it became popular in the land surface modeling community (e.g., [20]). This development can be understood from the increasing importance given to the lower atmospheric boundary condition for weather prediction. Later, EnKF was introduced in petroleum engineering and groundwater hydrology (e.g., [4] or [14]). Given the importance of parameter uncertainty for those compartment-specific applications, EnKF and other data assimilation algorithms were reformulated for joint state and parameter updating [4,16,23]. These developments also fed back into the land surface modeling community, where joint state-parameter updating was increasingly tested (e.g., [19]). The importance or usefulness of parameter updates in these systems is an open question.
http://dx.doi.org/10.1016/j.advwatres.2015.10.010 0309-1708/© 2015 Elsevier Ltd. All rights reserved.
Sequential data assimilation is
less often applied for the vadose zone, as parameter uncertainty also plays a major role in this compartment and relations between model states and parameters are highly non-linear and often also strongly non-Gaussian. The use of sequential data assimilation methods for the vadose zone is stimulated by real-time management problems like optimization of irrigation and by the central role of the vadose zone in integrated hydrological models and terrestrial system models. With the development of integrated terrestrial system models, such as coupled subsurface-land surface-atmosphere models [21], possibilities arise to use observations in a given terrestrial compartment to improve predictions for other compartments. In these models, coupling can be achieved through the continuity of variables (such as pressure) or fluxes within one system, instead of through exchange fluxes passed from one system to the other as boundary conditions. An example of data assimilation in a coupled system is the assimilation of piezometric head data to improve the characterization of the soil moisture profile in the unsaturated zone. Another example is the assimilation of soil moisture data to improve the estimation of the atmospheric boundary layer height. Data assimilation in integrated terrestrial system models requires an interdisciplinary effort. This was the motivation for this special issue. Many of the contributions to this special issue are related to the conference on “Computational Methods in Water Resources” in Stuttgart, Germany in June 2014. We identified three main focus areas, which will be further highlighted below.
2. Scope of the special issue
2.1. Methodological aspects
Sequential data assimilation in terrestrial system models, and especially the joint estimation of states and parameters, is affected by several problems.
For EnKF, these problems are associated with the low-rank approximation of the model error covariance matrix (i.e., the use of a limited number of ensemble members) and the linear approximation of the dependencies among different states and between states and parameters as induced by the covariance matrix. For PF, the main challenges are the large numbers of states and parameters in coupled models in combination with the high computational demand of this method. This raises the need for more advanced data assimilation methods that are efficient for large-scale problems and perform better for strongly non-linear and non-Gaussian problems. Ghorbanidehno et al. [9] present the spectral Kalman Filter (SpecKF), a new data assimilation algorithm for complex and large physical problems with many unknowns. The CPU-time needed by
SpecKF scales with the number of measurements (for EnKF it scales with the number of ensemble members), an advantage for many practical cases. Two synthetic studies show that SpecKF achieves a higher accuracy than EnKF at a comparable or smaller computational cost. Hut et al. [11] address the need for very large ensembles for data assimilation with complex forecast models. They suggest an alternative filter method in which the ensemble itself is not stored, but only its first and second moments. This makes the filter less memory intensive. The method is set up to be suitable for massively parallel computation. Its performance is studied with the highly non-linear Lorenz model. Pauwels and De Lannoy [18] analyze the approach of using a forecast bias estimate. They study the bias and its error covariance and their propagation in the filter, considering different approaches. This question is studied with the example of a hydrologic storage model that includes a groundwater store. It is demonstrated that a forecast bias estimate reduces the bias in predictions of groundwater storage and river discharge. The different approaches to propagate the error covariance had no strong influence on the performance of the filter. Li et al. [13] compare EnKF and the Ensemble pattern matching (EnPAT) algorithm for a groundwater flow problem, estimating spatially variable hydraulic conductivity. EnKF performs optimally for multi-Gaussian distributions of states, parameters and observations, but is suboptimal for deviations from Gaussianity. It is shown that EnPAT outperforms EnKF for the characterization of non-multi-Gaussian parameter distributions. As a consequence, transport predictions are also improved with EnPAT. The problem of retrieving heterogeneous parameter patterns with non-Gaussian distributions is also addressed in the contribution of Xu and Gomez-Hernandez [24].
They assimilate observations of hydraulic conductivity and piezometric head in a heterogeneous aquifer sequentially in space. The inverse sequential simulation method follows the same structure as an EnKF, but the parameter update is done by a sequential simulation using conditioned covariances from data inside a search neighborhood. The non-Gaussianity is treated with a normal score transform. The method is tested with a non-Gaussian permeability field. It is demonstrated that a sufficiently large ensemble is needed to obtain reasonable estimates of the covariances and that the search neighborhood needs to be large enough to cover the correlation structure.
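The normal score transform mentioned above can be illustrated with a short, rank-based sketch. It maps a skewed sample to standard-normal scores through its empirical distribution and maps scores back by linear interpolation. The function names, the plotting position `(rank + 0.5) / n` and the sample values are assumptions made for this illustration; the contributions to this issue may use different conventions.

```python
from statistics import NormalDist


def normal_score_transform(values):
    """Map values to standard-normal scores using their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # The plotting position (rank + 0.5) / n stays strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return scores


def back_transform(score, values, scores):
    """Map a normal score back to the original variable.

    Uses linear interpolation between the tabulated (score, value) pairs
    and clamps to the smallest/largest value outside the tabulated range.
    """
    pairs = sorted(zip(scores, values))
    if score <= pairs[0][0]:
        return pairs[0][1]
    if score >= pairs[-1][0]:
        return pairs[-1][1]
    for (s0, v0), (s1, v1) in zip(pairs, pairs[1:]):
        if score <= s1:
            return v0 + (score - s0) / (s1 - s0) * (v1 - v0)


# Hypothetical, strongly skewed sample (e.g. conductivity-like values).
values = [0.001, 0.01, 0.02, 0.5, 30.0]
scores = normal_score_transform(values)
recovered = back_transform(scores[3], values, scores)
print(scores)
print(recovered)
```

In a filter, the update would be carried out on the transformed scores and the result mapped back afterwards. Note that the transform makes the marginal distribution Gaussian, but does not by itself guarantee multi-Gaussian dependence between variables.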
2.2. Data assimilation for the subsurface: dealing with non-linearities
The vadose zone plays a central role in terrestrial system models, for example in vadose zone-groundwater models, land surface models or rainfall-runoff models. As indicated before, data assimilation is particularly challenging for the vadose zone, especially if parameters are also updated. Besides better methods, the exploration of new data types and of data assimilation strategies specific to the vadose zone is important to handle it better in data assimilation for integrated terrestrial system models. Developments in the area of data assimilation for petroleum engineering applications and multiphase flow are also important for the vadose zone, as these problems are also highly non-linear and are described by similar models.
Pasetto et al. [17] study data assimilation for the unsaturated zone. They chose a synthetic set-up to avoid model structural bias. The set-up mimicked the Biosphere 2 Landscape Evolution Observatory (LEO) in Arizona (USA). Spatially distributed fields of soil moisture content and saturated hydraulic conductivity were reconstructed with the help of a large number of soil moisture measurements. Pasetto et al. [17] find that for lower numbers of soil moisture data, state updating alone is associated with larger errors in total water storage, whereas joint state and parameter updating still provides very good results under those conditions.
Dong et al. [6] investigated the potential of distributed temperature sensing (DTS) for estimating the vertical soil moisture profile. The authors conclude that temperature measurements at two different depths allow estimating the vertical soil moisture profile with a strongly improved precision as compared to simulations without data assimilation. These positive results were found for different soil textures. Erdal et al. [7] also studied data assimilation for the unsaturated zone, using the EnKF. They point out that under very dry conditions the pressure head distribution in the soil is strongly skewed and the Gaussian assumption of EnKF results in artefacts in the data assimilation results. The performance of the filter is strongly improved if a state transformation is applied. The authors tested the normal score transformation and a transformation with the retention function applied to the model states. The contribution of González-Nicolás et al. [10] compares an Ensemble Smoother and a restart Ensemble Kalman filter to detect leakages (zones with high permeability) in the caprock overlying an aquifer that is used for carbon dioxide injection (carbon capture and storage). A numerical test case is used for this purpose. Both methods assimilate pressure observations in an aquifer above the carbon dioxide storage to update parameter fields. Different aspects are discussed, such as using normal score transforms to cope with non-Gaussianity of the parameter distributions. The results illustrate that a restart filter, although often computationally more expensive than the smoother, leads to better parameter estimates. The contribution of Bazargan et al. [3] also addresses the problem of retrieving a heterogeneous parameter field, here for an oil reservoir. They derive a surrogate model using polynomial chaos expansion for the system and predict the posterior probability distribution of permeability (otherwise obtained with the Markov Chain Monte Carlo, MCMC, method).
To avoid the immense computational cost of this method for a problem with a high-dimensional parameter space, impact factors are calculated from the Karhunen–Loeve decomposition to retain only the most important components in the expansion. The method is tested with a numerical example, where posterior distributions for a strongly non-Gaussian permeability field are obtained from oil production data. It is demonstrated that the posterior distribution could be estimated well using the surrogate model, with no large differences to the MCMC estimate, while the surrogate model estimate was faster than MCMC by orders of magnitude.
2.3. Data assimilation across compartments
A central question related to data assimilation for integrated terrestrial system models is the value of data. What is the benefit of assimilating a certain data type for the characterization of model states and fluxes in other compartments of the terrestrial system? Which observation types are important for which type of predictions, and what is the time dependence of this relationship? In addition, many other questions on data assimilation in integrated models are still relatively open, for example questions related to ensemble generation, treatment of bias and filter degeneration. Zhang et al. [26] test sequential data assimilation with EnKF for the integrated hydrological model MIKE-SHE. Piezometric head data are assimilated for a synthetic catchment that mimics the Karup catchment in Denmark. The authors conclude that the different model error sources have to be represented well, and that a degradation of the assimilation performance can be expected if model uncertainty is not represented adequately. Abaza et al. [1] study data assimilation in two watersheds with an important snow compartment using a semi-distributed hydrological model. They use the Ensemble Kalman filter to assimilate streamflow data and compare the update of different states with the goal of improving predictions of river discharge, given precipitation. The compared states are soil moisture at different locations and the
overland routing reservoir and/or the snow water equivalent. It could be demonstrated that all states are relevant and that updating all of them improves predictions. Chen et al. [5] compare different assimilation methods to predict soil moisture in a land surface model using data from a site on the Mongolian plateau. An Ensemble Kalman filter without soil parameter updating was compared to one with parameter updates. As a third approach, the ‘simultaneous optimization and data assimilation’ method is used, where states are updated as in the Ensemble Kalman filter, but parameters are updated by optimization. The prediction of soil moisture is tested using brightness temperature observations and given external forcing. It is shown that the parameter updates help to improve these predictions and that the optimization is the best, but naturally the most expensive, method.
3. Outlook
EnKF is appealing for predictions with integrated terrestrial system models, as the method provides an estimate of the complete posterior probability density function of states and/or parameters at a reasonable cost. However, limitations of EnKF encountered in applications to single terrestrial compartments are expected to be more serious for studies that include the vadose zone or multiple terrestrial compartments. In this special issue, and in other publications, it is shown that improved performance can be achieved with variants of EnKF, or with iterative approaches. We expect that in the future hybrid approaches of EnKF and the particle filter will also be increasingly applied and further developed. In addition, more experience should be acquired on data assimilation for the vadose zone. Besides methodological improvements, engineering-type solutions could also improve the filter performance, and more insight should be gained on ensemble generation, ensemble size, tuning parameters, localization and inflation, and treatment of bias.
It is also important to explore new data types that become available for data assimilation, especially from remote sensing, but also from terrestrial observatories [25]. The large number of different data types available for the land surface (from remote sensing) requires more testing on multivariate data assimilation strategies. In spite of the large number of different data types, the number of published articles where many different data types are assimilated simultaneously is limited [15]. Finally, if integrated models for terrestrial systems encompass compartments that act on very different time scales (e.g., a fast atmosphere, a slower soil moisture reservoir and very slow aquifers and resistant carbon pools), the question should also be raised whether one integrated data assimilation system is optimal for those conditions. Would it for example be better to have different systems for different terrestrial compartments, which are interlinked? We expect that these and other questions will be important for future research on data assimilation for terrestrial systems.
Acknowledgments
Financial support from the Deutsche Forschungsgemeinschaft for the research unit FOR2131 is gratefully acknowledged.
References
[1] Abaza M, Anctil F, Fortin V, Turcotte R. Exploration of sequential streamflow assimilation in snow dominated watersheds. Adv Water Resour 2015.
[2] Arulampalam MS, Maskell S, Gordon N, Clapp T. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans Signal Process 2002;50:174–88.
[3] Bazargan H, Christie M, Elsheikh AH, Ahmadi M. Surrogate accelerated sampling of reservoir models with complex structures using sparse polynomial chaos expansion. Adv Water Resour 2015.
[4] Chen Y, Zhang DX. Data assimilation for transient flow in geologic formations via Ensemble Kalman Filter. Adv Water Resour 2006;29:1107–22.
[5] Chen W, Huang C, Shen H, Li X. Comparison of ensemble-based state and parameter estimation methods for soil moisture data assimilation. Adv Water Resour 2015.
[6] Dong J, Steele-Dunne SC, Ochsner TE, van de Giesen N. Determining soil moisture by assimilating soil temperature measurements using the Ensemble Kalman Filter. Adv Water Resour 2015.
[7] Erdal D, Rahman MA, Neuweiler I. The importance of state transformations when using the Ensemble Kalman filter for unsaturated flow modeling: dealing with strong nonlinearities. Adv Water Resour 2015.
[8] Evensen G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte-Carlo methods to forecast error statistics. J Geophys Res Oceans 1994;99(C5):10143–62.
[9] Ghorbanidehno H, Kokkinaki A, Yue Li J, Darve E, Kitanidis PK. Real-time data assimilation for large-scale systems: the spectral Kalman filter. Adv Water Resour 2015.
[10] González-Nicolás A, Baú D, Alzraiee A. Detection of potential leakage pathways from geological carbon storage by fluid pressure data assimilation. Adv Water Resour 2015.
[11] Hut R, Amisigo BA, Steele-Dunne S, van de Giesen N. Reduction of used memory Ensemble Kalman filtering (RumEnKF): a data assimilation scheme for memory intensive, high performance computing. Adv Water Resour 2015 this issue.
[12] Kalnay E. Atmospheric Modelling, Data Assimilation And Predictability. Cambridge University Press; 1992.
[13] Li L, Srinivasan S, Zhou H, Gomez-Hernandez JJ. Two-point or multiple point statistics? A comparison between the ensemble Kalman filtering and the ensemble pattern matching inverse methods. Adv Water Resour 2015.
[14] Liu N, Oliver DS. Ensemble Kalman filter for automatic history matching of geologic facies. J Petroleum Sci Eng 2005;47(3-4):147–61.
[15] Montzka C, Pauwels VRN, Franssen HJH, Han XJ, Vereecken H. Multivariate and Multiscale Data Assimilation in Terrestrial Systems: A Review. Sensors 2012;12(12):16291–333.
[16] Moradkhani H, Sorooshian S, Gupta HV, Houser PR. Dual state-parameter estimation of hydrological models using ensemble kalman filter. Adv Water Resour 2005;28:135–47.
[17] Pasetto D, Niu GY, Pangle L, Paniconi C, Putti M, Troch PA. Impact of sensor failure on the observability of flow dynamics at the Biosphere 2 LEO hillslopes. Adv Water Resour 2015.
[18] Pauwels VRN, De Lannoy GJM. Error covariance calculation for forecast bias estimation in hydrologic data assimilation. Adv Water Resour 2015.
[19] Pauwels VRN, Balenzano A, Satalino G, Skriver H, Verhoest NEC, Mattia F. Optimization of Soil Hydraulic Model Parameters Using Synthetic Aperture Radar Data: An Integrated Multidisciplinary Approach. IEEE Trans Geosci Remote Sens 2009;47(2):455–67.
[20] Reichle RH, Walker JP, Koster RD, Houser PR. Extended versus Ensemble Kalman filtering for land data assimilation. J Hydrometeorol 2002;3:728–40.
[21] Shrestha P, Sulis M, Masbou M, Kollet S, Simmer C. A scale-consistent terrestrial systems modeling platform based on COSMO, CLM, and ParFlow. Monthly Weather Rev 2015;142(9):2466–83.
[22] van Leeuwen PJ. Particle Filtering in Geophysical Systems. Monthly Weather Rev 2009;137(12):4089–114.
[23] Vrugt JA, Diks CGH, Gupta HV, Bouten W, Verstraten JM. Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resour Res 2005;41. http://dx.doi.org/10.1029/2004WR003059.
[24] Xu T, Gomez-Hernandez JJ. Inverse sequential simulation: Performance and implementation details. Adv Water Resour 2015.
[25] Zacharias S, Bogena H, Samaniego L, et al. A network of terrestrial environmental observatories in Germany. Vadose Zone J 2011;10(3):955–73.
[26] Zhang D, Madsen H, Ridler ME, Refsgaard JC, Jensen KH. Impact of uncertainty description on assimilating hydraulic head in the MIKE SHE distributed hydrological model. Adv Water Resour 2015.
Harrie-Jan Hendricks Franssen∗
Forschungszentrum Jülich GmbH, Agrosphere (IBG-3), Wilhelm Johnen Strasse, 52428 Jülich, Germany
Insa Neuweiler1
Institute of Fluid Mechanics, University of Hannover, Appelstrasse 9a, 30167 Hannover, Germany
∗ Corresponding author. Tel.: +492461614462. E-mail address: [email protected] (H.-J.H. Franssen)
1 Tel.: +495117623567.
Received 15 October 2015