An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting

An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting

Journal of Hydrology xxx (2014) xxx–xxx Contents lists available at ScienceDirect Journal of Hydrology journal homepage: www.elsevier.com/locate/jhy...

2MB Sizes 3 Downloads 130 Views

Journal of Hydrology xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting Yuan Li a,b, Dongryeol Ryu a,⇑, Andrew W. Western a, Q.J. Wang b, David E. Robertson b, Wade T. Crow c a b c

Department of Infrastructure Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia CSIRO Land and Water Flagship, P.O. Box 56, Highett, Victoria 3190, Australia USDA-ARS Hydrology and Remote Sensing Laboratory, Rm. 104, Bldg. 007, BARC-W, Beltsville, MD 20705-2350, USA

a r t i c l e

i n f o

Article history: Available online xxxx Keywords: Ensemble Kalman filter Ensemble Kalman smoother Maximum a posteriori estimation Real-time flood forecasting

s u m m a r y For operational flood forecasting, discharge observations may be assimilated into a hydrologic model to improve forecasts. However, the performance of conventional filtering schemes can be degraded by ignoring the time lag between soil moisture and discharge responses. This has led to ongoing development of more appropriate ways to implement sequential data assimilation. In this paper, an ensemble Kalman smoother (EnKS) with fixed time window is implemented for the GR4H hydrologic model (modèle du Génie Rural à 4 paramètres Horaire) to update current and antecedent model states. Model and observation error parameters are estimated through the maximum a posteriori method constrained by prior information drawn from flow gauging data. When evaluated in a hypothetical forecasting mode using observed rainfall, the EnKS is found to be more stable and produce more accurate discharge forecasts than a standard ensemble Kalman filter (EnKF) by reducing the mean of the ensemble root mean squared error (MRMSE) by 13–17%. The latter tends to over-correct current model states and leads to spurious peaks and oscillations in discharge forecasts. When evaluated in a real-time forecasting mode using rainfall forecasts from a numerical weather prediction model, the benefit of the EnKS is reduced as uncertainty in rainfall forecasts becomes dominant, especially at large forecast lead time. Ó 2014 Elsevier B.V. All rights reserved.

1. Introduction Flood forecasting is important for timely flood warning and emergency responses but is also subject to uncertainties in input data, initial states, model structure and parameters (Sene, 2008). Data assimilation provides an effective way to integrate observation information, such as gauged discharge data and remotely sensed data, into hydrologic models to reduce these uncertainties (Liu et al., 2012). Past research has demonstrated that assimilating remotely sensed data, such as soil moisture (Crow and Ryu, 2009; Flores et al., 2012), evapotranspiration (Zhang et al., 2009), snow cover and snow water equivalent (Andreadis and Lettenmaier, 2006; He et al., 2012; Slater and Clark, 2006) can reduce the uncertainty in model outputs. However, assimilating discharge observations is still a more popular choice for real-time flood forecasting (Clark et al., 2008a; He et al., 2012; Komma et al., 2008; Lee et al., 2011; Lee et al., 2012; Ricci et al., 2011; Seo et al., 2009; Thirel et al., 2010), as discharge observations are more directly ⇑ Corresponding author. Tel.: +61 3 8344 7115. E-mail address: [email protected] (D. Ryu).

related to streamflow forecasts and, in many basins, are readily available for operational application. Various data assimilation methods have been proposed for integrating discharge data into hydrologic models, including both sequential and variational techniques (Liu et al., 2012). Variational data assimilation has been used for operational forecasting systems (Lee et al., 2012; Seo et al., 2009); however, as a deterministic approach, it is less compatible with probabilistic real-time forecasting than sequential techniques. As a sequential and stochastic approach, the ensemble Kalman filter (EnKF) can improve streamflow forecasts by integrating observations to sequentially update model states, e.g., soil moisture (Komma et al., 2008; Li et al., 2013; McMillan et al., 2013; Thirel et al., 2010), and snow water equivalent (DeChant and Moradkhani, 2011b; He et al., 2012). In addition, the EnKF is sometimes applied to simultaneously update model states and parameters (Moradkhani et al., 2005b; Nie et al., 2011; Wang et al., 2009a). Despite the existence of more complex competing approaches, such as the particle filter (PF) (DeChant and Moradkhani, 2011a; Salamon and Feyen, 2009) and particle Markov chain Monte Carlo (PMCMC) methods (Andrieu et al., 2010; Moradkhani et al., 2012; Vrugt et al., 2013), the EnKF still remains

http://dx.doi.org/10.1016/j.jhydrol.2014.08.009 0022-1694/Ó 2014 Elsevier B.V. All rights reserved.

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

2

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

a popular choice for operational streamflow forecasting due to its computational efficiency and ability to directly generate ensemble forecasts (Clark et al., 2008a; He et al., 2012; Komma et al., 2008; McMillan et al., 2013). One issue with the standard EnKF is that it is not designed to account for the time lag between errors in soil moisture and in discharge originating from water travel time between the hillslope and basin outlet. For hydrologic models whose routing processes are fully simulated through conceptual storage states, e.g., the probability distributed model (PDM) (Li et al., 2011; Li et al., 2013) and the conceptual hydrologic model (HyMOD) (Moradkhani et al., 2005a), the lagged errors accumulate in routing storages and can be effectively corrected by real-time filtering methods. In these cases, there is no need to account for time lags in the data assimilation procedure (Li et al., 2013; Moradkhani et al., 2005b). However, for other models whose routing processes are fully or partially simulated by unit hydrographs, e.g., the Hydrologiska Byrns Vattenbalansavdelning (HBV-96) model (Pauwels and De Lannoy, 2009) and the modèle du Génie Rural à 4 paramètres Journalier (GR4J model) (Li et al., 2013), the errors are lagged via unit hydrographs, which are difficult to directly update via stream discharge assimilation. For models employing unit hydrographs, the standard EnKF is not an ideal tool to correct errors in the antecedent states (Li et al., 2013; McMillan et al., 2013; Pauwels and De Lannoy, 2009). To address this time lag issue, various new approaches have been investigated, which include shifting observations backwards by a fixed number of steps (Weerts and El Serafy, 2006), updating antecedent model states with a fixed lag (Meier et al., 2011), mapping the observational information to states within a time window through a retrospective EnKF (Pauwels and De Lannoy, 2006; Pauwels and De Lannoy, 2009), a recursive EnKF (McMillan et al., 2013), and an EnKS (Li et al., 2013). Even though the recursive EnKF (McMillan et al., 2013) modifies model states within a past time window, it updates antecedent and present model states iteratively. Due to its ‘iterative’ update-prediction process which uses the same observation multiple times, there is a potential risk of over-correcting the state error. The retrospective EnKF updates past and present states within a time window simultaneously then, however; it reruns the whole model from the beginning of the past analysis time window to obtain the model prediction. This means that corrected model states within the time window, except for the initial states, are not used for prediction. Consequently, error accumulates throughout the analysis time window and the benefits of considering a time lag can become marginal (Pauwels and De Lannoy, 2009). The EnKS with a fixed time window addresses the time lag issue in a theoretically robust and efficient manner and can produce more accurate forecasts than the standard EnKF; however, to date, the EnKS has only been assessed in synthetic studies (Li et al., 2013). As a result, its reliability in real data assimilation scenarios is uncertain. Sequential data assimilation updates model state estimates using a weighted average of the background model prediction and the observation. The weights used in the averaging are largely determined by a comparison of model and observation uncertainties. Therefore, besides the time lag issue, quantifying these uncertainties remains a significant challenge in hydrologic data assimilation (Liu et al., 2012). The uncertainty of discharge observations is normally represented by applying either additive (Pauwels and De Lannoy, 2009) or multiplicative Gaussian noise (Clark et al., 2008a; DeChant and Moradkhani, 2012; Komma et al., 2008) or a combination of both (Noh et al., 2011). Model uncertainty is usually conceptually separated into several different sources including forcing error (e.g., precipitation, potential evapotranspiration, and temperature), model state error, parameter error, and model structure error. A large number of approaches have been applied

to represent modeling errors in uncertainty analyses and/or data assimilation studies. For instance, precipitation error can be assumed to be additive (Weerts and El Serafy, 2006) or multiplicative random noises (DeChant and Moradkhani, 2012) or to follow a more sophisticated error model based on the consideration of spatial-temporal correlation (Clark et al., 2008a; McMillan et al., 2013). Parameter error can be represented by direct perturbation (DeChant and Moradkhani, 2012; Nie et al., 2011) or simulated through Markov Chain Monte Carlo (MCMC) based approaches (Moradkhani et al., 2012; Vrugt et al., 2013). Model structural error can be characterized by multi-model predictions and addressed through Bayesian model averaging (Duan et al., 2007; Leisenring and Moradkhani, 2012). Finally, model state error can be characterized directly through absolute or relative noises (Li et al., 2013; Noh et al., 2011; Ryu et al., 2009) or propagated from forcing and other uncertainties (Komma et al., 2008; Weerts and El Serafy, 2006). While various error models can be applied, determining parameter values for these models is generally challenging. For discharge observations, it is possible to approximately quantify the uncertainty from the observation itself, for example, by analyzing the uncertainty of rating curves used to calculate discharge (Clark et al., 2008a). But it is hard to directly quantify forcing and model uncertainties. To address this challenge, two types of approaches have been suggested. One type is to use adaptive filtering techniques, which quantify observation and model errors during the online cycling of data assimilation (Crow and Reichle, 2008; Crow and van den Berg, 2010). However, these approaches rely on a number of assumptions, including the serial independence of observation errors. More sophisticated adaptive filtering approach, which require a reduced set of assumptions, have been developed (Crow and Yilmaz, 2014) but not widely applied or tested. A second approach for estimating error parameter are likelihood-based/Bayesian uncertainty estimation techniques which disaggregate the total differences between model predictions and observations into various sources of uncertainties. Specific examples include the generalized likelihood uncertainty estimation (GLUE) (Beven and Binley, 1992), the framework for understanding structural errors (FUSE) (Clark et al., 2008b), the Bayesian total error analysis (BATEA) (Kavetski et al., 2006; Renard et al., 2011), and the differential evolution adaptive metropolis (DREAM) algorithm (Vrugt et al., 2008). However, unlike the particle filtering techniques, these uncertainty estimation approaches have not been widely applied to quantify error parameters for data assimilation. One exception is the integrated uncertainty and ensemble-based data assimilation (ICEA) system (He et al., 2012), which use an MCMC-based uncertainty analysis tool to generate ensemble spread for the EnKF. In addition to the adaptive filtering and likelihood-based/Bayesian uncertainty estimation techniques, Leisenring and Moradkhani (2012) suggested a variable variance multiplier approach, which was further improved by Moradkhani et al. (2012). It rescales variance multipliers (error parameters) according to the accuracy of the mean predictions relative to a certain confidence interval (CI) of ensemble predictions. Therefore, it can complement the uncertainty estimation methods as an error variance corrector. The aim of this paper is to address the time lag issue and error estimation issue through an integrated data assimilation scheme. Specifically, the focus is twofold: (1) testing the EnKS in real data assimilation scenarios and (2) examining the potential of likelihood-based/Bayesian approaches to inform error parameters required for hydrologic data assimilation. To achieve the objective, an ensemble-based maximum a posteriori (MAP) estimation method, which is a relatively simple likelihood-based approach, is incorporated to quantify the model and observation error parameters. The observational error information drawn from the analysis of gauging data is used as an a priori constraint for the

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

3

MAP estimation. The EnKF and EnKS are implemented and compared in a real data assimilation case, an extended work of Li et al. (2013). The results are examined in both a hypothetical forecasting scenario forced by observed precipitation, and a real-time forecasting scenario forced by numerical weather prediction (NWP) forecast precipitation.

(low flows), the Nash–Sutcliffe efficiency of Box–Cox transformed flows (mid-range flows), the Kling Gupta Efficiency (variance and high flows) (Gupta et al., 2009), and the Bias skill score (mass balance error) (Misirli et al., 2002).

2. Methodology

The total difference between model prediction and observation is the combination of the model error and observation error. The model error originates from model input, initial state variables, parameters and model structure. In this paper, we represent the total error by perturbing three important elements in the data assimilation system: rainfall input data, model-based soil moisture storage estimates and assimilated discharge observations. Although there are additional error contributions from uncertainty in potential evapotranspiration (PET), error in model parameters and deficiencies in model structure, we assume that they accumulate in the soil water storage (S) and can be represented via the direct perturbation of S.

2.1. Rainfall-runoff model and its calibration The model used in this research is the GR4H model, a variant of the GR4J model operating on an hourly time step (Perrin et al., 2003). The model choice is based on the fact that it is one of the major candidate models for operational flood forecasting by the Australian Bureau of Meteorology. Preliminary testing of daily forecasting models shows that GR4J performs best among 13 candidate models within 240 Australian test catchments (Pagano et al., 2010). GR4H is a conceptual rainfall-runoff model with only two state variables, the soil water storage (S) and the routing water storage (R), and four parameters, including the maximum capacity of S (x1), the water exchange coefficient (x2), the ‘reference’ capacity of R (x3), and the length parameter of the unit hydrographs (x4) (Fig. 1). The rainfall (P) contribution after interception, Pn, is divided into two parts. One part, Ps, goes to S, which is then drained by evapotranspiration (E) and percolation. The other part forms the surface runoff, Pn–Ps, which is then mixed with the percolation as a total runoff Pr. The total runoff is partitioned into two routing components in a fixed ratio. Ninety percent of the runoff is routed by a combination of a unit hydrograph (UH1) and R. The other 10% of the runoff is routed by another unit hydrograph (UH2). The length of UH2 is twice UH1. The function F(x2) controls the ground water exchange between the modeled catchment and adjacent catchments. All four model parameters are calibrated using the shuffled complex evolution (SCE-UA) method (Duan et al., 1992). The objective function used here is a arithmetical average of the Nash–Sutcliffe efficiency (Nash and Sutcliffe, 1970) of log flows

2.2. Error parameter estimation

2.2.1. Observation error analysis from flow gauging data Compared with rainfall and state variables, it is relatively easy to obtain uncertainty information for the assimilated discharge observations. For a standard flow gauging station, the discharge is converted from the continuous measurement of water level via the application of rating curves. The rating curves are established and revised periodically by comparing water level data and discharge measurements obtained from a set of velocity meters and depth observations – flow gauging data. Therefore, assuming that the flow gauging measurements represent the true discharge, the observation uncertainty can be derived by comparing the water level based discharge with the flow gauging based discharge (Clark et al., 2008a; McMillan et al., 2013). Fig. 2a shows that the variance of errors calculated from the difference between water level based discharge (Q) and flow gauging based discharge (Qtruth) increases with flow magnitude. Normalizing the error by flow magnitude removes this heteroscedasticity (Fig. 2b). This suggests that an error model accounting for heteroscedasticity should be applied to the discharge observation:

Q 0 ¼ Q þ nq Q; E

P interception

En

Pn

Es Production x 1 store

Ps

Pn-Ps

UH 1

Pr

0.9

0.1 UH 2

x4

2·x 4

Q9 Routing store

x3

Q1 F (x 2)

F (x 2)

R Qr

where nq is Gaussian noise with zero mean and a standard deviation (eq) of 9.25% estimated from the data. This information is then used as an a priori distribution for the discharge observation uncertainty and is further analysed using maximum a posteriori estimation (MAP) calculated in Section 2.2.3. 2.2.2. Model errors Rainfall error is an important component of the model uncertainty; however, it is very difficult to estimate the error in the catchment average rainfall as it is typically calculated using a few samples from a rainfall field with unknown spatio-temporal correlation. A lognormal multiplicative error model (Eq. (2)) has been widely used to simulate rainfall error in hydrologic data assimilation (Crow et al., 2011; DeChant and Moradkhani, 2012; Nijssen and Lettenmaier, 2004):

S Perc

ð1Þ

Qd Q

Fig. 1. The structure of GR4H model (Li et al., 2013).

P0 ¼ np P;

ð2Þ

where np follows a lognormal distribution with the mean of np and standard deviation of ep. To create an unbiased rainfall ensemble, np is set to be 1. In most previous data assimilation studies, the variance of the rainfall multiplier was empirically specified, e.g., ep = 0.25 (25% error) was adopted in DeChant and Moradkhani (2012). In this study, ep is estimated through the MAP estimation scheme.

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

4

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

40

250

100×(Q−Qtruth)/Qtruth (%)

(a)

Q−Qtruth (m3/s)

30 20 10 0 −10

0

50

100

150

200

(b)

200 150 100 50 0 −50 −100

0

Qtruth (m3/s)

50

100

150

200

Qtruth (m3/s)

Fig. 2. The relationship between discharge errors and flow rate.

In accordance with the definition of the lognormal probability distribution model, the natural logarithm of np, denoted as nLN p , follows a normal distribution: 2 nLN p  Nðl; r Þ;

where

l ¼ lnðnp Þ  12 lnð1 þ

ep 2 np

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi e 2 Þ and r ¼ lnð1 þ np Þ.

ð3Þ

p

To consider the temporal correlation of the rainfall error, a first order autoregressive lognormal model (Troutman, 1978; Vogel et al., 1998), namely AR(1)-LN, is implemented in this study (instead of the more typical temporally independent lognormal model), such that

pffiffiffiffiffiffiffiffiffiffiffiffiffiffi LN 2 nLN p;t ¼ l þ aðnp;t1  lÞ þ dp;t r 1  a ;

ð4Þ

where a is the lag-one autocorrelation coefficient, which is estimated through the MAP estimation scheme, and dp,t is a standard normal noise. As mentioned above, the aggregate uncertainty in soil moisture storage stages is an accumulation of the uncertainties associated with model parameters, model structure, and forcing data. However there is limited knowledge about the error structure of the soil moisture storage, as it is a conceptual state variable. Therefore, following previous similar studies (Ryu et al., 2009), we use an additive Gaussian noise truncated by the upper and lower bounds of soil moisture to represent the soil moisture error.

S0 ¼ S þ n s ;

ð5Þ

where ns is the truncated Gaussian noise with zero mean and standard deviation of es, which is also estimated through the MAP estimation scheme. 2.2.3. Maximum a posteriori estimation of model and observation error parameters Based on the assumed error structure described in Section 2.2.2, four error parameters need to be specified: the standard deviations of soil moisture error (es), observed discharge error (eq) and rainfall error (ep), and the lag-one autocorrelation coefficient of rainfall error (a). A maximum a posteriori (MAP) estimation method (Sorenson, 1980) is used to estimate the error parameters. According to the Bayes’ theorem, the a posteriori probability of error parameters (H) with given observed discharge (Qobs), model parameters (h), precipitation (P), and potential evapotranspiration (PET) could be expressed as

pðHjQ obs;t ; h; Pt ; PET t ; t 2 T c Þ / LðHÞ Y pðQ obs;t jH; h; Pt0 !t ; PET t0 !t Þ; ¼ pðHÞ  t2T c

ð6Þ

where t, indicates the time step; t0 indicates the beginning of the calibration period; and Tc indicates the calibration period. pðHjQ obs;t ; h; Pt ; PET t ; t 2 T c Þ is the a posteriori probability of H. pðQ obs;t jH; h; Pt0 !t ; PET t0 !t Þ is the conditional probability density (likelyhood) of Qobs,t with given H, h, P t0 !t , and PET t0 !t . p(H) denotes the a priori probability density of H. The aim of the MAP here is to find an optimal estimate of H by maximizing L(H). To achieve this objective, the negative logarithm of L(H) is defined as an objective function and the SCE-UA is used to minimize L(H) so as to identify the global optimum of H. During each evaluation (iteration) of the objective function in SCE-UA, the pðQ obs;t jH; h; Pt0 !t ; PET t0 !t Þ in Eq. (6) is calculated through stochastic simulation of the hydrologic process. As the hydrologic process is usually highly nonlinear, the simulation is conducted by running the hydrologic model in an ensemble mode. The detailed description of the SCE-UA algorithm can be found in Duan et al. (1992). Here, only the specific procedure of one evaluation in SCE-UA is summarized as follow: Step 1:

A set of error parameters H is sampled from predefined error parameter space. FOR t = 1,. . ., Tc DO Step 2 to 4 (hydrologic modeling) Step 2: An ensemble of rainfall P0t and initial soil moisture S0t1 are generated through the perturbation schemes (Eqs. (2) and (5)) with errors sampled based on the predefined error models and sampled H. Step 3: An ensemble prediction of state variables (S0t and R0t ) and discharge (Q 0t ) are simulated by running the calibrated GR4H model. Then Q 0t is further perturbed by the observation error through Eq. (1) so as to get an ensemble prediction of observed discharge (Q 0obs;t ) instead of true discharge. Q 0obs;t is fitted to a Gaussian probability density function (PDF), Nðlpredict ; r2predict Þ, by calculating the lpredict and

r2predict from the ensemble prediction. Step 4

The pðQ obs;t jH; h; Pt0 !t ; PET t0 !t Þ in Eq. (6) is then calculated by finding the probability density of Qobs,t from Nðlpredict ; r2predict Þ.

END FOR (hydrologic modeling) Step 5 The a priori probability density of error parameters, p(H) in Eq. (6), is calculated by the product of the probability densities of the four error parameters. The standard deviation of the

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

5

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

Step 6

multiplicative observation error eq is assumed to follow an a priori Gaussian distribution with the mean of 9.25% and standard deviation of 0.925%. As there is no prior information of the other three error parameters, es, ep and a, they are assumed to follow non-informative uniform a priori distributions. The a posteriori probability of error parameters L(H) is then calculated through Eq. (6).

2.3. State updating methods A lag-aware data assimilation scheme based on an EnKS with a fixed time window is implemented for state updating and forecasting. The EnKF is used as a control case to demonstrate the comparative performance of the EnKS. 2.3.1. Updating algorithms Both ensemble Kalman filters and smoothers employ an iteration of two steps: model prediction and state updating. For most climate and hydrologic models, the model prediction can be constructed in the form of a first order Markov process in which an ensemble of state variables is transferred within the state space and into the observation space through: iþ i i xi t ¼ f ðxt1 ; ut ; hÞ þ xt

ð7Þ

i = 1, 2. . .n,

^it ¼ hðxi y t ; hÞ;

ð8Þ

where xiþ t1 is the state vector updated at the previous time step; t1; xi t is the predicted state vector at the current time step, t; uit is the input data vector perturbed by forcing error; h is the model ^ it is the obserparameter vector; xit is the model state error vector; y vation prediction vector (discharge in this case); and i enumerates a member of a size-n ensemble. In the state updating step, an EnKF updates the states at current step via: i i ^i xiþ t ¼ xt þ Kt ðy t  y t Þ;

ð9Þ

yit ¼ yt þ git ;

Rxy t

Rxt t y B Rxt1 y B ¼B t @ 

x y Rt tkþ1

1

yy y Kt ¼ Rxy t ½Rt þ Rt  ;

ð11Þ

where Rxy t is the cross covariance matrix of the state variables and observation prediction; Ryy is the error covariance matrix of the t observation prediction; and Ryt is the observation error covariance matrix. The EnKS with fixed time window updates the current and past states via Eqs. (12) and (10): i xiþ t!tkþ1 ¼ xt!tkþ1 þ

Kt ðyit

^it Þ; y

ð12Þ

where k is the size of the fixed time window. The Kalman gain is calculated as: 1

yy y Kt ¼ Rxy t ½Rt þ Rt  ;

ð13Þ

where is the cross covariance matrix of all the state variables within the fixed time window and current observation prediction:

1 C C C: A

ð14Þ

For models constructed in a first order Markov process form (Eqs. (9) and (10)), the EnKF and the EnKS provide an identical analysis of current model states (Evensen and van Leeuwen, 2000), and have the same effect on forecasts (Li et al., 2013). However, for rainfall-runoff models with unit hydrograph based routing, such as GR4H, the unit hydrographs are difficult to directly update and are usually treated as part of the observation operator (Li et al., 2013; Pauwels and De Lannoy, 2009; Weerts and El Serafy, 2006). In this case, the observation operator links the current observation not only to the current states but also to past model states, which could be described as a Markov process of order m:

^it ¼ hðxi y t!tmþ1 ; hÞ;

ð15Þ

where m is a backward time step contributing to the current observation prediction, e.g., the maximum length of the unit hydrograph. 2.3.2. Updating scheme Both the EnKF and the EnKS are implemented to update two state variables in GR4H. For the EnKS, the size of the time window k is set equal to the length of UH2. Fig. 3 illustrates the EnKS updating scheme employed here. Current and antecedent S and R within the fixed time window are updated directly by the smoother against the current observed discharge, whilst the water detained within UH1 and UH2 are updated through the model dynamics. 2.4. Evaluation methods There are two important aspects of an ensemble forecast: accuracy and reliability. The accuracy indicates how close the model forecast is to the truth/observation, while the reliability describes the capability of the ensemble spread to represent the real probabilistic uncertainty of the forecast (Jolliffe and Stephenson, 2003). In this study, the accuracy of ensemble forecasts is evaluated through temporal mean of the ensemble root mean squared error (MRMSE), the root mean squared error (RMSE) of the ensemble mean, the Nash Sutcliffe efficiency coefficient (NS) of the ensemble mean and the mean error (BIAS), which are described in Eqs. (16)– (19) respectively:

ð10Þ

where yt is the observation vector; yit is the perturbed observation vector; git is the observation error; and Kt is the Kalman gain matrix. Kt can be calculated as:

Rxy t

0

MRMSE ¼

1 XL t¼1 L

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 1 XN ðQ ipre;t  Q obs;t Þ ; i¼1 N

ð16Þ

rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 XL RMSE ¼ ðQ pre;t  Q obs;t Þ2 ; t¼1 L

ð17Þ

PL ðQ pre;t  Q obs;t Þ2 NS ¼ 1  Pt¼1 ; 2 L  t¼1 ðQ obs;t  Q obs Þ

ð18Þ

BIAS ¼

1 XL ðQ pre;t  Q obs;t Þ; t¼1 L

ð19Þ

where Q pre;t is the predicted ensemble mean; Q ipre;t is the ith realiza obs is tion of predicted discharge; Q obs;t is the observed discharge; Q the temporal mean of observed discharge; N is the ensemble size; L is the total time steps of evaluation (updating/forecasting) period. The reliability is evaluated using a rank histogram approach (Hamill, 2001). To construct the rank histogram, N ensemble members in the forecast at time t are ranked in a monotonic order, and the forecast space is delineated into N + 1 intervals. The frequencies of observation falling into these intervals are then plotted into a histogram. Ideally, as the ensemble is generated from a given

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

6

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

Updating

St-n+1

St-2

St-1

St

UH1t-n+1

UH1t-n+2

UH1t-1

UH1t

UH2t-n+1

UH2t-n+2

UH2t-1

UH2t

Rt-n+1

Rt-n+2

Rt-1

Rt

t=t+1

Prediction

Qp, t

St-n+1

St-2

Qo, t

St-1

St

St+1

UH1t-n+1

UH1t-n+2

UH1t-1

UH1t

UH1t+1

UH2t-n+1

UH2t-n+2

UH2t-1

UH2t

UH2t+1

Rt-n+1

Rt-n+2

Rt-1

Rt

Rt+1 Qp, t+1

Fig. 3. The updating scheme of the EnKS. The red boxes indicate the state variables updated against the observation and the blue boxes indicate how the updates are transferred to predicted states and discharge. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

distribution, the probability associated with each ensemble member should be equal. Therefore, a relatively flat rank histogram indicates a reliable ensemble spread; a U-shaped histogram indicates the ensemble spread is too narrow; and an asymmetric histogram is a sign of bias.

30 years from 1971 to 2000, major floods occurred in 1974, 1993, 1996 and 1998. The town of Myrtleford was affected by three out of these four events.

3. Study area and data

3.2.1. Observed data Observed data used in this paper include hourly precipitation depth, monthly averaged PET and hourly accumulated discharge from 1999 to 2010. The data from 1999 to 2004 are used for model calibration and uncertainty estimation, whilst the data from 2005 to 2010 are used for evaluating the state updating schemes. The PET is the catchment average derived from the gridded (5 km) Australian Water Availability Project (AWAP) data (Raupach et al., 2009; Raupach et al., 2012). The precipitation data is interpolated from 14 rain gauges inside or adjacent to the catchment (Fig. 4) using the inverse distance weighting method with a power of 2, then averaged over the whole study basin. The discharge data is observed by Myrtleford discharge observation station (station code: 403210) at the outlet of the catchment.

3.1. Study area The study area is a 1210 km2 catchment located upstream of Myrtleford in the Ovens River basin located in southeast Australia (Fig. 4). The catchment drains the northern slopes of the Great Dividing Range and is mainly covered by eucalyptus forest. The mean annual flow is 1470 ML/Day (440 mm/annum) and the major flows typically occur between July and October (Fig. 5), according to the records from the Victoria Water Resources Data Warehouse (http://www.vicwaterdata.net/vicwaterdata). The Ovens River basin is prone to flooding. According to records from Flood Victoria (http://www.floodvictoria.vic.gov.au), in the

3.2. Data

Fig. 4. Study basin – Myrtleford sub-basin of Ovens River basin.

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

7

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

Monthly Flow (m3/s)

35 30 25 20 15 10 5 0

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Month

set, the four error parameters (es, eq, ep, and a) are then estimated using MAP and the same set of observations (i.e., 1999–2004). Then the EnKF and the EnKS with an ensemble size of 100 are implemented from 2005 to 2010 to continuously assimilate real discharge observations for state updating. The hypothetical forecasts with a maximum lead time of 60 h are constructed for every hourly time step from 2005 to 2010 after data assimilation. For the hypothetical forecasts, the rainfall forcing is perturbed using the rainfall observation error fitted with MAP. Finally, real-time forecasts are issued using the post processed NWP rainfall forecasts starting at 9:00 LST each day with lead times up to 60 h throughout 2010. 4.1. Model calibration

Fig. 5. Annual pattern of flow at Myrtleford (1999–2010).

3.2.2. NWP-based precipitation The error in the NWP rainfall forecasts is an important source of uncertainty in real-time flood and short-term streamflow forecasting. In Australia, the Australian Community Earth Systems Simulator (ACCESS) model forms the basis of a suite of operational NWP models (Australian Bureau of Meteorology, 2010). The ACCESS NWP models provide deterministic rainfall forecasts and the raw output usually contains systematic biases (Shrestha et al., 2013). In this study, a post-processing approach following Robertson et al. (2013) is incorporated to create bias-free ensemble rainfall forecasts for the state updating scheme evaluation in a real-time scenario equivalent to operational conditions. Deterministic NWP rainfall forecasts from January 2010 to August 2011 (approximately 600 daily forecasts) are obtained from the regional version of ACCESS (ACCESS-R). Forecasts issued at 12:00 UTC are used. The forecast output is on hourly time steps for lead times up to 72 h. However, as operational streamflow forecasts are issued at 23:00 UTC (09:00 LST), the first 11 h of NWP rainfall forecasts are discarded in post processing, which means that post-processed flow forecasts have lead times of 0–60 h. The post processing method uses a simplified version of the Bayesian joint probability approach (Wang et al., 2009b) to generate forecast probability distributions for individual locations and forecast periods. Ensemble forecasts with appropriate space–time correlations are then generated by linking samples from the forecast probability distributions using the Schaake shuffle (Clark et al., 2004). A more detailed description of the post processing can be found in Robertson et al. (2013). Post processed NWP rainfall forecasts with an ensemble size of 100 for the entire year of 2010 (365 forecasts) are used to evaluate the data assimilation schemes for the real-time forecasting scenarios.

Tables 1 and 2 list calibrated model parameters and summarise model performance during the calibration period, respectively. The calibration process gives good results in terms of mass balance, variance and accuracy (Table 2). The predicted mean is about 5.1% lower than the observed mean, indicating that the model is slightly underestimating discharge. However, the predicted standard deviation is about only 4.4% lower than the observed standard deviation, indicating that the model is able to capture the variance of the discharge. The Nash Sutcliffe coefficient of efficiency and the coefficient of determination indicate good model calibration. The x4 parameter is crucial to the EnKS since it is related to the time lag between soil water storage and discharge. A value of 21.28 h for x4 implies that discharge is contributed from soil water storages for 22  2 antecedent time steps. Therefore, the fixed time window for the EnKS is set equal to 44 h. 4.2. Error parameter estimation

The model is calibrated against the observed discharge from 1999 to 2004. Using the calibrated catchment model parameter

Using MAP with the SCE-UA global optimization approach, the four error parameters (es, eq, ep and a) are estimated (Table 3). The calibration process converged after around 9000 model evaluations, which is much longer than the model parameter calibration convergence time, even though the number of calibrated parameters is the same (4). Repeating the calibration process using a different random number generator seed results in similar, but not identical, sets of error parameters. For example, es varies between 2.2 mm and 2.3 mm. This is because each evaluation is based on a different set of stochastic realizations. However, we have tested the EnKF and EnKS for different combinations of error parameters, and their results do not differ significantly. This is to be expected because the model states are continuously updated and the small realization-to-realization changes in derived error parameters will not cause significant changes in assimilation results. The results suggest the analyzed rainfall error follows a multiplicative log-normal distribution with a standard deviation of 0.31 (equivalent to 31% error on average) and lag-one correlation of 0.36. The error added to the soil moisture, 2.28 mm (0.4% of x1), is relatively large compared with assumptions in some

Table 1 Optimal model parameters.

Table 3 MAP error parameters.

4. Results and discussions

x1 (mm)

x2 (mm)

x3 (mm)

x4 (h)

a

ep

es (mm)

eq

545.30

2.70

149.83

21.28

0.36

0.31

2.28

0.097

Table 2 Calibrated model performance. Observed mean (m3/s)

Predicted mean (m3/s)

Observed std. (m3/s)

Predicted std. (m3/s)

NS efficiency coeff.

Coeff. of determination

12.93

12.26

17.51

16.73

0.83

0.83

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

8

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

0.5

0.5

(a) Calibration

(b) E valuation 0.4

Probability

Probability

0.4 0.3 0.2 0.1

0.3 0.2 0.1

0

1

2

3

4

5

6

7

8

0

9 10

1

2

3

4

5

6

7

8

9 10

Fig. 6. The rank histogram of ensemble prediction based on the estimated error parameters. (a) is for calibration period (1999–2004) while (b) is for evaluation period (2005– 2010). Red line indicates a uniform distribution of the probabilities. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Discharge (m3/s)

500 400 300 200 100 0

Jan

Feb

Mar

Apr

May

Jun

Jul

Aug

Sep

Oct

Nov

Dec

Month Fig. 7. Ensemble spread (95% CI) of predicted discharge in 2010. The solid curve is the observed discharge while the gray area is the 95% confidence interval of predictions, similarly hereinafter.

previous studies (Crow and Ryu, 2009; Li et al., 2013). However, the discharge error with a standard deviation of 9.7% of the observed discharge is consistent with that estimated in some previous studies (Clark et al., 2008a). It is important to point out that the estimation of eq is sensitive to the a priori density used. In this study, eq is assumed to follow an a priori Gaussian distribution with mean of 9.25% and standard deviation of 0.925%, which constrains eq. The larger the standard deviation is, the larger the difference between estimated eq and the a priori mean value will be. A rank histogram analysis indicates that the error estimation gives a reliable spread of the ensemble discharge in the calibration period, as the rank histogram exhibits a relatively uniform distribution – the slight underestimation of discharge is likely caused by the model calibration rather than the error parameter estimation (Fig. 6a). However, the model tends to overestimate the discharge in the evaluation period as the probability that observed discharge falls into the range with low ranked predictions is higher than the rest (Fig. 6b). Fig. 7 shows the ensemble spread of predicted discharge in 2010. The 95% confidence interval (CI) covers most of the observations but the model is more likely to overestimate than underestimate discharge, especially during low flow periods. The different reliabilities between the calibration and evaluation periods probably stem from the fact that the catchment is very dry from 2006 to 2009, relative to the calibration period (1999–2004). However, the last year of the evaluation period (2010) is quite wet. 4.3. Hypothetical forecasting with observed rainfall 4.3.1. Forecast skill Fig. 8 summarizes the accuracy skill scores (MRMSE, RMSE, NS, BIAS) for hypothetical discharge forecasts based on updated states for lead times up to 60 h. Figs. 9 and 10 plot the ensemble spread (95% confidence interval) with lead times of 6 and 60 h for two different time periods.

It is expected that state updating should improve the accuracy of the discharge forecasts and the improvement for the EnKS should be more significant than that of the EnKF (Li et al., 2013). Firstly, compared with the open loop (black line), state updating significantly improves short-term forecasts (Fig. 8). The RMSE and MRMSE of 1-h lead time forecasts are reduced to around 60% and 80% of open loop values by the EnKS (red line), respectively. The BIAS is corrected from about 2.5 m3/s to nearly 0 m3/s and the NS efficiency coefficient is improved from about 0.88 to as high as 0.98. These improvements result from the state variable error correction achieved by continuously assimilating discharge observations. The improvements, however, decrease with increasing forecast lead time due to the uncertainties in the input data, model parameters and model structure, which accumulate with time. Secondly, the EnKS-based forecasts outperform the EnKF-based (blue line) forecasts. The RMSE and BIAS are consistently lower in EnKS based forecasts for all lead times. This is because, since the EnKF is not lag aware, it attributes errors from antecedent states only to current states, which leads to an overcorrection of the states (Li et al., 2013). The overcorrection results in spurious peaks and oscillations in the forecast discharge. Typical examples of spurious peaks in the EnKF discharge forecasts can be found around the end of June and start of October in 2009, emphasized by the red boxes 1 and 2 in Fig. 9. These two spurious peaks do not exist in the open loop forecasts (Fig. 9a) but appear after EnKF updating. Red boxes 3 and 4 in Fig. 10 illustrate two examples of oscillations in discharge forecasts. The first oscillation (enclosed by red box 3) seems to originate from a forcing data error, as it exists in each of the openloop, the EnKF and the EnKS forecasts; nevertheless, the oscillation is more serious in the EnKF case than in the EnKS case, which reveals the superiority of the EnKS. The second oscillation (enclosed by red box 4) is not found in the open loop (Fig. 10a) but appears only after the EnKF updating. Spurious peaks often occur when model states have been over-corrected due to a large difference between predictions and observations. The oscillations

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

9

10

10

8

8

RMSE (m3/s)

MRMSE (m3/s)

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

(a)

6 4 2 0

0

20

40

(b)

6 4 2 0

60

0

20

Leadtime (hour)

40

60

Leadtime (hour) 4

1

(d)

BIAS (m3/s)

3

NS

(c) 0.9

2 1 0

0.8

−1 0

20

40

60

Leadtime (hour)

0

20

40

60

Leadtime (hour)

Fig. 8. Accuracy scores of forecasts with observed forcing data and lead time up to 60 h (2005–2010). Black curves represent results from the open loop; blue curves represent results based on the EnKF; and red curves represent results based on the EnKS. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

often appear during the recession period when the states have been over-corrected to predict peak discharge and subsequently require some time to stabilize. The spurious peaks and oscillatory behavior of standard EnKFs can also be found in previous studies (Clark et al., 2008a; McMillan et al., 2013). The overcorrection issue can be reduced by using a lag-aware data assimilation scheme like the EnKS. However, it should be noted that GR4H has a routing storage R, through which 90% of runoff is routed (Fig. 1), and the errors accumulated in R can be corrected directly by a real-time filter. As mentioned above, only the antecedent water detained within the two unit hydrographs is additionally updated by model dynamics when the EnKS is used. Consequently, the majority of runoff passes through a routing store that can be directly updated. Therefore, the EnKS forecasts are expected to show greater benefit when there is no storage in the routing process to be updated directly by the smoother or filter (Li et al., 2013). 4.3.2. Reliability of ensemble spread Fig. 11 shows rank histograms for discharge predictions with lead times of 1, 6, 12, 24 and 60 h for the various scenarios. The open loop ensemble spread exhibits identical rank distribution with changing lead times. This is because there is no difference between predictions and forecasts, when the observed rainfall data are used. The open loop rank histograms indicate a positive bias in the forecasts. The rank histograms for the 1-h forecasts show that this bias can be alleviated by both the EnKF and the EnKS to some extent. The reliability decreases with increasing lead time. As implied by the ‘U’ shapes of rank histograms in the 60-h lead time forecasts, ensemble spread is under-predicted for longer lead time forecasts. This suggests that there are parameter and structure uncertainties that have not been adequately addressed by the EnKF and EnKS. Nevertheless, there is no obvious reliability difference between EnKF and EnKF forecasts. 4.3.3. Discussion Consistent with the previous synthetic study (Li et al., 2013), the EnKS improves the forecast accuracy compared with the EnKF. It is interesting to note that this improvement is more significant in

the real data implementation here than that in the earlier synthetic case examined in Li et al. (2013). For example, the MRMSE for the EnKS case is reduced by 13–17% compared with the EnKF case in the real data implementation whilst the improvement was 4–12% in the synthetic experiment with GR4H. Typically, data assimilation techniques tend to appear more effective in synthetic cases (relative to real data implementations) due to the complexity and uncertainty of various sources not fully incorporated in the ideal synthetic setup (Crow and Ryu, 2009; Weerts and El Serafy, 2006). In this study, the larger benefit in the real data case may be associated with the serial correlation of errors in real streamflow observations. A streamflow gauge is more likely to give positively biased discharge observation at the next time step if it is positively biased at the present time step due to the nature of the measurements. For this reason, the updating process will bring model predictions closer to the observations and the improvement will persist longer in the real data case if observations are continuously assimilated. But it should be noted that, in the real data case presented here, model forecasts are compared to observations instead of the truth as there is no way to obtain the absolute truth. Therefore, it remains unclear whether the improvement of using an EnKS is more significant in a real data case. Although the overcorrection issue is obvious when the time lag is not considered in the EnKF-based forecasts, there is also evidence that overcorrection occurs, albeit on a reduced scale, in EnKS-based forecasts. Firstly, spurious peaks and oscillations also occur in the EnKS case (Figs. 9 and 10). Secondly, the increase of RMSE with increasing lead time is steeper than the increase of MRMSE (Fig. 8). As RMSE is calculated based on the ensemble mean while the MRMSE is calculated based on all the ensemble members, it can be inferred that the spread increases more slowly than the mean error, which indicates that the EnKS has excessively reduced ensemble spread. Thirdly, the U-shape feature of the rank histograms becomes increasingly obvious with increasing lead time (Fig. 11), which also indicates the increasing under dispersion of the ensemble spread. The cause of the overcorrection is related to two main issues. On one hand, the state updating approaches attribute uncertainties from all sources – including inputs, parameters, model structures

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

10

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

Discharge (m3/s)

100

60 40 20

Discharge (m3/s)

0 100

Discharge (m3/s)

2

1

60 40 20

(c) 6−h Qf of EnKS

80 60 40 20 0 100

Discharge (m3/s)

(b) 6−h Qf of EnKF

80

0 100

(d) 60−h Qf of EnKF

80 60 40 20 0 100

Discharge (m3/s)

(a) Openloop Qf

80

(e) 60−h Qf of EnKS

80 60 40 20

0 08/06/2009

26/07/2009

13/09/2009

01/11/2009

Date Fig. 9. Ensemble spread (95% CI) of discharge forecasts with observed forcing (08/06/2009–25/11/2009). (b) and (c) are 6 h lead time forecasts based on the EnKF and EnKS. (d) and (e) are 60 h lead time forecasts based on the EnKF and EnKS. Red circles emphasize the relatively obvious different accuracies in the EnKF and the EnKS. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

as well as background states – into state variables, which may cause the overcorrection of states. This issue could be alleviated by addressing errors in parameters together with states through dual data assimilation (Moradkhani et al., 2005b; Nie et al., 2011), particle filters (DeChant and Moradkhani, 2012; Noh et al., 2011) or particle MCMC (Andrieu et al., 2010; Moradkhani et al., 2012; Vrugt et al., 2013) approaches. On the other hand, the observation error is likely to be auto-correlated. As discharge observations are continuously assimilated into the model, if the observation error is auto-correlated, the prediction error becomes correlated with the observation error, which violates error assumptions underlying the optimality of the EnKF and EnKS. As a consequence, the EnKS and the EnKF will both push the model prediction excessively closer to the observation. This is equivalent to assigning additional weight to observations. This issue may

potentially be addressed through the colored-noise Kalman Filter (Chui and Chen, 1991) but difficulties might exist in implementing an ensemble-based form of colored-noise filter or smoother. 4.4. Real-time forecasting with NWP forecast rainfall As real-time forecasting is conducted at 9:00 LST each day in 2010, there are limited samples to represent the real-time situation. The hypothetical forecasting is reanalyzed here for the same period to illustrate the effect of precipitation forecast error. The pairs of blue, red, and green lines in Fig. 12 show the 90% CI of the ensemble forecasts with lead times of 1–24, 25–48, and 49–60 h respectively for a selected period in 2010. The figures display four combinations of data assimilation schemes and forcing datasets, the EnKF (a and b) and the EnKS (c and d) for hypothetical

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

11

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

Discharge (m3/s)

500

Discharge (m3/s) Discharge (m3/s)

4

200 100

(b) 6−h Qf of EnKF

400 300 200 100 0 500

(c) 6−h Qf of EnKS

400 300 200 100 0 500

Discharge (m3/s)

3

300

0 500

(d) 60−h Qf of EnKF

400

4

300 200 100 0 500

Discharge (m3/s)

(a) Openloop Qf

400

(e) 60−h Qf of EnKS

400 300 200 100 0 01/08/2010

19/09/2010

06/11/2010

25/12/2010

Date Fig. 10. Ensemble spread (95% CI) of discharge forecasts with observed forcing (01/08/2010–25/12/2010).

forecasting (a and c) and real-time forecasting (b and d). Inferring from Fig. 12, the difference between forecasts based on the EnKF and the EnKS is not significant in both hypothetic and real-time forecasts. This may be because the number of samples (only 365 forecasts) is quite limited to represent the overall performance. Besides, the model tends to give better forecasts in 2010 (Fig. 10) compared with 2009 (Fig. 9), which indicates there is smaller space for the open loop forecasts to be improved and thus the less difference between the EnKS and EnKF. However, it is clear that the NWP rainfall forecasts bring more uncertainty to the discharge forecasts and this reduces the benefit of state updating as the lead time increases. Fig. 13 shows variations in MRMSE, RMSE, NS efficiency coefficient and BIAS with increasing lead time of up to 60 h. There is little difference between results in the hypothetical forecasting (solid curves) and real-time forecasting (dashed curves) scenarios for short lead times. However, it is obvious that error (MRMSE and

RMSE) increases and the NS efficiency coefficient decreases much more rapidly with increasing lead time in the real-time forecasting scenario due to the propagation of NWP rainfall forecast errors into the discharge forecasts. While state updating improves the accuracy of forecasts significantly compared with the open loop, it is hard to judge which updating scheme gives better forecasts. In the hypothetical forecasting scenario, the EnKS exhibits a smaller improvement in terms of MRMSE and BIAS compared with the EnKF. However, the EnKS tends to give worse forecasts with lead times from 42 h to 52 h in terms of RMSE and NS efficiency coefficient. In the real-time forecasting scenario, the EnKS always performs better than the EnKF in terms of MRMSE, BIAS, RMSE and NS efficiency coefficient. The large sample sizes in the hypothetical forecasting case (Section 4.3) shows that the EnKS outperforms the EnKF; however, the limited sample size and small relative difference in the real-time forecasting scenario makes it harder to be certain that the EnKS outperforms the EnKF in that case.

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

12

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

0

Probability

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

0.1 0

Probability

0.1 0

0.1

Probability

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

0.1 0

0.2 0.1 0

0.1

1 2 3 4 5 6 7 8 9 10

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

0.2

0

0

1 2 3 4 5 6 7 8 9 10

0.2

1 2 3 4 5 6 7 8 9 10

0.1

1 2 3 4 5 6 7 8 9 10

0.2

0

0.2

1 2 3 4 5 6 7 8 9 10

0.2

1 2 3 4 5 6 7 8 9 10

0.2

Probability

0.1

Probability

0

0.2

Probability

0.1

Probability

0.2

EnKS

Probability

Probability

EnKF

Probability

Probability

Probability

Probability

Probability

Probability

Openloop

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

1 2 3 4 5 6 7 8 9 10

Fig. 11. Rank histograms of discharge forecasts with observed forcing and lead time of 1, 6, 12, 24 and 60 h. Red lines indicate a uniform distribution of the probabilities. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

800

(a) EnKF with observed rainfall

Discharge (m3/s)

Discharge (m3/s)

800 600 400 200

0 18/08

24/08

30/08

05/09

(b) EnKF with NWP rainfall

600 400 200 0 18/08

11/09

Date (2010) 800

(c) EnKS with observed rainfall

600 400 200 0 18/08

24/08

30/08

05/09

30/08

05/09

11/09

Date (2010)

Discharge (m3/s)

Discharge (m3/s)

800

24/08

11/09

Date (2010)

(d) EnKS with NWP rainfall

600 400 200 0 18/08

24/08

30/08

05/09

11/09

Date (2010)

Fig. 12. Ensemble spread (90% CI) of discharge forecasts with the lead time from 1 to 60 h (an event from 18/08 to 11/09 in 2010). (a) and (c) are forecasts forced by observed rainfall while (b) and (d) are forecasts forced by NWP rainfall. The black curve is the observed discharge. The curves with color represent 90% CI of ensemble forecasts. More specific, blue sections of the curves represent 90% CI of 1–24 h forecasts; red sections represent 90% CI of 25–48 h forecasts; green sections represent 90% CI of 49–60 h forecasts. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For the real-time case, the improvement in forecasts by using state updating compared with the open loop is most significant at the beginning of the forecasting period (i.e., lead times of 1–6 h) (Fig. 13). For longer lead times, the improvement becomes less significant, especially for RMSE and the NS efficiency coefficient. This is due to the propagation of rainfall errors in the

forecasting mode, which is consistent with the results shown in Section 4.3. Fig. 14 illustrates the rank histograms for the discharge forecasts in both hypothetical forecasting and real-time forecasting scenarios for lead times up to 60 h. It should be noted that, due to the limited number of samples, the forecasts with different lead

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

13

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

20

30

(b) 15

RMSE (m3/s)

MRMSE (m3/s)

(a) 20

10

10 5 0

0 0

20

40

0

60

20

40

60

Leadtime (hour)

Leadtime (hour) 1

6

0.9

4

BIAS (m3/s)

(d) NS

(c) 0.8 0.7 0.6 0

20

40

60

2 0 −2 0

20

Leadtime (hour)

40

60

Leadtime (hour)

Fig. 13. Accuracy scores of forecasts with lead time of up to 60 h (calculated for the whole 2010). Solid curves represent forecasts with observed rainfall while dash curves are forecasts with NWP rainfall. Black curves represent results from the open loop; blue curves represent results based on the EnKF; and red curves represent results based on the EnKS. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

Probability

0.1

EnKS

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

Probability

Probability

0.2

0

EnKF

Probability

Probability Probability

Real-time

Synthetic

Openloop

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

0.2 0.1 0

1 2 3 4 5 6 7 8 9 10

Fig. 14. Rank histograms of discharge forecasts with lead time from 1 to 60 h in hypothetical forecasting and real-time forecasting scenarios. Red lines indicate a uniform distribution of the probabilities. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

50

0

0

48

96

144

192

240

288

Precipitation (mm)

200

Discharge (m3/s)

times are mixed and summarized in a single rank histogram. The use of NWP rainfall has not changed the shape of rank histograms much, which implies that the ensemble spread of post-processed NWP rainfall properly represents the uncertainty of NWP forecasts. The under dispersion issue in the post-update analysis also exists in the real-time forecasting scenario. It is also interesting that the real-time forecasts still exhibit acceptable skill at the lead time of 60 h, e.g., the NS efficiency is still above 0.7. This may result from the soil moisture memory and routing that moderates the impact of uncertainties of NWP rainfall. To examine the delay caused by soil moisture and routing, a series of simple synthetic experiments are conducted in a 2-year period. A constant precipitation of 0.131 mm/h (averaged from 1999 to 2010) is applied for the 2-year period and a single 50 mm rainfall event (hourly) is assumed to appear at the beginning of the second year. To isolate the response of discharge to rainfall, the evapotranspiration (ET) and water exchange (F(x2)) are assumed to be zero, i.e., no water is drained by ET or F(x2) and all water goes to discharge. Fig. 15 illustrates the discharge responses to the assumed rainfall event. The discharge prediction reaches and stays at 44.1 m3/s (equal to a runoff depth of 0.131 mm/h) after several

0 336

Leadtime (hour) Fig. 15. Synthetic simulation of flow response. Red bar is a 50 mm hypothetic rainfall event; blue curve is discharge response. Black dashed line is the effective discharge amount driven by the constant rainfall without the 50 mm rainfall event. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

months’ warm-up period (the black dashed line in Fig. 15). When a 50 mm rainfall event is added, discharge goes up and peaks after 22 h. The discharge above 44.1 m3/s (called ‘‘effective discharge’’

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

14

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx

hereafter) essentially comes from the 50 mm rainfall event. The slow recession is caused by the use of the nonlinear routing storage (R). The ratio of 60-h (i.e. NWP lead time) cumulative discharge to 50 mm rainfall is 23.8%. As there is no ET and F(x2), this represents the ratio of rainfall arriving at the catchment outlet within 60 h. As only 23.8% of the antecedent rainfall reaches catchment outlet within 60 h due to the delay caused by soil moisture (percolation), fast and slow routing, it can be inferred that the uncertainty of the 60-h real-time discharge forecasts (Figs. 12 and 13) is to a large extent affected by the catchment initial conditions at the time of forecasting, rather than the effect of the NWP rainfall forecasts which is delayed by the percolation and routing process. 5. Conclusion In this study, an integrated error estimation and lag-aware data assimilation scheme is developed to assimilate discharge observations into a hydrologic model for real-time streamflow forecasting. The MAP error estimation helps to specify the error parameters more logically, and provides reliable ensemble spread. Both the EnKF and the EnKS correct errors and biases, thereby improving the accuracy and reliability of streamflow predictions relative to the open loop. The EnKS outperforms the EnKF in terms of accuracy in the hypothetical forecasting scenario; however, these two data assimilation methods do not differ much in terms of their reliability. This is because these two data assimilation algorithms are based on the same mathematical principles, and their abilities to reduce ensemble spreads should represent their abilities to correct realistic errors. Nevertheless, the EnKS is more capable of reducing errors in discharge predictions than the EnKF as it updates both current and antecedent model states. Forecasts based on the updated states show an under-dispersion of ensemble spread for longer lead times, i.e., in lead times longer than 24 h, indicating that there are parameter and structure errors which are not appropriately addressed by state updating. The benefit of using a lagaware data assimilation scheme becomes increasingly insignificant as lead time increases when using real-time NWP rainfall forecasts. The routing process in the GR4H model is only partially unit hydrograph based; therefore, the benefit of accounting for time lag issues is likely to be less significant in it than in models with based fully on unit hydrograph routing, e.g., the HBV96 model (Pauwels and De Lannoy, 2009). However, based on this study and the synthetic study by Li et al. (2013), it can be concluded that the EnKS always gives similar or better discharge forecasts. Therefore, it is better to use the EnKS rather than the EnKF in cases where time lag issues exist. Acknowledgements This work has been financially supported by a CSIRO research studentship and the Water Information Research and Development Alliance between the Australian Bureau of Meteorology and CSIRO Water for a Healthy Country Flagship. We would like to acknowledge Australian Bureau of Meteorology and Department of Environment and Primary Industries Water Resources Division for providing data. We would like to thank Dr Chun-Hsu Su, four anonymous reviewers, and the associate editor for their constructive comments that significantly improved this paper. References Andreadis, K.M., Lettenmaier, D.P., 2006. Assimilating remotely sensed snow observations into a macroscale hydrology model. Adv. Water Resour. 29 (6), 872–886. Andrieu, C., Doucet, A., Holenstein, R., 2010. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B 72 (3), 269–342.

Australian Bureau of Meteorology, 2010. Operational Implementation of the ACCESS Numerical Weather Prediction Systems, Australian Bureau of Meteorolgy, Melbourne. Beven, K., Binley, A., 1992. The future of distributed models: model calibration and uncertainty prediction. Hydrol. Process. 6 (3), 279–298. Chui, C.K., Chen, G., 1991. Kalman Filtering with Real-Time Applications. SpringerVerlag, New York, 195pp. Clark, M.P., Gangopadhyay, S., Hay, L., Rajagopalan, B., Wilby, R., 2004. The Schaake shuffle: a method for reconstructing space – time variability in forecasted precipitation and temperature fields. J. Hydrometeorol. 5 (1), 243–262. Clark, M.P., Rupp, D.E., Woods, R.A., Zheng, X., Ibbitt, R.P., Slater, A.G., Schmidt, J., Uddstrom, M.J., 2008. Hydrological data assimilation with the ensemble Kalman filter: use of streamflow observations to update states in a distributed hydrological model. Adv. Water Resour. 31 (10), 1309–1324. Clark, M.P., Slater, A.G., Rupp, D.E., Woods, R.A., Vrugt, J.A., Gupta, H.V., Wagener, T., Hay, L.E., 2008b. Framework for Understanding Structural Errors (FUSE): a modular framework to diagnose differences between hydrological models. Water Resour. Res. 44, W00B02. http://dx.doi.org/10.1029/2007wr006735. Crow, W.T., Reichle, R.H., 2008. Comparison of adaptive filtering techniques for land surface data assimilation. Water Resour. Res. 44, W08423. http://dx.doi.org/ 10.1029/2008wr006883. Crow, W.T., Ryu, D., 2009. A new data assimilation approach for improving runoff prediction using remotely-sensed soil moisture retrievals. Hydrol. Earth Syst. Sci. 13 (1), 1–16. Crow, W.T., van den Berg, M.J., 2010. An improved approach for estimating observation and model error parameters in soil moisture data assimilation. Water Resour. Res. 46, W12519. http://dx.doi.org/10.1029/2010wr009402. Crow, W.T., van den Berg, M.J., Huffman, G.J., Pellarin, T., 2011. Correcting rainfall using satellite-based surface soil moisture retrievals: the soil moisture analysis rainfall tool (SMART). Water Resour. Res. 47, W08521. http://dx.doi.org/ 10.1029/2011wr010576. Crow, W.T., Yilmaz, M.T., 2014. The auto-tuned land data assimilation system (ATLAS). Water Resour. Res. 50 (1), 371–385. DeChant, C.M., Moradkhani, H., 2011a. Improving the characterization of initial condition for ensemble streamflow prediction using data assimilation. Hydrol. Earth Syst. Sci. 15 (11), 3399–3410. DeChant, C.M., Moradkhani, H., 2011b. Radiance data assimilation for operational snow and streamflow forecasting. Adv. Water Resour. 34 (3), 351–364. DeChant, C.M., Moradkhani, H., 2012. Examining the effectiveness and robustness of sequential data assimilation methods for quantification of uncertainty in hydrologic forecasting. Water Resour. Res. 48, W04518. http://dx.doi.org/ 10.1029/2011wr011011. Duan, Q., Ajami, N.K., Gao, X., Sorooshian, S., 2007. Multi-model ensemble hydrologic prediction using Bayesian model averaging. Adv. Water Resour. 30 (5), 1371–1386. Duan, Q., Sorooshian, S., Gupta, H.V., 1992. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resour. Res. 28 (4), 1015–1031. Evensen, G., van Leeuwen, P.J., 2000. An ensemble Kalman smoother for nonlinear dynamics. Mon. Weather Rev. 128 (6), 1852–1867. Flores, A.N., Bras, R.L., Entekhabi, D., 2012. Hydrologic data assimilation with a hillslope-scale-resolving model and L band radar observations: synthetic experiments with the ensemble Kalman filter. Water Resour. Res. 48, W08509. http://dx.doi.org/10.1029/2011WR011500. Gupta, H.V., Kling, H., Yilmaz, K.K., Martinez, G.F., 2009. Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling. J. Hydrol. 377 (1–2), 80–91. Hamill, T.M., 2001. Interpretation of rank histograms for verifying ensemble forecasts. Mon. Weather Rev. 129 (3), 550–560. He, M., Hogue, T.S., Margulis, S.A., Franz, K.J., 2012. An integrated uncertainty and ensemble-based data assimilation approach for improved operational streamflow predictions. Hydrol. Earth Syst. Sci. 16 (3), 815–831. Jolliffe, I.T., Stephenson, D.B., 2003. Forecast verification: a practitioner’s guide in atmospheric science. Chichester, West Sussex, England; Hoboken, NJ: J. Wiley, c2003. Kavetski, D., Kuczera, G., Franks, S.W., 2006. Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory. Water Resour. Res. 42, W03407. http:// dx.doi.org/10.1029/2005wr004368. Komma, J., Bloschl, G., Reszler, C., 2008. Soil moisture updating by ensemble Kalman filtering in real-time flood forecasting. J. Hydrol. 357 (3–4), 228–242. Lee, H., Seo, D.-J., Koren, V., 2011. Assimilation of streamflow and in situ soil moisture data into operational distributed hydrologic models: effects of uncertainties in the data and initial model soil moisture states. Adv. Water Resour. 34 (12), 1597–1615. Lee, H., Seo, D.-J., Liu, Y., Koren, V., McKee, P., Corby, R., 2012. Variational assimilation of streamflow into operational distributed hydrologic models: effect of spatiotemporal scale of adjustment. Hydrol. Earth Syst. Sci. 16 (7), 2233–2251. Leisenring, M., Moradkhani, H., 2012. Analyzing the uncertainty of suspended sediment load prediction using sequential data assimilation. J. Hydrol. 468–469, 268–282. Li, Y., Ryu, D., Wang, Q.J., Pagano, T.C., Western, W.A., Hapuarachchi, P., Toscas, P., 2011. Assimilation of streamflow discharge into a continuous flood forecasting model. In: Blöschl, G., Takeuchi, K., Jain, S., Farnleitner, A., Schumann, A. (Eds.), Proceedings of Symposium H03 held during IUGG GA 2011 in Melbourne. IAHSAISH publication, IAHS Publication, Melbourne, Australia, pp. 107–113.

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009

Y. Li et al. / Journal of Hydrology xxx (2014) xxx–xxx Li, Y., Ryu, D., Western, A.W., Wang, Q.J., 2013. Assimilation of stream discharge for flood forecasting: the benefits of accounting for routing time lags. Water Resour. Res. 49 (4), 1887–1900. Liu, Y.Q., Weerts, A.H., Clark, M., Hendricks Franssen, H.J., Kumar, S., Moradkhani, H., Seo, D.J., Schwanenberg, D., Smith, P., van Dijk, A.I.J.M., van Velzen, N., He, M., Lee, H., Noh, S.J., Rakovec, O., Restrepo, P., 2012. Advancing data assimilation in operational hydrologic forecasting: progresses, challenges, and emerging opportunities. Hydrol. Earth Syst. Sci. 16 (10), 3863–3887. McMillan, H.K., Hreinsson, E.Ö., Clark, M.P., Singh, S.K., Zammit, C., Uddstrom, M.J., 2013. Operational hydrological data assimilation with the recursive ensemble Kalman filter. Hydrol. Earth Syst. Sci. 17 (1), 21–38. Meier, P., Frömelt, A., Kinzelbach, W., 2011. Hydrological real-time modelling in the Zambezi river basin using satellite-based soil moisture and rainfall data. Hydrol. Earth Syst. Sci. 15 (3), 999–1008. Misirli, F., Gupta, H.V., Sorooshian, S., 2002. Bayesian recursive estimation of parameter and output uncertainty for watershed models. In: Duan, Q., Gupta, H., Sorooshian, S., Rousseau, A., Turcotte, R. (Eds.), AGU Monograph: Advances in Model Calibration. Moradkhani, H., DeChant, C.M., Sorooshian, S., 2012. Evolution of ensemble data assimilation for uncertainty quantification using the particle filter-Markov chain Monte Carlo method. Water Resour. Res. 48, W12520. http://dx.doi.org/ 10.1029/2012WR012144. Moradkhani, H., Hsu, K.L., Gupta, H.V., Sorooshian, S., 2005a. Uncertainty assessment of hydrologic model states and parameters: sequential data assimilation using the particle filter. Water Resour. Res. 41, W05012. http:// dx.doi.org/10.1029/2004wr003604. Moradkhani, H., Sorooshian, S., Gupta, H.V., Houser, P.R., 2005b. Dual stateparameter estimation of hydrological models using ensemble Kalman filter. Adv. Water Resour. 28 (2), 135–147. Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I – A discussion of principles. J. Hydrol. 10 (3), 282–290. Nie, S., Zhu, J., Luo, Y., 2011. Simultaneous estimation of land surface scheme states and parameters using the ensemble Kalman filter: identical twin experiments. Hydrol. Earth Syst. Sci. 15 (8), 2437–2457. Nijssen, B., Lettenmaier, D.P., 2004. Effect of precipitation sampling error on simulated hydrological fluxes and states: anticipating the global precipitation measurement satellites. J. Geophys. Res. 109, D02103. http://dx.doi.org/ 10.1029/2003JD003497. Noh, S.J., Tachikawa, Y., Shiiba, M., Kim, S., 2011. Applying sequential Monte Carlo methods into a distributed hydrologic model: lagged particle filtering approach with regularization. Hydrol. Earth Syst. Sci. 15 (10), 3237–3251. Pagano, T.C., Hapuarachchi, P., Wang, Q.J., 2010. Continuous rainfall-runoff model comparison and short-term daily streamflow forecast skill evaluation. CSIRO Land and Water. Pauwels, V.R.N., De Lannoy, G.J.M., 2006. Improvement of modeled soil wetness conditions and turbulent fluxes through the assimilation of observed discharge. J. Hydrometeorol. 7 (3), 458–477. Pauwels, V.R.N., De Lannoy, G.J.M., 2009. Ensemble-based assimilation of discharge into rainfall-runoff models: a comparison of approaches to mapping observational information to state space. Water Resour. Res. 45, W08428. http://dx.doi.org/10.1029/2008WR007590. Perrin, C., Michel, C., Andréassian, V., 2003. Improvement of a parsimonious model for streamflow simulation. J. Hydrol. 279 (1–4), 275–289. Raupach, M.R., Briggs, P.R., Haverd, V., King, E.A., Paget, M., Trudinger, C.M., 2009. Australian water availability project (AWAP): CSIRO marine and atmospheric research component: final report for phase 3, CSIRO Marine and Atmospheric Research, Canberra, Australia. Raupach, M.R., Briggs, P.R., Haverd, V., King, E.A., Paget, M., Trudinger, C.M., 2012. Australian Water Availability Project. CSIRO Marine and Atmospheric Research, Canberra, Australia, http://www.csiro.au/awap.

15

Renard, B., Kavetski, D., Leblois, E., Thyer, M., Kuczera, G., Franks, S.W., 2011. Toward a reliable decomposition of predictive uncertainty in hydrological modeling: characterizing rainfall errors using conditional simulation. Water Resour. Res. 47, W11516. http://dx.doi.org/10.1029/2011WR010643. Ricci, S., Piacentini, A., Thual, O., Le Pape, E., Jonville, G., 2011. Correction of upstream flow and hydraulic state with data assimilation in the context of flood forecasting. Hydrol. Earth Syst. Sci. 15 (11), 3555–3575. Robertson, D.E., Shrestha, D.L., Wang, Q.J., 2013. Post-processing rainfall forecasts from numerical weather prediction models for short-term streamflow forecasting. Hydrol. Earth Syst. Sci. 17 (9), 3587–3603. Ryu, D., Crow, W.T., Zhan, X.W., Jackson, T.J., 2009. Correcting unintended perturbation biases in hydrologic data assimilation. J. Hydrometeorol. 10 (3), 734–750. Salamon, P., Feyen, L., 2009. Assessing parameter, precipitation, and predictive uncertainty in a distributed hydrological model using sequential data assimilation with the particle filter. J. Hydrol. 376 (3–4), 428–442. Sene, K., 2008. Flood Warning, Forecasting and Emergency Response. Springer, Berlin, London. Seo, D.-J., Cajina, L., Corby, R., Howieson, T., 2009. Automatic state updating for operational streamflow forecasting via variational data assimilation. J. Hydrol. 367 (3–4), 255–275. Shrestha, D.L., Robertson, D.E., Wang, Q.J., Pagano, T.C., Hapuarachchi, H.A.P., 2013. Evaluation of numerical weather prediction model precipitation forecasts for short-term streamflow forecasting purpose. Hydrol. Earth Syst. Sci. 17 (5), 1913–1931. Slater, A.G., Clark, M.P., 2006. Snow data assimilation via an ensemble Kalman filter. J. Hydrometeorol. 7 (3), 478–493. Sorenson, H.W., 1980. Parameter Estimation: Principles and Problems. Control and Systems Theory, 9. M. Dekker, New York, 398 pp. Thirel, G., Martin, E., Mahfouf, J.F., Massart, S., Ricci, S., Habets, F., 2010. A past discharges assimilation system for ensemble streamflow forecasts over France – Part 1: description and validation of the assimilation system. Hydrol. Earth Syst. Sci. 14 (8), 1623–1637. Troutman, B.M., 1978. Reservoir storage with dependent, periodic net inputs. Water Resour. Res. 14 (3), 395–401. Vogel, R.M., Tsai, Y., Limbrunner, J.F., 1998. The regional persistence and variability of annual streamflow in the United States. Water Resour. Res. 34 (12), 3445– 3459. Vrugt, J.A., ter Braak, C.J.F., Clark, M.P., Hyman, J.M., Robinson, B.A., 2008. Treatment of input uncertainty in hydrologic modeling: doing hydrology backward with Markov chain Monte Carlo simulation. Water Resour. Res. 44, W00B09. http:// dx.doi.org/10.1029/2007WR006720. Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., Schoups, G., 2013. Hydrologic data assimilation using particle Markov chain Monte Carlo simulation: theory, concepts and applications. Adv. Water Resour. 51, 457–478. Wang, D., Chen, Y., Cai, X., 2009a. State and parameter estimation of hydrologic models using the constrained ensemble Kalman filter. Water Resour. Res. 45, W11416. http://dx.doi.org/10.1029/2008WR007401. Wang, Q.J., Robertson, D.E., Chiew, F.H.S., 2009b. A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites. Water Resour. Res. 45, W05407. http://dx.doi.org/10.1029/2008WR007355. Weerts, A.H., El Serafy, G.Y.H., 2006. Particle filtering and ensemble Kalman filtering for state updating with hydrological conceptual rainfall-runoff models. Water Resour. Res. 42, W09403. http://dx.doi.org/10.1029/2005wr004093. Zhang, Y., Chiew, F.H.S., Zhang, L., Li, H., 2009. Use of remotely sensed actual evapotranspiration to improve rainfall-runoff modeling in Southeast Australia. J. Hydrometeorol. 10 (4), 969–980.

Please cite this article in press as: Li, Y., et al. An integrated error parameter estimation and lag-aware data assimilation scheme for real-time flood forecasting. J. Hydrol. (2014), http://dx.doi.org/10.1016/j.jhydrol.2014.08.009