Accepted Manuscript Research papers Evaluation of medium-range ensemble flood forecasting based on calibration strategies and ensemble methods in Lanjiang Basin, Southeast China Li Liu, Chao Gao, Weidong Xuan, Yue-Ping Xu PII: DOI: Reference:
S0022-1694(17)30568-1 http://dx.doi.org/10.1016/j.jhydrol.2017.08.032 HYDROL 22195
To appear in:
Journal of Hydrology
Received Date: Revised Date: Accepted Date:
21 October 2016 24 July 2017 21 August 2017
Please cite this article as: Liu, L., Gao, C., Xuan, W., Xu, Y-P., Evaluation of medium-range ensemble flood forecasting based on calibration strategies and ensemble methods in Lanjiang Basin, Southeast China, Journal of Hydrology (2017), doi: http://dx.doi.org/10.1016/j.jhydrol.2017.08.032
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Evaluation of medium-range ensemble flood forecasting based on calibration strategies and ensemble methods in Lanjiang Basin, Southeast China Li Liu, Chao Gao, Weidong Xuan, Yue-Ping Xu* Institute of Hydrology and Water Resources, Civil Engineering, Zhejiang University, 310058, Hangzhou Corresponding author: Y.P. Xu, Institute of Hydrology and Water Resources, Civil Engineering, Zhejiang University, Hangzhou, 310058. (
[email protected])
Summary Ensemble flood forecasts by hydrological models using numerical weather prediction products as forcing data are becoming more commonly used in operational flood forecasting applications. In this study, a hydrological ensemble flood forecasting system comprised of an automatically calibrated Variable Infiltration Capacity model and quantitative precipitation forecasts from TIGGE dataset is constructed for Lanjiang Basin, Southeast China. The impacts of calibration strategies and ensemble methods on the performance of the system are then evaluated. The hydrological model is optimized by the parallel programmed ε-NSGAⅡmulti-objective algorithm. According to the solutions by ε-NSGA Ⅱ, two differently parameterized models are determined to simulate daily flows and peak flows at each of the three hydrological stations. Then a simple yet effective modular approach is proposed to combine these daily and peak flows at the same station into one composite series. Five ensemble methods and various evaluation metrics are adopted. The results show that ε-NSGAⅡcan provide an objective determination on parameter estimation, and the parallel program permits a more efficient
1
simulation. It is also demonstrated that the forecasts from ECMWF have more favorable skill scores than other Ensemble Prediction Systems. The multimodel ensembles have advantages over all the single model ensembles and the multimodel methods weighted on members and skill scores outperform other methods. Furthermore, the overall performance at three stations can be satisfactory up to ten days, however the hydrological errors can degrade the skill score by approximately 2 days, and the influence persists until a lead time of 10 days with a weakening trend. With respect to peak flows selected by the Peaks Over Threshold approach, the ensemble means from single models or multimodels are generally underestimated, indicating that the ensemble mean can bring overall improvement in forecasting of flows. For peak values taking flood forecasts from each individual member into account is more appropriate.
Key word Ensemble flood forecasting; Variable Infiltration Capacity model; Multi-objective calibration strategies; Ensemble methods
1. Introduction Over the last three decades, flooding is becoming a serious issue in China and worldwide due to urban expansion and climate change (Du et al., 2010), resulting in an urgent need to improve the accuracy of flood forecasting systems with sufficient lead time to take appropriate prevention measures to mitigate loss of life and property. Numerous studies (Liu and Gupta, 2007; Randrianasolo et al., 2014; Tiwari and Chatterjee, 2010) suggest that an improved hydrological forecasting system can be realized by efforts to properly consider uncertainty in hydrologic predictions. Generally speaking, there are five dominant sources 2
contributing to modeling uncertainty (Renard et al., 2010; Walker et al., 2003; Warmink et al., 2010): (1) input uncertainty, (2) model uncertainty, (3) parameter uncertainty, (4) model outcome uncertainty, and (5) context uncertainty. According to Cuo et al. (2011) and Reggiani and Weerts (2008), the uncertainties from precipitation forecasts and observations, system conceptualization and model parameterization are dominant in hydrological ensemble prediction system, thus considered in this study. To articulate the uncertainty in input data, utilization of quantitative precipitation forecasts (QPFs) extracted from ensembles of numerical weather predictions (NWPs) to generate ensemble flood predictions is accelerated following the remarkable progress and improvement in meteorological models and data (Cloke and Pappenberger, 2009), and this has been demonstrated for several historical flood events using QPFs from diverse Ensemble Prediction Systems (EPS) (Alfieri et al., 2014; Amengual et al., 2015; Bartholmes et al., 2009; Verkade et al., 2013; among others). However, an ensemble from a single EPS can only account for part of the uncertainties originating from initial conditions and stochastic physics (Roulin, 2006), while other sources of uncertainty involving numerical implementations or data assimilation can be only embraced by a grand multimodel ensemble (He et al., 2009), which is obtained by combining different EPSs with different weights. An equally weighted ensemble is applied by assigning each single forecast with equal weights. Combined grand multimodel ensemble means are able to outperform even the best single model forecast (He et al., 2009; Huang et al., 2012; Pappenberger et al., 2008; Park et al., 2008). Bourgin et al. (2014) and Pagano et al. (2013) used an intuitive and operationally appealing method, ensemble dressing, to improve the skill of hydrological ensemble forecasting systems. This method is essentially a statistical post-processing by reshaping the ensemble forecasts with the error distribution of model residual. Some studies like Möller 3
et al. (2013) and Raftery et al. (2005) utilized Bayesian Model Averaging to generate ensemble forecasts, confirming further improvements by grand multimodel ensemble. Raw forecasts from EPSs generally contain system errors in the mean and spread of their forecast distribution (Verkade et al., 2013), thus it is often recommended to use the model output statistics (MOS) to post-process the forecasts before feeding them into the hydrological model. Many investigations have successfully applied MOS for rainfall (Gneiting et al., 2005; Shrestha et al., 2015) and medium-range streamflow forecasting systems (Bennett et al., 2014). Benefiting from the growth in computational power, a better representative of actual hydrological process is realized by hydrological models with its development from lumped conceptual models to distributed physics models (Liu and Gupta, 2007). Throughout investigations in recent years, the widely used hydrological models for ensemble streamflow/flood prediction are lumped ones, such as HBV (Demirel et al., 2013; 2015; Renner et al., 2009), GR4J (Demirel et al., 2013; 2015), GRP (Randrianasolo et al., 2014) and so on. The reasons for this phenomenon are possibly attributed to easy calibration yet better performance under cross-validation in lumped models (Wang et al., 2011). Therefore, sophisticated distributed hydrological models, like the adopted model in this study, the Variable Infiltration Capacity (VIC) model, is often excluded for ensemble flood forecasting in most cases due to its complicated calibration and time-consuming simulation. Nevertheless, it has been verified that distributed models have the potential to take better advantage of the spatial information of high-resolution distributed NWP outputs (Alvarez et al., 2015). Furthermore, the better representation of hydrological processes by distributed models compared with lumped models should in theory allow better out-of-sample predictions. Naturally, these theoretical benefits are unlikely to be realized without more detailed data and information on boundary conditions. These data are probably not available now, 4
but the benefits of establishing flood forecasting system based on distributed models that can work now will be strongly evident in future when more environmental datasets become available. The spatial variability in precipitation and basin properties are also believed to be important in determining the characteristics of streamflow hydrographs. According to Maurer et al. (2002), one of the distinguishing characteristics of VIC is that it uses a subgrid parameterization of the effects of spatial variability in soils, topography, and vegetation to represent the nonlinear soil moisture dependence of the partitioning of precipitation into direct runoff and infiltration. Numerous studies (Christensen and Lettenmaier, 2006; Mo and Lettenmaier, 2014; Schumann et al., 2013) have applied VIC successfully in ensemble streamflow forecasting, but few of them addressed the inefficiency in calibration. Reed et al. (2004) found that calibrated distributed models can perform at a level comparable to or better than a calibrated lumped model, but the parameterization has a great impact on simulation accuracy. Thus, how to address the problem in efficiency of calibration and to increase the accuracy in distributed models turn to be a focus for hydrological ensemble predictions in this study. Another interesting approach to deal with structural uncertainties is the use of a multi-hydrological model ensemble, as one hydrological model often does not capture a simultaneously optimal representation of different aspects of the system behavior. Seiller et al. (2012) used a multimodel system composed by twenty lumped hydrological models to account for model structure uncertainty, showing that simulation from a single model may provide hazardous results and generally not as good as that of the twenty-model ensemble. Before this, a similar method, modular approach, is proposed by Corzo et al. (2009) and Toth (2009) to reintegrate several single models into one composite model to provide streamflow with less uncertainty, which is proved to produce generally better results than any single-member model simulation. 5
Parameter uncertainty usually originates from the inaccurate estimation of model parameters. It is well known that model parameters are conceptual representations of spatially and temporally heterogeneous properties of the hydrological processes in real system (Liu and Gupta, 2007). Errors in parameter estimation can result in significant errors in model outputs. Hence, the application of VIC in this study remains a challenge for its immature calibration technology. Many studies have successfully applied VIC regionally as well as globally (Schumann et al., 2013; Xie et al., 2007; Yuan et al., 2013; 2014), but the model calibration was generally performed manually (Mishra et al., 2010; Schumann et al., 2013), or performed based on non-parallel optimization algorithms, such as MOCOM-UA (Voisin et al., 2011), SCE-UA (Troy et al., 2008; Yuan et al., 2013) and so on. Numerous studies have utilized parallel computing methods to improve optimization in distributed hydrological model, demonstrating the parallel programmed multi-objective evolutionary algorithm is efficient and valuable for model calibration. Confesor and Whittaker (2007) adopted the NSGAⅡalgorithm based on a parallel genetic algorithm library (PGAPACK) and Her et al. (2015) utilized AMALGAM optimization algorithm to calibrate the SWAT model and clearly demonstrated the effectiveness of the parallel methods in reducing computational time and uncertainty of the parameter calibration and spatial optimization. Thus, an automatic calibration based on parallel programming is expected in this study to achieve automatic optimization with reduced parameter uncertainty and higher computation efficiency in hydrological flood forecasting systems. To the best of our knowledge, no studies have been reported in openly available hydrologic literature that have used automatically calibrated VIC coupled with parallel processed genetic algorithm to generate ensemble flood forecasting. We adopt a modular approach to combine simulations from two differently parameterized VIC models to increase the accuracy of simulation. Five distinctive ensemble 6
methods are selected to produce weighted ensemble forecasts. The structure of this paper is as follows. A description of study area and data used is presented in Section 2. Section 3 provides the methodology applied and performance comparison and case study is presented in Section 4. Section 5 presents the discussions and conclusions.
2. Study area and data The selected study areas are Lanjiang Basin and its two subbasins, namely Jinhua River subbasin and Qu River subbasin, located on the upper reach of the Qiantang River basin in Zhejiang Province, Southeast China. Lanjiang Basin covers an area of 17,747 km and contributes to about two-thirds of the discharge to the downstream Qiantang River basin (Zhang et al., 2014). The basin is dominated by a subtropical humid monsoon climate, with annual mean rainfall of about 1,600 and annual mean temperature of about 17 °. The hydrological regime is characterized by the occurrence of high flow periods from May to July, sometimes resulting in catastrophic floods and leading to enormous economic losses. Qu River and Jinhua River are two important tributaries of Lanjiang Basin, with an area of 5,613 km and 5,993 km respectively. The Qu River subbasin is comprised of mountains and hills, while the Jinhua River subbasin is dominated by plain areas. The average annual total precipitation during the study period is 1,900 and 1,550 for Qu and Jinhua River respectively, of which more than 50% occurs in summer. Qu River and Jinhua River are most
severely affected by floods and droughts due to the uneven temporal distribution of precipitation. Fig. 1 shows the basin location, elevations, hydro-meteorological stations and three outlets that are used in this study. [Fig.1]
7
Daily gauged meteorological data (precipitation, wind speed, maximum and minimum temperature, relative humidity) from 1987 to 2012 at 28 meteorological stations were collected from China Meteorological Administration as input of the hydrological models. Three discharge stations are utilized in this study, i.e. the Lanxi, Quzhou, and Jinhua stations. Due to the intermittence of data, hydrological model calibration and validation are carried out using gauged meteorological data and discharge data before 2000, and the first 9 years from 1987 to 1995 are used for calibration, while the consecutive 5 years are for validation. In the absence of measured discharges at Lanxi Station and Jinhua Station during the evaluation period from 2009 to 2012, simulated streamflow from VIC forced by gauged meteorological data is considered as a proxy for the observed streamflow at these two stations, which is also beneficial to avoid the uncertainty from hydrologic model when analyzing the results. Whilst for Quzhou Station, all the analyses are performed based on both measured discharge (observation) and simulated streamflow by VIC with gauged inputs (proxy). This further makes it possible to estimate the relative contributions to the total errors from meteorological forecasts and hydrologic model respectively. The QPFs used for ensemble streamflow forecasting are derived from the TIGGE (THORPEX Interactive Grand Global Ensemble) database from 2009 to 2012. Five EPSs have been chosen to generate ensemble streamflow predictions with lead time from 24h to 240h. Basic information about the systems used in this study is presented in Table 1, and more information can refer to ECMWF portal and documents by Hamill (2012) and Park et al. (2008). The digital elevation model (DEM) data with 90 by 90 resolution used in VIC are downloaded from Geospatial Data Cloud (http://www.gscloud.cn). The 1 km China soil map based on Harmonized World Soil Database (Fischer et al., 2008) and the global 1 km land cover classification 8
dataset (Hansen et al., 2000) are used to define the model soil and vegetation parameters. Table 1. Basic information about the systems used in this study.
EPS
Country/region
Horizontal resolution
Members
ECMWF
Europe
~0.3 ° × 0.3 °
50+1*
NCEP
United States
UK
United Kingdom
CMC
Canada
CMA
China
1.0 ° × 1.0 °
0.56 ° × 0.83 ° 1.0 ° × 1.0 °
0.56 ° × 0.56 °
20+1 23+1 20+1 14+1
*Control member
3. Methodology 3.1 General framework
This study is designed to investigate the impact of calibration strategies and ensemble methods on flood forecasting through VIC. The framework of this study is shown in Fig. 2, primarily consisting of four main components: VIC hydrological model, ε-NSGA Ⅱ multi-objective algorithm, model combination (modular approach) and ensemble methods, and the detailed introductions are presented in Section 3.2-3.5. A brief description about the framework is made in this section. Firstly, the ε-NSGAⅡ algorithm is employed to calibrate VIC with gauged meteorological data and discharge. After calibration, two differently parameterized hydrological models are determined to simulate daily flows (VIC/D) and peak flows selected by the Peaks Over Threshold (POT) approach (VIC/P) at each station respectively. Once the optimal parameter set is determined, it remains constant during the whole validation and evaluation period. At the same time, on the last day of each period, the value of moisture 9
content in each soil layer, base flows and other variables are recorded as initial conditions for the next period. The streamflow series from above two differently parameterized local models at each station are combined into one composite series using the modular approach. Then, single model ensemble means are obtained by giving equal weights to each member from each EPS, and multimodel ensemble means are computed based on five ensemble methods. The Quantile Mapping post-processor is adopted to correct the bias of the daily flows for better performance in simulating flood. [Fig. 2]
3.2 Distributed hydrological model
Variable Infiltration Capacity (VIC) macroscale semi-distributed grid based hydrological model is originally developed by Liang et al. (1994; 1996). VIC simulates the water and energy balances in each grid cell and subgrid cell heterogeneity is simulated by variable infiltration, elevation bands, and mosaics scheme of vegetation classes (Schumann et al., 2013). Routing of runoff and base flow is performed separately from the land surface simulation, using a standalone model (typically the routing model of Lohmann et al., 1996 and 1998). The routing model is a source-to-sink model that solves linearized Saint-Venant equations. For this application, the VIC model (version 4.2.c) is run at a three-hourly time step in only water balance mode. The gauged meteorological data and QPFs from TIGGE dataset are interpolated into 0.05 ° × 0.05 ° gridded data as input for VIC using the Inverse Distance Weighted (IDW) method.
3.3 Calibration strategy Many of the parameters for VIC and the routing model are based on observations or educated
10
guesses. According to suggestions from previous studies (Lohmann et al., 1996; Mishra et al., 2010; Voisin et al., 2011), seven parameters with regard to soil property in VIC including variable infiltration curve parameter, , maximum velocity of baseflow, , fraction of where non-linear
baseflow begins, , fraction of maximum soil moisture where non-linear base flow occurs, , and soil depth of each layer are often determined through calibration. For daily applications, the routing model is calibrated jointly, and two parameters associated with the velocity and diffusivity are also calibrated. An evolutionary multi-objective optimization, the Epsilon-Dominance Non-Dominated Sorted Genetic Algorithm II (ε-NSGAⅡ), is employed in this study to automatically calibrate the model parameters. This algorithm is a global optimization method, and its effectiveness and robustness have been demonstrated with various optimization problems (Chu et al., 2015; Kollat and Red, 2006; Zhou et al., 2016). The ε-NSGAⅡ builds on the NSGAⅡ(Deb et al., 2002) by adding ε-dominance archiving (Laumanns et al., 2002), adaptive population sizing, and automatic termination to minimize the need for parameter calibration (Reed et al., 2003), meanwhile preserving the fast non-dominated sorting approach to classify solutions and a crowded comparison operator to maintain solution diversity. The ε-dominance archiving permits the users to specify the precision that they want to obtain the Pareto-optimal solutions to a multi-objective problem. Larger ε values lead to fewer solutions while smaller ε values produce more solutions (Kollat and Red, 2006). Additionally, ε-NSGAⅡcan be operated parallel through multithreading and multicore approaches on the basis of portable implementation of Message Passing Interface (MPI) standard (which is also applied for ensemble forecasting during the evaluation period to simulate multi-streamflow one time) to solve the problem of large computation in the calibration of hydrological models. In this study, the calibration is carried out with the concurrency of multicores on a high-performance computer, and the parameter setting of 11
ε-NSGAⅡrefers to Kollat and Red (2006). Instead of using a single hydrological model to reproduce the full range of catchment responses, two differently parameterized hydrological models are used for each catchment and each of them is assigned to a specific task. Separately, VIC in this study is designed to simulate daily flows (VIC/D) and only peak flows (VIC/P). For this purpose, the model is calibrated with four objectives at each station: Nash–Sutcliffe efficiency and relative bias for daily discharges and same objective functions but for peak flows. The relevant mathematical formulas are defined as: ∑1 ((
(,-(
(,,0
)*+ +./ !"/ = 1 − ∑.23 1 (( 55555555555 (,-( (4,,0
6789⁄ =
.23
)*+
∑1 .23[(+./(,-()*+ (,] ∑1 .23 ()*+ (,
In which
(1)
100%
(2)
∑? (()*+,>(,-(+./,>(,,0 0 5555555555555 (()*+,> (,-( )*+,> (4,,
!"/= = 1 − ∑.23 ?
6789⁄= =
)*+
.23
∑? .23[(+./,> (,-()*+,> (,] ∑? .23 ()*+,> (,
(3)
100%
(4)
and @ are the number of the daily and peak flows, respectively; ABC and A are the
observed and simulated daily streamflow; and ABC,D and A,D are the observed and simulated peak flows extracted from the daily streamflow, respectively.
The peaks are selected using the Peaks Over Threshold (POT) approach. POT retains all peak values that “exceed” a certain threshold, not confined to only one event per year and allows for a more rational selection of flood events (Liang et al., 1999). Peaks selected based on POT should meet the independence condition. An extensively used independence criteria for POT is proposed by the United States Water Resources Council (USWRC, 1976). It recommends that the interval between two consecutive peak flows has to be larger than five days plus the natural logarithm of the basin area in square miles, and the intermediate flows between these two consecutive peaks must drop below 75% of the lowest of these two flood events. Based on these independence criteria, thresholds for Jinhua River, 12
Qu River subbasins and Lanjiang Basin are determined as 340 E /9, 450 E /9, and 1,000 E /9 respectively to assure independence as well as sufficient number of peaks for further hydrologic analysis.
3.4 Model combination
Outputs from VIC/P provide a better description for higher flows while VIC/D is more accurate in simulating lower flow (Fig. 3). By acknowledging the performance limitations of these models, a modular approach is recommended for hydrological simulation to reintegrate these two differently parameterized models into one composite model to depict the full spectrum of streamflow. The combination technique herein is inspired by the work of Fenicia et al. (2007), and some modifications are made as observed streamflow during the evaluation period (2009-2012) are missing for the Jinhua and Lanxi stations. Detailed modifications are described in the following paragraph. In our experience, this modified combination technique is found to have the comparable effect as the original one by Fenicia et al. (2007) (Not show here). [Fig. 3] According to Fenicia et al. (2007), in order to express aforementioned difference in the degree of believability of the outputs of these two models, each model is assigned a certain membership coefficient. The membership coefficient assigned to the VIC/D model (FGH/I(A,) is 1 when the simulated flow is below the threshold J, and it is 0 when the flow is above the threshold δ. For convenience, the coefficient is defined as 0.5 when the flow is between the thresholds J and δ, rather than a linear change in Fenicia et al. (2007). The membership coefficient for VIC/P follows a symmetric behavior. Membership coefficients for the two differently parameterized models are
13
described as follows: 1, 7N A /AB, < J FGH/L (A, = M 0.5, 7N J ≤ A /AB, < Q S 0, 7N A /AB, ≥ δ
(5)
0, 7N A /AB, < J FGH/D (A , = M0.5, 7N J ≤ A /AB, < QS 1, 7N A /AB, ≥ δ
(6)
where AB, is the maximum of observed flow series during the whole study period due to the
absence of observations in the evaluation period; the values of J and δ can be determined by optimization as recommended in Fenicia et al. (2007), but to match the calibration strategy, in this study J is determined by the ratio of the thresholds used to defined peaks by the POT approach and
AB, , and δ is defined by the ratio of the expected value of all the selected peaks by the POT approach and AB, ; J and δ can be regarded as threshold for low flows and for high flows respectively. The outputs of the two local models were finally combined according to the equation:
AH (t, =
UVW/X ∙(UVW/X (Z,[UVW/> ∙(UVW/>(Z, UVW/X [UVW/>
(7)
In which AFGH/L (\, and AFGH/D (\, are outputs of VIC/D and VIC/P.
3.5 Multimodel ensemble method
Multi-model ensemble combines the strengths from different EPSs. Some EPSs might perform better than others at different times and for different subbasins. Therefore further improvements of the multi-model ensemble might be realized by giving the EPSs different weights. According to Johnson and Swinbank (2009), the multi-model ensemble mean can be expressed as: ]̅__ =
`
_
∑_ bc` ab ]̅ b
(8)
Wherein @ is the number of ensemble EPSs; ]̅ b is the single-model ensemble mean which is given 14
by the following formula: de ]̅b = d ∑c` ]b . `
e
b
Where
(9)
is the number of members in single EPS f; ]b is the combined discharge forced by
member in f Zg EPS.
b
The weight given to each single model is defined as: ab =
hi j
∗ m
l
(10)
i
In which n is the average number of members of the single model forecasts used to construct the multi-model ensemble; !b is the used skill score to make a skill based combination; J is a normalization factor to ensure that
`
_
∑ ab = 1.
Similarly, the multimodel ensemble probability is given by a weighted average of the single model probabilities: o__ =
`
_
∑_ bc` ab ob
(11)
Here ob is the single model probability defined by the formula: ob =
`
de
e ∑d cb pb , wherein pb is the
binary forecast of whether or not an event occurs, based on ensembles member i from single model k. Ultimately, five different grand multimodel ensembles are constructed and evaluated in this study. Table 3 shows the different grand ensemble set ups and their weighting formulas used to calculate the weights given to the single model means. A detailed description about the first four multimodel ensembles can be found in Johnson and Swinbank (2009). The Bayesian Model Averaging (BMA) applied in the fifth multimodel ensemble (MM5) estimates the BMA weights by maximizing the likelihood function using the Expectation-Maximization (EM) algorithm, and the likelihood function is defined as the probability of available training data, namely discharges in the evaluation period. Interested readers can refer to work by Raftery et al. (2005). 15
Table 3. Different ensemble combination methods used in this study. Experiment
Description
Weights
MM1
Combine means with equal weights
ab = 1
MM2
Combine members with equal weights
MM3 MM4 MM5
Combine means with unequal weights (based on Continuous Ranked Probability Score) Combine members with unequal weights (based on Continuous Ranked Probability Score) Combine means with Bayesian Model Averaging
Nr T γ ab = Sr ab = ab =
Nr γ ∗ T Sr
Raftery et al. (2005)
3.6 Bias correction The Quantile Mapping (QM) method is adopted to remove biases in ensembles. QM is an extensively used post-processing technique in hydrological applications (Themeßl et al., 2011; 2012; Zhang et al., 2015). Its effectiveness is demonstrated for bias correction in weather forecasts (Li et al., 2010; Salathé et al., 2014) and streamflow forecasts (Hashino et al., 2006) in hydrological prediction. Though it has been shown that QM cannot correct errors in ensemble reliability, nor can it guarantee coherence (Wood and Schaake 2008; Zhao et al. 2017). In this study we prefer QM for its simplicity and favourable performance in higher quantiles (Themeßl et al., 2011). QM focuses on correcting for errors in the shape of empirical cumulative distribution functions (ecdf), and is therefore capable to correct errors in variability as well (Themeßl et al., 2011). Given the limited improvements in QPFs by post-processors at hand (Liu et al., under review), we exclude the meteorological post-processing from the study, and focus on the impact of post-processor on streamflow. For this study, bias correction of QM is achieved by replacing the forecasted values with observed values at the same quantiles based on the constructed ecdf of forecasted and observed (proxy) datasets in the evaluation period. Similar to 16
Wilcke et al. (2013), the ecdf is constructed using a sliding window of 31 days centred over the focus day.
3.7 Evaluation metrics
According to WMO (2015), an accurate probability forecast system should have three properties, namely the reliability, sharpness and resolution. The reliability measures the agreement between forecast probability and mean observed frequency; the sharpness is used to define the tendency to forecast probabilities near 0 or 1, as opposed to values clustered around the mean, and the last property indicates the ability of the forecast to resolve the set of sample events into subsets with characteristically different outcomes. The first two properties can be shown by reliability diagram and the corresponding sharpness diagram, while Brier Score (BS) can be decomposed to describe the reliability and resolution. Additionally, the Continuous Ranked Probability Skill Score (CRPSS) is adopted to express the overall performance of the forecasts, and the skill score of BS (BSS) is also used in this study to assess the forecasts over different thresholds. Though the rank histograms do not measure specific scores, it is also used in this study to visually examine some qualities of the forecasts. Their detailed mathematical basis is omitted here, but the interested reader may refer to the review on probabilistic forecasting verification by Gneiting and Katzfuss (2014). A brief introduction to metrics interpretation is presented in the following. A rank histogram (Hamill, 2001; Mendoza et al., 2012) provides a visual measure of the fraction with which observations fall between ranked members of ensembles. A perfectly reliable forecasting system would produce a horizontally uniform rank histogram. A U-shaped or concave rank population signifies a lack of variability in the ensemble or underdispersion (overconfident) (Wilks, 2006); an
17
inverted U-shape rank histogram is indicative of an excessive dispersion; and an asymmetric histogram is a sign of bias in the mean and possibly in higher moments of the ensemble (Fan et al., 2014). The BSS (Brier, 1950; Nester et al., 2012) is skill score of the Brier Score (BS) compared with a reference forecast. We chose the climatological forecast as a reference forecast as proposed by Roulin (2006). It is calculated as the Mean Square Error of observed probability and frequency of occurrence of streamflow exceeding the prescribed threshold during the whole study period. BS measures the differences between forecast probabilities and matching dichotomous observations. BSS is positively oriented, a perfect forecast with BSS =1, while a BSS≤0 indicates no forecast skill compared with reference forecasts. Since more attention is given to high flows in this study, percentiles Q50, Q70, Q75, Q80, Q85, Q90, Q93, Q95, Q97, Q99 based on historical observations from 1987 onward are chosen as thresholds to obtain BSS. Following Wilks (2006) and Murphy (1973), the reliability and resolution decomposed from BS are used. A favorable forecast should have a reliability term to be as small as possible, and the resolution term to be as large as possible. Likewise, CRPSS (Hersbach, 2000) is calculated through normalizing the Continuous Ranked Probability Score (CRPS) by a reference forecast which is usually the climatology or persistence of a study area. In this study, reference forecast of an ensemble of hydrological forecasts simulated by the hydrological model using sampled historical meteorological observations at the same calendar day as input to the model (Bennett et al., 2013; 2014) is used. CRPS is a verification tool that measures the degree of agreement between the cumulative probability distribution of ensemble forecasts for all the range of possible values (no predefined threshold is considered) with the matching single observations (Ye et al., 2013). The value of CRPSS ranges from −∞ to 1, with best score equal to 1. The reliability diagram (Bröcker and Smith, 2007; Wilks, 2006) permits a graphical assessment of 18
both reliability and resolution of probabilistic forecasts. This metric summarizes the correspondence of the forecast probabilities with the observed frequency of occurrence. In the diagram, the diagonal line indicates perfect reliability. If the curve lies below the diagonal line, the forecast is overestimated, otherwise underestimated. The flatter the curve in the reliability diagram, the less resolution it has. The corresponding sharpness diagram shows the frequency of forecasts falling into each probability bin, and is used to indicate the sharpness of the forecasts. In this study, six probability bins (0, 0.05, 0.2, 0.4, 0.6, 0.8, and 1) and three different thresholds (Q75, Q85 and Q95) are used to define the reliability diagram.
5. Results 5.1 Calibration and validation of the hydrological model
When the calibration is automatically finished by the termination criteria in ε-NSGAⅡalgorithm, near 3000 non-dominated solutions are obtained, and plots of the inter-objective tradeoffs derived from these solutions are shown in Fig.4, but for the purpose of comparison only the points with NSE greater than zero are plotted. The upper panels are results at Lanxi Station; results at the other two stations are similar and not shown here. Fig. 4(a) reveals that NSE/P turns to negatively correlated to NSE/D when NSE/P increases up to 0.85, indicating a conflict in better performance of both daily flows and peak flows, while Bias and NSE generally behave consistently (Fig. 4(b)). Despite different topography, daily flows at three stations can be optimally simulated simultaneously with the same parameter set as presented in Fig. 4(c), while with respect to peaks, desirable performances at three stations take place with respective optimal parameters (Fig. 4(d)). Accordingly, four sets of parameters are determined: one parameter set used to simulate daily flows in all three study areas, and the remaining three 19
parameter sets for simulation of peak flows in three areas respectively. Thus, two differently parameterized local VIC hydrological models are generated for each basin. Based on the modular approach, the performance of combined streamflow simulation during calibration and validation is depicted in the following subparagraphs.
[Fig. 4] Simulated daily flows and peak flows compared with observation during the calibration period are displayed in Fig. 5. The overall performances at three stations are desirable, particularly at Lanxi Station with NSE greater than 0.9 in both simulations of daily flow series and POT peaks. Meanwhile, all the stations have Bias smaller than 10%, and the daily flows have tendency to be slightly overestimated. The relatively inferior behaviors at Jinhua Station are possibly attributed to the sparse hydrometeorological stations in its controlled basin. It is clear that the low flows are well captured by VIC (Fig. 5(a)-(c)), while the peaks, especially peaks greater than 2,000 E /9 at Jinhua and Quzhou
stations, are generally underestimated. An overestimation in peaks ranging from 3,000 E /9 to 5,000 E /9 at Lanxi Station results in an overall positive bias.
Fewer peaks are extracted from daily flow series during the validation period, and the related graphical comparisons are shown in Fig. 6. The simulations are slightly inferior to those in calibration as expected but consistently good, with NSE at three stations greater than 0.80. The inferiority is mainly embodied in noticeable positive bias in daily flows shown in Fig. 6(a)-(c), especially for Lanxi Station with Bias greater than 10%. The underestimation of peaks in higher quantiles persistently exists at Jinhua and Quzhou stations. Peaks at Lanxi Station are simulated well with NSE equal to 0.9, though values smaller than 2,000 E /9 are modestly overestimated. In conclusion, the quantitative indicators show that the calibration strategy performs remarkably well 20
conditional on the values of NSE and Bias at three stations, indicating that the VIC hydrological model coupled with multi-objectives optimization algorithm and modular approach gives significant confidence in modeling the hydrological process, and the constructed VIC model can be used for ensemble streamflow forecasting confidently. The results of ensemble forecasting during the evaluation period from 2009 to 2012 are presented in following sections. [Fig. 5] [Fig. 6]
5.2 Single model ensemble evaluation
Rank histograms of single model daily flow forecasting at Lanxi Station for lead times of 72h, 120h and 240h are shown in Fig. 7 to serve as a demonstration of visual comparison of reliability and consistency of ensembles. Similar results can be observed for other stations, thus omitted here. For display reasons, forecasts forced by different models are present in columns and forecasts at different lead times are shown in rows. The pronounced “U” shape distribution of flows forced by NCEP and UK is indicative of not enough spread in the ensemble. While the asymmetry-distributed histograms of CMA and ECMWF driven flows signify bias in ensemble, more precisely, positive bias in CMA and negative bias in ECMWF. However, with increase of lead-time, the underdispersion is slightly mitigated as can be perceived by the flatter distribution of the histograms at longer lead times, but the forecasts are still overconfident except the ensemble of CMC. The BSS and two decomposed component scores from BS at lead times of 72h against increasing threshold quantiles at Quzhou Station are presented in Fig. 8. The scores against lead time are omitted here; the general tendency is deterioration in quality with increase of lead time; the BSS and resolution scores gradually decrease to zero, while the reliability score exhibits less obvious change.
21
Values of BSS are mostly positive for all EPSs and thresholds, indicating skillful forecasts compared with reference foecasts. The best BSS and resolution score are obtained in lower thresholds as expected, but the reliability score turns to be more satisfactory with increase of thresholds. ECMWF consistently scores the highest, followed by CMC and UK. There is not much differences in scores between diverse EPSs when threshold increases up to Q99, implying similar ability absence in forecasting rare events. From Fig. 8(c), it is obviously observed that the scores based on proxy are much favorable than those based on observations in thresholds smaller than Q90, signifying that the impact of hydrological uncertainty is dominant in lower thresholds. Fig. 9 displays the reliability and corresponding sharpness diagram at Quzhou Staion with three different thresholds at lead time of +72h. Fig. 9(a) shows the results computed on proxy. It is observed that the NCEP and UK ensembles have lower resolution and are overconfident. This deficiency becomes less evident with increasing threshold. The CMA ensemble shows overforecasting, while the forecasts from ECMWF and CMC lie very close to the diagonal for most probability bins. Compared with reliability diagram based on proxy, the results based on observations show poorer resolution, but the difference between these two evaluation data in Q95 becomes minor. Sharpness diagrams for all three thresholds indicate that the lowest probabilities (0–0.05) are used much more often than others in this region, and this is caused by the fact that fewer members can capture these higher flow events. The frequency distribution of probability used for Q75 is the best, and becomes worse with increase of thresholds. For different EPSs, the sharpness diagrams are almost the same. [Fig. 8][Fig. 9] Last impression on the level of skill of the single model ensembles is given by comparing the CRPSS as a function of lead time as shown in Fig. 10. The performance varies significantly with EPSs 22
as well as lead time, and the streamflow forced by EPSs is more skillful than the reference forecast at least up to 5 days. ECMWF consistently scores the higher CRPSS than others, and it is the only EPS that has CRPSS greater than zero throughout all the lead times. CMC and UK are equally matched, ranking the second. The comparison between the CRPSS value calculated based on observation and proxy at Quzhou Station (Fig. 10(c)) shows that the errors from the hydrological model can degrade the forecast skill by approximately 2 days, and the influence gradually decreases with increase of lead time but persists until lead time of 10 days. [Fig. 10]
5.3 Multimodel ensemble evaluation
The improvement in BSS, reliablity score and resolution score by multimodel ensembles is expressed as the form of different values between score of multimodel ensemble and best single model ensemble (ECMWF). The positive difference value means better skill over the reference best single model. According to BSS and resolution score in Fig. 11, superior improvements are made by combination weighted on the ensemble members (MM2) and combination with unequal weights (MM4), nevertheless weighting based on BMA (MM5) performs well in modification of realibity score. The combination without consideration of member and skill score (MM1) generally provides the poorest improvement. Broadly speaking, improvements are more evident in higher threshold quantiles for BSS and resolution scores, while for reliability score, the gains in lower thresholds are more, perhaps attributing to already good reliability score in higher thresholds. From Fig. 11(c), the more improvements are made for forecasts evaluated on observations, particularly for thresholds less than 85%.
23
[Fig. 11] The impact of different multimodel ensemble methods on reliability and sharpness of ensemble streamflow forecasting is displayed in Fig. 12. The reliability diagrams derived from multimodel ensembles are plotted together with the forecasts of ECMWF for comparison. It is easily observed that the improvement by multimodel ensemble is significant as well, but the distinction between different ensemble methods is not as evident as that in BSS. To be specific, for the results based on proxy (Fig. 12(a)), the reliability is well improved, especially for lower probability bins. For the diagrams based on observations (Fig. 12(b)), the poor resolution is mitigated slightly compared with ECMWF. The reliability diagrams of multimodel ensembles based on different evaluation data show remarkable differences. For Q75 and Q85, the forecasts based on proxy lie extremely close to the diagonal, while for results using observations the overestimation is prominent. This is highly likely to be caused by the positive bias in the hydrological model, since during the evaluation period, the bias between the observation and proxy is 11.87% for daily flows and 3.32% for peak flows. The difference among sharpness diagrams for the same threshold is invisible, while compared with single model ensemble, the distribution of the frequency becomes better. [Fig. 12] The improvements in CRPSS are analogous to the results in BSS, as it is well known that CRPSS is the weighted average skill of BSS over all possible quantile values defining the discrete categories (Bradley and Schwartz, 2011; Hersbach, 2000). Hence the detailed evaluation is omited herein, instead an investigation on the contributions of each individual EPS to the multimodel system is performed like Hagedorn et al. (2012). Reduced multimodel versions are considered with one individual EPS component removed from the grand multimodel ensembles, and the reduced multimodels are 24
assembled by assigning each member with equal weight. The results confirm that ECMWF has the greatest impact on the ensembles particularly with increase of lead time while the remaining single models have less contributions to the skill of ensembles (Fig. 13(a)). Fig. 13(b) further demonstrates that when removing the best skilled single model, an ensemble formed by inferior EPSs yet provides comparable forecasts as the best single model, indicating the advantage of multimodels. [Fig. 13]
5.4 POT peak flows and bias correction
The bias correction is performed for daily flows, but in order to highlight the impact on flood, only the performance of peaks extracted from the bias corrected daily flows using the POT approach is described in this subsection. Fig. 14 displays the empirical cumulative probability distribution of peaks extracted from raw single model ensemble streamflow compared with the ones from the bias corrected series at Lanxi Station. The underestimation in raw peaks is pronounced with increasing lead time and quantile. The possible reason why CMA ensemble mean has the best skill in forecasting peaks is that CMA ensemble provides consistent positive bias in forecasting as presented in rank histograms, and so does the desirable performance by UK ensemble. The effect of QM is well-marked with overall improvement in peak flows greater than 6,000E /9. However the peak values located in lower and medium quantiles are not improved significantly. Similar comparison of peaks selected from multimodels by different ensemble methods is presented in Fig. 15. The improvement by multimodel ensembles is slight and it is difficult to distinguish the performance between diverse ensemble methods. Given the significant underestimation in the form of ensemble mean, an alternative presentation of POT peaks is illustrated in Fig. 16 to explore whether the underestimation of peaks is generated by the
25
process of ensemble mean or the members in each EPS indeed miss the prediction. It can be observed that all the peak flows, except the maximum value, fall into the coverage of forecasts by individual member, and most of peaks are confined within the interquartile range (25%-75%) of the boxplots. Compared with the evident underestimation in Fig. 14 and Fig. 15, the conclusion can be drawn that the process of ensemble mean smoothed out some of the extreme peak flows, which is perhaps detrimental for flood peak forecasting. [Fig. 14] [Fig. 15] [Fig. 16]
6. Discussion The results at Jinhua Station during the whole study period were less satisfactory compared to those at the other two stations. The relatively inferior performance in calibration and validation was attributed to sparse meteorological gauges in this subbasin. Accordingly, the less skill during evaluation was possibly caused by the inaccurately estimated proxy due to insufficient gauged meteorological data. The influence of different evaluation data on forecasts at Quzhou Station has been described in evaluation with multiple metrics (Figs. 8-12), demonstrating that the hydrological errors could degrade the skill scores by approximately 2 days, and persist until the lead time of 10 days with a weakening trend. In shorter lead time, the hydrological errors enabled to halve the forecasts quality (Fig. 10 (c)), indicating similar contributions to total errors by the hydrological model and meteorological forecasts. Whilst for longer forecast lead times, the impact by hydrological error became minor, and the meteorological errors turned to be more dominant in forecasts. It has been verified that those errors from hydrological model can be fixed to some extent with assimilation or updating scheme by using observed streamflow to update the model states (Chen et al., 2013; Clark et al., 2008; Demirel et al.,
26
2013). Pan and Wood (2005) proposed a constrained Ensemble Kalman Filter to update the soil moisture states in VIC, resulting in less bias in estimation of water budget. More recently, DeChant and Moradkhani (2014) successfully applied Particle Filter to maximize the reliability of seasonal forecasts by VIC. However, due to the absence of observations during the evaluation period, updating for VIC was excluded in this study. To what degree the updating scheme improves the performance of flood forecasting will be implemented and tested in our future work. The gauges in Qu River subbasin were much denser than that in Lanjiang Basin (Fig. 1). Nevertheless, the simulations in these two study basins were comparable. It could be concluded that VIC is better at modeling in basins with larger scale. During the evaluation period, hydrological model uncertainty was excluded by using discharge proxy, thus the slightly superior behavior in Lanjiang Basin was credited to the facts that the performance of global EPSs is dependent on basin size and tends to increase with catchment size (Pappenberger et al., 2011; Wu et al., 2011). The overall performance based on metrics of multimodel ensemble means indicated that the weighted ensemble mean had benefits in comparison to the best single model ensemble mean (ECMWF), and combination of members (MM2 and MM4) provided more improvements than others. MM2 tended to give greater weights to EPSs that have more ensemble members, while the MM4 combination method preferred to assign greater weight to EPSs that provide smaller CRPS and have more ensemble members in the meantime. Thus, these two methods essentially assigned greater weights to ECMWF ensemble. From the evaluation of single model ensemble, it was verified that ECMWF persistently gained the highest skill scores based on multiple metrics, and contributes to 20% achievements in the multimodel ensemble. This means giving greater weights to EPSs which provide better forecasts would lead to a fairly improved multimodel ensemble. However, this doesn’t mean that 27
the forecasts from those inferior EPSs are useless. The comparison of ECMWF and ensemble formed by the remaining EPSs showed that ensemble based on inferior components can provide comparable skillful forecasts as the best single model. The proposed calibration strategy and modular approach undoubtedly provided a time-saving optimization and favorable performance concerning statistical indicators of NSE and Bias for both daily flows and peak flows. Compared with antecedent applications (Voisin et al., 2011; Yuan et al., 2014), the value of NSE was found to be drastically improved by the calibrated VIC hydrological model in this study. Despite the substantial improvement, the systematical uncertainty propagation from the simple combination method was poorly considered given that this was not the emphasis of this study, and only a rough comparison is made for Quzhou Station. It is found that the existence of hydrological errors is significant when comparing the skill scores calculated from observation and proxy during the evaluation period. So far, there are several studies that proposed various methods, such as heteroscedastic autocorrelated residual error models (Evin et al., 2013), Bayesian methods for residual error modeling (Evin et al., 2014), ERRIS (Li et al., 2016) and so on, to handle hydrological errors. It is well recognized that consideration of the residual errors of hydrological model is beneficial for accurate forecasts. The expression of peaks remains a challenge for ensemble forecasting. As demonstrated by other studies (Hamill, 2012; Shrestha et al., 2013), the EPSs in nature behave much less successfully in predicting heavy precipitation. As a matter of fact, an antecedent evaluation of TIGGE precipitation in this study area by our team has confirmed that the EPSs do not produce enough events exceeding 20mm/24h (Liu et al., under review). Hence, the unsatisfactory performance in peak flows was to a great extent attributed to defective precipitation forecasts. This study suggested that the ensemble mean 28
could bring in an overall improvement in flood process forecasting, but for flood peaks, it was not appropriate to use ensemble means as the extreme flows would be smoothed out and the ensemble mean describes only the most certain component of forecasts (Surcel et al., 2014). A boxplot or a hydrograph of each streamflow realization seems to be an optional choice. If the ensemble was requisite, some studies (Amengual et al., 2015; Iyer et al., 2016) have used probability matching (PM) (Ebert et al., 2001) to reassign the ensemble mean using the frequency distribution of ensemble members. Thus, PM ensemble mean has the spatial distribution of the ensemble mean, but the specific amounts reproduce the distribution of the individual members. Detailed procedure can be found in Iyer et al. (2016). The results were proven to be slightly improved. Alternatively, it is also recommended to incorporate limited-area ensemble prediction systems to supplement the global EPSs (Diomede et al., 2014; Verbunt et al., 2007), and fundamentally improve the precipitation forecasting.
7. Conclusion In this study, an ensemble flood forecasting system has been developed with the VIC hydrological model based on parallel automatic calibration and multimodel ensemble methods. Results suggested that the automatic calibration based on ε-NSGAⅡmulti-objective parallel algorithm provides a time-saving optimization, showing good performance concerning statistical indicators of NSE and Bias for both daily flows and peak flows. The use of modular approach to combine simulations from VIC/D model and VIC/P model resulted in an overall improved simulation. Comparison between the forecasts of single model ensemble means showed that streamflow driven by ECMWF consistently had the best overall performance in terms of CRPSS, BSS and reliability diagram, followed by CMC and UK. The forecasts by CMA ensemble had obvious overestimation. As for ensemble methods, combination based
29
on members and skill scores (MM2 and MM4) provided more improvements than others. The equally weighted combination (MM1) had the poorest skills. This study highlighted the importance of calibration strategy for distributed hydrological model and ensemble methods in flood forecasting, and can be useful for practical flood forecasting.
Acknowledgement This study is financially supported by Zhejiang Provincial Natural Science Foundation of China (LR14E090001), National Key Research and Development Plan of China "Inter-governmental
Cooperation in International Scientific and Technological Innovation"(2016YFE0122100), and National Natural Science Foundation of China (91547106; 51379183). National Climate Center of China Meteorological Administration and Zhejiang Hydrological Bureau are greatly acknowledged for providing meteorological and hydrological data in the study area. QPFs were obtained from ECMWF’s TIGGE data portal. Thanks are also given to ECMWF for the development of this portal software and for the archives of this immense dataset.
Reference Alfieri, L., Pappenberger, F., Wetterhall, F., Haiden, T., Richardson, D., Salamon, P., 2014. Evaluation of ensemble streamflow predictions in Europe. J. Hydrol. 517, 913-922. Alvarez, G.C., Ryu, D., Western, A.W. ,Su, C.H., Crow, W.T., Robertson, D.E., Leahy, C., 2015. Improving operational flood ensemble prediction by the assimilation of satellite soil moisture: comparison between lumped and semi-distributed schemes. Hydrol. Earth Syst. Sci. 19(4), 1659-1676. Amengual, A., Homar, V., Jaume, O., 2015. Potential of a probabilistic hydrometeorological 30
forecasting approach for the 28 September 2012 extreme flash flood in Murcia, Spain. Atmos. Res.166, 10-23. Bartholmes, J.C., Thielen, J., Ramos, M.H., Gentilini, S., 2009. The European flood alert system EFAS–Part 2: Statistical skill assessment of probabilistic and deterministic operational forecasts. Hydrol. Earth Syst. Sci. 13(2), 141-153. Bennett, J.C., Robertson, D.E., Shrestha, D.L., Wang, Q.J., 2013. Selecting reference streamflow forecasts to demonstrate the performance of NWP-forced streamflow forecasts. 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1–6 December 2013. Bennett JC, Robertson DE, Shrestha DL, Wang QJ, Enever D, Hapuarachchi P, Tuteja NK, 2014. A system for continuous hydrological ensemble forecasting (SCHEF) to lead times of 9 days. J. Hydrol. 519, 2832-2846. Bourgin, F., Ramos, M. H., Thirel, G., Andreassian, V., 2014. Investigating the interactions between data assimilation and post-processing in hydrological ensemble forecasting. J. Hydrol. 519, 2775-2784. Bradley, A. A., Schwartz, S.S., 2011. Summary verification measures and their interpretation for ensemble forecasts. Mon. Weather Rev. 139, 3075-3088. Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1-3. Bröcker, J., Smith, L.A., 2007. Increasing the reliability of reliability diagrams. Weather Forecast. 22(3), 651-661. Chen, H., Yang, D., Hong, Y., Gourley, J. J., Zhang, Y., 2013. Hydrological data assimilation with the Ensemble Square-Root-Filter: Use of streamflow observations to update model states for real-time flash flood forecasting. Adv. Water Resour. 59, 209-220. 31
Christensen, N., Lettenmaier, D.P., 2006. A multimodel ensemble approach to assessment of climate change impacts on the hydrology and water resources of the Colorado River Basin. Hydrol. Earth Syst. Sci. Discuss. 3(6), 3727-3770. Chu, J., Zhang, C., Fu, G., Li, Y., Zhou, H., 2015. Improving multi-objective reservoir operation optimization with sensitivity-informed dimension reduction. Hydrol. Earth Syst. Sci. 19, 3557-3570. Clark, M.P., Rupp, D.E., Woods, R.A., Zheng, X., Ibbitt, R.P., Slater, A.G., Schmidt, J., Uddstrom, M.J., 2008. Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model. Adv. Water Resour. 31(10), 1309-1324. Cloke, H.L., Pappenberger, F., 2009. Ensemble flood forecasting: A review. J. Hydrol. 375, 613-626. Confesor, R.B., Whittaker, G.W., 2007. Automatic calibration of hydrologic models with multi-Objective evolutionary algorithm and Pareto optimization. J. Am. Water Resour. Assoc. 43(4), 981-989. Corzo, G.A., Solomatine, D.P., Hidayat, de Wit, M., Werner, M., Uhlenbrook, S., Price, R.K., 2009. Combining semi-distributed process-based and data-driven models in flow simulation: a case study of the Meuse river basin. Hydrol. Earth Syst. Sci. 13, 1619–1634. Cuo, L., Pagano, T.C., Wang, Q.J., 2011. A review of quantitative precipitation forecasts and their use in short-to medium-range streamflow forecasting. J. Hydrometeorol. 12(5), 713-728. Deb, K., Pratap A., Agarwal S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197. DeChant, C.M., Moradkhani, H., 2014. Toward a reliable prediction of seasonal forecast uncertainty: Addressing model and initial condition uncertainty with ensemble data assimilation and sequential Bayesian combination. J. Hydrol. 519, 2967-2977. 32
Demirel, M.C., Booij, M.J., Hoekstra, A.Y., 2013. Effect of different uncertainty sources on the skill of 10 day ensemble low flow forecasts for two hydrological models. Water Resour. Res. 49(7), 4035-4053. Demirel, M. C., Booij, M., Hoekstra, A., 2015. The skill of seasonal ensemble low-flow forecasts in the Moselle River for three different hydrological models. Hydrol. Earth Syst. Sci. 19, 275–291 Diomede, T., Marsigli, C., Montani, A., Nerozzi, F., Paccagnella, T., 2014. Calibration of limited-area ensemble precipitation forecasts for hydrological predictions. Mon. Weather Rev. 142(6), 2176-2197. Du, N., Ottens, H., Sliuzas, R., 2010. Spatial impact of urban expansion on surface water bodies: A case study of Wuhan, China. Landscape Urban Plan. 175-185. Ebert, E. E., 2001. Ability of a poor man's ensemble to predict the probability and distribution of precipitation. Mon. Weather Rev. 129(10), 2461-2480. Evin, G., Kavetski, D., Thyer, M., Kuczera, G., 2013. Pitfalls and improvements in the joint inference of heteroscedasticity and autocorrelation in hydrological model calibration. Water Resour. Res. 49, 4518-4524. Evin, G., Thyer, M., Kavetski, D., McInerney, D., Kuczera, G., 2014. Comparison of joint versus postprocessor approaches for hydrological uncertainty estimation accounting for error autocorrelation and heteroscedasticity. Water Resour. Res. 50, 2350-2375. Fan, F.M., Collischonn, W., Meller, A., Botelho, L.C.M., 2014. Ensemble streamflow forecasting experiments in a tropical basin: The São Francisco river case study. J. Hydrol. 519, 2906-2919. Fischer, G., Nachtergaele, F., Prieler, S., van Velthuizen, H.T., Verelst, L., Wiberg, D., 2008. Global agro-ecological zones assessment for agriculture (GAEZ 2008). IIASA, Laxenburg, Austria and FAO, Rome, Italy. 33
Fenicia, F., Solomatine, D.P., Savenije, H.H.G., Matgen, P., 2007. Soft combination of local models in a multi-objective framework. Hydrol. Earth Syst. Sci. 11(6), 1797-1809. Gneiting, T., Katzfuss, M., 2014. Probabilistic forecasting. Ann. Rev. Statist. Appl. 1, pp. 125-151. Gneiting T, Raftery AE, Westveld AH, Goldman T, 2005. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 133, 1098-1118. Hagedorn, R., Buizza, R., Hamill, T.M., Leutbecher, M., Palmer, T.N., 2012. Comparing TIGGE multimodel forecasts with reforecast‐calibrated ECMWF ensemble forecasts. Q. J. R. Meteorol. Soc. 138(668), 1814-1827. Hamill, T.M., 2001. Interpretation of rank histograms for verifying ensemble forecasts. Mon. Weather Rev. 129, 550-559. Hamill, T.M., 2012. Verification of TIGGE multimodel and ECMWF reforecast-calibrated probabilistic precipitation forecasts over the contiguous united states. Mon. Weather Rev. 140, 2232-2252. Hansen, M.C., Defries, R.S., Townshend, J.R.G., Sohlberg, R., 2000. Global land cover classification at 1 km spatial resolution using a classification tree approach. Int. J. Remote Sens. 21, 1331-1364. Hashino, T., Bradley, A.A., Schwartz, S.S., 2006. Evaluation of bias-correction methods for ensemble streamflow volume forecasts. Hydrol. Earth Syst. Sci. Discuss. 3(2), 561-594. Her, Y., Cibin, R., Chaubey, I., 2015. Application of parallel computing methods for improving efficiency of optimization in hydrologic and water quality modeling. Appl. Eng. Agric. 31(3), 455-468. Hersbach, H., 2000. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather. Forecast. 15, 559–570. He, Y., Wetterhall, F., Cloke, H. L., Pappenberger, F., Wilson, M., Freer, J., McGregor, G., 2009. 34
Tracking the uncertainty in flood alerts driven by grand ensemble weather predictions. Meteorol. Appl. 16(1), 91-101. Huang, L. X., Isaac, G.A., Sheng, G., 2012. Integrating NWP forecasts and observation data to improve nowcasting accuracy. Weather Forecast. 27(4), 938-953. Iyer, E.R., Clark, A.J., Xue, M., Kong, F., 2016. A comparison of 36–60-h precipitation forecasts from convection-allowing and convection-parameterizing ensembles. Weather Forecast. 31(2), 647-661. Johnson, C., Swinbank, R., 2009. Medium-range multimodel ensemble combination and calibration. Q. J. R. Meteorol. Soc.135(640), 777–794. Kollat, J.B., Reed, P.M., 2006. Comparing state-of-the-art evolutionary multi-objective algorithms for long-term groundwater monitoring design. Adv. Water Resour. 29(6), 792-807. Laumanns, M., Thiele, L., Deb, K., Zitzler, E., 2002. Combining convergence and diversity in evolutionary multiobjective optimization. Evol. Comput. 10(3), 263-282. Li, H., Sheffield, J., Wood, E.F., 2010. Bias correction of monthly precipitation and temperature fields from Intergovernmental Panel on Climate Change AR4 models using equidistant quantile matching. J. Geophys. Res. Atmos. 115(D10). Li, M., Wang, Q.J., Bennett, J.C., Robertson, D.E., 2016. Error reduction and representation in stages (ERRIS) in hydrological modelling for ensemble streamflow forecasting. Hydrol. Earth Syst. Sci. 20, 3561-3579. Liang, M., Ouarda, T.B.M.J., Bobee, B., 1999. Review towards operational guidelines for over-threshold modeling. J. Hydrol. 225, 103–117. Liang, X., Lettenmaier, D.P., Wood, E.F., Burges, S.J., 1994. A simple hydrologically based model of land surface water and energy fluxes for GSMs. J. Geophys. Res. 99(D7), 14415-14428, 35
Liang, X., Lettenmaier, D.P., Wood, E.F., 1996. One-dimensional statistical dynamic representation of subgrid spatial variability of precipitation in the two-Layer Variable Infiltration Capacity model, J. Geophys. Res. 101(D16) 21,403-21,422. Liu, l., Gao C., Zhu, Q., Xu, Y.-P., Evaluation of TIGGE daily accumulated precipitation forecasts over Qu River basin of China. Theor. Appl. Climatol. under review. Liu, Y., Gupta, H.V., 2007. Uncertainty in hydrologic modeling: Toward an integrated data assimilation framework. Water Resour. Res. 43(7). Lohmann, D., Nolte-Holube, R., Raschke, E., 1996. A large-scale horizontal routing model to be coupled to land surface parameterization schemes, Tellus. 48(A), 708-721. Lohmann, D., Raschke, E., Nijssen, B., Lettenmaier, D.P., 1998. Regional scale hydrology: I. Formulation of the VIC-2L model coupled to a routing model. Hydrol. Sci. J. 43(1), 131-141. Maurer, E.P., Wood, A.W., Adam, J.C., Lettenmaier, D.P., Nijssen, B., 2002. A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States. J. Clim. 15(22), 3237-3251. Mendoza, P.A., Mcphee J., Vargas, X., 2012. Uncertainty in flood forecasting: A distributed modeling approach in a sparse data catchment. Water Resour. Res. 48, W09532. Mishra, V., Cherkauer, K.A., Bowling, L.C., 2010. Parameterization of lakes and wetlands for energy and water balance studies in the great lakes region. J. Hydrometeorol., 11(5), 1057-1082. Mo, K.C., Lettenmaier, D.P., 2014. Hydrologic prediction over the conterminous United States using the national multi-model ensemble. J. Hydrometeorol. 15(4), 1457-1472. Möller, A., Lenkoski, A., Thorarinsdottir, T. L., 2013. Multivariate probabilistic forecasting using ensemble Bayesian model averaging and copulas. Q. J. R. Meteorol. Soc. 139(673), 982-991. 36
Murphy, A.H., 1973. A new vector partition of the probability score. J. App. Meteo. 12(4), 595-600. Nester, T., Komma, J., Viglione , A., Blӧschl, G., 2012. Flood forecast errors and ensemble spread—A case study. Water Resour. Res. 48, W10502. Pagano, T.C., Shrestha, D.L., Wang, Q.J., Robertson, D., Hapuarachchi, P., 2013. Ensemble dressing for hydrological applications. Hydrol. Process. 27(1), 106-116. Pan, M., Wood, E.F., 2006. Data assimilation for estimating the terrestrial water budget using a constrained ensemble Kalman filter. J.Hydrometeorol. 7(3), 534-547. Pappenberger, F., Bartholmes, J., Thielen, J., Cloke, H.L., Buizza, R., de Roo, A., 2008. New dimensions in early flood warning across the globe using grand‐ensemble weather predictions. Geophys. Res. Lett. 35(10). Pappenberger, F., Thielen, J., Del Medico, M., 2011. The impact of weather forecast improvements on large scale hydrology: analysing a decade of forecasts of the European Flood Alert System. Hydrol. Process. 25, 1091–1113. Park, Y.Y., Buizza, R., Leutbecher, M., 2008. TIGGE: Preliminary results on comparing and combining ensembles. Q. J. R. Meteorol. Soc. 2029-2050. Raftery, A.E., Gneiting, T., Balabdaoui, F., Polakowski, M., 2005. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 133(5), 1155-1174. Randrianasolo, A., Thirel, G., Ramos, M.H., Martin, E., 2014. Impact of streamflow data assimilation and length of the verification period on the quality of short-term ensemble hydrologic forecasts. J. Hydrol. 519, 2676-2691. Reed, P., Minsker, B.S., Goldberg, D.E., 2003. Simplifying multiobjective optimization: An automated design methodology for the nondominated sorted genetic algorithm-II. Water Resour. Res. 39(7), 37
1196-1201. Reed, S., Koren, V., Smith, M., Zhang, Z., Moreda, F., Seo, D.J., Participants, D.M.I.P., 2004. Overall distributed model intercomparison project results. J. Hydrol. 298(1), 27-60. Renard, B., Kavetski, D., Kuczera, G., Thyer, M., Franks, S.W., 2010. Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors. Water Resour. Res. 46(5). Renner, M., Werner, M.G.F., Rademacher, S., Sprokkereef, E., 2009. Verification of ensemble flow forecasts for the River Rhine. J. Hydrol. 376(3), 463-475. Roulin, E., 2006. Skill and relative economic value of medium-range hydrological ensemble predictions. Hydrol. Earth Syst. Sci. 3(4), 1369-1406. Salathé Jr, E.P., Hamlet, A.F., Mass, C.F., Lee, S.Y., Stumbaugh, M., Steed, R., 2014. Estimates of twenty-first-century flood risk in the Pacific Northwest based on regional climate model simulations. J. Hydrometeorol. 15(5), 1881-1899. Seiller, G., Anctil, F., Perrin, C., 2012. Multimodel evaluation of twenty lumped hydrological models under contrasted climate conditions. Hydrol. Earth Syst. Sci. 16(4), p-1171. Schumann, G.P., Neal, J.C., Voisin, N., Andreadis, K.M., Pappenberger, F., Phanthuwongpakdee, N., Hall, A.C., Bates, P.D., 2013. A first large‐scale flood inundation forecasting model. Water Resour. Res. 49(10), 6248-6257. Shrestha, D.L., Robertson, D.E., Bennett, J.C., Wang, Q.J., 2015. Improving precipitation forecasts by generating ensembles through postprocessing. Mon. Weather Rev. 143, 3642-3663. Shrestha, D.L., Robertson, D.E, Wang, Q.J., Pagano, T.C., Hapuarachchi, H.A.P., 2013. Evaluation of numerical weather prediction model precipitation forecasts for short-term streamflow forecasting 38
purpose. Hydrol. Earth Syst. Sci. 17, 1913-1931. Surcel, M., Zawadzki, I., Yau, M. K., 2014. On the filtering properties of ensemble averaging for storm-scale precipitation forecasts. Mon. Weather Rev. 142(3), 1093-1105. Themeßl, M.J., Gobiet, A., Heinrich, G., 2012. Empirical-statistical downscaling and error correction of regional climate models and its impact on the climate change signal. Clim. Chang. 112, 449-468. Themeßl, M.J., Gobiet, A., Leuprecht, A., 2011. Empirical-statistical downscaling and error correction of daily precipitation from regional climate models. Int. J. Climatol. 31. 1530-1544. Tiwari, M.K., Chatterjee, C., 2010. Uncertainty assessment and ensemble flood forecasting using bootstrap based artificial neural networks (BANNs). J. Hydrol. 382(1), 20-33. Toth, E., 2009. Classification of hydro-meteorological conditions and multiple artificial neural networks for streamflow forecasting, Hydrol. Earth Syst. Sci. 13, 1555-1566. Troy, T.J., Wood, E.F., Sheffield, J., 2008. An efficient calibration method for continental‐scale land surface modeling. Water Resour. Res. 44(9). Verbunt, M., Walser, A., Gurtz, J., Montani, A., Schär, C., 2007. Probabilistic flood forecasting with a limited-area ensemble prediction system: selected case studies. J. Hydrometeorol 8(4), 897-909. Verkade, J.S., Brown, J.D., Reggiani, P., Weerts, A.H., 2013. Post-processing ECMWF precipitation and temperature ensemble reforecasts for operational hydrologic forecasting at various spatial scales. J. Hydrol. 501, 73-91. Voisin, N., Pappenberger, F., Lettenmaier, D.P., Buizza, R., Schaake, J.C., 2011. Application of a medium-range global hydrologic probabilistic forecast scheme to the Ohio River basin. Weather Forecast. 26(4), 425-446. Walker, W.E., Harremoës, P., Rotmans, J., Van der Sluijs, J.P., Van Asselt, M.B., Janssen, P., Krayer 39
von Krauss, M.P., 2003. Defining uncertainty - a conceptual basis for uncertainty management in model-based decision support. Integr. Assess. 4, 5-17. Wang, E., Zhang, Y., Luo, J., Chiew, F.H., Wang, Q.J., 2011. Monthly and seasonal streamflow forecasts using rainfall‐runoff modeling and historical weather data. Water Resour. Res. 47 (5). Warmink, J.J., Janssen, J.A.E.B., Booij, M.J. and Krol, M.S., 2010. Identification and classification of uncertainties in the application of environmental models. Environ. Modell. Softw. 25, 1518-1527. Wilcke, R.A.I., Mendlik, T., Gobiet, A., 2013. Multi-variable error correction of regional climate models. Clim Chang. 120(4), 871-887. Wilks, D.S., 2011. Statistical methods in the atmospheric sciences (Vol. 100). Academic press. WMO (2015) Forecast verification: Issues, Methods and FAQ. Available from: http://www.cawcr.gov.au/projects/verification/ Wood, A.W, Schaake, J.C., 2008. Correcting errors in streamflow forecast ensemble mean and spread. J. Hydrometeorol. 9, 132-148. Wu, L., Seo, D., Demargne, J., Brown, J.D., Cong, S., Schaake, J., 2011. Generation of ensemble precipitation forecast from single-valued quantitative precipitation forecast for hydrologic ensemble prediction. J. Hydrol. 399, 281-298. Xie, Z., Yuan, F., Duan, Q., Zheng, J., Liang, M., Chen, F., 2007. Regional parameter estimation of the VIC land surface model: Methodology and application to river basins in China. J. Hydrometeorol. 8(3), 447-468. Ye, J., He, Y., Pappenberger, F., Cloke, H.L., Manful, D.Y., Li, Z., 2013. Evaluation of ECMWF medium-range ensemble forecasts of precipitation for river basins. Q. J. R. Meteorol. Soc. 140, 1615–1628. 40
Yuan, X., Wood, E.F., Roundy, J.K., Pan, M., 2013. CFSv2-based seasonal hydro-climatic forecasts over the Conterminous United States. Clim. Chang. 26(13), 4828-4847. Yuan, X., Wood, E.F., Liang, M., 2014. Integrating weather and climate prediction: Toward seamless hydrologic forecasting. Geophys. Res. Lett. p.2014GL061076. Zhang X., Booij, M.J., Xu, Y.P., 2015. Improved simulation of peak flows under climate change: postprocessing or composite objective calibration? J. Hydrometeorol. 16, 2187-2208. Zhao, T., Bennett, J.C., Wang, Q.J., Schepen, A., Wood, A.W., Robertson, D.E., Ramos, M.-H., 2017. How suitable is quantile mapping for post-processing GCM precipitation forecasts? J. Clim. 30(9), 3185-3196. Zhou, R., Li, Y., Lu, D., Liu, H., Zhou, H., 2016. An optimization based sampling approach for multiple metrics uncertainty analysis using generalized likelihood uncertainty estimation. J. Hydrol. 540, 274-286.
41
42
ε-NSGAII
VIC model
Observed meteorological data
Spatial interpolation
Parallel program Multi-objective calibration
VIC/P Hydrological model for peak flows by POT approach
VIC/D Hydrological model for daily flow series
Model combination Flood forecasting
Post-processing
Ensemble
Evaluation
QPFs from TIGGE
43
44
45
46
47
48
49
50
51
52
53
54
55
56
Fig. 1 Basin location, elevations, hydro-meteorological stations and basin outlets. Fig. 2. Methodological framework employed in this study. Fig. 3. Comparison of streamflow from (a) VIC/D model; (b) VIC/P model; (c) composite model at Lanxi Station during calibration period. Fig.4. Inter-objective tradeoffs in solutions produced by ε-NSGA Ⅱ .The upper panels are inter-objective comparisons at Lanxi Station for (a) NSE/P versus NSE/D, (b) NSE/P versus Bias/P. The lower ones show the objective tradeoffs between different stations for (c) daily flows, and (d) peak flows. Fig. 5. Daily flow series (lines) and selected peak flows (circles) using POT approach during calibration period at (a) Jinhua, (b) Quzhou and (c) Lanxi. The right panels show the ecdfs of the observed and simulated peak flows at three stations: (d) Jinhua; (e) Quzhou; and (f) Lanxi. Fig. 6. Daily flow series (lines) and selected peak flows (circles) using POT approach during validation period at (a) Jinhua, (b) Quzhou and (c) Lanxi. The right panels show the ecdfs of the observed and simulated peak flows at three stations: (d) Jinhua; (e) Quzhou; and (f) Lanxi. Fig. 7. Rank histograms of single model ensemble streamflow forecast. Grey lines indicate an ideally uniform distribution of the fractions. The forecasts forced by the same model are shown in rows and in columns are the histograms at same lead time. Fig. 8. Brier Skill Score, the reliability score and resolution score decomposed from Brier Score for different singlemodel ensemble means at lead time of 72h conditional on different threshold quantiles. The different EPSs are indicated in different symbols. The filled markers are results calculated from proxy, and the unfilled ones are on observations. (a) Jinhua, (b) Lanxi and (c) Quzhou station. Fig. 9. Plots of the reliability diagram and the corresponding sharpness diagram for single model 57
ensemble mean at lead time of 72h conditional on three different threshold quantiles from 2009 to 2012. (a)forecasts at Quzhou Station evaluated on proxy; (b) forecasts at Quzhou Station evaluated on observations, and (c) sharpness diagram, where the bars for each EPSs mean the frequency of forecasts falling into each probability bins. Fig. 10. CRPSS of daily flows by different single model ensembles with reference forecasts at three stations versus the forecast lead time. (a) Jinhua, (b) Lanxi, and (c) Quzhou with solid lines for proxy, and dashed lines for observation. Fig. 11. Improvement in Brier Skill Score, the reliability score and resolution score decomposed from Brier Score for different singlemodel ensemble means at lead time of 72h against different threshold quantiles. The different EPSs are indicated in different symbols. The filled markers are results calculated on proxy, and the unfilled ones are on observations. (a) Jinhua, (b) Lanxi and (c) Quzhou Station. Fig. 12. Comparison of the raliability diagram and the sharpness diagram between the forecasts of different ensemble methods and the ECMWF. The lead time is 72h, and three different thresholds are used. (a)forecasts at Quzhou Station evaluated on proxy; (b) forecasts at Quzhou Station evaluated on observations, and (c) sharpness diagram where the bars for each multimodel ensembles mean the frequency of forecasts falling into each probability bins. Fig.13. (a) Relative contribution of single model to the multimodel ensembles as a function of lead time at Lanxi Station. The relative contribution is defined as 1 − CRPSS{_ ⁄|}=!!~_ ∗ 100% with CRPSS{_ denoting reduced multimodel and |}=!!~_ the original grand multimodel.The CRPSS{_
is obtained by removing one of the constituent EPSs from |}=!!~_ . (b) CRPSS of reduced multimodel ensembles by removing ECMWF from |}=!!~_ (black line) compared with the best 58
single model ECMWF ensemble (grey line) versus lead time. Fig. 14. Peaks extracted from raw single model ensemble mean of streamflow series using POT approach (upper panels) compared with that from the bias corrected series by QM (lower panels). Figures from left to right are forecasts at lead time of 72h, 120h and 240h. Fig. 15. Peaks extracted from raw multimodel ensemble mean of streamflow series using POT approach (upper panels) compared with that from the bias corrected series by QM (lower panels). Figures from left to right are forecasts at lead time of 72h, 120h and 240h. Fig. 16. The ecdf of peaks (black circles) extracted from proxy at Lanxi Station with lead time of 72h. The boxplots contain the corresponding peaks extracted from forecasts of each single member, and the boxplots present the medians (red lines), the 25% and 75% quantiles (the edges of the box), most extreme data points (whispers) and outliers (cross) in peak forecasts.
59
A hydrological ensemble flood forecasting system including an automatically calibrated VIC model with parallel algorithm is constructed.
A simple yet efficient modular approach is proposed to combine the advantage of two differently parameterized hydrological models.
Different multimodel ensembles based on various weighting methods are explored to improve the forecasting performance of daily flows and peak flows.
60