Journal of Hydrology (2007) 338, 161– 173
available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/jhydrol
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model S. Haydon
a,*
, A. Deletic
b
a
Melbourne Water Corporation, CRC for Water Quality & Treatment, Department of Civil Engineering, Victoria 3800, Australia b Institute for Sustainable Water Resources, Department of Civil Engineering, Monash University, Clayton Victoria 3800, Australia Received 27 October 2005; received in revised form 8 September 2006; accepted 1 February 2007
KEYWORDS Escherichia coli; Catchment model; Sensitivity; Model parameters; Calibration
Summary A conceptual model of microbial behaviour in catchments coupled with a standard hydrological model, known as the EG model, has been previously developed and tested for Escherichia coli. Due to the unavailability of pathogen data, E. coli has been used as a pathogen indicator. However, the model uses a broad conceptual approach and therefore should be tested for other microbes in future. This paper presents work done on sensitivity of the EG model, as well as its further refinement. Sensitivity of the model results to all E. coli calibration parameters was carried out. The EG model was then tested for its sensitivity to the number of events used to calibrate the model. The data collected at three different Australian drinking water catchments were used. Of the four parameters in the E. coli component of the EG model, two proved to be insensitive while the other two proved to be important. The sensitive parameters were the coefficients associated with the ‘wash-off’ functions in the model, while the two insensitive coefficients were associated with the E. coli decay functions in the model. However, the model became more sensitive towards the decay parameters in cleaner catchments. This indicates that the hydrologic aspects of the E. coli transport processes dominate rather than the E. coli decay functions. Apart from one catchment (that was partly urbanised and much smaller than the other two), the model was successfully calibrated using a small number of monitored events. It was concluded that the EG model could be simplified further by not modelling the decay of the pathogen indicator, E. coli. ª 2007 Elsevier B.V. All rights reserved.
* Corresponding author. Tel.: +61 3 9235 7221. E-mail addresses:
[email protected] (S. Haydon),
[email protected] (A. Deletic). 0022-1694/$ - see front matter ª 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jhydrol.2007.02.037
162
Introduction Waterborne illness is a significant public health issue killing approximately 3.3 million people per year (WHO, 2001). Waterborne illness is still a considerable disease burden within developed countries. For example, the USA EPA estimates that waterborne illness causes in the vicinity of 900 000 illnesses per year (AWWA, 1999). The bulk of this disease burden is from microbiological pathogens washed from catchments as opposed to chemical or radiological impacts. To evaluate the pathogen risks associated with various land-uses and hydrological characteristics of drinking water catchments, it is necessary to reliably model the concentrations of pathogens or at least their indicators washed from these catchments into nearby streams. There is surprisingly little published on rainfall-runoff models that also incorporate modelling of pathogens. A review by Ferguson et al. (2003) of the major knowledge gaps in pathogen research in catchments cited seven published pathogen modelling studies. However, most of these models are existing rainfall-runoff or sediment transport catchment models that are simply adopted for pathogen evaluation. In other words, these models were not originally intended to incorporate pathogen movement, but equations developed for other processes were simply adopted for modelling of pathogens. Recently, the authors have developed a lumped conceptual pathogen model coupled to a hydrological model, known as the EG model (Haydon and Deletic, 2004, 2006). The model was successfully tested for the pathogen indicator Escherichia coli on three catchments in Australia (Haydon and Deletic, 2006). The model is still to be tested for pathogens, such as Salmonella, Camplyobacter, Cryptosporidum, Giardia, and other indicators such as Enteroccocci. To the authors’ knowledge, no major study has been published on uncertainty or sensitivity of pathogen or pathogen indicator models. However, this is crucial for proper development and consequent application of any model. For example, Rabitz (1989) cites Kolb who claims that modelling without sensitivity analysis is ‘intellectually dishonest’. Butts et al. (2004) points out that the main sources on uncertainty or sensitivity in flow modelling are: 1. model structure, 2. uncertainties due to sub-optimal parameter values, 3. uncertainty in model output data used in the process of calibration and verification, 4. uncertainty in input data (e.g., uncertainties in measured data used to feed the models). It could be argued that similar sources of uncertainty would occur in a coupled hydrologic-pathogen model. Determination of modelled results sensitivity to the model parameters is usually the first step in assessing model uncertainties. The classical method used for these analyses is to plot the surface of the objective function as a function of model parameters, hoping that it is well behaved and smoothly slopes to global minima, which clearly points to the optimal parameter set (Beven, 2001). The analogy being
S. Haydon, A. Deletic that the objective function surface resembles a single depression on a flat plain. However in many areas of hydrologic modelling this surface is highly irregular and more like a mountain range with a series of peaks and numerous valleys. Kuczera (1997) argues that many models are poorly formulated or ‘ill-posed’ making it difficult to identify the global optimum. Beven (2001) also discussed the notion of an optimal parameter set and concluded that the classical techniques of statistical inference do not apply well to hydrologic modelling and a more flexible approach is needed. He argues that ‘. . . the concept of the optimum parameter set may be ill-founded in hydrological modelling.’. Therefore, throughout this paper the parameter set that gives the minimum of a chosen objective function is referred to as the ‘reference parameter’ set rather than the optimal parameter set. The importance of a calibration data set has been shown in the literature on pollution catchment modelling. Mourad et al. (2004) showed that urban stormwater model validation performance improved as the number of events used to calibrate increased but only up to a certain level. At some point extra events do not improve a models predictive performance. Assessing the optimal number of events needed for calibration in the case of pathogens or their indicators will have huge implications since pathogen data collection is highly expensive. Within the literature there is not a clear definition of what constitutes sensitivity analysis and what constitutes uncertainty analysis and the terms are often used interchangeably. For example in Saltelli et al. (2004) the entire research field is regarded as sensitivity analysis, while Amirghassemi and Modarres (1990) regard it all as uncertainty analysis. Cacuci (2003) separates the two, using sensitivity analysis to examine the effects of parameter variations on model output and uncertainty analysis to examine the effect of parameter uncertainties on model output uncertainty. Amirghassemi and Modarres (1990) made the point that uncertainty occurs both in the process description (the model) and the state of the process (behaviour of the model) implying that the models structure is also a source of uncertainty, an area of research for others (Butts et al., 2004). To avoid further confusion, we differentiate sensitivity analysis from uncertainty analysis on the following basis (in some way similar to Cacuci, 2003): • Sensitivity analysis is the term used to study the influence that non-measurable factors have on the modelled results. The impact of calibration model parameters will be the first factor examined. The importance of the different parts (equations) of the model; known as the sensitivity of the model structure; also belongs to this category. The determination of a models parameter set is inherently linked to the calibration process and the amount of data used for calibration. Therefore the number of events or length of data used for calibration is also a sensitivity issue. • Uncertainty analysis is the term used to describe the uncertainty in the model results due to uncertainty in measurable inputs that are fed into the model. For example input data such as rainfall data, evapotranspira-
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model tion, or catchment area are measured with some uncertainty and that will affect the reliability of the model results (e.g., Molini et al., 2001). In a similar way, uncertainty in the measurable data used for model calibration may impact the model results. In other words, when you measure something you can evaluate its uncertainty, and then propagate it through a model. But when you vary non-measurable factors, such as a model calibration parameter or use a different length of input data record, there is no uncertainty about that, even though the output of the model may be largely affected by the given changes. However, it should be noted that some researchers and modellers would use different terms for the above analyses. This paper presents results from the sensitivity analysis of the EG E. coli model. The sensitivity of the model calibration parameters was studied and conclusions drawn on the important components of the models structure. In the second part the sensitivity of the model results in relation to the amount of data used for calibration was assessed. This is regarded as important since collection of pathogen data is very expensive. The model structure was refined as well as some interesting findings drawn on the appropriate application domain of the EG model. However, it should be noted that some readers may refer to some parts of this work to be on model structure uncertainty.
The EG model parameters The EG model can predict concentrations and loads of pathogen indicators E. coli in discharges from large catchments. The model is fully explained in another paper (Haydon and Deletic, 2006) and therefore only a brief description is presented here. As shown in Figure 1, EG is a simple conceptual E. coli model coupled with the hydrologic model SimHyd (Chiew et al., 2002), that is able to predict concentrations of E. coli during both dry and wet weather on a continuous
163
basis There are two model components namely surface, and sub-surface components with respective stores in each. Loss and discharge of E. coli was modelled for each store. The governing equations are detailed in Table 1 (flow rates and volumes are modelled using SimHyd). As the model is a mass-balance model, exports and losses are constrained to ensure that they do not exceed the total available amount of E. coli. So it is possible to run a store empty or a concentration to zero but it is not allowed to go into a deficit. The EG model does not explicitly incorporate resuspension of E. coli in streams or on river banks as this is a hydraulic process rather than a hydrologic one. However, the two stores will provide an equivalent function but to a lesser extent. Later research is planned to incorporate this process in a suitably modified hydraulic model linked to the EG model. Even more importantly, the EG model has a broad conceptual basis that could be adopted for modelling of other microbes and should also be tested for a number of pathogens. The EG model coefficients a1, a2, b1, and b2 (Table 1 and Fig. 1), that must be positive or zero, should be calibrated for each catchment. Their brief description is below. • Parameter a1 is attached to the surface loss function, and is essentially a surface E. coli decay parameter that sets the rate of loss from the surface store of E. coli. • Parameter a2 is attached to the infiltration export function and effectively determines how much of the E. coli in the surface store can be transported to the sub-surface store. • Parameter b1 is attached to both the interflow and baseflow export functions and determines the concentration of the E. coli in the baseflow and the interflow. It essentially sets the total amount exported from the sub-surface store. • Parameter b2 is attached to the sub-surface loss equation and is similar to a1 in that it determines the loss or decay rate of the sub-surface store.
Deposition
Surface Loss: Parameter a1
Surface store
Surface Component
Infiltration: Parameter a2 SimHyd
Sub-surface Loss: Parameter b2
Sub-surface store
Sub-surface Component
Discharge: Parameter b1 Concentration of pathogens
Figure 1
The EG model structure.
Flow rates, Water volumes
164 Table 1
S. Haydon, A. Deletic EG model equations
The model equation
Explanation
Eq. no.
8 < Ps ðt 1Þ v s ðt 1Þ Pinfil ðtÞ Q infil ðtÞ Dt Ps ðtÞ v s ðtÞ ¼ max Ps;loss ðtÞ v s ðt 1Þ þ FDEPðtÞ v s ðt 1Þ : 0
Mass-balance
1
Ps,loss(t) = a1 · Ps(t 1) · PET(t) and 0 6 Ps,loss(t) 6 Ps(t 1) Pinfil(t) = a2 · Qinfil(t) · Ps(t 1) and 0 6 Pinfil(t) 6 [Ps(t 1) · vs(t 1)/(Qinfil(t) · Dt)] FDEP(t, x, i) = MFEDP(t, x, i)/MONTH(i)
Loss Infiltration Inputs
2 3 4
Mass-balance
5
Loss
6
Surface component
Sub-surface component 8 P ss ðtÞ v ss ðt 1Þ þ Pinfil ðtÞ Q infil ðtÞ Dt > > < P inter ðtÞ Q inter ðtÞ Dt Pbas ðtÞ Q bas ðtÞ Dt Pss ðtÞ v ss ðtÞ ¼ max P ss;loss ðtÞ v ss ðt 1Þ þ v ss ðt 1Þ SEPTIC > > : 0 and 0 6 Pss,loss(t) 6 Pss(t 1) Pss;loss ðtÞ ¼ b2 Pvssssðt1ÞSMS ðt1Þ h i Q inter ðtÞDt ss ðt1Þv ss ðt1ÞP inter ðtÞ Pinter ðtÞ ¼ b1 Pss ðt 1Þ and 0 6 P inter ðtÞ 6 ðPinterPðtÞ v ss ðtÞ Q inter ðtÞþP bas ðtÞ Q bas ðtÞÞDt h i P ss ðt1Þv ss ðt1ÞP bas ðtÞ Pbas ðtÞ ¼ b1 Q basv ssðtÞDt P ss ðt 1Þ and 0 6 Pbas ðtÞ 6 ðPinter ðtÞQ ðtÞ inter ðtÞþP bas ðtÞQ bas ðtÞÞDt
Interflow export
7
Baseflow export
8
SEPTIC(t)
Inputs
9
Surface component Ps(t) is the concentration of pathogens in the store at time t in orgs/l Pinfil(t) is the concentration of pathogens leaving the store by infiltration at time t in orgs/l Ps,loss(t) is the loss of pathogens in the store during Dt in orgs/l FDEP(t) is direct deposition of pathogens on the block in organisms per time step vs(t) is the volume in the surface store at time t in litres (this must be > 0) Qinfil(t) is the outflow rate via infiltration from the store at time t in l/h MFEDP(t, x, i) is the monthly deposition of pathogens on the store in organisms/month MONTH(i) is the number of time steps in the month PET(t) is the potential evapotranspiration at time t in l/h Dt is the time step (recommended to be 1 h) a1, a2, are surface component calibration coefficients Sub-surface component Pss(t) is the concentration of pathogens in the store at time t in orgs/l Pinfil(t) is the concentration of pathogens entering the store via infiltration at time t in orgs/l Pbas(t) is the concentration of pathogens leaving the store via baseflow at time t in orgs/l Pinter(t) is the concentration of pathogens leaving the store via interflow at time t in orgs/l Pss,loss(t) is the loss of pathogens in the store during Dt SEPTIC(t) is the septic( or other sources) input into the store during time t in orgs/l vss(t) is the volume in the subsurface store at time t in litres Qinfil(t) is the infiltration inflow from surface runoff into the store at time t in l/h Qinter(t) is the outflow from the store due to interflow processes at time t in l/h Qbas(t) is the outflow from the store due to baseflow processes at time t in l/h b1, and b2 are sub-surface component calibration coefficients
It should be noted that the SimHyd hydrologic model (that the EG model is coupled to) produces a notable proportion of the flows as interflow or excess from its Soil Moisture store, rather than as overland flow; SimHyd is effectively modelling discharge by putting the water through its Soil Moisture store. In similar way, the EG model is restricting the supply of E. coli to the pathogen sub-store as a function of the inflow into its Soil Moisture store. However, surface flow behaviour is catered for by the Soil Moisture Stores excess and interflow processes.
In conclusion, these are all artefacts of the conceptualisation of the split of the discharge into overland and interflow.
Methods Sensitivity analysis As stated in the Introduction, the main tasks were to determine how sensitive the EG model is to its parameters (and
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model eventually learn more about the model structure) and the amount of data used for calibration. The model was run on an hourly timestep (in the previous work it was found that this is the minimum required for the catchments studied, Haydon and Deletic, 2006), and the following was carried out for one catchment: 1. manual calibration of the model, 2. automatic calibration of the four EG model parameters using PEST (Doherty, 2003), 3. local sensitivity analysis – the objective function was analysed around its optimal PEST value. As previously mentioned the ‘PEST optimal’ parameter set is used as the ‘reference’ set for the sensitivity analyse, 4. model calibration and verification for different combinations of measured events. These are explained in more detail below.
Calibration of the model Since the EG model is non-linear and complex, an approximate optimisation method had to be applied to determine the reference set of calibration parameters (Beven, 2001). The PEST package (Doherty, 2003) determines optimal parameter sets for models using an objective function (U) that minimizes the sum of the squares of the error between measured and predicted (i.e., least squares). This is a common method for optimisation in which the main weakness is that it tends to fit the peaks at the expense of a good fit in other parts of the time series (Beven, 2001). However, as the peak values are important for modelling of microbes (i.e., peak concentrations are the most critical for management of treatment works), the method was regarded as acceptable. A manual calibration was undertaken first to determine a reasonable set of starting parameters for the automatic calibration package. The SimHyd model was optimised first followed by the optimisation of the EG model parameters. As this trial was focussing on the E. coli component of the model, once the optimal set of parameters were determined for the SimHyd component these were effectively ‘locked in’ and used for the rest of the investigation. An option for calibration that was considered but not used, was to jointly calibrate both the hydrologic and the E. coli model, rather than the two-step process used here. A coupled optimization was not undertaken because the EG E. coli model is intended to be an addition to a suite of catchment models that are all coupled to the hydrologic model SimHyd. This modelling tool, named E2 (Toolkit, 2005), currently models sediment, phosphorous and nitrogen within catchments. Therefore, optimising SimHyd for E. coli only may have proved sub-optimal for the other uses of the hydrologic model. It is more common practice to calibrate the hydrologic model first, independent of the other attached models, optimise it and use it as a fixed base model to run other models on.
Local sensitivity analysis The objective was to test the sensitivity of the model results to changes in the model parameters around their reference
165
values. Literature refers to such testing as ‘local’ analysis (Saltelli et al., 2004; Cacuci, 2003). The method applied for preliminary evaluation of the importance of each parameter has been referred to as ‘dotty plots’ in the literature (e.g., Wagener et al., 2003). A simple stepping system was used around the reference values of the four parameters determined by the PEST optimisation. The runs were carried out according to the following protocol: • Each parameter was incremented by ±5%, ±10%, ±15%, ±20% around its reference value (determined by PEST). This gave a total of 9 steps per parameter: 80%, 85%, 90%, 95%, 100%, 105%, 110%, 115%, 120% of its reference value. • A combination matrix of the values that 4 parameters can take at one time was developed. Since each parameter can take 9 values for one catchment, there were 94 = 6561 parameter combinations per catchment. • To test each combination of parameters, 6561 runs of the EG model were carried out for each catchment. The value of the objective function, U, was recorded for each run. As the major area of interest was the optima only, the runs that had the 1000 lowest values of U were analysed (the 1000 ‘best’ fits) using the dotty plot technique; the U function was plotted against all model parameters. Because the model has 4 parameters, the dotty plots were difficult to interpret. To compare the sensitivity of the parameters a frequency analysis of the parameters was carried out as follows: • The parameter sets for one catchment were ranked with respect to U and then only the runs that had a U within 10% around minimal U (i.e., the top runs) were analysed further. For example, using only these runs, it was calculated how often Parameter a1, set at any of its step values (i.e., 80%, 85%, 90%, 95%, 100%, 105%, 110%, 115%, 120% of its reference value) occurs. The frequency for each step value was presented as a ratio between how often the step value occurs and the total number of all runs that have U within 10% of its minima. • The parameter frequencies were compared for a number of areas around the reference U. To test the influence of the size of the areas the analysis was repeated for U ± 1%, 5%, 10%, 20%, and 50% around its minimal value. The 10% value gave the clearest response. PEST calculates a composite sensitivity for each parameter, which in effect is the amount of ‘leverage’ each parameter is exerting on the optimisation process. So that parameters with low composite sensitivities are not contributing much to the optimisation process and can be fixed so that they take no further part in the optimisation process (Doherty, 2003). Finally, the independence between the model parameters was examined by calculating cross-correlation coefficients between the model parameters using only ‘top’ 1000 parameter sets.
166
S. Haydon, A. Deletic
Model calibration and verification using different combinations of events The objective of this analysis was to assess the sensitivity of the results to the amount of the data used for calibration. For reasons explained above, at the start of the analyses the SimHyd parameters were fixed at their reference (calibrated by PEST separately using all available flow data). Calibration of the four parameters of the EG model was done using PEST in a number of ways as listed below. 1. Reference values of the four model parameters were determined using all available data (as already explained above), 2. Non-baseflow parameter values were optimised using only storm event data; all baseflow data were omitted and only the data collected during storm (wet weather) events used, 3. Calibration and verification using different number of storm (wet weather) events in the following way: A set of parameters was determined using data on a single event. The set was then used to predict concentrations and loads of E. coli for the other events recorded at the catchment (i.e., verification for all other events was carried out). This was repeated for each of N events recorded at the catchment. A set of parameters was determined using data on 2, 3, . . ., N 1 events. It was then used to predict concentrations and loads of E. coli for the remaining events not used in the calibration. This was repeated for each combination of 2, 3, . . ., N 1 events at the catchment. The listed calibration runs were done independently. However, for the runs listed under 2 and 3 above, the reference parameter set was used as initial parameter values in PEST.
Data The data used in this investigation were collected at three catchments in southern Australia. They varied in size, land-use and hydrology, and their main characteristics are presented in Table 2. O’Shannassy is a large catchment that
Table 2
is fully forested with no human development within the catchment (a protected drinking water catchment). This catchment is notable for its significant baseflow component. Myponga is also large, but open and a multi-use catchment with a ‘flashier’ hydrology. Aldgate is a small urbanised catchment with a high E. coli load and the most variable hydrology. Table 2 also summarises the extent of the microbiological data set used in this study. It is clear that Aldgate is a much more variable catchment having a large range of both baseline and storm event E. coli levels compared to the other two catchments. The monitoring program carried out on these catchments is fully presented in Roser et al. (2002); and only a brief description is presented here. Rainfall was recorded using 0.2 mm/tip tipping buckets, while flow was monitored by recording stage height on a rated profile. The continuous hourly rainfall and streamflow data were collected for several years at all three sites. Storm event sampling was undertaken using fully refrigerated automated samplers during wet weather episodes. In the O’Shannassy catchment a storm event was deemed to occur if more than 25 mm of rainfall occurred at a specified raingauge in less than 24 h (in a similar way storm events were defined in other two catchments). By collecting a number of samples per event, time series of E. coli concentration during wet weather were monitored (rather than a single event sample as is usual practice). Samples were also manually collected during dry weather (base flow). E. coli concentrations were measured in all collected water samples as the main pathogen indicator All results are for confirmed analyses using a defined substrate method (i.e., Colilert). Other microorganisms (i.e., Cryptosporidium, Giardia, Clostridia, Campylobacter, Enteroccoci) were also measured but on a more reduced scale. This projects quality assurance program was more stringent than usual water quality monitoring practice to ensure good quality data (Roser and Ashbolt, 2004). In the case of the E. coli data for every ten event or baseflow samples taken a further 3 samples (approximately) were taken for quality assurance. These samples were tested either as blanks, replicates or spiked. Blanks are used to show the number of false positives produced by the laboratory. In the case of the E. coli data used here only one sample out of 41 blanks was recorded as giving a false positive, and this returned a low
Catchment and data details
Catchment
O’Shannassy
Myponga
Aldgate
Area (ha) Land-use Monitoring period Average rainfall (mm/year) Average streamflow (mm/year) Number of events monitored/total number of event samples Number of baseflow samples Total number of samples Range of baseflow samples E. coli (orgs/100 ml) Range of event samples E. coli (orgs/100 ml) Range of event peaks E. coli (orgs/100 ml)
11 900 Forested and fully protected 2000–2002 1322 675 4/48
7668 Rural 2001–2002 763 72 3/33
780 61% urban 2001–2002 934 250 7/84
12 60 1–260 51–2000 400–2000
13 46 68–550 4600–56 000 11 000–56 000
12 96 86–13 000 270–130 000 3100–130 000
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model
fall and flow. The agreement between measured and modelled flow rates was assessed. The correlation coefficient between measured and modelled ranged from r2 = 0.55 to 0.74, and the Nash–Sutcliffe Coefficient of Model Efficiency ranged from E = 0.5 to 0.74. All the sites had less than one hundred E. coli data points, grouped into baseflow and 3–7 events (depending on the catchment). This clearly posed some challenges in calibration of the E. coli model. Unfortunately some problems were recorded during calibration of the EG model for Aldgate. While trying to reach the lowest value for its objective function for the measured peak concentrations, the model was predicting extremely high concentrations during very low flows (i.e., baseflows and very small events) for which generally no measured data were available. Fig. 2 shows results of unrestricted PEST optimisation for Aldgate, with clear unrealistically high concentrations around 31/10/2002. Therefore it was decided to restrict the range within which the PEST was allowed to search for parameters to eliminate this problem. While PEST was generally as unrestricted as possible, allowing it to produce parameter sets that gave clearly physically unrealistic results for Aldgate and was not prudent. Fig. 3 shows the results for Aldgate for the restricted PEST calibration. At this point it was considered whether to use the measured flux of E. coli (a product of the flow rate and concentration usually expressed in organism per hour) instead of concentrations for the calibration process. It is very likely that the problem would have largely disappeared if this approach was used as the flows during the period in question were very small. However, the authors have chosen not to do this and have kept using the concentration rather than the flux for a number of reasons. Concentrations are typically reported when collecting E. coli information, while high concentrations represent the maximum challenge that a water treatment facility is usually placed under. However, even more importantly, by using the flux in the process of calibration, medium but highly loaded events could have similar ‘importance’ as large but lightly loaded events (if their total loads are similar). Therefore, the larger events would dominate even more than they do using a concentration approach, skewing the objective function heavily to fit the large events. This is obviously not desired, since
E. coli relative standard deviation (%) replicate
Catchments
Baseflow
Event
Myponga and Aldgate O’Shannassy
19 ± 11 (n = 16) [0–41]
15 ± 13 (n = 8) [0–41] NA
23 ± 12 (n = 3) [0–47]
reading of 20 orgs/100 ml. This indicates that the level of false positives in the data is low. Similarly, spiking of samples tests for false negatives, and none were recorded of the 37 tested. Replicates are used to test the reproducibility of sample analysis. A series of multiple replicates of event and baseflow samples were tested. Unfortunately replicates of event samples for O’Shannassy were not taken and replicate sample results for Aldgate and Myponga were reported as a single number, not separately. However, the results give an indication of the inherent variability in sample measurement of E. coli for this project. The results of this testing was reported for the various microbes analysed as relative standard deviation (RSD), which is: RSD = 100 * standard deviation/mean. Table 3 shows the replicate results taken from Roser and Ashbolt, 2004. The curved bracketed figures indicate the number of replicate runs tested, while the ± figure indicates the variability between replicate runs. The square bracket figures are the range of RSD for plus and minus 2 standard deviations on the average RSD. So for example for Myponga and Aldgate baseflow, a total of 16 replicate runs were undertaken with an average RSD of 19% but with a standard deviation of 11%. Which means the plus and minus 2 standard deviation range is from 0 to 41%. Notably one of the other catchments studied in this project reported an event RSD of 12 ± 9 similar to the Myponga and Aldgate event RSD.
Results and discussion Model calibration The calibration of the SimHyd models was quite straight-forward due to the large volume of the available data on rain-
1
40000 35000 E coli orgs/100ml
30000
E Coli
0.5
Pred Path
25000
Flow mm/hr
Table 3 results
167
Act Flow
20000 15000
0
10000 5000 0
-0.5 1/0
9/ 2
11 00
2
/ 09
21 / 20
02
/ 09
/ 20
1/ 1 02
0/ 2
11 00
2
/10
21 / 20
02
/10
31 / 20
02
/ 10
10 /20
02
/11
20 / 20
02
/ 11
30 / 20
02
/ 11
10 / 20
/12
02
Time
Figure 2
The optimised model by unrestricted PEST (using all data) for Aldgate.
/ 20
02
168
S. Haydon, A. Deletic 1
40000 Measured E coli Predicted E coli
30000
Measured Flow
0.5
25000
Flow mm/hr
E coli orgs/100ml
35000
20000 15000
0
10000 5000 0
-0.5 1 /0
9 /2
11 00
2
/0 9
21 /2 0
/0 9
02
/2 0
1 /1 02
0 /2
11 00
2
/1 0
21 /2 0
/1 0
02
31 /2 0
02
/1 0
10 /2 0
02
/1 1
20 /2 0
02
/1 1
30 /2 0
/1 1
02
10 /2 0
/1 2
02
/2 0
02
Time
Figure 3
The optimised model by restricted PEST (using all data) for Aldgate.
medium events with high concentrations may present the main problem, as explained earlier.
Local sensitivity of model parameters
2.00E-03
3.00E-04
1.90E-03
2.80E-04
1.80E-03
2.60E-04
1.70E-03
2.40E-04
1.60E-03
2.20E-04
a2
a1
The change in the objective function as a function of each of the four EG model parameters, the dotty plots, are presented in Fig. 4 for O’Shannassy and in Fig. 5 for Aldgate (it should be noted that results from only 1000 parameter combinations with the lowest U were plotted). Fig. 4 shows that there are clear optima for all parameters for O’Shannassy, with some indications that a2 and b1 are to be more sensi-
tive than the other two. The third catchment, Myponga had results similar to this. However, the plots for Aldgate gave a far less clear message. It appears that the range in which parameters have been tested should have been extended, as no clear optima were achieved. However, as discussed earlier when parameters were unrestricted the model was producing unacceptable results for the periods of low flows (Fig. 2). Figs. 6–8 present the frequency of the parameters for runs that had a U within 10% of its minimum, for Aldgate, Myponga and O’Shannassy, respectively. They are all presented in a non-dimensional way, as the parameter fre-
1.50E-03
2.00E-04
1.40E-03
1.80E-04
1.30E-03
1.60E-04
1.20E-03
1.40E-04
1.10E-03
1.20E-04
1.00E-03 2.50E+08
2.55E+08
2.60E+08
2.65E+08
2.70E+08
1.00E-04 2.50E+08
2.75E+08
220
2.55E+08
2.60E+08
2.65E+08
2.70E+08
2.75E+08
2.80E+08
9.50E-02 9.00E-02
200
8.50E-02 8.00E-02
180
b2
b1
7.50E-02 160
7.00E-02 140
6.50E-02 6.00E-02
120
5.50E-02 100 2.56E+08 2.58E+08 2.60E+08 2.62E+08 2.64E+08 2.66E+08 2.68E+08 2.70E+08 2.72E+08
Figure 4
5.00E-02 2.50E+08
2.55E+08
2.60E+08
2.65E+08
Objective function (U) versus the four model parameters for O’Shannassy.
2.70E+08
2.75E+08
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model
169
2.40E-08 1.20E-05
2.20E-08 1.10E-05
2.00E-08
a2
a1
1.00E-05
9.00E-06
1.80E-08
8.00E-06
1.60E-08
7.00E-06
1.40E-08
6.00E-06 2.56E+12
2.58E+12
2.60E+12
2.62E+12
2.64E+12
2.66E+12
2.68E+12
1.20E-08 2.56E+12 2.58E+12 2.60E+12 2.62E+12 2.64E+12 2.66E+12 2.68E+12 2.70E+12
2.70E+12
70000
1.40E-03
65000
1.30E-03
60000 1.20E-03
b1
b2
55000 1.10E-03
50000 1.00E-03
45000 9.00E-04
40000
2.58E+12
2.60E+12
2.62E+12
Figure 5
2.64E+12
2.66E+12
2.68E+12
2.70E+12
8.00E-04 2.56E+12 2.58E+12 2.60E+12 2.62E+12 2.64E+12 2.66E+12 2.68E+12 2.70E+12
Objective function (U) versus the four model parameters, for Aldgate.
quency versus a% of change of a parameter around its reference value. The rule for ‘reading’ these graphs is very simple; for insensitive parameters the frequency distribution should be fairly flat (i.e., the parameter can take any value and will still have a good fit), while for sensitive parameters it should be a curved distribution with a clear optima (there is an value that the majority of ‘good runs’ have). All three figures show clearly that, for all catchments studied, the models output is more sensitive to parameters a2 and b1 than the other two parameters (a1 and b2). This is noteworthy as the sensitive parameters are associated with the hydrologic transport components of the E. coli model, while the insensitive parameters are associated with the decay functions for the two microbe stores (Fig. 1). This implies that the transport functions are more critical than the decay functions in the model. Intuitively this makes sense, as there is often a large number of E. coli ready to be transported but the restriction is the energy available to move them, which in this model is represented by the flows. To some extent from Figs. 6–8 it is also notable that, for the relatively lightly loaded O’Shannassy catchment there is less of a clear separation between the sensitivity of the transport and decay coefficients. It appears that for relatively ‘clean’ catchments, where it is possible to run out of E. coli to wash-off the decay terms are more important than for ‘dirty’ catchments where there is an abundant supply of E. coli. While similar analyses, using the runs that U within 1%, 5%, 20%, and 50% around its minimal values were tried the 10% range gave the clearest result. Due to the relatively small number amount of data used in the 1% and 5% cases the response was very ‘‘noisy’’. While for the higher 20%
800 a1 700
a2 b1
600
b2
500 F r eq u en cy
35000 2.56E+12
400 300 200 100 0 -20
-15
-10
-5
0
5
10
15
20
% Change in Parameter from Reference
Figure 6 Aldgate parameter frequency within 10% of the reference value of the objective function, U.
and 50% values, the opposite occurs, there is so much data that everything is averaged, and the curves in Figs. 6–8 appear as straight lines (see Table 4). The composite sensitivities for the parameters calculated by PEST for each catchment are shown in Table 5. The composite sensitivities for the log-transformed parameters in ranked order, by catchment are: • Aldgate: a2, b1, a1, b2. • Myponga: b1, a2, b2, a1. • O’Shannassy: b1, a2, a1, b2. There appears to be two pairs in the above results, as a2 and b1 are always first or second (higher sensitivity) and a1
170
S. Haydon, A. Deletic 800
Table 5
Parameter
Aldgate
Myponga
O’Shannassy
600
a1 a2 b1 b2
33 992 68 754 68 307 8364
33 505 53 142 80 110 37 176
1479 1672 2523 1396
500 F r eq u en cy
PEST composite sensitivities
700
400 300 a1 200
a2 b1
100
b2
0 -20
-15
-10
-5
0
5
10
15
20
% Change in Parameter from Reference
Figure 7 Myponga parameter frequency within 10% of the reference value of the objective function, U.
800 700
a1 a2
600
b1 b2
F req u en cy
500 400
Table 6 Cross-correlation coefficients between the model parameters r2
a2
b1
b2
O’Shannassy a1 1 a2 0.13 b1 0.56 b2 0.11
1 0.37 0.17
1 0.43
1
Myponga a1 a2 b1 b2
1 0.10 0.18 0.05
1 0.32 0.11
1 0.19
1
1 0.04 0.04 0.0
1 0.69 0.02
1 0.0
1
Aldgate a1 a2 b1 b2
300 200 100
a1
0 -20
-15
-10
-5
0
5
10
15
20
% Change in Parameter from Reference
Figure 8 O’Shannassy parameter frequency within 10% of the reference value of the objective function, U.
and b2 are always third or fourth (lower sensitivity). This outcome is consistent with the results presented above. Based on the two approaches giving the same result it can be concluded that transport parameters a2 and b1 are more sensitive than the decay parameters a1 and b2. The calculated coefficients of cross-correlation between the EG parameters are presented in Table 6. There appears to be little pattern or significance to the correlation between the parameters, with the highest being between a2
Table 4 Calibration parameters, and the reference objective function (local minimum) from the PEST calibration Parameter
Aldgate
Myponga
O’Shannassy
a1 a2 b1 b2
3.33 E05 1.71 E8 70 259 8.56 E4
5.31 E2 3.52 E4 213 0.179
1.093 E3 1.555 E4 198.067 0.1743
Objective function (U)
2.08 E + 12
1.91 E + 11
4.31E + 8
and b1, however the magnitude and the sign varies between the catchments quite significantly. Given the rest of the analysis has shown that b2 and a1 are relatively insensitive it was not expected that any correlation would occur.
Effect of number of events used in calibration There was little significant change in the parameters and the objective function when the baseflow data was omitted for all catchments. For example for Myponga the parameter sets as well as U values were identical at the level of significant figures chosen. One of the main reasons for this is that baseflow E. coli concentrations are often several orders of magnitude below storm event concentrations and as the objective functions is seeking to minimize least squares, small values will not have much impact. The initial analysis of the event combinations was to plot the correlation coefficients between modelled and observed E. coli concentrations, r2, against the number of events used for the calibration. Fig. 9b and Fig. 10b shows these plots for O’Shannassy and Aldgate (respectively), that indicate that the range of the correlation coefficient will decrease with an increase in the numbers of events (it is clear that if only one event is used for calibration r2 will vary a lot depending which event was used). Myponga results had similar trend. However, in the case of Aldgate
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model
b 1
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
0.5
r2
r2
a
171
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0
0
1
2
3
4
0
1
2
No of Events
3
4
No of Events
Figure 9 Correlation coefficient between measured and modelled E. coli concentrations versus number of events for calibration (a) and validation (b) of O’Shannassy.
tively. As the number of events increases there is a reduction in the range of r2 for O’Shannassy, as well as for Myponga (where the predictive correlations coefficients were all quite good and the small amount of scatter was present). However in the case of Aldgate there appears to
even after using six events there is still a large range of correlation coefficients (Fig. 10b). How the number of events used in validation affects the r2 between modelled and measured concentrations is shown in Fig. 9a and Fig. 10a, for O’Shannassy and Aldgate, respec-
a
b 1
1 0.9
0.8
0.8
0.7
0.7
0.6
0.6
0.5
0.5
r2
r2
0.9
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0
0
1
2
3
4
5
6
7
0
1
2
3
no of Events
4
5
6
7
no of Events
Figure 10 Correlation coefficient between measured and modelled E. coli concentrations versus number of events for calibration (a) and validation (b) of Aldgate.
a
b 100000.00% % Change in Parameter from Reference
% Change in Parameter from Reference
10000.00%
1000.00%
100.00%
10.00%
1.00%
1000.00%
100.00%
10.00%
1.00%
0.10% 0
1
2 no of events
Figure 11
10000.00%
3
4
0
1
2
3
4
5
6
7
no of events
Percentage change in parameter values with number of events used for calibration for O’Shannassy (a) and Aldgate (b).
172 be little improvement in the predictive performance with increasing numbers of events used to calibrate the model. This tends to show that more event data will not help an ill-performing model. The percentage change in each parameter versus the number of events used for calibration was presented in Fig. 11a for O’Shannassy and Fig. 11b for Aldgate. To reduce the number of plots, percentage changes for all four parameters are shown on the same graph. This shows the wider range of parameter estimates that were produced for Aldgate compared to those from O’Shannassy. The Aldgate parameters range over an additional two orders of magnitude than the O’Shannassay parameters. Again the O’Shannassy and Myponga parameters are better behaved with less variation than the Aldgate ones. It is shown again that Aldgate is not as well modelled as the other two catchments. Clearly a much larger number of events for any of these catchments would enable more rigorous testing of the effect of the number of events in calibration.
Model structure and recommendations Given that the model had to be constrained for the Aldgate catchment, that it also had very poor calibration results, and that more events did not improve the calibration, we could conclude that the model may be inappropriate for this catchment. Compared to the other two catchments, Aldgate is much smaller and therefore its hydrologic response is much ‘flashier’, and it is much more variable in terms of E. coli response to storms. If the range of E. coli measured in both storms and baseflow situations is compared then Aldgate is much more variable than the other two catchments. The main problem could lay in the fact that it is an urbanised catchment with high likelihood of significant septic inflows (that have not been modelled). It is therefore very likely that the EG model structure would have to be developed further to cope with small urbanised catchments. However, from the fact that for all catchments decay parameters very not highly sensitive, it was concluded that they could be locked in and calibration performed using only the transport coefficients. This further poses the question about simplification of the model structure. Could the model be run by assuming no losses in one or both stores (Figure 1) is a question that is to be explored in future. From the parameter correlation matrices it appears that there is no significant correlation between the models parameters. This is encouraging because it indicates that the coefficients are independent within the model (models that have highly correlated parameters are usually overparameterised, and generally improve when some highly correlated parameters are removed). Development of the EG model has just commenced. A study on the uncertainty of the EG E. coli model due to uncertainties in the model inputs (rainfall, pathogen deposition rates, etc) has almost been completed. It is also planned to test the EG model for pathogens such as Cryptosporidium, Giardia, Clostridia, Campylobacter and Enteroccoci. Further to this, a study on the behaviour of pathogens and their indicators in streams is underway and it is planned to add an in-stream module to the EG model in a near future. Another area for research is to refine the model for use in urban catchments, where higher levels of
S. Haydon, A. Deletic imperviousness, stormwater drainage systems and sewer overflows all will require a more complex model. To undertake any improvements in this type of modelling will require larger data sets.
Conclusion The performance of the E. coli model is more sensitive to two of the four models parameters. These parameters are associated with the transport (i.e., hydrologic) aspects of the model more so than the E. coli decay aspects of the model. Indicating that focussing on good calibration of the attached hydrologic model is important. Given the paucity of E. coli data, calibration with a low number of events has been attempted. For large rural catchments few events are required for representative calibration. However it was clear that even larger number of events were not achieving good fit for a smaller urbanised catchment. The presented sensitivity analysis has given insight into the fundamental performance of the four components of the model under development and shown areas where future development will be beneficial and where it would be wasted effort. Clearly what is also shown is the value of the data and the danger in using models fitted with limited data for predictive purposes. It can also be concluded that while the model performs acceptably for the two rural catchments it appears to perform less adequately for a smaller urbanised catchment. As the intended use of the model is in larger rural catchments this is not a great concern, however its use in an urban environment should be undertaken with caution even when a large amount of good quality event data is available.
Acknowledgements The authors wish to acknowledge the support of both Melbourne Water and the Co-operative Research Centre for Water Quality & Treatment who are joint sponsors of this research.
References Amirghassemi, A., Modarres, M, 1990. A new look at uncertainty analysis; Uncertainty modelling and analysis. In: Proc. of First Int. Symp., December 1990. pp 676–681. AWWA, 1999. Waterborne pathogens, AWWA Manual M48, American Waterworks Association. Beven, K.J., 2001. Rainfall-runoff modelling. The Primer. John Wiley and Sons. Butts, M., Payne, J., Kristensen, M., Madsen, H., 2004. An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow simulation. J. Hydrol. 298, 242–266. Cacuci, D., 2003. Sensitivity and uncertainty analysis. Vol. 1. Theory’. Chapman and Hall/CRC. Chiew, F.H.S., Peel, M.C., Western, A.W., 2002. Application and testing of the simple rainfall-runoff model SIMHYD. In: Singh, V.P., Frevert, D.K. (Eds.), Mathematical Models of Small Watershed Hydrology and Applications. Water Resources Publication, Littleton, Colorado, pp. 335–367.
Sensitivity testing of a coupled Escherichia coli – Hydrologic catchment model Doherty, J., 2003. PEST: Model-independent parameter estimation – Version 7.0. User manual., Watermark Numerical Computing. Brisbane, Australia. Ferguson, C., Altavilla, N., Ashbolt, N.J., Deere, D.A., 2003. Prioritizing watershed pathogen research. Journal AWWA 95 (2), 92–102. Haydon, S., Deletic, A., 2004. Development of a coupled hydrologic-pathogen catchment model’. In: Proc. IWA 8th Int. Conf. on Diffuse/Nonpoint Pollution, Kyoto, Japan October 2004. pp. 204–210. Haydon, S., Deletic, A., 2006. Development of a coupled hydrologic-pathogen catchment model. J. Hydrol. 328 (3-4), 467–480. Kuczera, G., 1997. Efficient subspace probabilistic optimization for catchment models. Water Resour. Res. 33 (1), 177–185. Molini, A., La Barbera, P., Lanza, G., Stagi, L., 2001. Rainfall intermittency and the sampling error of tipping-bucket rain gages. Phys. Chem. Earth 26 (10-12), 737–742. Mourad, M., Bertrand-Krajewski, J.L., Chebbo, G., 2004. Stormwater quality models: sensitivity to calibration data’. In: Proc. of the 6th Conf. on Urban Drainage Modelling, Dresden. pp. 69–76.
173
Rabitz, H., 1989. Systems analysis at the molecular scale. Science 246, 221–226. Roser, D., Ashbolt, N., (Eds.), 2004. Transforming pathogen water quality data into source water monitoring and control information – Draft Report, Final Report for Project 2.2.1, Co-operative Research Centre for Water Quality and Treatment. Roser, D., Skinner, J., LeMaitre, C., Marshall, L., Baldwin, J., Billington, K., Kotz, S., Clarkson, K., Ashbolt, N.J., 2002. Automated event sampling for microbiological and related analytes in remote sites: a comprehensive system. Water Sci. Technol: Water Supply 2 (3), 123–130. Saltelli, A., Tarantola, S., Campologo, F., Ratto, M., 2004. Sensitivity analysis in practice, a guide to assessing scientific models. John Wiley and Sons. Toolkit 2005, available from:
. Wagener, T., McIntyre, N., Lees, M., Wheater, H., Gepta, H., 2003. Towards reduced uncertainty in conceptual rainfall-runoff modelling: Dynamic indetifiability analysis’. Hydrol. Process 17, 455–476. WHO, 2001. World water day report – water for health - taking charge, World Health Organisation. pp. 22.