Journal of Hydrology 372 (2009) 118–123
Consideration of sample size for estimating contaminant load reductions using load duration curves

Meghna Babbar-Sebens, R. Karthikeyan*

Department of Biological and Agricultural Engineering, Texas A&M University, 2117 TAMU, College Station, TX 77843, USA

* Corresponding author. Tel.: +1 979 845 7951. E-mail address: [email protected] (R. Karthikeyan). doi:10.1016/j.jhydrol.2009.04.008
Article info

Article history: Received 23 June 2008; received in revised form 26 February 2009; accepted 8 April 2009.

This manuscript was handled by L. Charlet, Editor-in-Chief, with the assistance of Jiin-Shuh Jean, Associate Editor.

Keywords: Flow duration curve; Load duration curve; Percentile; Quantile; Total maximum daily load; Water quality
Summary

In Total Maximum Daily Load (TMDL) programs, load duration curves are often used to estimate reductions of contaminant loads in a watershed. A popular method for calculating these load reductions involves estimating the 90th percentiles of monitored contaminant concentrations under different hydrologic conditions. However, water quality monitoring is expensive, which can severely limit how much data can be collected. Scarcity of water quality data can therefore degrade the precision of the estimated 90th percentiles, which in turn affects the accuracy of the estimated load reductions. This paper proposes an adaptive sampling strategy that data collection agencies can use not only to optimize their collection of new samples across different hydrologic conditions, but also to ensure that newly collected samples provide the best possible improvement in the precision of the estimated 90th percentile at minimum sampling cost. The sampling strategy was used to propose sampling plans for Escherichia coli monitoring in an actual stream, and the different sampling procedures of the strategy were tested on hypothetical stream data. Results showed that the improvement in precision using the proposed distributed sampling procedure is much better and faster than that attained via the lumped sampling procedure, for the same sampling cost. Hence, it is recommended that when agencies have a fixed sampling budget, they should collect samples in consecutive monitoring cycles as proposed by the distributed sampling procedure, rather than investing all their resources in only one monitoring cycle.
Introduction

Development of Total Maximum Daily Loads (TMDLs) for water bodies not meeting their designated uses is a requirement for all states, as per Section 303(d) of the Clean Water Act (CWA) and the US Environmental Protection Agency (USEPA) Water Quality Planning and Management Regulations (40 Code of Federal Regulations [CFR] Part 130). An important part of the TMDL planning process is the establishment of allowable loadings of pollutants for a water body based on the relationship between the different sources of pollution and the in-stream water quality (USEPA, 1991). Load Duration Curves (LDCs) are graphical analytical tools that many states currently use to illustrate the relationship between stream flow and allowable in-stream pollutant loadings. Load duration curves are based on flow duration curves and on the allowable in-stream pollutant concentration specified by the water quality criteria. Flow duration curves are graphical representations of the relationship between stream flow values (e.g., average daily flows obtained from US Geological Survey stations on the streams) and the percent of time those values are equaled or exceeded over a historical period (Vogel and Fennessey, 1994). Based on hydrologic conditions, the flow duration curve can also be divided into different flow zones or intervals. A commonly used division into five flow zones is illustrated in Fig. 1. It consists of ordered stream flows classified into high flows (0–10% exceedance), moist conditions (10–40% exceedance), mid-range flows (40–60% exceedance), dry conditions (60–90% exceedance), and low flows (90–100% exceedance). A load duration curve is created by multiplying the ordered stream flow values in the flow duration curve by a numeric water quality target, a margin of safety, and a unit conversion factor for the pollutant of concern (USEPA, 2007; Cleland, 2003; Morrison and Bonta, 2008). The LDC gives the pollutant load capacity of the stream at the monitored location under different hydrologic conditions. These curves can be used as a reference for guiding pollutant load reduction efforts in the watershed. To estimate the desired pollutant load reductions, actual pollutant loads estimated from monitored pollutant concentrations and monitored stream flow values are plotted on the LDC graph and compared with the LDC. The overall existing load conditions for each flow zone can then be evaluated using different methods, such as regression schemes or other statistical estimators.
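The LDC construction described above lends itself to a short computation. The following Python sketch is our illustration, not code from the paper; the function name, the use of a Weibull plotting position for the exceedance percentages, the default 10% margin of safety (applied as a load reduction, one common convention), and the E. coli unit conversion factor are assumptions.

```python
import numpy as np

def load_duration_curve(daily_flows_cfs, target_conc, margin_of_safety=0.10):
    """Allowable-load curve from ordered daily flows, per the text above.

    target_conc is the numeric water quality target in cfu/100 ml (for
    E. coli); 2.447e7 converts cfs * cfu/100 ml to cfu/day
    (28.3168 L/ft^3 * 86400 s/day * 10 dL/L). A Weibull plotting
    position is assumed for the exceedance percentages.
    """
    flows = np.sort(np.asarray(daily_flows_cfs, dtype=float))[::-1]  # descending
    n = flows.size
    percent_exceeded = 100.0 * np.arange(1, n + 1) / (n + 1)
    allowable_load = flows * target_conc * 2.447e7 * (1.0 - margin_of_safety)
    return percent_exceeded, allowable_load
```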
Fig. 1. A flow duration curve for a stream location (average daily stream flow, cfs, plotted against percent of days flow exceeded), divided into the high flow, moist conditions, mid-range flow, dry conditions, and low flow zones.
One of the common statistical methods for estimating the existing loads in each flow zone is based on multiplying the 90th percentile (equivalent to the 0.9 quantile) of observed pollutant concentrations by the stream flow value at the flow zone midpoint (USEPA, 2007). Load reductions are then calculated by subtracting the LDC load value at the particular flow zone midpoint from the estimated existing load in the corresponding flow zone. Fig. 2 shows an example of a stream (Plum Creek near Uhland, TX) where the load duration curve and monitored loads for Escherichia coli are used to estimate load reductions (indicated, for example, by the vertical downward arrow in the moist conditions flow zone), based on the 90th percentile method. The accuracy of the estimated load reduction for a particular flow zone depends on the accuracy of the estimate of the existing load in that flow zone, which itself depends on the accuracy of the estimate of the 0.9 quantile of the monitored concentrations. The efficiency of an estimator used to estimate quantiles depends on its bias, variance, and mean square error. For datasets with small sample sizes the lack of efficiency can be particularly significant, and several robust statistical methods based on order statistics have been proposed to improve efficiency. Parrish (1990), Sheather and Marron (1990), and Dielman et al. (1994) have reported comparisons of various existing robust methods, along with their merits and demerits. In the context of sample size estimates for using LDCs, recent efforts have been made by Morrison and Bonta (2008), but their work only addresses a Monte Carlo simulation based exploratory analysis of how many water quality and water quantity samples are needed to use power regression equations for load estimation and to create LDCs. This paper provides guidelines for using order statistics to explicitly calculate the minimum sample sizes required for estimating load reductions based on the 90th percentile method described above. An adaptive sampling strategy for improving the performance of
the quantile estimator, when the number of water quality samples that can be collected in a sampling program is constrained by an upper limit (for cost reasons), is also proposed in this paper. The sampling strategy provides useful data collection and analysis guidelines for agencies involved in estimating load duration curves and load reductions based on the 90th percentile method for TMDL programs. The remainder of the paper is organized as follows. The methodology section describes the calculation of minimum sample size requirements for the 90th percentile method and the proposed adaptive sampling methodology. It is followed by the results and discussion section, which examines the performance of the proposed method and presents some empirical results. The conclusions section summarizes the overall findings of this work.
Methodology

Minimum sample size for estimating quantiles

The sample size required to achieve a certain desired precision in the estimation of the qth quantile can be estimated from its standard error and asymptotic results. For an ordered sample dataset $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$, the sample estimate of the qth quantile, $\hat{X}_q$, satisfies

$$P(X \le \hat{X}_q) = q, \qquad \hat{X}_q = \begin{cases} X_{(nq)} & \text{if } nq \text{ is an integer} \\ X_{([nq]+1)} & \text{otherwise} \end{cases} \tag{1}$$
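As a concrete reading of Eq. (1), the following Python sketch (ours, not the authors'; the function name is illustrative) returns the order-statistic estimate of the qth quantile:

```python
import math

def sample_quantile(data, q):
    """Order-statistic estimate of the qth quantile per Eq. (1):
    X_(nq) if nq is an integer, X_([nq]+1) otherwise (1-based)."""
    x = sorted(data)
    n = len(x)
    nq = n * q
    if math.isclose(nq, round(nq)):
        k = int(round(nq))         # nq is an integer
    else:
        k = math.floor(nq) + 1     # integer part of nq, plus one
    return x[k - 1]                # shift to 0-based indexing
```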
where 0 < q < 1, and [nq] is the integer part of nq. The distribution of the estimated quantile, $\hat{X}_q$, has been found to be asymptotically normal with population mean $X_q$ and variance $q(1-q)/\{n f^2(X_q)\}$, where $f(X_q)$ is the probability density function of X evaluated at $X_q$ (Smirnov, 1935 (cited by Garsd et al. (1983)); Smirnov, 1967). Based on a desired value of the precision, or relative margin of error, d in the estimate of the population qth quantile, a significance level $\alpha$ can be determined such that the probability that the actual error between the sample quantile and the population quantile exceeds the desired error equals $\alpha$. Following the guidelines provided by Cochran (1977), we obtain:
$$P(|X_q - \hat{X}_q| > d\,X_q) = \alpha, \quad \text{which is equivalent to} \quad d\,\hat{X}_q = Z_{\alpha/2}\,\sigma_{SE} \tag{2}$$

where $Z_{\alpha/2}$ is the upper $\alpha/2$-quantile of the standard normal distribution, $\hat{X}_q$ is the sample estimate of the qth quantile of the data, and $\sigma_{SE}$ is the asymptotic standard error of $\hat{X}_q$. In this work a value of 5% is chosen for $\alpha$, i.e., 1.96 for $Z_{\alpha/2}$. Based on the asymptotic results for the distribution function of the qth quantile and Eq. (2), Garsd et al. (1983) proposed the following approximate minimum sample size estimate for the qth quantile:
$$\min(n) = Z_{\alpha/2}^2\,\frac{q(1-q)}{\{d\,\hat{X}_q\,\hat{f}(\hat{X}_q)\}^2} \tag{3}$$
where $\hat{f}(\hat{X}_q)$ is the sample estimate of the probability density function of X evaluated at $\hat{X}_q$. Wilcox (1997) has recommended the use of Rosenblatt's shifted histogram estimator, which belongs to the class of methods known as kernel density estimators, to estimate the sample probability density function $\hat{f}(\hat{X}_q)$. This estimator uses the interquartile range, IQR, to estimate a span h:
Fig. 2. Load duration curve (10% margin of safety), monitored E. coli loads (cfu/day), the 90th percentile of monitored loads, and load reductions (vertical downward arrow shown only for moist conditions) at a stream location, plotted against percent of days load exceeded.
$$h = \frac{1.2\,(\mathrm{IQR})}{n^{1/5}}, \qquad \text{where } \mathrm{IQR} = \hat{X}_{0.75} - \hat{X}_{0.25} \tag{4}$$
If A is the number of observations less than or equal to $\hat{X}_q + h$, and B is the number of observations strictly less than $\hat{X}_q - h$, then the estimate $\hat{f}(\hat{X}_q)$ can be calculated as:
$$\hat{f}(\hat{X}_q) = \frac{A - B}{2nh} \tag{5}$$
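Eqs. (3)-(5) can be chained into a single sample size calculation. The sketch below reuses the sample_quantile helper from earlier; it is our illustration of the procedure, with illustrative function names, not code from the paper. Applied zone by zone to a sparse dataset with d = 0.1, this is the calculation that produces the large minimum sample sizes discussed in the next paragraph.

```python
def rosenblatt_density(data, q=0.9):
    """Rosenblatt shifted-histogram estimate of f(X_q), Eqs. (4) and (5)."""
    x = sorted(data)
    n = len(x)
    xq = sample_quantile(x, q)
    iqr = sample_quantile(x, 0.75) - sample_quantile(x, 0.25)   # IQR, Eq. (4)
    h = 1.2 * iqr / n ** 0.2                                    # span, Eq. (4)
    A = sum(1 for v in x if v <= xq + h)   # observations at or below X_q + h
    B = sum(1 for v in x if v < xq - h)    # observations strictly below X_q - h
    return (A - B) / (2 * n * h)                                # Eq. (5)

def min_sample_size(data, q=0.9, d=0.1, z_half_alpha=1.96):
    """Approximate minimum n for relative margin of error d, Eq. (3)."""
    xq = sample_quantile(data, q)
    f_hat = rosenblatt_density(data, q)
    return z_half_alpha ** 2 * q * (1 - q) / (d * xq * f_hat) ** 2
```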
Eq. (3) can be used as a guideline for collecting water quality data associated with the different flow zones of a flow duration curve, and for estimating the 0.9 quantile of monitored pollutant concentrations within a desired level of precision d for each flow zone. The estimation of minimum sample sizes for the different flow zones, however, needs initial estimates of $\hat{X}_{0.9}$ and $\hat{f}(\hat{X}_{0.9})$ in Eq. (3). The most straightforward solution is to use sample estimates based on existing historic data to predict the minimum sample size requirements for the next monitoring cycle (or monitoring period). In practice, however, historic datasets can be very sparse, which can lead to misleading initial sample estimates of $\hat{X}_{0.9}$ and $\hat{f}(\hat{X}_{0.9})$ in Eq. (3). For example, Fig. 2 contains only 5, 15, 8, 14, and 4 monitored E. coli samples for the high flow zone, moist conditions zone, mid-range flow zone, dry conditions zone, and low flow zone, respectively. Using these monitored samples to obtain initial estimates of $\hat{X}_{0.9}$ and $\hat{f}(\hat{X}_{0.9})$, with a precision of d = 0.1 in Eq. (3), leads to estimated minimum required sample sizes of approximately 970, 384, 210, 1031, and 138 for the respective flow zones. Sample size requirements calculated from sparse historic data can therefore be impractical and expensive for the data collection agency. In the next section, we propose a sampling strategy for contiguous monitoring cycles that adaptively reassesses $\hat{X}_{0.9}$, $\hat{f}(\hat{X}_{0.9})$, and the minimum sample sizes for each flow zone, while including the constraints posed by sampling costs on feasible sample sizes.

Adaptive sampling strategy

The underlying goal of this strategy is that, given an upper limit on the total number of samples that can be collected during a monitoring cycle, the number of new samples for the different flow zones should be selected such that the overall improvement in the precision of the estimated 90th percentiles, across all flow zones, is maximized. The main benefit of this sampling strategy is that it uses the available resources in the best possible manner for collecting new water quality samples in the different flow zones, while attaining the maximum possible improvement in the precision of the 90th percentile. Let z be the total number of flow zones in the flow duration curve that are considered for the calculation of load reductions in the load duration curve, and let $n_{f,i}$ be the number of samples in flow zone f (where f = 1, ..., z) in the ith monitoring cycle (where i = 0 refers to the initial existing historic dataset). Let $\Delta N_{i+1}$ be the total number of new samples scheduled to be collected in the (i+1)th monitoring cycle across all flow zones, whose value is set by the constraints posed by sampling cost. The minimum sample sizes for each flow zone in the (i+1)th monitoring cycle can then be estimated by including them as decision variables in an optimization problem that maximizes the improvement in the total precision of the sample estimates of the 90th percentiles of pollutant concentrations in all flow zones. This maximizing objective can be formulated as:
$$\max \; \Delta\delta_{\mathrm{overall}} = \sum_{f=1}^{z} \left( \delta_{q,f,i} - \hat{\delta}_{q,f,i+1} \right) \tag{6}$$

where $\Delta\delta_{\mathrm{overall}}$ is the overall expected improvement in precision across all flow zones for the next, (i+1)th, monitoring cycle, $\delta_{q,f,i}$ is the existing precision of the sample estimate of the qth quantile in flow zone f based on the dataset available in the ith monitoring cycle, and $\hat{\delta}_{q,f,i+1}$ is the expected new precision of the qth quantile estimate in flow zone f after adding the new samples in the (i+1)th monitoring cycle. Eq. (3) can be used to calculate $\delta_{q,f,i}$ and $\hat{\delta}_{q,f,i+1}$:

$$\delta_{q,f,i} = \frac{Z_{\alpha/2}}{\hat{f}_i(\hat{X}_{q,f,i})\,\hat{X}_{q,f,i}} \sqrt{\frac{q(1-q)}{n_{f,i}}} \tag{7}$$

$$\hat{\delta}_{q,f,i+1} = \frac{Z_{\alpha/2}}{\hat{f}_i(\hat{X}_{q,f,i})\,\hat{X}_{q,f,i}} \sqrt{\frac{q(1-q)}{n_{f,i} + \Delta n_{f,i+1}}} \tag{8}$$

where $\hat{X}_{q,f,i}$ is the sample estimate of the qth quantile (the 0.9 quantile in this case) from the samples collected in flow zone f through the ith monitoring cycle, $\hat{f}_i(\hat{X}_{q,f,i})$ is the sample estimate of the probability density function (using Rosenblatt's shifted histogram in this case) at the qth quantile based on those samples, $n_{f,i}$ is the number of samples collected through the ith monitoring cycle in flow zone f, and $\Delta n_{f,i+1}$ is the number of new samples that should be collected during the (i+1)th monitoring cycle in flow zone f. The decision variables of this maximization problem are the $\Delta n_{f,i+1}$, which are subject to the following constraints:

$$\sum_{f=1}^{z} \Delta n_{f,i+1} = \Delta N_{i+1} \tag{9}$$

$$0 \le \Delta n_{f,i+1} \le \Delta N_{i+1} \tag{10}$$

Any nonlinear optimization technique can be used to solve the above maximization problem. We used the Generalized Reduced Gradient (GRG) algorithm (Lasdon et al., 1974, 1978; Lasdon and Waren, 1978) to solve for the decision variables. GRG has been shown to be very effective for optimization problems in which the objective and constraint functions are highly nonlinear (Schittkowski, 1980). The above adaptive sampling strategy can be used in the following two ways for the monitoring cycles:

(1) Lumped sampling procedure: If the expenses for collecting the maximum total number of new samples ($\Delta N_{i+1}$) have to be incurred in a single new monitoring cycle, then the estimates $\hat{X}_{q,f,i}$ and $\hat{f}_i(\hat{X}_{q,f,i})$ based on the initial existing historic dataset can be used to divide the total number of new samples ($\Delta N_{i+1}$) into smaller sub-samples for the different flow zones. Each sub-sample size is estimated using the above adaptive sampling strategy for the maximum overall improvement in the precision of the qth quantile across all flow zones.

(2) Distributed sampling procedure: If multiple monitoring cycles (or monitoring periods) can be used within a sampling program, based on the time and budget constraints of the data collection agency, then the budget for collecting the new samples can be distributed over consecutive monitoring cycles such that each monitoring cycle uses the latest estimates (from the preceding monitoring cycle) of $\hat{X}_{q,f,i}$ and $\hat{f}_i(\hat{X}_{q,f,i})$ to calculate the number of new samples that should be collected in each flow zone for the maximum overall improvement in the precision of the qth quantile. The main advantage of this distributed sampling procedure is that it assists the agency in deciding which flow zones should be sampled for existing water quality conditions in the next monitoring cycle, once the budget for that monitoring cycle has been decided.
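A sketch of the allocation problem in Eqs. (6)-(10) follows, building on the helpers above. The authors used a GRG solver; SciPy's SLSQP is substituted here as a stand-in, and the integer sample counts are relaxed to continuous values and then rounded. Both substitutions, and the function name, are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def allocate_new_samples(n, xq, fq, dN, q=0.9, z=1.96):
    """Split dN new samples among flow zones to maximize Eq. (6),
    subject to the budget constraints of Eqs. (9) and (10).

    n, xq, fq: per-zone current sample counts, 0.9-quantile estimates,
    and density estimates f_i(X_q) from the latest monitoring cycle.
    """
    n, xq, fq = (np.asarray(a, dtype=float) for a in (n, xq, fq))
    c = z * np.sqrt(q * (1 - q)) / (fq * xq)   # zone constant in Eqs. (7)-(8)

    def neg_improvement(dn):                   # negative of Eq. (6)
        return -np.sum(c / np.sqrt(n) - c / np.sqrt(n + dn))

    res = minimize(neg_improvement,
                   x0=np.full(n.size, dN / n.size),            # uniform start
                   method="SLSQP",
                   bounds=[(0.0, float(dN))] * n.size,         # Eq. (10)
                   constraints=[{"type": "eq",
                                 "fun": lambda dn: dn.sum() - dN}])  # Eq. (9)
    return np.round(res.x).astype(int)
```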
Results and discussion

This section demonstrates the benefits of using the adaptive sampling strategy and compares the lumped and distributed sampling procedures for improving the precision of estimated 90th percentiles of monitored samples in load duration curves.
Adaptive sampling strategy

The existing samples of E. coli collected in Plum Creek (near Uhland, TX) are plotted on the load duration curve in Fig. 2. The water quality samples were separated into sub-samples related to the five flow zones (high flow zone, moist conditions zone, mid-range flow zone, dry conditions zone, and low flow zone). Usually, the flows in the high flow zone and low flow zone are not considered for Total Maximum Daily Load (TMDL) development. Therefore, only the samples in the moist conditions zone, mid-range flow zone, and dry conditions zone were considered for further sampling to improve the precision of the 90th percentile of E. coli concentrations in each of these flow zones. Fig. 3 shows the initial precision of the estimate of the 90th percentile of E. coli concentrations in these flow zones (i.e., when i = 0). It also shows the expected improvements in overall precision of the 90th percentile if different total numbers of samples ($\Delta N_{i+1}$ varying from 5 to 50) were used for the next monitoring cycle (i.e., i = 1), using the lumped sampling procedure of the adaptive sampling strategy and a uniform sampling strategy. For the uniform sampling procedure, we assumed a theoretical equal distribution of the $\Delta N_{i+1}$ samples across all three flow zones, instead of an optimized distribution. Fig. 4 shows the distribution of samples across the flow zones needed to achieve the precision estimates in Fig. 3 when the lumped sampling procedure and uniform sampling are used. The improvements in precision for the lumped and uniform sampling procedures shown in Fig. 3 are dependent on the estimates of $\hat{X}_{q,f,i}$ and $\hat{f}_i(\hat{X}_{q,f,i})$ in Eqs. (7) and (8), and are based on only 15, 8, and 14 samples in the three flow zones in the initial dataset (i.e., when i = 0). Fig. 3 shows the benefit of using the optimization framework designed for the adaptive sampling strategy instead of uniformly sampling all the flow zones with equal sample sizes. The optimized sampling scheme predicts better improvements in precision across all three flow zones (especially the dry conditions flow zone) compared to the uniform sampling scheme, for all sizes of $\Delta N_{i+1}$ varying from 5 to 50. Though the lumped sampling procedure achieves a better quantitative performance than the uniform sampling strategy, the difference between the precisions achieved by the two procedures, and the change in precision when the sample size is increased from 5 to 50, are not significant. For example, Fig. 3 shows that in order to achieve even a 30% improvement (i.e., 0.3 units of change in the precision $\delta_{q,f,i}$) in the accuracy of the 90th percentile (q = 0.9) of the E. coli data during dry conditions, at least 24 new samples (Fig. 4) are needed during dry conditions in the next monitoring cycle (i = 1). The change in precision $\delta_{q,f,i}$ with increasing number of new samples is even smaller for moist conditions and mid-range flows. This result demonstrates the need to re-sample and re-estimate $\hat{X}_{q,f,i}$ and $\hat{f}_i(\hat{X}_{q,f,i})$ for a more reliable estimate of the overall improvement in precision and a more effective investment of sampling effort. In other words, if the regulatory agency had a budget for 50 new samples, then based on the current initial data of only 14 samples collected during dry conditions, only a 30% improvement in the precision of the 90th percentile of E. coli concentrations can be expected. Hence, it is advised that if the monitoring agency can incur the costs of a sample size as large as 50, the 50 samples should be distributed into smaller populations of $\Delta N_{i+1}$ over multiple monitoring cycles. Each population should then be sampled in a consecutive manner, such that the improvement in precision is estimated based on the increasing sample size of the data collected up to the most recent monitoring cycle. The advantages of this distributed sampling procedure are presented in the next sub-section.

Lumped versus distributed sampling procedure

Since no new monitoring plans exist for the monitoring site represented in Fig. 2, artificial E. coli datasets were generated from theoretical normal distributions to compare the lumped and distributed sampling procedures. An initial dataset representing historic sampling was created for the moist conditions zone, mid-range flow zone, and dry conditions zone of a hypothetical stream. Three sets of ten random E. coli concentrations (in cfu/100 ml) were created for the three flow zones by sampling from the assumed normal probability distributions N(820.4, 200.0), N(254.6, 50.0), and N(173.1, 50.0) for the moist conditions zone, mid-range flow zone, and dry conditions zone, respectively. The allowable maximum number of new samples was set to 50, assuming that the monitoring agency can support expenses for up to 50 new samples. Fig. 5 shows the actual improvement in the overall precision when the maximum sample size $\Delta N_{i+1}$ for the next monitoring cycle in the lumped sampling procedure is varied from 5 to 50 samples. The precision of the 90th percentile of the new random samples in the new monitoring cycle and in each flow zone is based on the most optimal division (found by the optimization algorithm) of the different values of $\Delta N_{i+1}$. The trend in the precisions ($\delta_{q,f,i}$) for the three flow zones indicates that precision improves with more monitored samples for the lumped sampling procedure, as also observed in "Adaptive sampling strategy" above for the Uhland site.
Fig. 3. Improvement in precision ($\delta_{q,f,i}$) of the 90th percentile of E. coli for different flow zones at a water quality station near Uhland, TX on Plum Creek, using the lumped procedure of the adaptive sampling strategy and the uniform sampling strategy, as a function of the total number of new samples monitored.
Fig. 4. Number of new samples in each flow zone when the lumped procedure of the adaptive sampling strategy and uniform sampling are used at a water quality station near Uhland, TX on Plum Creek, as a function of the total number of new samples monitored.
Fig. 5. Improvement in precision ($\delta_{q,f,i}$) of the 90th percentile for the lumped and distributed adaptive sampling strategies, for a normally distributed E. coli test dataset. The "initial" point on the X axis is the initial precision of the samples across the different flow zones, before any new samples are monitored/collected.
Fig. 5 also shows the improvement in precision for the distributed sampling procedure, which used a division of the total 50 samples into ten smaller populations of five samples each (i.e., $\Delta N_{i+1}$ = 5 for each monitoring cycle). In the experimental results shown in Fig. 5, we assume that the budget and time constraints of the data collection agency allow it to collect samples over ten consecutive monitoring cycles, and a convenient distribution of five samples per monitoring cycle is chosen here. This, however, need not be the case in a real-world situation; the proposed distributed sampling method is flexible to any chosen upper limit on the total number of samples and the total number of monitoring cycles. These smaller populations of five samples each were drawn from the normal distributions for each flow zone in a consecutive manner based on the adaptive sampling strategy. In other words, the values of 5, 10, 15, ..., 50 new samples of the lumped sampling procedure on the X axis of Fig. 5 correspond to the new monitoring cycles (MC) i = 1, 2, 3, ..., 10 (in Eqs. (6)-(10)) of the distributed sampling procedure. It can be seen that the improvement in precision using the distributed sampling procedure is much better and faster than that attained via the lumped sampling procedure, for the same sampling cost (or total number of new samples). For example, when the lumped sampling procedure uses 50 new samples for the value of $\Delta N_{i+1}$, the optimized division of $\Delta N_{i+1}$ among the three flow zones yields worse precision than that attained using only 20 new samples (i.e., monitoring cycle 4) in the distributed sampling procedure. Also, at the end of monitoring cycle 10, the total numbers of new samples used for sampling during moist conditions, mid-range flows, and dry conditions were 30, 20, and 30, respectively. If a lumped sampling procedure had been used to attain the same precision that was attained at the end of monitoring cycle 10 for each of the flow zones, then the numbers of samples needed as per Eq. (3) would have been approximately 49, 43, and 62 for moist conditions, mid-range flows, and dry conditions, respectively.
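The distributed experiment above can be reproduced in outline with the helpers sketched earlier. The normal distributions and the ten-cycle, five-sample schedule come from the text; the random seed is arbitrary, and note that rounding the relaxed SLSQP solution can occasionally perturb a cycle's total by one sample.

```python
import numpy as np

rng = np.random.default_rng(7)   # arbitrary seed, for reproducibility only
zones = {"moist": (820.4, 200.0), "mid-range": (254.6, 50.0),
         "dry": (173.1, 50.0)}   # N(mean, sd) in cfu/100 ml, per the text

# ten historic samples per flow zone (the i = 0 dataset)
data = {z: list(rng.normal(mu, sd, 10)) for z, (mu, sd) in zones.items()}

for cycle in range(1, 11):       # ten monitoring cycles, dN = 5 each
    names = list(zones)
    alloc = allocate_new_samples(
        n=[len(data[z]) for z in names],
        xq=[sample_quantile(data[z], 0.9) for z in names],
        fq=[rosenblatt_density(data[z], 0.9) for z in names],
        dN=5)                    # Eqs. (6)-(10), using the latest estimates
    for z, k in zip(names, alloc):
        mu, sd = zones[z]
        data[z].extend(rng.normal(mu, sd, k))   # collect the assigned samples
    print(f"MC {cycle}: {dict(zip(names, alloc.tolist()))}")
```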
Fig. 6. Optimized sampling schemes (number of new samples) for the different flow zones within each monitoring cycle (MC 1 through MC 10), for the distributed adaptive sampling strategy.
Thus, by using the distributed sampling procedure, sampling costs for a total of 74 additional new samples were avoided. This demonstrates that dividing the maximum total allowable samples into smaller populations sampled over multiple monitoring cycles can achieve higher gains in precision at lower cost. Fig. 6 shows the optimized divisions of the five new samples among the three flow zones for every monitoring cycle (MC#), based on the distributed sampling procedure of the adaptive sampling strategy. It can be seen that the optimization strategy allocates new samples to a particular flow zone only when the attained benefits in precision are maximal. For example, mid-range flows are not sampled during monitoring cycles MC1, MC9, and MC10.

Conclusions

In this work, we have analyzed the benefits of using the sample size estimation equation and the adaptive sampling strategy for improving the precision of estimated 90th percentiles and lowering sampling costs. Based on our research findings, it is recommended to use Eq. (3) and the distributed sampling procedure of the adaptive sampling strategy when 90th percentiles are used for calculating load reductions. Sample size requirements when other methods, such as nonlinear regression (instead of 90th percentiles), are used to estimate load reductions also need to be explored within the topic of load duration curves, in order to provide guidelines to data collection agencies.

References

Cleland, B.R., 2003. TMDL development from the "bottom up". Part III: duration curves and wet-weather assessments. In: 2003 National TMDL Science and Policy Conference. Water Environment Federation, Chicago, IL.
Cochran, W.G., 1977. Sampling Techniques, third ed. Wiley, New York.

Dielman, T., Lowry, C., Pfaffenberger, R., 1994. A comparison of quantile estimators. Communications in Statistics - Simulation and Computation 23, 355–371.

Garsd, A., Ford, G.E., Waring III, G.O., Rosenblatt, L.S., 1983. Sample size for estimating the quantiles of endothelial cell-area distribution. Biometrics 39, 385–394.

Lasdon, L.S., Fox, R.L., Ratner, M.W., 1974. Nonlinear optimization using the generalized reduced gradient method. RAIRO 3 (November), 73–104.

Lasdon, L.S., Waren, A.D., 1978. Generalized reduced gradient software for linearly and nonlinearly constrained problems. In: Greenberg, H.J. (Ed.), Design and Implementation of Optimization Software. Sijthoff and Noordhoff, Holland, pp. 335–362.

Lasdon, L.S., Waren, A.D., Jain, A., Ratner, M., 1978. Design and testing of a generalized reduced gradient code for nonlinear programming. ACM Transactions on Mathematical Software 4, 34–50.

Morrison, M.A., Bonta, J.V., 2008. Development of Duration-Curve Based Methods for Quantifying Variability and Change in Watershed Hydrology and Water Quality. EPA's Office of Research and Development, EPA/600/R-08/065.

Parrish, R.S., 1990. Comparison of quantile estimators in normal sampling. Biometrics 46, 247–257.

Schittkowski, K., 1980. Nonlinear Programming Codes: Information, Tests, Performance. Lecture Notes in Economics and Mathematical Systems, vol. 183. Springer-Verlag, New York.

Sheather, S.J., Marron, J.S., 1990. Kernel quantile estimators. Journal of the American Statistical Association 85, 410–416.

Smirnov, N.V., 1935. Über die Verteilung des allgemeinen Gliedes in der Variationsreihe. Metron 12, 59–81.

Smirnov, N.V., 1967. Some remarks on the limit laws for order statistics. Theory of Probability and its Applications 12, 337–339.

US Environmental Protection Agency, 1991. Guidance for Water Quality-Based Decisions: The TMDL Process. Office of Water, EPA 440/4-91-001.

US Environmental Protection Agency, 2007. An Approach for Using Load Duration Curves in the Development of TMDLs. Office of Wetlands, Oceans and Watersheds, EPA 841-B-07-006, Washington, DC.

Vogel, R.M., Fennessey, N.M., 1994. Flow duration curves. I: New interpretation and confidence intervals. Journal of Water Resources Planning and Management 120 (4), 485–504.

Wilcox, R.R., 1997. Introduction to Robust Estimation and Hypothesis Testing. Academic Press.