Analytical recovery of protozoan enumeration methods: Have drinking water QMRA models corrected or created bias?

Analytical recovery of protozoan enumeration methods: Have drinking water QMRA models corrected or created bias?

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8 Available online at www.sciencedirect.com journal homepage: www.elsevier.com/locate/watre...

529KB Sizes 0 Downloads 22 Views

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

Available online at www.sciencedirect.com

journal homepage: www.elsevier.com/locate/watres

Analytical recovery of protozoan enumeration methods: Have drinking water QMRA models corrected or created bias? P.J. Schmidt a,*, M.B. Emelko b, M.E. Thompson a a

Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1 Department of Civil and Environmental Engineering, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1 b

article info

abstract

Article history:

Quantitative microbial risk assessment (QMRA) is a tool to evaluate the potential impli-

Received 19 June 2012

cations of pathogens in a water supply or other media and is of increasing interest to

Received in revised form

regulators. In the case of potentially pathogenic protozoa (e.g. Cryptosporidium oocysts and

30 January 2013

Giardia cysts), it is well known that the methods used to enumerate (oo)cysts in samples of

Accepted 3 February 2013

water and other media can have low and highly variable analytical recovery. In these

Available online 18 February 2013

applications, QMRA has evolved from ignoring analytical recovery to addressing it in pointestimates of risk, and then to addressing variation of analytical recovery in Monte Carlo

Keywords:

risk assessments. Often, variation of analytical recovery is addressed in exposure assess-

Quantitative microbial risk assess-

ment by dividing concentration values that were obtained without consideration of

ment (QMRA)

analytical recovery by random beta-distributed recovery values. A simple mathematical

Cryptosporidium

proof is provided to demonstrate that this conventional approach to address non-constant

Hierarchical probabilistic models

analytical recovery in drinking water QMRA will lead to overestimation of mean pathogen

Measurement error

concentrations. The bias, which can exceed an order of magnitude, is greatest when low

Non-detect data

analytical recovery values are common. A simulated dataset is analyzed using a diverse set of approaches to obtain distributions representing temporal variation in the oocyst concentration, and mean annual risk is then computed from each concentration distribution using a simple risk model. This illustrative example demonstrates that the bias associated with mishandling non-constant analytical recovery and non-detect samples can cause drinking water systems to be erroneously classified as surpassing risk thresholds. ª 2013 Elsevier Ltd. All rights reserved.

1.

Introduction

Quantitative Microbial Risk Assessment (QMRA) is a tool that has evolved to provide quantitative information about the potential human health risks associated with pathogens in our environment (Haas et al., 1999). QMRA is used in the drinking water sector to estimate infection levels or human health impacts associated with the consumption of pathogens in water. The United States Environmental Protection Agency (US EPA) has used QMRA principles to develop regulations

such as the Long Term 2 Enhanced Surface Water Treatment Rule (US EPA, 2006), which specifies source water monitoring and treatment requirements for many public water systems to control the risks associated with Cryptosporidium in drinking water. The Dutch Drinking Water Act of 2001 (Anonymous, 2001) requires drinking water suppliers to conduct QMRA at least every three years for Cryptosporidium and Giardia (and other index pathogens) and to demonstrate compliance with a health based target of less than 1 infection per 10,000 persons per year (Schijven et al., 2011).

* Corresponding author. E-mail address: [email protected] (P.J. Schmidt). 0043-1354/$ e see front matter ª 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.watres.2013.02.001

2400

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

Monitoring treated drinking water for Cryptosporidium and Giardia is impractical because extremely large sample volumes would be required to detect pathogens in safe water (Regli et al., 1991; Rose et al., 1995; Haas et al., 1996). In addition, concerns have been raised that current methodology is expensive and inefficient, and yields unreliable data in treated water (Allen et al., 2000; Signor and Ashbolt, 2006). Accordingly, risk assessments are typically conducted using information about the concentration of pathogens in the raw water and the reduction (by physico-chemical removal or inactivation) that is achieved by the drinking water treatment system. As illustrated in Fig. 1, the following issues must be addressed to obtain appropriate estimates of the pathogen concentration in treated drinking water from raw water pathogen data (e.g. Regli et al., 1991; Teunis et al., 1997).  Analytical Recovery e concentration estimates must be corrected for the bias associated with losses in the methodology that is used to quantify microorganisms in the raw water.  Infectivity e the raw water concentration must be corrected for the bias associated with quantifying microorganisms that are not capable of causing infection.  Treatment Efficiency e the raw water concentration must be corrected for the level of removal or inactivation achieved by the treatment system. Fig. 1 distinguishes between several types of concentration. The term “‘observed’ concentration” is used throughout this paper to denote any target pathogen concentration value that is derived from enumeration data without consideration of the imperfect analytical recovery of the enumeration method. When analytical recovery e “the capacity of the analyst to successfully count each microorganism or particle of interest in a sample using a specific enumeration method” (Schmidt et al., 2010) e does not have a value of 100%, a difference between the number of pathogens observed in a sample and the number that were actually present is expected to occur. Therefore, concentration values derived from enumeration data without consideration of imperfect analytical recovery characterize the method-specific density of observed pathogens rather than the actual density of pathogens that is present in the environment. The other types of concentrations illustrated in Fig. 1 represent various actual densities of pathogens in the water and must be estimated from source water enumeration data with appropriate

Analytical Recovery

‘Observed’ concentration of enumerable pathogens in raw water

Concentration of enumerable pathogens in raw water

Infectivity

Concentration of infectious pathogens in raw water

Treatment Efficiency

Concentration of infectious pathogens in treated water

Fig. 1 e Relating pathogen occurrence data to treated water concentration.

consideration of analytical recovery, infectivity, and treatment efficiency. Approaches to evaluate actual concentrations given available enumeration data and information about the analytical recovery of the method by which the data were obtained are the focus of this work. Infectivity is an important consideration in the case of Cryptosporidium and Giardia because current methodology (US EPA, 2005) includes enumeration of non-viable (oo)cysts (e.g. Teunis et al., 1999) and non-pathogenic species/genotypes (e.g. Ruecker et al., 2007). Quantitative description of variation in treatment efficiency has been modeled in a variety of ways (e.g. Teunis et al., 1999; Smeets, 2011). Early in the development of QMRA, risk assessment was often conducted using point estimates of concentration (e.g. Rose and Gerba, 1991; Rose et al., 1991; Regli et al., 1991; Medema et al., 1995; Rose et al., 1995; Haas et al., 1996; Gale, 1998). In this context, the equation provided by Regli et al. (1991) to calculate treated water pathogen concentrations, which divides an ‘observed’ raw water concentration estimate (i.e. the count per unit volume analyzed) by analytical recovery, yields an unbiased estimate of the actual microorganism concentration if the analytical recovery of the method (or the mean recovery) is precisely known. Regli et al. (1991) proposed that the point estimates used in risk assessment could be replaced by distributions as more data became available. Teunis et al. (1997) introduced the use of Monte Carlo simulation to account for variability and/or uncertainty in the model parameters (all of which were assumed to be statistically independent) when evaluating the risk associated with Cryptosporidium and Giardia in drinking water. Diverse approaches have been developed to obtain a distribution that describes the variability in the raw water pathogen concentration with consideration of analytical recovery, and many of these involve inflation of either microorganism counts or ‘observed’ concentration values using random recovery values from a beta distribution describing the full variation in analytical recovery among samples. Teunis et al. (1997) fit a negative binomial distribution to the observed counts per 1000 L and subsequently raised random count values to corresponding ‘actual’ counts per 1000 L (which were used as concentrations for risk computation) by inverting the Bernoulli trial process using an independent beta-distributed analytical recovery value. In the first case study described by Medema et al. (2003), a bootstrapping approach was used to select random ‘observed’ concentration estimates and independent empirical recovery values for an unspecified adjustment of concentration values, and the mean adjusted concentration value was used as a pointestimate to compute risk. Rather than fitting a distribution to the raw counts prior to the adjustment for analytical recovery in the manner of Teunis et al. (1997), Pouillot et al. (2004) raised each observed count using independent betadistributed analytical recovery values and a negative binomial model. They subsequently fit a concentration distribution to the raised counts for use in the risk computations. The second of three approaches considered by Petterson et al. (2007) involved fitting a concentration distribution to the enumeration data followed by a Monte Carlo process in which simulated ‘observed’ concentration values were divided by independent beta-distributed analytical recovery values in

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

order to assess the variation in actual concentrations. They found that ‘observed’ (oo)cyst concentration estimates and associated internal seed recovery estimates were uncorrelated in their data set, but also concluded that “a relationship between counts and recovery should be intuitively expected”. Schijven et al. (2011) fit a distribution to the enumeration data to describe variation in the ‘observed’ concentration, and then used a Monte Carlo process for risk computation in which random ‘observed’ concentration values were divided by independent beta-distributed analytical recovery values. Although these approaches differ in specific details, the common theme is adjustment of counts or ‘observed’ concentrations using independent analytical recovery values. In contrast, Jaidi et al. (2009) took distributions describing ‘observed’ concentrations and analytical recovery and used a Monte Carlo process to divide ‘observed’ concentration values by correlated recovery values. An alternative class of approaches to analyze source water microorganism occurrence data incorporates analytical recovery into hierarchical models that describe random measurement errors (e.g. Nahrstedt and Gimbel, 1996; Teunis and Havelaar, 1999; the third approach of Petterson et al., 2007; Emelko et al., 2010; Schmidt and Emelko, 2011). In these models, it is acknowledged that each obtained microorganism count depends upon both the corresponding actual concentration of enumerable pathogens in the source from which the sample was collected and the particular value of analytical recovery that was achieved in enumerating the sample. Thus analytical recovery is not regarded as a random correction factor for microorganism counts or ‘observed’ concentration estimates, but as a latent (i.e. unmeasured) variable that affected the number of microorganisms that were observed in each sample. When such hierarchical models are used to assess the concentration (or the parameters of a concentration distribution) from which the available count data could have been obtained, all possible values of analytical recovery (e.g. a beta distribution describing variation in analytical recovery among samples) are implicitly considered. Schmidt and Emelko (2011) developed a hierarchical model to account for random measurement errors and various forms of analytical recovery information in temporally distributed protozoan enumeration data and demonstrated that “microorganism counts and analytical recovery are not independent (as has often been assumed), even if the correlation is obscured by other sources of variability in the data”. The model was used to simulate a set of enumeration data that was subsequently used to compare several data analysis approaches. It was demonstrated that division of high ‘observed’ concentration values by near-zero analytical recovery values in approaches that inappropriately regard analytical recovery as an independent random variable yields unrealistically high concentration values; however, the resulting bias and its effect upon computed risks were not investigated. Accordingly, the objectives of this paper are (1) to evaluate the bias in mean concentration associated with dividing ‘observed’ concentration values (which are derived from enumeration data without consideration of analytical recovery) by independent, random betadistributed recovery values, and (2) to evaluate the potential

2401

for this bias to propagate to mean risk values. The data set simulated in Schmidt and Emelko (2011) is analyzed using a wider variety of data analysis approaches to illustrate various forms of bias, and a relatively simple risk model is used to isolate the effects upon risk of bias in the analysis of pathogen concentrations.

2.

Probabilistic modeling

A multitude of modeling approaches have been proposed to fit distributions to temporally distributed protozoan enumeration data, with or without consideration of analytical recovery and other aspects of random measurement error. Rather than critically reviewing specific modeling approaches that have been reported in the literature, a more general approach is taken herein that 1) proves that assuming statistical independence between protozoan counts and analytical recovery creates bias, 2) compares a diversity of alternative approaches to describe variation in raw water pathogen concentrations, and 3) demonstrates the impacts of various modeling approaches upon computed risks using a simple QMRA model.

2.1. The bias associated with ignoring or mishandling analytical recovery The potential bias conferred upon concentration estimates by assuming that protozoan counts and analytical recovery are statistically independent is evaluated using a simple probabilistic model. Three assumptions, as shown in Equation (1), are made herein: 1) the number of (oo)cysts contained in a particular analyzed volume of water (n) is Poissondistributed with mean equal to the product of the source water (oo)cyst concentration (c) and the volume (V), 2) the number of (oo)cysts in an analyzed sample volume that are observed by the analyst (x) is binomially distributed, conditional on the actual number of (oo)cysts in the analyzed sample volume and the analytical recovery ( p) of the enumeration method, and 3) the analytical recovery varies randomly among samples analyzed using a particular method according to a beta distribution with parameters a, b. These three types of random measurement error have been discussed in detail and respectively described as random sampling error, random analytical error, and non-constant analytical recovery by Emelko et al. (2010). The average number of microorganisms that would be observed in repeated samples given the microorganism concentration in the source, enumerated sample volume, and analytical recovery distribution parameters can then be worked out using mathematical expectation as shown in Equation (2). This derivation assumes that n and p are statistically independent (i.e. that the analytical recovery is independent of the number of (oo)cysts present in a sample), which is much more reasonable than assuming that the (oo)cyst count is independent of the analytical recovery of the method by which the count was obtained. nwPoissonðcVÞ xwbinomialðn; pÞ pwbetaða; bÞ

(1)

2402

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

Ex ½xjn ¼ Ep ½Ex ½xjn; p ¼ Ep ½np ¼ nmp h i Ex ½xjc ¼ En nmp ¼ En ½nmp ¼ cVmp

(2)

From Equation (2), it can be shown that Ex[(x/V)jc] ¼ cmp. Therefore, (oo)cyst counts per unit volume are biased estimates of concentration if the mean analytical recovery of the enumeration method is not 100%. Dividing the count per unit volume by the mean analytical recovery will correct this bias, but does not account for the effects of non-constant analytical recovery, random sampling error, or random analytical error upon the count data (i.e. the estimate is unbiased, but still imprecise). Each of these sources of random measurement error must be addressed in the fitting of a distribution to temporally distributed enumeration data (e.g. Schmidt and Emelko, 2011), or the fitted distribution will be a biased representation of the variation in actual (oo)cyst concentrations. It is relatively common for concentration distributions to be fitted to (oo)cyst enumeration data without consideration of analytical recovery. In such cases, each microorganism count is biased by an unknown analytical recovery value ( p) that is presumed to be a random value from a particular recovery distribution with mean E[p]. These concentration distributions describe the variation in the ‘observed’ concentration of (oo)cysts rather than variation in the actual (oo)cyst concentration, and it is common to correct this issue by dividing random ‘observed’ concentration values by independent, random analytical recovery values ( p*) that are presumed to come from the same recovery distribution. The recovery distribution is often assumed to be beta-distributed (with parameters a, b), and can be estimated from analytical recovery experiments (Schmidt et al., 2010). In this case, the mean analytical recovery is E[p] ¼ a/(a þ b) and the mean of the inverse recoveries is E[1/p*] ¼ (a þ b  1)/(a  1) for a > 1 (and is infinite for a  1). The total multiplicative bias on an individual concentration estimate when beta-distributed analytical recovery is handled in this way can be computed using Equation (3). Fitting an ‘observed’ concentration distribution to enumeration data without consideration of analytical recovery and subsequently dividing random ‘observed’ concentration values by independent, random recovery values would presumably yield a similar bias in the mean concentration. Table 1 displays the magnitude of this bias in the mean concentration for various combinations of the mean and standard deviation

of analytical recovery. For any fixed value of mean analytical recovery, the bias in the mean concentration increases as the standard deviation of analytical recovery increases, and the bias becomes infinite (which propagates to an infinite variance of doses in risk assessment) once the shape parameter a falls below a value of 1. For any fixed value of the standard deviation of analytical recovery, the bias increases as the mean analytical recovery approaches zero. For larger values of standard deviation, the bias also increases when the mean approaches 100% (because the value of parameter a becomes small). 

E½p$E½1=p  ¼

8 <

a aþb1 $ a>1 aþb a1 : N a1

(3)

This bias is largely driven by the reciprocals of near-zero analytical recovery values, which become increasingly prevalent as a / 0. It is particularly problematic in the case of (oo)cyst enumeration methods such as US EPA Method 1623 (US EPA, 2005) in which the analytical recovery is highly variable and often relatively low. The emphasis of this work is upon bias in the mean concentration (and mean risk), but it is important to also consider the effect upon variance. Var[1/p*] ¼ (a þ b  2)/(a  2) for a > 2 (and is infinite for a  2), so the distribution of concentration values obtained by dividing concentration estimates by random beta-distributed analytical recovery values when a  2 will have an infinite variance (which propagates to an infinite variance in mean doses). The effect of this source of bias upon the mean probability of infection (i.e. mean risk) or other risk-based decision metrics is difficult to assess if the doseeresponse model is non-linear.

2.2. Alternative approaches to model variability in (oo) cyst concentrations Seven alternative data analysis approaches (Table 2) are implemented herein to evaluate mean probabilities of infection estimated from a set of simulated Cryptosporidium oocyst enumeration data ({xi}, i ¼ 1,2,.,m). In all cases, the temporal variation in oocyst concentration is modeled using a gamma distribution with shape parameter r and scale parameter l. In the first approach (1A), the gamma distribution is fit directly to the counts per unit volume (b c i ¼ xi =Vi ) using maximum

Table 1 e log10(bias) when concentrations are divided by random beta-distributed recovery values. Mean analytical recovery

Std. deviation

0.05 0.10 0.15 0.20 0.25 0.30

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0.1335 N N N N N

0.0290 0.1461 0.6628 N N N

0.0125 0.0569 0.1663 0.5497 N N

0.0070 0.0307 0.0825 0.2041 0.7333 N

0.0045 0.0193 0.0505 0.1165 0.3010 N

0.0031 0.0134 0.0348 0.0792 0.1954 N

0.0023 0.0099 0.0262 0.0614 0.1644 N

0.0018 0.0078 0.0218 0.0580 0.2568 N

0.0014 0.0069 0.0248 0.2553 N N

This table shows the log10(bias) in the mean (oo)cyst concentration when counts per unit volume are divided by random beta-distributed analytical recovery values. The mean and standard deviation values for analytical recovery are expressed as decimals rather than percentages. The combinations of the mean and standard deviation of analytical recovery that are included in this table do not include examples where the bias exceeds an order of magnitude (except for scenarios where the bias is infinitely large).

2403

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

Table 2 e Summary of alternative data analysis approaches. Analysis approach

Fitting approach

Sampling error

1A

MLE fit to raw concentration estimates MLE fit to observed counts MLE fit to adjusted concentration estimates MLE fit to observed counts

Ignored

Yes

Ignored

No

b c i wgammaðr; lÞ b c i ¼ maxðxi ; 0:5Þ=Vi

Addressed

No

Ignored

No

Ignored

Yes

Independent random variable

No

Addressed

No

Independent random variable

No

ci w gamma(r,l) xi w Poisson(ciVi) b c i w gammaðr; lÞ b c i ¼ maxðxi ; 0:5Þ=Vi Divide conc. by p* w beta(a,b) ci w gamma(r,l) xi w Poisson(ciVi) Divide conc. by p* w beta(a,b)

Addressed

No

No

Addressed

No

Addressed

No

Included as measurement error Included as measurement error Included as measurement error

2A 1B

2B

3A 3B 3C

MLE fit to observed counts Bayesian fit to observed counts Bayesian fit to observed counts

Manipulation of zeros

likelihood estimation and the likelihood function given by Equation (4). Because the gamma distribution cannot include values of zero, non-detects were converted to ‘counts’ of 0.5 (which is consistent with the common practice of manipulating zeros into some function of the method detection limit). The sampling error (randomness in the number of oocysts captured in a sample of particular volume from a homogeneous source) and the analytical recovery of the enumeration method are not considered in this approach. In the second approach (2A), the gamma distribution is fit to the counts using maximum likelihood estimation and the likelihood function given by Equation (5). In this approach, random sampling error is addressed using a Poisson distribution and there is no special treatment of zeros, but analytical recovery is still not addressed. Approaches 1B and 2B are identical to approaches 1A and 2A except that analytical recovery is incorporated as an independent random variable ( p*) in the risk characterization step: each random gamma-distributed ‘observed’ oocyst concentration value is divided by a random beta-distributed recovery value. These four approaches are considered for illustrative purposes only because it is not mathematically sound to ignore imperfect analytical recovery or to regard it as an independent random variable. Lðr; ljfb c i gÞ ¼

m Y i¼1

Lðr; ljfxi gÞ ¼

1 r1 b c ebc i =l lr GðrÞ i

xi  r  m Y Gðxi þ rÞ lVi 1 xi !GðrÞ 1 þ lVi 1 þ lVi i¼1

(4)

(5)

A hierarchical probabilistic model (shown in Table 2) is used for approaches 3A, 3B, and 3C. It gives full consideration to random sampling error, random analytical error, and non-constant analytical recovery as per Schmidt and Emelko (2011). In approach 3A, the gamma distribution is fit to the counts using maximum likelihood estimation and the likelihood function given by Equation (6). Rather than

Analytical recovery

Parameter uncertainty

Yes (combined with variability) Yes (separated from variability)

Model description

ci w gamma(r,l) ni w Poisson(ciVi) xi w binomial(ni,pi) pi w beta(a,b)

using maximum likelihood estimation to obtain point estimates of the gamma distribution’s parameters, Bayes’ theorem and Gibbs sampling are used in approaches 3B and 3C to evaluate a posterior distribution that describes the uncertainty in these estimated parameters. Lðr; ljfxi gÞ ¼

" r  m Y Gðxi þ aÞGða þ bÞ 1 xi !GðrÞGðaÞGðbÞ 1 þ lVi i¼1 

ni #  N X Gðni þ rÞGðni  xi þ bÞ lVi ðni  xi Þ!Gðni þ a þ bÞ 1 þ lVi n ¼x i

(6)

i

Gibbs sampling was carried out as per Schmidt and Emelko (2011) with uniform priors on the interval [106, 106] for the shape parameter of the gamma distribution (r) and the reciprocal of the scale parameter (s ¼ 1/l).1 A total of ten million Gibbs sampling iterations, following a 1000-iteration burn-in, were used in these analyses. Convergence of the Gibbs sampling output upon the posterior distribution was evaluated separately by visually comparing results of three 31,000iteration Gibbs sampling runs that used highly variable initial values of r and l. In approach 3B, each of the ten million pairs of r and l is used to generate a single gamma-distributed concentration value. These ten million concentration values are collectively representative of the posterior predictive concentration distribution, which describes the variability in the oocyst concentration given the uncertainty in the parameters of the concentration distribution (Schmidt and Emelko, 2011). This uncertainty is considered separately from concentration variability in approach 3C by using two hundred different concentration distributions that are assumed to be collectively representative of the uncertainty in the temporal concentration variability distribution. These distributions were obtained

1 The approach described in Schmidt and Emelko (2011) to sample from the conditional posterior of l (with uniform prior on l) is incorrect. That approach, which is used in this work, is correctly described as sampling from the conditional posterior of s ¼ 1/l with a uniform prior on s.

2404

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

using the values of r and l from every 50,000th of the same Gibbs sampling iterations that were used in approach 3B. The enumeration data used in this work were simulated using the same probabilistic model as approaches 3A, 3B, and 3C with m ¼ 24 samples, V ¼ 100 L, r ¼ 0.22, l ¼ 0.36 oocysts/L, a ¼ 2, and b ¼ 3 (Schmidt and Emelko, 2011). Simulated data are used for this work because the results can be compared against the ‘actual model’ from which the data were simulated to better understand various sources of bias. The parameters of the analytical recovery distribution (a, b) are assumed to be precisely known in all of these analyses, but these values would need to be estimated in a practical scenario. Maximum likelihood estimation was carried out in Microsoft Excel using a greedy optimization algorithm, and numerical integration of the Equation (6) likelihood function was completed using the Visual Basic Editor. All concentration cumulative density functions were evaluated empirically using ten million simulated concentration values.

2.3.

Risk characterization

A simple risk model was used to compare the seven data analysis approaches in the context of mean annual risk associated with consumption of drinking water. In this model, it is assumed that all oocysts are infectious. A constant log-reduction level of 4.0, which is based on bin 2 of the Long Term 2 Enhanced Surface Water Treatment Rule (US EPA, 2006), is assumed. Daily consumption of 1.45 L (based on Pintar et al., 2009) is also assumed. Finally, an exponential doseeresponse model with parameter r ¼ 0.005259 (based on Dupont et al., 1995) is used. Uncertainty in this doseeresponse model parameter (and other forms of variability and uncertainty associated with treatment and consumption) were not addressed to maintain simplicity of the model and to ensure that the uncertainty illustrated in approach 3C is uniquely attributable to uncertainty in the parameters of the concentration distribution. In all data analysis approaches, mean daily risk Pd is calculated using the ten million concentration values used to evaluate the respective concentration cumulative density functions. Relative to Monte Carlo simulation, numerical integration (as described in Schmidt et al., in press) would enable much more efficient computation of mean risks and more rigorous assessment of the uncertainty in mean risk; however, numerical integration is not used due to the complexity of approaches 1B and 2B. Mean annual risk (Pa) is calculated from the mean daily risk (Pd) using the function Pa ¼ 1  (1  Pd)365, which assumes that all daily risks are independent with constant mean.

3.

Results and discussion

In the preceding section, it was proven that state-of-the-art Monte Carlo QMRA procedures that fit distributions to protozoan counts or ‘observed’ concentration estimates without consideration of analytical recovery and subsequently divide by independent beta-distributed recovery values are biased. This is especially true when the mean analytical recovery is low and the standard deviation of analytical recovery is high, as is typical in methods such as US EPA Method 1623 (US EPA, 2005) with which imperfect recovery is an important concern.

The bias in the mean pathogen concentration that arises from mishandling analytical recovery can easily approach or exceed a full order of magnitude (Table 1). At a minimum, the multiplicative bias associated with using this approach and specific beta distribution parameter values (which can be evaluated using Equation (3)) should be demonstrated to be close to a value of one before using these conventional Monte Carlo QMRA procedures. Adoption of modeling procedures that appropriately address the random measurement error in all microorganism data (including non-detects) would avert the need to consider the bias resulting from mathematically flawed modeling approaches. The implications of bias due to mishandling of analytical recovery, non-detect samples, and measurement error upon calculated risk values were evaluated using seven alternative data analysis approaches. In each approach, a gamma distribution (or set of gamma distributions) that is assumed to describe the temporal variation in the oocyst concentration was fit to the simulated dataset (as described in Section 2.2 and Table 2), the mean and standard deviation of concentration were calculated (following adjustment using independent, random analytical recovery values in the case of approaches 1B and 2B), and the corresponding mean annual risk (or set of mean annual risk values) was calculated. These results are summarized in Table 3. The results for approaches 1A, 1B, 2A, and 2B are quite different from the actual model because imperfect analytical recovery is ignored (resulting in underestimation of the mean and standard deviation of concentration and mean risk), or because beta-distributed variation in analytical recovery among samples is incorporated into the data analysis inappropriately (resulting in over-estimation of the mean and standard deviation of concentration and mean risk). Approaches 3A and 3B give much more appropriate results (except that approach 3B over-estimates the standard deviation of concentration because parameter uncertainty is treated as a form of excess variability). The results for approach 3C demonstrate the uncertainty associated with parameter estimation and how this propagates to uncertainty in mean risk. The concentration cumulative density functions, from the ‘actual model’ that was used to simulate the data and from the seven data analysis approaches, are shown in Fig. 2a. In the case of approach 3C, two hundred cumulative density functions illustrate the uncertainty in the fitted gamma distribution. It is clear from this plot that approach 1A is biased high at low concentrations due to the manipulation of non-detects and that approaches 1A and 2A are often biased low because analytical recovery is ignored (which has historically been addressed by using approaches similar to 1B and 2B). Approaches 3A and 3B yield curves that lie relatively close to the ‘actual model’ from which the data were simulated because these approaches address non-detects, analytical recovery, and random measurement error appropriately. Fig. 2b is a close-up of the upper portions of these curves. From this figure, it is clear that a surplus of unrealistically high concentration values is obtained when ‘observed’ concentration values are divided by random beta-distributed analytical recovery values in approaches 1B and 2B. This figure also suggests that use of the posterior predictive distribution of concentration (approach 3B) is problematic: a surplus of high and low

2405

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

Table 3 e Results of analysis of simulated data using seven alternative approaches.

Actual model Approach 1A Approach 1B Approach 2A Approach 2B Approach 3A Approach 3B Approach 3C

r

la

Mean conc.a

Conc. std. dev.a

0.2200 0.5334 0.5334 0.1260 0.1260 0.1373 N/A N/A

36.00 6.25 6.25 23.82 23.82 54.90 N/A N/A

7.92 3.33 13.34 3.00 12.00 7.54 7.02 1.22, 33.29

16.89 4.57 N 8.45 N 20.35 26.42 3.44, 103.24

Mean annual risk 2.21  9.28  3.71  8.35  3.33  2.10  1.95  3.39 

105 106 105 106 105 105 105 106, 9.26  105

This table shows the shape and scale parameters (r,l) for the gamma distribution that was used to simulate the data and for the gamma distribution fit using each data analysis approach (except where numerous parameter values were generated by Gibbs sampling in approaches 3B and 3C). The mean and standard deviation of each concentration distribution (after dividing ‘observed’ concentrations by independent betadistributed analytical recovery values in approaches 1B and 2B) and the corresponding computed mean annual risk are also shown. Infinite standard deviations of concentration occur in approaches 1B and 2B because Var[1/p*] ¼ N when p*wbeta(2,3). The minimum and maximum value from 200 s second-order iterations is shown for approach 3C. a Units of oocysts/100 L.

Fig. 2 e This figure illustrates the concentration cumulative density functions of the ‘actual model’ from which the dataset was simulated and the seven alternative data analysis approaches summarized in Table 2. The cumulative density functions are illustrated in logarithmic scale (a) and the highest concentrations are illustrated in absolute scale (b).

2406

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

3A

2A 1A

1.00E-06

3B

2B 1B

1.00E-05

1.00E-04

1.00E-03

Mean Annual Risk Actual Risk

1A

1B

2A

2B

3A

3B

3C

Fig. 3 e This figure displays the mean annual risk associated with daily consumption of 1.45 L of drinking water following a 4.0-log treatment of the raw water. The ‘actual risk’ calculated using the ‘actual model’ from which the dataset was simulated and the risks calculated using each of the seven alternative data analysis approaches are illustrated. Rather than yielding only a point estimate of mean risk or a distribution describing variability (and possibly also uncertainty) in mean risk, approach 3C quantifies the uncertainty in mean risk.

concentration values is obtained because uncertainty in the concentration distribution parameters inappropriately contributes to excess variability in the concentrations. For this reason, parameter uncertainty should be addressed in risk assessment using second-order procedures such as approach 3C, in which variability and parameter uncertainty are addressed separately (Nauta, 2000). Fig. 3 illustrates the mean annual risk values obtained using the temporal concentration variability distribution from which the data were simulated (‘actual risk’) and the sets of simulated concentration values from the seven data analysis approaches. Approaches 1A and 2A are biased low because the imperfect analytical recovery of the enumeration method (with a mean of 40%, or approximately 0.4 log10-units) is ignored. Approaches 1B and 2B, on the other hand, are biased high due to the mishandling of analytical recovery when ‘observed’ concentration values are divided by random beta-distributed recovery values. The log10(bias) in the mean concentration associated with regarding analytical recovery as an independent random variable is expected to have a value of 0.2041, and the bias in mean annual risk is found to be approximately 0.2 orders of magnitude. In this example, the bias is propagated to mean annual risk because single-hit doseeresponse models such as the exponential doseeresponse model are approximately linear at low mean doses (Regli et al., 1991; Haas, 2002) and low mean doses are common in appropriately treated drinking water. Approach 3A yields a mean risk value that is close to the ‘actual risk’ because the model addresses non-detects, analytical recovery and random measurement error appropriately. Approach 3B yields a slightly lower mean risk than approach 3A because it uses the same probabilistic model but inappropriately regards parameter uncertainty as a form of variability in the concentrations (and excess variability will lead to lower mean risk in single-hit models). Approach 3C yields 200 random values from a posterior distribution that represents the uncertainty in mean annual risk because uncertainty in the temporal concentration variability distribution parameters is separated from concentration variability using a second-order risk assessment technique. The overall effect of concentration variability upon risk is integrated out

within each of 200 iterations that only differ due to uncertainty in the temporal concentration variability distribution parameters. In contrast, most conventional QMRA approaches yield a single value of mean risk and a distribution that represents variability alone or a combination of variability and uncertainty. A posterior distribution for mean annual risk can be used to determine the posterior probability that the mean risk is above a regulatory threshold or to determine the posterior probability that the risk from one scenario is greater than that of another (e.g. when comparing datasets or water supplies). For example, all 200 approach 3C mean annual risk values illustrated in Fig. 3 fall below a hypothetical threshold of 104 infections per person per year, and this would constitute strong evidence that the risk is acceptably low. Such posterior probabilities are the Bayesian analog of pvalues in hypothesis testing (Schmidt et al., in press). This is a particularly valuable feature of second-order risk analysis approaches because statistical significance (or the Bayesian analog thereof) has not historically been addressed when drinking water risk values are compared to thresholds or to each other. The simple mathematical proof in Section 2.1 and the case study results presented above demonstrate that many conventional procedures to address analytical recovery in Monte Carlo risk assessments over-correct the bias associated with ignoring recovery and are biased high. Therefore, many published QMRA approaches that have sought to correct the bias associated with ignoring imperfect analytical recovery have created bias by handling recovery inappropriately. It is plausible that the risks calculated for some drinking water systems may be erroneously classified as unacceptably high due to this mathematical flaw in conventional risk calculation procedures. While over-estimation of risk may be considered to be conservative, the magnitude of the bias will vary among scenarios and will not provide a fair safety factor. Accordingly, appropriate approaches to address analytical recovery, and all random measurement errors in microbial enumeration data, should be adopted when evaluating the risks associated with protozoan pathogens in drinking water or other media.

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

4.

Conclusions

 When microbial concentration estimates or values are biased low due to unaddressed imperfect analytical recovery, dividing them by independent, random beta-distributed recovery values will lead to bias because microbial data and the analytical recovery of the method by which the data are obtained are not statistically independent. This type of bias in many contemporary Monte Carlo QMRA approaches can exceed an order of magnitude and may lead to drinking water systems being erroneously classified as surpassing a particular risk threshold.  Hierarchical probabilistic models that appropriately address non-constant analytical recovery and other random measurement errors associated with pathogen enumeration (and thereby are not subject to the bias addressed in this paper) are a convenient means to address the dependence of observed microorganism counts upon actual concentrations and analytical recovery, and should therefore be adopted to better characterize the variability in pathogen concentrations (and associated parameter uncertainty) and to provide more accurate risk values to decision makers.

Acknowledgments We thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Water Network for funding of this research.

references

Allen, M.J., Clancy, J.L., Rice, E.W., 2000. The plain, hard truth about pathogen monitoring. J. AWWA 92 (9), 64e76. Anonymous, 2001. Besluit van 9 januari 2001 tot wijziging van het waterleidingbesluit in verband met de richtlijn betreffende de kwaliteit van voor menselijke consumptie bestemd water. Staatsblad van het Koninkrijk der Nederlanden 31, 1e53. Dupont, H.L., Chappell, C.L., Sterling, C.R., Okhuysen, P.C., Rose, J.B., Jakubowski, W., 1995. The infectivity of Cryptosporidium parvum in healthy volunteers. New Engl. J. Med. 332 (13), 855e859. Emelko, M.B., Schmidt, P.J., Reilly, P.M., 2010. Particle and microorganism enumeration data: enabling quantitative rigor and judicious interpretation. Environ. Sci. Technol. 44 (5), 1720e1727. Gale, P., 1998. Simulating Cryptosporidium exposures in drinking water during an outbreak. Water Sci. Technol. 38 (12), 7e13. Haas, C.N., Crockett, C.S., Rose, J.B., Gerba, C.P., Fazil, A.M., 1996. Assessing the risk posed by oocysts in drinking water. J. AWWA 88 (9), 131e136. Haas, C.N., Rose, J.B., Gerba, C.P., 1999. Quantitative Microbial Risk Assessment. John Wiley and Sons, New York. Haas, C.N., 2002. Conditional dose-response relationships for microorganisms: development and application. Risk Analysis 22 (3), 455e463. Jaidi, K., Barbeau, B., Carrie`re, A., Desjardins, R., Pre´vost, M., 2009. Including operational data in QMRA model: development and impact of model inputs. J. Water Health 7 (1), 77e95. Medema, G.J., Teunis, P.F.M., Gornik, V., Havelaar, A.H., Exner, M., 1995. Estimation of the Cryptosporidium infection risk via

2407

drinking water. In: Betts, W.B., Casemore, D., Fricker, C., Smith, H., Watkins, J. (Eds.), Protozoa Parasites and Water. The Royal Society of Chemistry, pp. 53e56. Medema, G.J., Hoogenboezem, W., van der Veer, A.J., Ketelaars, H.A.M., Hijnen, W.A.M., Nobel, P.J., 2003. Quantitative risk assessment of Cryptosporidium in surface water treatment. Water Sci. Technol. 47 (3), 241e247. Nahrstedt, A., Gimbel, R., 1996. A statistical method for determining the reliability of the analytical results in the detection of Cryptosporidium and Giardia in water. J. Water Supply Res. Technol. e Aqua. 45 (3), 101e111. Nauta, M.J., 2000. Separation of uncertainty and variability in quantitative microbial risk assessment. Int. J. Food Microbiol. 57 (1e2), 9e18. Petterson, S.R., Signor, R.S., Ashbolt, N.J., 2007. Incorporating method recovery uncertainties in stochastic estimates of raw water protozoan concentrations for QMRA. J. Water Health 5 (S1), 51e65. Pintar, K.D.M., Waltner-Toews, D., Charron, D., Pollari, F., Fazil, A., McEwen, S.A., Nesbitt, A., Majowicz, S., 2009. Water consumption habits of a south-western Ontario community. J. Water Health 7 (2), 276e292. Pouillot, R., Beaudeau, P., Denis, J.B., Derouin, F., 2004. A quantitative risk assessment of waterborne cryptosporidiosis in France using second-order Monte Carlo simulation. Risk Anal. 24 (1), 1e17. Regli, S., Rose, J.B., Haas, C.N., Gerba, C.P., 1991. Modeling the risk from Giardia and viruses in drinking water. J. AWWA 83 (11), 76e84. Rose, J.B., Gerba, C.P., 1991. Use of risk assessment for development of microbial standards. Water Sci. Technol. 24 (2), 29e34. Rose, J.B., Haas, C.N., Regli, S., 1991. Risk assessment and control of waterborne giardiasis. Am. J. Public Health 81 (6), 709e713. Rose, J.B., Lisle, J.T., Haas, C.N., 1995. Risk assessment methods for Cryptosporidium and Giardia in contaminated water. In: Betts, W.B., Casemore, D., Fricker, C., Smith, H., Watkins, J. (Eds.), Protozoa Parasites and Water. The Royal Society of Chemistry, pp. 238e242. Ruecker, N.J., Braithwaite, S.L., Topp, E., Edge, T., Lapen, D.R., Wilkes, G., Robertson, W., Medeiros, D., Sensen, C.W., Neumann, N.F., 2007. Tracking host sources of Cryptosporidium spp. in raw water for improved health risk assessment. Appl. Environ. Microbiol. 73 (12), 3945e3957. Schijven, J.F., Teunis, P.F.M., Rutjes, S.A., Bouwknegt, M., de Roda Husman, A.M., 2011. QMRAspot: a tool for Quantitative Microbial Risk Assessment from surface water to potable water. Water Res. 45 (17), 5564e5576. Schmidt, P.J., Emelko, M.B., Reilly, P.M., 2010. Quantification of analytical recovery in particle and microorganism enumeration methods. Environ. Sci. Technol. 44 (5), 1705e1712. Schmidt, P.J., Emelko, M.B., 2011. QMRA and decision-making: are we handling measurement errors associated with pathogen concentration data correctly? Water Res. 45 (2), 427e438. Schmidt, P.J., Pintar, K.D.M., Fazil, A.M., Flemming, C.A., Lanthier, M., Laprade, N., Sunohara, M., Simhon, A., Thomas, J.L., Topp, E., Wilkes, G., Lapen, D.R., Using Campylobacter spp. data and Bayesian microbial risk assessment to examine public health risks in agricultural watersheds under tile drainage management. Water Res., in press. Signor, R.S., Ashbolt, N.J., 2006. Pathogen monitoring offers questionable protection against drinking-water risks: a QMRA (Quantitative Microbial Risk Analysis) approach to assess management strategies. Water Sci. Technol. 54 (3), 261e268. Smeets, P.W.M.H., 2011. Stochastic Modelling of Drinking Water Treatment in Quantitative Microbial Risk Assessment. IWA Publishing, London.

2408

w a t e r r e s e a r c h 4 7 ( 2 0 1 3 ) 2 3 9 9 e2 4 0 8

Teunis, P.F.M., Medema, G.J., Kruidenier, L., Havelaar, A.H., 1997. Assessment of the risk of infection by Cryptosporidium or Giardia in drinking water from a surface water source. Water Res. 31 (6), 1333e1346. Teunis, P.F.M., Evers, E.G., Slob, W., 1999. Analysis of variable fractions resulting from microbial counts. Quant. Microbiol. 1 (1), 63e88. Teunis, P.F.M., Havelaar, A.H., 1999. Cryptosporidium in Drinking Water: evaluation of the ILSI/RSI Quantitative Risk

Assessment Framework. RIVM, Bilthoven, The Netherlands. RIVM report no. 284 550 006. US EPA, 2005. Method 1623: Cryptosporidium and Giardia in Water by Filtration/IMS/FA; EPA 815-R-05-002. U.S. Environmental Protection Agency, Office of Water, Washington, DC. US EPA, 2006. Long term 2 enhanced surface water treatment rule; final rule. Fed. Regist. 71 (3), 654e786.