Percentile estimation using variable censored data


Chemosphere 68 (2007) 169–180 www.elsevier.com/locate/chemosphere

Samuel P. Caudill a,*, Lee-Yang Wong b, Wayman E. Turner a, Robin Lee b, Alden Henderson b, Donald G. Patterson Jr. a

a Division of Laboratory Sciences, National Center for Environmental Health, Centers for Disease Control and Prevention, 4770 Buford Highway NE, Atlanta, GA 30341, United States
b Division of Health Studies, Agency for Toxic Substances and Disease Registry, 1600 Clifton Road, Atlanta, GA 30333, United States

Received 23 March 2006; received in revised form 30 October 2006; accepted 6 December 2006
Available online 30 January 2007

Abstract

Much progress has been made in recent years to address the estimation of summary statistics, using data that are subject to censoring of results that fall below the limit of detection (LOD) for the measuring instrument. Truncated data methods (e.g., Tobit regression) and multiple-imputation are two approaches for analyzing data results that are below the LOD. To apply these methods requires an assumption about the underlying distribution of the data. Because the log-normal distribution has been shown to fit many data sets obtained from environmental measurements, the common practice is to assume that measurements of environmental factors can be described by log-normal distributions. This article describes methods for obtaining estimates of percentiles and their associated confidence intervals when the results are log-normal and a fraction of the results are below the LOD. We present limited simulations to demonstrate the bias of the proposed estimates and the coverage probability of their associated confidence intervals. Estimation methods are used to generate summary statistics for 2,3,7,8-tetrachloro dibenzo-p-dioxin (2,3,7,8-TCDD) using data from a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population. Because the congener measurements used in this study were subject to variable LODs, we also present simulation results to demonstrate the effect of variable LODs on the multiple-imputation process.

Published by Elsevier Ltd.

Keywords: cPCB; Limit of detection; Multiple-imputation; PCDD; PCDF; TCDD

* Corresponding author. Tel.: +1 770 488 4622; fax: +1 770 488 4192. E-mail address: [email protected] (S.P. Caudill).
0045-6535/$ - see front matter Published by Elsevier Ltd. doi:10.1016/j.chemosphere.2006.12.013

1. Introduction

One problem that arises in trying to characterize environmental exposures is that levels of contaminants in some or many individuals are not detectable by the instrumentation. The inability to detect can result from insufficient matrix or from extremely low exposure levels. Such results are said to be below the limit of detection (LOD) as determined by the sampling and analytic method. In spite of continued improvements in the sensitivity of assays to detect lower and lower concentrations, exposure levels are also decreasing so that the percentage of results below the LOD is not declining and may actually be increasing.

Several papers (Persson and Rootzen, 1977; Gleit, 1985; Haas and Scheff, 1990; Helsel, 1990; Hornung and Reed, 1990; Travis and Land, 1990; Huybrechts et al., 2002) have addressed the problem of estimating the mean or geometric mean of a population subject to results below the LOD. Lubin et al. (2004) took this a step further by focusing on regression models in which the dependent variable has results below the LOD. Regardless of the problem with results below the LOD, many investigators prefer to use percentiles rather than just summary measures of central tendency and dispersion to describe environmental exposure data because such data tend to be skewed and may not be exactly log-normal. In such cases, percentiles often provide a more thorough and accurate characterization than is achieved using


S.P. Caudill et al. / Chemosphere 68 (2007) 169–180

geometric means and standard deviations. An evaluation of various MLE methods and MLE-based imputation methods for median and interquartile range estimation has recently been published by Huybrechts et al. (2002), but these authors only considered fixed LODs. Our focus in this paper is on the estimation of percentiles and their confidence intervals and how these estimates are affected by: (1) fixed and variable LODs, (2) percentage of results below the LOD, and (3) sample size. We compare single- and multiple-imputation methods for replacing results below the LOD by performing a limited number of simulations to determine how these three factors affect the bias of various percentile estimates and the coverage probability associated with their confidence intervals. We also present percentile estimates and associated confidence intervals for 2,3,7,8-TCDD using data from a 2001 background exposure study, in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population (Agency for Toxic Substances and Disease Registry, 2005).

2. Statistical methods and analysis

For multiple-imputation calculations, individual measurements are assumed to be log-normally distributed and the second multiple-imputation procedure described by Lynn (2001) is used to impute values below the LOD. Using this method, missing values of the base-10 logarithm of congener results are sampled from $f(X_{mis} \mid X_{obs}, X_{mis} < c; \mu^*, \sigma^*)$, by drawing a random variance $\sigma^{*2} \sim (n-1)\hat{\sigma}^2/\chi^2_{n-1}$ followed by a random mean $\mu^* \sim N(\hat{\mu}, \sigma^{*2}/n)$, where $n$ ($= n_{mis} + n_{obs}$) is the total number of subjects, $\hat{\mu}$ is the maximum likelihood estimate (MLE) of $\mu$, and $\hat{\sigma}^2$ is the MLE of $\sigma^2$. $X_{mis}$ is then sampled from the lower tail of $N(\mu^*, \sigma^{*2})$ where values are less than $c$ [$= \log_{10}(\mathrm{LOD})$]. This process is repeated $m$ times as described by Rubin (1987) to create $m$ sets of results. Each of these $m$ data sets is then analyzed independently to estimate various percentiles along with their confidence limits.
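The imputation draw described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' code: the function name `impute_below_lod` and its arguments are hypothetical, the censored-data MLEs (`mu_hat`, `sigma2_hat`) are assumed to have been computed elsewhere, and the lower-tail draw uses an inverse-CDF method chosen here for simplicity.

```python
import numpy as np
from scipy import stats


def impute_below_lod(x_obs, log_lods_mis, mu_hat, sigma2_hat, m=5, seed=0):
    """Return m completed data sets on the log10 scale (hypothetical helper).

    x_obs        : observed log10 results
    log_lods_mis : c = log10(LOD) for each censored result (may vary by subject)
    mu_hat, sigma2_hat : censored-data MLEs, ASSUMED to be computed elsewhere
    """
    rng = np.random.default_rng(seed)
    x_obs = np.asarray(x_obs, dtype=float)
    c = np.asarray(log_lods_mis, dtype=float)
    n = x_obs.size + c.size  # n = n_mis + n_obs
    tiny = np.finfo(float).tiny
    completed = []
    for _ in range(m):
        # Random variance: sigma*^2 ~ (n - 1) * sigma_hat^2 / chi^2_{n-1}
        sigma2_star = (n - 1) * sigma2_hat / rng.chisquare(n - 1)
        # Random mean: mu* ~ N(mu_hat, sigma*^2 / n)
        mu_star = rng.normal(mu_hat, np.sqrt(sigma2_star / n))
        sigma_star = np.sqrt(sigma2_star)
        # Inverse-CDF draw from the part of N(mu*, sigma*^2) lying below c
        p_c = stats.norm.cdf(c, loc=mu_star, scale=sigma_star)
        u = rng.uniform(tiny, 1.0, size=c.size) * p_c
        x_mis = stats.norm.ppf(u, loc=mu_star, scale=sigma_star)
        x_mis = np.minimum(x_mis, c)  # guard against rounding above the limit
        completed.append(np.concatenate([x_obs, x_mis]))
    return completed
```

Because each censored result carries its own limit `c`, the same sketch covers both fixed LODs (all entries of `log_lods_mis` equal) and variable LODs.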
The m within-imputation estimates of a given percentile are averaged to arrive at a final percentile estimate. According to Rubin (1987), the variance of a mean or geometric mean estimate would be obtained by combining the average of the m within-imputation variance estimates of the mean with the among-imputation variance of these mean estimates. For percentiles, however, within-imputation variance estimates are not readily available. Thus, a different approach is used to obtain confidence intervals that take into account the additional variability resulting from multiple-imputation. This approach uses the cumulative binomial distribution to estimate confidence limits after adjusting the sample size to account for the increase in variance resulting from multiple-imputation. The method used is adapted from methods described by Woodruff (1952) and by Korn and Graubard (1998). The method is described as follows:

Step 1: Separately for each of the m data sets, the empirical distribution of results is used to determine the value ($X_i$; i = 1, 2, ..., m) associated with a selected percentile. Then the number of results ($k_i$) below this $X_i$ value is computed. To simplify notation below, the subscript i on k has been dropped.

Step 2: Next the Clopper and Pearson (1934) 0.95 confidence interval limits ($P_L(k, n)$ and $P_U(k, n)$) for the ith data set are computed as follows:

$$P_L(k, n) = \frac{v_1 F_{v_1,v_2}(0.025)}{v_2 + v_1 F_{v_1,v_2}(0.025)} \quad \text{and} \quad P_U(k, n) = \frac{v_3 F_{v_3,v_4}(0.975)}{v_4 + v_3 F_{v_3,v_4}(0.975)} \qquad (1)$$

where $n$ = the total number of results as defined earlier (i.e., $n = n_{mis} + n_{obs}$), $k$ = number of results below $X_i$, $v_1 = 2k$, $v_2 = 2(n - k + 1)$, $v_3 = 2(k + 1)$, $v_4 = 2(n - k)$, and $F_{d_1,d_2}(\beta)$ is the $\beta$ quantile of an F distribution with $d_1$ and $d_2$ degrees of freedom.

Step 3: The empirical distribution of the sample results is then used again to determine the values $X_{L95}$ corresponding to $P_L(k, n)$ and $X_{U95}$ corresponding to $P_U(k, n)$.

Step 4: To estimate the relative increase in the variability of the percentile estimate resulting from multiple-imputation, one-quarter of the width of the 95% confidence interval around the base-10 logarithm of the percentile (i.e., $\log_{10}[X_{U95}] - \log_{10}[X_{L95}]$) is first used to approximate the within-imputation variance. After this variance has been estimated for each of the m imputations, the average is computed to represent the within-imputation variance. The among-imputation variance is calculated by computing the variance of the m $\log_{10}[X_i]$ values and represents the increase in variance resulting from multiple-imputation. The total variance is estimated by adding the average within-imputation variance estimate to $(1 + 1/m)$ times the among-imputation variance estimate. The multiple-imputation design effect (D) is then computed as the ratio of this total variance estimate to the within-imputation variance estimate. The factor D is similar to the design effect (i.e., complex sample design variance divided by the simple random sample variance) in sample survey theory and in this case, quantifies the increase in variance resulting from multiple-imputation.

Step 5: For the final percentile estimate, the $\log_{10}[X_i]$ values are averaged across the m imputation data sets, and the empirical distribution of these averages is used to determine the average $\log_{10}[X_i]$ value (L10XAVE) associated with a selected percentile. Steps 2 and 3 are then repeated with the sample size $n$ replaced by a reduced sample size $n_r = n/D$. This reduced sample size is used to incorporate the additional variability in the percentile estimate resulting from multiple-imputation. Changing the sample size also requires a change in the value of $k$ in Eq. (1), which should be replaced by $k_r = k/D$, where $k$ is now the number of results below the L10XAVE value. The resulting equation corresponding to the sample size reduction is given by

$$P_L(k_r, n_r) = \frac{v_1 F_{v_1,v_2}(0.025)}{v_2 + v_1 F_{v_1,v_2}(0.025)} \quad \text{and} \quad P_U(k_r, n_r) = \frac{v_3 F_{v_3,v_4}(0.975)}{v_4 + v_3 F_{v_3,v_4}(0.975)} \qquad (2)$$

where $v_1 = 2k_r$, $v_2 = 2(n_r - k_r + 1)$, $v_3 = 2(k_r + 1)$, $v_4 = 2(n_r - k_r)$, and $F_{d_1,d_2}(\beta)$ is the $\beta$ quantile of an F distribution with $d_1$ and $d_2$ degrees of freedom.

Step 6: The empirical distribution of the L10XAVE results is then used again to determine the values L10XAVE_L95 corresponding to $P_L(k_r, n_r)$ and L10XAVE_U95 corresponding to $P_U(k_r, n_r)$. Finally, the L10XAVE, L10XAVE_L95, and L10XAVE_U95 are back transformed to produce the final percentile estimate with corresponding confidence limits.
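The F-quantile limits of Eqs. (1) and (2) and the design effect of Step 4 can be sketched in Python as below. The function names are hypothetical and the sketch is not the authors' implementation. One point is an explicit assumption: the quarter-width of the interval is squared before averaging, treating it as an approximate standard error; the text itself only says the quarter-width is used to approximate the within-imputation variance.

```python
import numpy as np
from scipy import stats


def clopper_pearson_f(k, n, alpha=0.05):
    """Limits of Eq. (1) or Eq. (2) for a proportion k/n via F quantiles.

    k and n may be non-integer, as with k_r = k/D and n_r = n/D in Step 5.
    """
    if k <= 0:
        p_low = 0.0
    else:
        v1, v2 = 2 * k, 2 * (n - k + 1)
        f_low = stats.f.ppf(alpha / 2, v1, v2)
        p_low = v1 * f_low / (v2 + v1 * f_low)
    if k >= n:
        p_high = 1.0
    else:
        v3, v4 = 2 * (k + 1), 2 * (n - k)
        f_high = stats.f.ppf(1 - alpha / 2, v3, v4)
        p_high = v3 * f_high / (v4 + v3 * f_high)
    return p_low, p_high


def design_effect(log10_x, log10_lower, log10_upper):
    """Step 4: D = total variance / average within-imputation variance.

    log10_x                  : the m values log10[X_i]
    log10_lower, log10_upper : the m values log10[X_L95] and log10[X_U95]
    """
    x = np.asarray(log10_x, dtype=float)
    m = x.size
    width = np.asarray(log10_upper, dtype=float) - np.asarray(log10_lower, dtype=float)
    # ASSUMPTION: quarter-width is treated as a standard error and squared
    within = np.mean((width / 4.0) ** 2)
    among = np.var(x, ddof=1)  # variance of the m log10[X_i] values
    total = within + (1 + 1 / m) * among
    return total / within
```

With a design effect `D` in hand, the reduced quantities of Step 5 would be `n_r = n / D` and `k_r = k / D`, and `clopper_pearson_f(k_r, n_r)` then gives the limits of Eq. (2).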

2.1. Simulation study

To evaluate the effects of fixed and variable LODs on estimates of various percentiles, we generated 1000 log-normal data sets of sample sizes 100 and 500. The mean chosen for the simulated log-normal data was 0.10 and the variance was 0.397. This mean and variance correspond to MLEs of the mean and variance of log base-10-transformed values of 2,3,7,8-TCDD in a sample of 415 subjects from a 2001 background exposure study conducted in Louisiana. Simulation for this specific data set was performed so that we could explore the effects of variable censoring limits which may be unique to this particular data set and to the Centers for Disease Control and Prevention (CDC) laboratory performing the analyses. The variance of the log base-10-transformed values of the censoring limits was 0.044, so we also simulated variable censoring limits with this variance. We censored results from each of these 2000 data sets using various fixed and variable censoring limits to achieve censoring fractions from approximately 5–70%, resulting in 2000 uncensored data sets, 2000 data sets with fixed censoring limits, and 2000 data sets with variable censoring limits. The fixed censoring limits were chosen so that approximately 5%, 25%, 50%, or 70% of the log transformed data would be censored. Variable censoring limits were achieved by adding random normal deviates to the fixed censoring limits. For each of these 8000 data sets we computed the 25th, 50th, 75th, 90th, and 95th percentiles, their 95% confidence limits, and their biases relative to the true percentiles.

The censoring limit variance of 0.044 mentioned earlier corresponds to the 255 subjects with results below the LOD in the Louisiana background exposure study. This censoring limit variance is approximately 10% of the magnitude of the MLE of variance for the base-10 logarithm of the 415 TCDD measurements. Because background TCDD exposure levels are age-dependent, stratification of the full data set into age-groups is also of interest. It turns out that the percentage of censoring limit variance to full sample variance of TCDD measurements ranges from 10% to 20% for the four age-groups considered in the Louisiana study. Thus to determine the effect of censoring-limit variance on the variable censoring simulations and to allow validity checks of the fixed censoring and uncensored simulation results, we also re-ran all the simulations using a censoring limit variance equal to 20% of the variance of the simulated data.

2.2. Comparison of single-imputation and multiple-imputation estimates of percentiles

We compared single-imputation and multiple-imputation estimates of the 25th, 50th, 75th, 90th, and 95th percentiles using variable censored 2,3,7,8-TCDD results from a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population. For the single-imputation estimates, we use the variable censoring limit divided by either two or the square root of two as the imputed value. For the multiple-imputation estimates we use the method described above.
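A minimal Python sketch of one simulation replicate with single-imputation is given below, using the mean (0.10), variance (0.397), and censoring-limit variance (0.044) quoted in Section 2.1. Everything else is an assumption made for illustration: the function names are hypothetical, `np.percentile` (with its default interpolation) stands in for the empirical-distribution percentile, and bias is expressed relative to the true percentile on the original scale, which the text does not define explicitly.

```python
import numpy as np
from statistics import NormalDist

MEAN_LOG10 = 0.10      # mean of log10 results, as quoted in Section 2.1
VAR_LOG10 = 0.397      # variance of log10 results, as quoted in Section 2.1
LOD_VAR_LOG10 = 0.044  # variance of log10 censoring limits, as quoted in Section 2.1


def true_percentile(pct):
    """Percentile of the simulated log-normal population, original scale."""
    z = NormalDist().inv_cdf(pct / 100.0)
    return 10.0 ** (MEAN_LOG10 + z * np.sqrt(VAR_LOG10))


def simulate_once(n, fixed_log_lod, variable, divisor, pct, rng):
    """One replicate: simulate, censor, single-impute LOD/divisor.

    Returns (percentile estimate on the original scale, fraction censored).
    """
    x = rng.normal(MEAN_LOG10, np.sqrt(VAR_LOG10), size=n)  # log10 results
    log_lod = np.full(n, float(fixed_log_lod))
    if variable:
        # Variable limits: fixed limit plus random normal deviates
        log_lod = log_lod + rng.normal(0.0, np.sqrt(LOD_VAR_LOG10), size=n)
    censored = x < log_lod
    # Censored results are replaced by LOD/divisor (divisor = 2 or sqrt(2))
    values = np.where(censored, 10.0 ** log_lod / divisor, 10.0 ** x)
    return float(np.percentile(values, pct)), float(censored.mean())


def relative_bias(estimates, pct):
    """ASSUMPTION: bias is reported relative to the true percentile."""
    truth = true_percentile(pct)
    return (np.mean(estimates) - truth) / truth


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fixed limit placed so that about 25% of results are censored
    limit = MEAN_LOG10 + NormalDist().inv_cdf(0.25) * np.sqrt(VAR_LOG10)
    est = [simulate_once(100, limit, True, np.sqrt(2), 50, rng)[0]
           for _ in range(1000)]
    print(relative_bias(est, 50))
```

Setting `variable=False` reproduces the fixed-limit case, and `divisor=2.0` switches the single-imputation rule from LOD/√2 to LOD/2.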

3. Results

3.1. Simulation studies

Simulation results are presented in Tables 1–5, which display the average bias and coverage of the percentile estimates as a function of the sample size, the variable censoring limit variance, the fraction of results below the LOD, and the type of imputation. The bias and coverage are also given for the uncensored data (i.e., the complete data before censoring). Thus the column labeled "Fraction < LOD" does not apply to these estimates except in the sense that these estimates were obtained from the same data set that was later censored to the extent indicated. The standard error for coverage estimates of 95% confidence limits is 0.0069 (i.e., $\sqrt{0.95 \times 0.05/1000}$ for 1000 repetitions). Differences in bias estimates under the uncensored column with comparable sample size result from random errors in the simulation process because these data were generated under identical conditions. Similarly, differences in bias estimates under the fixed censoring-limit columns with comparable sample size and fraction of results below the LOD result from random errors in the simulation process because these data were also generated under identical conditions.

Table 1 shows slight bias in estimation of the 25th percentile for sample sizes of 100 and 500 even when no

Table 1
Bias of 25th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)
Entries in the last seven columns: bias of 25th percentile estimates (coverage of 95% limits)

Sample  Variable      Fraction     Uncensored     Fixed censoring limit                        Variable censoring limit
size    censoring     < LODb                      Single-imputation             Multiple-      Single-imputation             Multiple-
        limit         F/Vc                        LOD/2          LOD/√2         imputation     LOD/2          LOD/√2         imputation
        variance (%)a
100     10            0.04/0.01    0.026 (0.963)  0.026 (0.963)  0.026 (0.963)  0.026 (1.00)   0.024 (0.963)  0.026 (0.963)  0.024 (1.00)
100     20            0.06/0.10    0.021 (0.968)  0.021 (0.968)  0.021 (0.968)  0.021 (1.00)   0.014 (0.967)  0.033 (0.965)  0.009 (1.00)
100     10            0.35/0.24    0.017 (0.973)  0.069 (0.999)  0.070 (0.999)  0.228 (1.00)   0.013 (0.964)  0.192 (0.663)  0.148 (1.00)
100     20            0.28/0.32    0.020 (0.959)  0.056 (0.997)  0.077 (0.997)  0.215 (1.00)   0.078 (0.948)  0.251 (0.588)  0.096 (1.00)
100     10            0.55/0.57    0.028 (0.962)  0.959 (0.000)  1.33 (0.000)   0.146 (0.999)  0.395 (0.237)  0.918 (0.001)  0.246 (1.00)
100     20            0.41/0.37    0.023 (0.971)  0.960 (0.000)  1.33 (0.000)   0.153 (1.00)   0.388 (0.224)  0.861 (0.147)  0.297 (1.00)
100     10            0.69/0.69    0.020 (0.966)  3.717 (0.000)  4.610 (0.000)  0.084 (0.967)  1.818 (0.000)  2.934 (0.000)  0.223 (0.982)
100     20            0.61/0.80    0.018 (0.950)  3.302 (0.000)  4.117 (0.000)  0.137 (0.981)  1.460 (0.000)  2.383 (0.000)  0.346 (0.997)
500     10            0.09/0.09    0.007 (0.952)  0.007 (0.952)  0.007 (0.952)  0.007 (0.952)  0.006 (0.952)  0.007 (0.950)  0.006 (1.00)
500     20            0.08/0.09    0.007 (0.961)  0.007 (0.961)  0.007 (0.961)  0.007 (0.961)  0.002 (0.960)  0.013 (0.958)  0.000 (1.00)
500     10            0.32/0.32    0.004 (0.955)  0.149 (1.00)   0.008 (1.00)   0.297 (1.00)   0.015 (0.947)  0.179 (0.266)  0.144 (1.00)
500     20            0.27/0.15    0.009 (0.948)  0.148 (1.00)   0.008 (1.00)   0.295 (1.00)   0.061 (0.830)  0.216 (0.365)  0.097 (1.00)
500     10            0.54/0.50    0.006 (0.953)  0.932 (0.000)  1.30 (0.000)   0.152 (1.00)   0.355 (0.257)  0.866 (0.000)  0.284 (1.00)
500     20            0.52/0.52    0.007 (0.942)  0.932 (0.000)  1.30 (0.000)   0.148 (1.00)   0.331 (0.033)  0.796 (0.000)  0.346 (1.00)
500     10            0.76/0.82    0.006 (0.948)  3.638 (0.000)  4.515 (0.000)  0.139 (1.00)   1.754 (0.000)  2.857 (0.000)  0.332 (1.00)
500     20            0.74/0.70    0.003 (0.950)  3.637 (0.000)  4.515 (0.000)  0.147 (1.00)   1.585 (0.000)  2.575 (0.000)  0.425 (1.00)

a The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring limit variance associated with 255 2,3,7,8-TCDD (2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring limit variance observed in four subsamples of the 415 subjects based on age-groups.
b LOD is the limit of detection.
c The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.

Table 2
Bias of 50th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)
Entries in the last seven columns: bias of 50th percentile estimates (coverage of 95% limits)

Sample  Variable      Fraction     Uncensored     Fixed censoring limit                        Variable censoring limit
size    censoring     < LODb                      Single-imputation             Multiple-      Single-imputation             Multiple-
        limit         F/Vc                        LOD/2          LOD/√2         imputation     LOD/2          LOD/√2         imputation
        variance (%)a
100     10            0.04/0.01    0.012 (0.964)  0.012 (0.964)  0.012 (0.964)  0.012 (0.964)  0.012 (0.969)  0.012 (0.964)  0.012 (1.00)
100     20            0.06/0.10    0.024 (0.968)  0.024 (0.968)  0.024 (0.968)  0.024 (0.968)  0.024 (0.969)  0.024 (0.969)  0.024 (0.969)
100     10            0.35/0.24    0.007 (0.962)  0.007 (0.962)  0.007 (0.962)  0.007 (0.974)  0.001 (0.962)  0.014 (0.963)  0.002 (0.975)
100     20            0.28/0.32    0.017 (0.966)  0.017 (0.966)  0.017 (0.966)  0.017 (0.975)  0.013 (0.971)  0.053 (0.963)  0.013 (0.972)
100     10            0.55/0.57    0.020 (0.965)  0.067 (0.986)  0.012 (0.986)  0.187 (0.988)  0.022 (0.955)  0.179 (0.584)  0.289 (0.968)
100     20            0.41/0.37    0.013 (0.971)  0.082 (0.991)  0.003 (0.991)  0.212 (0.991)  0.021 (0.900)  0.233 (0.401)  0.323 (0.950)
100     10            0.69/0.69    0.030 (0.974)  0.773 (0.000)  1.109 (0.000)  0.481 (0.202)  0.542 (0.025)  1.109 (0.000)  0.536 (0.274)
100     20            0.61/0.80    0.003 (0.961)  0.617 (0.000)  0.923 (0.000)  0.508 (0.184)  0.506 (0.239)  1.108 (0.000)  0.590 (0.316)
500     10            0.09/0.09    0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)
500     20            0.08/0.09    0.006 (0.956)  0.006 (0.956)  0.006 (0.956)  0.006 (0.956)  0.006 (0.956)  0.006 (0.956)  0.006 (0.956)
500     10            0.32/0.32    0.006 (0.960)  0.006 (0.960)  0.006 (0.960)  0.006 (0.972)  0.001 (0.955)  0.011 (0.959)  0.000 (0.969)
500     20            0.27/0.15    0.004 (0.949)  0.004 (0.949)  0.004 (0.949)  0.004 (0.960)  0.002 (0.944)  0.037 (0.914)  0.016 (0.962)
500     10            0.54/0.50    0.004 (0.947)  0.140 (0.984)  0.053 (0.984)  0.223 (0.981)  0.075 (0.936)  0.132 (0.246)  0.342 (0.782)
500     20            0.52/0.52    0.004 (0.944)  0.142 (0.982)  0.054 (0.982)  0.230 (0.981)  0.024 (0.636)  0.188 (0.320)  0.371 (0.629)
500     10            0.76/0.82    0.005 (0.964)  0.743 (0.000)  1.073 (0.000)  0.505 (0.000)  0.501 (0.000)  1.062 (0.000)  0.596 (0.000)
500     20            0.74/0.70    0.002 (0.959)  0.743 (0.000)  1.073 (0.000)  0.508 (0.000)  0.579 (0.000)  1.135 (0.000)  0.640 (0.003)

a The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring limit variance associated with 255 2,3,7,8-TCDD (2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring limit variance observed in four subsamples of the 415 subjects based on age-groups.
b LOD is the limit of detection.
c The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.

Table 3
Bias of 75th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)
Entries in the last seven columns: bias of 75th percentile estimates (coverage of 95% limits)

Sample  Variable      Fraction     Uncensored     Fixed censoring limit                        Variable censoring limit
size    censoring     < LODb                      Single-imputation             Multiple-      Single-imputation             Multiple-
        limit         F/Vc                        LOD/2          LOD/√2         imputation     LOD/2          LOD/√2         imputation
        variance (%)a
100     10            0.04/0.01    0.012 (0.961)  0.012 (0.961)  0.012 (0.961)  0.012 (0.961)  0.012 (0.961)  0.012 (0.961)  0.012 (0.961)
100     20            0.06/0.10    0.024 (0.968)  0.024 (0.968)  0.024 (0.968)  0.024 (0.968)  0.024 (0.968)  0.024 (0.968)  0.024 (0.968)
100     10            0.35/0.24    0.005 (0.968)  0.005 (0.968)  0.005 (0.968)  0.005 (0.968)  0.005 (0.968)  0.005 (0.968)  0.005 (0.968)
100     20            0.28/0.32    0.016 (0.963)  0.016 (0.963)  0.016 (0.963)  0.016 (0.963)  0.016 (0.963)  0.017 (0.963)  0.015 (0.963)
100     10            0.55/0.57    0.020 (0.953)  0.020 (0.953)  0.020 (0.953)  0.020 (0.953)  0.018 (0.953)  0.028 (0.953)  0.016 (0.960)
100     20            0.41/0.37    0.013 (0.953)  0.013 (0.95)   0.013 (0.953)  0.013 (0.960)  0.017 (0.952)  0.057 (0.949)  0.003 (0.962)
100     10            0.69/0.69    0.025 (0.965)  0.035 (0.981)  0.012 (0.981)  0.140 (0.991)  0.017 (0.893)  0.216 (0.355)  0.354 (0.947)
100     20            0.61/0.80    0.001 (0.950)  0.040 (0.983)  0.009 (0.983)  0.110 (0.985)  0.021 (0.685)  0.276 (0.358)  0.456 (0.827)
500     10            0.09/0.09    0.006 (0.961)  0.006 (0.961)  0.006 (0.961)  0.006 (0.961)  0.006 (0.961)  0.006 (0.961)  0.006 (0.961)
500     20            0.08/0.09    0.002 (0.962)  0.002 (0.962)  0.002 (0.962)  0.002 (0.962)  0.002 (0.962)  0.002 (0.962)  0.002 (0.962)
500     10            0.32/0.32    0.005 (0.953)  0.005 (0.953)  0.005 (0.953)  0.005 (0.953)  0.005 (0.953)  0.005 (0.953)  0.005 (0.953)
500     20            0.27/0.15    0.004 (0.943)  0.004 (0.943)  0.004 (0.943)  0.004 (0.943)  0.004 (0.943)  0.005 (0.945)  0.004 (0.948)
500     10            0.54/0.50    0.003 (0.952)  0.003 (0.952)  0.003 (0.952)  0.003 (0.959)  0.001 (0.952)  0.008 (0.949)  0.001 (0.963)
500     20            0.52/0.52    0.001 (0.940)  0.001 (0.940)  0.001 (0.940)  0.001 (0.946)  0.003 (0.946)  0.039 (0.910)  0.013 (0.958)
500     10            0.76/0.82    0.002 (0.950)  0.048 (0.974)  0.021 (0.974)  0.095 (0.984)  0.065 (0.651)  0.178 (0.010)  0.423 (0.675)
500     20            0.74/0.70    0.002 (0.961)  0.048 (0.983)  0.023 (0.983)  0.092 (0.990)  0.033 (0.053)  0.323 (0.350)  0.538 (0.437)

a The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring limit variance associated with 255 2,3,7,8-TCDD (2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring limit variance observed in four subsamples of the 415 subjects based on age-groups.
b LOD is the limit of detection.
c The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.

Table 4
Bias of 90th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)
Entries in the last seven columns: bias of 90th percentile estimates (coverage of 95% limits)

Sample  Variable      Fraction     Uncensored     Fixed censoring limit                        Variable censoring limit
size    censoring     < LODb                      Single-imputation             Multiple-      Single-imputation             Multiple-
        limit         F/Vc                        LOD/2          LOD/√2         imputation     LOD/2          LOD/√2         imputation
        variance (%)a
100     10            0.04/0.01    0.020 (0.965)  0.020 (0.965)  0.020 (0.965)  0.020 (0.965)  0.020 (0.965)  0.020 (0.965)  0.020 (0.965)
100     20            0.06/0.10    0.033 (0.959)  0.033 (0.959)  0.033 (0.959)  0.033 (0.959)  0.033 (0.959)  0.033 (0.959)  0.033 (0.959)
100     10            0.35/0.24    0.010 (0.961)  0.010 (0.961)  0.010 (0.961)  0.010 (0.961)  0.010 (0.961)  0.010 (0.961)  0.010 (0.961)
100     20            0.278/0.32   0.030 (0.959)  0.030 (0.959)  0.030 (0.959)  0.030 (0.959)  0.030 (0.959)  0.030 (0.959)  0.030 (0.959)
100     10            0.55/0.57    0.017 (0.965)  0.017 (0.965)  0.017 (0.965)  0.017 (0.965)  0.017 (0.965)  0.017 (0.965)  0.017 (0.965)
100     20            0.41/0.37    0.021 (0.964)  0.021 (0.964)  0.021 (0.964)  0.021 (0.964)  0.021 (0.964)  0.024 (0.964)  0.021 (0.964)
100     10            0.69/0.69    0.031 (0.967)  0.031 (0.967)  0.031 (0.967)  0.031 (0.967)  0.029 (0.966)  0.045 (0.966)  0.026 (0.971)
100     20            0.61/0.80    0.023 (0.969)  0.023 (0.969)  0.023 (0.969)  0.023 (0.969)  0.032 (0.968)  0.084 (0.954)  0.002 (0.973)
500     10            0.09/0.09    0.006 (0.959)  0.006 (0.959)  0.006 (0.959)  0.006 (0.959)  0.006 (0.959)  0.006 (0.959)  0.006 (0.959)
500     20            0.08/0.09    0.002 (0.954)  0.002 (0.954)  0.002 (0.954)  0.002 (0.954)  0.002 (0.954)  0.002 (0.954)  0.002 (0.954)
500     10            0.32/0.32    0.006 (0.952)  0.006 (0.952)  0.006 (0.952)  0.006 (0.952)  0.006 (0.952)  0.006 (0.952)  0.006 (0.952)
500     20            0.27/0.15    0.006 (0.960)  0.006 (0.960)  0.006 (0.960)  0.006 (0.960)  0.006 (0.960)  0.006 (0.960)  0.006 (0.960)
500     10            0.54/0.50    0.008 (0.959)  0.008 (0.959)  0.008 (0.959)  0.008 (0.959)  0.008 (0.959)  0.008 (0.959)  0.008 (0.959)
500     20            0.52/0.52    0.000 (0.957)  0.000 (0.957)  0.000 (0.957)  0.000 (0.957)  0.000 (0.956)  0.002 (0.955)  0.001 (0.958)
500     10            0.76/0.82    0.006 (0.949)  0.006 (0.949)  0.006 (0.949)  0.006 (0.953)  0.004 (0.949)  0.016 (0.948)  0.003 (0.967)
500     20            0.74/0.70    0.001 (0.964)  0.001 (0.964)  0.001 (0.964)  0.001 (0.964)  0.015 (0.962)  0.075 (0.861)  0.016 (0.981)

a The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring limit variance associated with 255 2,3,7,8-TCDD (2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring limit variance observed in four subsamples of the 415 subjects based on age-groups.
b LOD is the limit of detection.
c The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.

Table 5
Bias of 95th percentile estimates and coverage of their 95% confidence limits (entries are means of 1000 repetitions)
Entries in the last seven columns: bias of 95th percentile estimates (coverage of 95% limits)

Sample  Variable      Fraction     Uncensored     Fixed censoring limit                        Variable censoring limit
size    censoring     < LODb                      Single-imputation             Multiple-      Single-imputation             Multiple-
        limit         F/Vc                        LOD/2          LOD/√2         imputation     LOD/2          LOD/√2         imputation
        variance (%)a
100     10            0.04/0.01    0.021 (0.955)  0.021 (0.955)  0.021 (0.955)  0.021 (0.955)  0.021 (0.955)  0.021 (0.955)  0.021 (0.955)
100     20            0.06/0.10    0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)
100     10            0.35/0.24    0.040 (0.954)  0.040 (0.954)  0.040 (0.954)  0.040 (0.954)  0.040 (0.954)  0.040 (0.954)  0.040 (0.954)
100     20            0.28/0.32    0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)  0.041 (0.954)
100     10            0.55/0.57    0.024 (0.952)  0.024 (0.952)  0.024 (0.952)  0.024 (0.952)  0.024 (0.952)  0.024 (0.952)  0.024 (0.952)
100     20            0.41/0.37    0.036 (0.957)  0.036 (0.957)  0.036 (0.957)  0.036 (0.957)  0.036 (0.957)  0.036 (0.957)  0.036 (0.957)
100     10            0.69/0.69    0.052 (0.964)  0.052 (0.964)  0.052 (0.964)  0.052 (0.964)  0.052 (0.964)  0.052 (0.964)  0.052 (0.964)
100     20            0.61/0.80    0.032 (0.946)  0.032 (0.946)  0.032 (0.946)  0.032 (0.946)  0.034 (0.948)  0.046 (0.950)  0.030 (0.946)
500     10            0.09/0.09    0.007 (0.944)  0.007 (0.944)  0.007 (0.944)  0.007 (0.944)  0.007 (0.944)  0.007 (0.944)  0.007 (0.944)
500     20            0.08/0.09    0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)  0.007 (0.949)
500     10            0.32/0.32    0.008 (0.943)  0.008 (0.943)  0.008 (0.943)  0.008 (0.943)  0.008 (0.943)  0.008 (0.943)  0.008 (0.943)
500     20            0.27/0.15    0.003 (0.945)  0.003 (0.945)  0.003 (0.945)  0.003 (0.945)  0.003 (0.945)  0.003 (0.945)  0.003 (0.945)
500     10            0.54/0.50    0.014 (0.961)  0.014 (0.961)  0.014 (0.961)  0.014 (0.961)  0.014 (0.961)  0.014 (0.961)  0.014 (0.961)
500     20            0.52/0.52    0.002 (0.966)  0.002 (0.966)  0.002 (0.966)  0.002 (0.966)  0.002 (0.966)  0.002 (0.966)  0.002 (0.966)
500     10            0.76/0.82    0.003 (0.949)  0.003 (0.949)  0.003 (0.949)  0.003 (0.949)  0.003 (0.949)  0.004 (0.949)  0.003 (0.949)
500     20            0.74/0.70    0.007 (0.957)  0.007 (0.957)  0.007 (0.957)  0.007 (0.957)  0.009 (0.959)  0.025 (0.963)  0.006 (0.964)

a The variance of the censoring limit was set for simulation at either 10% or 20% of the variance of the simulated data. Ten percent corresponds to the censoring limit variance associated with 255 2,3,7,8-TCDD (2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin) results below the LOD from a background exposure study of 415 subjects in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population in 2001 and analyzed by the Centers for Disease Control laboratories. Twenty percent corresponds to the largest censoring limit variance observed in four subsamples of the 415 subjects based on age-groups.
b LOD is the limit of detection.
c The fractions < LOD listed are averages for the fixed (F) and variable (V) censoring limits, respectively.

censoring occurred (all rows of column 4 labeled ‘‘Uncensored’’). For a sample size of 100 the average bias is 2.2% and for 500 the average bias is 0.6%. As long as the fraction of results below the LOD does not exceed 0.10, the 25th percentile p estimates using single-imputation (LOD/2 or LOD/ 2) or multiple-imputation under ﬁxed censoring (columns 5 through 7) retain the same low bias as the uncensored results because all of the censored values are below the 25th percentile. Similar results are obtained for single- and multiple-imputation methods in estimating the 25th percentile when a variable censoring limit exists and the fraction of censored values does not exceed 0.10. Thus using multiple-imputation to estimate the 25th percentile does not appear to have an advantage over single-imputation when ﬁxed or variable censoring limits exist and the fraction of results below the LOD is less than 10%. As the censoring fraction nears or exceeds 0.25 under ﬁxed or variable censoring, the magnitude of bias relative to the average uncensored bias (which is 0.022 and 0.006 for samples sizes of 100 and 500, respectively) increases for all three imputation methods. When the fraction of results below the LOD is near 0.25 (rows 3 and 4 for each sample size) the variable censoring limit variance (column 2) does not seem to adversely aﬀect the bias of the 25th percentile estimates for any of the three imputation methods except single-imputation using LOD/2 (third from last column rows 3 and 4 for each sample size). The reason for this anomaly is not clear, but because none of the three methods have consistently low bias for this degree of censoring, it appears that a 25th percentile should not be estimated when close to 25% or more of the results are below the LOD. Table 2 shows a slight bias in estimation of the 50th percentile for a sample size of 100 and 500 even when no censoring occurred. 
For sample sizes of 100 the average bias is 1.5% and for 500 the average bias is 0.5% (all rows of column 4). As long as the fraction of censored values is no more than one-third, the 50th percentile estimates using single-imputation (LOD/2 or LOD/√2) or multiple-imputation under fixed censoring (columns 5 through 7) retain the same low bias as the uncensored results because all of the censored values are below the 50th percentile. Under variable censoring, when the fraction of censored results is at or near one-third and the variable censoring limit variance is 20%, single-imputation using LOD/√2 appears to be associated with a slight positive bias relative to uncensored results (next to last column, row 4 for both sample sizes). Bias associated with single-imputation using LOD/2 and with multiple-imputation under variable censoring may also be slightly different from that associated with uncensored results when one-third of results are censored, but the differences are minimal (compare column 4 with columns 8 and 10, rows 3 and 4 for both sample sizes). Thus there may be a slight advantage to multiple-imputation or single-imputation using LOD/2 as compared to single-imputation using LOD/√2 when estimating a 50th percentile and up to one-third of results are censored and there is a variable censoring limit with censoring limit variance as high as 20% of the variance of the measured samples. As the censoring fraction nears or exceeds one-half under fixed or variable censoring, the magnitude of bias relative to the average uncensored bias (which is 0.015 and 0.005 for sample sizes of 100 and 500, respectively) increases for all three imputation methods. As the censoring fraction exceeds 0.5 (see rows 7 and 8 for all sample sizes), single-imputation tends to be positively biased and multiple-imputation tends to be negatively biased. These results suggest that 50th percentile estimates should not be computed when the fraction of results below the LOD is near to or exceeds 0.5.

Table 3 shows a slight bias in estimation of the 75th percentile for sample sizes of 100 and 500 even when no censoring occurred. For a sample size of 100, the average bias is 1.3% and for 500, it is 0.2%. As long as the fraction of censored values is no more than one-half, the 75th percentile estimates using single-imputation (LOD/2 or LOD/√2) or multiple-imputation under fixed censoring (columns 5 through 7) retain the same low bias as the uncensored results because all of the censored values are below the 75th percentile. Under variable censoring, when the fraction of censored results is at or near one-half and the variable censoring limit variance is 20%, single-imputation using LOD/√2 appears to be associated with a slight positive bias relative to uncensored results (next to last column, row 6 for both sample sizes). Bias associated with single-imputation using LOD/2 and with multiple-imputation under variable censoring may also be slightly different from that associated with uncensored results when one-half of results are censored, but the differences are minimal (compare column 4 with columns 8 and 10, rows 5 and 6 for both sample sizes).
Thus there may be a slight advantage to multiple-imputation or single-imputation using LOD/2 as compared to single-imputation using LOD/√2 when estimating a 75th percentile and up to one-half of results are censored and there is a variable censoring limit with censoring limit variance as high as 20% of the variance of the measured samples. As the censoring fraction nears or exceeds seven-tenths under fixed or variable censoring, the magnitude of bias relative to the average uncensored bias (which is 0.013 and 0.002 for sample sizes of 100 and 500, respectively) increases for all three imputation methods. These results suggest that 75th percentile estimates should not be computed when the fraction of results below the LOD is near to or exceeds 0.7.

Table 4 shows a slight bias in estimation of the 90th percentile for a sample size of 100 or 500 even when no censoring occurred. For sample sizes of 100, the average bias is 2.3% and for 500, it is 0.4%. As long as the censoring fraction is no more than 0.7, the 90th percentile estimates using single- or multiple-imputation under fixed censoring are comparable to those that would have been obtained from uncensored samples, with a couple of exceptions under variable censoring (see last three columns of the 7th and 8th rows for both sample sizes). When the fraction less than the LOD is near to or above 0.70 and there is variable censoring, single-imputation appears to have an increased positive bias and multiple-imputation appears to have a small negative bias when the censoring limit variance is 20% of the variance associated with the base-10 logarithm of the measured results. So it appears that the size of the censoring limit variance can affect the estimation of a 90th percentile when the fraction of results below the LOD exceeds 70%.

Table 5 shows a slight bias in estimation of the 95th percentile for a sample size of 100 or 500 even when no censoring occurred. For sample sizes of 100, the average bias is 3.6% and for 500, it is 0.6%. As long as the censoring fraction is no more than 0.75, the 95th percentile estimates using single- or multiple-imputation under fixed or variable censoring are equal to those that would have been obtained from uncensored samples, with the possible exception of single-imputation using LOD/√2 when there is a variable censoring limit with censoring limit variance as high as 20% of the variance of the measured samples (column 9, row 8 for both sample sizes). This result is not surprising because imputed values are not likely to take values at or above the 95th percentile. This is also the reason results are identical across many rows in the table. Using multiple-imputation to estimate the 95th percentile does not appear to have an advantage over single-imputation using LOD/2 whether a fixed censoring limit or variable censoring limits exist, as long as the censoring fraction is less than 0.75.

3.2. Example

The simulation results were used to determine whether unbiased percentile estimates could be obtained for a 2001 background exposure study in which PCDDs/PCDFs/cPCBs in human blood serum were measured in a Louisiana population. Subjects in this study had no known exposure to dioxin-like compounds other than exposure to background levels. Of the 415 measurements of 2,3,7,8-TCDD in pg/g lipid, 255 (61.5%) were below their corresponding limit of detection (LOD). The maximum likelihood estimate (MLE) of the mean of the base-10 logarithm of these results was 0.086, and the MLE of the standard deviation of the base-10 logarithm of these results was 0.635.

Table 6. Percentile estimates and 95% confidence intervals for 2,3,7,8-TCDD (a) in pg/g lipid for 415 subjects from a 2001 study of a Louisiana population. Results presented by age-group, sample size, and method of estimation. Entries are percentile (95% confidence interval).

| Age-group | Sample size | Fraction < LOD (b) | Method | 50th | 75th | 90th | 95th |
|---|---|---|---|---|---|---|---|
| All | 415 | 0.615 | Single-imputation LOD/2 | 0.9 (0.8, 1.2) | 3.0 (2.6, 3.4) | 5.1 (4.6, 6.0) | 7.2 (5.9, 8.3) |
| All | 415 | 0.615 | Single-imputation LOD/√2 | 1.2 (1.1, 1.5) | 3.1 (2.7, 3.6) | 5.3 (4.7, 6.0) | 7.2 (6.0, 8.3) |
| All | 415 | 0.615 | Multiple-imputation | 0.6 (0.5, 0.7) | 2.8 (2.5, 3.6) | 5.1 (4.5, 6.0) | 7.2 (5.9, 8.3) |
| [0, 30) | 102 | 0.941 | Single-imputation LOD/2 | 0.7 (0.6, 0.8) | 0.9 (0.8, 1.1) | 1.9 (1.0, 2.3) | 2.3 (1.7, 5.0) |
| [0, 30) | 102 | 0.941 | Single-imputation LOD/√2 | 1.0 (0.8, 1.1) | 1.3 (1.1, 1.6) | 2.1 (1.4, 3.3) | 3.3 (2.2, 7.0) |
| [0, 30) | 102 | 0.941 | Multiple-imputation | 0.5 (0.4, 0.6) | 0.6 (0.6, 0.8) | 0.8 (0.6, 1.9) | 1.9 (0.8, 2.4) |
| [30, 45) | 101 | 0.693 | Single-imputation LOD/2 | 0.8 (0.6, 0.9) | 2.2 (1.2, 2.6) | 3.3 (2.5, 4.3) | 4.3 (3.1, 5.1) |
| [30, 45) | 101 | 0.693 | Single-imputation LOD/√2 | 1.1 (0.8, 1.2) | 2.3 (1.5, 2.6) | 3.3 (2.6, 4.4) | 4.4 (3.1, 5.1) |
| [30, 45) | 101 | 0.693 | Multiple-imputation | 0.6 (0.5, 0.7) | 1.7 (0.7, 2.6) | 3.2 (2.5, 4.3) | 4.0 (3.1, 5.1) |
| [45, 60) | 110 | 0.518 | Single-imputation LOD/2 | 1.5 (0.8, 2.2) | 3.2 (2.5, 3.7) | 4.4 (3.7, 5.6) | 5.6 (4.2, 8.4) |
| [45, 60) | 110 | 0.518 | Single-imputation LOD/√2 | 1.7 (1.1, 2.4) | 3.4 (2.6, 3.8) | 4.4 (3.7, 5.6) | 5.6 (4.3, 8.4) |
| [45, 60) | 110 | 0.518 | Multiple-imputation | 0.9 (0.5, 2.2) | 3.2 (2.5, 3.8) | 4.3 (3.7, 5.6) | 5.9 (4.3, 8.4) |
| [60+) | 102 | 0.314 | Single-imputation LOD/2 | 3.5 (2.6, 4.6) | 5.9 (4.8, 7.1) | 8.3 (7.0, 11.7) | 11.7 (7.5, 18.5) |
| [60+) | 102 | 0.314 | Single-imputation LOD/√2 | 3.6 (2.6, 4.6) | 5.9 (4.8, 7.1) | 8.3 (7.0, 11.7) | 11.7 (7.5, 18.5) |
| [60+) | 102 | 0.314 | Multiple-imputation | 3.5 (2.6, 4.6) | 5.9 (4.8, 7.1) | 8.3 (7.0, 11.7) | 11.7 (7.5, 18.5) |

(a) 2,3,7,8-TCDD is 2,3,7,8-tetrachloro dibenzo-p-dioxin.
(b) LOD is the limit of detection.


The MLE of the mean of the log10(LOD) values was 0.093 and the standard deviation of the log10(LOD) values was 0.215. Thus the variance of the base-10 logarithm of the censoring limit values was approximately 10% of the variance of the base-10 logarithm of the 2,3,7,8-TCDD measurements. Given the simulation results for samples of size 500 and variable LODs with variance equal to 10% of the variance of the 2,3,7,8-TCDD measurements, we should apparently be able to estimate the 75th, 90th, and 95th percentiles with less than 1% bias (Tables 3–5, row 13, columns 8–10). The 50th percentile, on the other hand, could be positively biased by as much as 13% (Table 2, row 13, column 9) using single-imputation with imputed values equal to LOD/√2 and negatively biased by as much as 34% (Table 2, row 13, column 10) using multiple-imputation.

The actual percentile estimates in pg/g lipid for all 415 subjects are presented in the rows of Table 6 for the age-group "All". The single-imputation and multiple-imputation methods differ substantially, as would be expected based on the simulation results in Table 2, when variable LODs exist and 50% or more of results are below the LOD. The true 50th percentile is most likely between 0.6 and 1.2 because multiple-imputation tends to be negatively biased and single-imputation with imputed value equal to LOD/√2 tends to be positively biased. As expected from the simulations presented in Tables 3–5, the 75th, 90th, and 95th percentiles differ very little between the methods.

Because interest exists in whether 2,3,7,8-TCDD levels are related to age, we also stratified the sample by age to see whether we could obtain unbiased percentile estimates for the resulting age-groups. The age-groups and their corresponding sample sizes are also presented in Table 6.
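The "approximately 10%" variance ratio quoted for the full sample can be checked from the two standard deviations reported in the text. The snippet below is only an arithmetic check; the variable names are ours and are not taken from the authors' analysis.

```python
# Standard deviations of the base-10 logarithms, as reported in the text.
sd_log10_lod = 0.215    # log10(LOD) values
sd_log10_tcdd = 0.635   # log10 of the 2,3,7,8-TCDD measurements (MLE)

# Ratio of the censoring-limit variance to the measurement variance.
variance_ratio = sd_log10_lod ** 2 / sd_log10_tcdd ** 2
print(f"{variance_ratio:.3f}")  # prints 0.115
```

The ratio is a little above 0.11, which is why the simulation rows for a censoring limit variance of 10% are the closest match to these data.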
Because the sample sizes are nearer to 100 than to 500, we used the simulation results for 100 and a variable LOD with variance equal to about 10% of the variance of the 2,3,7,8-TCDD measurements to determine the likely reliability of various estimates. More than 90% of results are below the LOD in the less-than-30-years age-group, so even the 95th percentile estimates will likely be biased. For the 30-to-45-years age-group, which has almost 70% of results below the LOD, the results in Tables 4 and 5 suggest that we should be able to estimate the 90th percentile with approximately 3% bias, and the 95th percentile with approximately 5% bias. The 50th percentile, on the other hand, could be positively biased by as much as 111% and the 75th percentile by as much as 22% (see seventh row, next to the last column of Tables 2 and 3 for sample size of 100), using single-imputation with imputed values equal to LOD/√2. They could be negatively biased by as much as 54% for the 50th percentile and 35% for the 75th percentile (see seventh row, last column of Tables 2 and 3 for sample size of 100), using multiple-imputation. For the 45-to-60-years age-group, which has approximately one-half of results below the LOD, the results in Tables 3–5 suggest that we should be able to estimate the 75th percentile with approximately 3% bias, and the 90th and 95th percentiles with approximately 2% bias. The 50th percentile, on the other hand, could be positively biased by as much as 18% (see fifth row, next to the last column of Table 2 for sample size of 100) using single-imputation with imputed values equal to LOD/√2 and negatively biased by as much as 29% (see fifth row, last column of Table 2 for sample size of 100) using multiple-imputation. For the 60-plus-years age-group, which has close to one-third of results below the LOD, the results in Tables 2–5 suggest that we should be able to estimate the 50th, the 75th, and the 90th percentiles with approximately 1% bias, and the 95th with approximately 4% bias.

Actual percentile estimates are presented in Table 6. The single-imputation and multiple-imputation percentile estimates for the less-than-30-years age-group differ substantially from one another, as would be expected based on the simulation results in Table 2 when variable LODs exist and more than 75% of results are below the LOD. Except for the 50th and 75th percentile estimates for the 30-to-45-years age-group, the percentile estimates by either method are comparable to one another within each of the 30-to-45-years, 45-to-60-years, and 60-plus-years age-groups. Thus we can state with confidence, for instance, that the 95th percentile of 2,3,7,8-TCDD increases from around 4 ppt for 30-to-45-year-olds to about 12 ppt for persons who are 60 years old or older.

4. Discussion

Chemical exposure data tend to be highly skewed and to include a large fraction of measurements that are subject to left censoring. For such data sets, it is often more appropriate to describe the distribution of results by presenting quantiles or percentiles. To estimate the upper percentiles of a distribution, the method of imputation has little effect on the bias of the estimates, as long as the LOD is fixed and the percentage of censored results is below the percentile being estimated.
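The fixed-LOD behaviour described above can be illustrated with a minimal sketch. It assumes simulated log-normal data, a fixed LOD placed at the sample's 30th percentile, and a simple interpolated sample percentile; all of these are hypothetical choices for illustration and are not the authors' simulation design or percentile estimator.

```python
import math
import random

def percentile(values, p):
    """Sample percentile via linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def single_impute(values, lod, fill):
    """Replace every result below the fixed LOD with one fill value."""
    return [fill if v < lod else v for v in values]

random.seed(1)
# Hypothetical log-normal results: log10 values drawn from N(0, 0.6).
x = [10 ** random.gauss(0.0, 0.6) for _ in range(500)]

# Fixed LOD placed at the sample 30th percentile, so about 30% are censored.
lod = percentile(x, 30)
half = single_impute(x, lod, lod / 2)
root2 = single_impute(x, lod, lod / math.sqrt(2))

# Columns: percentile, uncensored, LOD/2 imputation, LOD/sqrt(2) imputation.
for p in (25, 50, 75, 90, 95):
    print(p, round(percentile(x, p), 3),
          round(percentile(half, p), 3),
          round(percentile(root2, p), 3))
```

For the 50th percentile and above, the three columns agree exactly, because every imputed value sorts below every detected value and the upper order statistics are untouched. The 25th percentile falls inside the censored region, so it simply returns the fill value. The sketch compares sample percentiles only; it does not measure bias relative to the population percentile, which is what the simulation tables assess.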
Thus, multiple-imputation appears to have no advantage over single-imputation when there is a fixed censoring limit and the percentage of censored results is less than the percentile being estimated. With variable LODs, the percentage of results below the LOD can, however, affect percentile estimates even when the percentage of censored values is less than the percentile being estimated. The extent to which this will occur depends on the variability in the LODs. The variance of the base-10 logarithm of the LODs associated with the 415 2,3,7,8-TCDD measurements in this report was about 10% of the variance in the base-10 logarithm of the congener measurements. Simulation results with an LOD variance of that relative magnitude suggest that both single-imputation and multiple-imputation lead to biased estimates of the particular percentiles when a variable censoring limit exists and the percentage of censored values is near the percentile being estimated.

Although single-imputation with a fixed LOD has been shown to lead to biased estimates of means or geometric means and their variances (Lubin et al., 2004), that does not seem to be the case for percentile estimation as long as the percentage of censored results is less than the percentile being estimated. When the censoring limits are variable with a variance near 10% of the measurement variance, multiple-imputation does not appear to be advantageous over single-imputation when estimating a 50th percentile. But multiple-imputation does appear to have an advantage over single-imputation when estimating a 75th or a 90th percentile if the censoring fraction is within 10-to-20 percentage points of the percentile being estimated. We assume this is probably true for a 95th percentile as well, although we did not include simulations with more than 70% of results below the LOD. Thus, using multiple-imputation to estimate percentiles appears to have an advantage over single-imputation when variable censoring limits exist and the censoring fraction is within 10-to-20 percentage points of the percentile being estimated.

Appendix

$\mu$ = population mean of a distribution.
$\sigma$ = population standard deviation of a distribution.
$\hat{\mu}$ = estimate of $\mu$.
$\hat{\sigma}$ = estimate of $\sigma$.
$f(Y \mid X)$ = density function of $Y$ given $X$.
$X \sim N(\mu, \sigma)$ indicates variable $X$ is normally distributed with mean $\mu$ and standard deviation $\sigma$.
$\chi^2_{n-1}$ is the symbol for a chi-square distribution with $n - 1$ degrees of freedom.
$F_{\nu_1,\nu_2}(\alpha)$ is the symbol for the $\alpha$ quantile of an $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
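As a toy illustration of the Discussion's point that variable LODs can shift a percentile even when fewer than that percentage of results are censored, the sketch below uses seven made-up concentrations and two made-up LOD patterns. The numbers and the helper `impute` are hypothetical and only serve to show the mechanism.

```python
import math
import statistics

# Hypothetical true concentrations (in practice the censored ones are unknown).
true_values = [0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0]

def impute(values, lods, divisor):
    """Single imputation: a result below its own LOD becomes LOD / divisor."""
    return [lod / divisor if v < lod else v for v, lod in zip(values, lods)]

fixed_lods = [1.2] * 7                               # one LOD for every sample
variable_lods = [1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0]  # third sample has a high LOD

# Both patterns censor the same 3 of 7 results (43%, below the 50th percentile).
print(statistics.median(true_values))                                   # 1.5
print(statistics.median(impute(true_values, fixed_lods, 2)))            # 1.5
print(statistics.median(impute(true_values, variable_lods, 2)))         # 2.0
print(statistics.median(impute(true_values, variable_lods, math.sqrt(2))))  # 2.0
```

With the fixed LOD, all imputed values stay below the detected results and the median is unchanged. With the variable LODs, the result censored at an LOD of 4.0 is imputed above the true median, which displaces the middle order statistic.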

References

Agency for Toxic Substances and Disease Registry (ATSDR), 2005. Serum dioxin levels in residents of Calcasieu Parish, Louisiana. Atlanta: US Department of Health and Human Services. ATSDR.
Clopper, C.J., Pearson, E.S., 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404–413.
Gleit, A., 1985. Estimation of small normal data sets with detection limits. Environ. Sci. Technol. 19, 1201–1206.
Haas, C.N., Scheff, P.A., 1990. Estimation of averages in truncated samples. Environ. Sci. Technol. 24, 912–919.
Helsel, D.R., 1990. Less than obvious – statistical treatment of data below the detection limit. Environ. Sci. Technol. 24, 1766–1774.
Hornung, R.W., Reed, L.D., 1990. Estimation of average concentration in the presence of nondetectable values. Appl. Occup. Environ. Hyg. 5, 46–51.
Huybrechts, T., Thas, O., Dewulf, J., Van Langenhove, H., 2002. How to estimate moments and quantiles of environmental data sets with nondetected observations? A case study on volatile organic compounds in marine water samples. J. Chromatogr. A 975, 123–133.
Korn, E.L., Graubard, B.I., 1998. Confidence intervals for proportions with small expected number of positive counts estimated from survey data. Survey Methodol. 24, 193–201.
Lubin, J.H., Colt, J.S., Camann, D., Davis, S., Cerhan, J.R., Severson, R.K., et al., 2004. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ. Health Perspect. 112, 1691–1696.
Lynn, H.S., 2001. Maximum likelihood inference for left-censored HIV RNA data. Statist. Med. 20, 33–45.
Persson, T., Rootzen, H., 1977. Simple and highly efficient estimators for a type I censored normal sample. Biometrika 64, 123–128.
Rubin, D.B., 1987. Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
Travis, C.C., Land, M.L., 1990. Estimating the mean of data sets with nondetectable values. Environ. Sci. Technol. 24, 961–962.
Woodruff, R.S., 1952. Confidence intervals for medians and other position measures. J. Am. Stat. Assoc. 47, 635–647.
