e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
Available online at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/etap
A comparison of three methods for integrating historical information for Bayesian model averaged benchmark dose estimation Kan Shao ∗ ORISE Postdoctoral Fellow, National Center for Environmental Assessment, U.S. Environmental Protection Agency, USA
a r t i c l e
i n f o
a b s t r a c t
Article history:
The benchmark dose (BMD) approach has been accepted as a valuable tool for risk assess-
Received 6 January 2012
ment but still faces significant challenges associated with combining environmental hazard
Received in revised form 4 May 2012
information from multiple sources and selecting an appropriate BMD/BMDL estimate from
Accepted 6 May 2012
the results of a set of acceptable dose–response models. The main objective of this study
Available online 14 May 2012
is to compare and examine how historical information, especially incompatible data, can impact the Bayesian model averaged BMD estimate through different integration methods.
Keywords:
Based on the Bayesian model averaging (BMA) for the benchmark dose estimation, three
Bayesian model averaging
methods of integration are investigated: (1) pooled data analysis, which combines all dose
Benchmark dose
groups into one dataset; (2) the Bayesian hierarchical model, which takes both between-
Historical information integration
study and within-study uncertainty into account by building multiple levels of distributions
Pooled data analysis
to quantitatively describe parameters in dose–response models; and (3) the power prior
Bayesian hierarchical model
method, which allows researchers to weigh the prior information incorporated through a
Power prior
power parameter. Combined historical information can have different levels of impact on the current model weight and BMD estimates depending on the method of integration. The pooled data analysis, which has the largest impact on the current BMA BMD estimate, has limited applicability and might be statistically and biologically flawed. The Bayesian hierarchical model, with a reasonable structure to combine information, can slightly change the current estimates of the model weights and BMD. The power prior method has little influence on current estimates when data are highly incompatible even if the prior information is fully considered. © 2012 Elsevier B.V. All rights reserved.
1.
Introduction
The benchmark dose (BMD) approach with its statistical lower bound (BMDL) proposed by Crump (1984) has been accepted as a valuable tool for risk assessment by regulatory agencies (e.g. U.S. EPA). However, risk assessors still face significant challenges associated with (1) choosing or combining
environmental hazard information from a variety of sources for risk analysis and (2) selecting an appropriate BMD/BMDL estimate from the results of a set of acceptable dose–response models. Since toxicological experiments, especially long-term carcinogenicity studies are expensive and time consuming, animal bioassay data are usually considered carefully when employed in risk assessment. On the other hand, with
∗ Correspondence address: NCEA B243-01, U.S. EPA, 109T.W. Alexander Drive, Research Triangle Park, NC 27711, USA. Tel.: +1 919 541 0440; fax: +1 919 541 0245. E-mail address:
[email protected] 1382-6689/$ – see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.etap.2012.05.002
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
the augmented availability of toxicological experimental data arising from multiple sources, it is also beneficial for regulatory agencies to synthesize all existing information into an analysis, which is further complicated when significant study-to-study and outcome-to-outcome uncertainty and variability are evident. For some environmental chemicals which have been studied for decades, like 2,3,7,8tetrachlorodibenzo-p-dioxin (TCDD), the experimental data are relatively abundant. However, various indicators (i.e. BMD estimates and posterior probability weights of alternative dose–response models) based on the same or similar endpoints are highly study-dependent. Therefore, integrating historical information (especially that which is heterogeneous) with current data to increase the reliability of a BMD/BMDL estimate is an important and valuable research topic for risk assessment. The typical BMD analysis employed by the U.S. EPA is to choose the BMDL from the best fitted model (according to the AIC value) or the lowest BMDL value in a set of acceptable models (Davis et al., 2011). This practice explicitly ignores the possibility that, to some degree, other models might partially reflect the true dose–response relationship. An alternative approach, which has been proposed by multiple researchers, is to estimate a Bayesian model averaged BMD by taking model uncertainty into account (Wheeler and Bailer, 2007; Shao and Small, 2011, 2012). As one of the future directions under consideration for U.S. EPA’s BMD methodology (Davis et al., 2011), the Bayesian model averaging (BMA) method quantifies a posterior probability weight for each considered dose–response model and computes a weighted average of the BMD estimate from all the individual models. The main purpose of the present study is to compare three prevailing methods of integrating historical information in risk assessment for BMA BMD estimation. Particularly, this study focuses on examining how posterior model weight and BMD estimates calculated using current data can be influenced by combined historical information via the integration methods of (1) pooled data analysis, (2) the Bayesian hierarchical model, and (3) the power prior method. The methodology is illustrated through two dose–response datasets published and employed in a recent report by U.S. EPA (2010). The two datasets were originally from a study by Kociba et al. (1978) and a National Toxicology Program (NTP) study (NTP, 2006), both describing liver tumor incidence in female Sprague-Dawley rats associated with exposure to TCDD. Therefore, this study aims at evaluating how the risk assessment indicators in the BMA BMD estimation based on the current data (i.e. the NTP data) can be informed by integrating the historical information (i.e. the Kociba data).
2.
289
datasets and dose–response models employed in this study is provided in Section 2.1. Section 2.2 contains a brief description of the fundamental approach of BMD estimation and the BMA BMD method for dichotomous data. Then, in Section 2.3, the three methods of historical information integration (i.e. pooled data analysis, the Bayesian hierarchical model and the power prior method) are explained in detail.
2.1.
Datasets and dose–response models
Although a number of long-term studies for the carcinogenicity of TCDD have been conducted in the past (Kociba et al., 1978; NTP, 1982, 2006; Rao et al., 1988), few studies accepted by the U.S. EPA for dose–response modeling (U.S. EPA, 2010) examined the same endpoints using the same species, sex and exposure method. The two sets of endpoints chosen in this study and listed in Table 1 are both hepatocellular adenoma incidence in female Sprague-Dawley rats exposed to TCDD at various dose levels. The original exposure dose levels of TCDD in both datasets had been changed to blood concentration according to the Emond animal kinetic model in the EPA’s report (U.S. EPA, 2010) so that the heterogeneity caused by exposure pathway was adequately limited. Therefore, these two datasets serve as an excellent basis for the methodology demonstration. Three commonly used dose–response models for dichotomous data in the U.S. EPA’s BMD software (U.S. EPA, 2011) are adopted for model averaging, including Logistic model: R(d) =
exp(ˇ1 + ˇ2 × d) , 1 + exp(ˇ1 + ˇ2 × d)
(1)
Probit model: R(d) = ˚(ˇ1 + ˇ2 × d),
(2)
and quantal-linear model: R(d) = ˇ1 + (1 − ˇ1 ) × [1 − exp(−ˇ2 × d)],
(3)
where R(d) is the proportion of adverse response at dose level d, ˇ1 and ˇ2 are two parameters in the individual models, and ˚(·) is the cumulative standard normal distribution function. Additionally, ˇ2 in all three models is constrained to be nonnegative, and ˇ1 in the quantal-linear model is constrained to the range [0, 1] since in this model ˇ1 is equivalent to the background response rate at d = 0.
2.2.
BMA BMD estimation
2.2.1.
BMD calculation
Materials and methods
In contrast to a meta-analysis, which incorporates multiple data sources and may not be very reliable when applied to a small number of studies, the present study focuses on changes to BMA BMD estimates, especially the dispersion measures, after integrating the information from previous studies. The outline of this section is as follow: an introduction to the
The BMD is defined as a dose of an environmental toxicant that corresponds to a prescribed change in the response (also known as a benchmark response, or BMR), compared to the background response level. For dichotomous data, the BMR can be expressed as additional risk or extra risk. Extra risk, which is recommended by the U.S. EPA (U.S. EPA, 2000, 2011;
290
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
Table 1 – Hepatocellular adenoma occurrence in female Sprague-Dawley rats and blood concentration for dose–response modeling. Historical data: Kociba et al.’s study (1978)a Blood concentration (ng/kg) 0 (control) Tumor incidence 2/86
1.55 1/50
7.15 9/50
38.56 14/45
Current data: NTP’s study (2006) Blood concentration (ng/kg) Tumor incidence
2.56 0/48
5.69 0/46
9.79 0/50
a
0 (control) 0/49
16.57 1/49
29.70 13/53
Source: Kociba et al. (1978). Incidence for hepatocellular adenomas is from Goodman and Sauer (1992).
Davis et al., 2011) and adopted in this study, can be expressed as: BMR =
R(d = BMD) − R(d = 0) . 1 − R(d = 0)
(4)
Substituting Eq. (1), (2), or (3) for the corresponding part in Eq. (4), the BMD of that particular model can be explicitly expressed in terms of BMR and model parameters. For example, the BMD of the logistic model can be written as: BMDLogistic =
log[(BMR + exp(ˇ1 ))/(1 − BMR)] − ˇ1 . ˇ2
(5)
The BMD calculation equations for the Probit and quantallinear model can be derived similarly using the procedure described above. The BMR will be set to 10% for illustrative purpose in the present study. There are a variety of methods available for estimating parameters in dose–response models and further calculating the BMDs. For instance, the maximum likelihood estimate (MLE) of a BMD can be calculated based on the MLE of model parameters. However, in the present study, a full Bayesian analysis introduced in later sections is applied for estimating model parameters and BMD distributions and is used as the basis for the integration methods.
2.2.2.
Markov Chain Monte Carlo
To fit the dose–response models in Eqs. (1)–(3) and get parameter estimates, the Bayesian estimation procedure requires the likelihood function for the observed experimental outcomes and the prior distribution for the model parameters. For ni animals tested at dose level di and a model predicted response rate of ri , the likelihood function for the number of animals, yi , exhibiting the adverse response is given by the binomial distribution: yi ∼Bin(ni , ri ).
(6)
Substituting Eq. (1), (2), or (3) for the logistic, Probit or quantal-linear model respectively, the likelihood for one dose group at a certain dose level can be expressed in terms of the model parameters ˇ1 and ˇ2 . For example, the likelihood function for the logistic model is given by Eq. (7) as:
p(yi |ˇ1 , ˇ2 , ni , di ) ∝
× 1−
exp(ˇ1 + ˇ2 × di ) 1 + exp(ˇ1 + ˇ2 × di )
exp(ˇ1 + ˇ2 × di ) 1 + exp(ˇ1 + ˇ2 × di )
The joint posterior distribution of the parameters is proportional to the product of the prior distribution and likelihood of the parameters:
yi
p(yi |ˇ1 , ˇ2 , ni , di ),
(8)
i=1
where k is the total number of dose groups in one dataset. The prior distributions of ˇ1 and ˇ2 in Eq. (8) are assumed to be independent and non-informative (flat), therefore, the parametric specification of the form for the prior distribution is not required in the Markov Chain Monte Carlo (MCMC) algorithm. Eq. (8) serves as the basis for performing the MetropolisHastings within Gibbs sampling method, one flavor of the MCMC algorithms employed in the study. Starting with an initial value for each parameter, a new parameter value is randomly proposed one at a time from an appropriate normal distribution. Given current values of the other parameter (all the models considered in the present study have only two parameters), if the ratio of Eq. (8) for the proposed new sample point compared to the current value is higher than a number randomly generated from a uniform distribution U(0, 1), then the new sample is kept in the chain, otherwise it is discarded (by duplicating the current value in the chain). Repeating the steps above for each parameter would complete one updating cycle for the parameters of a dose–response model. This algorithm is described in more detail by Gelman et al. (2003). After tens of thousands of updating cycles, the sample of parameters thus generated are then substituted for corresponding parts in Eq. (5) (or a proper BMD calculation equation for the Probit or quantal-linear model) to compute the sample of BMD values, which are further used to determine measures of central tendency (mean and median), dispersion (50th and 90th percentile intervals) and the 5th percentile value, corresponding to the 95% BMDL estimate.
2.2.3.
Bayesian model averaging
The basic idea of Bayesian model averaging is that the distribution of some interested quantity of a model (i.e. BMD in this study) is derived over some space of possible models (Hoeting et al., 1999). If is the quantity of interest, then its posterior distribution given observed data D can be expressed as:
Pr(|D) =
K
Pr(|Mk , D)Pr(Mk |D),
(9)
k=1
ni −yi
.
k
p(ˇ1 , ˇ2 |y, n, d) ∝ p(ˇ1 , ˇ2 )
(7)
where Mk is the kth model in a selected model space. If M1 , M2 , . . . , Mk , are the models in the model space under
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
consideration, then the posterior distribution of the model Mk given the data Pr(Mk |D) can be expressed as: Pr(Mk |D) =
Pr(D|Mk )Pr(Mk )
K
t=1
,
(10)
Pr(D|Mt )Pr(Mt )
where Pr(Mk ) is the prior probability of the kth model in the model space reflecting risk assessor’s belief in the relative correctness of this model. In this study, a common choice of prior model weight is adopted, which is Pr(Mk ) = 1/K, k = 1, . . . , K (the total number of models in the model space under consideration), meaning that each model considered is equally likely before the data are observed. Since the MCMC algorithm and the following historical information integration methods are based on the Bayesian framework, the density estimation approach proposed by Wasserman (2000) is employed in this study to obtain posterior model weight and BMA BMD estimates. The density estimation approach is briefly explained as follows. For the kth model considered for averaging, a value of mk is calculated as: mk =
lk (k )Pr(k ) , Pr(k |D)
(11)
where lk ( k ) is the likelihood function, Pr( k ) and Pr( k |D) are the prior and posterior density of the parameters in kth model respectively. To be consistent with the assumption of non-informative prior in the MCMC algorithm, the prior distributions of model parameters in Eq. (11) are assumed to be independent from each other and follow a normal distribution of N(0, 1002 ). This assumption is adequate since all the models considered in the present study have the same number of parameters. Based on the posterior sample of model parameters obtained from the MCMC algorithm, the posterior density in Eq. (11) can also be simply calculated using any density estimation approach. Eq. (11) holds for all possible values of model parameters k . Therefore, an estimate ˜ k , can be obtained by picking any value ˜ k of k as of mk , m ˜ k = (Lk (˜ k )Pr(˜ k ))/(Pr(˜ k |D)). And finally, the posterior probabilm ity model weight expressed in Eq. (10) can be estimated using Eq. (12):
Pr(Mk |D) =
˜k m
K
t=1
˜t m
.
(12)
The posterior model weights for these three models considered in this study are directly applied to the BMD estimates obtained from each model, yielding a combined sample of weighted BMD values for subsequent measures of BMA BMD.
2.3.
Methods for integrating historical information
Based on the Bayesian framework for BMA BMD estimation explained in the previous section, the three methods for integrating historical information are introduced in the section, including (1) pooled data analysis; (2) the Bayesian hierarchical model, and (3) the power prior method.
2.3.1.
291
Pooled data analysis
The pooled data method is a relatively simple concept. Data are pooled by combining all dose groups into one dataset with a new aggregated control group. Therefore, the new dose–response data for modeling have tumor incidence 2/135, 1/50, 0/48, 0/46, 9/50, 0/50, 1/49, 13/53, 14/45 at dose levels 0, 1.55, 2.56, 5.69, 7.15, 9.79, 16.57, 29.70, 38.56 ng/kg correspondingly. Then the MCMC and BMA methods are directly applied to the combined data to get the estimates of interest.
2.3.2.
Bayesian hierarchical model
Bayesian hierarchical modeling is a common method to combine multiple studies for environmental risk assessment by allowing between-study variability and outcome-to-outcome uncertainty. This methodology has been employed in multiple studies for quantifying adverse health effects associated with chemical exposure (Coull et al., 2003; Wheeler and Bailer, 2009). Using the Bayesian framework, the joint posterior distribution of a set of model parameters can be expressed as: p(, , |y, n, d),
(13)
where  is a parameter vector in a dose–response model, and are a vector and a matrix respectively which determine the distribution of . y, n, and d are observed response, number of animals in each dose group, and dose levels respectively. According to the Bayes’ theorem, the posterior distribution can be further writen as: p(, , |y, n, d) ∝ p(y|, n, d)p(|, )p(, ),
(14)
where the first term on the right-hand side of the equation is a likelihood function, the second term is the joint distribution of vector  given the parameters and , and the third term is a joint prior distribution of and . According to Eq. (14), the MCMC algorithm can be properly carried out, but some assumptions made in this study need to be specified. For all three of these dose–response models, the joint distribution of the parameter vector  is assumed to be a multivariate normal distribution as  ∼ MVN(, ). Since all the dose–response models considered in this study have two parameters, it is assumed that, before taking any observed data into account, parameters ˇ1 and ˇ2 in the dose–response models are independent. Thus, the prior covariance matrix is
˙=
2 11 0
0 2 22
,
2 and 2 are the variance of parameter ˇ and ˇ where 11 1 2 22 respectively. More simply, both parameters in dose–response 2 ) and models follow a normal distribution, as ˇ1j ∼N(1 , 11 2 ˇ2j ∼N(2 , 22 ), where j represents the jth study. In the present study, the third term on the right side of Eq. (14), the joint hyper-prior distribution of parameters for vector , is assumed to be noninformative. Hence, the parameter dis2 and tributions are specified as 1 and 2 ∼ N(0, 106 ), 11 222 ∼InvGamma(0.001, 0.001) (Coull et al., 2003; Wheeler and Bailer, 2009).
292
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
Table 2 – Parameter estimates for historical and current data. Model
Parameter
Historical data (Kociba’s data) Mean
Standard deviation
Current data (NTP data)
Correlation coefficient
Mean
Standard deviation
Correlation coefficient
Logistic
ˇ1 ˇ2
−2.95 5.80E−2
0.34 1.25E−2
−0.75
−9.49 0.28
2.22 7.64E−2
−0.99
Probit
ˇ1 ˇ2
−1.66 3.22E−2
0.16 6.85E−3
−0.69
−4.50 0.13
0.97 3.45E−2
−0.98
Quantallinear
ˇ1 ˇ2
2.41E−2 1.19E−2
1.48E−2 2.86E−3
−0.33
1.19E−4 4.98E−3
4.56E−4 1.29E−3
0.021
2.3.3.
Power prior
The posterior distribution is proportional to the product of likelihood and prior distribution, which are two important components in a Bayesian framework. One difficulty but also an advantage in the Bayesian method is to choose an appropriate prior, which provides researchers a flexible and natural way to combine previous information and knowledge. A noninformative prior such as a uniform prior, Jeffreys’ prior, or reference prior is usually adopted when no previous information is available (Kass and Wasserman, 1996). However, with the increased availability of the information from previous similar studies, it is important for researchers to take the historical data into account in an applied research setting. The third method employed to integrate historical information is called the power prior method (Chen et al., 2000). The historical data (i.e. the Kociba data) and current data (i.e. the NTP data) are written as Data0 = (n0 , y0 , d0 ) and Data = (n, y, d) respectively. Therefore, the power prior distribution of  (the vector of parameters in a dose–response model) given by the historical data can be defined as (Chen et al., 2000): p(|Data0 , w0 ) ∝ p(Data0 |)
w0
p0 (),
(15)
where w0 is a scalar prior parameter which weights the historical data relative to the likelihood of the current study. The p0 () in Eq. (15) is a hyper-prior of the parameters in the dose–response models, which refers to p0 () ∝ 1 as a noninformative hyper-prior in this study. Eq. (15) can be further expressed as:
p(|Data0 , w0 ) ∝ p(Data0 |)
w0
p0 () =
k 0
w0
p(yi0 |, ni0 , di0 )
p0 (),
i=1
(16)
where k0 is the total number of dose groups in the historical dataset and the first term on the right side of the equality sign in Eq. (16) is a weighted likelihood function of the historical data. Thus, the posterior distribution of parameters for the dose–response models, based on the observed current data and previous information, can be experssed as: p(|Data, Data0 , w0 ) ∝ p(Data|)p(|Data0 , w0 ) =
k
k 0
j=1
i=1
p(yj |, nj , dj )
w0
p(y0i |, n0i , d0i )
p0 (),
(17)
where k is the total number of dose groups in the current dataset. The parameter w0 can be interpreted as an uncertainty parameter for the historical data, and it is reasonable to restrict this parameter to be between 0 and 1. When w0 = 0, which means that risk assessors are not sure about the historical data at all, the prior distribution of  completely depends on p0 () (specified as a non-informative prior in the present study). When w0 = 1, which means the quality of historical data is very reliable, the prior distribution  totally depends on the likelihood function of the historical data. Therefore, the choice of w0 , to some extent, reflects the subjective judgment about the reliability of the historical information.
3.
Results
The Bayesian hierarchical model was implemented in OpenBugs (version 3.2.1), and all the other computations were implemented in R (version 2.13.1). For all the MCMC computations carried out in the study (including both in R and OpenBugs), chain lengths of 55,000 were simulated, discarding the first 5000 samples as burn-in. The parameter estimates for the three models and the two original datasets are summarized in Table 2, including the mean, standard deviation and correlation coefficient of the two parameters. As shown in the table, the mean values of the same parameter (the corresponding parameter in the same model) estimated from these two datasets are distinct with differences ranging from two folds to almost two orders of magnitude. The significant heterogeneity between these two datasets is further demonstrated by Figs. 1 and 2, which show the posterior probability model weight, the BMD estimates from individual models, as well as the BMA BMD estimates. The horizontal lines across each of the boxes in these figures are the mean BMD estimates, and the bottom line of each box is the 5th percentile estimate of the BMD, corresponding to a 95% BMDL value. Fig. 1 indicates that the quantal-linear model with a posterior weight of 87.1% fits the Kociba data much better than the logistic and Probit model. In contrast, the NTP data are fit best by the logistic model with a posterior weight of 82.5%. Additionally, the BMD estimates from the two datasets are also different. The NTP data yield higher mean BMD estimates for all three dose–response models than the Kociba data, but for the measures of the dispersion of the BMD estimates, the three models also have different performance. The logistic and Probit model have smaller 90th percentile interval
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
Fig. 1 – Posterior model weight estimates, individual model and BMA predictions of BMD based on the single Kociba dataset: solid horizontal line indicates the individual or averaged mean BMD estimate; light box indicates 90th percentile interval from 5th to 95th percentile estimate; dark box indicates 50th percentile interval from 25th to 75th percentile estimate; the bottom line of the light box corresponds to the 5th percentile BMDL.
Fig. 2 – Posterior model weight estimates, individual model and BMA predictions of BMD based on the single NTP dataset: solid horizontal line indicates the individual or averaged mean BMD estimate; light box indicates 90th percentile interval from 5th to 95th percentile estimate; dark box indicates 50th percentile interval from 25th to 75th percentile estimate; the bottom line of the light box corresponds to the 5th percentile BMDL.
293
Fig. 3 – Posterior model weight estimates, individual model and BMA predictions of BMD based on the pooled data: solid horizontal line indicates the individual or averaged mean BMD estimate; light box indicates 90th percentile interval from 5th to 95th percentile estimate; dark box indicates 50th percentile interval from 25th to 75th percentile estimate; the bottom line of the light box corresponds to the 5th percentile BMDL.
Fig. 4 – Posterior model weight estimates, individual model and BMA predictions of BMD based on the NTP data in the Bayesian hierarchical model: solid horizontal line indicates the individual or averaged mean BMD estimate; light box indicates 90th percentile interval from 5th to 95th percentile estimate; dark box indicates 50th percentile interval from 25th to 75th percentile estimate; the bottom line of the light box corresponds to the 5th percentile BMDL.
294
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
estimates from all three individual models as well as the model averaged one slightly increase. Third, Fig. 2 is almost identical to Fig. 5, which shows the individual and BMA BMD measures when power prior w0 is equal to 1. This outcome demonstrates that the estimates of NTP data can hardly be influenced by historical data through the power prior method even if the Kociba data has been fully considered.
4.
Fig. 5 – Posterior model weight estimates, individual model and BMA predictions of BMD based on the NTP data when power prior w0 = 1: solid horizontal line indicates the individual or averaged mean BMD estimate; light box indicates 90th percentile interval from 5th to 95th percentile estimate; dark box indicates 50th percentile interval from 25th to 75th percentile estimate; the bottom line of the light box corresponds to the 5th percentile BMDL.
estimates, I90 , of BMD but the quantal-linear model has larger I90 than their counterparts in the Kociba data. The results shown above demonstrate that the historical data are very distinct from the current data, and the outcomes displayed below focus on illustrating the impact of the combined historical data on the BMA BMD estimates via three integration methods. Figs. 3–5 show the same measures as Figs. 1 and 2 but based on the pooled data analysis, the Bayesian hierarchical model, and the power prior method respectively. Comparing Figs. 3–5 with Fig. 2, the changes are summarized as follows. First, the difference in results between the Kociba data and the NTP data is, to some extent, mitigated by the pooled data analysis as shown in Fig. 3. In other words, the individual BMD estimates, the BMA BMD estimates, and the posterior model weights from the NTP data are changed slightly towards the results solely from the Kociba data. Compared to the results shown in Fig. 2, the weight of the logistic model reduces, the weight of the quantal-linear model increases, and a variety of BMD measures (e.g. the mean BMD estimates, I90 , and BMDL) are closer to the corresponding results from the Kociba data. Second, the difference between the results using the NTP data (Fig. 2) and the results using the Bayesian hierarchical model (Fig. 4) is very limited. The Bayesian hierarchical model builds another layer of distribution to describe the model parameters and to connect the historical and current data. While the probability weight of the quantal-linear model almost stays unchanged relative to Fig. 2, the weight of the logistic model slightly increases, resulting in a small decrease in the weight of the Probit model. More importantly, the I90
Conclusions and discussion
Based on the BMA BMD estimation framework, three methods of integrating historical information were compared and examined in previous sections, including pooled data analysis, the Bayesian hierarchical model and the power prior method. In contrast to classical statistics, the Bayesian analysis implemented in this study provides a set of tools for risk assessors to more flexibly combine information from other sources to estimate the model averaged BMD. Discussion in this section is mainly focused on factors that influence BMD estimates, especially the 90th percentile interval estimate, I90 . Among the three methods considered, the impact of historical information on the current data is maximized by the pooled data analysis, including both the model weight and BMD estimates. Comparing to the results estimated solely from the NTP data, the results shown in Fig. 3 demonstrate that the highly incompatible dose–response data might not be beneficial for reducing the uncertainty in BMD estimates through pooled data analysis even if the number of animals increases via the combination. More importantly, one strong and serious assumption on which this methodology is based is that both datasets come from the same population. In other words, the pooling method inherently ignores and eliminates the heterogeneity between different studies, which is not a plausible treatment statistically or biologically. Although some heterogeneity has been mitigated by adopting the data from the same species/sex/endpoint and using the blood concentration dose level instead of the dosage in the experiments, the availability of such datasets are extremely limited in the practice of risk assessment. Therefore, EPA suggested that combining multiple studies to calculate BMD estimates using BMD software is reasonable only if the multiple datasets represent a homogeneous picture of the dose–response (U.S. EPA, 2000). In contrast to the pooled data method, the Bayesian hierarchical model takes both between-study uncertainty and outcome-to-outcome randomness into account, building another level of distribution upon the model parameters to accommodate the variance and between-study inconsistency among the parameters. A number of studies have been aimed at using the hierarchical model to combine multiple data sources and further to make more reliable estimates (Coull et al., 2003; Wheeler and Bailer, 2009). This previous work relied on all available information rather than pursuing accuracy by ignoring or discarding some inconsistency among the data. On the other hand, combining historical information using a hierarchical model may also be helpful for reducing the uncertainty in parameter estimates in regression problems (Tiwari and Zalkikar, 1999; Dunson and Dinse, 2001;
295
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
Chen and Ibrahim, 2006; Chen, 2010). The results in this study demonstrate that the uncertainty in BMD estimates may not be reduced by combining incompatible information through the hierarchical structure. Although increase in the variance of BMD estimates (resulting in a lower BMDL) is not a desired feature in risk assessment, the BMD derived from the Bayesian hierarchical model is a more reliable estimate since the hierarchical model is more statistically and biologically reasonable than the pooled data method. The power prior method provides a tool to combine historical information with the current dataset through a Bayesian framework. With the help from a power parameter w0 , researchers are able to control how much information to be taken from previous studies based on how much they wish to weigh the historical data. Hence, to some extent, the uncertainty between studies is appropriately taken into account. However, the impact of incompatible historical data can be easily limited by the power prior method. As shown in Table A.I in Appendix A: estimates remain unchanged regardless of the w0 value. A possible reason is that the incompatibility between the Kociba and NTP data may disturb the process of integrating useful information through the power prior method (Ibrahim and Chen, 2000). Historical information combined with current data can have different levels of impact on the current estimates depending on the method of integration. However, these commonly used methods discussed in the present study rely on different fundamental assumptions and hence possess a variety of applicability. The Bayesian hierarchical model might not be successful for reducing the uncertainty in BMA BMD estimates when the data are highly inconsistent, but it is the more
theoretically sound approach, both statistically and biologically.
Conflict of interest statement The author declares that there are no conflicts of interest.
Acknowledgments The major part of this study was finished when the author was a Ph.D. candidate at Carnegie Mellon University. The author would like to thank Dr. Mitchell J. Small for his comments and suggestions on an early version of this article. This research was partially supported by the Heinz Endowments, and also supported in part by an appointment to the Research Participation Program for the U.S. Environmental Protection Agency, Office of Research and Development, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA.
Appendix A. When w0 is equal to 0, 0.1, 0.5 and 1, the parameter estimates (including mean, standard deviation and correlation coefficient), the BMD estimates (including the I90 and mean BMD estimates), and the model weight estimates calculated using the power prior method are listed in Table A.I.
Table A.I – Parameter, BMD and model weight estimates from power prior method at various w0 values. w0
0
Model
Parameter
Mean
Logistic
ˇ1 ˇ2 ˇ1 ˇ2 ˇ1 ˇ2
−9.734 0.290 −4.497 0.129 5.06E−5 4.97E−3
2.550 0.0874 0.960 0.0341 4.86E−4 1.28E−3
ˇ1 ˇ2 ˇ1 ˇ2 ˇ1 ˇ2
−9.805 0.292 −4.325 0.123 3.91E−4 4.98E−3
2.832 0.0971 0.878 0.0313 1.66E−3 1.30E−3
ˇ1 ˇ2 ˇ1 ˇ2 ˇ1 ˇ2
−9.812 0.293 −4.343 0.123 2.36E−5 4.98E−3
2.571 0.0878 0.849 0.0303 2.17E−4 1.27E−3
ˇ1 ˇ2 ˇ1 ˇ2 ˇ1 ˇ2
−8.918 0.262 −4.514 0.129 3.54E−6 4.98E−3
2.270 0.0785 0.939 0.0332 4.09E−5 1.29E−3
Probit Quantallinear Logistic
0.1
Probit Quantallinear Logistic
0.5
Probit Quantallinear Logistic
1
Probit Quantallinear
Standard deviation
Correlation coefficient
I90 /BMD (ng/kg)
Model weight
−0.992
4.76/25.78
0.7790
−0.984
5.16/24.84
0.2207
−0.0146
19.64/22.71
0.0003
−0.994
4.98/25.79
0.8370
−0.980
5.19/24.69
0.1629
−0.0336
19.91/22.68
0.0001
−0.992
4.62/25.79
0.8536
−0.979
5.22/24.73
0.1461
−0.0109
19.66/22.66
0.0003
−0.991
4.94/25.48
0.8051
−0.983
5.05/24.84
0.1940
−0.0169
19.82/22.69
0.0009
296
e n v i r o n m e n t a l t o x i c o l o g y a n d p h a r m a c o l o g y 3 4 ( 2 0 1 2 ) 288–296
references Chen, D.G., 2010. Incorporating historical control information into quantal bioassay with Bayesian approach. Comput. Stat. Data Anal. 54, 1646–1656. Chen, M.-H., Ibrahim, J.G., 2006. The relationship between the power prior and hierarchical models. Bayesian Anal. 1 (3), 551–574. Chen, M.-H., Ibrahim, J.G., Shao, Q.-M., 2000. Power prior distribution for generalized linear models. J. Stat. Plan. Infer. 84, 121–137. Coull, B.A., Mezzetti, M., Ryan, L.M., 2003. A Bayesian hierarchical model for risk assessment of methylmercury. J. Agric. Biol. Environ. Stat. 8 (3), 253–270. Crump, K.S., 1984. A new method for determining allowable daily intakes. Fund. Appl. Toxicol.: Off. J. Soc. Toxicol. 4 (5), 854–871. Davis, J.A., Gift, J.S., Zhao, Q.J., 2011. Introduction to benchmark dose methods and U.S. EPA’s benchmark dose software (BMDS) version 2.1.1. Toxicol. Appl. Pharmacol. 254 (2), 181–191. Dunson, D.B., Dinse, G.E., 2001. Bayesian incidence analysis of animal tumorigenicity data. J. Roy. Stat. Soc. Ser. C (Appl. Stat.) 50, 125–141. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2003. Bayesian Data Analysis, 2nd ed. Chapman and Hall/CRC Press, Boca Raton, FL. Goodman, D.G., Sauer, R.M., 1992. Hepatotoxicity and carcinogenicity in female Sprague-Dawley rats treated with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD): a pathology working group reevaluation. Regul. Toxicol. Pharmcolo. 15 (3), 245–252. Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T., 1999. Bayesian model averaging: a tutorial. Stat. Sci. 14 (4), 382–417. Ibrahim, J.G., Chen, M.-H., 2000. Power prior distributions for regression models. Stat. Sci. 15 (1), 46–60. Kass, R.E., Wasserman, L., 1996. The selection of prior distribution by formal rules. J. Am. Stat. Assoc. 91, 1343–1370. Kociba, R.J., Keyes, D.G., Beyer, J.E., Carreon, R.M., Wade, C.E., Dittenber, D.A., et al., 1978. Results of a two-year chronic toxicity and oncogenicity study of
2,3,7,8-tetrachlorodibenzo-p-dioxin in rats. Toxicol. Appl. Pharmacol. 46 (2), 279–303. NTP, 1982. Carcinogenesis bioassay of 2,3,7,8-tetrachlorodibenzo-p-dioxin in Osborne-Mendel rats and B6C3F1 mice (gavage study). NTP TR 209. Research Triangle Park, NC. NTP, 2006. NTP technical report on the toxicology and carcinogenesis studies of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) (CAS no. 1746-01-6) in female harlan Sprague-Dawley rats (gavage studies). NTP TR 521. Research Triangle Park, NC. NIH Publication No. 06-4468. Rao, M.S., Subbarao, V., Prasad, J.D., Scarpelli, D.G., 1988. Carcinogenicity of 2,3,7,8-tetrachlorodibenzo-p-dioxin in the Syrian golden hamster. Carcinogenesis 6 (6), 1677–1679. Shao, K., Small, M.J., 2011. Potential uncertainty reduction in model-averaged benchmark dose estimates informed by an additional dose study. Risk Anal. 31 (10), 1561–1575. Shao, K., Small, M.J. Statistical evaluation of toxicological experimental design for Bayesian model averaged benchmark dose estimation with dichotomous data. Hum. Ecol. Risk Assess., 2012, in press. Tiwari, R.C., Zalkikar, J.N., 1999. Tests for a trend in proportion based on mixtures of beta distributions incorporating historical controls. Environmetrics 10, 1–22. U.S. EPA, 2000. Benchmark dose technical guidance document [external review draft]. EPA/630/R-00/001. U.S. Environmental Protection Agency, Risk Assessment Forum, Washington, DC. U.S. EPA, 2010. EPA’s reanalysis of key issues related to dioxin toxicity and response to NAS comments. EPA/600/R-10/038A, Washington, DC. U.S. EPA, 2011. Benchmark Dose Software (BMDS). National Center for Environmental Assessment, Research Triangle Park, NC. Wasserman, L., 2000. Bayesian model selection and model averaging. J. Math. Psychol. 44 (1), 92–107. Wheeler, M.W., Bailer, A.J., 2007. Properties of model-averaged BMDLs: a study of model averaging in dichotomous response risk estimation. Risk Anal. 27 (3), 659–670. Wheeler, M.W., Bailer, A.J., 2009. Benchmark dose estimation incorporating multiple data sources. Risk Anal.: Off. Publ. Soc. Risk Anal. 29 (2), 249–256.