13 Statistical issues in inhalation toxicology


P. K. Sen and C. R. Rao, eds., Handbook of Statistics, Vol. 18 © 2000 Elsevier Science B.V. All rights reserved.


Statistical Issues in Inhalation Toxicology

E. Weller, L. Ryan and D. Dockery

1. Introduction

Public concern in the United States regarding the health effects of air pollution can be traced back to the Donora smog episode of 1948 (Schrenk et al., 1949). Following this incident a series of Acts was passed by Congress to regulate air pollution (Spengler and Samet, 1991). However, it was the 1970 Clean Air Act which established the public health basis for the nation's current efforts to control air pollution. Section 108 of the Clean Air Act required the Environmental Protection Agency (EPA) to identify all air pollutants that "may reasonably be anticipated to endanger public health". EPA was further required to prepare air quality criteria documents that reflect "the latest scientific knowledge useful in indicating the kind and extent of all identifiable effects on public health and welfare which may be expected from the presence of such pollutants in the ambient air". The six criteria air pollutants identified by this process are ozone, sulfur dioxide, suspended particulate matter, nitrogen dioxide, carbon monoxide, and lead. Section 109 of the Clean Air Act requires the promulgation of National Ambient Air Quality Standards "which in the judgement of the EPA Administrator, based on such criteria and allowing an adequate margin of safety, are requisite to protect the public health". The legislative history of the Clean Air Act indicates that Congress intended the primary national ambient air quality standards to be set low enough to protect the health of all sensitive groups within the population, with the exception of those requiring life-support systems (patients in intensive care units and newborn infants in nurseries). Asthma and emphysema were specifically identified in the Clean Air Act as diseases associated with increased susceptibility. The legislative history further indicates that Congress intended that protection of public health and welfare was to be the sole determinant of an acceptable level of air pollution.
In the 1990 amendments to the Clean Air Act, Congress gave the EPA authority to impose technology based standards to control specific hazardous substances, labeled as "air toxics". Although this might be interpreted as indicating a shift to technology based (rather than health-based) air quality standards, the list of 189 toxic substances was ultimately based on known or anticipated health risks. These air toxics encompass a diverse array of pollutants with a wide array of ambient exposure levels, a wide range of toxicities, and a wide range of exposure-response relationships. The limited health database for these air toxics derives largely from acute lethality studies. The available health effects information addresses occupational exposure levels, which are generally much higher than ambient environmental exposure levels. As a consequence, there is now an increased need for sophisticated quantitative methods to extrapolate results from these settings to the exposure settings more commonly encountered by the typical individual.
The purpose of this chapter is to describe the principles of quantitative risk assessment for inhaled toxicants. Quantitative risk assessment has been well studied in a variety of applied settings, including carcinogenicity, developmental toxicity, neurotoxicity, and many others (see Morgan, 1992). While many of the same principles apply, several unique features of air toxics complicate the task in that context. Unlike exposures that occur through water, food, prescription medicines or everyday contact with household objects (e.g. lead paint, PCBs), air toxics often exhibit highly variable exposure patterns. For example, it is well known that ozone levels are higher during peak hour traffic and during the summer months. This is because ozone is created when compounds such as NO2 and other automobile emissions react with light (Lippman, 1989). Particulate concentrations are often tied to weather patterns. Some locations, e.g. the Utah Valley, are infamous for their high levels of particulates and other air pollution during periods of temperature inversion, when air gets trapped on the valley floor. In occupational settings, workers are often exposed to potentially dangerous substances during spills or during certain phases of production.
As a result, exposures tend to occur as occasional high peaks, rather than at a constant low level. Ethylene oxide (C2H4O) is a good example. Brief concentrated bursts of exposure to this known human carcinogen often occur for workers at sterilization facilities (such as hospitals) when doors to sterilization chambers are opened. It has also been reported that EtO concentrations near sterilization equipment can be quite variable, ranging from levels in the hundreds to thousands of parts per million (ppm). EtO will be discussed in much more detail presently. The general principles underlying risk assessment for air toxics share many features with the principles used for other routes of exposure. Ideally, for example, regulatory decisions should be based on reliable epidemiological information. In its absence, however, regulators must rely on data from controlled studies in laboratory animals, as well as on biological considerations based on likely mechanisms of action. Some unique features of air toxics mean that the statistical design and analysis of studies to assess their health effects differ from the approaches used in standard dose-response settings. The main distinction is the need to account for "dose rate effects", wherein short, high exposures may elicit a different response than long, chronic exposures, even though total cumulative exposure may be the same under the two settings. Scientists sometimes refer to "dose rate effects" as "C × T", where C is the concentration and T is the duration of exposure.


When exposure data are limited, Haber's law (Haber, 1924) has been applied to obtain estimates of short-term exposure limits (Kimmel, 1995). Under Haber's law, the toxic response to an exposure is assumed to depend only on the cumulative exposure, i.e., the product of concentration and duration of exposure (C × T). In practice, however, Haber's law will often be violated, and it is critical that quantitative risk assessment take account of the duration, as well as the concentration, of exposure. In the next two sections of this chapter, we discuss design and analysis principles for the study of air toxics, including dose rate effects. After this, we will present a case study of ethylene oxide (EtO), while the final section will present some concluding remarks and outline several areas where further statistical research would be helpful.
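As a minimal illustration of Haber's law, the Python sketch below computes the cumulative exposure C × T for two exposure patterns. The function name is an assumption made for this example, and the two patterns are illustrative values only.

```python
def cumulative_exposure(concentration_ppm, duration_h):
    """Cumulative exposure C x T in ppm-h (the only dose metric under Haber's law)."""
    return concentration_ppm * duration_h

# Two illustrative exposure patterns: a short, high burst and a long, low exposure.
short_high = cumulative_exposure(1200, 1.5)
long_low = cumulative_exposure(300, 6)

# Under Haber's law both patterns are predicted to give the same toxic response,
# because they share the same C x T; a dose rate effect is any departure from this.
print(short_high, long_low)
```

Both patterns correspond to 1800 ppm-h, so any difference in observed response between them would be evidence against Haber's law.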

2. Designing an inhalation toxicology study

Regardless of the nature of the compound being studied, a good quality toxicological study should adhere to several established principles of design (Gart et al., 1986). First, the experiment should be conducted in a familiar animal strain and under stable experimental conditions (light, noise, heat and humidity, diet) to avoid extra sources of variability that might confound study results. A second, more controversial principle is that the highest experimental dose should correspond to the maximum tolerated dose (MTD). Loosely speaking, this is the highest dose that can be administered without causing the experimental animals excessive systemic toxicity that could alter the study results. Although the precise definition depends on the specific testing situation, the MTD often corresponds to the highest dose that does not cause any mortality or weight loss among test animals. The primary reason for using the MTD is to maximize the statistical power to detect effects. If low doses were used instead, then experiments would need to be much larger to achieve adequate statistical power. A third general principle is that the experimental route and duration of exposure should be chosen to mimic as closely as possible the most likely patterns of human exposure. Thus, animal studies of air toxics should generally involve exposure via inhalation. Unlike more common experimental exposures via food, water or gavage, exposing animals via inhalation poses some logistical challenges. Usually, animals are placed in small cages that rest inside a tray that is then inserted into a sealed exposure chamber. A typical chamber has a volume of about 100 litres and is constructed from stainless steel and plexiglass. Air flow rates should be carefully controlled, and release of the test chemical into the chamber should be continuously monitored to keep exposure levels stable at the required level.
For particularly reactive chemicals, such as ozone and ethylene oxide, rapid air flow rates may be necessary in order to maintain desired concentrations. Generally, there will be a separate exposure chamber for each different dose level. Control animals should also be placed in a chamber and exposed to the same experimental conditions (such as air flow rates, caging, temperature and diet) as dosed animals. This is to ensure that no bias is introduced by effects caused by the experimental conditions themselves. For instance, animals may be disturbed by high air flow rates. In addition, animals usually do not eat or drink while inhalation is occurring, which may affect outcomes such as body weight. Experimental protocols, of course, differ according to the endpoint of interest. In a carcinogen bioassay, for example, animals are usually exposed for 2 years, and are examined at death or sacrifice for the presence of a variety of different tumors. In a typical developmental toxicity study, pregnant animals are exposed during the critical period of major organogenesis (days 6-15, 17 or 19 for mice, rats and rabbits, respectively) and sacrificed just prior to normal delivery so that the uterine contents can be examined. Regardless of study type, the main experiment is generally preceded by shorter-term pilot studies to establish the doses to be used. It is at the dose setting stage that design strategies for inhalation toxicology may depart markedly from more standard designs. In a typical inhalation toxicology experiment, it is common to expose the animals for 6 h per day, five days per week. This exposure pattern is practical, yet reasonably closely approximates a constant chronic exposure. However, this design may be inappropriate for assessing the effects of more sporadic exposure patterns, particularly those involving bursts of short high exposures. In many cases, it will make more sense to vary patterns of exposure to mimic not only the long term average exposure levels, but also short term levels. Because the study design issues are closely related to the planned analysis methods, we will return to the issue of how to design studies to allow for testing of dose rate effects after discussing modeling strategies.

2.1. Statistical analysis

Just as for any toxicological experiment, the primary objective for the statistical analysis of data from an inhalation study is estimating and testing for dose response. Specifics depend, of course, on the type of outcome being analyzed. For example, logistic models are typically used for binary outcomes such as presence or absence of tumor, malformation or other adverse events. Linear models are appropriate for continuous endpoints such as body weight and length, organ weights, and many neurological outcomes. Sometimes, it may be necessary to account for correlations between animals. The statistical analysis of developmental toxicity data, for example, must allow for the litter effect, or the tendency for littermates to respond more similarly than offspring from different litters. Another example occurs in neurotoxicology, where repeated measures may be taken over time. All of these analyses can be succinctly described under the broad framework of either generalized linear models (GLMs; see McCullagh and Nelder, 1989) for univariate outcomes or generalized estimating equations (GEEs; see Liang and Zeger, 1986) for clustered or repeated measures outcomes. Suppose $Y_i$ denotes the outcome for individual $i$, $i = 1, \ldots, I$. In a carcinogenicity experiment, for example, $Y_i$ would be an indicator of whether or not the $i$th animal had a tumor. In a developmental toxicology experiment, $Y_i$ could be an $n_i \times 1$ vector of outcomes for the $i$th litter. In a neurotoxicology experiment, $Y_i$ could be a $t \times 1$ vector of outcomes measured at $t$ different occasions for the $i$th animal. Let $X_i$ be a corresponding set of covariates (including dose) associated with $Y_i$. In the most familiar examples of GEEs, the mean of $Y_i$, $\mu_i = (\mu_{i1}, \ldots, \mu_{in_i})^T$, is related to a linear function of the covariates $X_i$ through a link function

$$g(\mu_i) = X_i \beta ,$$

where $\beta$ is a $p \times 1$ vector of unknown regression coefficients (see McCullagh and Nelder, 1989, p. 27). Usually, the variance of $Y_{ij}$ is chosen to be a suitable function of $\mu_{ij}$, and the covariance matrix of $Y_i$ is then written as

$$V_i = A_i^{1/2} R_i A_i^{1/2} , \tag{1}$$

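where $A_i = \mathrm{diag}(\mathrm{var}(Y_{ij}))$ and $R_i$ is a correlation matrix. The estimate of $\beta$ is obtained by solving

$$\sum_{i=1}^{I} \left( \frac{\partial \mu_i}{\partial \beta} \right)^T V_i^{-1} (Y_i - \mu_i) = 0 . \tag{2}$$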
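To make (1) and (2) concrete, the following Python sketch evaluates the contribution of a single cluster (for example, one litter) to the estimating equation. It is an illustration under stated assumptions rather than the chapter's own implementation: a logit link, binary outcomes with variance $\mu(1-\mu)$, and an exchangeable working correlation with a hypothetical parameter `rho`. The litter data and coefficient values are invented for the example.

```python
import numpy as np

def gee_contribution(X, y, beta, rho):
    """Contribution D^T V^{-1} (y - mu) of one cluster to estimating equation (2).

    Assumes a logit link, binary outcomes (variance mu * (1 - mu)) and an
    exchangeable working correlation matrix with off-diagonal value rho.
    """
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    var = mu * (1.0 - mu)
    n = len(y)
    R = np.full((n, n), rho)
    np.fill_diagonal(R, 1.0)
    A_half = np.diag(np.sqrt(var))
    V = A_half @ R @ A_half              # working covariance, equation (1)
    D = var[:, None] * X                 # d mu / d beta under the logit link
    return D.T @ np.linalg.solve(V, y - mu), V

# Hypothetical litter of four pups sharing one dose (columns: intercept, dose).
X = np.column_stack([np.ones(4), np.full(4, 2.0)])
y = np.array([1.0, 1.0, 0.0, 1.0])
beta = np.array([-1.0, 0.5])

u_exch, V = gee_contribution(X, y, beta, rho=0.3)   # correlated littermates
u_ind, _ = gee_contribution(X, y, beta, rho=0.0)    # independence working model
print(u_exch, u_ind)
```

With `rho=0` the contribution reduces to $X_i^T (Y_i - \mu_i)$, the ordinary logistic regression score, while a positive `rho` down-weights the information carried by correlated littermates.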

This general framework accommodates almost all the familiar types of analysis that arise in toxicology. For example, suppose $Y_i$ is a scalar ($n_i = 1$). Then, (2) will correspond to a quasi-likelihood score function (see Wedderburn, 1974). If $Y_i$ is a scalar binary variable, for example, and if we put

$$g(\mu) = \log\left( \frac{\mu}{1-\mu} \right) ,$$

then it is easy to show that (2) corresponds to the score equations for logistic regression. The focus of this paper is not the details of model fitting, but rather the issue of how dose rate effects can be accommodated into the mean model. That is, the question of interest is how to model the transformed mean ($g(\mu) = \eta$) as a function of exposure, and how to characterize exposure in a way that appropriately takes account of variations in duration and concentration of exposure. In many ways, the task of modeling dose rate effects in inhalation toxicology is analogous to the problem of studying response as a function of different mixtures of chemicals. This topic has received considerable attention over the past several decades. As discussed, for example, by Box, Hunter and Hunter (1978, Chapter 15), response patterns after exposure to mixtures of chemicals may differ markedly from the dose response patterns in the presence of a single chemical. The theory of response surface modeling is often used in this context (see Gennings et al., 1989, 1994; Schwartz et al., 1995 and others). Response surface modeling has been developed primarily in the engineering setting as a tool for allowing researchers to determine the settings of input


variables (independent variables) so as to optimize an outcome. The basic idea is to set up a regression model that includes the various input variables (and possibly their interactions) as predictors. While developed primarily in the context of continuous outcomes, it is natural to apply analogous ideas to dichotomous outcomes via the use of logistic regression. A good discussion of response surface modeling can be found in Draper (1988). The theory of response surface modeling is also useful for studying dose rate effects, although there are some important differences from the chemical mixture setting. In the latter setting, the simplest model would be one that predicts outcome as a linear combination of the concentrations of each of the chemicals being tested. Of interest then is whether the data suggest synergy, or departures from additivity (see Gennings and Carter, 1995). In inhalation toxicology, the simplest model is the one that assumes Haber's law, where the only important dose metric corresponds to cumulative exposure, i.e., concentration (C) times duration (T). A simple dose-response model satisfying Haber's law can be written as

$$\eta = \beta_0 + \beta_1 \cdot C \times T , \tag{3}$$

where $\eta$ is the linear predictor used to model the outcome of interest, and $C \times T$ refers to the product of concentration and duration of exposure (total cumulative exposure). To explore whether Haber's law holds, one needs to consider a range of different models which allow for different kinds of dose-rate effects. Scharfstein and Williams (1995) describe two generalizations of model (3) to accommodate more complex dose-rate patterns of effects, and in particular, to allow for effects of duration of exposure in addition to cumulative exposure. One such model has the form

$$\eta = \beta_0 + \beta_1 \cdot C \times T + \beta_2 \cdot T . \tag{4}$$

In many practical situations, it is reasonable to assume that control animals (C = 0) display a similar level of effects regardless of exposure duration. The model

$$\eta = \beta_0 + \beta_1 \cdot C \times T + \beta_2 \cdot \alpha \cdot T \tag{5}$$

allows for this to occur by including an indicator of whether or not an animal belongs to an exposed group: $\alpha = 1$ if $C > 0$ and $0$ otherwise, so that the duration term vanishes for controls. To assess whether Haber's model is appropriate, one can fit models (4) and (5), and then conduct a test of significance for the coefficient associated with duration ($\beta_2$). Of course, there are many other issues that will need to be considered in any practical setting. For instance, it will be important to apply model diagnostics and assess goodness of fit. Excellent discussion on this topic, at least for logistic regression, can be found in Hosmer and Lemeshow (1989). Less work has been done on assessing goodness of fit for GEE and quasi-likelihood models (Lambert and Roeder, 1995).
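The structural difference between models (3), (4) and (5) can be seen in a short Python sketch that evaluates the three linear predictors. The function name, the coefficient values and the example exposures are hypothetical and chosen only to show how each model treats control animals; they are not fitted estimates.

```python
import numpy as np

def linear_predictor(C, T, beta, model):
    """Linear predictor eta for the dose-rate models (3), (4) and (5).

    C: concentration (ppm), T: duration (h), beta: (beta0, beta1) for
    model 3 and (beta0, beta1, beta2) for models 4 and 5.
    """
    C = np.asarray(C, dtype=float)
    T = np.asarray(T, dtype=float)
    eta = beta[0] + beta[1] * C * T          # Haber's law term, model (3)
    if model == 4:
        eta = eta + beta[2] * T              # duration effect for all animals
    elif model == 5:
        exposed = (C > 0).astype(float)      # indicator: 1 if C > 0, else 0
        eta = eta + beta[2] * exposed * T    # duration effect for exposed animals only
    elif model != 3:
        raise ValueError("model must be 3, 4 or 5")
    return eta

# Hypothetical coefficients and exposures (two controls, two exposed groups).
beta = (-4.0, 0.002, -0.5)
C = [0, 0, 1800, 450]
T = [1.5, 6.0, 1.5, 6.0]

eta3 = linear_predictor(C, T, beta[:2], model=3)
eta4 = linear_predictor(C, T, beta, model=4)
eta5 = linear_predictor(C, T, beta, model=5)
print(eta3, eta4, eta5)
```

Under model (4) the two control groups receive different linear predictors because their durations differ, whereas model (5) gives every control the intercept alone while agreeing with model (4) for exposed animals.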


3. Optimal design

Using the modeling framework presented in the previous section, Scharfstein and Williams (1995) discuss optimal experimental design in settings where interest centers around the dose rate effect, or the effect of varying concentration and duration of exposure, for a fixed value of total cumulative exposure, C × T. This work was motivated by the EtO experiment to be discussed presently, where the focus was developmental toxicity. They considered the very specific question of how to choose the middle value of C × T in a situation where animals were to be assigned to either a control or one of two different C × T multiples, and where duration of exposure could be chosen as one of three different levels. They used a simulation approach, as well as asymptotic considerations, treating as fixed the number of dams and the maximum level of C × T. The middle C × T multiple was allowed to vary relative to the maximum multiple, as was the number of animals allocated to each multiple relative to the control. More precisely, designs were considered with the middle multiple equal to k(100)% of the maximum (k = 0.1, 0.3, 0.5, 0.7, 0.9) and with the number of animals in the exposure groups equal to c(100)% of the number of animals in the control groups (c = 0.5, 1.0, 2.0), for a total of 15 different designs. For each design, 1000 data sets were generated with 45 dams per data set. The number of implants per dam was generated from a binomial distribution and the probability of being affected followed a beta distribution. Response probabilities came from one of 12 different probit models with various parameter values corresponding to models that support and contradict Haber's law. Scharfstein and Williams defined a "good" design as one with the ability to accurately estimate the response surface as well as to detect the presence or absence of dose response relationships, such as Haber's law.
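The simulation setup described above can be sketched in Python as follows. This is not Scharfstein and Williams's code: the rounding rule in `allocate`, the maximum litter size, the implantation probability, the intra-litter correlation `rho` and the mean response `mean_p` are all assumptions made for illustration, and the 12 probit response models are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
TOTAL_DAMS = 45

def allocate(c, total=TOTAL_DAMS):
    """Dams per group when each of two exposed multiples gets c times the control count."""
    n_exposed = round(total * c / (1 + 2 * c))
    return total - 2 * n_exposed, n_exposed      # (control, each exposed multiple)

def simulate_litters(n_dams, mean_p, max_implants=12, implant_prob=0.8, rho=0.1):
    """Beta-binomial litters: implants ~ Binomial, litter risk ~ Beta, affected ~ Binomial."""
    implants = rng.binomial(max_implants, implant_prob, size=n_dams)
    a = mean_p * (1 - rho) / rho                 # Beta parameters giving mean mean_p
    b = (1 - mean_p) * (1 - rho) / rho           # and intra-litter correlation rho
    litter_p = rng.beta(a, b, size=n_dams)
    affected = rng.binomial(implants, litter_p)
    return implants, affected

# The 15 designs: middle multiple k (fraction of maximum) crossed with allocation ratio c.
designs = [(k, c) for k in (0.1, 0.3, 0.5, 0.7, 0.9) for c in (0.5, 1.0, 2.0)]

for k, c in designs[:3]:
    n_ctrl, n_exp = allocate(c)
    implants, affected = simulate_litters(n_exp, mean_p=0.3)
    print(k, c, n_ctrl, n_exp, int(affected.sum()), int(implants.sum()))
```

In a full study each design would be simulated 1000 times, with `mean_p` replaced by the response probability implied by the chosen dose-response model at each (C, T) combination.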
They used as criteria for an optimal design the mean absolute error (MAE), the mean true excess risk deviation (MTERD), the power to detect a trend, the power to detect a deviation from Haber's law and the false positive rate for testing Haber's model (based on a two-sided Wald test of significance of the extra dose effect). In order to test for trend, they used a Jonckheere-Terpstra test (Jonckheere, 1954; Terpstra, 1952), which essentially generalizes the Wilcoxon test to multiple ordered groups. Like the Wilcoxon test, the Jonckheere-Terpstra test tends to be more powerful than normal based tests in settings where there is a mean shift, but where the underlying distributions are non-normal (see Lehmann, 1975). The MAE and the MTERD were used to assess the accuracy of the effective concentration (C)-duration (T) contour, which in these models is defined as the set of all points (C, T) at which the excess risk function equals a specified value $\alpha$. The MAE was defined as the average from the 1000 simulated data sets of the sum of the differences between the true contour and that predicted at 50 equally spaced intervals. The MTERD was defined as the average from the 1000 simulated data sets of the average difference in the true risk and predicted risk at 50 equally spaced intervals along the contour. Based on these criteria, they found that the optimal combination of the middle C × T multiplier and animal allocation was k = 0.9 and c = 0.5, which implies an equal allocation of animals to the control group and a single multiple of C × T for the treatment animals. This is consistent with the large sample theory (Chernoff, 1953) that indicates the optimal number of exposure groups should equal the number of parameters to be estimated. These results, though, assume that the underlying dose-response relationship is known. As this is usually unknown, we prefer to include an additional C × T group to allow for a better description of the response surface. Therefore, their recommendations were restricted to those designs with k = 0.3, 0.5, 0.7. They found that the two most efficient designs that were also fairly robust to underlying model specification were the designs with k = 0.3, c = 2.0 and k = 0.7, c = 1.0. These designs suggest that with a smaller middle C × T multiplier twice as many animals should be allocated to the exposed groups as to the control groups, whereas with a larger middle C × T multiplier the exposed and control groups should be allocated an equal number of animals. In the EtO experiment discussed below, the middle C × T multiple was selected to be 78% of the maximum level (2100 of 2700 ppm-h), and equal numbers of pregnant animals were exposed to air and EtO.

3.1. A case study in ethylene oxide

One of the chemicals listed under the 1990 amendments to the Clean Air Act, ethylene oxide (C2H4O) is a colorless, highly reactive and flammable gas, produced primarily as an intermediate in chemical manufacturing, but also for sterilization and fumigation. Even though this latter usage constitutes less than 2% of the EtO produced, these industries are responsible for high occupational exposures to many workers. Studies by NIOSH¹ and OSHA² suggest that several hundred thousand workers in health care and related industries may be exposed to EtO (USHHS, 1994 pages 205-210; NIOSH 35, 1981; IARC V. 36, 1985). These exposures are typically brief, concentrated bursts that occur when the door of a sterilizing machine is opened (Sun, 1986). Consequently, it is important to learn more about the effects of EtO, and in particular to understand any difference based on acute versus prolonged exposure. Despite inconclusive epidemiological evidence regarding the carcinogenicity of EtO, the International Agency for Research on Cancer (IARC) has classified the compound in Category 1 (a known human carcinogen), based on additional mechanistic considerations (IARC, 1994). Furthermore, there is considerable evidence that EtO is carcinogenic in animals (via both inhalation and injection routes of exposure), and is most likely carcinogenic in humans (WHO, 1985; NTP, 1988). The non-cancer toxic effects of ethylene oxide have been documented extensively (USHHS, 1994) and include irritation to the eyes, skin and respiratory tract, as well as peripheral and central nervous system dysfunction (Brashear et al., 1996).

¹ National Institute for Occupational Safety and Health.
² The Occupational Safety and Health Administration.


The effects of ethylene oxide on animal reproduction and development have been studied in several settings (see Polifka et al., 1996 and references therein). Adverse effects include lowered fetal weight, fetal death, pre-implantation loss and malformations (primarily skeletal). There is also evidence of reproductive effects in humans (Rowland et al., 1996). There has been only very limited study of dose rate effects for EtO. In a study of mice, Generoso et al. (1986) varied both exposure duration and concentration, finding that short, high exposures (1200 ppm × 1.5 h) showed increased dominant-lethal responses over the long, low exposures (300 ppm × 6 h), indicating that Haber's law may not apply. However, there have been no studies of Haber's law for exposures occurring during gestation. This lack of information was the motivation for a developmental toxicity study conducted recently at the Harvard School of Public Health. The goal of the study was to explore effects of various exposure levels and durations to EtO at gestational day 7. Two cumulative exposure levels were studied: 2100 ppm-h (350 × 6, 700 × 3, 1400 × 1.5) and 2700 ppm-h (450 × 6, 900 × 3, 1800 × 1.5), in addition to the controls (0 × 6, 0 × 3, 0 × 1.5) (see Table 1). These C × T multiples were selected using the design guidelines described in the previous section. The study animals were C57BL/6J black mice from Jackson Laboratory, Bar Harbor, Maine. Females at least 4 months of age were mated with a single male in the afternoon and checked for the presence of a vaginal plug the following morning. Presence of a plug identified that day as potentially gestational day 0. Plugged animals were randomized to control or one of several exposure groups. Maternal toxicity was evaluated immediately after exposure (30 min) to detect short term effects and again 24 h later for more persistent effects. Indicators of short term toxicity included behavioral and weight changes.
Pregnant females were sacrificed on day 18 of gestation, and standard methods were used to evaluate the uterine contents. This evaluation involved recording evidence of resorptions, weighing each live pup, measuring its crown-to-rump length and assessing the presence or absence of a variety of malformations.

Table 1
Targeted experimental design

Duration (h)   EtO concentration (ppm)   C × T (ppm-h)
1.5            0                         0
3.0            0                         0
6.0            0                         0
1.5            1400                      2100
3.0            700                       2100
6.0            350                       2100
1.5            1800                      2700
3.0            900                       2700
6.0            450                       2700


Among the malformations that appeared were micro- and anophthalmia (small eye and absence of eye, respectively).
Conducting pilot studies to determine the appropriate dose levels was one of the most challenging aspects of the EtO study. For a dose rate study, the MTD can be thought of as the maximum C × T multiple associated with minimal maternal toxicity. Because of the reactive nature of EtO, we reasoned that the highest toxicity should correspond to short, high exposures. Hence, the goal of our pilot studies was to identify the MTD based on 1 h exposures. Initially, we determined that exposures to 1200 ppm of EtO for 1 h should be our maximum dose, because some deaths were seen among the mice exposed to 1500 ppm for 1 h. However, we found subsequently that a multiple of 1200 ppm-h was too low for developmental effects. Hence, we were forced to accept a certain amount of maternal toxicity in order to move into a range of the C × T region where developmental effects would also be seen. It was determined that the 2700 multiple resulted in some maternal death and fetal effects. Due to the high mortality observed at the 2700 × 1 ppm-h combination (90%), the lowest exposure duration was increased to 1.5 h, where considerably less mortality (23%) was observed (1800 × 1.5 ppm-h). The final study design is presented in Table 1. One last design consideration was to decide on the timing of exposure. Because occurrence of developmental effects is sensitive to the time in gestation when exposure occurs, it was important to decide on a suitable gestational day for exposure. Further pilot studies were used to determine that gestational day 7 was the one likely to provide greatest sensitivity to exposure.

Table 2
Maternal and fetal results

              Maternal                           Fetal
ppm × h       Exposed(a)  Death     Pregnant(b)  Implant  Death(c)  Malformed(d)  Weight (g)     Crown to rump length (mm)
              N           N (%)     N (%)        N        N (%)     N (%)         mean (SE)      mean (SE)
Air
0 × 1.5       50          0 (0)     28 (56)      203      28 (14)   13 (7)        0.92 (0.012)   19.22 (0.125)
0 × 1.75      8           0 (0)     6 (75)       50       3 (6)     5 (11)        0.97 (0.011)   20.03 (0.115)
0 × 2         28          1 (4)     14 (52)      95       12 (13)   4 (5)         0.99 (0.014)   20.70 (0.136)
0 × 3         38          0 (0)     19 (50)      141      16 (11)   5 (4)         0.94 (0.011)   19.71 (0.122)
0 × 6         30          1 (3)     19 (66)      150      14 (9)    14 (10)       0.99 (0.010)   19.52 (0.124)
C × T = 2100
1400 × 1.5    39          3 (8)     8 (22)       62       41 (66)   7 (33)        0.73 (0.047)   16.90 (0.671)
700 × 3       41          0 (0)     22 (54)      169      30 (18)   56 (40)       0.88 (0.012)   19.24 (0.148)
350 × 6       33          0 (0)     19 (58)      152      14 (9)    20 (15)       0.97 (0.010)   19.90 (0.123)
C × T = 2700
1800 × 1.5    73          41 (56)   3 (9)        22       14 (64)   7 (88)        0.70 (0.045)   16.66 (0.739)
1543 × 1.75   23          15 (65)   1 (13)       7        1 (14)    6 (100)       0.76 (0.030)   17.83 (0.356)
1350 × 2      76          27 (36)   7 (14)       20       10 (50)   3 (30)        0.86 (0.103)   18.74 (1.082)
900 × 3       50          1 (2)     11 (22)      86       27 (31)   34 (58)       0.83 (0.016)   18.42 (0.203)
450 × 6       41          0 (0)     20 (48)      148      28 (9)    13 (11)       0.97 (0.010)   19.32 (0.121)

(a) Number exposed = number with vaginal plugs.
(b) Percent pregnant computed out of those alive.
(c) Number died = number of resorptions + number of stillborn pups.
(d) Percent malformed computed out of those alive.

A total of 530 female mice with vaginal plugs were exposed to either air or ethylene oxide on GD 7. Table 2 shows the distribution of animals and the number of maternal animals exposed at each C × T combination. Eighty nine of the 530 mice with plugs died (84 of these mice had been exposed to the highest multiple). One hundred and seventy seven (40%) of the 441 mice that survived to sacrifice at GD 18 were found to be pregnant. This table also shows the number of implants and the number of pups that died (fetal deaths and resorptions) as well as the number of live pups that were malformed at each C × T combination. Note that due to a combination of maternal toxicity and fetal deaths, there are relatively few live pups at the high, short exposure combinations, relative to longer, lower exposures and air exposures. The C × T effect on maternal toxicity is evident from Table 2 and Figure 1, which shows the observed maternal death rates at various concentrations and durations. The majority of the maternal deaths occurred at the short durations of exposure within the 2700 multiple. It is clear that the observed maternal death rate varies depending upon the C × T combination. The predicted response surface plots from fitting Haber's model and the model given by (4) are given in Figures 2 and 3. Results from fitting the model given by (5) were very similar to those of (4) for all endpoints. The most notable feature is that long, low exposures to EtO (e.g., 600 × 6) did not lead to deaths, unlike what would be predicted under Haber's law. This can also be seen from Table 2, for example, within the 2700 multiple, where the observed death rate ranges from 0% at the 450 × 6 ppm-h combination to 65% at the 1543 × 1.75 ppm-h combination.
This departure from Haber's model is quantitatively reflected by the significance of the duration parameter estimate for maternal mortality (p = 0.0001).

Fig. 1. Observed maternal death from EtO exposure. [Surface plot of % of deaths against EtO concentration (ppm) and time (hrs).]


Fig. 2. Predicted probability of maternal death under Haber's model. [Surface plot of predicted % against EtO concentration (ppm) and time (hrs).]

Fig. 3. Predicted probability of maternal death under model (4), which allows deviation from Haber's model. [Surface plot of predicted % against EtO concentration (ppm) and time (hrs).]

Fig. 4. Effective dose-duration contour for maternal death, excess risk = 0.05. [Contours of time (hrs) against EtO concentration (ppm) for Haber's model and model (4).]

To interpret these predicted response surfaces, consider Figure 2, constructed under the assumption of Haber's law. Slicing the dose-response surface at a given level of response (say 0.10) gives a contour on which every point corresponds to the same cumulative exposure (C × T). That is, a fixed C × T predicts a constant response. As discussed by Scharfstein and Williams (1995), this is referred to as the effective C × T contour. Model (4) leads to a predicted response surface with a different shape (Figure 3). Slicing this surface at a given level of response yields an effective C × T contour made up of concentrations and durations with different cumulative exposure (C × T) values. The contours for both models are given in Figure 4. This figure shows that under Haber's model a C × T combination of 1774 × 1 ppm-h gives the same predicted probability of maternal death as the 295.7 × 6 ppm-h combination. Under model (4), however, the predicted probability of maternal death is 0.27 for the 1774 × 1 ppm-h combination and 0.0001 for the 295.7 × 6 ppm-h combination. These values are based on the following fitted regression models: for Haber's model,

$$\operatorname{logit}(\hat{p}) = -6.853 + 0.002\, C \times T, \tag{6}$$

and for model (4),

$$\operatorname{logit}(\hat{p}) = -3.049 - 1.615\, T + 0.002\, C \times T. \tag{7}$$

Therefore, applying Haber's model at short durations would severely underestimate the risk, whereas at longer durations it would severely overestimate the risk.

Table 2 also shows the number of live pups at each C × T combination, along with the average weights and crown-to-rump lengths at each dose-duration combination. Dose-rate effects are clearly evident for fetal death, malformation, fetal weight and crown-to-rump length. Indeed, statistical analyses comparing Haber's model with model (4) revealed highly significant effects of exposure duration, with short, high exposures resulting in increased adverse effects. Note that, to allow for intra-litter correlations, these analyses were conducted using GEE with the identity link and an exchangeable working correlation matrix. The dose-rate effects can be seen clearly in Figures 5 through 8. Figures 5 and 7 show the predicted response surfaces under model (4) for fetal death and malformation, respectively, and Figures 6 and 8 show the corresponding effective dose-duration contours. They show that, just as with maternal death, risk assessment based on Haber's model could lead to serious biases. At shorter durations the ED05 dose under Haber's model is higher than that under model (4), while at longer durations it is lower under Haber's model.
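As a rough numerical check on equations (6) and (7), the sketch below evaluates both fitted maternal-death models at the two combinations discussed above, and solves each model for the concentration that gives a chosen response at a given duration. It uses the rounded coefficients as printed, so the model (4) value at 1774 × 1 ppm-h comes out near 0.25 rather than the reported 0.27. The function names and the target probability of 0.05 are illustrative assumptions and are not part of the original analysis.

```python
import math


def expit(x):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def p_haber(c, t):
    """Predicted probability of maternal death from equation (6)."""
    return expit(-6.853 + 0.002 * c * t)


def p_model4(c, t):
    """Predicted probability of maternal death from equation (7)."""
    return expit(-3.049 - 1.615 * t + 0.002 * c * t)


def conc_haber(p_target, t):
    """Concentration (ppm) giving p_target at duration t (h) under (6)."""
    return (logit(p_target) + 6.853) / (0.002 * t)


def conc_model4(p_target, t):
    """Concentration (ppm) giving p_target at duration t (h) under (7)."""
    return (logit(p_target) + 3.049 + 1.615 * t) / (0.002 * t)


# The two C x T combinations compared in the text (ppm, hours).
for c, t in [(1774.0, 1.0), (295.7, 6.0)]:
    print(f"C={c:7.1f} ppm, T={t:.0f} h: "
          f"Haber {p_haber(c, t):.4f}, model (4) {p_model4(c, t):.4f}")

# Hypothetical target response, used only to trace out a contour.
P_TARGET = 0.05
for t in (1.0, 3.0, 6.0):
    ch, cm = conc_haber(P_TARGET, t), conc_model4(P_TARGET, t)
    print(f"T={t:.0f} h: Haber C={ch:7.1f} (C x T={ch * t:7.1f}), "
          f"model (4) C={cm:7.1f} (C x T={cm * t:7.1f})")
```

Under the Haber form the concentration on a contour is proportional to 1/T, so C × T is constant along it. Under the form in (7) the duration term makes C × T change from one duration to the next, which is the departure the contours in Figure 4 display.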


Fig. 5. Predicted probability of fetal death under model (4). [Surface plot of predicted % against EtO concentration (ppm) and time (hrs).]

Fig. 6. Fetal death effective dose-duration contour, excess risk = 0.05. [Contours of time (hrs) against EtO concentration (ppm); legend: Haber's model and model (5).]

That is, assuming Haber's law would underestimate the risk of exposure at short durations. To see this more clearly, Table 3 shows the estimated ED05s under Haber's law and under model (4) for fetal malformation and death. For malformation, at the 1 h duration the predicted response at the Haber's-law ED05 is higher (25%) than the 12.9% predicted under Haber's model, and at the 6 h duration it is lower (5.5% versus 12.9%). This implies that by using Haber's model we would underestimate the risk at shorter durations and overestimate the risk at longer durations. The same is true for fetal death.
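The ED05 values in Table 3 make the contrast easy to check by hand. The short sketch below multiplies each tabulated ED05 by its duration to recover the cumulative exposure at the ED05. It assumes that the ED column is a concentration in ppm and the duration is in hours; the dictionary layout and function name are illustrative only.

```python
# ED05 values transcribed from Table 3, keyed by duration in hours.
# Assumption: each ED is an EtO concentration in ppm.
ED05 = {
    "malformation": {
        "Haber's law": {1: 814, 3: 271, 6: 136},
        "model (4)": {1: 730, 3: 499, 6: 441},
    },
    "death": {
        "Haber's law": {1: 1049, 3: 348, 6: 174},
        "model (4)": {1: 1014, 3: 713, 6: 638},
    },
}


def cumulative_exposure(ed_by_duration):
    """Return C x T (ppm-h) at the ED05 for each duration."""
    return {t: c * t for t, c in ed_by_duration.items()}


for endpoint, models in ED05.items():
    for model, eds in models.items():
        print(f"{endpoint:12s} {model:12s} {cumulative_exposure(eds)}")
```

Under Haber's law the products stay essentially constant across durations (up to rounding in the table), whereas under model (4) the cumulative exposure needed to reach the ED05 grows several-fold between 1 h and 6 h.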


Fig. 7. Predicted probability of malformation under model (4). [Surface plot of predicted % against EtO concentration (ppm) and time (hrs).]

Fig. 8. Malformation effective dose-duration contour, excess risk = 0.05. [Contours of time (hrs) against EtO concentration (ppm); legend: Haber's model and model (4).]

Table 3
Estimated ED05s

Model          Background risk   Prob at ED   Duration (h)   ED (ppm)   Actual risk

Malformation
Haber's Law    0.078             0.129        1              814        0.251
Haber's Law    0.078             0.129        3              271        0.143
Haber's Law    0.078             0.129        6              136        0.055
Model (4)      0.173             0.223        1              730        0.223
Model (4)      0.173             0.223        3              499        0.223
Model (4)      0.173             0.223        6              441        0.223

Death
Haber's Law    0.112             0.16241      1              1049       0.301
Haber's Law    0.112             0.16241      3              348        0.183
Haber's Law    0.112             0.16241      6              174        0.077
Model (4)      0.231             0.28097      1              1014       0.281
Model (4)      0.231             0.28097      3              713        0.281
Model (4)      0.231             0.28097      6              638        0.281

4. Discussion

We have reviewed some of the design and analysis challenges that arise in the context of inhalation toxicology. Aside from the usual kinds of questions and problems that arise in risk assessment, analysis of air toxics raises interesting questions regarding how to account for dose-rate effects. We have described some basic approaches to incorporating dose-rate effects into the dose-response setting, and illustrated the results with data from a reproductive toxicity study of ethylene oxide.

Several challenging statistical questions remain. Many of these relate to experimental design. Because there was only a fairly small region of the C × T surface where we could find developmental effects without maternal effects in the EtO experiment, running pilot studies to identify the optimal grid for the main study proved difficult and time-consuming. For compounds like EtO, where short, high exposures are expected to be the most toxic, we recommend identifying the highest tolerated concentration corresponding to the shortest exposure of interest and using this to define the highest multiple to be used in the experiment. Furthermore, there is substantial room for improved design strategies using modern techniques such as the continual reassessment method (O'Quigley et al., 1990). There is also a considerable literature on optimal designs for mixtures, for example the simplex method discussed by Nigam (1970) or Scheffe's polynomial methods (see, for example, Cox, 1971).

As was noted with the EtO data, assuming that cumulative exposure is the only important component of the exposure (i.e., assuming Haber's premise holds) could lead to an underestimation of risk at shorter durations of exposure and an overestimation of risk at longer durations. Further work, in addition to the method proposed by Scharfstein and Williams (1995), is needed to determine how C × T experiments can be used to provide guidelines for risk assessment. Scharfstein and Williams (1995) recommend calculating an "effective concentration-duration contour", which is the curve that characterizes combinations of concentration and duration that yield a specified response rate above background (no exposure).

Finally, there are many topics related to air pollution where further research is needed and where statistics can play an important role. Better methods are needed, for example, for designing efficient models to characterize automobile emissions and to monitor urban pollution levels. Further work is needed to study not only dose-rate effects, but also the effects of mixtures of chemicals. Some good discussion can be found in Moller et al. (1994) and in many other papers devoted to air pollution in that same volume of Environmental Health Perspectives.
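The design recommendation made earlier in this section (fix the highest multiple from the highest tolerated concentration at the shortest duration of interest) can be sketched as a small grid builder. This is only an illustration of the idea: the function name, the use of lower multiples expressed as fractions of the highest one, and all numerical inputs in the example are hypothetical and are not taken from the study protocol.

```python
def design_grid(c_max, t_min, durations, fractions):
    """Build a C x T design grid.

    c_max:     highest tolerated concentration (ppm) at the shortest duration
    t_min:     shortest exposure duration of interest (h)
    durations: exposure durations (h) to include in the design
    fractions: fractions of the highest multiple used as the design multiples

    Returns a dict mapping each multiple (ppm-h) to a list of
    (concentration_ppm, duration_h) pairs sharing that cumulative exposure.
    """
    highest_multiple = c_max * t_min
    grid = {}
    for fraction in fractions:
        multiple = fraction * highest_multiple
        grid[multiple] = [(multiple / t, t) for t in durations]
    return grid


# Hypothetical inputs chosen only to show the shape of the output.
grid = design_grid(c_max=1800.0, t_min=1.5,
                   durations=(1.5, 3.0, 4.5, 6.0),
                   fractions=(1.0, 0.75, 0.5))
for multiple, cells in grid.items():
    print(f"{multiple:7.1f} ppm-h:",
          ", ".join(f"{c:.0f} ppm x {t} h" for c, t in cells))
```

Every cell within a multiple has the same cumulative exposure, so differences in response across the cells of one multiple speak directly to departures from Haber's law.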


Acknowledgements

This research is supported by a collaborative agreement with the Environmental Protection Agency (#CR820525-01). The authors thank the editors for helpful comments which improved the paper.

References

Box, G. E. P., W. G. Hunter and S. Hunter (1978). Statistics for Experimenters: An introduction to design, data analysis, and model building. John Wiley and Sons.
Brain, J. D., B. D. Beck, A. J. Warren and R. A. Shaikh (1988). Variations in Susceptibility to Inhaled Pollutants: Identification, Mechanisms and Policy Implications. Johns Hopkins Press.
Brashear, A., F. W. Unverzagt, M. O. Farber, J. M. Bonnin, J. G. Garcia and E. Grober (1996). Ethylene oxide neurotoxicity: A cluster of 12 nurses with peripheral and central nervous system toxicity. Neurology 46, 992-998.
Chernoff, H. (1958). Locally optimal designs for estimating parameters. Ann. Math. Statist. 24, 586-602.
Cox, D. R. (1971). A note on polynomial response functions for mixtures. Biometrika 58, 155-159.
Draper, N. R. (1988). Response surface designs. Encyclopedia of Statistical Sciences, vol. 9.
Elswick, R. K., C. Gennings Jr., V. M. Chinchilli and K. S. Dawson (1991). A simple approach for finding estimable functions in linear models (C/R: 92V46 p76-77). Amer. Statist. 45, 51-53.
Gart, J. J., D. Krewski, P. N. Lee, R. E. Tarone and J. Wahrendorf (Eds.) (1986). Statistical Methods in Cancer Research, vol. 3: The design and analysis of long-term animal experiments. Oxford University Press.
Generoso, W. M., K. T. Cain, L. A. Hughes, G. A. Sega, P. W. Braden, D. G. Gosslee and M. D. Shelby (1986). Ethylene oxide dose and dose-rate effects in the mouse dominant-lethal test. Environ. Mutagen. 1, 375-382.
Generoso, W. M., J. C. Rutledge, K. T. Cain, L. A. Hughes and P. W. Braden (1987). Exposure of female mice to ethylene oxide within hours after mating leads to fetal malformation and death. Mutat. Res. 176, 267-274.
Gennings, C., V. M. Chinchilli and W. H. Carter Jr. (1989). Response surface analysis with correlated data: A nonlinear model approach. J. Amer. Statist. Assoc. 84, 805-809.
Gennings, C., K. S. Dawson, W. H. Carter and R. H. Myers Jr. (1990). Interpreting plots of a multidimensional dose-response surface in a parallel coordinate system. Biometrics 46, 719-735.
Gennings, C., W. H. Carter and B. R. Martin Jr. (1994). Drug interactions between morphine and marijuana. Case Studies in Biometry 429-451.
Gennings, C. and W. H. Carter Jr. (1995). Utilizing concentration-response data from individual components to detect statistically significant departures from additivity in chemical mixtures. Biometrics 51, 1264-1277.
Haber, F. (1924). Zur Geschichte des Gaskrieges (On the history of gas warfare). In Funf Vortrage aus den Jahren 1920-1923 (Five Lectures from the Years 1920-1923). Springer, Berlin, pp. 76-92.
Hosmer, D. W. Jr. and S. Lemeshow (1989). Applied logistic regression. Wiley, New York.
International Agency for Research on Cancer (IARC) (1985). IARC monographs on the evaluation of the carcinogenic risk of chemicals to humans. Allyl Compounds, Aldehydes, Epoxides, and Peroxides, vol. 36, pp. 369. Lyon, France.
International Agency for Research on Cancer (IARC) (1994). Ethylene oxide. In Some Industrial Chemicals. IARC Monographs, vol. 60, pp. 73-159. International Agency for Research on Cancer, Lyon, France.
Jonckheere, A. R. (1954). A distribution-free k-sample test against ordered alternatives. Biometrika 41, 133-145.
Kimmel, G. L. (1995). Exposure-duration relationships: The risk assessment process for health effects other than cancer. Inhal. Toxicol. 7, 873-880.
Lambert, D. and K. Roeder (1995). Overdispersion diagnostics for generalized linear models. J. Amer. Statist. Assoc. 90, 1225-1236.
Lehmann, E. L. (1975). Nonparametrics: Statistical methods based on ranks. Holden-Day, Oakland, CA.
Liang, K. Y. and S. L. Zeger (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Lippman, M. (1989). Health effects of ozone: A critical review. JAPCA 39, 672-695.
McCullagh, P. and J. A. Nelder (1989). Generalized linear models. Chapman and Hall, London.
Morgan, B. J. T. (1992). Analysis of quantal response data. Chapman and Hall.
Moller, L., D. Schuetzle and H. Autrup (1994). Future research needs associated with the assessment of potential human health risks from exposure to ambient air pollutants. Environ. Health Perspect. 102, Supplement 4.
National Toxicology Program (1988). Toxicology and carcinogenesis studies of ethylene oxide in B6C3F1 mice. NTP TR326, U.S. Department of Health and Human Services, Public Health Service, NIH, RTP, NC.
National Occupational Exposure Survey, 1980-83 (1984). National Institute for Occupational Safety and Health. Cincinnati, OH: Department of Health and Human Services.
Nigam, A. K. (1970). Block designs for mixture experiments. Ann. Math. Statist. 41, 1861-1869.
O'Quigley, J., M. Pepe and L. Fisher (1990). Continual reassessment method: A practical design for Phase 1 clinical trials in cancer. Biometrics 46, 33-48.
Polifka, J. E., J. C. Rutledge, G. L. Kimmel, V. Dellarco and W. M. Generoso (1996). Exposure to ethylene oxide during the early zygotic period induces skeletal anomalies in mouse fetuses. Teratology 53, 1-9.
Rowland, A. S., D. D. Baird, D. L. Shore, B. Darden and A. J. Wilcox (1996). Ethylene oxide exposure may increase the rate of spontaneous abortion, preterm birth and postterm birth. Epidemiology 7, 363-368.
Scharfstein, D. O. and P. L. Williams (1995). Design of developmental toxicity studies for assessing joint effects of dose and duration. Risk Analysis 14(6), 1057-1071.
Schrenk, H. H., H. Heimann, G. D. Clayton, W. M. Gafafer and H. Wexler (1949). Air pollution in Donora, PA: Epidemiology of an Unusual Smog Episode of October 1948. Federal Security Agency, Washington DC. Public Health Bulletin no. 306.
Schwartz, P. F., C. Gennings and V. M. Chinchilli (1995). Threshold models for combination data from reproductive and developmental experiments. J. Amer. Statist. Assoc. 90, 862-870.
Spengler, J. D. and J. M. Samet (1991). A perspective on indoor and outdoor air pollution. In Indoor Air Pollution: A Health Perspective (Eds., J. M. Samet and J. D. Spengler), pp. 1-29, Johns Hopkins University Press, Baltimore. pp. 139.
Sun, M. (1986). Study estimates higher risk from ethylene oxide exposure [news]. Science 231(4737), 448, Jan 31.
Terpstra, T. J. (1952). The asymptotic normality and consistency of Kendall's test against trend, when ties are present in one ranking. Indag. Math. 14, 327-333.
U.S. Department of Health and Human Services, Public Health Service (1994). Seventh Annual Report on Carcinogens: Summary, pp. 205-210.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439-447.
World Health Organization (1985). Environmental Health Criteria 55: Ethylene Oxide.