Atmospheric Environment 43 (2009) 2296–2302
Characterizing relationships between personal exposures to VOCs and socioeconomic, demographic, behavioral variables
Sheng-Wei Wang a,*,1, Mohammed A. Majeed a,b, Pei-Ling Chu c, Hui-Chih Lin d
a University of Medicine and Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, NJ, USA
b Department of Natural Resources & Environmental Control (DNREC), State of Delaware, USA
c Novo Nordisk Inc., Princeton, NJ, USA
d Department of Marketing & Distribution Management, The Overseas Chinese Institute of Technology, Taiwan
Article info
Article history: Received 8 September 2008; received in revised form 21 January 2009; accepted 25 January 2009
Keywords: Volatile organic compounds; Personal exposures; Time-activity patterns; Socio-demographic factors; NHANES

Abstract
Socioeconomic and demographic factors have been found to significantly affect time-activity patterns in population cohorts, which can subsequently influence personal exposures to air pollutants. This study investigates relationships between personal exposures to eight VOCs (benzene, toluene, ethylbenzene, o-xylene, m-,p-xylene, chloroform, 1,4-dichlorobenzene, and tetrachloroethene) and socioeconomic, demographic, and time-activity pattern factors using data collected from the 1999–2000 National Health and Nutrition Examination Survey (NHANES) VOC study. Socio-demographic factors (such as race/ethnicity and family income) were generally found to significantly influence personal exposures to the three chlorinated compounds. This was mainly due to the paired associations between race/ethnicity and urban residence, race/ethnicity and use of air fresheners in cars, and family income and use of dry-cleaning services, which can in turn affect exposures to chloroform, 1,4-dichlorobenzene, and tetrachloroethene, respectively. For BTEX, the traffic-related compounds, housing characteristics (leaving home windows open and having an attached garage) and personal activities related to the use of fuels or solvent-related products played more significant roles in influencing exposures. Significant differences in BTEX exposures were also commonly found in relation to gender, due to associated significant differences in time spent at work/school and outdoors. The coupling of Classification and Regression Tree (CART) and Bootstrap Aggregating (Bagging) techniques was used as an effective tool for characterizing the robust sets of significant VOC exposure factors presented above, which conventional statistical approaches could not accomplish. Identification of these significant VOC exposure factors can be used to generate hypotheses for future investigations about possible significant VOC exposure sources and pathways in the general U.S. population. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Volatile Organic Compounds (VOCs) are common air pollutants that can be found in both indoor and outdoor environments. There are numerous sources of VOCs including gasoline, solvents, paints, and consumer products such as air fresheners, cleaning supplies, dry-cleaned clothing, building or furnishing materials, and so on (USEPA, 2007). In the literature, several VOC exposure monitoring studies have reported that personal and indoor air concentrations of VOCs are higher than outdoor ones, and that the factors of indoor
* Corresponding author. IEH, 7F, No. 17, Xuzhou Rd., Taipei 100, Taiwan. Tel.: 886-2-33668107; fax: 886-2-33668114. E-mail addresses: [email protected], [email protected] (S.-W. Wang).
1 Institute of Environmental Health, National Taiwan University, Taipei, Taiwan.
1352-2310/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.atmosenv.2009.01.032
sources and personal activity can contribute significantly to personal exposures (Adgate et al., 2004; Serrano-Trespalacios et al., 2004; Sexton et al., 2004). Further, socioeconomic and demographic factors have been found to significantly affect time-activity patterns in population cohorts (McCurdy and Graham, 2003). It is important to know whether significantly different time-activity patterns defined by socioeconomic and demographic attributes also correspond to significantly different VOC exposures. Edwards et al. (2006) reported the relationships of VOC exposures with socio-demographic factors and time-activity patterns in the European exposure study EXPOLIS (Jurvelin et al., 2001). Sexton et al. (2007) and Liu et al. (2007) reported the relationships between VOC exposures and time-activity patterns for selected adult populations in different urban areas of the U.S. However, besides time-activity patterns, the impacts of socioeconomic and demographic factors on personal
S.-W. Wang et al. / Atmospheric Environment 43 (2009) 2296–2302
exposures to VOCs have not been adequately evaluated for the general U.S. population. The 1999–2000 National Health and Nutrition Examination Survey (NHANES) VOC project dataset (CDC, 2006a) provides an excellent and unique data source to correlate personal exposures to VOCs with socioeconomic, demographic, housing, and time-activity factors for the general U.S. population. The objectives of the current study were (1) to examine the relationships between VOC exposures and socio-demographic and lifestyle (i.e. housing and time-activity) variables, and (2) to characterize significant VOC exposure factors among these variables for a large population-based sample of the general U.S. population by analyzing the 1999–2000 NHANES VOC data.

2. Materials and methods

2.1. Data source

The aims of the 1999–2000 NHANES VOC study were to characterize exposures to VOCs in the general U.S. population and determine predictors of exposure. This was the first time that NHANES included personal exposure measurements for VOCs. Participants were a representative sub-sample of NHANES subjects between the ages of 20 and 59 years. Personal air measurements were available for ten VOCs: benzene, chloroform, 1,4-dichlorobenzene (PDB), ethylbenzene, methyl tertiary-butyl ether (MTBE), tetrachloroethene (PERC), toluene, trichloroethylene (TCE), o-xylene, and m-,p-xylene. Information about individual demographics, socioeconomic status, and residences, as well as time and activity data for the exposure period, was also available for this population subset. The time and activity data collected via the specially designed questionnaire can help identify possible sources of exposure and characterize personal activities that might contribute to exposure. The 1999–2000 NHANES study used a stratified, multistage probability sample of the non-institutionalized US civilian population.
Table 1
Overview of personal air measurements in the NHANES 1999–2000 VOC dataset for benzene, chloroform, ethylbenzene, tetrachloroethene, toluene, trichloroethene, o-xylene, m,p-xylene, 1,4-dichlorobenzene, and methyl tert-butyl ether (MTBE).

| VOC | N (a) | Percentage of measurements at or above limit of detection (b) | Geometric mean (µg m⁻³) | Geometric standard deviation (µg m⁻³) |
|---|---|---|---|---|
| benzene | 647 | 77.43 | 1.26 | 10.61 |
| chloroform | 651 | 86.02 | 0.76 | 9.31 |
| ethylbenzene | 642 | 97.51 | 2.50 | 4.27 |
| tetrachloroethene | 642 | 71.18 | 0.32 | 19.11 |
| toluene | 638 | 94.98 | 13.96 | 5.04 |
| trichloroethylene | 644 | 30.75 | 0.03 | 16.47 |
| o-xylene | 646 | 94.89 | 2.12 | 5.44 |
| m,p-xylene | 646 | 97.06 | 6.15 | 4.91 |
| 1,4-dichlorobenzene | 644 | 77.64 | 1.61 | 24.33 |
| MTBE | 644 | 36.18 | 0.11 | 23.91 |

(a) N: number of total available measurements.
(b) If the measurement was below the limit of detection, the concentration was reported as the limit of detection divided by the square root of 2.

Detailed information about the study design and operation of NHANES can be found in the analytical and reporting guidelines of the NHANES (CDC, 2006b). Participants of the 1999–2000 NHANES VOC project were asked to wear passive personal monitors (3M Organic Vapor Monitors) for a period of 48–72 h for measuring personal exposures to ten VOCs (CDC, 2006a). On their return, a short exposure questionnaire was administered to participants to assess personal activities and exposures related to VOC measurements. The collected personal air samples were analyzed via GC-MS. Table 1 summarizes the numbers of available
measurements of the ten VOCs, the percentages of measurements at or above the limits of detection (LODs), and their geometric means and geometric standard deviations. TCE and MTBE were excluded from the data analysis, since less than 40% of the available measurements were above the respective LODs for both chemicals. For the remaining eight VOCs, if the measurement was below the LOD, the reported concentration (the LOD divided by the square root of 2) was used for data analysis.

Socioeconomic and demographic attributes of the participants were extracted from the full survey data of 1999–2000 NHANES, including age, gender, education, race/ethnicity, and poverty income ratio (i.e. ratio of family income to poverty threshold). A smoking status variable was also assigned to each participant based on measured serum cotinine levels. Participants whose serum cotinine levels were greater than 14 ng ml⁻¹ were classified as smokers or as exposed to Environmental Tobacco Smoke (ETS), and the others as non-smokers (Lin et al., 2008). The collected exposure questionnaire data provided participants' responses to 30 questions (i.e. 30 variables) related to housing characteristics as well as time and activity patterns of participants during the exposure monitoring period. Two variables (wearing the exposure badge at all times and hours badge not worn) were excluded from the data analysis, since they were used to perform the data cleaning procedure for the corresponding personal air measurements. Therefore, there were a total of 34 variables among the socioeconomic, demographic, housing, and time-activity factors for examining relationships with VOC exposures (five socioeconomic and demographic attributes, smoking status, and the 28 remaining questionnaire variables).

2.2. Statistical analyses

The NHANES data were collected based on a complex sampling design with sampling weights for generating national estimates.
In the current study, we conducted unweighted statistical analyses, since the results were not used to estimate population parameters that can be generalized as national estimates. Instead, they were used from an exploratory perspective to identify significant VOC exposure factors, which can be used to generate hypotheses about possible significant exposure sources and pathways in future studies.

In order to reveal the impacts of socioeconomic and demographic factors on VOC exposures, univariate analyses were conducted first to examine group differences in personal exposures stratified by socioeconomic, demographic, and smoking attributes. Student's t-test was used to examine differences between two groups. The Bonferroni adjustment was used to examine differences for multiple comparisons.

However, when all of the predictor variables (including socioeconomic, demographic, housing, and time-activity factors) were involved in the data analysis, several characteristics of the NHANES VOC dataset needed to be recognized. First, the dataset includes a large number of variables of disparate types (i.e. continuous and categorical). Second, high correlations may exist among exposure factors (collinearity), and exposure factors may have non-linear and interaction effects on VOC exposures. These characteristics make it difficult to perform data analysis using conventional statistical techniques.

The Classification and Regression Tree (CART) and Bootstrap Aggregating (Bagging) approaches were used in the current study to address the above challenges in analyzing the NHANES VOC dataset. CART was used to explore potential non-linear and interaction effects among exposure factors on personal exposures to VOCs. CART models comprise a collection of rules that partition the space of the dependent variable as a function of the predictor variables (Breiman et al., 1984).
The rules are constructed by a recursive partitioning procedure using a "training dataset" containing values of the dependent and predictor variables. Over-fitting of the CART model can be prevented through K-fold cross-validation (CV) as follows: (1) randomly split the training dataset into K subsets (typically K = 10, as used in the current study) of approximately equal size; (2) leave out each subset in turn, construct a CART model using the remaining subsets, and repeat K times; (3) identify the optimal CART model by selecting the one with the best predictive performance on the observations that were left out in the construction of the model. The CART method has been used to characterize associations of biomarkers of exposure with environmental, dietary, demographic, and activity variables for benzene and lead (Roy et al., 2003).

The Bagging algorithm (Breiman, 1996) was used to resolve the issue of collinearity and obtain a data-driven importance measure for each predictor variable. It uses bootstrapping to generate multiple training sets. The base algorithm (such as CART) is then used to create a different base model instance for each bootstrapped training set. Combining multiple instances of the same model type can reduce the variance and substantially improve predictive performance. Bagging helps most when the model instances differ considerably from each other. Two parameters must be determined for conducting the Bagging analysis: the probability (P) used for generating the bootstrapped samples and the number of times (N) bootstrapping is performed. If small values of P (such as 0.1 or 0.2) are used, the bootstrapped samples would be too small and the constructed tree models would not be robust. On the contrary, large values of P (such as 0.8 or 0.9) would result in similar bootstrapped samples, which could not provide the instability needed by the Bagging approach.
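The cross-validation steps described above can be sketched in a few lines. The following is a minimal, standard-library-only illustration and not the authors' implementation: it assumes that tree depth is the complexity setting being tuned (the paper does not state which setting was used), takes squared error on the log-transformed response as the performance measure, and runs on synthetic data. The names `best_split`, `grow`, `predict`, and `cv_error` are hypothetical.

```python
import random
from statistics import fmean


def best_split(rows, y, min_leaf=5):
    """Return (sse, feature, threshold) for the best binary split, or None."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows})[1:]:
            left = [v for r, v in zip(rows, y) if r[j] < t]
            right = [v for r, v in zip(rows, y) if r[j] >= t]
            if min(len(left), len(right)) < min_leaf:
                continue
            ml, mr = fmean(left), fmean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best


def grow(rows, y, depth):
    """Recursive partitioning; a leaf is the mean response, a node is a tuple."""
    split = best_split(rows, y) if depth > 0 else None
    if split is None:
        return fmean(y)
    _, j, t = split
    lo = [i for i, r in enumerate(rows) if r[j] < t]
    hi = [i for i, r in enumerate(rows) if r[j] >= t]
    return (j, t,
            grow([rows[i] for i in lo], [y[i] for i in lo], depth - 1),
            grow([rows[i] for i in hi], [y[i] for i in hi], depth - 1))


def predict(tree, row):
    """Follow the split rules down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] < t else right
    return tree


def cv_error(rows, y, depth, k=10, seed=0):
    """Mean squared prediction error over K held-out folds."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    total = 0.0
    for f in range(k):
        held = set(idx[f::k])
        train = [i for i in idx if i not in held]
        tree = grow([rows[i] for i in train], [y[i] for i in train], depth)
        total += sum((y[i] - predict(tree, rows[i])) ** 2 for i in held)
    return total / len(rows)


# Synthetic data (not NHANES): the log concentration steps down when PIR >= 3.8.
rng = random.Random(1)
rows = [(round(rng.uniform(0, 5), 1), rng.randint(0, 48)) for _ in range(120)]
y = [(2.0 if pir < 3.8 else 0.5) + rng.gauss(0, 0.5) for pir, _ in rows]

errors = {d: cv_error(rows, y, d) for d in range(4)}
best_depth = min(errors, key=errors.get)
print(errors, best_depth)
```

Here `best_depth` is the depth with the lowest held-out error. In practice a dedicated tree library with its own pruning procedure would likely replace the hand-written `grow`.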
Through an iterative search, we found the optimal value of P to be 0.3, since the constructed "Bagging Trees" had the best performance in predicting the personal air concentrations of the eight VOCs through the cross-validation procedure. For developing the "importance measure" of a predictor, we counted the number of times, out of the N constructed optimal CART models, that the variable was identified as the primary predictor. The "importance measure" provides a quantitative scale of how much a predictor contributes to the predictive performance on the response variable. The higher the count, the more significant the predictor is for determining personal exposure. The parameter N should be large enough to produce stable Bagging analysis results. We set N to 1000 in the current study and identified predictors with more than 50 counts of the importance measure as significant exposure factors, which is equivalent to a statistical significance level of 0.05 (50/1000).

Before conducting the univariate, CART, and Bagging analyses, a data cleaning procedure was performed to exclude personal air measurements collected with significantly less sampling time, based on the questionnaire responses to "wearing the exposure badge at all times" and "hours badge not worn". Only a small percentage of participants (less than 5%) were excluded. Natural logarithmic transformation was applied to the measured personal air concentrations, since the distributions were skewed to the right. Outliers were also identified through normal probability plots of the log-transformed data and then excluded from data analysis. The software packages MATLAB, R, and SAS were used to perform the data analyses in this study.

3. Results and discussion

3.1. Univariate analyses of VOC exposures vs. socio-demographic factors

3.1.1. Age and gender

The effect of age was only revealed in chloroform exposures, with a significantly negative correlation, indicating that young
participants had higher chloroform exposures. Significant differences in personal exposures to benzene, ethylbenzene, o-xylene, and m-,p-xylene were observed between males and females, with males having higher exposures than females (see Table 2). Edwards et al. (2006) reported similar findings of gender differences in exposures to traffic-related aromatics (i.e. ethylbenzene, o-xylene, and m,p-xylene) in the EXPOLIS study. However, Edwards et al. (2006) did not find significant gender differences in benzene exposures. Schweizer et al. (2007) reported that men spent less time at home than women and tended to work away from home in the EXPOLIS study. Further, Graham and McCurdy (2004) suggested using age and gender as "first-order" attributes to identify statistically significantly different cohorts with respect to the time spent indoors, outdoors, and in vehicles by analyzing the USEPA Consolidated Human Activity Database (CHAD). In this study, we found that males spent significantly more time at work/school than females (male mean: 10.37 h, female mean: 7.57 h, p-value: 0.0001); males also spent significantly more time outdoors than females (male mean: 10.40 h, female mean: 6.94 h, p-value < 0.0001) during the exposure monitoring period. This finding might suggest that males spend more time commuting to work, resulting in elevated exposures to traffic-related aromatics. Significant gender differences were not observed in exposures to toluene and the three chlorinated chemicals (chloroform, PERC, and PDB).

3.1.2. Race/ethnicity

Significant differences were observed in exposures to benzene and the three chlorinated chemicals (chloroform, PERC, and PDB) among different race/ethnicity groups (see Table 3). For benzene, Mexican Americans had higher exposures than both non-Hispanic whites and blacks. Churchill et al.
(2001) reported that Mexican Americans were less likely to have elevated blood benzene levels than non-Hispanic whites in their analysis of the NHANES-III blood VOC data. However, as pointed out by Lin et al. (2008), the blood–air relationships of BTEX were influenced by factors such as age, gender, BMI, and smoking. Thus, higher benzene exposures would not necessarily correspond to higher benzene blood levels. For chloroform, non-Hispanic blacks had higher exposures than non-Hispanic whites and Mexican Americans. Churchill et al. (2001) reported a similar finding that non-Hispanic blacks were more likely to have elevated chloroform blood levels than non-Hispanic whites in the NHANES-III blood VOC study. Churchill et al. (2001) also indicated a protective effect of rural residence, where increased well-water use leads to less exposure to chlorine-treated water. By examining the questionnaire responses to "description of street where you live", we found that non-Hispanic whites had a significantly higher proportion of rural residence than non-Hispanic blacks (white proportion: 0.20, black proportion: 0.08, p-value: 0.001), which is consistent with their lower chloroform exposures.
Table 2
Gender differences in personal VOC exposures (µg m⁻³).

| Chemical | Male mean | Male N | Female mean | Female N | p-value |
|---|---|---|---|---|---|
| benzene (a) | 6.33 | 295 | 4.67 | 352 | <0.001 |
| toluene | 28.19 | 282 | 23.42 | 347 | 0.196 |
| ethylbenzene (a) | 6.54 | 289 | 3.99 | 347 | <0.001 |
| o-xylene (a) | 6.46 | 288 | 3.84 | 352 | <0.001 |
| m,p-xylene (a) | 19.43 | 290 | 11.44 | 352 | <0.001 |
| chloroform | 2.65 | 296 | 2.90 | 355 | 0.051 |
| tetrachloroethene | 7.71 | 292 | 3.23 | 350 | 0.181 |
| 1,4-dichlorobenzene | 30.11 | 291 | 43.22 | 350 | 0.517 |

(a) Statistically significant differences were observed at the significance level of 0.05.
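For readers who wish to run two-group comparisons of the kind summarized in Table 2, the sketch below computes a pooled-variance Student's t statistic on natural-log-transformed concentrations. It is illustrative only: the data are synthetic, the function names `pooled_t` and `approx_two_sided_p` are hypothetical, and the p-value uses a normal approximation to the t distribution, which is assumed here to be adequate for groups of several hundred participants.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (fmean(a) - fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def approx_two_sided_p(t):
    """Large-sample approximation: treat t as a standard normal deviate."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Synthetic concentrations (not NHANES data), log-normally distributed.
rng = random.Random(42)
male = [rng.lognormvariate(1.0, 1.0) for _ in range(295)]
female = [rng.lognormvariate(0.2, 1.0) for _ in range(352)]

# Compare the two groups on the natural-log scale.
t_stat, df = pooled_t([math.log(x) for x in male], [math.log(x) for x in female])
p_value = approx_two_sided_p(t_stat)
print(f"t = {t_stat:.2f}, df = {df}, approximate p = {p_value:.3g}")
```

With small groups, an exact t distribution from a statistics package would be the safer choice for the p-value.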
Table 3
Race/ethnicity differences in personal exposures (µg m⁻³) to benzene, chloroform, tetrachloroethene (PERC), and 1,4-dichlorobenzene (PDB).

| Chemical | Group 1 | Group 2 | Group mean difference (a) (grp1 - grp2) | 95% C.I. of group mean difference (a) |
|---|---|---|---|---|
| benzene | Mexican American (N = 185) | Non-Hispanic white (N = 271) | 0.42 | (0.17, 0.67) |
| benzene | Mexican American (N = 185) | Non-Hispanic black (N = 126) | 0.40 | (0.10, 0.70) |
| chloroform | Mexican American (N = 186) | Non-Hispanic black (N = 128) | -0.60 | (-0.95, -0.26) |
| chloroform | Non-Hispanic white (N = 273) | Non-Hispanic black (N = 128) | -0.50 | (-0.83, -0.17) |
| PERC | Mexican American (N = 182) | Non-Hispanic black (N = 126) | -0.53 | (-0.98, -0.07) |
| PDB | Mexican American (N = 182) | Non-Hispanic white (N = 269) | 1.12 | (0.62, 1.61) |
| PDB | Non-Hispanic black (N = 125) | Non-Hispanic white (N = 269) | 1.00 | (0.44, 1.56) |

(a) Natural logarithmic-transformed concentration.
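Intervals of the kind reported in Table 3 can be illustrated with a Bonferroni adjustment applied to all pairwise group comparisons. The sketch below is a simplified illustration under stated assumptions rather than the procedure used in the study: it uses a normal quantile instead of a t quantile, pools the variance within each pair only, and runs on synthetic log-scale data. The function name `bonferroni_cis` is hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist, fmean, variance


def bonferroni_cis(groups, alpha=0.05):
    """Pairwise mean differences with Bonferroni-adjusted confidence intervals.

    groups maps a group name to a list of log-transformed concentrations.
    Returns {(g1, g2): (difference, lower, upper)} for mean(g1) - mean(g2).
    """
    pairs = list(combinations(groups, 2))
    # Split alpha across all comparisons, two-sided.
    z = NormalDist().inv_cdf(1 - alpha / (2 * len(pairs)))
    out = {}
    for g1, g2 in pairs:
        a, b = groups[g1], groups[g2]
        sp2 = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
        se = math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
        diff = fmean(a) - fmean(b)
        out[(g1, g2)] = (diff, diff - z * se, diff + z * se)
    return out


# Synthetic log-scale data (not NHANES); group sizes are illustrative.
rng = random.Random(7)
groups = {
    "Mexican American": [rng.gauss(0.6, 1.0) for _ in range(185)],
    "Non-Hispanic white": [rng.gauss(0.2, 1.0) for _ in range(271)],
    "Non-Hispanic black": [rng.gauss(0.2, 1.0) for _ in range(126)],
}
for pair, (diff, lo, hi) in bonferroni_cis(groups).items():
    flag = "significant" if lo > 0 or hi < 0 else "not significant"
    print(pair, f"{diff:.2f} ({lo:.2f}, {hi:.2f})", flag)
```

A difference is flagged as significant when its adjusted interval excludes zero, which is the reading applied to the intervals in Table 3.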
For PERC, non-Hispanic blacks had significantly higher exposures than Mexican Americans. Churchill et al. (2001) reported higher PERC blood levels in non-Hispanic blacks than non-Hispanic whites. By examining the questionnaire responses to "visiting a dry-cleaning shop or wearing dry-cleaned clothes", we found that non-Hispanic blacks had a higher percentage (19.1%) of using dry-cleaning services than Mexican Americans (14.4%) and non-Hispanic whites (13.7%), but the differences were not significant. For PDB, Mexican Americans and non-Hispanic blacks had higher exposures than non-Hispanic whites. Churchill et al. (2001) also reported the same race/ethnicity differences in PDB blood levels. However, we did not find race/ethnicity differences in the questionnaire responses on the use of air fresheners or mothballs (i.e. sources of PDB). Elliott and Loomis (2008) found that non-whites were more likely to use air fresheners in vehicles than whites. However, the NHANES questionnaire did not specify the location of using these products (Elliott and Loomis, 2008). Therefore, the participants might have interpreted the question as asking about their use of these products at home only, resulting in the non-significant race differences.

3.1.3. Education

Significant differences were observed in exposures to benzene, PERC, and PDB with respect to the education levels of the participants (see Table 4). The less educated participants (i.e. less than high school) had higher benzene exposures than the more educated participants (high school diploma and more than high school). A similar pattern was found for PDB exposures. Education differences in benzene and PDB exposures might be due to the association with race/ethnicity. Mexican Americans had a significantly higher percentage of "less than high school" education (59.7%) than non-Hispanic whites (10.0%) and non-Hispanic blacks (30.4%). For PERC, however, higher exposures were observed in the more educated participants.
From the questionnaire responses, we found that the participants with "more than high school" education had a higher percentage of using dry-cleaning services (17.0%) than the participants with "less than high school" education (11.8%), but the difference was not significant.

3.1.4. Income

Significant correlations were found between poverty income ratios and exposures to benzene, chloroform, PERC, and PDB. For
benzene, chloroform, and PDB, poverty income ratios had negative correlations with personal exposures, indicating that participants with lower family incomes had higher exposures. On the contrary, poverty income ratios had a positive correlation with PERC exposures, indicating that participants with higher family incomes had higher exposures. Family incomes were significantly associated with race/ethnicity. Mexican Americans and non-Hispanic blacks had significantly lower family incomes than non-Hispanic whites. Thus, the effects of family incomes on exposures could be attributed to the associated race/ethnicity differences in exposures to benzene, chloroform, and PDB as revealed in Section 3.1.2. For PERC, it is reasonable to expect that participants with higher incomes were more likely to use dry-cleaning services, resulting in higher exposures.

3.1.5. Smoking

Significant differences were observed in exposures to benzene, ethylbenzene, m-,p-xylene, and PERC between smokers (including ETS) and non-smokers (see Table 5). Smokers (including ETS) had higher exposures to benzene, ethylbenzene, and m,p-xylene, since these chemicals are common components of cigarette smoke (Ashley et al., 1995; Wallace et al., 1996). However, smokers (including ETS) had lower PERC exposures than non-smokers.

3.2. CART analysis

Fig. 1 presents the constructed optimal CART model for PDB exposures, showing the hierarchical structure of a decision tree (i.e. a dendrogram). Each node of the dendrogram shows the number of participants (n) classified by the condition of the exposure factor above the node and the geometric mean of the personal air concentrations of the group. For instance, the primary exposure factor identified was poverty income ratio (PIR): higher PDB exposures were found for the 457 participants with PIR less than 3.785 (Node 3), and the geometric mean of the PDB personal air concentrations for this group was 5.37 µg m⁻³.
The subsequent exposure division factors included: education, race/ethnicity, hours spent indoors at home, and smoking (or ETS). The exposure patterns characterized by family income, education, and race/ethnicity are similar to the results shown in Sections 3.1.4, 3.1.3, and 3.1.2, respectively. The
Table 4
Education differences in personal exposures (µg m⁻³) to benzene, tetrachloroethene (PERC), and 1,4-dichlorobenzene (PDB).

| Chemical | Group 1 | Group 2 | Group mean difference (a) (grp1 - grp2) | 95% C.I. of group mean difference (a) |
|---|---|---|---|---|
| benzene | less than high school (N = 201) | high school diploma (N = 158) | 0.27 | (0.03, 0.51) |
| benzene | less than high school (N = 201) | more than high school (N = 288) | 0.28 | (0.07, 0.48) |
| PERC | less than high school (N = 198) | more than high school (N = 285) | -0.31 | (-0.62, 0.006) (b) |
| PDB | less than high school (N = 197) | more than high school (N = 285) | 0.94 | (0.53, 1.36) |

(a) Natural logarithmic-transformed concentration.
(b) Statistically significant difference was not observed in the ANOVA analysis, since the 95% C.I. contained zero. However, based on the non-parametric Kruskal–Wallis test, significant difference was found between the two education groups (p-value: 0.026).
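Footnote (b) contrasts a parametric interval with a rank-based test. As a rough illustration of the latter, the sketch below computes the Kruskal–Wallis H statistic for two groups using only the Python standard library. It omits the tie correction, and it obtains the p-value from the chi-square distribution with one degree of freedom through its relation to the standard normal distribution. The function names are hypothetical and the input values are toy numbers, not study data.

```python
from statistics import NormalDist


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H (no tie correction) and chi-square(1) p-value."""
    ranks = average_ranks(list(a) + list(b))
    n = len(ranks)
    ra, rb = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (ra ** 2 / len(a) + rb ** 2 / len(b)) - 3 * (n + 1)
    h = max(h, 0.0)  # guard against tiny negative rounding error
    # For 1 degree of freedom, P(chi2 > h) = 2 * (1 - Phi(sqrt(h))).
    p = 2 * (1 - NormalDist().cdf(h ** 0.5))
    return h, p


h, p = kruskal_two_groups([1, 2, 3], [4, 5, 6])
print(f"H = {h:.3f}, p = {p:.4f}")
```

Because the test works on ranks, it gives the same result whether or not the concentrations are log-transformed, which is one reason it can disagree with an interval computed on the log scale.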
Table 5
Smoking differences in personal VOC exposures (µg m⁻³).

| Chemical | Smoker or ETS mean | Smoker or ETS N | Non-smoker mean | Non-smoker N | p-value |
|---|---|---|---|---|---|
| benzene (a) | 6.60 | 160 | 5.05 | 460 | <0.001 |
| toluene | 28.82 | 153 | 24.39 | 450 | 0.345 |
| ethylbenzene (a) | 6.86 | 157 | 4.51 | 454 | 0.002 |
| o-xylene | 6.14 | 158 | 4.68 | 455 | 0.074 |
| m,p-xylene (a) | 20.11 | 159 | 13.52 | 456 | 0.016 |
| chloroform | 2.81 | 160 | 2.78 | 464 | 0.351 |
| tetrachloroethene (a) | 3.91 | 159 | 5.74 | 456 | 0.014 |
| 1,4-dichlorobenzene | 26.22 | 160 | 40.76 | 454 | 0.078 |

(a) Statistically significant differences were observed at the significance level of 0.05.
participants who met the following three conditions were in the highest PDB exposure group (Node 9): (1) PIR less than 3.785, (2) education less than high school, and (3) spent less than 27 h at home. Interestingly, smoking (or ETS) decreased PDB exposures. As described in the above socio-demographic analysis, PDB exposures might be related to the use of air fresheners in cars. Smokers might be
more likely to leave car windows open when driving than non-smokers, thus decreasing PDB exposures.

For benzene exposures, the primary exposure factor identified in the CART model was home ventilation condition, where leaving windows open decreased benzene exposures. For the other chemicals of BTEX (toluene, ethylbenzene, o-xylene, and m,p-xylene), the primary exposure factor was consistently identified as the activity of using solvent-related products (i.e. paint thinner, brush cleaner, or furniture stripper). Since these four chemicals are common components of organic solvents, it is expected to observe higher exposures from using solvent-related products. The other exposure factors commonly identified in the CART models of the BTEX chemicals included: smoking, having an attached garage, hours spent at work/school, and pumping gas. Having an attached garage in home increased BTEX exposures, since attached garages can contribute to the elevation of indoor air concentrations for gasoline-related VOCs (Adgate et al., 2004; Batterman et al., 2007). Spending more hours at work/school increased TEX exposures (i.e. solvent-related chemicals), suggesting that the exposure source might be related to the participants' occupational activities. Pumping gas also increased BTEX exposures. For chloroform and PERC, the primary exposure factors were "description of street where you live" and "visiting a dry-cleaning shop or wearing dry-cleaned clothes", respectively.

Fig. 1. Constructed optimal CART model (i.e. a dendrogram) for 1,4-dichlorobenzene (PDB) exposures. The exposure factors were poverty income ratio (PIR), education, race/ethnicity, hours spent indoors at home, and smoking. Each split of the dendrogram is labeled with an exposure factor and the condition determining the split. Each node of the dendrogram is labeled with the number of participants (n) classified by the condition above the node and the geometric mean of the personal air concentrations in the group. The group with the lowest PDB exposures was the participants with PIR greater than or equal to 3.785 (Node 2: n = 187, geometric mean = 2.23 µg m⁻³). The participants who met the following three conditions were in the highest PDB exposure group (Node 9: n = 53, geometric mean = 17.12 µg m⁻³): (1) PIR less than 3.785, (2) education less than high school, and (3) spent less than 27 h at home.

Node 1 (n = 644): split on poverty income ratio (PIR)
  PIR >= 3.785: Node 2 (mean = 2.23, n = 187)
  PIR < 3.785: Node 3 (mean = 5.37, n = 457): split on education
    High school diploma or more than high school: Node 4 (mean = 4.06, n = 276): split on race/ethnicity
      Others: Node 6 (mean = 3.32, n = 203)
      Non-Hispanic black: Node 7 (mean = 7.10, n = 73): split on smoker (or ETS)
        Yes: Node 10 (mean = 2.32, n = 23)
        No: Node 11 (mean = 11.82, n = 50)
    Less than high school: Node 5 (mean = 8.25, n = 181): split on hours spent indoors at home
      >= 26.5: Node 8 (mean = 6.11, n = 128)
      < 26.5: Node 9 (mean = 17.12, n = 53)

3.3. Bagging analysis
For benzene, the most significant exposure factor was leaving windows open at home, which had the highest count of the importance measure, followed by race/ethnicity with a slightly lower count (see Table 6). These two factors had much higher counts than the other identified exposure factors, indicating that their impacts on benzene exposures were far more significant than those of the other factors. The dominant factor of leaving windows open at home suggested that significant indoor sources of benzene might be present at home and could contribute to personal exposures. Compared to the single CART analysis, the Bagging analysis identified two additional time-activity factors (i.e. hours spent outdoors, and breathing fumes from gasoline) and three additional socio-demographic factors (poverty income ratio, education, and gender). In Section 3.1.1, we showed that significant gender differences were observed in hours spent indoors at work/school and hours spent outdoors. Since gender was identified as a significant factor for benzene exposures, the associated time-activity patterns impacted by gender were likely to be revealed as significant exposure factors. Pursuing gasoline-related activities (i.e. pumping gas and breathing fumes from gasoline) increased exposures to benzene. The factors of poverty income ratio and education were identified due to their correlation with race/ethnicity.

For the other chemicals of BTEX (toluene, ethylbenzene, o-xylene, and m-,p-xylene), the activity of using solvent-related products (i.e. paint thinner, brush cleaner, or furniture stripper) was consistently identified as the most significant exposure factor (see Table 6), which is the same as the single CART analysis results. The subsequent significant exposure factors identified were different between toluene and the other three chemicals (ethylbenzene, o-xylene, and m-,p-xylene). For toluene, home ventilation (i.e.
windows open condition) was identified with the second highest counts, followed by breathing fumes from gasoline and pumping gas. For ethylbenzene, o-xylene, and m,p-xylene, pumping gas and breathing fumes from diesel fuel or kerosene were consistently identified with the second and third highest counts, respectively. The subsequent exposure factors included: having an attached garage in home, hours spent at work/school, hours spent at home, and gender.
Table 6. Significant exposure factors for BTEX identified by the Bagging analysis.ᵃ

| Chemical | Exposure factor | Counts |
|---|---|---|
| benzene | leaving windows open at home | 251 |
| benzene | race/ethnicity | 245 |
| benzene | having an attached garage | 73 |
| benzene | smoking status | 70 |
| benzene | hours spent outdoors | 66 |
| benzene | pumping gas | 66 |
| benzene | breathing fumes from gasoline | 66 |
| benzene | education | 63 |
| benzene | gender | 63 |
| benzene | poverty income ratio | 62 |
| benzene | hours spent at work/school | 57 |
| toluene | paint thinner, brush cleaner, stripper | 372 |
| toluene | leaving windows open at home | 119 |
| toluene | breathing fumes from gasoline | 107 |
| toluene | pumping gas | 99 |
| ethylbenzene | paint thinner, brush cleaner, stripper | 357 |
| ethylbenzene | pumping gas | 154 |
| ethylbenzene | diesel fuel or kerosene | 84 |
| ethylbenzene | having an attached garage | 63 |
| ethylbenzene | hours spent at work/school | 51 |
| o-xylene | paint thinner, brush cleaner, stripper | 321 |
| o-xylene | pumping gas | 142 |
| o-xylene | diesel fuel or kerosene | 89 |
| o-xylene | gender | 78 |
| o-xylene | having an attached garage | 72 |
| o-xylene | hours spent at home | 68 |
| o-xylene | hours spent at work/school | 63 |
| m-,p-xylene | paint thinner, brush cleaner, stripper | 268 |
| m-,p-xylene | pumping gas | 146 |
| m-,p-xylene | diesel fuel or kerosene | 113 |
| m-,p-xylene | hours spent at work/school | 82 |
| m-,p-xylene | having an attached garage | 74 |
| m-,p-xylene | hours spent at home | 70 |
| m-,p-xylene | gender | 65 |

ᵃ Bagging was conducted with N = 1000, and exposure factors with >50 counts were identified.
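The counts reported above come from refitting CART on bootstrap resamples and tallying which factors the trees select. The Python sketch below illustrates that idea on synthetic data; it is not the authors' implementation. It assumes, purely for illustration, yes/no factors, a one-split tree in which only the root split is tallied, and hypothetical names (`best_root_split`, `bagging_counts`, `make_toy_data`, and the three toy factors). The importance measure used in this study is defined in the methods section and may count factors differently.

```python
import random
from collections import Counter


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_root_split(rows, y):
    """Name of the binary factor whose yes/no split minimizes total SSE."""
    best_name, best_score = None, float("inf")
    for name in rows[0]:
        no = [yi for r, yi in zip(rows, y) if r[name] == 0]
        yes = [yi for r, yi in zip(rows, y) if r[name] == 1]
        if not no or not yes:  # factor is constant in this sample
            continue
        score = sse(no) + sse(yes)
        if score < best_score:
            best_name, best_score = name, score
    return best_name


def bagging_counts(rows, y, n_boot=1000, seed=0):
    """Tally how often each factor is the root split over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(y)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        name = best_root_split([rows[i] for i in idx], [y[i] for i in idx])
        if name is not None:
            counts[name] += 1
    return counts


def make_toy_data(n=200, agreement=0.8, seed=1):
    """Synthetic data (all names made up): exposure depends on one factor,
    a second factor is a correlated proxy of it, a third is unrelated."""
    rng = random.Random(seed)
    rows, y = [], []
    for _ in range(n):
        windows_open = rng.randint(0, 1)
        proxy = windows_open if rng.random() < agreement else 1 - windows_open
        rows.append({
            "windows_open": windows_open,
            "correlated_proxy": proxy,
            "unrelated": rng.randint(0, 1),
        })
        y.append(2.0 * windows_open + rng.gauss(0.0, 1.0))
    return rows, y


if __name__ == "__main__":
    rows, y = make_toy_data()
    counts = bagging_counts(rows, y, n_boot=1000)
    # Keep factors above the threshold given in the table footnote (>50 counts).
    selected = {name: c for name, c in counts.items() if c > 50}
    print(counts.most_common())
    print(selected)
```

In this toy setup the factor that actually drives exposure should dominate the tally. Raising `agreement` toward 1 makes the proxy harder to distinguish from it, so the proxy can start to win some replicates, which is one way correlated factors may both accumulate counts.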
For chloroform, race/ethnicity was identified as the most significant exposure factor (see Table 7). In the single CART analysis, the primary exposure division factor was "description of street where you live", which was identified here with the third highest count of the importance measure. However, the two factors were correlated with each other, as shown in Section 3.1.2. The other identified exposure factors were age, poverty income ratio, hours spent at home, home ventilation (i.e. whether windows were left open), and storing paints or fuels inside the home. In Section 3.1.1, a significant negative correlation was revealed between age and chloroform exposures. Poverty income ratio was identified because of its correlation with race/ethnicity. Hours spent at home had a significant positive correlation with chloroform exposures (Spearman correlation coefficient: 0.091, p-value: 0.021), suggesting that spending more time at home results in higher chloroform exposure. One possible explanation is that spending more time at home increases tap water use, thus resulting in higher chloroform exposures. Leaving windows open at home significantly decreased chloroform exposures (p-value: 0.0018).

For PERC, the most significant exposure factor was "visiting a dry-cleaning shop or wearing dry-cleaned clothes" (see Table 7), which is consistent with the single CART model. The additional exposure factors identified were hours spent at work/school and poverty income ratio. Hours spent at work/school had a significant positive correlation with PERC exposures (Spearman correlation coefficient: 0.15, p-value < 0.001), suggesting that participants who spent more hours at work/school had higher PERC exposures. To understand the underlying cause, the association between using dry-cleaning services and hours spent at work/school was examined. The participants
who used dry-cleaning services spent significantly more hours at work/school than the participants who did not (p-value: 0.011). Poverty income ratio was shown to have a significant positive correlation with PERC exposures in Section 3.1.4.

For PDB, the significant exposure factors identified were poverty income ratio, education, and race/ethnicity, similar to the single CART analysis results.

Table 7. Significant exposure factors for chloroform, PERC, and PDB identified by the Bagging analysis.ᵃ

| Chemical | Exposure factor | Counts |
|---|---|---|
| chloroform | race/ethnicity | 183 |
| chloroform | age | 162 |
| chloroform | description of street where you live | 129 |
| chloroform | poverty income ratio | 111 |
| chloroform | hours spent at home | 75 |
| chloroform | leaving windows open at home | 64 |
| chloroform | storing paints or fuels at home | 33 |
| PERC | visiting a dry-cleaner or wearing dry-cleaned clothes | 559 |
| PERC | hours spent at work/school | 118 |
| PERC | poverty income ratio | 95 |
| PDB | poverty income ratio | 445 |
| PDB | education | 283 |
| PDB | race/ethnicity | 142 |

ᵃ Bagging was conducted with N = 1000, and exposure factors with >50 counts were identified.

3.4. Strengths and limitations of CART and Bagging

In general, the Bagging analysis revealed more significant exposure factors than the single CART analysis because of its robustness to collinearity. Thus, some hidden significant exposure factors that were not characterized in the single CART analysis can be identified in the Bagging analysis. Further, the Bagging analysis can prevent some spurious exposure factors from the single CART analysis from being identified again as significant. For instance, kitchen stove type and house age were identified as significant exposure factors for chloroform in the single CART analysis (results not shown), for which no reasonable interpretation could be provided. However, neither factor was identified as significant in the Bagging analysis for chloroform exposures. A similar situation arose for PERC exposures, where kitchen stove type appeared in the single CART model (results not shown) but was not identified as a significant exposure factor in the Bagging analysis.

The limitation of the Bagging analysis is that the counts of the importance measure can only identify the significance of exposure factors quantitatively, but cannot reveal how the identified factors affect exposures. Therefore, the Bagging analysis has to rely on CART for
this information. Further, CART can only perform binary splits when classifying the effects of exposure factors, and cannot reveal the complete effects of factors with multiple categories, such as race/ethnicity. A multiple comparison test was used to resolve this issue in this study.

4. Conclusions

We have demonstrated a systematic data analysis framework for identifying robust and significant VOC exposure factors among the socio-demographic, housing characteristic, and time-activity variables in the 1999–2000 NHANES VOC data. Significant differences in exposures to the three chlorinated chemicals (chloroform, PERC, and PDB) were generally found in relation to the socio-demographic factors of race/ethnicity and family income. This was mainly due to the associations between race/ethnicity and urban residence, race/ethnicity and use of air freshener in the car, and family income and use of dry-cleaners, which can in turn affect exposures to chloroform, 1,4-dichlorobenzene (PDB), and tetrachloroethene (PERC), respectively. For BTEX, housing characteristics (leaving home windows open and having an attached garage) and personal activities related to the use of fuels or solvent-related products played more significant roles in influencing exposures. Significant differences in BTEX exposures were also commonly found in relation to gender, due to associated significant differences in time spent at work/school and outdoors. The coupling of CART and Bagging techniques was an effective tool for characterizing a robust set of significant VOC exposure factors, which conventional statistical techniques could not accomplish. Identification of these significant VOC exposure factors can be used to generate hypotheses for future investigations of possible significant VOC exposure sources and pathways in the general U.S. population.

Acknowledgement

The support for this study has been provided by the Mickey Leland National Urban Air Toxics Research Center (NUATRC) and the U.S.
Environmental Protection Agency (USEPA – Cooperative Agreement CR-83162501). The viewpoints expressed in this work are solely the responsibility of the authors and do not necessarily reflect the views of NUATRC and USEPA or their contractors.