Journal of Hazardous Materials 383 (2020) 121154
Journal of Hazardous Materials journal homepage: www.elsevier.com/locate/jhazmat
Application of quantitative structure-activity relationship (QSAR) model in comprehensive human health risk assessment of PAHs, and alkyl-, nitro-, carbonyl-, and hydroxyl-PAHs laden in urban road dust
Gustav Gbeddy a,⁎, Prasanna Egodawatta a, Ashantha Goonetilleke a, Godwin Ayoko a, Lan Chen b
a Science and Engineering Faculty, Queensland University of Technology (QUT), GPO Box 2434, Brisbane, 4001, Queensland, Australia
b Institute for Future Environments, Queensland University of Technology (QUT), GPO Box 2434, Brisbane, 4001, Queensland, Australia
ARTICLE INFO
Editor: Daniel CW Tsang
Keywords: Carcinogenic human health risks; Transformed PAH products; Toxicity equivalency factors; Applicability domain

ABSTRACT
The carcinogenic human health risks (CHHR) posed by exposure to PAHs and transformed PAH products (TPPs) are currently inconclusive because toxicity equivalency factors (TEFs) are lacking for most TPPs, although some of these pollutants are more potent carcinogens. This study examined the applicability of a quantitative structure-activity relationship (QSAR) model in predicting the TEFs of PAHs and TPPs in order to holistically evaluate the CHHR posed by exposure to these pollutants in road dust from Gold Coast, Australia. Statistical evaluation via ten metrics shows that the partial least-squares regression (PLSR1) model has more statistical power in predicting TEF than multiple linear regression (MLR) within the relevant applicability domain. For instance, the predicted residual sum of squares (PRESS) and standard deviation of error of prediction (SDEP) for PLSR1 are closer to zero than those of MLR. The total cancer risk estimated using the QSAR model derived TEFs, together with the original TEFs for outliers, gives a more holistic incremental lifetime cancer risk for children and adults. Potential cancer risk exists for adults with this approach, whereas reliance on only the originally available TEFs leads to a negligible risk diagnosis. The application of a QSAR model in assessing CHHR due to PAH and TPP exposures is viable.
⁎ Corresponding author. E-mail addresses: [email protected], [email protected] (G. Gbeddy).
https://doi.org/10.1016/j.jhazmat.2019.121154 Received 27 May 2019; Received in revised form 24 August 2019; Accepted 3 September 2019 Available online 04 September 2019 0304-3894/ © 2019 Elsevier B.V. All rights reserved.
1. Introduction

Polycyclic aromatic hydrocarbons (PAHs) have attracted significant research interest globally due to their detrimental ecological and health effects, and widespread environmental distribution. Research on PAH monitoring and evaluation is fast-tracked by readily available commercial standards and efficient extraction and analytical methods (Andersson and Achten, 2015). In addition, the human health risks associated with exposure to most parent PAHs have been investigated owing to the availability of the corresponding toxicity equivalency factors (TEFs). The TEF was established to evaluate structurally related compounds having a similar mechanism of toxicological action. It is often stated as the ratio of the median effective concentration (EC50) of benzo(a)pyrene (BaP), the indicator toxic PAH, to the EC50 of other individual PAHs (Delistraty, 1997). Jung et al. (2010) noted that the use of TEFs in human health risk assessment potentially constitutes an accurate evaluation of environmental exposure to PAHs. However, the lack of TEF values for most transformed PAH products (TPPs) such as nitro-PAHs (N-PAHs), carbonyl-PAHs (C-PAHs) and hydroxyl-PAHs (HO-PAHs) has significantly hampered the associated health risk assessment. Some of these polar TPPs exhibit markedly higher carcinogenic and mutagenic properties than their corresponding parent PAHs (Cochran et al., 2012). For example, 6-nitrochrysene and 1,6-dinitropyrene have TEF values that are ten times that of BaP. As a result, these two pollutants are considered more carcinogenic and mutagenic than BaP (Andersson and Achten, 2015; Jung et al., 2010). In this regard, human health risk assessment of PAHs devoid of TPPs can best be described as inconclusive.

Accordingly, Andersson and Achten (2015) noted that the over-reliance on toxicity assessments using the sixteen USEPA listed parent PAHs has often led to an underestimation of the potential toxicity of PAHs laden in various environmental matrices. The vast number of individual TPPs makes it practically difficult to perform individual toxicological tests. Quantifying the TEFs and other toxicity parameters for all hazardous TPPs is often arduous, time-consuming and costly (Kunal et al., 2015; Liu et al., 2006). The application of a quantitative structure-activity relationship (QSAR) model is a viable statistical approach to addressing this challenge. QSAR can be defined as a mathematical relationship between chemical characteristics and quantitatively expressed pharmacological activities, toxicities and/or properties for a series of compounds. According to the Organization for Economic Co-operation and Development (OECD), every QSAR model must entail five (5) key elements: a well-defined endpoint; an unambiguous algorithm for the model development; a clearly stipulated Applicability Domain (AD); relevant metrics for ascertaining the robustness, goodness of fit and predictability; and finally, a mechanistic explanation where possible (Gadaleta et al., 2016; Kunal et al., 2015; OECD, 2007). The predictive power of a validated QSAR model can be the basis for prioritizing toxicological testing of specific TPPs. The justification for the application of QSAR in regulatory decision making concerning the toxicities of TPPs is therefore highly dependent on the model's capability for quantifying or predicting the unknown TEF values with some degree of confidence (Liu et al., 2006). Thorough scrutiny of existing research indicates the absence of QSAR application in predicting the TEF of PAHs and TPPs, thereby resulting in potential underestimation of the human health risk from these hazardous pollutants.

In this regard, this study aimed to develop a validated QSAR model capable of predicting the TEF of TPPs within an AD and to apply the output in examining the human carcinogenic health risk posed by exposure to PAHs and TPPs laden in urban road dust. In order to accomplish this task, a comprehensive physicochemical data set on PAHs and TPPs was established and subsequently subjected to appropriate multivariate regression methods in line with QSAR modeling. The most efficient and optimized regression method was then selected for the QSAR model. It is anticipated that the robust QSAR model developed in this study will enhance the current understanding of the cancer risk posed by PAHs and TPPs. Therefore, the research outcome will benefit scientists, policy makers and society in general.
2. Methods

2.1. Data preparation and processing

A generic QSAR dataset consisting of one (1) end-point (logTEF or TEF response variable), sixteen (16) predictor variables and thirty (30) PAHs and TPPs is shown in Table S1 of the Data in Brief supplementary information. The predictors consisted of eight (8) experimental properties (octanol-water partitioning coefficient (logKow), vapour pressure (Vp), boiling point (BP), melting point (MP), water solubility (Sw), molecular weight (Mw), number of rings (NOR) and number of aromatic rings (NOAR)), and eight (8) predicted properties (density (ρ), enthalpy of vapourization (Hv), molar refractivity (Rf), polarizability (Pl), surface tension (St), molar volume (mvol), soil adsorption coefficient (logKoc) and bioaccumulation estimate from logKow (logBCF)). These parameters were widely available for all the observations and were considered fundamental and significant chemical information. The predictor values were retrieved from ChemSpider.com (2019). The TEF response variable for the 30 observations shown in Table S1 was obtained from Bortey-Sam et al. (2015) and Wei et al. (2015). In order to reduce the effect of redundant and noisy variables among the 16 predictors, data reduction was undertaken using factor analysis (FA) as elaborated in the Data in Brief supplementary information. Data reduction is a critical step since it enhances the performance of the developed model and also facilitates its mechanistic interpretation (Mehmood et al., 2012). The StatistiXL plug-in (version 1.8) for Microsoft Office Professional Plus Excel 2010 was used for the data reduction. From the results of the FA, four descriptors, namely Mw, Hv, BP and logVp, were found to be highly relevant to the response variable and were therefore selected.
The application of multiple linear regression (MLR) during the QSAR model development facilitated the identification of outlier observations (observations having residuals greater than three times the standard deviation from the mean residual). It is reported that for reliable QSAR model development, the ratio of the number of observations (n) to descriptors (p) must be 5:1 (Kunal et al., 2015). In this regard, using the results of both FA and MLR, 20 observations and 4 descriptors were significant in this research, as shown in Table 1.

Table 1 QSAR Table.
Selected predictor variables: Mw (g/mol), Hv (kJ/mol), BP (°C), logVp (Vp in mmHg). Response variables: logTEF, TEF.

PAHs                                       ID      Mw     Hv    BP     logVp         logTEF  TEF
Acenaphthene                               ACE     154.2  49.7  279    −2.66756154   −3.000  0.001
Acenaphthylene                             ACT     152.2  51.7  280    −2.17522354   −3.000  0.001
Anthracene                                 ANT     178.2  55.8  339.9  −5.18508682   −3.301  0.0005
Benz[a]anthracene                          BaAN    228.3  66.7  437.6  −6.67778071   −2.301  0.005
Benzo[ghi]perylene                         BghiPE  276.3  74.1  486.3  −9.05799195   −2.000  0.01
Chrysene                                   CHR     228.3  67.9  448    −8.20551195   −2.000  0.01
Fluorene                                   FLR     166.2  51.2  295    −3.22184875   −3.301  0.0005
Naphthalene                                NAP     128.2  43.9  217.9  −1.07058107   −3.000  0.001
Phenanthrene                               PHE     178.2  55.8  340    −3.91721463   −3.301  0.0005
Pyrene                                     PYR     202.3  63    404    −5.34678749   −3.000  0.001
2-Methylnaphthalene                        2MNAP   142.2  45.7  241.1  −1.25963731   −3.000  0.001
1-Methylnaphthalene                        1MNAP   142.2  46    240    −1.1739252    −2.602  0.0025
Perylene                                   PRL     252.3  70.2  442.8  −8.2798407    −3.000  0.001
Coronene                                   CRN     300.4  77    493.5  −9.35654732   −2.000  0.01
2-Nitrofluorene                            2NFLR   211.2  59.8  495.9  −11.5767541   −2.000  0.01
5-Nitroacenaphthene                        5NAC    199.2  60.1  478.3  −10.9430951   −2.000  0.01
9-Nitroanthracene                          9NAN    223.2  62.9  543.2  −13.2932822   −2.495  0.0032
3-Nitrofluoranthene                        3NFL    247.3  67.6  586.7  −14.8996295   −2.585  0.0026
4H-Cyclopenta(def)phenanthren-4-one        CPPHN   204.2  66.4  375.8  −5.57348874   −2.301  0.005
7H-Benz[de]anthracen-7-one (Benzanthrone)  BANN    230.3  69.3  403.2  −6.65560773   −2.409  0.0039
Range: Mw 128.2–300.4; Hv 43.9–77; BP 217.9–586.7; logVp −14.89963 to −1.07058107; logTEF −3.301 to −2.000; TEF 0.0005–0.01

2.2. Study area and sampling

In order to fully assess the health risk posed by these deleterious micro-organic pollutants to children and adults, the predicted TEF values were applied to the PAH and TPP concentrations measured in a 0.45–75 μm road dust sample collected from the De Haviland residential area, Benowa, Gold Coast, in the Queensland region of Australia. PAH concentrations vary inversely with particle size, and a major association has been found between exposure to particulate matter (PM) below 2.5 μm and cardiovascular, respiratory and inflammatory diseases; hence the choice of the 0.45–75 μm particle size range in this study (Amato et al., 2014; Zhang et al., 2019). The study area is marked by rapid population growth and is a significant tourist destination, thus serving as an important research location. Road dust was collected using a Delonghi Aqualand vacuum cleaner. Further information on the study area, sampling protocol and sample processing can be found in Gbeddy et al. (2018).

2.3. Sample extraction and analytes analysis

The pressurized fluid extraction (PFE) method employed in this study was adapted and modified from a previous study conducted by Lundstedt et al. (2006) to simultaneously extract and separate parent PAHs and carbonyl-PAHs from contaminated soil. A glass filter paper was placed at the outlet of the 5 mL extraction cell and a mixture of 2 g deactivated (2%) and 1 g activated alumina was packed into the cell for extract clean-up, followed by another filter paper. A homogeneous mixture of 0.15 g sample and 0.075 g activated diatomaceous earth was transferred into the cell. The packed cell was also spiked with 50 μL of 2 ng/μL recovery standards (RS) solution consisting of eight deuterated PAHs and TPPs and then allowed to equilibrate for 24 h in a desiccator prior to extraction. The loaded accelerated solvent extractor (Dionex 350 ASE) was pressurized to 17 MPa and heated to 120 °C within 6 min. The pressure and heat were held for 5 min during static extraction, with a flush volume of 100%, followed by rinsing with more solvent (60% of cell volume) and purging with N2 for 90 s. Each cell was sequentially extracted with 100% dichloromethane (DCM) followed by DCM/methanol/acetone (1:1:1 v/v). The collected extracts were then evaporated to dryness using a gentle stream of nitrogen gas, and the solvent phase was changed by adding 0.9 mL of acetone; the extract was filtered and then transferred to a 2 mL glass vial. 100 μL of Fluoranthene-D10 and Chrysene-D12 internal standard solution (1 ng/μL) was added to obtain a final volume of 1 mL. The calibration standards, extract, laboratory and solvent blanks were analysed using a Thermo Scientific Triple Quadrupole (TSQ) 8000 Evo GC/MS System containing a TG5Si/MS column (30 m × 0.25 mm ID × 0.25 μm) with a constant column flow of 1.2 mL/min. Splitless injection, full scan (50–650 amu), selective reaction monitoring (SRM) and electron ionization (EI) modes were used. The GC oven temperature program comprised an initial temperature of 60 °C held for 1 min, an increase to 200 °C at a rate of 5 °C per min, held for 1 min, and a final increase to 320 °C at a rate of 8 °C per min, held for 10 min.
The data acquisition, reprocessing and report generation were done using the Thermo Scientific TraceFinder 4.1 General Quan data system. The recoveries of the recovery standards (anthracene-d10, pyrene-d10, phenanthrene-d10, 9-nitroanthracene-d9 and benzophenone-d10) were in the range of 50–129% in the road dust sample, excluding naphthalene-d8, 1-nitropyrene-d9 and 1-hydroxypyrene-d9. The variability in the polarities of the analytes is a major influential factor in the varied recoveries. The analyte concentrations in this study were not corrected for recovery losses because the corresponding deuterated recovery standards were not available for some analytes.
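As a quick arithmetic check on the instrument settings quoted in this section, the short sketch below totals the GC oven program and the spiked standard masses. The helper names (`run_time_min`, `spiked_mass_ng`) are hypothetical and only the figures stated in the text are used. It assumes the ramps are linear and ignores any equilibration or cool-down time, which the text does not give.

```python
# Oven program as (ramp rate in °C/min or None for the start, target °C, hold in min),
# transcribed from the temperature program described in the text.
OVEN_PROGRAM = [
    (None, 60.0, 1.0),    # initial temperature, held for 1 min
    (5.0, 200.0, 1.0),    # ramp to 200 °C at 5 °C/min, held for 1 min
    (8.0, 320.0, 10.0),   # ramp to 320 °C at 8 °C/min, held for 10 min
]


def run_time_min(program):
    """Total program time: every linear ramp plus every hold (assumption: no other delays)."""
    total, current = 0.0, None
    for rate, target, hold in program:
        if rate is not None:
            total += (target - current) / rate  # minutes spent ramping
        total += hold
        current = target
    return total


def spiked_mass_ng(volume_ul, conc_ng_per_ul):
    """Mass of standard delivered by a spike of the given volume and concentration."""
    return volume_ul * conc_ng_per_ul


print(run_time_min(OVEN_PROGRAM))   # total oven program time in minutes
print(spiked_mass_ng(50, 2))        # recovery standards spiked into the cell, ng
print(spiked_mass_ng(100, 1))       # internal standard in the 1 mL final extract, ng
```

Under these assumptions the program runs for 55 min (1 + 28 + 1 + 15 + 10), and both the recovery-standard spike and the internal-standard addition deliver 100 ng.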
2.4. QSAR modeling and multivariate regression

The QSAR model mathematically relates the TEF response variable (end-point) to the physicochemical properties of PAHs and TPPs as the predictor variables. According to Liu et al. (2006), a large and varied training set entailing the structural characteristics of the TPPs is essential for formulating a good QSAR model with enhanced predictive capacity. Furthermore, external and internal validations of the unambiguous QSAR algorithm are also required to guarantee the reliability and predictive power of the model (Ayoko et al., 2007; Kunal et al., 2015). The modeling involves the application of multivariate regression methods such as multiple linear regression (MLR), principal component regression (PCR) and partial least-squares regression (PLSR). Multivariate regression consists of determining the relationship between two categories of variables, thereby enabling the prediction of unknown variables for new analytes.

Three regression methods are considered here. First, MLR entails finding a regression equation consisting of only linear terms from multivariate data where the number of predictors is smaller than the number of specimens or samples. PCR seeks to reduce the number of predictor variables by focusing on the first few principal components, also known as latent variables, instead of using the original variables in the algorithm. PCR is highly suitable when there is a significant correlation between predictor variables. Finally, PLSR uses linear combinations of the predictor variables (latent variables) that are highly correlated with the response variables and also account for the variation in the predictor variables (Miller and Miller, 2010). PLSR condenses the predictors to a set of uncorrelated components (latent variables) based on the covariance between response and predictor variables. Least squares regression is then performed on the latent variables. Wold et al. (2001) noted that PLSR has the merit of analyzing data with noisy, collinear and incomplete values in both predictor and response variables. It can also handle large predictor and response data matrices. PLSR utilizes several predictor variables to concurrently predict one or more response variables. Two PLSR algorithms, namely PLSR1 and PLSR2, entailing the prediction of only one and of several response variables at the same time, respectively, are commonly used. Ferrer et al. (1998), however, noted that PLSR1 is more effective in prediction than PLSR2. PLSR is highly applicable in QSAR modeling and in this regard, PLSR1 was deployed in this study. Further information on the underlying model algorithms and relevant assumptions can be found in Ayoko et al. (2007), Helland (2001) and Wold et al. (2001).

In this study, MLR and PLSR were employed in the QSAR modeling using Minitab® statistical software version 17.2.1. The 16 predictor variables under consideration were considered inadequate to compute PCR, which was therefore excluded from the modeling. The initial regression analysis was repeated using MLR until all analytes with residuals greater than three times the standard deviation from the mean residual were identified and eliminated from the model development, since these outliers could have significant negative impacts on the final QSAR model. The quality and goodness-of-fit of the generated QSAR model were assessed using relevant statistical metrics such as mean average error (MAE), determination coefficient (R²), adjusted determination coefficient (Ra²), variance ratio (F), standard error of estimate (s), and root mean square error of calibration (RMSEC). The reliability and predictability of the developed model were validated using the leave-one-out (LOO) cross-validation method. For a small dataset such as the 20 observations and four predictors employed in this study, Hawkins et al. (2003) noted that withholding a fraction of the data from model development for validation is unnecessary since this may lead to loss of information. Additionally, it was demonstrated that cross-validation using all the observations was a better approach for determining QSAR model predictability. In this context, external validation of the developed model was considered inapplicable in this study. Therefore, predicted residual sum of squares (PRESS), cross-validated determination coefficient (Q²), and standard deviation of error of prediction (SDEP) based on the LOO method (Kunal et al., 2015) were employed. These statistical metrics are described further in the Data-in-Brief supplementary information. The proximity of the PRESS statistic to zero is considered a good indicator of the predictive power of QSAR models (Miller and Miller, 2010).
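The residual-based outlier screening and the leave-one-out validation described above can be sketched in plain Python. This is a minimal illustration and not the Minitab procedure used in the study. The function names (`fit_mlr`, `screen_outliers`, `loo_metrics`) are hypothetical. Q² and SDEP are computed with conventional definitions, Q² = 1 − PRESS/Σ(y − ȳ)² and SDEP = √(PRESS/n); these are assumptions here because the study's own formulas are given only in the supplementary information.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, b1, ...]."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def screen_outliers(X, y, k=3.0):
    """Refit repeatedly, dropping observations whose residual lies more than
    k standard deviations from the mean residual; returns the kept indices."""
    keep = list(range(len(y)))
    while len(keep) > len(X[0]) + 2:
        Xk, yk = [X[i] for i in keep], [y[i] for i in keep]
        beta = fit_mlr(Xk, yk)
        res = [yi - predict(beta, xi) for xi, yi in zip(Xk, yk)]
        mean = sum(res) / len(res)
        sd = math.sqrt(sum((r - mean) ** 2 for r in res) / (len(res) - 1))
        flagged = {i for i, r in zip(keep, res) if abs(r - mean) > k * sd}
        if not flagged:
            break
        keep = [i for i in keep if i not in flagged]
    return keep


def loo_metrics(X, y):
    """Leave-one-out PRESS, Q2 and SDEP (conventional definitions, assumed)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        Xt = [X[j] for j in range(n) if j != i]
        yt = [y[j] for j in range(n) if j != i]
        press += (y[i] - predict(fit_mlr(Xt, yt), X[i])) ** 2
    y_mean = sum(y) / n
    q2 = 1.0 - press / sum((v - y_mean) ** 2 for v in y)
    return press, q2, math.sqrt(press / n)
```

With the study's data, `X` would hold the four descriptors (Mw, Hv, BP, logVp) for each compound and `y` the TEF or logTEF values. The same loop applies to PLSR1 if `fit_mlr` is replaced by a PLS fit.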
2.5. Human health risk assessment using QSAR output

The incremental lifetime cancer risk (ILCR, unitless) due to exposure to parent-, methyl-, nitro-, carbonyl- and hydroxyl-PAHs in road dust via dermal, inhalation and ingestion uptake is evaluated using Eqs. (1)–(3) as stipulated by Bandowe and Nkansah (2016), USEPA (2002) and Wei et al. (2015).

\[ \mathrm{ILCR_{dermal}} = \frac{\sum \mathrm{BaP_{eq}} \times \mathrm{CSF_{dermal}} \times \sqrt[3]{BW/70} \times SA \times AF \times ABS \times EF \times ED}{BW \times AT \times 10^{6}} \tag{1} \]

\[ \mathrm{ILCR_{inhalation}} = \frac{\sum \mathrm{BaP_{eq}} \times \mathrm{CSF_{inhalation}} \times \sqrt[3]{BW/70} \times IR_{\mathrm{inhalation}} \times EF \times ED}{BW \times AT \times PEF} \tag{2} \]

\[ \mathrm{ILCR_{ingestion}} = \frac{\sum \mathrm{BaP_{eq}} \times \mathrm{CSF_{ingestion}} \times \sqrt[3]{BW/70} \times IR_{\mathrm{ingestion}} \times EF \times ED}{BW \times AT \times 10^{6}} \tag{3} \]

The benzo[a]pyrene equivalent concentration (BaPeq) of each PAH analyte was estimated as Ci × TEFi, where Ci (g/g) is the concentration of the analyte in the road dust and TEFi is its toxicity equivalency factor, taken either from the QSAR model or from the originally existing values to facilitate the comparison of results. The sum of BaPeq for all PAH analytes in each sample is represented as ∑BaPeq. All other terms in Eqs. (1)–(3) and their respective values for children and adults are specified in Table 2. The total carcinogenic risk in this study was estimated as the sum of ILCRdermal, ILCRinhalation and ILCRingestion (USEPA, 2002; Wei et al., 2015).

Table 2 Applicable parameters for the incremental lifetime cancer risk evaluation of PAHs.

Parameters, units                                                            Child     Adult
Body weight (BW) [kg]                                                        15        70
Exposure frequency (EF) [day year−1]                                         180       180
Exposure duration (ED) [year]                                                6         24
Inhalation rate (IRinhalation) [m3 day−1]                                    10        20
Dust ingestion rate (IRingestion) [mg day−1]                                 200       100
Dermal exposure area (SA) [cm2]                                              2800      5700
Dermal adherence factor (AF) [mg cm−2]                                       0.2       0.07
Dermal adsorption fraction (ABS) [unitless]                                  0.13      0.13
Particle emission factor (PEF) [m3 kg−1]                                     1.36E+10  1.36E+09
Carcinogenic slope factor ingestion (CSFingestion) of BaP [mg kg−1 day−1]    7.3       7.3
Carcinogenic slope factor dermal uptake (CSFdermal) of BaP [mg kg−1 day−1]   25        25
Carcinogenic slope factor inhalation (CSFinhalation) of BaP [mg kg−1 day−1]  3.85      3.85
Averaging life span (AT) [years]                                             64.2      64.2

3. Results and discussion

3.1. QSAR model development and validation: application of multivariate regression

The applicability of the QSAR model using two multivariate regression methods, namely MLR and PLSR, in accurately predicting TEFs for PAHs and TPPs, thereby facilitating the holistic assessment of cancer health risks associated with exposure to these pollutants, was established. Thirty (30) species of PAHs and TPPs with corresponding logTEF data as indicated in Table S1 and their respective 16 physicochemical properties were pruned using FA, as described in the Data-in-Brief supplementary information, and MLR in order to eliminate potential outliers and redundant variables. The data were reduced to 20 observations and 4 predictors as shown in Table 1 in order to obtain a 5:1 ratio. Ten (10) PAH and TPP species (benzo(e)pyrene, indeno[1,2,3-cd]pyrene, fluoranthene, dibenz[a,h]anthracene, benzo[b]fluoranthene, benzo[k]fluoranthene, benzo[a]pyrene, 6H-benzo[c,d]pyren-6-one, 1-nitropyrene and 6-nitrochrysene) were found to be outliers by virtue of their significantly large residual values and were therefore excluded from the QSAR modeling. Kunal et al. (2015) noted that it is advisable to have a dataset with at least 3–4 log units of the response variable. As a result, logTEF was used during the initial QSAR model development, followed by the TEF response parameter, since this facilitates a comparison of the statistical significance of the model output in terms of quality, goodness-of-fit, robustness and predictability. The results of the comparison helped to reach an informed decision on the choice of the most suitable data matrix for the final model development. The QSAR data in Table 1 were subjected to MLR and PLSR1 analysis using the Minitab software for logTEF and TEF responses, sequentially. The model outputs were used to estimate the nine (9) statistical metrics using Eqs. (1)–(9) as stipulated in the supplementary information. The results are shown in Table 3.

Table 3 Statistical evaluation of developed QSAR model.
Quality statistics: MAE, RMSEC, s, Ra², F, R, R². Acceptability & reliability statistics: PRESS, Q², SDEP.

Model   MAE     RMSEC    s       Ra²   F     R     R²    PRESS     Q²    SDEP
a. logTEF response
MLR     0.30    0.352    0.41    0.58  2.97  0.67  0.44  4.221     0.05  0.46
PLSR1   0.31    0.355    0.41    0.28  2.88  0.66  0.43  3.092     0.30  0.39
b. TEF response
MLR     0.0023  0.00288  0.0033  0.65  9.91  0.85  0.73  0.000294  0.51  0.0038
PLSR1   0.0024  0.00290  0.0034  0.65  9.68  0.85  0.72  0.000212  0.65  0.0033

MAE is mean average error; R² is determination coefficient; R is regression correlation coefficient; Ra² is adjusted R²; F is variance ratio; s is standard error of estimate; RMSEC is root mean square error of calibration; PRESS is predicted residual sum of squares; Q² is cross-validated determination coefficient, and SDEP is standard deviation of error of prediction.

A comparative evaluation of the QSAR models in Table 3 shows that, in terms of model quality, the MAE, s and RMSEC for the TEF response are lower than for logTEF for both MLR and PLSR1. Secondly, R², R, Ra² and F at the 95% confidence level were all higher for TEF than for logTEF, indicating that using TEF as the response variable leads to better quality and goodness-of-fit. Moreover, the PRESS and SDEP estimates for the TEF response are lower than for logTEF. The Q² values for the TEF response parameter are greater than 0.5 whilst those of logTEF are less than 0.5. This implies that the reliability and predictability of the QSAR model using the TEF response parameter are relatively better for both MLR and PLSR1. In this regard, QSAR modeling using the TEF response offers a statistical advantage over logTEF and therefore constituted the focus of this study.

For the analysis with TEF as the response parameter, MAE, RMSEC, s, R², R, Ra² and F for MLR and PLSR1 are approximately the same, as shown in Table 3. The overall significance of the regression, expressed by the variance ratio (F), is only 0.23 higher for MLR than for PLSR1. These observations indicate similar quality properties of the QSAR model when either technique is employed. However, in terms of model acceptability and reliability, PLSR1 has lower PRESS and SDEP values than MLR. According to Miller and Miller (2010), the closer the PRESS value is to zero, the higher the predictive power. The Q² estimate for PLSR1 is also higher than that of MLR, even though both values are greater than the predetermined value of 0.5. In this context, the application of PLSR1 in the QSAR modeling of TEF from the four predictor variables has a statistical advantage and greater predictive capacity than MLR. PLSR1 was therefore selected as the preferred QSAR modeling technique in this study. This observation agrees with the conclusion by Wold et al. (2001) that PLSR1 is a viable QSAR technique, especially for a dataset containing noisy, incomplete and collinear multiple predictor and response variables.
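The selection argument above can be checked arithmetically from the Table 3 entries for the TEF response. The sketch below is illustrative only: the dictionary layout and the helper `sdep_from_press` are hypothetical, and the relation SDEP = √(PRESS/n) with n = 20 observations is an assumption, although it is consistent with the tabulated values.

```python
import math

N_OBS = 20  # observations retained for model development (Table 1)

# Table 3, part b (TEF response): quality and cross-validation statistics.
TABLE3_TEF = {
    "MLR":   {"F": 9.91, "PRESS": 0.000294, "Q2": 0.51, "SDEP": 0.0038},
    "PLSR1": {"F": 9.68, "PRESS": 0.000212, "Q2": 0.65, "SDEP": 0.0033},
}


def sdep_from_press(press, n=N_OBS):
    """SDEP recomputed from PRESS (assumed definition: sqrt(PRESS / n))."""
    return math.sqrt(press / n)


for name, stats in TABLE3_TEF.items():
    recomputed = sdep_from_press(stats["PRESS"])
    print(name, "tabulated SDEP:", stats["SDEP"], "recomputed:", round(recomputed, 4))

# The fit-quality gap is small, whereas PRESS clearly favours one model.
f_gap = round(TABLE3_TEF["MLR"]["F"] - TABLE3_TEF["PLSR1"]["F"], 2)
preferred = min(TABLE3_TEF, key=lambda m: TABLE3_TEF[m]["PRESS"])
print("F difference (MLR - PLSR1):", f_gap)
print("Lowest PRESS:", preferred)
```

Choosing on PRESS rather than on the calibration statistics reflects the reasoning in the text: the two models fit the training data almost equally well, so the cross-validated error decides.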
Fig. 1. PLSR1 plot for fitted (predicted) response and leave-one-out cross-validated response.
TEFpredicted = (−5.99E − 03) + (1.40E − 05 * Mw ) + (6.42E − 05 * Hv ) + (5.92E − 06 * BP ) − (1.48E − 04 * logVp)
(4)
3.2. Suitable QSAR model: PLSR1 3.2.1. Applicability domain (AD) of the developed PLSR1 model Applicability domain (AD) refers to the chemical space from which the PLSR1 QSAR model was derived whereby prediction is considered reliable due to the fact that QSAR is essentially a reductionist model and therefore, has limitations (Gadaleta et al., 2016; Kunal et al., 2015; OECD, 2007). In this study, predictor bounding-box range based AD method (Kunal et al., 2015) was used. This implies that observations with descriptor values outside the applicable ranges for the four (4) selected parameters Mw, Hv, BP and logVp were considered as out of scope. In this context, the ADs for Mw, Hv, BP and logVp in this study were 128.2–300.4 g/mol, 43.9–77 kJ/mol, 217.9–586.7 °C, and -14.89963-(-1.07058107) mmHg, respectively (see Table 1). The application of these ADs in the developed QSAR model potentially results in TEF values within 0.0005-0.01 range. The TEFs for twenty eight (28) untested PAHs and TPPs that are compatible with these ADs were predicted and the results are shown in Table 5.
The PLSR1 QSAR model shows that one latent component is significant accounting for 89% of the variance in the predictor variables (Table 4). The one component has the lowest PRESS value of 0.0002122 and was deployed in the PLSR1 modeling. The relatively close distance between the fitted (predicted) TEF values and the LOO cross-validated values as shown in Fig. 1 indicates that the PLSR1 model is highly appropriate and significant in predicting TEF. The residual plot for fitted TEF values shows that the points are normally and randomly distributed on both sides of zero indicating a good output from the model (see Fig. 2). The PLSR1 QSAR model developed in this study is shown in Eq. (4). The equation shows that Mw, Hv and BP have direct correlation with TEF whilst logVp has an inverse relationship with TEF. The applicability domain (AD) and mechanistic explanation of the developed model are provided in Sections 3.2.1 and 3.2.2 respectively.
3.2.2. Mechanistic interpretation of the developed PLSR1 model Mechanistic interpretation of the QSAR model assesses the possibility of a mechanistic relationship between the four (4) predictors and the TEF end-point (the mechanism by which the predictors affect the response parameter). Molecular weight (Mw) of PAHs and TPPs is a vital chemical feature due to its direct influence on many other physicochemical and toxicity properties as clearly evident in the FA data pretreatment presente in the Data-in-brief supplementary information. TEF generally increases with increasing Mw due to the fact that compounds
Table 4 PLSR1 model selection and validation for TEF. Components
Predictor variance
PRESS
1 2 3 4
0.888267
0.0002122 0.0002643 0.0002845 0.0002940
5
Journal of Hazardous Materials 383 (2020) 121154
G. Gbeddy, et al.
Fig. 2. PLS residual plots for the TEF response variable.
with higher Mw have higher octanol-water partitioning coefficient (logKow > 3.5), thus are more lipophilic and tend to accumulate more in fatty tissues (Forsgren, 2015). Similarly, there is a direct positive correlation between enthalpy of vapourization (Hv), boiling point (BP) and Mw (Hanshaw et al., 2008; Roux et al., 2008). In this regard, both Hv and BP can be postulated to have analogous direct variation with TEF. Finally, vapour pressure (logVp) has an inverse relationship with Mw and therefore, correlates negatively with TEF. Analytes with lower logVp tend to be more semi-volatile and therefore, unlikely to accumulate in tissues (Bates et al., 2008).

Table 5 Predicted TEF values for untested PAHs and TPPs.
Predictor variables: Mw, Hv, BP, logVp. Predicted response: PLSR1_PredTEF.

| PAHs | ID | Mw | Hv | BP | logVp | PLSR1_PredTEF |
|---|---|---|---|---|---|---|
| Picene | PIC | 278.4 | 76.2 | 519 | −11.9914 | 0.008 |
| Benzo[c]phenanthrene | BcPH | 228.3 | 66.7 | 448 | −8.20551 | 0.005 |
| 7,12-Dimethylbenz[a]anthracene | DMBA | 256.3 | 69.7 | 422.4 | −6.59688 | 0.006 |
| 3-Methylcholanthrene | MCHO | 268.4 | 74.7 | 438.3 | −7.36653 | 0.006 |
| Retene | RT | 234.3 | 61.7 | 390 | −5.5784 | 0.004 |
| Biphenyl | BPN | 154.2 | 47.6 | 257 | −2.04915 | 0.001 |
| 3,6-Dimethylphenanthrene | DMPH | 206.3 | 58.5 | 351.49 | −5.11976 | 0.003 |
| 1,3-Dimethylnaphthalene | DMNA | 156.2 | 48.1 | 263 | −1.63451 | 0.001 |
| Benzo[a]fluoren-11-one | BaFN | 230.3 | 68.7 | 403.2 | −6.41229 | 0.005 |
| 9,10-Anthraquinone | AQN | 208.2 | 62.5 | 377 | −6.93554 | 0.004 |
| 2-Hydroxybiphenyl | 2HBP | 170.2 | 54.2 | 286 | −2.69897 | 0.002 |
| 1-Hydroxypyrene | HPYR | 218.3 | 72.1 | 406.7 | −8.31605 | 0.005 |
| 1,8-Dihydroxyanthraquinone (Danthron) | DHAQ | 240.2 | 73.9 | 433.3 | −10.1192 | 0.006 |
| 1-Nitronaphthalene | NNAP | 173.2 | 52.3 | 472.3 | −10.7282 | 0.004 |
| 1-Indanone | IDN | 132.2 | 48.1 | 243 | −1.57349 | 0.001 |
| 1-Naphthaldehyde | 1NTD | 156.2 | 53.1 | 292 | −2.60206 | 0.002 |
| 2-Biphenylcarboxaldehyde | BPCX | 182.2 | 57.5 | 324.5 | −3.73283 | 0.003 |
| 1,2-Acenaphthenequinone | ACNQ | 182.2 | 60.9 | 350.3 | −6.46852 | 0.003 |
| 2-Methyl-9,10-anthraquinone | 2MAQN | 222.2 | 66 | 375.2 | −6.05899 | 0.004 |
| Benzo[a]anthracene-7,12-dione | BAND | 258.3 | 73.5 | 434.5 | −7.41117 | 0.006 |
| 5,12-Naphthacenequinone | NTCQ | 258.3 | 73.5 | 434.5 | −7.45593 | 0.006 |
| 9-Nitrophenanthrene | 9NPHE | 223.2 | 64 | 543.2 | −13.2933 | 0.006 |
| 2,7-Dinitrofluorene | DNFL | 256.2 | 68.3 | 551.6 | −13.6021 | 0.007 |
| 4H-Cyclopenta(def)phenanthrene | CPPHE | 190.2 | 57.4 | 353 | −4.82974 | 0.003 |
| 1-Methylphenanthrene | 1MPHE | 192.3 | 57.5 | 339.8 | −4.82391 | 0.003 |
| 1,4-Naphthaquinone | NQN | 158.2 | 53.8 | 301.2 | | 0.001 |
| 9-Fluorenone | 9FLN | 180.2 | 58.5 | 331.7 | | 0.002 |
| 4-Hydroxybiphenyl | 4HBP | 170.2 | 56.9 | 305 | | 0.002 |
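The direction of the relationships described above (TEF rising with Mw, Hv and BP, and falling with logVp) can be checked against the Table 5 columns. The following is a minimal sketch, not the authors' procedure: it computes Pearson correlation coefficients between each predictor and the PLSR1-predicted TEF. Because the response here is the model prediction rather than a measured TEF, the result only illustrates the sign of each relationship within the model. The names `pearson` and `predictor_correlations` are illustrative, and logVp is correlated over the first 25 rows only, since those are the values available in this copy of the table.

```python
from math import sqrt

# Table 5 columns in row order (PIC ... 4HBP).
MW = [278.4, 228.3, 256.3, 268.4, 234.3, 154.2, 206.3, 156.2, 230.3, 208.2,
      170.2, 218.3, 240.2, 173.2, 132.2, 156.2, 182.2, 182.2, 222.2, 258.3,
      258.3, 223.2, 256.2, 190.2, 192.3, 158.2, 180.2, 170.2]
HV = [76.2, 66.7, 69.7, 74.7, 61.7, 47.6, 58.5, 48.1, 68.7, 62.5,
      54.2, 72.1, 73.9, 52.3, 48.1, 53.1, 57.5, 60.9, 66, 73.5,
      73.5, 64, 68.3, 57.4, 57.5, 53.8, 58.5, 56.9]
BP = [519, 448, 422.4, 438.3, 390, 257, 351.49, 263, 403.2, 377,
      286, 406.7, 433.3, 472.3, 243, 292, 324.5, 350.3, 375.2, 434.5,
      434.5, 543.2, 551.6, 353, 339.8, 301.2, 331.7, 305]
# Only the first 25 logVp values are available in this copy of Table 5.
LOG_VP = [-11.9914, -8.20551, -6.59688, -7.36653, -5.5784, -2.04915,
          -5.11976, -1.63451, -6.41229, -6.93554, -2.69897, -8.31605,
          -10.1192, -10.7282, -1.57349, -2.60206, -3.73283, -6.46852,
          -6.05899, -7.41117, -7.45593, -13.2933, -13.6021, -4.82974,
          -4.82391]
PRED_TEF = [0.008, 0.005, 0.006, 0.006, 0.004, 0.001, 0.003, 0.001, 0.005,
            0.004, 0.002, 0.005, 0.006, 0.004, 0.001, 0.002, 0.003, 0.003,
            0.004, 0.006, 0.006, 0.006, 0.007, 0.003, 0.003, 0.001, 0.002,
            0.002]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def predictor_correlations():
    """Correlate each Table 5 predictor with the PLSR1-predicted TEF."""
    n_vp = len(LOG_VP)  # restrict logVp to the rows where it is available
    return {
        "Mw": pearson(MW, PRED_TEF),
        "Hv": pearson(HV, PRED_TEF),
        "BP": pearson(BP, PRED_TEF),
        "logVp": pearson(LOG_VP, PRED_TEF[:n_vp]),
    }


if __name__ == "__main__":
    for name, r in predictor_correlations().items():
        print(f"{name:>5}: r = {r:+.2f}")
```

A positive coefficient for Mw, Hv and BP and a negative one for logVp would be consistent with the reasoning given in the text.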
Table 6 Human health risk assessment using QSAR model and original TEF estimates.

a) Contribution of PAHs and different TPPs to the ∑BaPeq using QSAR modeled and original TEFs for outliers

| | Parent-PAHs | Alkyl-PAHs | N-PAHs | C-PAHs | HO-PAHs |
|---|---|---|---|---|---|
| ∑BaPeq | 4.92E-07 | 7.09E-10 | 1.06E-07 | 3.91E-09 | 3.96E-09 |

b) Risk assessment using QSAR model and original TEFs for outliers (QSAR model and some original TEF values)

| PAH data source | ILCRinhalation, Child | ILCRinhalation, Adult | ILCRdermal, Child | ILCRdermal, Adult | ILCRingestion, Child | ILCRingestion, Adult | Total cancer risk, Child | Total cancer risk, Adult |
|---|---|---|---|---|---|---|---|---|
| This study | 7.41E-10 | 1.51E-05 | 1.15E-15 | 3.30E-14 | 5.94E-10 | 4.26E-10 | 1.34E-09 | 1.51E-05 |

c) Risk assessment using only original TEF values

| PAH data source | ILCRinhalation, Child | ILCRinhalation, Adult | ILCRdermal, Child | ILCRdermal, Adult | ILCRingestion, Child | ILCRingestion, Adult | Total cancer risk, Child | Total cancer risk, Adult |
|---|---|---|---|---|---|---|---|---|
| This study | 7.16E-10 | 7.30E-10 | 1.11E-15 | 3.19E-14 | 5.74E-10 | 4.11E-10 | 1.29E-09 | 1.14E-09 |
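The ∑BaPeq entries in Table 6a are toxicity-weighted concentration sums: each analyte concentration is multiplied by its TEF and the products are added, with original TEFs taking precedence over QSAR-predicted ones for the outliers. The sketch below illustrates that bookkeeping under stated assumptions. The three predicted TEFs are copied from Table 5, but the concentrations and the `OUTLIER_X` entry with its TEF are hypothetical placeholders, and the function names are illustrative rather than taken from the study. The class shares at the end use the Table 6a values as given.

```python
def merged_tefs(predicted, original_outliers):
    """Combine TEFs, letting original values override predicted ones."""
    merged = dict(predicted)
    merged.update(original_outliers)
    return merged


def bap_equivalent(concentrations, tefs):
    """Sum of concentration x TEF, in the same units as the concentrations."""
    missing = set(concentrations) - set(tefs)
    if missing:
        raise KeyError(f"no TEF available for: {sorted(missing)}")
    return sum(c * tefs[name] for name, c in concentrations.items())


def class_shares(class_bapeq):
    """Fraction of the overall BaPeq contributed by each compound class."""
    total = sum(class_bapeq.values())
    return {name: value / total for name, value in class_bapeq.items()}


# Predicted TEFs copied from Table 5 (PLSR1_PredTEF column).
PREDICTED_TEF = {"PIC": 0.008, "RT": 0.004, "AQN": 0.004}
# Hypothetical outlier with an original TEF (placeholder, not from the paper).
ORIGINAL_TEF_OUTLIERS = {"OUTLIER_X": 10.0}
# Hypothetical concentrations (placeholders, not measured values).
CONCENTRATIONS = {"PIC": 1.0e-7, "RT": 2.0e-7, "OUTLIER_X": 1.0e-8}

# Class-level BaPeq values as listed in Table 6a.
TABLE_6A = {
    "Parent-PAHs": 4.92e-7,
    "Alkyl-PAHs": 7.09e-10,
    "N-PAHs": 1.06e-7,
    "C-PAHs": 3.91e-9,
    "HO-PAHs": 3.96e-9,
}

if __name__ == "__main__":
    tefs = merged_tefs(PREDICTED_TEF, ORIGINAL_TEF_OUTLIERS)
    print(f"illustrative BaPeq: {bap_equivalent(CONCENTRATIONS, tefs):.3e}")
    shares = class_shares(TABLE_6A)
    for name in sorted(shares, key=shares.get, reverse=True):
        print(f"{name:<12} {shares[name]:.1%}")
```

In the hypothetical example a single high-TEF analyte dominates the sum even at a low concentration, which is why the choice between predicted and original TEFs for outliers matters.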
3.3. Human carcinogenic risk assessment using selected QSAR model and original TEF estimates

The outcomes of the cancer risk assessment due to PAHs and TPPs exposure indicate that the exclusion of TPPs during the assessment process can potentially lead to risk underestimation and wrong conclusions. The TPPs evaluated contributed to the ∑BaPeq in the decreasing order N-PAHs > HO-PAHs > C-PAHs, as indicated in Table 6a, in conformity with their mean concentration trend of N-PAHs (6.30E-07 g/g) > HO-PAHs (5.86E-07 g/g) > C-PAHs (3.88E-07 g/g). The results of the human carcinogenic risk assessment using the combined QSAR model TEF estimates and the original TEF values for the 10 outliers show that the ILCR due to dermal, inhalation and ingestion exposures to PAHs and TPPs is higher in all instances relative to using only the originally available TEF values (see Table 6b and c). Most of these outliers belong to the heavy molecular weight (HMW) class of PAHs and TPPs with corresponding TEF values greater than 1 (Table 1). It is therefore important to use their originally available TEF values in the risk assessment instead of the predicted values in order to ensure an accurate outcome, thereby preventing underestimation of the cancer risk posed by these deleterious micro-organic pollutants. In addition, PAHs and TPPs (6-nitrobenz(a)pyrene, dibenzo[a,h]pyrene, dibenzo[a,i]pyrene, dibenzo[a,l]pyrene, 2-methylphenanthrene and 1,2,3,4-tetrahydronaphthalene) that are incompatible with the AD for predicting TEF could potentially have TEF values greater than one (TEF > 1), similar to the outliers identified in this study. In this regard, these analytes could be subjected to future toxicology tests in order to determine their TEFs. Cancer risk has been demarcated into three categories by various regulatory bodies as high risk, potential risk and negligible risk for ILCR value ranges of > 10⁻⁴, 10⁻⁴ to 10⁻⁶ and ≤ 10⁻⁶, respectively (Bandowe and Nkansah, 2016; Wei et al., 2015).

The ILCR via dermal, inhalation and ingestion exposures and the total cancer risk for both children and adults (see Table 6b and c), calculated using the concentrations of the analytes from this study and the QSAR predicted TEF and original TEF values for the ten outliers, are higher than those obtained using only the originally existing TEF values. With respect to children, the total cancer risk is negligible in both scenarios. However, there exists a potential cancer risk to adults when the predicted and outlier TEF values are used complementarily, whilst the usage of only the original TEF values in the risk assessment would have led to a negligible risk conclusion. This observation shows that reliance on only the originally available TEF values is likely to lead to inappropriate cancer risk inferences and conclusions regarding the environmental exposure to these hazardous pollutants. For example, Bandowe and Nkansah (2016) reached a conclusion of potential and high cancer risk to children and adults in Kumasi, a major city in Ghana where the sources of PAHs and TPPs are pervasive, by employing only the originally existing TEF values in the risk assessment. Similarly, Wei et al. (2015) observed a potential cancer risk to the inhabitants of Xi’an, China, via exposure to PAHs and TPPs in urban road dust using only the originally available TEF values. The holistic evaluation of the cancer risk using the QSAR model results from this study is likely to depict a more accurate and critical nature of the exposure potential to PAH and TPP laden road dust in order to engender the most appropriate mitigation measures. Underestimation of cancer risk can therefore result in the inadequate deployment of mitigation measures. Consequently, the holistic evaluation of the risk posed by micro-organic pollutants such as PAHs and TPPs cannot be discounted.
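The risk categories quoted in this section (high for ILCR > 10⁻⁴, potential for 10⁻⁴ to 10⁻⁶, negligible for ≤ 10⁻⁶) and the totals in Table 6 can be reproduced with a few lines of arithmetic. The following is a minimal sketch that assumes the total cancer risk is the plain sum of the three route-specific ILCRs, which is consistent with the tabulated totals. The function names and the dictionary layout are illustrative and are not taken from the study.

```python
def total_cancer_risk(route_ilcrs):
    """Total cancer risk as the sum of the route-specific ILCR values."""
    return sum(route_ilcrs)


def classify_ilcr(ilcr):
    """Map an ILCR value onto the three categories quoted in the text."""
    if ilcr > 1e-4:
        return "high"
    if ilcr > 1e-6:
        return "potential"
    return "negligible"  # ILCR <= 1e-6


# Route-specific ILCR values for "This study", in the column order of Table 6.
TABLE_6 = {
    ("QSAR + original TEFs for outliers", "Child"): (7.41e-10, 1.15e-15, 5.94e-10),
    ("QSAR + original TEFs for outliers", "Adult"): (1.51e-05, 3.30e-14, 4.26e-10),
    ("Original TEFs only", "Child"): (7.16e-10, 1.11e-15, 5.74e-10),
    ("Original TEFs only", "Adult"): (7.30e-10, 3.19e-14, 4.11e-10),
}

if __name__ == "__main__":
    for (scenario, receptor), routes in TABLE_6.items():
        total = total_cancer_risk(routes)
        label = classify_ilcr(total)
        print(f"{scenario:<36} {receptor:<6} {total:.2e} {label}")
```

Only the adult total under the combined QSAR and original TEF scenario crosses the 10⁻⁶ boundary, which matches the contrast drawn in the text between the two scenarios.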
3.4. Study limitations

It must be noted that other potential predictor variables, such as ionization potential, electron affinity, total surface area, solvent accessible area, solvent accessible volume, length/breadth ratio, sub-cooled vapor pressure and octanol-air partition coefficient, among others, were not included in the QSAR modeling due to the lack of comprehensive data for all analytes in this study. Even though the TEF values for hydroxyl-PAHs (HO-PAHs) have been estimated in this study, there was no original TEF value for these pollutants in the literature to facilitate their inclusion in the training set. This study may be limited to PAHs and TPPs with TEF values less than one (1). Other forms of TPPs and polycyclic aromatic compounds (PACs) exist; however, this study focused on alkyl-, nitro-, carbonyl- and hydroxyl-PAHs. Therefore, the QSAR concept presented in this research can be extended to other PACs, thereby facilitating the holistic health risk assessment of PACs in the environment.

4. Conclusions

This research outcome shows that QSAR modeling via the application of multivariate regression is highly useful in determining the TEF values for most PAHs and TPPs that are currently unavailable within the AD. We observed that PLSR1 has greater statistical power in predicting TEF than MLR. Furthermore, the application of QSAR predicted TEF estimates in assessing human carcinogenic risk is a potent and viable approach in the holistic assessment of the health risk associated with exposures to PAHs and TPPs. PAHs and TPPs outside the AD of the QSAR model developed in this study may be eligible for future toxicological tests due to their high probability of having TEF > 1. The outcomes of this study should assist policy makers, research scientists and regulators in managing and prioritizing the risk posed to the population by exposure to the deleterious PAHs and TPPs.
Data statement

The sixteen predictor physicochemical properties of PAHs and transformed PAH products (TPPs) were extracted from the ChemSpider website and duly referenced as per the website’s recommendation. The toxicity equivalency factor (TEF) values used as the response variable were obtained from Bortey-Sam et al. (2015) and Wei et al. (2015). The PAH and TPP (alkyl-, nitro-, carbonyl- and hydroxyl-PAHs) concentrations used in this study were measured in urban residential road dust of 0.45–75 μm particle size collected from the Gold Coast area of Queensland, Australia.

Declaration of Competing Interest

The authors can testify that there is no real or perceived conflict of interest such as personal, financial and connection to person(s) or institution(s) that may have impacted negatively on the outcome of this research.

Acknowledgements

The authors would like to acknowledge the Queensland University of Technology (QUT) for providing the postgraduate research award to Gustav Gbeddy to undertake this study. The Central Analytical Research Facility (CARF) under the Institute for Future Environments, QUT, where the data employed in this paper were acquired, is highly appreciated. Access to CARF was facilitated by generous funding from the Science and Engineering Faculty, QUT. The significant role of the Ghana Atomic Energy Commission (GAEC) in granting study leave to Gustav Gbeddy in order to embark upon this study is highly appreciated.

References

Amato, F., Alastuey, A., de la Rosa, J., Sánchez de la Campa, A.M., Pandolfi, M., Lozano, A., et al., 2014. Trends of road dust emissions contributions on ambient air particulate levels at rural, urban and industrial sites in southern Spain. Atmos. Chem. Phys. 14, 3533–3544.
Andersson, J.T., Achten, C., 2015. Time to say goodbye to the 16 EPA PAHs? Toward an up-to-date use of PACs for environmental purposes. Polycycl. Aromat. Compd. 35, 330–354.
Ayoko, G.A., Singh, K., Balerea, S., Kokot, S., 2007. Exploratory multivariate modeling and prediction of the physico-chemical properties of surface water and groundwater. J. Hydrol. 336, 115–124.
Bandowe, B.A., Nkansah, M.A., 2016. Occurrence, distribution and health risk from polycyclic aromatic compounds (PAHs, oxygenated-PAHs and azaarenes) in street dust from a major West African Metropolis. Sci. Total Environ. 553, 439–449.
Bates, M., Bruno, P., Caputi, M., Caselli, M., de Gennaro, G., Tutino, M., 2008. Analysis of polycyclic aromatic hydrocarbons (PAHs) in airborne particles by direct sample introduction thermal desorption GC/MS. Atmos. Environ. 42, 6144–6151.
Bortey-Sam, N., Ikenaka, Y., Akoto, O., Nakayama, S.M., Yohannes, Y.B., Baidoo, E., et al., 2015. Levels, potential sources and human health risk of polycyclic aromatic hydrocarbons (PAHs) in particulate matter (PM(10)) in Kumasi, Ghana. Environ. Sci. Pollut. Res. Int. 22, 9658–9667.
ChemSpider.com, 2019. ChemSpider Search and Share Chemistry. Accessed on March 8, 2019.
Cochran, R.E., Dongari, N., Jeong, H., Beranek, J., Haddadi, S., Shipp, J., et al., 2012. Determination of polycyclic aromatic hydrocarbons and their oxy-, nitro-, and hydroxy-oxidation products. Anal. Chim. Acta 740, 93–103.
Delistraty, D., 1997. Toxic equivalency factor approach for risk assessment of polycyclic aromatic hydrocarbons. Toxicol. Environ. Chem. 64, 81–108.
Ferrer, R., Beltrán, J.L., Guiteras, J., 1998. Multivariate calibration applied to synchronous fluorescence spectrometry. Simultaneous determination of polycyclic aromatic hydrocarbons in water samples. Talanta 45, 1073–1080.
Forsgren, A.J., 2015. Wastewater Treatment: Occurrence and Fate of Polycyclic Aromatic Hydrocarbons (PAHs). CRC Press Taylor & Francis Group A Series.
Gadaleta, D., Mangiatordi, G.F., Catto, M., Carotti, A., Nicolotti, O., 2016. Applicability domain for QSAR models. Int. J. Quant. Struct. Relatsh. 1, 45–63.
Gbeddy, G., Jayarathne, A., Goonetilleke, A., Ayoko, G.A., Egodawatta, P., 2018. Variability and uncertainty of particle build-up on urban road surfaces. Sci. Total Environ. 640-641, 1432–1437.
Hanshaw, W., Nutt, M., Chickos, J.S., 2008. Hypothetical thermodynamic properties. Subcooled vaporization enthalpies and vapor pressures of polyaromatic hydrocarbons. J. Chem. Eng. Data 53, 1903–1913.
Hawkins, D.M., Basak, S.C., Mills, D., 2003. Assessing model fit by cross-validation. J. Chem. Inf. Comput. Sci. 43, 579–586.
Helland, I.S., 2001. Some theoretical aspects of partial least squares regression. Chemom. Intell. Lab. Syst. 58, 97–107.
Jung, K.H., Yan, B., Chillrud, S.N., Perera, F.P., Whyatt, R., Camann, D., et al., 2010. Assessment of benzo(a)pyrene-equivalent carcinogenicity and mutagenicity of residential indoor versus outdoor polycyclic aromatic hydrocarbons exposing young children in New York City. Int. J. Environ. Res. Public Health 7, 1889–1900.
Kunal, R., Supratik, K., Rudra, N.D., 2015. Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, an imprint of Elsevier, Amsterdam; Boston.
Liu, H., Papa, E., Gramatica, P., 2006. QSAR prediction of estrogen activity for a large set of diverse chemicals under the guidance of OECD principles. Chem. Res. Toxicol. 19, 1540–1548.
Lundstedt, S., Haglund, P., Oberg, L., 2006. Simultaneous extraction and fractionation of polycyclic aromatic hydrocarbons and their oxygenated derivatives in soil using selective pressurized liquid extraction. Anal. Chem. 78, 2993–3000.
Mehmood, T., Liland, K.H., Snipen, L., Sæbø, S., 2012. A review of variable selection methods in Partial Least Squares Regression. Chemom. Intell. Lab. Syst. 118, 62–69.
Miller, J.N., Miller, J.C., 2010. Statistics and Chemometrics for Analytical Chemistry, sixth edition. Pearson Education Limited, England.
OECD, 2007. Guidance Document on the Validation of (quantitative) Structure-activity Relationships: (Q)SAR Models. OECD Environment Health and Safety Publications Series on Testing and Assessment No. 69. ENV/JM/MONO(2007)2.
Roux, M.V., Temprado, M., Chickos, J.S., Nagano, Y., 2008. Critically evaluated thermochemical properties of polycyclic aromatic hydrocarbons. J. Phys. Chem. Ref. Data 37, 1855–1996.
USEPA, 2002. Supplemental Guidance for Developing Soil Screening Levels for Superfund Sites. Office of Solid Waste and Emergency Response, pp. 4–24 OSWER 9355.
Wei, C., Bandowe, B.A., Han, Y., Cao, J., Zhan, C., Wilcke, W., 2015. Polycyclic aromatic hydrocarbons (PAHs) and their derivatives (alkyl-PAHs, oxygenated-PAHs, nitrated-PAHs and azaarenes) in urban road dusts from Xi’an, Central China. Chemosphere 134, 512–520.
Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-regression: a basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130.
Zhang, J., Li, R., Zhang, X., Bai, Y., Cao, P., Hua, P., 2019. Vehicular contribution of PAHs in size dependent road dust: a source apportionment by PCA-MLR, PMF, and Unmix receptor models. Sci. Total Environ. 649, 1314–1322.