Ecotoxicology and Environmental Safety 190 (2020) 110179
Development of pp-LFER and QSPR models for predicting the diffusion coefficients of hydrophobic organic compounds in LDPE
Tengyi Zhu a, Yue Jiang a, Haomiao Cheng a, Rajendra Prasad Singh b, Bipeng Yan a,∗
a Jiangsu Provincial Laboratory of Water Environmental Protection Engineering, School of Environmental Science and Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China
b School of Civil Engineering, Southeast University, Nanjing, 210096, China
ARTICLE INFO
Keywords: Hydrophobic organic contaminants (HOCs); Diffusion coefficient (D); Quantitative structure-property relationship (QSPR); Polyparameter linear free energy relationship (pp-LFER)
ABSTRACT
The diffusion coefficient (D) is important for evaluating the performance of passive samplers and for monitoring chemical concentrations effectively. Herein, we developed a polyparameter linear free energy relationship (pp-LFER) model and a quantitative structure-property relationship (QSPR) model to predict the diffusion coefficients of hydrophobic organic contaminants (HOCs) in low-density polyethylene (LDPE). A dataset of 120 diverse chemicals was used to develop both models. The pp-LFER model was developed with two descriptors (V and E), and its statistical parameters were satisfactory. To explore the diffusion behavior of the compounds further, a QSPR model with five descriptors (ETA_Alpha, ASP-6, IC1, TDB6r and ATSC2v) was constructed, with an adjusted determination coefficient ($R^2_{adj}$) of 0.949 and a leave-one-out cross-validation coefficient ($Q^2_{LOO}$) of 0.941. The regression results indicated that both models had satisfactory goodness-of-fit and robustness. This study shows that the pp-LFER and QSPR approaches can be used to predict log D values for hydrophobic organic compounds within the applicability domain.
1. Introduction
As additives or metabolic products of human activities and chemical industries, large quantities of hydrophobic organic contaminants (HOCs) are released into the environment (Hale et al., 2010; Golfinopoulos et al., 2003). These pollutants have toxic effects on organisms in water bodies (Konstantinou et al., 2006), sediments (Liu et al., 2013) and other environmental media (Hung et al., 2005; Perihan Binnur et al., 2006). Measuring the concentrations of these pollutants is therefore indispensable for assessing ecological risks and for making environmental management decisions (Liu et al., 2017). Passive sampling has been used worldwide as a powerful approach for detecting HOC concentrations in water (Kot-Wasik et al., 2007; Rusina et al., 2010). Common polymers such as polyoxymethylene (POM), polydimethylsiloxane (PDMS) and low-density polyethylene (LDPE) (Thompson et al., 2015) have played a substantial role in sampling various HOCs in water bodies. Among them, LDPE is a single-phase material characterized by the lowest cost, the simplest sorption behavior and the greatest ease of deployment (Hale et al., 2010; Zhu et al., 2018). For these reasons, LDPE has become a convenient sampling material for measuring the concentration of HOCs in the aqueous environment.
Diffusion coefficients (D) are used in models of chemical uptake and release for organisms and for passive samplers, because diffusion influences the rate at which contaminants can traverse biotic (e.g., lipid-water) or polymer-water phase boundaries (Nabi and Arey, 2017). The D in a polymer is proportional to the mass transfer coefficient and is therefore directly proportional to the sampling rate (Huckins et al., 2006). This means that the higher the D value in a polymer, the faster the sampling rate. Once an HOC has left the gas or dissolved phase, it diffuses within the PE matrix. As LDPE samplers become more widely used, applications might arise in which the samplers are deliberately operated in a membrane-controlled uptake mode. Thus, a better understanding of diffusion in the LDPE membrane is important (Lohmann, 2012). However, the diffusion behavior of substances is influenced by many factors, including the free volume and the segmental mobility of the chains in the passive sampling material. Accurate experimental measurement of D values is therefore an arduous task. Because D values are currently available mainly from costly experimental determination, a computational prediction model based on the physicochemical properties of chemicals would provide a more efficient method for future studies.
∗ Corresponding author. E-mail address: [email protected] (B. Yan).
https://doi.org/10.1016/j.ecoenv.2020.110179 Received 8 September 2019; Received in revised form 31 December 2019; Accepted 5 January 2020 0147-6513/ © 2020 Elsevier Inc. All rights reserved.
T. Zhu, et al.
In fact, only a few methods for extrapolating D values have been explored in previous reports (Hale et al., 2010; Lohmann, 2012; Rusina et al., 2010; Thompson et al., 2015). For example, Rusina et al. (2010) established a relationship between log D and the total surface area (TSA) of polycyclic aromatic hydrocarbons (PAHs) and polychlorinated biphenyls (PCBs), with a determination coefficient ($R^2$) of 0.84. Lohmann (2012) made a further attempt to predict log D values for a larger dataset covering 76 PAHs and PCBs, with an $R^2$ of 0.76. However, these models rely on a single parameter, which may be obtained from various sources, so discrepancies in that parameter further complicate the linear relationships. Moreover, most previous studies used only a small amount of experimental data. Because the number of compounds was insufficient and statistical parameters such as $R^2$ were not optimal, an alternative needs to be explored. The polyparameter linear free energy relationship (pp-LFER) (Abraham, 1993; Abraham et al., 2004; Abraham and Mcgowan, 1987) is a mainstream method that is widely used to analyze distribution processes (Chen et al., 2014; Sprunger et al., 2007). Abraham descriptors are available for about 8000 chemicals (Nabi and Arey, 2017); however, more than 14,000 chemicals may be released, transferred or deposited into water bodies, where they can have toxic effects on aquatic organisms (Hayward et al., 2006). Thus, Abraham descriptors are unavailable for many compounds, e.g., emerging nonpolar contaminants (Howard and Muir, 2010), polyhalogenated dibenzofurans (PHDFs) (Greim, 1997; Van den Berg et al., 2013) and polybrominated diphenyl ethers (PBDEs) (Nabi and Arey, 2017).
Because of these descriptor limitations of the pp-LFER model, a quantitative structure-property relationship (QSPR) model, an approach with wide applications (Chao et al., 2018), was developed in this work as another tool for predicting log D. QSPR establishes relationships between the molecular structure of a chemical and its physicochemical properties, environmental behavior parameters and toxicity at the lowest computational cost (Sabljic, 2001; Liu et al., 2016). In recent years, several hundred QSPR models relating molecular descriptors to different endpoints have been used successfully to predict the biological and physicochemical properties of compounds (Lyman et al., 1990; Leo and Hoekman, 1995; Pavan et al., 2008; He et al., 2018; Wei et al., 2018; Zhu et al., 2018; Ling et al., 2019). In the current study, to explore the diffusion behavior of HOCs in LDPE further, we constructed a pp-LFER model and a QSPR model to predict D values in accordance with the Organization for Economic Cooperation and Development (OECD) principles (Hermens et al., 1995; OECD, 2007). The feasibility and reliability of the models were demonstrated through applicability domain characterization and mechanistic interpretation. Moreover, the models were validated and compared with predictive models from previous reports.
Fig. 1. Distribution of observed log D values of the data set.
2. Material and methods
2.1. Dataset collection and conversion
After an in-depth literature survey, experimental log D values were obtained for 120 chemicals, including 25 PAHs, 39 PCBs (Rusina et al., 2007, 2010), 24 organochlorine pesticides (OCPs) (Howell et al., 1985; Lohmann, 2012; Thompson et al., 2015), 26 emerging contaminants (Pintadoherrera et al., 2016) and 6 other compounds including PBDE 47 (Valderrama et al., 2016). The experimental log D values for all compounds are listed in the Supporting Information. Where several values had been measured for the same chemical under the same experimental conditions, by different determination methods or in different laboratories, the average of these log D values was used. The data followed a normal distribution (p ≤ 0.001), with a standard deviation (SD) of 0.3 and a mean of −13.0 (Fig. 1). Following the OECD guidelines, 80% of the data were randomly assigned to a training set (96 chemicals), which was used to develop the models; the remaining compounds formed a validation set (24 chemicals), which was used to evaluate the predictive capacity of the models.
2.2. Descriptor generation
The fundamental hypothesis behind QSAR models is that the activities of molecules are mainly determined by their structures, which are represented by molecular descriptors. Hence, molecular descriptors can be calculated from molecular structures, and the structures of all compounds must be prepared before the descriptors are calculated (Wang et al., 2018). First, the mol files of the molecules, obtained from the NIST Chemistry WebBook (https://webbook.nist.gov/chemistry/), were converted to simplified molecular input line entry system (SMILES) format. The molecular structures of all compounds were pre-optimized by the “Minimize Energy” method in ChemBio 3D Ultra (Version 12.0) (Schnur et al., 1991). Further optimization was then performed with the PM7 method in the MOPAC 2016 program, which is built into ChemBio3D Ultra (Version 12.0), using the following parameters: PM7 eps = 78.6, EF, GNORM = 0.01, and POLAR MULLIK SHIFT = 80. After the structures had been optimized, the molecular descriptors were calculated with the PaDEL-Descriptor software (Version 2.21) (Yap, 2011), which yielded 1305 descriptors for model development. Two preprocessing steps were applied to this pool of 1305 descriptors to remove uninformative descriptors before subsequent analysis (Cao et al., 2017): (1) constant descriptors, almost-constant descriptors and descriptors with missing values were deleted, and (2) among the remaining descriptors, if the correlation coefficient of a pair of descriptors was higher than 0.95, only one was retained. Finally, 823 descriptors were retained and used for further variable selection by stepwise multiple linear regression (MLR).
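The two descriptor preprocessing steps of Section 2.2 can be sketched in a few lines of Python. This is an illustration only, not the procedure the authors ran: the function names, the 95% share used to call a descriptor "almost constant", the use of the absolute Pearson correlation, and the rule of keeping the first descriptor of a correlated pair are all assumptions introduced here. Only the 0.95 correlation cut-off comes from the text.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(table, corr_limit=0.95, dominant_fraction=0.95):
    """Return the descriptors that survive both preprocessing steps.

    table maps a descriptor name to its list of values (one per chemical).
    dominant_fraction is an assumed definition of "almost constant": a
    descriptor is dropped when one value covers at least this share of rows.
    """
    kept = {}
    for name, values in table.items():
        # Step 1a: drop descriptors with missing values.
        if any(v is None or math.isnan(v) for v in values):
            continue
        # Step 1b: drop constant and almost-constant descriptors.
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= dominant_fraction:
            continue
        # Step 2: of a highly correlated pair, keep only the first one seen.
        if any(abs(pearson(values, other)) > corr_limit for other in kept.values()):
            continue
        kept[name] = values
    return kept
```

Because constant columns are removed before any correlation is computed, the division in `pearson` never sees a zero variance. Which member of a correlated pair is kept depends on the column order, so a real workflow would need a deliberate rule for that choice.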
2.3. Models development
The widely used pp-LFER was established by Abraham et al. (2004). Three forms of free energy relationships can be used to express the basic model.
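Table 1 Obtained QSPR model, pp-LFER models and statistical parameters.
Training set: columns $n_{tra}$ to $RMSE_{tra}$. Validating set: columns $n_{ext}$ to $RMSE_{ext}$.

| Model | Equation | $n_{tra}$ | $R^2_{tra}$ | $Q^2_{LOO}$ | $Q^2_{BOOT}$ | $RMSE_{tra}$ | $n_{ext}$ | $R^2_{ext}$ | $Q^2_{ext}$ | $RMSE_{ext}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| pp-LFER | log D = −0.881V − 0.262E − 10.79 | 96 | 0.815 | 0.776 | 0.791 | 0.059 | 24 | 0.703 | 0.620 | 0.292 |
| pp-LFER | log D = −0.194E − 0.123S + 0.454A + 0.124B − 0.878V − 10.797 | 96 | 0.824 | | | | | | | |
| QSPR | log D = −0.303ETA_Alpha + 12.057ASP-6 + 0.111IC1 + 0.093TDB6r + 0.000222ATSC2v − 11.243 | 96 | 0.949 | 0.941 | 0.784 | 0.016 | 24 | 0.845 | 0.768 | 0.099 |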
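The fitted pp-LFER equations in Table 1 can be evaluated directly once the Abraham descriptors of a compound are known. The sketch below simply encodes the two equations; the function names and the descriptor values in the usage example are hypothetical and were chosen only to show the arithmetic.

```python
def log_d_pp_lfer_two(V, E):
    """Two-descriptor pp-LFER model from Table 1."""
    return -0.881 * V - 0.262 * E - 10.79


def log_d_pp_lfer_five(E, S, A, B, V):
    """Five-descriptor pp-LFER model from Table 1."""
    return -0.194 * E - 0.123 * S + 0.454 * A + 0.124 * B - 0.878 * V - 10.797


if __name__ == "__main__":
    # Hypothetical descriptor values, not taken from the dataset.
    small = log_d_pp_lfer_two(V=1.0, E=1.5)
    large = log_d_pp_lfer_two(V=2.0, E=1.5)
    # A larger McGowan volume gives a lower predicted log D.
    print(small, large, 10 ** small)
```

Both coefficients of the two-descriptor model are negative, so increasing either V or E lowers the predicted log D, which is the trend discussed in the mechanism interpretation.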
$$SP = eE + sS + aA + bB + vV + c \tag{1}$$

$$SP = eE + sS + aA + bB + lL + c \tag{2}$$

$$SP = sS + aA + bB + vV + lL + c \tag{3}$$
The dependent variable, SP, represents a property of the solutes in passive sampling materials. For LDPE applications, SP is the logarithm of the LDPE/water diffusion coefficient (log D). The uppercase letters (E, S, A, B, V, L) characterize the solute properties: E, excess molar refraction; S, dipolarity/polarizability parameter; A and B, solute H-bond acidity and H-bond basicity; V, McGowan volume in units of (cm3 mol−1)/100; and L, the logarithmic hexadecane-air partition coefficient (Endo and Goss, 2014). The lowercase letter c is the regression constant, and the other lowercase letters are regression coefficients referred to as system parameters. Eq. (1) is used for diffusion between two condensed phases, whereas Eq. (2) is used for diffusion between the condensed and gas phases. Eq. (3) handles both condensed-condensed and gas-condensed phase transfers. Hence, for the pp-LFER model, the Abraham solute descriptor values were used as independent variables; they were taken from the PaDEL-Descriptor software (Version 2.21) (Yap, 2011) and previous reports (Abraham and Mcgowan, 1987; Platts et al., 1999). For the QSPR model, the 823 calculated molecular descriptors were used as independent variables. Variables were selected with the MLR procedure in SPSS (SPSS 20.0) to establish the optimal predictive models, with log D as the dependent variable. The optimal model should have the maximum $R^2_{adj}$, the minimum root mean squared error (RMSE), the fewest independent variables and VIF < 10 for the predictor variables.
2.4. Model validation
To allow model comparison and selection, the following statistical parameters were calculated on the basis of the OECD guidelines to assess the performance of the established models (Liu et al., 2016): $R^2_{adj}$, RMSE, $Q^2_{LOO}$ (leave-one-out cross-validated coefficient), $Q^2_{BOOT}$ (bootstrap method; 1/5, 5000 iterations) and $Q^2_{ext}$ (external explained variance). In addition, the variance inflation factor (VIF), which describes the degree of multicollinearity, played a key role in developing and evaluating the models (Ou et al., 2018). The method for calculating the Williams plot and the statistical parameters of the developed models are given in the Supporting Information. To rule out chance correlation, the Y-randomization technique was used to further evaluate the robustness of the model (Roy and Kabir, 2012). In this test, the dependent-variable vector (Y-vector) is randomly shuffled, and a new QSPR model is developed using the original independent-variable matrix. The process is repeated several times (Tropsha et al., 2003). If $R^2_{YS}$ ($Q^2_{YS}$) is much smaller than the $R^2_{adj}$ ($Q^2_{LOO}$) of the original model, the original data are considered to contain a real QSPR relationship and the model is considered robust (Patil et al., 2018; Sabatino et al., 2018). Note that all the statistical parameters mentioned above can act as independent indicators to further assess the predictive performance after model development.
3. Results and discussion
3.1. pp-LFER and QSPR models
Using the above methods, the optimum pp-LFER and QSPR models were established. The optimum pp-LFER model, with two descriptors (V and E), is as follows:

$$\log D = -0.881V - 0.262E - 10.79 \tag{5}$$

$n_{tra} = 96$, $R^2_{adj} = 0.815$, $Q^2_{LOO} = 0.776$, $Q^2_{BOOT} = 0.791$, $RMSE_{tra} = 0.059$, p < 0.001; $n_{ext} = 24$, $R^2_{ext} = 0.703$, $Q^2_{ext} = 0.620$, $RMSE_{ext} = 0.292$

For the training set, the $R^2_{adj}$, $Q^2_{LOO}$ and $Q^2_{BOOT}$ values of the model were 0.815, 0.776 and 0.791, respectively. For the validation set, $R^2_{ext}$ was 0.703 and $Q^2_{ext}$ was 0.620, which indicated high goodness-of-fit and robustness of the model. The RMSE values of the pp-LFER model for the training set and the validation set were 0.059 and 0.292, respectively. According to the criteria ($Q^2$ > 0.6, $R^2$ > 0.7) proposed by Chirico and Gramatica (2012), the pp-LFER model with two descriptors (V and E) showed good predictive capacity. As Table 1 shows, this model achieved an $R^2$ value similar to that of the five-descriptor pp-LFER model while using fewer descriptors. To explore the prediction of log D values further, a QSPR model with five descriptors (ETA_Alpha, ASP-6, IC1, TDB6r and ATSC2v) was established as follows:
$$\log D = -0.303\,\mathrm{ETA\_Alpha} + 12.057\,\mathrm{ASP\text{-}6} + 0.111\,\mathrm{IC1} + 0.093\,\mathrm{TDB6r} + 0.000222\,\mathrm{ATSC2v} - 11.243 \tag{6}$$

$n_{tra} = 96$, $R^2_{adj} = 0.949$, $Q^2_{LOO} = 0.941$, $Q^2_{BOOT} = 0.784$, $RMSE_{tra} = 0.016$, p < 0.001; $n_{ext} = 24$, $R^2_{ext} = 0.845$, $Q^2_{ext} = 0.768$, $RMSE_{ext} = 0.099$.

In this model, the $R^2_{adj}$, $Q^2_{LOO}$ and $Q^2_{BOOT}$ values were 0.949, 0.941 and 0.784 for the training set, and $R^2_{ext}$ was 0.845 and $Q^2_{ext}$ was 0.768 for the validation set, which indicated that the QSPR model had good external prediction performance. Moreover, the RMSE values of the QSPR model for the training set and the validation set were 0.016 and 0.099, respectively, a clear improvement over the pp-LFER model. The mean absolute error (MAE), RMSE and standard error (SE), which reflect the validity and reasonableness of the models, were used to assess and compare the models. Moreover, the RMSE and MAE values in the training set
The results of the applicability domain evaluation, characterized by Euclidean distance, are presented in Fig. 3a and Fig. 3b, which show that all compounds in the dataset were within the acceptable region. These results imply that the training sets for both the pp-LFER model and the QSPR model were representative. For the pp-LFER model, the Williams plot (Fig. 4a) showed that no compound had a leverage value (h) higher than the warning value h* (h* = 0.09). However, three compounds (indeno[1,2,3-cd]pyrene, acenaphthene-d10 and fluorene-d10) were identified as outliers because their standardized residuals (δ) fell outside the range (−3, 3). In the Williams plot of the QSPR model (Fig. 4b), two outliers (fluorene-d10 and benzophenone-3 (BP-3)) had |δ| > 3. The overestimated values for indeno[1,2,3-cd]pyrene, benzophenone-3 (BP-3), acenaphthene-d10 and fluorene-d10 may be due to the limited number of data points for structurally similar compounds (Liu et al., 2017). Additionally, no compound had h > h* (h* = 0.19) in the QSPR model, signifying the excellent extrapolation capacity of this model.
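The leverage check behind a Williams plot can be sketched as follows. The paper defers its exact procedure to the Supporting Information, so this block rests on two assumptions: the common convention h* = 3(p + 1)/n for the warning leverage, and the hat-matrix definition of leverage with an intercept term. With n = 96 training compounds this convention gives about 0.094 for two descriptors and about 0.188 for five, in line with the reported h* values of 0.09 and 0.19. All function names are hypothetical.

```python
def warning_leverage(n_train, n_descriptors):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_train


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(train_rows, query_rows):
    """Leverage h = x^T (X^T X)^-1 x, with an intercept column prepended."""
    X = [[1.0] + list(row) for row in train_rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    result = []
    for q in query_rows:
        x = [1.0] + list(q)
        z = solve(xtx, x)  # z = (X^T X)^-1 x
        result.append(sum(a * b for a, b in zip(x, z)))
    return result


def outside_domain(h, std_residual, h_star, limit=3.0):
    """Flag a compound with h > h* or |standardized residual| > 3."""
    return h > h_star or abs(std_residual) > limit
```

A useful sanity check is that the leverages of the training compounds sum to the number of fitted parameters (descriptors plus intercept).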
Table 2 Analysis of mean absolute error (MAE) in data set.

| Model | n | Dataset range | MAE | RMSE | SE |
|---|---|---|---|---|---|
| pp-LFER | 120 | 3.30 | 0.240 | 0.059 | 0.027 |
| QSPR | 120 | 3.30 | 0.116 | 0.016 | 0.027 |

n: the number of total data set.
MAE: mean absolute error. $MAE = \frac{1}{n} \sum |Y_{obs} - Y_{pred}|$.
RMSE: root mean square error. $RMSE = \left[\frac{1}{n} \sum (Y_{obs} - Y_{pred})^2\right]^{1/2}$.
SE: standard error. $SE = SD/\sqrt{n}$.
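The three error measures defined in the notes to Table 2 are straightforward to compute. The sketch below follows those definitions; the function names are chosen here for illustration. With the reported SD of 0.3 and n = 120, the standard error evaluates to about 0.027, the value listed in Table 2.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: (1/n) * sum(|Y_obs - Y_pred|)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: sqrt((1/n) * sum((Y_obs - Y_pred)^2))."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def standard_error(sd, n):
    """Standard error: SD / sqrt(n)."""
    return sd / math.sqrt(n)


if __name__ == "__main__":
    # SD and n as reported for the full dataset.
    print(round(standard_error(0.3, 120), 3))
```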
were considerably low (Table 2). The RMSE describes the average error in predicting the dependent variable (Singh et al., 2014). Values of RMSE < 0.1 × training set range (Alexander et al., 2015) and MAE ≤ 0.1 × training set range (Roy et al., 2016) suggest that the QSPR model is useful. The SE values are also presented in Table 2. The SE value (0.027) for log D in the final model indicates that the 95% confidence interval of the predicted log D is the predicted value ± 0.05 (Altman and Bland, 2005; Huang and Jolliet, 2019), indicating that most of the predicted D values are within a factor of $10^{0.05} = 1.12$ of the measured D. Moreover, the t-statistic (t), statistical significance (p), variance inflation factor (VIF) and mean effect (MF) of the descriptors involved in the developed models are shown in Table 3. All VIF values were less than 10, indicating that none of the variables had serious multicollinearity. The relative importance and contribution of each descriptor were evaluated by the MF values (Pourbasheer et al., 2009; Riahi et al., 2009), which were calculated by equation (7) below.
$$MF_j = \frac{\beta_j \sum_{i=1}^{n} d_{ij}}{\sum_{j}^{m} \beta_j \sum_{i}^{n} d_{ij}} \tag{7}$$

where $MF_j$ is the mean effect of the jth descriptor; $\beta_j$ is the coefficient of the jth descriptor; $d_{ij}$ is the value of the jth descriptor for the ith chemical; m is the number of descriptors; and n is the number of chemicals. The sign of an MF value indicates the direction in which the diffusion coefficient changes when the descriptor value increases (or decreases) (Riahi et al., 2009). In the pp-LFER model, the MF values of V and E were 0.770 and 0.230, which shows that the impact of the descriptors follows V > E. Similarly, the relative importance of the descriptors in the QSPR model was ETA_Alpha > ASP-6 > IC1 > TDB6r > ATSC2v (Table 3). As shown in Fig. 2, the good agreement between predicted and observed log D values indicates the good predictive capacity of the established models.
3.3. Mechanism interpretation
Interpreting the model descriptors provides structural insight into the diffusion mechanism in LDPE. The descriptors are therefore ranked according to their mean effect values, and an interpretation of the pp-LFER and QSPR results is given below.
3.3.1. Mechanism interpretation of pp-LFER
The established pp-LFER model has two descriptors, V and E (Table 1). V is McGowan's molar volume (Kamlet et al., 1988; Abraham et al., 2010), which describes the energy needed to form a suitable cavity for the compound by disrupting the molecular structure of the polymer phase (Liu et al., 2017; Zhu et al., 2019). The negative coefficient of V implies that compounds with a smaller molecular volume tend to diffuse more quickly than larger ones, which may be because smaller molecules need less energy to form a cavity in the polymer (Vitha and Carr, 2006). Accordingly, log D decreases as V increases. The other descriptor in the model, E (excess molar refraction), reflects non-specific molecular interactions (Endo and Schmidt, 2006). Because E is negatively correlated with log D, a molecule with a higher E tends to interact with the polymer, and its molar flux in LDPE tends to decrease. Thus, compounds with lower E diffuse more readily in LDPE.
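The mean effect of Eq. (7), which underlies the descriptor ranking discussed above, can be computed with a short function. This is a minimal sketch: the function name is hypothetical, and the descriptor matrix in the usage example is made up for illustration, so its output is not meant to reproduce the MF values in Table 3.

```python
def mean_effects(coefficients, descriptor_rows):
    """Mean effect of each descriptor according to Eq. (7).

    coefficients[j] is the regression coefficient of descriptor j and
    descriptor_rows[i][j] is the value of descriptor j for chemical i.
    """
    m = len(coefficients)
    # Sum each descriptor over all chemicals, then weight by its coefficient.
    column_sums = [sum(row[j] for row in descriptor_rows) for j in range(m)]
    weighted = [b * s for b, s in zip(coefficients, column_sums)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # pp-LFER coefficients of V and E with two hypothetical chemicals.
    print(mean_effects([-0.881, -0.262], [[1.2, 2.0], [1.8, 3.0]]))
```

By construction the MF values of a model sum to 1, which matches the columns of Table 3 (0.770 + 0.230 for the pp-LFER model). The sign of each MF is relative to the sign of the denominator, so a descriptor with a negative coefficient can still have a positive MF.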
3.3.2. Mechanism interpretation of QSPR
The pp-LFER model explained only 81.5% of the variance in the log D values. Therefore, a QSPR model with stronger predictive power, based on five selected molecular descriptors (ETA_Alpha, ASP-6, IC1, TDB6r and ATSC2v), was established. In the QSPR model, the descriptor ETA_Alpha represents the sum of the alpha values of all non-hydrogen vertices of a
3.2. Applicability domain
In accordance with the OECD guidance principles, an applicability domain (AD) needs to be defined for a developed model to show its scope and limitations (Netzeva et al., 2005). A Euclidean distance-based method and the Williams plot were used to assess the AD in this work.
Table 3 Description of the descriptors involved in the models.

| Model | Descriptor | Description | t | p | VIF | MF |
|---|---|---|---|---|---|---|
| pp-LFER | V | McGowan volume | −15.941 | < 0.001 | 1.043 | 0.770 |
| pp-LFER | E | Excess molar refraction | −7.705 | < 0.001 | 1.040 | 0.230 |
| QSPR | ETA_Alpha | Sum of alpha values of all non-hydrogen vertices of a molecule | −34.818 | < 0.001 | 2.024 | 1.564 |
| QSPR | ASP-6 | Average simple path, order 6 | 9.449 | < 0.001 | 1.172 | −0.311 |
| QSPR | IC1 | Information content index (neighborhood symmetry of 1-order) | 3.536 | < 0.001 | 1.805 | −0.140 |
| QSPR | TDB6r | 3D topological distance based autocorrelation - lag 6/weighted by covalent radius | 5.547 | < 0.001 | 1.991 | −0.109 |
| QSPR | ATSC2v | Centered Broto-Moreau autocorrelation - lag 2/weighted by van der Waals volumes | 7.709 | < 0.001 | 1.426 | −0.004 |

Note: t: corresponding t-test; p: significance level; VIF: variance inflation factor; MF: mean effect.
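The VIF column of Table 3 measures how strongly each descriptor is explained by the others. The authors obtained their values from SPSS; the sketch below is a stand-in that uses the standard identity that the VIF of descriptor j equals the jth diagonal element of the inverse of the descriptor correlation matrix. The function names and the test data are introduced here for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def vif(columns):
    """Variance inflation factor of each descriptor column.

    VIF_j is read off as element (j, j) of the inverse correlation matrix,
    obtained by solving corr * z = e_j and taking z[j].
    """
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    result = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        result.append(solve(corr, unit)[j])
    return result
```

For two descriptors this reduces to 1/(1 − r²), where r is their correlation. The function assumes that no column is constant and that the descriptors are not perfectly collinear, since either case makes the correlation matrix undefined or singular.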
Fig. 2. The plot of the predicted versus observed log D values by pp-LFER (a) model and QSPR (b) model.
Fig. 3. Applicability domains for the developed pp-LFER model (a) and QSPR model (b), characterized by the Euclidean distance-based approach.
Fig. 4. Applicability domain of (a) pp-LFER model; and (b) QSPR model.
molecule. The alpha value is a parameter that characterizes the average molecular polarizability, which is associated with hydrophobicity (Zhu et al., 2018). Consequently, highly hydrophobic chemicals tend to interact with the LDPE instead of diffusing, which results in a lower passive diffusion rate (Poerschmann et al., 2000). According to the MF values, ETA_Alpha plays the most significant role in the diffusion behavior, and the log D value is inversely associated with this descriptor. Its negative sign implies that chemicals
Table 4 Comparison of the current models with previous models for log D values.

| Parameter | m (a) | n (b) | $R^2$ | Chemicals | RMSE (c) | Reference |
|---|---|---|---|---|---|---|
| MW | 1 | 25 | 0.95 | PAHs | NR | Schwarzenbach et al. (2002) |
| MW | 1 | 39 | 0.98 | PCBs | NR | Schwarzenbach et al. (2002) |
| log MW | 1 | 42 | NR | PAHs and PCBs | NR | Hofmans (1998) |
| TSA | 1 | 53 | 0.84 | PAHs and PCBs | NR | Rusina et al. (2010) |
| TSA | 1 | 39 | 0.85 | PCBs | NR | Rusina et al. (2010) |
| TSA | 1 | 14 | 0.95 | PAHs | NR | Rusina et al. (2010) |
| Vm (SPARC) | 1 | 74 | 0.76 | PAHs and PCBs | NR | Lohmann (2012) |
| V, E | 2 | 120 | 0.82 | PAHs, PCBs, OCPs, emerging contaminants and PBDE47 | 0.29 | This study |
| ETA_Alpha, IC1, ASP-6, ATSC2v, TDB6r | 5 | 120 | 0.95 | PAHs, PCBs, OCPs, emerging contaminants and PBDE47 | 0.099 | This study |

(a) m: the number of parameters.
(b) n: the number of total data set.
(c) RMSE: the root mean squared error; NR: not reported; MW: molecular weight (g/mol); TSA: the total surface area; Vm (SPARC): SPARC-generated Vm; V: McGowan volume; E: excess molar refraction; ETA_Alpha: sum of alpha values of all non-hydrogen vertices of a molecule; IC1: information content index (neighborhood symmetry of 1-order); ASP-6: average simple path, order 6; ATSC2v: centered Broto-Moreau autocorrelation - lag 2/weighted by van der Waals volumes; TDB6r: 3D topological distance based autocorrelation - lag 6/weighted by covalent radius.
with large ETA_Alpha values may diffuse with difficulty in the polymer. This effect can best be illustrated with the compounds in Table S4 by comparing compounds with different log Kow values within the same class (e.g., PCBs). PCB 180 (log Kow = 7.36) is more hydrophobic than PCB 118 (log Kow = 6.74), and its ETA_Alpha value is higher (11.00 versus 9.57), which results in a lower diffusion coefficient for PCB 180 (log D = −13.47) than for PCB 118 (log D = −13.04) (Table S7). Furthermore, reducing the time-weighted average (TWA) concentrations of dissolved organic chemicals in passive samplers will cause a decrease in log D. Unlike ETA_Alpha, ASP-6 has a positive sign, and its MF value is larger in magnitude than those of the other positive-sign descriptors. This descriptor is defined by distinct edges and vertices equal to or greater than six in a sequence (Oluwaseye et al., 2018). It also serves as a measure of the shape of the molecule. ASP-6 tends to increase with the topological path length (Khan et al., 2019), which would result in relatively stronger diffusion. The third selected molecular descriptor, IC1, is a topological index that represents the information content index (neighborhood symmetry of 1-order) (Todeschini and Consonni, 2009). This descriptor is defined as the complexity of a graph based on the r-th (r = 0–6) order neighborhood of vertices in a hydrogen-filled graph (Gute et al., 1999). Hence, compounds with longer molecular topological distances tend to interact with the LDPE structure. The fourth descriptor, TDB6r, is the 3D topological distance based autocorrelation - lag 6/weighted by covalent radius and represents the presence of unsaturated bonds. Hence, the shorter the intramolecular distance, the lower the flexibility at the molecular level (Khan et al., 2019).
The last descriptor, ATSC2v, is defined as the centered Broto-Moreau autocorrelation - lag 2/weighted by van der Waals volumes. It describes how the values of certain functions are correlated. For this descriptor, the atoms in a molecule represent a set of discrete points in space, and the atomic properties and functions are evaluated at those points. Here, the physicochemical property underlying ATSC2v is the atomic van der Waals volume, which is related to the volume of the molecule. From the above findings, it can be concluded that the following factors influence the diffusion behavior of HOCs in LDPE: molecular volume, hydrophobic interactions, topological path and distance, and van der Waals volume.
3.4. Model comparison
Most of the models for extrapolating log D values have thus far been based on experimental measurements. Initially, correlations between molecular weight (MW) and log D were established. MW explained 95% and 98% of the variance for 25 PAHs and 39 PCBs, respectively (Table 4), but the correlations were not ideal (Lohmann, 2012) because MW does not directly reflect molecular volume. The relation with log MW was also investigated to extrapolate log D values. However, Rusina et al. (2010) demonstrated that correlations of log D with MW were better than those with log MW. Instead of MW, the total surface area (TSA), a molecular property that can be related to log D, has been used as a general predictor of log D values. As shown in Table 4, the variance explained by the log D-TSA relation for PCBs is similar to that of the log D-MW relation, and the log D-MW relation accounted for a higher variance for PAHs. These results suggest that using TSA as the variable to predict log D values has limitations. As mentioned above, log D values are affected by compound structure, and the molecular volume of the analyte (Vm) is, besides the molecular weight, one of the descriptors of compound structure. Therefore, Lohmann (2012) used Vm (SPARC) as the variable to establish a correlation for extrapolating log D values with a larger dataset (n = 74 versus 53 using TSA). SPARC is an online calculator that uses various empirical molecular formulas to calculate D values by determining sorbate-sorbent interactions (Hilal et al., 2004). However, the determination coefficient of log D versus Vm (SPARC) was the lowest ($R^2$ = 0.76) among the models in previous studies. Most models that concentrated on PCBs and PAHs achieved a good linear relationship. However, because these two chemical classes have similar structures and show remarkable regularity in their diffusion behavior, the models have a narrow applicability domain, which restricts their general application. Compared with previous reports, the pp-LFER model and the QSPR model established in this study show some advantages, with higher values of $R^2$ and $Q^2$. Moreover, the two models covered 120 compounds, with diverse molecular structures in their applicability domain. For the pp-LFER model, the explained variance was close to that of the correlation with TSA ($R^2$ = 0.82 versus 0.84); however, the number of chemicals in the dataset was more than double (n = 120 versus 53). The QSPR model had the highest explained variance ($R^2$ = 0.95), as well as the largest number and the greatest variety of compounds, compared with previous results. Together with the statistical parameters, these results show that our models have high goodness-of-fit and predictive capacity.
4. Conclusions
Several approaches extrapolate diffusion coefficients from experimental measurements of diffusion behavior in LDPE, but fewer prediction methods establish both QSPR
Ecotoxicology and Environmental Safety 190 (2020) 110179
T. Zhu, et al.
and pp-LFER models. The optimal QSPR and pp-LFER models were successfully developed by molecular descriptors in the present study. The application domain of both models covered 120 compounds from five different chemical classes. For the pp-LFER model, V and E were screened as variables, and for the QSPR model, the significant descriptors influencing log D values were ETA_Alpha, ASP-6, IC1, TDB6r and ATSC2v. The statistical parameters of both models implied a satisfactory performance in the goodness-of-fit, robustness and predictive capacity. Thus, the proposed models could be utilized to make predictions of the log D values of other compounds in the applicable domain.
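The statistical criteria used in the model comparison (R2, adjusted R2 and the leave-one-out cross-validation coefficient) can be illustrated with a minimal sketch. The data below are synthetic and the two descriptor columns are hypothetical stand-ins; none of the numbers are coefficients or results of this study. The cross-validation coefficient is computed here as one minus PRESS divided by the total sum of squares about the full-set mean, which is one common convention and is an assumption of this sketch.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to a 2-D descriptor matrix X."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    return design @ coef


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R2 penalised for the number of descriptors p, given n compounds."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def q2_loo(X, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def demo():
    """Fit a two-descriptor linear model to SYNTHETIC data (not the paper's)."""
    rng = np.random.default_rng(0)
    n = 40
    # Two hypothetical descriptors and an invented linear response with noise.
    X = np.column_stack([rng.uniform(1.0, 3.0, n), rng.uniform(0.5, 3.5, n)])
    y = -10.0 - 1.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0.0, 0.1, n)
    r2 = r_squared(y, predict(fit_ols(X, y), X))
    return r2, adjusted_r_squared(r2, n, X.shape[1]), q2_loo(X, y)


r2_fit, r2_adj, q2 = demo()
print(f"R2={r2_fit:.3f}  adjusted R2={r2_adj:.3f}  Q2_LOO={q2:.3f}")
```

For a least-squares fit, the leave-one-out value cannot exceed the fitted R2, so a large gap between the two is a simple warning sign of overfitting when models with different numbers of descriptors are compared.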
Acknowledgements
We thank Dr. Xianhai Yang of the Nanjing University of Science and Technology for statistical advice. The current work was funded by the National Natural Science Foundation of China (Grant Nos. 21607123 and 51809226) and the Jiangsu Provincial Laboratory for Water Environmental Protection Engineering (Grant No. W1901).
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.ecoenv.2020.110179.
References
Abraham, M.H., 1993. Scales of solute hydrogen-bonding: their construction and application to physicochemical and biochemical processes. Chem. Soc. Rev. 22, 73–83.
Abraham, M.H., Ibrahim, A., Zissimos, A.M., 2004. Determination of sets of solute descriptors from chromatographic measurements. J. Chromatogr. 1037, 29–47.
Abraham, M.H., Mcgowan, J.C., 1987. The use of characteristic volumes to measure cavity terms in reversed phase liquid chromatography. Chromatographia 23, 243–246.
Abraham, M.H., Smith, R.E., Luchtefeld, R., Boorem, A.J., Luo, R., Acree Jr., W.E., 2010. Prediction of solubility of drugs and other compounds in organic solvents. J. Pharm. Sci. 99, 1500–1515.
Alexander, D.L., Tropsha, A., Winkler, D.A., 2015. Beware of R2: simple, unambiguous assessment of the prediction accuracy of QSAR and QSPR models. J. Chem. Inf. Model. 55, 1316–1322.
Altman, D.G., Bland, J.M., 2005. Standard deviations and standard errors. BMJ 331, 903.
Cao, D.S., Deng, Z.K., Zhu, M.F., Yao, Z.J., Dong, J., Zhao, R.G., 2017. Ensemble partial least squares regression for descriptor selection, outlier detection, applicability domain assessment, and ensemble modeling in QSAR/QSPR modeling. J. Chemom. 31, e2922.
Chao, K.P., Wang, V.S., Liu, C.W., Lu, Y.T., 2018. QSAR studies on partition coefficients of organic compounds for polydimethylsiloxane of solid-phase microextraction devices. Int. J. Environ. Sci. Technol. 15, 2141–2150.
Chen, W., Sabljic, A., Cryer, S.A., Kookana, R.S., 2014. Non-first Order Degradation and Time-dependent Sorption of Organic Chemicals in Soil. American Chemical Society.
Chirico, N., Gramatica, P., 2012. Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. J. Chem. Inf. Model. 52, 2044–2058.
Endo, S., Goss, K.U., 2014. Predicting partition coefficients of polyfluorinated and organosilicon compounds using polyparameter linear free energy relationships (PP-LFERs). Environ. Sci. Technol. 48, 2776–2784.
Endo, S., Schmidt, T.C., 2006. Prediction of partitioning between complex organic mixtures and water: application of polyparameter linear free energy relationships. Environ. Sci. Technol. 40, 536–545.
Golfinopoulos, S.K., Nikolaou, A.D., Kostopoulou, M.N., Xilourgidis, N.K., Vagi, M.C., Lekkas, D.T., 2003. Organochlorine pesticides in the surface waters of Northern Greece. Chemosphere 50, 507–516.
Greim, L.W.W.H., 1997. The toxicity of brominated and mixed-halogenated dibenzo-p-dioxins and dibenzofurans: an overview. J. Toxicol. Environ. Health A 50, 195–216.
Gute, B., Grunwald, G., Basak, S., 1999. Prediction of the dermal penetration of polycyclic aromatic hydrocarbons (PAHs): a hierarchical QSAR approach. SAR QSAR Environ. Res. 10, 1–15.
Hale, S.E., Martin, T.J., Goss, K.U., Arp, H.P.H., Werner, D., 2010. Partitioning of organochlorine pesticides from water to polyethylene passive samplers. Environ. Pollut. 158, 2511–2517.
Hayward, S.J., Lei, Y.D., Wania, F., 2006. Comparative evaluation of three high-performance liquid chromatography-based Kow estimation methods for highly hydrophobic organic compounds: polybrominated diphenyl ethers and hexabromocyclododecane. Environ. Toxicol. Chem. 25, 2018–2027.
He, J.Y., Peng, T., Yang, X.H., Liu, H.H., 2018. Development of QSAR models for predicting the binding affinity of endocrine disrupting chemicals to eight fish estrogen receptor. Ecotoxicol. Environ. Saf. 148, 211–219.
Hermens, J., Balaz, S., Damborsky, J., Karcher, W., Müller, M., Peijnenburg, W., Sabljic, A., Sjöström, M., 1995. Assessment of QSARs for predicting fate and effects of chemicals in the environment: an international European project. SAR QSAR Environ. Res. 3, 223–236.
Hilal, S.H., Karickhoff, S.W., Carreira, L.A., 2004. Prediction of the solubility, activity coefficient and liquid/liquid partition coefficient of organic compounds. QSAR Comb. Sci. 23, 709–720.
Hofmans, H.E., 1998. Numerical Modeling of the Exchange Kinetics of Semipermeable Membrane Devices. Ph.D. Thesis. University of Utrecht, The Netherlands.
Howard, P.H., Muir, D.C., 2010. Identifying new persistent and bioaccumulative organics among chemicals in commerce. Environ. Sci. Technol. 44, 2277–2285.
Howell, B.F., Mccrackin, F.L., Wang, F.W., 1985. Fluorescence measurements of the diffusion coefficient for butylated hydroxyanisole in low-density polyethylene. Polymer 26, 433–436.
Huang, L., Jolliet, O., 2019. A combined quantitative property-property relationship (QPPR) for estimating packaging-food and solid material-water partition coefficients of organic compounds. Sci. Total Environ. 658, 493–500.
Huckins, J.N., Kees, B.P.D., Jimmie, D.P.P.D., 2006. Monitors of Organic Chemicals in the Environment. Springer, New York.
Hung, H., Blanchard, P., Halsall, C.J., Bidleman, T.F., Stern, G.A., Fellin, P., Muir, D.C.G., Barrie, L.A., Jantunen, L.M., Helm, P.A., 2005. Temporal and spatial variabilities of atmospheric polychlorinated biphenyls (PCBs), organochlorine (OC) pesticides and polycyclic aromatic hydrocarbons (PAHs) in the Canadian Arctic: results from a decade of monitoring. Sci. Total Environ. 342, 119–144.
Kamlet, M.J., Doherty, R.M., Abraham, M.H., Marcus, Y., Taft, R.W., 1988. Linear solvation energy relationship. 46. An improved equation for correlation and prediction of octanol/water partition coefficients of organic nonelectrolytes (including strong hydrogen bond donor solutes). J. Phys. Chem. C 92, 5244–5255.
Khan, K., Benfenati, E., Roy, K., 2019. Consensus QSAR modeling of toxicity of pharmaceuticals to different aquatic organisms: ranking and prioritization of the DrugBank database compounds. Ecotoxicol. Environ. Saf. 168, 287–297.
Konstantinou, I.K., Hela, D.G., Albanis, T.A., 2006. The status of pesticide pollution in surface waters (rivers and lakes) of Greece. Part I. Review on occurrence and levels. Environ. Pollut. 141, 555–570.
Kot-Wasik, A., Zabiegala, B., Urbanowicz, M., Dominiak, E., Wasik, A., Namiesnik, J., 2007. Advances in passive sampling in environmental studies. Anal. Chim. Acta 602, 141–163.
Leo, A., Hoekman, D., 1995. Exploring QSAR: Fundamentals and Applications in Chemistry and Biology. American Chemical Society.
Ling, Y.H., Klemes, M.J., Steinschneider, S., Dichtel, W.R., Helbling, D.E., 2019. QSARs to predict adsorption affinity of organic micropollutants for activated carbon and beta-cyclodextrin polymer adsorbents. Water Res. 154, 217–226.
Liu, H.H., Bao, L.J., Zhang, K., Xu, S.P., Wu, F.C., Zeng, E.Y., 2013. Novel passive sampling device for measuring sediment-water diffusion fluxes of hydrophobic organic chemicals. Environ. Sci. Technol. 47, 9866–9873.
Liu, H.H., Wei, M.B., Yang, X.H., Yin, C., He, X., 2017. Development of TLSER model and QSAR model for predicting partition coefficients of hydrophobic organic chemicals between low density polyethylene film and water. Sci. Total Environ. 574, 1371–1378.
Liu, H.H., Yang, X.H., Rui, L., 2016. Development of classification model and QSAR model for predicting binding affinity of endocrine disrupting chemicals to human sex hormone-binding globulin. Chemosphere 156, 1–7.
Lohmann, R., 2012. Critical review of low-density polyethylene's partitioning and diffusion coefficients for trace organic contaminants and implications for its use as a passive sampler. Environ. Sci. Technol. 46, 606–618.
Lyman, W.J., Reehl, W.F., Rosenblatt, D.H., 1990. Handbook of Chemical Property Estimation Methods: Environmental Behavior of Organic Compounds. American Chemical Society.
Nabi, D., Arey, J.S., 2017. Predicting partitioning and diffusion properties of nonpolar chemicals in biotic media and passive sampler phases by GC×GC. Environ. Sci. Technol. 51, 3001–3011.
Netzeva, T.I., Worth, A.P., Aldenberg, T., Benigni, R., Cronin, M.T., Gramatica, P., Jaworska, J.S., Kahn, S., Klopman, G., Marchant, C.A., 2005. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships-The report and recommendations of ECVAM Workshop 52. Altern. Lab. Anim. 33, 155–173.
OECD, 2007. Guidance Document on the Validation of (Quantitative) Structure-Activity Relationships [(Q) SAR] Models. Organisation for Economic Co-operation and Development, Paris, France. http://www.OECD.Org/env/ehs/risk-assessment/ guenvironment.
Oluwaseye, A., Uzairu, A., Shallangwa, G., Abechi, S., 2018. QSAR studies on derivatives of quinazoline-4 (3h)-ones with anticonvulsant activities. J. Engine. Exact. Sci. 4 0255-0264.
Ou, W., Liu, H., He, J., Yang, X., 2018. Development of chicken and fish muscle protein – water partition coefficients predictive models for ionogenic and neutral organic chemicals. Ecotoxicol. Environ. Saf. 157, 128.
Patil, R.B., Barbosa, E.G., Sangshetti, J.N., Sawant, S.D., Zambre, V.P., 2018. LQTA-R: a new 3D-QSAR methodology applied to a set of DGAT1 inhibitors. Comput. Biol. Chem. 74, 123–131.
Pavan, M., Netzeva, T.I., Worth, A.P., 2008. Review of literature-based quantitative structure-activity relationship models for bioconcentration. QSAR Comb. Sci. 27, 21–31.
Perihan Binnur, K.K., Bidleman, T.F., Staebler, R.M., Jones, K.C., 2006. Measurement of DDT fluxes from a historically treated agricultural soil in Canada. Environ. Sci. Technol. 40, 4578–4585.
Pintadoherrera, M.G., Laramartín, P.A., Gonzálezmazo, E., Allan, I.J., 2016. Determination of silicone rubber and low density polyethylene diffusion and polymer-water partition coefficients for emerging contaminants. Toxicol. Environ. Chem. 35, 2162–2172.
Platts, J.A., Butina, D., Abraham, M.H., Hersey, A., 1999. Estimation of molecular linear free energy relation descriptors using a group contribution approach. J. Chem. Inf. Comput. Sci. 39, 835–845.
Poerschmann, J., Górecki, T., Kopinke, F.D., 2000. Sorption of very hydrophobic organic compounds onto poly (dimethylsiloxane) and dissolved humic organic matter. 1. Adsorption or partitioning of VHOC on PDMS-coated solid-phase microextraction fibers a never-ending story? Environ. Sci. Technol. 34, 3824–3830.
Pourbasheer, E., Riahi, S., Ganjali, M.R., Norouzi, P., 2009. Application of genetic algorithm-support vector machine (GA-SVM) for prediction of BK-channels activity. Eur. J. Med. Chem. 44, 5023–5028.
Riahi, S., Pourbasheer, E., Ganjali, M.R., Norouzi, P., 2009. Investigation of different linear and nonlinear chemometric methods for modeling of retention index of essential oil components: concerns to support vector machine. J. Hazard Mater. 166, 853–859.
Roy, K., Das, R.N., Ambure, P., Aher, R.B., 2016. Be aware of error measures. Further studies on validation of predictive QSAR models. Chemometr. Intell. Lab. Syst. 152, 18–33.
Roy, K., Kabir, H., 2012. QSPR with extended topochemical atom (ETA) indices: modeling of critical micelle concentration of non-ionic surfactants. Chem. Eng. Sci. 73, 86–98.
Rusina, T.P., Smedes, F., Klanova, J., 2010. Diffusion coefficients of polychlorinated biphenyls and polycyclic aromatic hydrocarbons in polydimethylsiloxane and low-density polyethylene polymers. J. Polym. Sci. 116, 1803–1810.
Rusina, T.P., Smedes, F., Klanova, J., Booij, K., Holoubek, I., 2007. Polymer selection for passive sampling: a comparison of critical properties. Chemosphere 68, 1344–1351.
Sabatino, M., Rotili, D., Patsilinakos, A., Forgione, M., Tomaselli, D., Alby, F., Arimondo, P.B., Mai, A., Ragno, R., 2018. Disruptor of telomeric silencing 1-like (DOT1L): disclosing a new class of non-nucleoside inhibitors by means of ligand-based and structure-based approaches. J. Comput. Aided Mol. Des. 32, 435–458.
Sabljic, A., 2001. QSAR models for estimating properties of persistent organic pollutants required in evaluation of their environmental fate and risk. Chemosphere 43, 363–375.
Schnur, D.M., Grieshaber, M.V., Bowen, J.P., 1991. Development of an internal searching algorithm for parameterization of the MM2/MM3 force fields. J. Comput. Chem. 12, 844–849.
Schwarzenbach, R.P., Gschwend, P.M., Imboden, D.M., 2002. Environmental Organic Chemistry, second ed. John Wiley & Sons, Inc.
Singh, K.P., Gupta, S., Mohan, D., 2014. Evaluating influences of seasonal variations and anthropogenic activities on alluvial groundwater hydrochemistry using ensemble learning approaches. J. Hydrol 511, 254–266.
Sprunger, L., Proctor, A., Acree Jr., W.E., Abraham, M.H., 2007. Characterization of the sorption of gaseous and organic solutes onto polydimethyl siloxane solid-phase microextraction surfaces using the Abraham model. J. Chromatogr. A 1175, 162–173.
Thompson, J.M., Hsieh, C.H., Luthy, R.G., 2015. Modeling uptake of hydrophobic organic contaminants into polyethylene passive samplers. Environ. Sci. Technol. 49, 2270–2277.
Todeschini, R., Consonni, V., 2009. Molecular Descriptors for Chemoinformatics. Wiley-VCH Weinheim.
Tropsha, A., Gramatica, P., Gombar, V.K., 2003. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 22, 69–77.
Valderrama, J.F.N., Baek, K., Molina, F.J., Allan, I.J., 2016. Implications of observed PBDE diffusion coefficients in low density polyethylene and silicone rubber. Environ. Sci. Process. Impacts. 18, 87–94.
Van den Berg, M., Denison, M.S., Birnbaum, L.S., DeVito, M.J., Fiedler, H., Falandysz, J., Rose, M., Schrenk, D., Safe, S., Tohyama, C., 2013. Polybrominated dibenzo-p-dioxins, dibenzofurans, and biphenyls: inclusion in the toxicity equivalency factor concept for dioxin-like compounds. Toxicol. Sci. 133, 197–208.
Vitha, M., Carr, P.W., 2006. The chemical interpretation and practice of linear solvation energy relationships in chromatography. J. Chromatogr. A 1126, 143–194.
Wang, L., Chen, B., Zhang, T., 2018. Predicting hydrolysis kinetics for multiple types of halogenated disinfection byproducts via QSAR models. Chem. Eng. J. 342, 372–385.
Wei, M.B., Yang, X.H., Watson, P., Yang, F.F., Liu, H.H., 2018. Development of QSAR model for predicting the inclusion constants of organic chemicals with α-cyclodextrin. Environ. Sci. Pollut. Res. 25, 17565–17574.
Yap, C.W., 2011. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474.
Zhu, T., Chen, W., Cheng, H., Wang, Y., Singh, R.P., 2019. Prediction of polydimethylsiloxane-water partition coefficients based on the pp-LFER and QSAR models. Ecotoxicol. Environ. Saf. 182, 109374.
Zhu, T., Wu, J., He, C., Fu, D., Wu, J., 2018. Development and evaluation of MTLSER and QSAR models for predicting polyethylene-water partition coefficients. J. Environ. Manag. 223, 600–606.