Journal of Chromatography A, 1198–1199 (2008) 202–207
Quantitative structure–chromatographic retention relationship for polycyclic aromatic sulfur heterocycles
Hui-Ying Xu a, Jian-Wei Zou b,∗, Yong-Jun Jiang b, Gui-Xiang Hu b, Qing-Sen Yu b
a College of Biology & Environment Engineering, Zhejiang Shuren University, Hangzhou 310015, China
b Key Laboratory for Molecular Design and Nutrition Engineering, Ningbo Institute of Technology, Zhejiang University, Ningbo 315104, China
Article info
Article history: Received 14 December 2007; received in revised form 9 May 2008; accepted 19 May 2008; available online 23 May 2008
Keywords: Polycyclic aromatic sulfur heterocycle (PASH); Molecular electrostatic potential; Quantitative structure–retention relationship (QSRR)
Abstract
Polycyclic aromatic sulfur heterocycles (PASHs) are of concern in petroleum geochemistry and environmental chemistry. In the present study, geometry optimization and electrostatic potential calculations have been performed at the HF/6-31G* level of theory for 114 previously reported PASHs. A group of 25 statistically based parameters has been extracted. Linear relationships between the gas-chromatographic retention index (RI) and the structural descriptors have been established by stepwise linear regression analysis. The results show that two quantities derived from the positive electrostatic potential on the molecular surface, Vs+ (the average value of the positive electrostatic potentials on the molecular surface) and σ+² (a measure of the dispersion tendency of the positive electrostatic potential), together with Vmc (the molecular volume) and EHOMO (the energy of the highest occupied molecular orbital), can be used to express the quantitative structure–retention relationship (QSRR) of PASHs. The predictive capability of the model has been demonstrated by leave-one-out cross-validation, with a cross-validated correlation coefficient (RCV) of 0.992. Furthermore, when the 114 PASH samples are split into calibration and test sets in the ratio 2:1, a similar treatment yields an equation of almost equal statistical quality and very similar regression coefficients, supporting the robustness of the model. Predictions for six PASHs from another source have also been made. The QSRR model established may provide a powerful new method for predicting chromatographic properties of aromatic organosulfur compounds.
© 2008 Published by Elsevier B.V.
∗ Corresponding author. Tel.: +86 574 88229517; fax: +86 574 88229516. E-mail address: [email protected] (J.-W. Zou).
0021-9673/$ – see front matter © 2008 Published by Elsevier B.V. doi:10.1016/j.chroma.2008.05.042

1. Introduction

Polycyclic aromatic sulfur heterocycles (PASHs) are known to be a major chemical class of organosulfur compounds in petroleum. They are generally associated with adverse effects on the quality of petroleum products, such as catalyst poisoning, corrosion or atmospheric pollution caused by combustion processes [1]. Furthermore, it appears that some PASHs are more mutagenic/carcinogenic than their carbocyclic analogues [2]. In view of the importance of PASHs in organic geochemistry, environmental chemistry and toxicology, there is increasing interest in the isolation, identification, and quantification of this class of compounds [3–10].

Gas chromatography (GC), coupled with a suitable detection system (such as mass spectrometry), is one of the most powerful tools in analytical chemistry. It produces a single parameter (the retention index), which can be used for the identification of a wide range of analytes under well-defined conditions. Over the past decade, the application of GC with mass spectrometry (GC/MS) or atomic emission detection (GC/AED) in determining PASHs in geochemical, petroleum, and environmental samples has been reported widely in the literature [3–10]. Nevertheless, it is laborious to determine experimentally the retention indices for all possible PASH compounds, owing not only to the extremely large number of isomers and alkylated isomers, but also to the lack of synthesized PASH standards. Alternative approaches are therefore needed. Many previous studies have shown that it is feasible to predict gas-chromatographic retention properties with quantitative structure–retention relationship (QSRR) models [11–17]. Moreover, a significant and physically meaningful QSRR may help us gain insight into the separation mechanism for a given chromatographic system at the molecular level [11]. Whereas QSRR studies on similar heteroatom-containing compounds have evoked great interest, e.g. saturated O-, N- and S-heterocycles [18], organic sulfur compounds [19], nitrogen-containing polycyclic aromatic compounds [20], polychlorinated dibenzodioxins (PCDDs) and polychlorinated dibenzofurans (PCDFs) [21–23], such studies on PASHs are still limited [9,24,25].

The choice of appropriate structural parameters plays a pivotal role in QSRR studies. It is known that most
H.-Y. Xu et al. / J. Chromatogr. A 1198–1199 (2008) 202–207
physicochemical properties (including chromatographic ones) are associated with intermolecular noncovalent interactions, which are largely electrostatic in nature, and the electrostatic potentials, especially those distributed on the molecular surface, can be used to quantify the molecular interactions. The structural descriptors derived from the molecular surface electrostatic potentials not only have a definite physical meaning but also show good repeatability despite being statistically based (i.e. they remain constant so long as the three-dimensional molecular structure provided remains the same). In fact, descriptors of this type have been successfully applied in research on quantitative structure–property relationships (QSPRs), and their range of application is steadily being extended [26–29].

In this paper, we report the quantitative structure–chromatographic retention relationship of PASHs established by using the structural descriptors derived from the molecular surface electrostatic potentials. It serves, on the one hand, to improve our understanding of how the chromatographic retention indices of PASHs are influenced by the intermolecular interactions between the molecules and the chromatographic stationary phase and, on the other hand, to provide a new method for predicting chromatographic properties of organic compounds.

2. Methods

The initial geometry of each compound was optimized with the MOPAC 6.0 program implemented in the VEGA package using the AM1 method (keywords “PRECISE”, “GEO-OK”) [30]. The molecular geometries were then reoptimized at the HF/6-31G* level of theory with the Gaussian98 software package [31]. Based on these optimized geometries, calculations of the electronic density and electrostatic potentials were performed with the grid method. The grid control option was set to “cube = 100”. As a result, for each molecule there are about 100^3 (i.e. 10^6) points at which the values of the electronic density and electrostatic potentials were computed.
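The step from grid values to surface values is not spelled out in the text. One simple way to picture it, offered here only as a hedged sketch, is to keep the grid points whose electron density lies close to the 0.001 a.u. contour that the paper uses as the molecular surface. The function name `surface_potentials`, the flat-list inputs and the relative tolerance `rel_tol` are illustrative assumptions, not the authors' procedure.

```python
def surface_potentials(density, potential, iso=0.001, rel_tol=0.05):
    """Return potentials at grid points lying near the isodensity surface.

    density, potential: equal-length sequences of grid values.
    iso: isodensity value defining the surface (0.001 a.u. in the paper).
    rel_tol: assumed half-width of the acceptance band, relative to iso.
    """
    if len(density) != len(potential):
        raise ValueError("density and potential must have the same length")
    band = rel_tol * iso
    return [v for rho, v in zip(density, potential) if abs(rho - iso) <= band]


# Toy grid values (made up for illustration, not taken from the paper).
toy_density = [0.5, 0.00102, 0.00098, 0.0001]
toy_potential = [1.2, 0.03, -0.01, 0.002]
print(surface_potentials(toy_density, toy_potential))
```

A tolerance band is only one possible choice; interpolating the potential onto the contour would be an alternative with less dependence on grid spacing.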
On the basis of the above calculations, structural descriptors for all PASHs considered were extracted; those pertinent to the subsequent discussion are as follows:
(a) Vmc, the molecular volume;
(b) EHOMO, the energy of the highest occupied molecular orbital;
(c) Vs+, the average value of the positive electrostatic potentials on the molecular surface (an outer contour of the electronic density, ρ(r) = 0.001 a.u.), defined in [29]:
$$V_s^+ = \frac{1}{m}\sum_{i=1}^{m} V^+(r_i) \qquad (1)$$
where V+(ri) represents the positive value at a point ri on the molecular surface, the sum running over the m such points;
(d) σ+², a measure of the dispersion tendency of the positive electrostatic potential, defined in [28]:
$$\sigma_+^2 = \frac{1}{m}\sum_{i=1}^{m} \left| V^+(r_i) - V_s^+ \right|^2 \qquad (2)$$
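As a minimal sketch of how the two surface descriptors defined in (c) and (d) could be evaluated, assume the electrostatic potential has already been sampled at points on the molecular surface. The function name `positive_surface_descriptors` and the toy input values are illustrative assumptions and do not come from the paper.

```python
from statistics import fmean


def positive_surface_descriptors(surface_potentials):
    """Return (Vs+, sigma_+^2) from potentials sampled on the molecular surface.

    Only the positive values V+(r_i) enter both sums; m is their count.
    """
    positive = [v for v in surface_potentials if v > 0.0]
    if not positive:
        raise ValueError("no positive surface potentials")
    m = len(positive)
    vs_plus = fmean(positive)  # average positive potential
    # Mean squared deviation of the positive potentials from their average.
    sigma2_plus = sum((v - vs_plus) ** 2 for v in positive) / m
    return vs_plus, sigma2_plus


# Toy surface values (made up for illustration only).
vs_plus, sigma2_plus = positive_surface_descriptors([0.2, -0.1, 0.4])
print(vs_plus, sigma2_plus)
```

Because the descriptors are plain averages over the sampled points, they stay fixed for a fixed geometry and sampling, which is the repeatability property mentioned in the Introduction.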
Finally, the correlation between the structural descriptors of the PASHs and the retention index was established by stepwise linear regression analysis with the SPSS 10.0 package. In the stepwise regression, the probability values of F were set to ≤0.050 for entering and ≥0.100 for removing a variable. Collinearity diagnostics were performed so that only one descriptor was retained from a pair with similar contributions whenever the correlation between the two variables exceeded 0.8. The predictive power of the QSRR model was validated by leave-one-out cross-validation. Predictions for an external test set were also made.

3. Results and discussion

All experimentally determined retention indices of the 114 PASHs considered in the present paper (structures of some representative compounds are given in Fig. 1) were taken from a previous publication [32]. Their experimental RI values, together with the chemical names, the computed structural descriptors and the RI values predicted from the QSRR model, are collected in Table 1.

Fig. 1. Structures of some representative PASH compounds.

A number of structural parameters derived from the electrostatic potentials of the PASHs were calculated. These congeners are structurally similar; the main differences lie in the number and/or position of the condensed rings and alkyl substituents. Nevertheless, the electrostatic potential-derived quantities are sensitive to minor structural changes and can thus reflect, to some degree, the strength and mode of the electrostatic interactions of a compound. Eq. (1) is the relationship between the structural parameters and the retention index established for all 114 PASHs by stepwise linear regression analysis, in which N represents the number of data points submitted to the regression, R the correlation coefficient, S.D. the standard deviation and F the overall statistical significance of the equation.

$$\mathrm{RI} = -22.99 + 2.792\,V_{mc} + 278.8\,V_s^+ + 576.7\,E_{\mathrm{HOMO}} + 569.7\,\sigma_+^2 \qquad (1)$$
N = 114, R = 0.9921, S.D. = 9.563, F = 1711
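The statistics quoted with Eq. (1) can be cross-checked against one another. The sketch below assumes the standard relation for ordinary least squares with an intercept, F = (R²/k) / ((1 − R²)/(N − k − 1)), with k = 4 descriptors; the function name `f_from_r` is hypothetical.

```python
def f_from_r(r, n, k):
    """Overall F statistic implied by the multiple correlation coefficient r
    for n data points and k descriptors (plus a fitted intercept)."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values reported with Eq. (1): N = 114, R = 0.9921, four descriptors.
f_eq1 = f_from_r(0.9921, 114, 4)
print(round(f_eq1))
```

The result lands within about 1% of the reported F = 1711; the small difference is what one would expect from R being quoted to four decimal places.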
Molecular volume (Vmc) and molecular surface area (As) are both measures of molecular size and are highly correlated with each other; the correlation coefficient is as high as 0.96 for the present set of 114 PASHs. Our results show that the Vmc term enters Eq. (1), and a less satisfactory correlation is obtained when the As term is used instead. Generally speaking, for compounds of similar structure, the larger the molecular volume, the stronger the intermolecular dispersion forces. In addition, molecular volume is often viewed as a cavity term in linear solvation energy relationships (LSERs), i.e. a measure of the energy needed to overcome the cohesive forces in order to form a cavity for the solute, so that larger molecules tend to be excluded from a more polar solvent (e.g. water). As can be seen from Eq. (1), the Vmc term is positively correlated with RI, which is consistent with stronger intermolecular dispersion forces for larger molecules. A PASH molecule with a larger Vmc value therefore tends to be adsorbed onto the stationary phase (methyl-phenyl polysiloxane), and its retention index becomes larger. EHOMO reflects the ability of a molecule to donate electrons. The inclusion of EHOMO in the model indicates that the chromatographic retention of PASHs is closely associated with the charge transfer
Table 1
Calculated descriptors together with the experimental and predicted RI values for PASHs^a
No.^b   Compound name   Vmc   Vs+   EHOMO   σ+²   RI observed^c   RI predicted^d
1 2 3* 4 5 6* 7 8 9* 10 11 12* 13 14 15* 16 17 18* 19 20 21* 22 23 24* 25 26 27* 28 29 30* 31 32 33* 34 35 36* 37 38 39* 40 41 42* 43 44 45* 46 47 48* 49 50 51* 52 53 54* 55 56 57* 58 59 60* 61 62 63* 64 65 66* 67 68 69* 70 71 72* 73 74
Benzo[b]thiophene 7-Methylbenzo[b]thiophene 2-Methylbenzo[b]thiophene 5-Methylbenzo[b]thiophene 6-Methylbenzo[b]thiophene 3-Methylbenzo[b]thiophene 4-Methylbenzo[b]thiophene 5-Ethylbenzo[b]thiophene 3,5-Dimethylbenzo[b]thiophene Naphtho[1,2-b]thiophene Dibenzothiophene Naphtho[2,1-b]thiophene Naphtho[2,3-b]thiophene 5-Methylnaphtho[2,1-b]thiophene 2-Methylnaphtho[2,1-b]thiophene 4-Methyldibenzothiophene 8-Methylnaphtho[1,2-b]thiophene 2-Methyldibenzothiophene 3-Methyldibenzothiophene 4-Methylnaphtho[1,2-b]thiophene 4-Methylnaphtho[2,1-b]thiophene 6-Methylnaphtho[1,2-b]thiophene 1-Methyldibenzothiophene 8-Methylnaphtho[2,1-b]thiophene 7-Methylnaphtho[2,1-b]thiophene 6-Methylnaphtho[2,1-b]thiophene 1-Methylnaphtho[2,1-b]thiophene 9-Methylnaphtho[2,1-b]thiophene 3-Ethyldibenzothiophene 4,6-Dimethyldibenzothiophene 2,6-Dimethyldibenzothiophene 2-Ethyldibenzothiophene 3,6-Dimethyldibenzothiophene 2,8-Dimethyldibenzothiophene 3,7-Dimethyldibenzothiophene 3,8-Dimethyldibenzothiophene 1,7-Dimethyldibenzothiophene Phenanthro[4,5-bcd]thiophene Phenaleno[6,7-bc]thiophene Benzo[b]naphtho[2,1-d]thiophene Benzo[b]naphtho[1,2-d]thiophene Phenanthro[9,10-b]thiophene Phenanthro[4,3-b]thiophene Anthra[1,2-b]thiophene Benzo[b]naphtho[2,3-d]thiophene Phenanthro[1,2-b]thiophene Phenanthro[3,4-b]thiophene Anthra[2,1-b]thiophene Phenanthro[2,1-b]thiophene Phenanthro[3,2-b]thiophene Phenanthro[2,3-b]thiophene 1-Methylbenzo[b]naphtho[1,2-d]thiophene 11-Methylbenzo[b]naphtho[1,2-d]thiophene 10-Methylbenzo[b]naphtho[2,1-d]thiophene 3-Methylbenzo[b]naphtho[2,1-d]thiophene Anthra[2,3-b]thiophene 2-Methylbenzo[b]naphtho[2,1-d]thiophene 8-Methylbenzo[b]naphtho[2,1-d]thiophene 9-Methylbenzo[b]naphtho[2,1-d]thiophene 2-Methylbenzo[b]naphtho[1,2-d]thiophene 8-Methylbenzo[b]naphtho[1,2-d]thiophene 10-Methylbenzo[b]naphtho[1,2-d]thiophene 6-Methylbenzo[b]naphtho[1,2-d]thiophene 5-Methylbenzo[b]naphtho[2,1-d]thiophene 3-Methylbenzo[b]naphtho[1,2-d]thiophene 4-Methylbenzo[b]naphtho[2,3-d]thiophene 
4-Methylbenzo[b]naphtho[2,1-d]thiophene 6-Methylbenzo[b]naphtho[2,1-d]thiophene 9-Methylbenzo[b]naphtho[1,2-d]thiophene 7-Methylbenzo[b]naphtho[2,1-d]thiophene 10-Methylbenzo[b]naphtho[2,3-d]thiophene 9-Methylbenzo[b]naphtho[2,3-d]thiophene 1-Methylbenzo[b]naphtho[2,1-d]thiophene 8-Methylbenzo[b]naphtho[2,3-d]thiophene
88.451 101.216 101.255 101.260 101.163 101.120 101.179 115.237 113.906 119.422 119.501 119.417 119.719 132.206 132.078 132.001 131.908 132.154 132.256 132.115 131.991 131.899 132.051 132.118 132.006 132.067 132.244 131.916 146.455 145.094 144.892 146.246 144.954 144.836 144.868 144.708 144.968 131.259 131.147 150.505 150.286 150.329 149.614 150.400 150.355 150.196 153.494 150.240 150.399 150.338 150.443 169.960 170.075 163.098 163.232 150.521 162.974 162.965 163.225 162.984 162.902 163.046 162.845 163.188 163.147 163.284 163.130 162.951 163.271 162.984 163.157 163.069 162.540 163.291
0.45001 0.37137 0.36921 0.38307 0.37756 0.37533 0.39090 0.32564 0.33148 0.43157 0.42047 0.47942 0.43823 0.39730 0.36832 0.36558 0.38824 0.40145 0.38675 0.36789 0.37721 0.38220 0.41116 0.36886 0.37526 0.39479 0.40304 0.40714 0.36338 0.31498 0.34319 0.35236 0.33044 0.36160 0.32953 0.34241 0.35689 0.42869 0.44046 0.41713 0.44744 0.42205 0.40818 0.45417 0.45722 0.45969 0.44151 0.45411 0.47926 0.46304 0.47189 0.36250 0.37826 0.35105 0.39386 0.43640 0.38669 0.36819 0.41987 0.39403 0.39070 0.38182 0.39426 0.37588 0.39859 0.36077 0.39760 0.36971 0.41374 0.36619 0.41772 0.40324 0.37769 0.39546
−0.29757 −0.29075 −0.29142 −0.29401 −0.29319 −0.29297 −0.29031 −0.29503 −0.29098 −0.28393 −0.29223 −0.28134 −0.26824 −0.27475 −0.27494 −0.28785 −0.28189 −0.28720 −0.28924 −0.28041 −0.27676 −0.28087 −0.28779 −0.27852 −0.27800 −0.27607 −0.27885 −0.27673 −0.28931 −0.28494 −0.28324 −0.28739 −0.28484 −0.28421 −0.28324 −0.28567 −0.28632 −0.27345 −0.25948 −0.27553 −0.27611 −0.27713 −0.27554 −0.25868 −0.26998 −0.27800 −0.27800 −0.25814 −0.27664 −0.27062 −0.26982 −0.27030 −0.27131 −0.27425 −0.27177 −0.24680 −0.27409 −0.27451 −0.27209 −0.27464 −0.27360 −0.27504 −0.27343 −0.27037 −0.27235 −0.26866 −0.27371 −0.27389 −0.27239 −0.27252 −0.26802 −0.26804 −0.27044 −0.26668
0.05007 0.05166 0.03293 0.04870 0.04495 0.04103 0.04700 0.04599 0.03074 0.05671 0.05179 0.05849 0.05795 0.04377 0.03426 0.04144 0.04720 0.04162 0.04352 0.04372 0.05435 0.05159 0.03994 0.04931 0.04914 0.04874 0.04248 0.05165 0.04698 0.03555 0.03981 0.04543 0.03604 0.04052 0.04037 0.03953 0.03438 0.04961 0.05365 0.05363 0.05010 0.05989 0.05486 0.06012 0.05848 0.05514 0.05400 0.06606 0.06245 0.06272 0.06274 0.04512 0.04411 0.04806 0.04223 0.06243 0.04243 0.04080 0.04145 0.04795 0.04369 0.04942 0.04370 0.04018 0.04658 0.05459 0.03964 0.03824 0.04474 0.03907 0.05050 0.04791 0.04996 0.05116
201.57 219.16 220.76 222.09 222.11 223.08 223.15 236.14 243.56 295.80 296.01 300.00 304.47 306.53 311.77 312.72 315.61 316.19 316.32 317.19 318.12 319.55 319.69 319.86 320.26 323.57 323.58 325.25 328.34 329.17 332.42 332.65 332.88 335.90 336.02 336.09 339.36 348.75 353.45 389.37 392.92 394.96 395.03 395.39 395.97 396.01 396.43 399.31 400.59 401.89 402.19 402.59 404.15 404.28 407.55 407.57 407.63 407.69 407.93 408.00 409.04 409.04 409.48 410.58 411.48 411.60 411.65 411.71 411.81 412.08 414.26 414.62 414.62 414.68
206.35 223.27 212.76 223.62 220.22 217.54 226.79 243.89 236.18 298.38 288.41 314.82 310.07 322.27 308.34 304.21 316.76 315.70 311.54 310.30 320.10 317.74 317.12 314.46 316.23 323.00 321.26 327.38 346.20 324.46 335.34 342.56 328.95 340.49 331.51 333.17 335.20 332.60 345.20 384.15 390.40 387.49 379.63 407.03 401.03 395.38 398.66 409.90 406.22 404.55 407.89 420.94 424.88 397.72 409.09 409.54 405.12 398.56 416.21 409.88 407.02 406.98 408.01 403.32 412.18 407.42 407.56 397.98 416.06 398.18 422.17 416.29 407.03 417.00
Table 1 (Continued)
No.^b   Compound name   Vmc   Vs+   EHOMO   σ+²   RI observed^c   RI predicted^d
75* 76 77 78* 79 80 81* 82 83 84* 85 86 87* 88 89 90* 91 92 93* 94 95 96* 97 98 99* 100 101 102* 103 104 105* 106 107 108* 109 110 111* 112 113 114*
2-Methylbenzo[b]naphtho[2,3-d]thiophene 6-Methylbenzo[b]naphtho[2,3-d]thiophene 3-Methylbenzo[b]naphtho[2,3-d]thiophene 4-Methylbenzo[b]naphtho[1,2-d]thiophene 1-Methylbenzo[b]naphtho[2,3-d]thiophene 7-Methylbenzo[b]naphtho[2,3-d]thiophene 3-Methylphenanthro[9,10-b]thiophene 1-Methylanthra[2,1-b]thiophene 10-Methylphenanthro[2,1-b]thiophene 11-Methylbenzo[b]naphtho[2,3-d]thiophene 3-Methylphenanthro[2,1-b]thiophene 2-(2 -Naphthyl)benzo[b]thiophene Benzo[2,3]phenanthro[4,5-bcd]thiophene Pyreno[4,5-b]thiophene Benzo[1,2]phenaleno[3,4-bc]thiophene Triphenaleno[4,5-bcd]thiophene Pyreno[1,2-b]thiophene Chryseno[4,5-bcd]thiophene Pyreno[2,1-b]thiophene Benzo[4,5]phenaleno[1,9-bc]thiophene Benzo[4,5]phenaleno[9,1-bc]thiophene Benzo[b]phenanthro[4,3-d]thiophene Dinaphtho[2,1-b: 1 ,2 -d]thiophene Dinaphtho[1,2-b: 2 ,1 -d]thiophene Benzo[1,2]phenaleno[4,3-bc]thiophene Dinaphtho[1,2-b: 1 ,2 -d]thiophene Benzo[b]phenanthro[9,10-d]thiophene Benzo[b]phenanthro[3,4-d]thiophene Anthra[1,2-b]benzo[d]thiophene Benzo[b]phenanthro[2,1-d]thiophene Dinaphtho[1,2-b: 2 ,3 -d]thiophene 9,13-H-Triphenyleno[2,3-b]thiophene Benzo[b]phenanthro[3,2-d]thiophene Benzo[b]phenanthro[1,2-d]thiophene Benzo[b]phenanthro[2,3-d]thiophene Triphenaleno[2,1-b]thiophene Triphenaleno[1,2-b]thiophene Dinaphtho[2,3-b: 2 ,3 -d]thiophene Triphenaleno[2,3-b]thiophene 13-Methylbenzo[b]phenanthro[3,2-d]thiophene
163.235 163.019 163.141 163.205 162.925 163.041 162.840 163.253 165.126 163.485 163.114 169.453 162.151 161.837 162.000 162.375 161.805 162.319 161.782 162.058 161.768 192.436 192.010 181.269 162.127 181.903 183.616 180.752 181.297 181.355 181.328 181.239 181.256 182.855 181.310 188.237 184.066 181.221 181.325 197.851
0.40705 0.40282 0.39573 0.40014 0.39074 0.38559 0.37926 0.42339 0.41754 0.40786 0.41367 0.43625 0.44271 0.44911 0.43625 0.45854 0.44969 0.45288 0.47324 0.51374 0.45852 0.39653 0.40870 0.42288 0.46260 0.42090 0.42776 0.41004 0.42891 0.44663 0.45992 0.49666 0.46064 0.46980 0.46255 0.44774 0.43188 0.45341 0.45962 0.39299
−0.26775 −0.26584 −0.26734 −0.27206 −0.26830 −0.26797 −0.27581 −0.25683 −0.27238 −0.26495 −0.27359 −0.27214 −0.25813 −0.25955 −0.25978 −0.27467 −0.25457 −0.26104 −0.25237 −0.25500 −0.24227 −0.27261 −0.26638 −0.27203 −0.26163 −0.26923 −0.27087 −0.27140 −0.25611 −0.27707 −0.26196 −0.25208 −0.27159 −0.27703 −0.27151 −0.27437 −0.27355 −0.26991 −0.27280 −0.26724
0.04892 0.05247 0.04659 0.05005 0.04551 0.05343 0.04542 0.05472 0.05383 0.04465 0.05165 0.05138 0.05428 0.07005 0.05697 0.05377 0.06203 0.05564 0.05832 0.05901 0.06363 0.04612 0.05010 0.05376 0.06565 0.05645 0.05702 0.05654 0.05234 0.05509 0.04984 0.06354 0.06014 0.05672 0.06194 0.05587 0.05810 0.06300 0.06575 0.04887
414.69 415.02 415.11 415.41 415.54 417.07 417.70 418.22 422.14 422.85 423.48 430.65 443.29 446.51 447.66 448.45 449.30 450.62 455.01 455.99 457.30 470.47 472.62 482.60 482.99 486.58 487.32 487.76 488.45 488.89 489.14 489.81 491.02 492.31 493.31 493.90 494.41 495.17 500.00 511.19
418.56 419.53 413.96 414.74 410.82 413.86 403.18 432.18 427.06 418.78 418.51 443.62 433.90 442.27 432.08 430.26 440.85 436.55 447.02 458.77 450.66 493.39 501.12 474.09 443.73 478.14 484.41 470.60 483.60 479.40 488.52 511.39 488.76 491.34 490.44 500.87 486.04 488.87 490.89 511.77
No.   Compound name                                 Vmc       Vs+       EHOMO      σ+²       RI observed   RI predicted
X1    3,4-Dimethyldibenzothiophene                  145.559   0.33669   −0.28729   0.04041   336.8         333.46
X2    Triphenyleno-di[1,12-bcd:8,9-bcd]thiophene    173.762   0.52094   −0.27285   0.06884   497.9         489.41
X3    Anthra[2,3-b]benzo[d]thiophene                174.058   0.49132   −0.26053   0.06355   498.1         485.22
X4    Perylo[1,12-bcd]thiophene                     173.954   0.48840   −0.24947   0.05520   499.1         485.59
X5    Benzo[4,5]triphenyleno[1,12-bcd]thiophene     173.986   0.44311   −0.25974   0.05620   499.8         467.37
X6    Chryseno-di[4,5-bcd:10,11-bcd]thiophene       184.200   0.47990   −0.24937   0.05375   515.5         511.13
a Units: Vmc, Å³; EHOMO, a.u.; Vs+, eV; σ+², eV².
b The numbers marked by an asterisk are the PASH congeners in the test set.
c Experimentally determined RI values (averaged) taken from ref. [32] for PASHs 1–114, and from ref. [7] for PASHs X1–X6.
d The values predicted from QSRR model (2).
between the PASH and the stationary phase, and the PASH probably acts as a π-type electron donor during this process. The parameter has a positive sign, which means that the larger EHOMO, the higher the RI value.

The other two parameters introduced are related to the molecular surface electrostatic potentials. Vs+ is a descriptor derived from the positive electrostatic potentials on the molecular surface and has been found to be statistically significant for the retention index of PASHs (the simple correlation coefficient between Vs+ and RI is 0.51). Its entry into the correlation indicates that the positive electrostatic potentials of PASHs play a significant role in the retention index, and that the larger Vs+ is, the more strongly the molecule interacts with the stationary phase, implying that the stationary phase may supply negative electrostatic potentials in the course of binding with PASHs. Also introduced in Eq. (1) is σ+². From the definitions and from several groups' as well as our own experience with case studies [27,28], it is seen that the parameter σ+² measures the dispersion tendency of the positive electrostatic potentials. This further supports the importance of the positive electrostatic potentials in determining the retention indices of PASHs. Unlike Vs+, the average value of the positive electrostatic potentials on the molecular surface, σ+² emphasizes the contribution of the extrema.

Table 2 lists the magnitude of the t statistics for the model, which shows that Vmc is the most significant factor for RI. Additionally, as more than one variable is present in the correlation, it is necessary to examine the stability of our regression. Upon investigating the collinearity of the variables in Eq. (1), we obtained the variance inflation factor (VIF) for each descriptor (see Table 2). As one can see, the VIF values are all less than 3.0, indicating the stability of Eq. (1) (a value of 1.0 is indicative of no correlation, while a value under 10.0 is statistically satisfactory [33]). To test the predictive power of the QSRR model, a leave-one-out cross-validated analysis has been made for Eq. (1). The cross-validated correlation coefficient (RCV) is larger than 0.99, which shows the high predictive capability of the model.

Table 2
The t-values for the correlation, the variance inflation factor (VIF) of each descriptor and the cross-validated correlation coefficient (RCV)
Model   Descriptor   t-test   VIF     RCV
(1)     Vmc          54.761   1.668   0.9915
        Vs+          7.417    2.860
        EHOMO        4.522    2.263
        σ+²          3.037    2.897
(2)     Vmc          44.846   1.784   0.9904
        Vs+          6.234    2.953
        EHOMO        3.599    2.314
        σ+²          2.139    2.969

Since the data set is large, we split it into calibration and test sets in the ratio 2:1 (76 vs 38) to evaluate further the predictive power of the model. First, we chose every third sample as the test set (the samples numbered 3N and marked by an asterisk in Table 1). By using stepwise linear regression analysis, the relationship (Eq. (2)) has been established for the calibration set (76 PASH compounds).

$$\mathrm{RI} = -44.42 + 2.806\,V_{mc} + 296.3\,V_s^+ + 526.0\,E_{\mathrm{HOMO}} + 513.9\,\sigma_+^2 \qquad (2)$$
N = 76, R = 0.9928, S.D. = 9.284, F = 1216
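To make the use of Eq. (2) concrete, the following sketch evaluates it for benzo[b]thiophene (compound 1) with the descriptor values listed in Table 1, and builds the 3N test-set numbering described above. The names `EQ2`, `predict_ri`, `test_ids` and `calibration_ids` are illustrative assumptions; the coefficients and descriptor values are those given in the paper.

```python
# Regression coefficients of Eq. (2) as reported in the text.
EQ2 = {
    "intercept": -44.42,
    "Vmc": 2.806,
    "Vs_plus": 296.3,
    "E_HOMO": 526.0,
    "sigma2_plus": 513.9,
}


def predict_ri(vmc, vs_plus, e_homo, sigma2_plus, coef=EQ2):
    """Retention index predicted by the four-descriptor linear model."""
    return (
        coef["intercept"]
        + coef["Vmc"] * vmc
        + coef["Vs_plus"] * vs_plus
        + coef["E_HOMO"] * e_homo
        + coef["sigma2_plus"] * sigma2_plus
    )


# Compound 1 (benzo[b]thiophene), descriptor values from Table 1.
ri_1 = predict_ri(88.451, 0.45001, -0.29757, 0.05007)
print(round(ri_1, 2))

# Scheme 1 split: every third compound (3N, N = 1-38) forms the test set.
test_ids = [3 * n for n in range(1, 39)]
calibration_ids = [i for i in range(1, 115) if i % 3 != 0]
```

With the rounded coefficients, the value for compound 1 agrees with the tabulated prediction of 206.35 to within a few hundredths of an index unit.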
As one can see, Eq. (2) is not only of excellent statistical quality but also includes the same parameters as Eq. (1), with quite similar regression coefficients. Using Eq. (2), we have predicted the RI values of all PASH compounds in both the calibration and test sets, and the results are summarized in Table 1. Fig. 2 shows the relationship between the observed and predicted values, from which good agreement is evident. The figure shows a clustering tendency, which can be ascribed to the uneven distribution of the data points; e.g., the RI values of only four PASHs fall into the range 251–300, whereas there are 26 PASHs with RI values of 301–350. Similarly, we chose the samples numbered 3N − 1 (Scheme 2), 3N − 2 (Scheme 3), 6N − 1 and 6N (Scheme 4), 6N − 3 and 6N − 2 (Scheme 5), and 6N − 5 and 6N − 4 (Scheme 6) as the respective test sets, and obtained the statistical results shown in Table 3 through the same treatment. As can be seen, the statistical quality is good for each model (the correlation coefficients between the predicted and observed RI values for the test sets are all around 0.99). Finally, 38 PASH samples were taken randomly as the test set (Scheme 7), and this yielded equally good statistical results (see Table 3), which demonstrates the robustness of the QSRR model established. Becker and Colmsjö [7] performed gas chromatography–atomic emission detection for 34 tri-, tetra-, penta- and hexacyclic aromatic sulfur heterocycles, among which
Fig. 2. Plot of observed versus predicted RI by Eq. (2) for the calibration and test sets.
six are not included in our data set. Since the data from the two different sources are approximately equal for the same compounds, the RI values of these six PASHs (X1–X6 in Table 1) have been predicted by Eq. (2). As one can see, except for PASH X5, the predicted results are satisfactory (the relative average deviation is only 1.8%). Samples from other sources have not been considered as an external test set because of the relatively large differences in chromatographic conditions. Using an artificial neural network (ANN) approach, Can et al. [24] established six prediction models for the chromatographic retention indices of PASHs. The optimal model was obtained by selecting Ss (sum of Kier–Hall electrotopological states) and J3D (3D-Balaban index) as a combination of descriptors. Although the predictive capacity of the model was good (RCV = 0.990), only part of the samples in the whole data set (80 of the 114 PASHs in ref. [32]) were adopted, which considerably reduces the credibility of the model. Wise and co-workers [25] have also determined the retention indices of a series of PASHs and investigated the correlation between RI values and molecular descriptors. Owing to the simplicity of the descriptors used (only length, breadth, thickness and their ratios were considered), the correlations are far from satisfactory (R ranging from 0.68 to 0.98 for different subclasses). In a recent study, Schade and Andersson [9] investigated the GC retention indices of 30 alkylated dibenzothiophenes (mostly polymethylated congeners) and constructed a quantitative model for the structure–retention relationship using some simple parameters, such as the number of methyl groups in a certain position. The model is adequate, but of limited value owing to the small
Table 3
Statistical results for different selection schemes of samples in the test set
Scheme   Calibration set: R   S.D.     RCV      Test set: R   S.D.    Sample numbers in the test set
1        0.9928               9.284    0.9904   0.9914        9.957   3, 6, 9, ..., 114 (3N, N = 1–38)
2        0.9921               9.712    0.9901   0.9924        9.083   2, 5, 8, ..., 113 (3N − 1, N = 1–38)
3        0.9919               9.768    0.9890   0.9926        9.534   1, 4, 7, ..., 112 (3N − 2, N = 1–38)
4        0.9901               10.481   0.9905   0.9897        10.31   5, 6, 11, 12, 17, 18, ..., 113, 114 (6N − 1 and 6N, N = 1–19)
5        0.9928               9.308    0.9906   0.9909        10.21   3, 4, 9, 10, 15, 16, ..., 111, 112 (6N − 3 and 6N − 2, N = 1–19)
6        0.9916               9.766    0.9893   0.9931        9.137   1, 2, 7, 8, 13, 14, ..., 109, 110 (6N − 5 and 6N − 4, N = 1–19)
7        0.9914               10.31    0.9889   0.9937        8.193   1, 5, 11, 13, 16, 18, 21, 24, 29, 30, 33, 34, 37, 40, 42, 45, 49, 53, 55, 57, 61, 64, 65, 69, 72, 75, 77, 81, 84, 88, 92, 95, 98, 100, 103, 106, 109, 112
structural variation of the samples. Compared with the above-mentioned QSRR models for PASHs, Eqs. (1) and (2) have more general applicability and better explanatory power.

4. Conclusion

The structural descriptors derived from molecular electrostatic potentials, together with the molecular volume (Vmc) and the energy of the highest occupied molecular orbital (EHOMO), can be used to express the quantitative structure–retention relationship of PASHs. The model constructed should have good predictive ability for PASH molecules that differ in the number and/or position of condensed rings and alkyl substituents. This, together with previous analogous QSPR studies [26–29], demonstrates to a certain extent that the parameter set derived from electrostatic potentials on the molecular surface has broad applicability in the prediction of chromatographic retention as well as other physicochemical properties of organic compounds.
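The external check on compounds X1–X6 discussed in Section 3 can be reproduced from the observed and predicted RI values in Table 1. The sketch below assumes that the quoted "relative average deviation" is the mean absolute relative deviation over the five compounds other than X5; that reading is an interpretation, and the variable names are illustrative.

```python
# Observed and Eq. (2)-predicted RI values for X1-X6, as listed in Table 1.
observed = {"X1": 336.8, "X2": 497.9, "X3": 498.1,
            "X4": 499.1, "X5": 499.8, "X6": 515.5}
predicted = {"X1": 333.46, "X2": 489.41, "X3": 485.22,
             "X4": 485.59, "X5": 467.37, "X6": 511.13}

# Absolute relative deviation of each prediction, in percent.
rel_dev = {
    name: abs(predicted[name] - observed[name]) / observed[name] * 100.0
    for name in observed
}

# Mean over the five compounds other than X5 (assumed meaning of the 1.8%).
others = [value for name, value in rel_dev.items() if name != "X5"]
mean_without_x5 = sum(others) / len(others)

for name, value in rel_dev.items():
    print(f"{name}: {value:.2f}%")
print(f"mean without X5: {mean_without_x5:.2f}%")
```

Under this reading the mean is close to the 1.8% stated in the text, while X5 alone deviates by more than 6%, which is why it is singled out.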
Acknowledgements

The authors are grateful to the Natural Science Foundation of Zhejiang Province (Y406042) and the Natural Science Foundation of Ningbo City (2007A610081) for financial support.

References

[1] C.S. Song, Catal. Today 86 (2003) 211.
[2] J. Jacob, Sulfur Analogues of Polycyclic Aromatic Hydrocarbons (Thia-Arenes), Cambridge University Press, Cambridge, 1990.
[3] F. Liang, M. Lu, M.E. Birch, T.C. Keener, Z. Liu, J. Chromatogr. A 1114 (2006) 145.
[4] X.H. Wang, L.J. Hong, Z.L. Zhang, H.S. Hong, Chin. Mar. Environ. Sci. 18 (1999) 72.
[5] S.G. Mössner, S.A. Wise, Anal. Chem. 71 (1999) 58.
[6] G. Becker, A. Colmsjö, C. Östman, Environ. Sci. Technol. 33 (1999) 1321.
[7] G. Becker, A. Colmsjö, Anal. Chim. Acta 376 (1998) 265.
[8] A.H. Hegazi, J.T. Andersson, Energy Fuel 21 (2007) 3375.
[9] T. Schade, J.T. Andersson, J. Chromatogr. A 1117 (2006) 206.
[10] A.H. Hegazi, J.T. Andersson, M.Sh. El-Gayar, Fuel Process. Technol. 85 (2004) 1.
[11] R. Kaliszan, Chem. Rev. 107 (2007) 3212 (and references therein).
[12] K. Héberger, J. Chromatogr. A 1158 (2007) 273 (and references therein).
[13] F.P. Liu, Y.Z. Liang, C.Z. Cao, N. Zhou, Anal. Chim. Acta 594 (2007) 279.
[14] C.H. Lu, A.F. Jalbout, L. Adamowica, Y. Wang, C.S. Yin, Bull. Environ. Contamin. Toxicol. 77 (2006) 793.
[15] Y.H. Zhang, S.S. Liu, H.Y. Liu, Chromatographia 65 (2007) 319.
[16] P. Tulasamma, K.S. Reddy, J. Mol. Graphics Modell. 25 (2006) 507.
[17] T. Körtvélyesi, M. Görgényi, K. Héberger, Anal. Chim. Acta 428 (2001) 73.
[18] O. Farkas, K. Héberger, I.G. Zenkevich, Chemom. Intell. Lab. Syst. 72 (2004) 173.
[19] F. Safa, M.R. Hadjmohammadi, QSAR Comb. Sci. 24 (2005) 1026.
[20] R.-J. Hu, H.-X. Liu, R.-S. Zhang, C.-X. Xue, X.-J. Yao, M.-C. Liu, Z.-D. Hu, B.-T. Fan, Talanta 68 (2005) 31.
[21] Z. Lin, S. Liu, Z. Li, J. Chromatogr. Sci. 40 (2002) 7.
[22] Z. Zhai, Z. Wang, L. Wang, J. Mol. Struct. (THEOCHEM) 724 (2005) 115.
[23] G.H. Ding, J.W. Chen, X.L. Qiao, L.P. Huang, J. Lin, X.Y. Chen, SAR QSAR Environ. Res. 16 (2005) 301.
[24] H. Can, A. Dimoglo, V. Kovalishyn, J. Mol. Struct. (THEOCHEM) 723 (2005) 183.
[25] S.G. Mössner, M.J. Lopez de Alda, L.C. Sander, M.L. Lee, S.A. Wise, J. Chromatogr. A 841 (1999) 207.
[26] M. Betsy, B.M. Rice, J.J. Hare, J. Phys. Chem. A 106 (2002) 1770.
[27] H.Y. Xu, J.W. Zou, Q.S. Yu, Y.H. Wang, J.Y. Zhang, H.X. Jin, Chemosphere 66 (2007) 1998.
[28] J.S. Murray, T. Brinck, P. Lane, K. Paulsen, P. Politzer, J. Mol. Struct. (THEOCHEM) 113 (1994) 55.
[29] J.S. Murray, F. Abu-Awwad, P. Politzer, J. Phys. Chem. A 103 (1999) 1853.
[30] A. Pedretti, L. Villa, G. Vistoli, J. Mol. Graphics 21 (2002) 47.
[31] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.A. Montgomery, R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, C. Gonzalez, M. Challacombe, P.M.W. Gill, B.G. Johnson, W. Chen, M.W. Wong, J.L. Andres, M. Head-Gordon, E.S. Replogle, J.A. Pople, Gaussian 98 (Revision A.9), Gaussian, Inc., Pittsburgh, PA, 1998.
[32] D.L. Vassilaros, R.C. Kong, D.W. Later, M.L. Lee, J. Chromatogr. 252 (1982) 1.
[33] D.A. Belesley, E. Kuh, R.E. Welsh, Regression Diagnostics, Wiley Press, New York, 1990.