Accepted Manuscript
Title: Application of a Quantitative Structure Retention Relationship approach for the prediction of the two-dimensional gas chromatography retention times of polycyclic aromatic sulfur heterocycle compounds
Author: Rafal Gieleciak, Darcy Hager, Nicole E. Heshka
PII: S0021-9673(16)30072-3
DOI: http://dx.doi.org/doi:10.1016/j.chroma.2016.02.006
Reference: CHROMA 357278
To appear in: Journal of Chromatography A
Received date: 10-11-2015
Revised date: 28-1-2016
Accepted date: 1-2-2016
Please cite this article as: Rafal Gieleciak, Darcy Hager, Nicole E. Heshka, Application of a Quantitative Structure Retention Relationship approach for the prediction of the two-dimensional gas chromatography retention times of polycyclic aromatic sulfur heterocycle compounds, Journal of Chromatography A http://dx.doi.org/10.1016/j.chroma.2016.02.006 This is a PDF file of an unedited manuscript that has been accepted for publication.
Title: Application of a Quantitative Structure Retention Relationship approach for the prediction of the two-dimensional gas chromatography retention times of polycyclic aromatic sulfur heterocycle compounds
Authors: Rafal Gieleciak*, Darcy Hager, Nicole E. Heshka
Address: CanmetENERGY, Natural Resources Canada, 1 Oil Patch Drive, Devon, Alberta, T9G 1A8 Canada
*Corresponding Author: Tel.: +1 780 987 8349; E-mail address: [email protected]
Highlights
Quantitative structure retention relationships are used to model gas chromatography
Model developed and validated for two-dimensional gas chromatography
Retention times of sulfur containing compounds are predicted
Prediction enables identification beyond available compounds
Identification of sulfur compounds enables better desulfurization technology
Abstract
Information on the sulfur classes present in petroleum is a key factor in determining the value of refined products and processing behavior in the refinery. A large part of the sulfur is present as polycyclic aromatic sulfur heterocycles (PASHs), which are difficult to desulfurize. Furthermore, some PASHs are potentially more mutagenic and carcinogenic than polycyclic aromatic hydrocarbons (PAHs). All of this calls for improved methods for the identification and quantification of individual sulfur species. Recent advances in analytical techniques such as comprehensive two-dimensional gas chromatography (GC×GC) have enabled the identification of many individual sulfur species. However, full identification of individual components, particularly in virgin oil fractions, is still out of reach as standards for numerous compounds are unavailable. In this work, a Quantitative Structure Retention Relationship (QSRR) method is used to predict retention times in GC×GC and thereby support the identification of individual sulfur compounds. Retention times for 89 saturated, aromatic, and polyaromatic sulfur-containing heterocyclic compounds were determined using two-dimensional gas chromatography. These retention data were correlated with molecular descriptors generated with CODESSA software. Two independent QSRR relationships were derived, one for the primary and one for the secondary retention characteristics. The predictive ability of the relationships was tested by using both independent sets of compounds and a cross-validation technique. When the corresponding chemical standards are unavailable, the equations developed for predicting retention times can be used to identify unknown chromatographic peaks by matching their retention times with those of sulfur compounds of known molecular structure.
Keywords: Quantitative structure retention relationship (QSRR), two-dimensional gas chromatography (GC×GC), retention prediction, variable selection, sulfur standards, PASHs.
1. Introduction
The majority of organosulfur compounds in crude oils and their industrial conversion products are polycyclic aromatic sulfur heterocycles (PASHs). Information on the sulfur classes present in petroleum is a key factor in determining the value of such streams, because sulfur compounds participate in many undesirable processes such as catalyst poisoning [1-3], corrosion [4,5] and environmental pollution [6,7]. Furthermore, the carcinogenic and mutagenic effects of some PASHs are known to be greater than those of analogous PAH compounds [8-10]. For example, benzo[2,3]phenanthro[4,5-bcd]thiophene was found to be more carcinogenic than benzo[a]pyrene [11]. Of greater concern, the feedstocks processed in refineries around the world are becoming heavier, and both the total sulfur concentration and the fractional contribution of PASHs are increasing. Therefore, deep hydrodesulfurization (HDS) technology must be implemented to reduce levels of sulfur in finished products [6]. While many groups of sulfur compounds are moderately easy to remove, some PASH classes, such as dibenzothiophenes, are particularly resistant to desulfurization [12,13]. Powerful analytical separation and speciation are necessary to determine which isomers remain unaltered and at what concentrations they are present.
Gas chromatography (GC) with a sulfur-specific detector has been the preferred characterization technique for the detailed analysis of sulfur compounds in petroleum. However, detailed separation cannot be obtained using conventional GC because of the lack of chromatographic resolution [14-17]. Comprehensive two-dimensional gas chromatography (GC×GC) offers considerable potential for the examination of complex mixtures (such as petroleum products) due to its superior chromatographic resolving power. Moreover, this method is characterized by improved detection limits, due to the focusing of analytes during the modulation process, and by the structured nature of its chromatograms, in which compounds are arranged according to their chemical group and their number of carbon atoms [18-20]. Figure 1 shows examples of two-dimensional chromatograms of sulfur in petroleum samples before and after hydrodesulfurization. Figure 1a shows that thiophenes (Th), benzothiophenes (BTh), and dibenzothiophenes (DBTh) form three bands that are almost parallel to one another. This band arrangement demonstrates the orthogonality of the separation mechanisms in the two dimensions, as the peaks are distributed across the separation plane and effectively occupy the available separation space. After hydrodesulfurization (Figure 1b), only compounds resistant to this process (dibenzothiophenes)
are left in the sample. Detailed information on the distribution of such compounds can serve as an important basis for improving desulfurization technology. In spite of the great advances that accompany the use of GC×GC, the full identification of many individual sulfur species, particularly in virgin oil fractions, remains a problem, as standards for numerous compounds are unavailable. Although the molecular structure can often be deduced from mass spectral fragmentation patterns using GC×GC coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), the large number of compounds with very similar structures present in oil fractions makes the identification and assignment process very tedious. A method for correctly predicting retention times or indices would be useful for the identification of individual sulfur compounds.
Earlier studies have shown that it is possible to predict the gas chromatographic retention characteristics of sulfur compounds in crude oil-based samples using a quantitative structure-retention relationship (QSRR) [21,22]; both of these studies used a single dimension of GC and employed atomic emission detection. Other works have applied QSRR to model one-dimensional GC separations of chlorinated pesticides and organohalides [23], saturated esters [24], alkyl benzenes [25], polyhalogenated biphenyls [26], and hydrocarbons found in gasoline [27]. Ren et al. reported the use of a QSRR model for the prediction of GC×GC retention of polychlorinated biphenyl congeners, but only models for the first dimension are shown [28]. The same GC×GC retention data were later modelled by D'Archivio et al. [29], who predicted the retention behavior of solutes in both dimensions using various regression approaches.
An alternative approach to predicting retention times is to model the thermodynamic parameters of the separation; this approach has been demonstrated for the prediction of GC×GC retention times for alkyl phosphates on ionic columns [30] and for alkanes, ketones, and alcohols on a more conventional column set [31]. QSRR modeling has also been applied to one-dimensional thermodynamic GC data of alkanes, alcohols, and alkyl halides, to expand data to new molecules [32]. Despite the potential represented by previous work, a QSRR model for the prediction of retention behavior of sulfur-containing compounds in two-dimensional chromatography has not been reported. The main attraction of the QSRR modeling approach, in comparison to other methods used for the prediction of retention characteristics in GC×GC, is that it can be used without the laborious and expensive process of generating a great deal of experimental data. The QSRR modeling results obtained in this work for the primary column are comparable with the other
QSRR models reported in the literature for sulfur-containing hydrocarbons. A short review of these QSRR models is given in Table 1. Adding the secondary separation, and hence QSRR modeling of secondary retention times, may simplify the identification of sulfur compounds and support lab operators in generating the classification templates used in GC×GC data reporting. An excellent comprehensive review of QSRR applications was written by K. Heberger, and covers the period between 1996 and 2006 [33].
The work presented herein demonstrates that QSRR can be used to successfully predict the retention of sulfur-containing compounds in GC×GC, including the second dimension. GC×GC data for 89 model compounds representing sulfur compounds found in crude oil are shown. The model is validated and results showing predicted vs. experimental retention times are presented.
2. Methodology
2.1. Instrumentation
The GC×GC experiments were performed on an Agilent 6890 gas chromatograph (Agilent Technologies, Mississauga, ON, Canada) equipped with a liquid nitrogen-cooled quad-jet cryogenic modulator (Leco Instruments, Mississauga, ON, Canada). The detector used was a sulfur chemiluminescence detector (SCD; Sievers 355, Agilent Technologies) operating in tandem with a flame ionization detector (FID) (Agilent Technologies). The column characteristics and the operating conditions for GC×GC experiments are listed in Table 2.
2.2. Data set
A suite of 89 commercial standards was obtained from Sigma-Aldrich (Mississauga, ON), Acros Organics (Fisher Scientific, Ottawa, ON), Chiron (Trondheim, Norway), and from Dr. Jan Andersson (University of Munster, Munster, Germany). The sulfur compounds considered in this study are listed in Table 3. Analytical data were collected using solutions containing mixtures of sulfur compounds in dichloromethane with concentrations ranging from 5 to 50 ppm.
2.3. Molecular descriptor generation
After selecting the molecules and chromatographic properties to be modeled, the molecular descriptors to be considered in the QSRR models must be generated. The structures of the 89 model compounds were drawn and pre-optimized using the MarvinSketch and Standardizer utilities, respectively, provided by ChemAxon (Hungary). The three-dimensional molecular representations obtained were converted into a file format suitable for AMPAC (AMPAC version 8.16, SemiChem, Inc., Shawnee, KS, USA) via the Open Babel open-source file conversion system (Open Babel, http://openbabel.org/wiki/Main_Page). Precise geometry optimization was carried out using the semi-empirical quantum-chemical PM3 method implemented in the AMPAC software. The output files from AMPAC were passed as input to the CODESSA program in order to generate molecular descriptors (CODESSA version 2.7.8, SemiChem, Inc., Shawnee, KS, USA). Over 450 molecular descriptors were calculated for each compound using CODESSA. These descriptors can be classified as constitutional (derived from the molecular composition), topological (based on atomic connectivity in the molecule), geometrical (calculated from the 3-D geometrical structure), electronic (related to the partial charge distribution in the molecule), thermodynamic, or quantum-chemical. A more detailed description of the individual descriptors can be found in the CODESSA Reference Manual [43].
2.4. Statistical analysis
The statistical calculations were performed using programs written in-house in MATLAB (2012b; MathWorks, Inc., Natick, MA, USA). MATLAB functions and m-scripts are available from the authors upon request.
2.4.1. Multilinear approach
Multilinear regression analysis (MLR) is a common method used in QSRR studies. Equations linking the structural features to the experimental properties are developed using Equation 1:
$$t_R = b_0 + b_1 x_1 + \dots + b_n x_n \tag{1}$$
where tR is the retention time; the intercept (b0) and the regression coefficients of the descriptors (bi) are determined by using the least-squares method. The descriptors (xi) included in the equation describe the chemical structure of the compounds, and n is the number of descriptors.
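As an illustration of Equation 1, the least-squares fit can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the authors' MATLAB code: the function names `fit_mlr` and `predict_mlr` are hypothetical, and the data are synthetic rather than taken from this study.

```python
import numpy as np

def fit_mlr(X, t_r):
    """Least-squares fit of t_R = b0 + b1*x1 + ... + bn*xn.

    X   : (n_compounds, n_descriptors) descriptor matrix
    t_r : (n_compounds,) retention times
    Returns the coefficient vector [b0, b1, ..., bn].
    """
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # column of ones carries the intercept b0
    coef, *_ = np.linalg.lstsq(A, np.asarray(t_r, dtype=float), rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply Equation 1 with fitted coefficients to new descriptor rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic illustration only (not data from this study)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
t_demo = 5.0 + X_demo @ np.array([2.0, -1.0, 0.5])
b = fit_mlr(X_demo, t_demo)
print(b)
```

Because the synthetic retention times are an exact linear function of the descriptors, the fitted vector should reproduce the intercept and slopes used to generate them.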
2.4.2. Variable Selection
The best multilinear regression (BMLR) approach described by Katritzky et al. [44] was used to derive the best QSRR models for retention time (tR). A detailed description of the method is available in the CODESSA Reference Manual [43]. In short, the complete list of descriptors initially calculated for the entire set of 89 compounds was first reduced by eliminating descriptors considered redundant (i.e. those with a one-parameter correlation coefficient below 0.01). From the remaining descriptors, all orthogonal pairs were found and used in two-parameter regressions against the chromatographic property. A predefined number (Nc = 400) of pairs with the highest regression correlation coefficients was then carried forward to higher-order regression analysis, in which a third (and then further) descriptor from the pool of still-available descriptors was successively added to the best two-descriptor correlations. The consecutive addition of variables was monitored by the Fisher criterion (F) at a given probability level (95%). The variable selection procedure stopped when the F criterion calculated for the current descriptor set was smaller than that for the previous correlation, or when all the available descriptors had been tried. The BMLR method was originally implemented in CODESSA; in the present work, the algorithm was re-implemented in MATLAB to achieve greater flexibility and control over the variable selection routines.
2.4.3. Model Validation
The calibration and predictive capability of a QSRR model must be tested through model validation. To achieve validation, the available descriptor and chromatographic retention data were divided into a training set and a test set using the Kennard-Stone approach [45]. QSRR models were then established based on the training data set, and internal validation (based on cross-validation) together with external validation (based on the test set) was performed to test the predictive capability of the resulting QSRR models.
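For readers unfamiliar with the Kennard-Stone split, a minimal sketch is given below. It follows the usual maximin formulation of the algorithm: start from the two most distant samples, then repeatedly add the sample farthest from its nearest already-selected neighbor. The function name and the choice of the input space `X` are assumptions, since the text does not state which variables the distances were computed on.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Split sample indices into (train, test) lists by the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed the training set with the two most distant samples
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]  # maximin choice
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy one-dimensional example (not data from this study)
train_idx, test_idx = kennard_stone([[0.0], [1.0], [2.0], [3.0], [10.0]], n_train=3)
print(train_idx, test_idx)
```

Because the extremes are picked first, the samples left for the test set tend to lie inside the region spanned by the training set.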
The internal predictive capability of a model was evaluated by leave-one-out cross-validation (LOO-CV) on the training set, expressed as q2model and calculated using the following formula:
$$q^2_{\mathrm{model}} = 1 - \frac{\sum_{i=1}^{\mathrm{model}} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{\mathrm{model}} (y_i - \bar{y})^2} \tag{2}$$
where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the experimental, LOO-CV predicted, and average retention times (tR) of the samples in the training set, respectively, and the sums run over the compounds of the training (model) set. The predictive capability of a model on the external prediction set can be expressed by q2test using the following equation:
$$q^2_{\mathrm{test}} = 1 - \frac{\sum_{i=1}^{\mathrm{test}} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{\mathrm{test}} (y_i - \bar{y})^2} \tag{3}$$
where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the experimental, predicted, and average retention times (tR) of the samples in the test set, respectively. Using the parameters above, the QSRR models for GC×GC derived in this work are described by four parameters (two for each dimension). Because this many parameters can complicate the interpretation of a model, we introduced new parameters that combine the results from both QSRR models. The new parameters are called Q2DISTmodel and Q2DISTtest, and their formulas are given in Equations 4 and 5. The mathematical formulas and statistical interpretations of these parameters are similar to those of the q2model and q2test parameters described earlier, but instead of calculating simple differences between dependent variables we calculate the Euclidean distance in the scaled two-dimensional space of retention times:
$$Q^2\mathrm{DIST}_{\mathrm{model}} = 1 - \frac{\sum_{i=1}^{\mathrm{model}} \mathrm{dist}(y_i, \hat{y}_i)}{\sum_{i=1}^{\mathrm{model}} \mathrm{dist}(y_i, \bar{y})} \tag{4}$$
$$Q^2\mathrm{DIST}_{\mathrm{test}} = 1 - \frac{\sum_{i=1}^{\mathrm{test}} \mathrm{dist}(y_i, \hat{y}_i)}{\sum_{i=1}^{\mathrm{test}} \mathrm{dist}(y_i, \bar{y})} \tag{5}$$
Here $\mathrm{dist}(y_i, \hat{y}_i)$ and $\mathrm{dist}(y_i, \bar{y})$ are the Euclidean distances between the experimental retention times (tR) of a sample and its predicted or the average retention times, respectively, evaluated over the training set (Equation 4) or the test set (Equation 5). All the retention characteristics were scaled to the [0, 1] range prior to distance calculation. The resulting models were tested for their statistical validity and robustness by using a Y-scrambling technique. Williams plots were used to verify the presence of outliers and to demonstrate the applicability domain of the QSRR models.
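The validation statistics of Equations 2 to 5 can be sketched as below. This is an illustrative reading of the formulas, not the authors' code. All function names are hypothetical. Scaling the predictions with the minimum and maximum of the experimental values is an assumption, since the text only states that retention characteristics were scaled to [0, 1].

```python
import numpy as np

def _fit_predict(X_tr, y_tr, X_new):
    """Fit an MLR model with intercept on (X_tr, y_tr) and predict for X_new."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def loo_predictions(X, y):
    """Leave-one-out: each compound is predicted by a model fitted without it."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    preds = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        preds[i] = _fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    return preds

def q2(y, y_hat):
    """Equations 2 and 3: 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_dist(t_exp, t_pred):
    """Equations 4 and 5 for (n, 2) arrays holding [1tR, 2tR] per compound."""
    t_exp, t_pred = np.asarray(t_exp, float), np.asarray(t_pred, float)
    lo, hi = t_exp.min(axis=0), t_exp.max(axis=0)
    s_exp = (t_exp - lo) / (hi - lo)    # each dimension scaled to [0, 1]
    s_pred = (t_pred - lo) / (hi - lo)  # same scaling for predictions (assumption)
    num = np.linalg.norm(s_exp - s_pred, axis=1).sum()
    den = np.linalg.norm(s_exp - s_exp.mean(axis=0), axis=1).sum()
    return 1.0 - num / den

# Toy data (not from this study): exact linear relation, so LOO-CV is perfect
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])
print(q2(y_demo, loo_predictions(X_demo, y_demo)))
```

In use, `q2` would be called once per dimension with LOO-CV predictions (q2model) or test-set predictions (q2test), and `q2_dist` once per data set with the two retention times stacked column-wise.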
3. Results and discussion
3.1. GC×GC retention data
To generate a database of retention times of sulfur-containing heterocyclic compounds for use in the QSRR studies, the solution mixtures were subjected to GC×GC separation and their experimental retention characteristics were recorded (Table 3). The database of sulfur compounds considered in this study consists of 89 species, including three thiols, four sulfides, 12 thiophenes, 29 benzothiophenes, 24 dibenzothiophenes, and 17 other polycyclic aromatic sulfur compounds. To verify the chromatographic performance and reproducibility of the GC×GC separation, a standard mixture of thiophene, 1-dodecanethiol, 1-octadecanethiol, benzothiophene, 4,6-diethylbenzothiophene, thianthrene, and benzo[b]naphtho[1,2-d]thiophene was separated after each GC×GC analysis of a mixture of model compounds. The compounds in the standard were selected to map the experimentally available sulfur chromatographic space. Figure 2 shows a GC×GC chromatogram of the external standard mixture, the structures of the molecules, and the ranges and standard deviations of their retention times in both dimensions. For the first dimension, retention times reported for peaks typically differed by only one modulation period, with the exception of compound 2, which had primary retention times shifted by up to two modulation periods. Compared with the shifts in the first dimension, the changes in retention time in the second dimension are larger, especially for highly polar compounds such as polyaromatic sulfur compounds. Additionally, the peak width increased for compounds eluting later in the second dimension. The compounds analyzed, along with their tR values, are listed in Table 3.
3.2. Development of QSRR models for sulfur compounds
The objective of the current work was to find general QSRR relationships for the prediction of GC×GC retention times for sulfur compounds commonly found in middle distillates, with particular focus on PASH compounds. The resulting models were designated QSRrt1R and QSRrt2R, after correlating the structures with primary and secondary retention times (1tR/2tR), respectively. Prior to QSRR model development, the data set was split into training (n = 77 compounds) and test (n = 12 compounds) sets using the Kennard-Stone approach [45]. The result of this sample subset selection, represented in GC×GC space, is shown in Figure 3. The advantages of the Kennard-Stone algorithm are that the calibration samples map the measured region of the variable space completely and that the test samples all fall inside the measured region. Table 3 lists the experimental and predicted retention times for all of the molecules used in this study. Compounds that are members of the test set are presented in bold italic font. The 'best multilinear regression variable selection' method was used to develop all the QSRR models for the prediction of both primary and secondary retention characteristics using calculated structural descriptors. Statistical quality measures for the regression equations developed here are listed in Table 4. To limit overfitting, the ratio of the number of data points to the number of descriptors was kept above 10 for all models; ratios greater than 5 are usually considered sufficient [50]. The statistical significance (p-value) of the regression coefficients was estimated from a t-test. Higher t-values correspond to a greater statistical significance of regression coefficients.
For each regression equation, the associated number of molecules (N), coefficient of determination (R2), square of the standard error of regression (s2), F-statistic (F), adjusted coefficient of determination (R2adj), cross-validated coefficient of determination (q2model), and external coefficient of determination (q2test) are reported. For each descriptor, the regression coefficient (b), Student's t-statistic (t-test), p-value, variance inflation factor (VIF), and descriptor name are reported (Table 4). VIF is defined as VIF = 1/(1 - R2desc), where R2desc is the coefficient of determination obtained by treating one of the descriptors as the property and performing a multivariate linear regression on the other descriptors. The VIF provides a way to determine whether there is interdependence or non-orthogonality among the descriptors. A VIF larger than 10 is indicative of multicollinearity problems [51]. Before applying multilinear
regression, all independent variables (X) were autoscaled so that the magnitudes of the regression coefficients are directly comparable. The correlation model for 1tR (QSRrt1R) is shown in Table 4. This model is characterized by a high coefficient of determination, R2 = 0.9966. The LOO-CV procedure applied to this model resulted in q2model = 0.9953. Estimations based on the test data set show q2test = 0.9966, which, together with q2model, indicates a good-quality QSRR model. The seven-parameter model for primary retention time contains one constitutional, one geometrical, one topological, one thermodynamic, and three quantum-chemical descriptors. Descriptors were considered significant when their p-value was ≤0.001 and the absolute value of their t-statistic was greater than 2. Figure 4a shows the predicted vs. experimental 1tR values for GC×GC data on the basis of the QSRrt1R model. The interpretation of the descriptors in this model provides insight into factors that are likely to govern the chromatographic retention of sulfur compounds on non-polar stationary phases. It is well known that chromatographic retention is based on the interaction between the solute and the stationary phase, which includes directional forces, induction forces, dispersion forces, hydrogen bonding, and so on. Retention is the macroscopic reflection of the molecular structure of the solute and the properties of the stationary phase. Among the seven descriptors included in the QSRrt1R model, the gravitation index makes the greatest contribution to 1tR prediction. The gravitation index over all bonded atoms accounts for the distribution of the atomic masses within the molecular space and quantifies the bulk cohesiveness of a compound due to dispersion interactions. The distribution of compounds along the first dimension in GC×GC is mainly based on their boiling points.
It has been shown previously that boiling temperature increases with the molecular weight and gravitation index, which supports the view that the inclusion of this descriptor in the QSRR equation is not an artifact [52]. The next most influential parameter is the Wiener index (WI); WI represents an approximate measure of the van der Waals molecular surface area and is inversely proportional to the compactness of the molecule. Consequently, for non-polar molecules, WI is proportional to intermolecular forces. The negative value of the regression coefficient for this descriptor suggests that compounds with a higher value of WI spend less time on the chromatographic column.
A seven-descriptor model was also found to be optimal for predicting secondary retention time 2tR (QSRrt2R). A general statistical description of this model is provided in Table 4. This model has an R2 value of 0.9933. The LOO-CV procedure applied to this model resulted in q2model = 0.9914. The external predictivity for this model was assessed using the external test data set (q2test = 0.9973). A graphical representation of the relationship between experimental and predicted 2tR values for GC×GC data on the basis of the QSRrt2R model is shown in Figure 4b. As indicated, the model contains one constitutional, one thermodynamic, two topological, and three quantum-chemical descriptors. The secondary column (VF-17ms) is medium-polar and consists of 50% phenyl-50% dimethylpolysiloxane. Therefore, the secondary separation is based on π-π interactions and molecular polarizability, so the descriptors selected for a QSRR model should reflect the selectivity introduced by this column. Two of the most important descriptors in this correlation are the relative number of benzene rings and the Randic index (order 3). These indices can be considered to be related to the intermolecular interaction between the molecule studied and the gas chromatographic medium. The relative count of benzene rings represents the size and electronic distribution of a molecule, accounting for the polarizability and hydrophobicity of a molecule. Figure 1a clearly indicates the dependence of second-dimension retention on the selected descriptors. The Randic index (order 3) encodes aspects of molecular connectivity, the order being related to the number of atoms involved in the molecular connectivity subgraph. The inclusion of this descriptor reflects the shape-dependence of secondary retention that is expected for the model analytes on a highly phenylated phase [53]. Work by Dvorak et al. [54] and Liu et al. 
[55] has shown that a relationship exists between the Randic index (calculated for any connected graph) and the dimension. This, along with the observation that molecular size is one of the factors that may have an effect on the elution of compounds in the secondary dimension, supports the use of the descriptor for shape dependence. Table 4 indicates that the variance inflation factor for those descriptors is higher than 10, suggesting multicollinearity in the descriptor space. In fact, the correlation coefficient between those descriptors is 0.59. Information Content (order 0) is a topological descriptor that encodes a molecule's complexity and quantifies the heterogeneity and redundancy of topological neighborhoods of atoms in molecules. The trend described by this parameter can be seen in Figure 1a as the decreasing secondary retention times of more highly alkylated dibenzothiophenes.
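The autoscaling step and the variance inflation factor, defined earlier as VIF = 1/(1 - R2desc), can be sketched as follows. This is a minimal illustration and not the authors' code. The function names are hypothetical, and using the sample standard deviation (ddof = 1) for autoscaling is an assumption.

```python
import numpy as np

def autoscale(X):
    """Center each descriptor column and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def vif(X):
    """VIF_j = 1 / (1 - R2_j), with R2_j from regressing descriptor j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Toy example (not data from this study): two uncorrelated descriptors
X_demo = np.array([[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]])
print(vif(X_demo))
```

Note that the VIF of a descriptor reflects its joint dependence on all other descriptors in the model, so it can be large even when every individual pairwise correlation is moderate.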
The statistical performance of the QSRR model calculated for the second dimension was similar to that of the model for the first dimension. This finding exceeded our expectations, given the unusual experimental conditions (i.e. short column and short residence time of the analyte on the column, almost constant temperature program in the second oven) and the greater experimental error connected with separation on the second column (see Figure 2 for the repeatability test). As noted above, in GC×GC each compound is described by two parameters (i.e. the retention times in both dimensions). To estimate the quality of the QSRR models for GC×GC we therefore had to calculate and check two parameters, q2model and q2test, for each dimension. To simplify the interpretation process and for visualization purposes, we introduced new parameters that combine the results from both QSRR models. In this way we reduce the number of parameters to two. Formulas for these two parameters (Q2DISTmodel and Q2DISTtest) are given in Equations 4 and 5. The Q2DISTmodel values show the internal predictive ability of a QSRR model and are based on LOO-CV on the training set. The Q2DISTtest values estimate the predictive power of regression models based on the external data set. Calculated values of these parameters for the retention time QSRR models are listed in Table 4. These parameters seem to have characteristics similar to those of other known and widely applicable parameters such as q2model and q2test introduced in Equations 2 and 3. However, these similarities will have to be verified with future experiments. Figure 5 compares the observed and predicted GC×GC retention times for the test set compounds based on the two independent QSRR models (QSRrt1R/QSRrt2R). There is excellent agreement between the predicted and observed retention times.
QSRR models should be evaluated for their applicability with regard to chemical domain to confirm that they produce reliable predicted data for compounds that are not too structurally dissimilar. To visualize this we applied Williams plots (see Figure 6a and 6b), which illustrate the behavior of standardized residuals vs. leverage values (H) for the QSRrt1R and QSRrt2R models, respectively. Williams plots are a simple graphical means of detecting both response outliers (i.e. compounds with standardized residuals greater than three standard deviations) and structurally influential chemicals in a model (i.e. those with high leverages, usually above a given threshold) [56]. Thresholds for the models derived in this QSRR study are presented as horizontal and vertical dotted lines in Figure 6. A chemical with high
leverage in the training set greatly influences the regression but is not an outlier for the response fitting. By contrast, the predictions for high-leverage chemicals in the test set could be extrapolated and would thus become unreliable. A high-leverage chemical is structurally anomalous in the chemical domain of the model. The Williams plots did not demonstrate any seriously anomalous behavior. There were no chemical compounds having both high-residual and high-leverage responses. Chemicals 1, 3, 74, and 89 (thiophene, tetrahydrothiophene, 3-(1-naphthyl)benzothiophene, and 2-eicosylbenzothiophene) are outside the determined chemical domain for both QSRR models. All of these compounds are assigned to the model set. There were negligible changes in the performance of the QSRR models after removing objects with high leverages from the model set and recalculating the QSRrt1R and QSRrt2R models. These objects can expand the chemical domain of both QSRR models, as the first two are very small cyclic compounds and the fourth one is the longest aromatic sulfur compound in the data set. Compounds 86 (Figure 6a) and 39 (Figure 6b), dinaphtho[2,1-b:2',1'-d]thiophene and 1,2,3,4-tetrahydrodibenzothiophene, respectively, which belong to the model chemical domain, are poorly predicted (≥3σ). Removing these outliers and rebuilding the QSRR models did not introduce significant changes in QSRR model performance. Finally, Y-scrambling was performed to check the robustness of the QSRR models and the possibility of random correlation. About 1000 random permutations of the response variables were computed and used to generate QSRR pseudomodels. The statistical characteristics of the Y-randomization routines are presented in Table 5. The QSRR pseudomodels have significantly lower q2model and R2 values than the original models. This demonstrates that the QSRR models that we have developed are not due to random correlation and are therefore statistically reliable.
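The two diagnostics discussed above, leverage values for the Williams plot and Y-scrambling, can be sketched as below. This is a minimal illustration, not the authors' code. The function names are hypothetical, and the warning leverage h* = 3(p + 1)/n is a commonly used convention assumed here, since the threshold is not stated in the text. The scrambling sketch refits a fixed descriptor set for each permutation and does not repeat variable selection.

```python
import numpy as np

def leverages(X_train, X_query=None):
    """Hat-matrix diagonal h_i = a_i (A'A)^-1 a_i', with A = [1, X_train]."""
    A = np.column_stack([np.ones(len(X_train)), np.asarray(X_train, float)])
    G = np.linalg.pinv(A.T @ A)
    Q = A if X_query is None else np.column_stack(
        [np.ones(len(X_query)), np.asarray(X_query, float)])
    return np.einsum("ij,jk,ik->i", Q, G, Q)

def warning_leverage(n_train, n_descriptors):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train

def r2_fit(X, y):
    """Coefficient of determination of an MLR fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)

def y_scramble(X, y, n_perm=1000, seed=0):
    """R2 values of pseudomodels fitted to randomly permuted responses."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, float), np.asarray(y, float)
    return np.array([r2_fit(X, rng.permutation(y)) for _ in range(n_perm)])

# Synthetic illustration only (not data from this study)
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(30, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5])
h = leverages(X_demo)
scrambled = y_scramble(X_demo, y_demo, n_perm=200)
print(h.max(), warning_leverage(30, 3), scrambled.mean())
```

A model that reflects a real structure-retention relationship should give an R2 for the original responses that lies well above the distribution obtained from the scrambled responses.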
4. Conclusions
The present study has demonstrated the first successful application of the classical quantitative structure-retention relationship approach for the prediction of gas chromatographic retention times of PASHs and other sulfur compounds in GC×GC. Cross-validation, an external set of compounds, Y-scrambling, and Williams plots were incorporated in the study to verify that the models are stable and reliable.
New parameters, namely Q2DISTmodel and Q2DISTtest, were introduced to describe the statistical behavior of QSRR models. Future work will focus on the development of QSRR models for retention indices in a two-dimensional system. The benefit of using retention indices rather than retention times is that indices are less dependent on parameters such as the condition of the GC column, the operating conditions employed (e.g. pressure), and solute concentrations. This process is not trivial, and will require the development of models that do not depend solely on secondary retention times in order to be relevant. The coupling of the GC×GC-SCD technique and QSRR modeling can serve as the basis for methodologies and technologies that improve desulfurization of crude and processed oils. The equations developed for predicting retention times can be used to identify unknown chromatographic peaks by matching their retention times with those predicted for sulfur compounds of known molecular structure when the corresponding chemical standards are unavailable.
Acknowledgements
The authors would like to acknowledge support from the Government of Canada's interdepartmental Program of Energy Research and Development, PERD 1.1.3. Petroleum Conversion for Cleaner Air.
References
[1]
S.L. Lakhapatri, M.A. Abraham, Analysis of catalyst deactivation during steam reforming of jet fuel on Ni-(PdRh)/γ-Al2O3 catalyst, Appl. Catal. A 405 (2011) 149-159.
[2]
T. Kolli, M. Huuhtanen, A. Hallikainen, K. Kallinen, R.L. Keiski, The Effect of Sulphur on the Activity of Pd/Al2O3, Pd/CeO2 and Pd/ZrO2 Diesel Exhaust Gas Catalysts, Catal. Lett. 127 (2009) 49-54.
[3]
J.A. Rodriguez, J. Hrbek, Interaction of Sulfur with Well-Defined Metal and Oxide Surfaces: Unraveling the Mysteries behind Catalyst Poisoning and Desulfurization, Acc. Chem. Res. 32 (1999) 719-728.
[4]
H. Fang, B. Brown, D. Young, S. Nešić, Investigation of Elemental Sulfur Corrosion Mechanisms, NACE Corrosion Paper No. 11398 (2011) 1-13.
[5]
J. Sojka, M. Jerome, M. Sozanska, P. Vanova, L. Rytirova, P. Jonsta, Role of microstructure and testing conditions in sulphide stress cracking of X52 and X60 API steels, Mater. Sci. Eng. A 480 (2008) 237-243.
[6]
C. Song, An overview of new approaches to deep desulfurization for ultra-clean gasoline, diesel fuel and jet fuel, Catal. Today 86 (2003) 211-263.
[7]
S. Hameed, J. Dignon, Changes in the Geographical Distributions of Global Emissions of NOx and SOx from Fossil-Fuel Combustion Between 1966 and 1980, Atmos. Environ. 22 (1988) 441-449.
[8]
D.A. Eastmond, G.M. Booth, M.L. Lee, Toxicity, accumulation, and elimination of polycyclic aromatic sulfur heterocycles in Daphnia magna, Environ. Contam. Tox. 13 (1984) 105-111.
[9]
A. Eisentraeger, C. Brinkmann, H. Hollert, A. Sagner, A. Tiehm, J. Neuwoehner, Heterocyclic compounds: Toxic effects using algae, daphnids, and the Salmonella/microsome test taking methodical quantitative aspects into account, Environ. Toxicol. Chem. 27 (2009) 1590-1596.
[10]
I.-K. Kim, C.-P. Huang, P.C. Chui, Sonochemical decomposition of dibenzothiophene in aqueous solution, Water Res. 35 (2001) 4370-4378.
[11]
W. Kleibohmer, Environmental Analysis, Elsevier Science B.V., Amsterdam, The Netherlands, 2001.
[12]
T. Schade, J.T. Andersson, Speciation of Alkylated Dibenzothiophenes in a Deeply Desulfurized Diesel Fuel, Energy Fuels 20 (2006) 1614-1620.
[13]
S.H. Ali, D.M. Hamad, B.H. Albusairi, M.A. Fahim, Removal of Dibenzothiophenes from Fuels by Oxy-desulfurization, Energy Fuels 23 (2009) 5986-5994.
[14]
R. Hua, Y. Li, W. Liu, J. Zheng, H. Wei, J. Wang, X. Lu, H. Kong, G. Xu, Determination of sulfur-containing compounds in diesel oils by comprehensive two-dimensional gas chromatography with a sulfur chemiluminescence detector, J. Chromatogr. A 1019 (2003) 101-109.
[15]
J. Blomberg, T. Reimersma, M. van Zuijlen, H. Chaabani, Comprehensive two-dimensional gas chromatography coupled with fast sulphur-chemiluminescence detection: implications of detector electronics, J. Chromatogr. A 1050 (2004) 77-84.
[16]
R. Ruiz-Guerrero, C. Vendeuvre, B. Thiebaut, F. Bertoncini, D. Espinat, Comparison of Comprehensive Two-Dimensional Gas Chromatography Coupled with Sulfur-Chemiluminescence Detector to Standard Methods for Speciation of Sulfur-Containing Compounds in Middle Distillates, J. Chromatogr. Sci. 44 (2006) 566-573.
[17]
B.M.F. Ávila, V.B. Pereira, A.O. Gomes, D.A. Azevedo, Speciation of organic sulfur compounds using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry: A powerful tool for petroleum refining, Fuel 126 (2014) 188-193.
[18]
M. Adahchour, J. Beens, U.A.T. Brinkman, Recent developments in the application of comprehensive two-dimensional gas chromatography, J. Chromatogr. A 1186 (2008) 67-108.
[19]
K.D. Nizio, T.M. McGinitie, J.J. Harynuk, Comprehensive multidimensional separations for the analysis of petroleum, J. Chromatogr. A 1255 (2012) 12-23.
[20]
P.J. Marriott, S.-T. Chin, B. Maikhunthod, H.-G. Schmarr, S. Bieri, Multidimensional gas chromatography, Trends Anal. Chem. 34 (2012) 1-21.
[21]
H. Du, Z. Ring, Y. Briker, P. Arboleda, Prediction of gas chromatographic retention times and indices of sulfur compounds in light cycle oil, Catal. Today 98 (2004) 217-225.
[22]
H.-Y. Xu, J.-W. Zou, Y.-J. Jiang, G.-X. Hu, Q.-S. Yu, Quantitative structure–chromatographic retention relationship for polycyclic aromatic sulfur heterocycles, J. Chromatogr. A 1198-1199 (2008) 202-207.
[23]
J. Ghasemi, S. Asadpour, A. Abdolmaleki, Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticides, herbicides, and organohalides by multivariate chemometrics methods, Anal. Chim. Acta. 588 (2007) 200-206.
[24]
F. Liu, Y. Liang, C. Cao, N. Zhou, QSPR study of GC retention indices for saturated esters on seven stationary phases based on novel topological indices, Talanta 72 (2007) 1307-1315.
[25]
L.C. Porto, E.S. Souza, B.d.S. Junkes, R.A. Yunes, V.E.F. Heinzen, Semi-empirical topological index: Development of QSPR/QSRR and optimization for alkylbenzenes, Talanta 76 (2008) 407-412.
[26]
C. Lu, A.F. Jalbout, L. Adamowicz, Y. Wang, C. Yin, QSRR Study for Gas and Liquid Chromatographic Retention Indices of Polyhalogenated Biphenyls Using Two 2D Descriptors, Chromatographia 66 (2007) 717-724.
[27]
X. Zhang, L. Ding, Z. Sun, L. Song, T. Sun, Study on Quantitative Structure–Retention Relationships for Hydrocarbons in FCC Gasoline, Chromatographia 70 (2009) 511-518.
[28]
Y. Ren, H. Liu, X. Yao, M. Liu, An accurate QSRR model for the prediction of the GC×GC–TOFMS retention time of polychlorinated biphenyl (PCB) congeners, Anal. Bioanal. Chem. 388 (2007)
[29]
A.A. D'Archivio, A. Incani, F. Ruggieri, Retention modelling of polychlorinated biphenyls in comprehensive two-dimensional gas chromatography, Anal. Bioanal. Chem. 399 (2011) 903-913.
[30]
B.M. Weber, J.J. Harynuk, Application of thermodynamic-based retention time prediction to ionic liquid stationary phases, J. Sep. Sci. 37 (2014) 1460-1466.
[31]
T.M. McGinitie, J.J. Harynuk, Prediction of retention times in comprehensive two-dimensional gas chromatography using thermodynamic models, J. Chromatogr. A 1255 (2012) 184-189.
[32]
H. Ebrahimi-Najafabadi, T.M. McGinitie, J.J. Harynuk, Quantitative structure–retention relationship modeling of gas chromatographic retention times based on thermodynamic data, J. Chromatogr. A 1358 (2014) 225-231.
[33]
K. Heberger, Quantitative structure-(chromatographic) retention relationships, J. Chromatogr. A 1158 (2007) 273-305.
[34]
G.A. Depauw, G.F. Froment, Molecular analysis of the sulphur components in a light cycle oil of a catalytic cracking unit by gas chromatography with mass spectrometric and atomic emission detection, J. Chromatogr. A 761 (1997) 231-247.
[35]
S.G. Mössner, M.J. Lopez de Alda, L.C. Sander, M.L. Lee, S.A. Wise, Gas chromatographic retention behavior of polycyclic aromatic sulfur heterocyclic compounds, (dibenzothiophene, naphtho[b]thiophenes, benzo[b]naphthothiophenes and alkyl-substituted derivatives) on stationary phases of different selectivity, J. Chromatogr. A 841 (1999) 207-228.
[36]
Y. Gao, Y. Wang, X. Yao, X. Zhang, M. Liu, Z. Hu, B. Fan, The prediction for gas chromatographic retention index of disulfides on stationary phases of different polarity, Talanta 59 (2003)
[37]
V.G. Garbuzov, T.A. Misharina, A.F. Aerov, R.V. Golovnya, Gas chromatographic retention indices for sulfur(II)-containing organic substances, J. Anal. Chem. USSR 40 (1985) 576-586.
[38]
H. Can, A. Dimoglo, V. Kovalishyn, Application of artificial neural networks for the prediction of sulfur polycyclic aromatic compounds retention indices, J. Mol. Struct. (THEOCHEM) 723 (2005) 183-185.
[39]
D.L. Vassilaros, R.C. Kong, D.W. Later, M.L. Lee, Linear retention index system for polycyclic aromatic compounds: Critical evaluation and additional indices, J. Chromatogr. A 252 (1982) 1-20.
[40]
F. Safa, M.R. Hadjmohammadi, Use of topological indices of organic sulfur compounds in quantitative structure-retention relationship study, QSAR Comb. Sci. 24 (2005) 1026-1032.
[41]
K.E. Miller, T.J. Bruno, Isothermal Kovats retention indices of sulfur compounds on a poly(5% diphenyl-95% dimethylsiloxane) stationary phase, J. Chromatogr. A 1007 (2003) 117-125.
[42]
T. Schade, J.T. Andersson, Speciation of alkylated dibenzothiophenes through correlation of structure and gas chromatographic retention indexes, J. Chromatogr. A 1117 (2006) 206-213.
[43]
CODESSA Reference Manual, SemiChem, Shawnee Mission, KS, 1997.
[44]
A.R. Katritzky, E.S. Ignatchenko, R.A. Barcock, V.S. Lobanov, M. Karelson, Prediction of gas chromatographic retention times and response factors using a general quantitative structure-property relationships treatment, Anal. Chem. 66 (1994) 1799-1807.
[45]
R.W. Kennard, L.A. Stone, Computer aided design of experiments, Technometrics 11 (1969) 137-148.
[46]
E. Kováts, Gas-chromatographische Charakterisierung organischer Verbindungen. Teil 1: Retentionsindices aliphatischer Halogenide, Alkohole, Aldehyde und Ketone, Helv. Chim. Acta 41 (1958) 1915-1932.
[47]
S. Bieri, P.J. Marriott, Dual-Injection System with Multiple Injections for Determining Bidimensional Retention Indexes in Comprehensive Two-Dimensional Gas Chromatography, Anal. Chem. 80 (2008) 760-768.
[48]
J.V. Seeley, S.K. Seeley, Multidimensional Gas Chromatography: Fundamental Advances and New Applications, Anal. Chem. 85 (2013) 557-578.
[49]
M. Jiang, C. Kulsing, Y. Nolvachai, P.J. Marriott, Two-Dimensional Retention Indices Improve Component Identification in Comprehensive Two-Dimensional Gas Chromatography of Saffron, Anal. Chem. 87 (2015) 5753-5761.
[50]
T.F. Woloszyn, P.C. Jurs, Quantitative structure-retention relationship studies of sulfur vesicants, Anal. Chem. 64 (1992) 3059-3063.
[51]
D.T. Stanton, P.C. Jurs, M.G. Hicks, Computer-assisted prediction of normal boiling points of furans, tetrahydrofurans, and thiophenes, J. Chem. Inf. Comput. Sci. 31 (1991) 301-310.
[52]
A.R. Katritzky, V.S. Lobanov, M. Karelson, Normal boiling points for organic compounds: correlation and prediction by a quantitative structure-property relationship, J. Chem. Inf. Comput. Sci. 38 (1998) 28-41.
[53]
R. Kaliszan, Structure and Retention in Chromatography, A Chemometric Approach, Harwood Academic Publishers, Amsterdam, The Netherlands, 1997.
[54]
Z. Dvorak, B. Lidicky, R. Skrekowski, Randic Index and the Diameter of a Graph, European J. Combin. 32 (2011) 434-442.
[55]
J. Liu, M. Liang, B. Cheng, B. Liu, A Proof for a Conjecture on the Randic Index of Graphs with Diameter, Appl. Math. Lett. 24 (2011) 752-756.
[56]
P. Gramatica, Principles of QSAR models validation: internal and external, QSAR Comb. Sci. 26 (2007) 694-701.
Figure captions
Figure 1. Examples of GC×GC-SCD chromatograms of an oil sample before (a) and after (b) desulfurization. The x-axis of the GC×GC chromatogram is the volatility-based retention time (s), the y-axis is the polarity-based retention time (s), and the z-axis is the detector (SCD) response. The signal axes are set at the same threshold for both (a) and (b).
Figure 2. The two-dimensional chromatogram of the standard mixture of sulfur compounds used for testing the reproducibility of the GC×GC experiments: thiophene (1), 1-dodecanethiol (2), 1-octadecanethiol (3), benzothiophene (4), 4,6-diethylbenzothiophene (5), thianthrene (6) and benzo[b]naphtho[1,2-d]thiophene (7). On the right are the ranges and standard deviations (reported in seconds) of the primary and secondary retention times for the compounds in the mixture. The analysis was based on 15 runs.
Figure 3. Projection of the GC×GC peaks of all the compounds used in this study. The two dimensions rely on different separation mechanisms, so an orthogonal separation is achieved. The red triangles indicate compounds included in the test set using the Kennard-Stone subset selection based on descriptor space.
Figure 4. Plot of predicted vs. experimental values for (a) 1st dimension retention times (1tR) and (b) 2nd dimension retention times (2tR) using the QSRR models included in Table 4.
Figure 5. Projection of test set compounds predicted by the QSRR models (red triangles) and experimental results (grey circles) on a two-dimensional plot. Q2DISTmodel = 0.9286 and Q2DISTtest = 0.9386.
Figure 6. Williams plots for QSRrt1R (a) and QSRrt2R (b) models - a plot of standardized residuals vs. leverages (for details see text).
Table 1: Review of QSRR applied to PASHs in the literature
Depauw et al. (1997) [34]
Data source / stationary phase: Original data. Instrument: GC-MS. Column: HP-PONA (50 m x 0.2 mm x 0.5 µm).
Descriptors: A set of simple indicator variables, analogues of Free-Wilson descriptors. Descriptors calculated for di-, tri- and tetramethyl derivatives of benzothiophenes.
QSRR model: Compound class: BT (39). Modeled property: tR. Model: linear (MLR). External validation: yes. R2 (model): 0.9968.
Notes: A correlation was developed for the prediction of the retention times of polymethyl benzothiophenes (BTs). This correlation was applied to the identification of trimethylBTs and tetramethylBTs in light cycle oil.
Mössner et al. (1999) [35]
Data source / stationary phase: Original data. Instrument: GC-MS. Column(s): DB5-MS (60 m x 0.25 mm x 0.25 µm), DB17 (60 m x 0.25 mm x 0.25 µm), SB-Smectic (25 m x 0.20 mm x 0.15 µm).
Descriptors: Length (L), breadth (B), thickness (T), and length-to-breadth ratio (L/B) calculated for all PASH compounds using commercial software (PC-Model and MMX, Serena Software, Bloomington, IN, US).
QSRR model: Compound class: PASHs (80). Modeled property: Ic and Is. Model: linear (MLR). External validation: no. R2 (model): 0.46 - 0.96.
Notes: Valuable compilation of retention data for PASH compounds. The retention indices for 80 PASHs were determined on three stationary phases. A set of correlations for modeling of retention parameters was investigated; however, simple QSRR models are presented only for the liquid crystalline stationary phase (SB-Smectic), not for the other two columns.
Gao et al. (2003) [36]
Data source / stationary phase: Data from [37]. Instrument: GC-FPD. Column(s): Apiezon M, OV-17, Triton X305, PEG-1000.
Descriptors: A set of simple quantum-chemical descriptors generated after optimization of the molecules using the semiempirical AM1 method (Hyperchem 4.0, Hypercube, 1994).
QSRR model: Compound class: disulfides (50). Modeled property: RI. Model: both linear (MLR) and non-linear (RBFNNs). External validation: yes, for non-linear modeling. R2 (model): 0.95 - 0.99 (MLR) and 0.94 - 0.99 (RBFNNs). R2 (test): 0.98 - 0.99 (RBFNNs).
Notes: The QSRR models were built for a narrow group of C1-C6 dialkyl disulfides separated on unique stationary phases. The influence of descriptors on the QSRR model varies and depends on the GC column used in modeling. Prediction of retention indices for disulfides on specific columns using non-linear QSRR modeling is superior to the results reported by MLR.
Du et al. (2004) [21]
Data source / stationary phase: Original data. Instrument: GC-AED. Column(s): RTX-50 (60 m x 0.25 mm x 0.25 µm).
Descriptors: Large number of descriptors generated using CODESSA software (SemiChem, Inc, 2002), including constitutional, geometrical, topological and quantum-chemical ones.
QSRR model: Compound class: sulfur-containing hydrocarbons (90). Modeled property: Is and tR. Model: linear (MLR). External validation: no. R2 (model): 0.99.
Notes: QSRR models were generated for the whole set (90 molecules, 7 parameters) and for the thiophenic subset (34 molecules, 5 parameters). The latter QSRR model, for the narrower group of compounds, shows improved performance for prediction of retention characteristics. The authors noticed that the relative prediction errors are similar to the experimental ones, which may indicate limited opportunity for future model improvements. The derived QSRR models were not properly validated (no external test data set used).
Can et al. (2005) [38]
Data source / stationary phase: Data from [39]. Instrument: GC-MS. Column(s): SE-52 (20 m x 0.30 mm x 0.25 µm).
Descriptors: Molecular descriptors generated using the Dragon Web version package (http://www.vcclab.org/lab/edragon/), including constitutional, geometrical, topological, quantum-chemical and molecular-walk descriptors.
QSRR model: Compound class: PASHs (80). Modeled property: Ic. Model: non-linear (feed-forward neural network). External validation: yes. q2 (model): 0.95.
Notes: Six nonlinear QSRR models were generated for predicting linear temperature-programmed gas chromatographic retention characteristics. The quality of the QSRR models was evaluated by a cross-validation parameter (q2 model). The most robust models included up to 27 descriptors.
Safa et al. (2005) [40]
Data source / stationary phase: Data from [41]. Instrument: GC-FID/SCD. Column(s): poly(5% diphenyl–95% dimethylsiloxane) (30 m x 0.25 mm x 0.10 µm).
Descriptors: A set of 45 topological descriptors generated using the Dragon Web 3.0 version package (http://disat.unimib.it/chm/Dragon.htm).
QSRR model: Compound class: sulfides and thiols (20). Modeled property: RI. Model: linear (MLR). External validation: yes. R2 (model): 0.98-0.99. q2 (model): 0.96-0.97.
Notes: The optimal model was obtained using two descriptors (a sum of Kier-Hall electrotopological states and the 3D-Balaban index). The dataset consists of only 20 compounds. QSRR models were derived for sulfur compounds at four different temperatures. After adding the temperature as a parameter, a combined QSRR model allows for prediction of retention index values at different temperatures. The quality of the QSRR models was evaluated by both cross-validation and external validation techniques.
Schade et al. (2006) [42]
Data source / stationary phase: Original data. Instrument: GC-MS. Column(s): DB-5 (30 m x 0.25 mm x 0.25 µm), DB-17ms (30 m x 0.25 mm x 0.25 µm).
Descriptors: A set of simple indicator variables including the number of methyl groups in specific positions of the DBT structural motif. Descriptors calculated for methyl derivatives of dibenzothiophene.
QSRR model: Compound class: alkylated DBTs (30). Modeled property: Ic and Is. Model: linear (MLR). External validation: yes. R2 (model): 0.997.
Notes: A few new sulfur compounds were synthesized. The QSRR model involves simple (easy to calculate) parameters for prediction of retention indices of alkylated dibenzothiophenes. QSRR validated both internally (cross-validation) and externally (test set of four DBT derivatives). Low structural variability of the molecules limits its applicability.
Xu et al. (2008) [22]
Data source / stationary phase: Data from [39]. Instrument: GC-MS. Column(s): SE-52 (20 m x 0.30 mm x 0.25 µm).
Descriptors: A set of 25 descriptors mostly derived from the electrostatic potential calculated on the molecular surface. Structures were optimized with MOPAC 6.0 using the semiempirical AM1 method and then with Gaussian98 software (HF/6-31G* level of theory).
QSRR model: Compound class: PASHs (114). Modeled property: Ic. Model: linear (MLR). External validation: yes. R2 (model): 0.99. q2 (model): 0.98.
Notes: Correlation between the structural descriptors and the retention index was established by stepwise linear regression analysis. The final QSRR model consists of four descriptors. QSRR validated both internally (cross-validation) and externally (test set).
MS – Mass Spectrometer; FPD – Flame Photometric Detector; AED – Atomic Emission Detector; MLR – Multilinear regression analysis; BT – benzothiophene; DBT – dibenzothiophene; RI – the Kovats index; Ic – the Lee retention index; Is – the Andersson retention index; RBFNNs – Radial Basis Function Neural Networks.
Table 2. Experimental setup for GC×GC analysis
Primary column: VF5-HT, 30 m x 0.32 mm x 0.1 µm (Varian)
Secondary column: VF17-MS, 1.5 m x 0.1 mm x 0.2 µm (Varian)
Main oven program: 50 °C (0 min) to 340 °C (0 min) at 5 °C·min-1
Secondary oven program: +40 °C offset from main oven
Inlet temperature: 350 °C
Injection volume: 0.2 µL
Split ratio: 40:1
Carrier gas: He, constant flow, 1.5 mL·min-1
Modulator temperature: +55 °C offset from main oven
Detector: FID, 350 °C / SCD
Acquisition rate: 100 Hz
Modulation period: 8 s
FID: flame ionization detector; SCD: sulfur chemiluminescence detector.
Table 3. Composition of the data set and experimental and predicted retention times (1tR/2tR) for sulfur-containing compounds derived from the GC×GC experiment
# | Compound | 1tR exp (s) | 1tR calc (s) | 2tR exp (s) | 2tR calc (s)
1 | Thiophene | 296 | 303.7 | 2.07 | 2.08
2 | 3-Methylthiophene | 336 | 344.8 | 2.43 | 2.36
3 | Tetrahydrothiophene | 360 | 302.5 | 2.36 | 2.42
4 | 2-Ethylthiophene | 400 | 396.9 | 2.42 | 2.52
5 | 2,5-Dimethylthiophene | 408 | 462.5 | 2.39 | 2.37
6 | 2-Propylthiophene (a) | 504 | 546.1 | 2.54 | 2.56
7 | 3-Butylthiophene | 672 | 669.2 | 2.65 | 2.65
8 | n-Butylsulfide | 696 | 724.8 | 2.47 | 2.36
9 | 2-Pentylthiophene | 808 | 838.7 | 2.68 | 2.64
10 | Benzothiophene | 864 | 918.9 | 3.34 | 3.28
11 | Isoamyl sulfide | 888 | 900.9 | 2.43 | 2.47
12 | 2-Butyl-5-ethylthiophene | 936 | 997.4 | 2.67 | 2.66
13 | 3-Hexylthiophene | 992 | 975.2 | 2.71 | 2.69
14 | 7-Methylbenzothiophene (a) | 1016 | 1031.1 | 3.30 | 3.30
15 | 2-Methylbenzothiophene | 1032 | 1058.8 | 3.27 | 3.21
16 | 5-Methylbenzothiophene | 1048 | 1071.1 | 3.29 | 3.28
17 | 3-Methylbenzothiophene | 1048 | 1059.5 | 3.36 | 3.33
18 | 6-Methylbenzothiophene | 1056 | 1056.4 | 3.25 | 3.28
19 | 4-Methylbenzothiophene | 1064 | 1111.2 | 3.29 | 3.34
20 | 2,7-Dimethylbenzothiophene (a) | 1184 | 1142.2 | 3.16 | 3.18
21 | 2-Phenylthiophene | 1200 | 1241.7 | 3.38 | 3.35
22 | 3,7-Dimethylbenzothiophene | 1200 | 1174.7 | 3.26 | 3.30
23 | 4,7-Dimethylbenzothiophene | 1200 | 1223.0 | 3.26 | 3.31
24 | 3-Phenylthiophene | 1208 | 1186.4 | 3.46 | 3.39
25 | 2,4-Dimethylbenzothiophene | 1208 | 1234.2 | 3.25 | 3.23
26 | 2,6-Dimethylbenzothiophene | 1208 | 1192.0 | 3.17 | 3.15
27 | 3,5-Dimethylbenzothiophene | 1216 | 1210.2 | 3.29 | 3.28
28 | 4,6-Dimethylbenzothiophene (a) | 1216 | 1240.9 | 3.29 | 3.26
29 | 2,3-Dimethylbenzothiophene | 1224 | 1265.9 | 3.33 | 3.33
30 | 2,5-Dimethylbenzothiophene | 1224 | 1215.5 | 3.11 | 3.16
31 | 3,6-Dimethylbenzothiophene | 1224 | 1217.0 | 3.27 | 3.28
32 | 6,7-Dimethylbenzothiophene | 1232 | 1194.1 | 3.36 | 3.36
33 | 3,4-Dimethylbenzothiophene | 1288 | 1262.9 | 3.44 | 3.40
34 | 2,3,7-Trimethylbenzothiophene | 1360 | 1386.3 | 3.25 | 3.31
35 | 1-Dodecanethiol (a) | 1392 | 1307.2 | 2.50 | 2.48
36 | 2,3,5-Trimethylbenzothiophene | 1392 | 1388.4 | 3.23 | 3.29
37 | Dodecyl methyl sulfide | 1520 | 1495.8 | 2.51 | 2.48
38 | 2,3,4,7-Tetramethylbenzothiophene | 1576 | 1580.4 | 3.35 | 3.37
39 | 1,2,3,4-Tetrahydrodibenzothiophene | 1656 | 1609.0 | 3.68 | 3.49
40 | Dibenzothiophene (a) | 1672 | 1686.4 | 3.96 | 3.91
41 | Naphtho[1,2-b]thiophene | 1688 | 1649.1 | 3.91 | 4.00
42 | Naphtho[2,1-b]thiophene | 1720 | 1653.9 | 3.98 | 4.04
43 | Naphtho[2,3-b]thiophene (a) | 1760 | 1705.7 | 4.00 | 3.99
44 | 4-Methyldibenzothiophene | 1792 | 1779.9 | 3.87 | 3.86
45 | 3-Methyldibenzothiophene | 1832 | 1797.7 | 3.78 | 3.84
46 | 3-Dodecylthiophene | 1848 | 1851.0 | 2.66 | 2.71
47 | 1-Methyldibenzothiophene | 1848 | 1851.9 | 3.97 | 3.95
48 | 4-Ethyldibenzothiophene | 1848 | 1853.6 | 3.70 | 3.77
49 | 1-Hexadecanethiol | 1904 | 1871.7 | 2.51 | 2.52
50 | 4,6-Dimethyldibenzothiophene | 1912 | 1888.0 | 3.79 | 3.70
51 | 2,4-Dimethyldibenzothiophene | 1936 | 1905.6 | 3.68 | 3.73
52 | 2,6-Dimethyldibenzothiophene (a) | 1944 | 1923.5 | 3.69 | 3.75
53 | Cyclohexylmethyl-2-benzothiophene | 1952 | 1845.1 | 3.45 | 3.43
54 | 1,4-Dimethyldibenzothiophene | 1968 | 1923.0 | 3.84 | 3.86
55 | 2,7-Dimethyldibenzothiophene | 1968 | 1985.4 | 3.69 | 3.74
56 | 2,8-Dimethyldibenzothiophene | 1968 | 2022.2 | 3.65 | 3.74
57 | 1,3-Dimethyldibenzothiophene | 1984 | 1945.4 | 3.86 | 3.81
58 | 2,3-Dimethyldibenzothiophene | 2008 | 1990.4 | 3.81 | 3.84
59 | 4-Ethyl-6-methyldibenzothiophene | 2008 | 1958.1 | 3.69 | 3.66
60 | Acenaphtho[1,2-b]thiophene | 2032 | 2040.0 | 4.25 | 4.33
61 | 2,4,6-Trimethyldibenzothiophene (a) | 2048 | 2015.8 | 3.65 | 3.60
62 | 2,4,7-Trimethyldibenzothiophene | 2064 | 2047.4 | 3.66 | 3.63
63 | 2,4,8-Trimethyldibenzothiophene | 2064 | 2111.8 | 3.62 | 3.65
64 | Phenanthro[4,5-bcd]thiophene | 2072 | 2101.9 | 4.43 | 4.48
65 | 4,6-Diethyldibenzothiophene | 2096 | 2086.8 | 3.68 | 3.59
66 | 1,4,7-Trimethyldibenzothiophene | 2096 | 2077.2 | 3.80 | 3.77
67 | 1,3,7-Trimethyldibenzothiophene | 2112 | 2124.4 | 3.79 | 3.73
68 | 1-Octadecanethiol | 2136 | 2125.1 | 2.49 | 2.53
69 | 2,4-Dimethyl-6-ethyldibenzothiophene | 2136 | 2105.2 | 3.55 | 3.57
70 | 2,4,6,8-Tetramethyldibenzothiophene | 2160 | 2232.1 | 3.53 | 3.51
71 | 1,4,6,8-Tetramethyldibenzothiophene (a) | 2176 | 2152.5 | 3.62 | 3.66
72 | 2-Decylbenzothiophene | 2224 | 2217.9 | 2.96 | 2.95
73 | Benzo[d]naphtho[1,2-b]thiophene | 2352 | 2330.9 | 4.54 | 4.47
74 | 3-(1-Naphthyl)benzothiophene | 2368 | 2444.2 | 4.46 | 4.62
75 | Benzo[b]naphtho[1,2-d]thiophene | 2384 | 2395.3 | 4.61 | 4.57
76 | Phenanthro[9,10-b]thiophene | 2400 | 2358.2 | 4.67 | 4.61
77 | Phenanthro[4,3-b]thiophene (a) | 2408 | 2389.0 | 4.67 | 4.68
78 | Benzo[b]naphtho[2,3-d]thiophene | 2416 | 2409.7 | 4.51 | 4.47
79 | Phenanthro[1,2-b]thiophene | 2416 | 2369.3 | 4.61 | 4.61
80 | Phenanthro[3,4-b]thiophene | 2416 | 2396.1 | 4.71 | 4.67
81 | Phenanthro[2,1-b]thiophene | 2448 | 2397.0 | 4.71 | 4.63
82 | 3-(2-Naphthyl)benzothiophene | 2520 | 2515.2 | 4.46 | 4.59
83 | 2-(naphthalen-2-yl)benzothiophene | 2624 | 2557.3 | 4.46 | 4.46
84 | Acenaphtho[1,2-b]-benzo[d]thiophene | 2656 | 2738.0 | 4.80 | 4.81
85 | 4-Decyldibenzothiophene | 2728 | 2793.0 | 3.16 | 3.25
86 | Dinaphtho[2,1-b:2',1'-d]thiophene | 2880 | 3004.2 | 5.12 | 5.08
87 | Benzo[b]phenanthro[9,10-d]thiophene | 2976 | 2994.4 | 5.13 | 5.04
88 | Benzo[b]phenanthro[2,1-d]thiophene (a) | 3000 | 3021.0 | 5.07 | 5.01
89 | 2-Eicosylbenzothiophene | 3128 | 3111.5 | 2.80 | 2.76
(a) Compounds selected for the test set based on the Kennard-Stone training/test sampling procedure.
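The Kennard-Stone procedure referred to in the footnote of Table 3 can be sketched as follows. This is a minimal illustration of the classical max-min distance selection [45], assuming Euclidean distances in descriptor space; the function name and the toy one-dimensional data are hypothetical, and the sketch does not reproduce the descriptor matrix or the exact settings used to select the 12 test compounds in this study.

```python
import numpy as np

def kennard_stone(X, k):
    """Select k objects that cover descriptor space uniformly (max-min rule).
    Returns the indices of the selected rows of X, in order of selection."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all objects
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two most distant objects
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [m for m in range(len(X)) if m not in selected]
    while len(selected) < k:
        # For each candidate, distance to its nearest already-selected object
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        # Pick the candidate that is farthest from everything selected so far
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected

# Toy example: five objects described by a single (hypothetical) descriptor
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
subset = kennard_stone(X_toy, 3)
```

In practice the descriptors would usually be scaled before computing distances, so that no single descriptor dominates the selection; that preprocessing step is omitted here and is also an assumption.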
Table 4. Best multi-linear regression models of retention times (1tR and 2tR) for sulfur-containing compounds measured for the GC×GC experiments
Q2DISTmodel = 0.9286 and Q2DISTtest = 0.9386
QSRrt1R model: R2: 0.9966, s2: 1692.4, F: 2905, R2adj: 0.9962, q2model: 0.9953, q2test: 0.9966, N = 77
i | bi | ±Δbi | t-test | p-value (x10e-6) | VIF | Descriptor name | Descriptor type
0 | 1677.4 | 4.70 | 356.68 | 0.00 | - | Intercept | Intercept
1 | 794.16 | 9.95 | 79.83 | 0.00 | 4.58 | Gravitation index (all bonds) | Constitutional
2 | 48.45 | 5.45 | 8.88 | 0.00 | 1.36 | Molecular volume / XYZ Box | Geometrical
3 | 129.11 | 7.59 | -17.02 | 0.00 | 2.88 | Wiener index | Topological
4 | 55.62 | 7.37 | 7.54 | 0.00 | 2.43 | Total dipole of the molecule | Quantum-chemical
5 | -40.69 | 6.00 | -6.78 | 0.00 | 1.74 | ESP-Max net atomic charge for a H atom | Quantum-chemical
6 | 56.68 | 8.84 | 6.41 | 0.00 | 3.42 | Max n-n repulsion for a C-C bond | Quantum-chemical
7 | 29.1 | 5.47 | 5.31 | 0.12 | 1.50 | Principal moment of inertia B / # of atoms | Thermodynamic
QSRrt2R model: R2: 0.9933, s2: 0.0038, F: 1453, R2adj: 0.9922, q2model: 0.9914, q2test: 0.9973, N = 77
i | bi | ±Δbi | t-test | p-value (x10e-6) | VIF | Descriptor name | Descriptor type
0 | 3.53 | 0.01 | 499.95 | 0.00 | - | Intercept | Intercept
1 | 0.21 | 0.03 | 7.77 | 0.00 | 13.91 | Relative number of benzene rings | Constitutional
2 | -0.34 | 0.02 | -15.9 | 0.00 | 9.00 | Information content (order 0) | Topological
3 | 0.51 | 0.03 | 16.68 | 0.00 | 17.97 | Randic index (order 3) | Topological
4 | -0.15 | 0.01 | -12.02 | 0.00 | 3.43 | Total entropy (300K) / # of atoms | Thermodynamic
5 | 0.08 | 0.01 | 8.91 | 0.00 | 1.55 | Max n-n repulsion for a C-C bond | Quantum-chemical
6 | -0.04 | 0.01 | -3.89 | 22.47 | 2.43 | Min resonance energy for a C-H bond | Quantum-chemical
7 | -0.031 | 0.01 | -3.85 | 25.78 | 1.35 | Total hybridization comp. of the molecular dipole | Quantum-chemical
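The VIF column of Table 4 reports variance inflation factors for the descriptors. As a hedged illustration of how such values are commonly computed, the sketch below uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing descriptor j on all other descriptors. The function name and the synthetic descriptor matrix are assumptions for illustration; the sketch does not use the descriptor values of this study, and the software used by the authors may compute VIF differently.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2) of that column regressed
    on all the other columns (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vif[j] = 1.0 / (1.0 - r2)
    return vif

# Synthetic example: x3 is almost a copy of x1, x2 is independent of both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)
vif_demo = variance_inflation_factors(np.column_stack([x1, x2, x3]))
```

In this synthetic case the nearly collinear pair (x1, x3) receives very large VIF values, whereas the independent descriptor x2 has a VIF close to 1.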
Table 5. Statistical characteristics of the results of the Y-scrambling approach applied to all QSRR models derived in this work
Pseudomodel | q2model min | q2model max | q2model mean | R2 min | R2 max | R2 mean
QSRrt1R | -0.7437 | 0.0676 | -0.1696 | 0.0071 | 0.2242 | 0.0812
QSRrt2R | -0.4603 | 0.1048 | -0.1474 | 0.0079 | 0.2639 | 0.0922
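The Y-scrambling statistics summarized in Table 5 can be illustrated with the following minimal Python sketch. It refits a multi-linear regression after randomly permuting the response and records R2 and a leave-one-out q2 for each pseudomodel. The function names, the synthetic data, the number of permutations (200 here, for speed), and the definition q2 = 1 - PRESS/SStot are assumptions made for illustration; they are not taken from the software or settings used in this work.

```python
import numpy as np

def r2_and_q2(X, y):
    """R2 and leave-one-out q2 of an ordinary least-squares model."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])            # design matrix with intercept
    H = A @ np.linalg.pinv(A.T @ A) @ A.T           # hat matrix
    resid = y - H @ y
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / ss_tot
    # Leave-one-out residuals follow from the ordinary ones: e_i / (1 - h_ii)
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return r2, 1.0 - press / ss_tot

def y_scramble(X, y, n_perm=1000, seed=0):
    """R2 and q2 of pseudomodels built on randomly permuted responses."""
    rng = np.random.default_rng(seed)
    stats = np.array([r2_and_q2(X, rng.permutation(y)) for _ in range(n_perm)])
    return stats[:, 0], stats[:, 1]

# Synthetic example with 77 objects and 7 descriptors (not the study data)
rng = np.random.default_rng(1)
X_syn = rng.normal(size=(77, 7))
y_syn = X_syn @ np.arange(1.0, 8.0) + rng.normal(scale=0.5, size=77)

r2_real, q2_real = r2_and_q2(X_syn, y_syn)
r2_perm, q2_perm = y_scramble(X_syn, y_syn, n_perm=200)
summary = {
    "R2": (r2_perm.min(), r2_perm.max(), r2_perm.mean()),
    "q2": (q2_perm.min(), q2_perm.max(), q2_perm.mean()),
}
```

A model that reflects a real structure-retention relationship should have R2 and q2 values well above the whole range obtained for the pseudomodels, which is the pattern reported in Table 5.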