Quantum chemical characterization of Abraham solvation parameters for gas–liquid chromatographic stationary phases


Journal of Chromatography A, 1216 (2009) 8535–8544


Journal of Chromatography A journal homepage: www.elsevier.com/locate/chroma

Eufrozina A. Hoffmann a, Robert Rajkó b, Zoltan A. Fekete c,∗, Tamás Körtvélyesi a,c

a Department of Physical Chemistry and Materials Science, University of Szeged, Rerrich B. tér 1., H-6720 Szeged, Hungary
b Department of Mechanical and Process Engineering, University of Szeged, P. O. Box 433, H-6701 Szeged, Hungary
c HPC Group, University of Szeged, Szikra u. 2., H-6725 Szeged, Hungary

Article info

Article history: Received 12 August 2009; received in revised form 24 September 2009; accepted 28 September 2009; available online 2 October 2009.

Keywords: QSPR; Quantitative structure–property relationship; Linear solvation energy relationships (LSERs); Stationary phase; Abraham solvation model; Quantum chemical; Semiempirical; Chemometrics

Abstract

A quantum chemistry based investigation is presented of the Abraham solvation parameters for 23 molecular (non-polymeric) GLC stationary phases. PM6 semiempirical calculations combined with the conductor-like screening model (COSMO) have been utilized. A comprehensive search for an optimal model was carried out, based on best subset selection from the 86 variables considered. A unified quantitative structure–property relationship model has been developed for all five Abraham parameters reported. The selected set of five structure-driven descriptors was subjected to statistical analyses, and was shown to be useful for stationary phase classification. © 2009 Elsevier B.V. All rights reserved.

1. Introduction

In the fast-developing science of chromatography there is a great need for chemometric tools [1–3]. These methods assist in treating the large body of chromatographic data, organizing and rationalizing measurements, and predicting the chromatographic behavior of compounds [4–13]. Most chemometric analyses of retention data published so far considered properties of the solute molecules. In contrast, our current aim is to characterize GLC stationary phases by finding correlations between empirical and quantum chemical (QC) descriptors. Previously, a quantitative structure–property relationship (QSPR) solvent model was developed for the McReynolds constants (prototypical solutes) on 36 gas–liquid chromatographic stationary phases [14]. In this work a similar structure-driven, QC based investigation of Abraham parameters is carried out. The Abraham solvation parameter model is a frequently applied, theoretically well-founded linear solvation energy relationships

∗ Corresponding author. Tel.: +36 62 546821; fax:/voicemail: +1 781 6235997. E-mail address: [email protected] (Z.A. Fekete). 0021-9673/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.chroma.2009.09.074

(LSERs) model. (For a summary of the most frequently used LSER models in chromatography, see the review by Vitha and Carr [15].) It correlates a solubility property, such as GLC retention, with several additive terms that represent specific solubility interactions. The model is applicable to many partitioning processes besides GC and also offers a straightforward approach to the calculation of phase selectivity via the system constants. Over the decades, several versions have evolved [16], of which an early form is illustrated in Eq. (1):

$$\lg SP = c_1 + r_1 R_2 + s_1 \pi_2^{*} + a_1 \alpha_2^{H} + b_1 \beta_2^{H} + l_1 \lg L^{16} \qquad (1)$$

where SP is a property of probe solutes in a fixed system (such as retention time, specific retention volume, or thermodynamic gas–solvent partition coefficient); $R_2$ is an excess molar refraction; $\pi_2^{*}$ is the dipolarity/polarizability (derived from solvatochromic measurements); $\alpha_2^{H}$ and $\beta_2^{H}$ are the hydrogen bond acidity and basicity (obtained from 1:1 equilibrium constants); $L^{16}$ is the gas–hexadecane partition coefficient at 25 °C. The subscripts 1 and 2 denote solvent and solute properties, respectively. The coefficients c1, r1, s1, a1, b1, l1 are determined from least-squares fitting of the model.
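As an illustration of the least-squares step just described, the following minimal sketch fits the six system constants of Eq. (1) for a single stationary phase. The solute descriptors, the "true" constants and the noise level are all made up for the demonstration (they are not data from Ref. [19]), and NumPy is used here purely for convenience.

```python
import numpy as np

# Hypothetical solute descriptors, one row per probe solute.
# Columns: R2, pi2*, alpha2H, beta2H, lg L16 (all values are synthetic).
rng = np.random.default_rng(0)
n_solutes = 40
X = rng.uniform(0.0, 2.0, size=(n_solutes, 5))

# Assumed system constants of one phase: c1, r1, s1, a1, b1, l1.
true_coef = np.array([-0.40, 0.10, 0.60, 1.00, 0.00, 0.58])

# Design matrix with a leading column of ones for the intercept c1.
A = np.column_stack([np.ones(n_solutes), X])
lg_sp = A @ true_coef + rng.normal(scale=0.02, size=n_solutes)

# Ordinary least squares (NumPy solves this with an SVD-based routine).
fitted, *_ = np.linalg.lstsq(A, lg_sp, rcond=None)
for name, value in zip(["c1", "r1", "s1", "a1", "b1", "l1"], fitted):
    print(f"{name} = {value:+.3f}")
```

With low noise the fitted values land close to the assumed constants; with real retention data the quality of the fit depends on how well the probe solutes span the descriptor space.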


E.A. Hoffmann et al. / J. Chromatogr. A 1216 (2009) 8535–8544

Later on some of the solute descriptors were improved, yielding the revised Eq. (2) [17]:

$$\lg SP = c_1 + r_1 R_2 + s_1 \pi_2^{H} + a_1 \Sigma\alpha_2^{H} + b_1 \Sigma\beta_2^{H} + l_1 \lg L^{16} \qquad (2)$$

The old dipolarity/polarizability parameter was replaced by a new one ($\pi_2^{H}$), which has been derived from GLC measurements [18]. The hydrogen bond acidity and basicity were replaced by the overall descriptors ($\Sigma\alpha_2^{H}$, $\Sigma\beta_2^{H}$) that refer to the hydrogen-bond propensity of a solute surrounded by solvent molecules [17,18]. In the past few years a further revision of the fundamental LSER equation has become more widely used, as shown in Eq. (3) [15]:

$$\lg SP = c + eE + sS + aA + bB + vV \qquad (3)$$

The last term in Eq. (3), vV, is related to molecular volume and no longer depends explicitly on measured data (unlike $L^{16}$). The other terms have changed only in notation from those in Eq. (2). In this work we use the formalism of Eq. (2) exclusively, for consistency with both the data in Ref. [19] and the comparative analysis in Ref. [20].

The five fitted solvent-specific constants in Eq. (2) have a well-defined physical meaning. r1 reflects the propensity of the solvent (or stationary phase) to interact via π- and n-electron pairs. s1 is the measure of the solvent or stationary phase dipolarity/polarizability. The term $a_1 \Sigma\alpha_2^{H}$ represents the interactions between hydrogen-bond solute acids and the hydrogen-bond solvent base surrounded by solvent molecules. Because acidic solutes interact with basic phases, a1 indicates the solvent (or stationary phase) hydrogen bond basicity. By similar considerations, b1 reflects the solvent (or stationary phase) hydrogen bond acidity. l1 measures how close the solvent lipophilicity is to that of hexadecane (l1(hexadecane) = 1 by definition). It also indicates the ability of a stationary phase to separate members of a homologous series.

Contemporary chromatographic methods for the experimental determination of solute descriptors have been reviewed recently [21]. While little structure-driven modeling of solvent parameters has been reported, there have been several molecular based models to predict solute Abraham descriptors; see, e.g., Ref. [15] for a general review on LSER modeling, and Ref. [20] for a specific overview of works on Abraham parameters. Platts et al. [22,23] developed a group contribution approach to calculate Abraham descriptors from 81 atom and functional group fragments. This method was shown to accurately reproduce experimental partition data for a number of solvents.
However, for treating various physicochemical and biological systems, the training data available for such fragmental methods are generally insufficient. In subsequent work by Platts [24,25] and Lamarche et al. [26–28], improved prediction was achieved by correlating QC results (ab initio and DFT) with some of Abraham’s parameters, namely hydrogen bond acidity, basicity, and the polarity/polarizability parameter. Further improvement with QC based QSPR was achieved by Lamarche et al. [29] for correlating the polarity/polarizability parameter. Models for H-bonding parameters with a minimal set of QC descriptors have been reported very recently by Devereux et al. [31]. The parameter $\log L^{16}$ was predicted with a group additivity method by Havelec and Sevcik [32,33]. The polarity/polarizability parameter was calculated with a feed-forward neural network by Svozil and Sevcik [34], using semiempirical QC (AM1) descriptors as well as topological indices as inputs. Hydrogen-bonding parameters were also correlated with semiempirical molecular orbital data by Dearden and Ghafourian [35,36], who obtained a good fit over a training set of 55 compounds. A set of five purely computational descriptors (COSMOments, independently derived from Klamt’s COSMO-RS method [37–39]) has been compared with the experimental Abraham set in a seminal paper by Zissimos, Abraham, Klamt, Eckert, and Wood [20] (ZAKEW). Their comprehensive analysis, considering data for 470

solutes, showed that the respective chemical information contents have a large overlap. This study involved a variety of solute–solvent systems, but no similar account has been published specifically relating to GC stationary phases. In 2000, Abraham et al. [19] presented revised LSER coefficients of 77 stationary phases and classified them with cluster analysis. This work, like its predecessors, was based on multiple linear regression (MLR) over known solute data. To the best of our knowledge, there have been no publications on structure-driven calculations for the Abraham parameters of these solvents.

In the current contribution we correlate the LSER coefficients of Abraham et al. [19] for a subset of GLC stationary phases (comprising non-polymeric solvents). It will be shown that a good description can be achieved with descriptors derived solely from QC calculations on the structures of the solvent molecules. Cluster analysis was also carried out with the variables selected, demonstrating that the calculated parameters can be utilized for classification of stationary phases.

A goal of this investigation is to show that QC based descriptors can provide a good basis for QSPR models on Abraham parameters of GLC solvents. Our semiempirical approach kept the calculations as simple as possible. No higher level (ab initio or DFT) QC theory was used, avoiding an unnecessary increase in computational cost. For the same reason, the complicated issue of conformational selection was side-stepped by picking a typical (extended) geometry for each molecule. Obtaining statistically robust correlations in this way indicates that the simplified treatment is sufficient for developing valid models on these data.

2. Computation methodology

2.1. Solvation parameters

Published Abraham solvent coefficients for 25 molecular (non-polymeric and non-ionic) GLC stationary phases were collected from the paper of Abraham et al. [19].
The b1 parameter has been excluded from consideration in this study, since it is missing (due to reported statistical insignificance [19]) for most of these solvents. Table 1 contains the names (both conventional and IUPAC nomenclature), ID codes and Abraham parameters of the phases investigated. Structural drawings are provided as Supplementary Scheme S1. The estimated errors associated with these coefficients have not been reported in the literature. Ref. [19] only gave generic ranges of typical standard error from their MLR fitting: 0.02–0.03 for r1, s1, and a1, and 0.002–0.005 for l1. We note that these parameters have been derived from a variety of model fittings whose overall standard errors are not known. This lack of sufficient information on the uncertainties in the input data precludes any rigorous error analysis of our results. It is emphasized, however, that our primary aim is not to provide directly a predictive model for the retention time, but rather to establish that QC based QSPR of these parameters is feasible.

2.2. Structure generation and descriptor calculation

All descriptors considered in this study were generated from our QC calculations. Starting geometries of stationary phase molecules were generated in the molecular graphics program ACD/ChemSketch [40], and pre-optimized with molecular mechanics (the force-field based [41,42] three-dimensional optimizer of ChemSketch). Where available, experimental or pre-calculated structures from the PubChem database [43] were utilized as well. The initial geometries were adjusted manually to represent the most extended conformation, and then optimized with the OpenBabel suite [44,45] using the MMFF94 force-field

Table 1. Names, ID codes and Abraham parameters of GLC stationary phases considered.

| Code | Conventional name | IUPAC name | c1 (a) | r1 (a) | s1 (a) | a1 (a) | l1 (a) |
|---|---|---|---|---|---|---|---|
| 6M72E8A | Isooctyldecyl adipate | 6-O-(2-ethyloctyl) 1-O-(6-methylheptyl) hexanedioate | −0.38 | 0.04 | 0.37 | 0.88 | 0.605 |
| C22OL | Docosanol | Docosan-1-ol | −0.40 | 0.11 | 0.53 | 0.82 | 0.580 |
| CASTORW | Castorwax | 2,3-Bis(12-hydroxyoctadecanoyloxy)propyl 12-hydroxyoctadecanoate | −0.48 | 0.06 | 0.74 | 1.27 | 0.573 |
| CI-A4 | Citroflex A4 | Tributyl 2-acetyloxypropane-1,2,3-tricarboxylate | −0.47 | 0.07 | 0.66 | 1.08 | 0.565 |
| D2E6A | Di-2-ethylhexyl adipate | Bis(2-ethylhexyl) hexanedioate | −0.34 | 0.09 | 0.49 | 0.80 | 0.589 |
| D2E6P | Dioctyl phthalate | Bis(2-ethylhexyl) benzene-1,2-dicarboxylate | −0.31 | 0.04 | 0.63 | 1.06 | 0.643 |
| D2E6S | Di-2-ethylhexyl sebacate | Bis(2-ethylhexyl) decanedioate | −0.35 | 0.13 | 0.58 | 1.49 | 0.592 |
| D4P4CL | Dibutyl tetrachlorophthalate | Dibutyl 3,4,5,6-tetrachlorobenzene-1,2-dicarboxylate | −0.59 | 0.04 | 1.17 | 1.26 | 0.525 |
| D8S | Dioctyl sebacate | Dioctyl decanedioate | −0.34 | 0.11 | 0.49 | 0.79 | 0.589 |
| DEEP | Bis(2-ethoxyethyl) phthalate | Bis(2-ethoxyethyl) benzene-1,2-dicarboxylate | −0.66 | 0.14 | 1.01 | 1.19 | 0.550 |
| DGLYC | Diglycerol | 3-(2,3-Dihydroxypropoxy)propane-1,2-diol | −1.98 | 0.47 | 1.25 | 2.09 | 0.336 |
| DIDP | Diisodecyl phthalate | Bis(8-methylnonyl) benzene-1,2-dicarboxylate | −0.55 | 0.01 | 1.04 | 1.25 | 0.510 |
| FL-8N8 | Flexol 8N8 | 2-[2-Ethylhexanoyl-[2-(2-ethylhexanoyloxy)ethyl]amino]ethyl 2-ethylhexanoate | −0.50 | 0.07 | 0.64 | 0.79 | 0.588 |
| HA-M18 | Hallcomid M18 | N,N-dimethyloctadecanamide | −0.37 | 0.08 | 0.55 | 0.84 | 0.586 |
| HA-M18OL | Hallcomid M18OL | (Z)-N,N-dimethyloctadec-9-enamide | −0.42 | 0.00 | 0.87 | 1.08 | 0.553 |
| HYPROSE | Hyprose SP80 | 1-[4-(2-Hydroxypropoxy)-2,5-bis(2-hydroxypropoxymethyl)-2-[3,4,5-tris(2-hydroxypropoxy)-6-(2-hydroxypropoxymethyl)oxan-2-yl]oxyoxolan-3-yl]oxypropan-2-ol | −1.37 | 0.48 | 1.65 | 2.68 | 0.236 |
| PPE5 | Polyphenyl ether, five rings | 1-Phenoxy-4-[4-(4-phenoxyphenoxy)phenoxy]benzene | −0.71 | 0.17 | 0.91 | 0.62 | 0.554 |
| PPE6 | Polyphenyl ether, six rings | 1-(4-Phenoxyphenoxy)-4-[4-(4-phenoxyphenoxy)phenoxy]benzene | −0.72 | 0.01 | 1.39 | 2.34 | 0.471 |
| QUADROL | Quadrol | 1-[2-[Bis(2-hydroxypropyl)amino]ethyl-(2-hydroxypropyl)amino]propan-2-ol | −0.72 | 0.06 | 1.46 | 1.58 | 0.429 |
| SAIB | Sucrose acetate isobutanoate | [2-(Acetyloxymethyl)-2-[6-(acetyloxymethyl)-3,4,5-tris(2-methylpropanoyloxy)oxan-2-yl]oxy-4-(2-methylpropanoyloxy)-5-(2-methylpropanoyloxymethyl)oxolan-3-yl] 2-methylpropanoate | −0.57 | 0.17 | 0.67 | 0.66 | 0.593 |
| SORBITOL | Sorbitol | Hexane-1,2,3,4,5,6-hexol | 0.00 | 0.00 | 0.00 | 0.00 | 0.000 |
| SUC8A | Sucrose octaacetate | [4-Acetyloxy-2,5-bis(acetyloxymethyl)-2-[3,4,5-triacetyloxy-6-(acetyloxymethyl)oxan-2-yl]oxyoxolan-3-yl] acetate | −0.72 | 0.17 | 1.34 | 2.31 | 0.499 |
| TCP | Tricresylphosphate | Tris(4-methylphenyl) phosphate | −0.70 | 0.15 | 0.91 | 0.62 | 0.561 |
| THFP | Kroniflex THFP | Tris(oxolan-2-ylmethyl) phosphate | −0.85 | 0.09 | 1.46 | 2.35 | 0.419 |
| TMPTP | Tripelargonate | 2,2-Bis(nonanoyloxymethyl)butyl nonanoate | −0.41 | 0.14 | 0.65 | 1.50 | 0.584 |

(a) Abraham solvent parameters, from Ref. [19].

[46,47]. In this way a consistent set of geometries is assembled, even though the representative conformers selected might not be the major ones under realistic conditions in the solvent phase. Picking the all-trans stretched conformations for alkyl chains is a standard modeling practice (see, e.g., a solubility study with COSMO-RS [48]). Among the considered set of molecules the median numbers of rotatable bonds and rotamers are 19 and $10^{10}$, respectively; any meaningful systematic conformational analysis would therefore be a formidable task, and handling this excess complexity is expected to yield diminishing returns for these data. From the structure files, systematic names were generated with the mol2nam application (part of the Lexichem suite [49]), and graphical depictions were made with the mol2ps utility (from the Ogham suite [50]).

Full geometry optimization was carried out for isolated (gas-phase) molecules by the PM6 semiempirical QC method [51], as implemented in MOPAC2009 [52]. PM6, a recent NDDO Hamiltonian, represents a major improvement over conventional semiempirical methods [51–53] and remedies several shortcomings of traditional Hamiltonians such as AM1 and PM3. From the converged MOPAC output the following quantities were extracted: coordinates and Mulliken charges of each atom; distances and bond orders of each atom pair; dipole moment; isotropic average polarizability; energies of the highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO); largest positive charge on any H atom; and largest negative charge on any atom. The set of descriptors considered is similar to that of our recent work on McReynolds constants [14]; the weighted walk-count based ones used previously were omitted from the current investigation, since our initial screening of variables indicated that they are not as useful with

this dataset. A condensed tabulation of the descriptors considered is given in Table 2, with the full list provided as Supporting Material (Table S1).

Solvation related descriptors were generated from separate single-point runs, at the previously optimized geometries, using a dielectric continuum model (the conductor-like screening model, COSMO) [54]. Following the spirit of the COSMO-RS method recently proposed by Klamt [37,39], the totally screened state embedded in a virtual conductor (ε = ∞) was considered, rather than a traditional COSMO calculation in a dielectric with some specific permittivity; for this purpose the built-in COSMO routine of MOPAC2009 was invoked by specifying the keyword EPS = 1000, i.e. ε = 1000, which corresponds to practically total screening. The keyword COSWRT was also given, in order to save to a file the calculated solvent-accessible surface (SAS) segments along with their polarization (i.e. screening) charge densities. So-called sigma-profiles, as suggested by Klamt et al. [55,56], were generated from this output by simple binning. These profiles were then further processed to yield various descriptors related to the charge distribution over the solvent-accessible surface.

Three types of descriptors were generated this way. The first group consists of the sigma-moments, already defined by Klamt [39,57] and used for solvation related QSPR studies, including GLC modeling [20,58]. They are enumerated in rows 13–22 of Table 2. The second group (rows 23–26 of Table 2) contains simple fractions of the solvent-accessible surface, according to the sign of the charge at its segments: the positive, negative, neutral and charged portions were separately summed. The third group is analogous to the “general interaction properties function” (GIPF) family of descriptors proposed by Murray and Politzer [59–61], which reflects the detailed pattern and physically meaningful features of the electrostatic potential on the surface; rather

Table 2. Abbreviated listing of the 86 descriptors considered.

| No. | Code | Explanation |
|---|---|---|
| #1 | Vol | Molecular volume (by COSMO) |
| #2 | CSA | Solvent-accessible surface (SAS) area (by COSMO) |
| #3 | Inv Vol | Reciprocal volume |
| #4 | Inv CSA | Reciprocal surface |
| #5 | DIELECTRIC | COSMO dielectric energy |
| #6 | Alpha | Static polarizability |
| #7 | MaxNeg | Most negative partial charge |
| #8 | MaxPosH | Most positive partial charge on a hydrogen atom |
| #9 | DIPOLE | Dipole moment |
| #10 | HOMO | HOMO energy |
| #11 | LUMO | LUMO energy |
| #12 | HOMO-LUMO | HOMO–LUMO energy difference |
| #13 | MHBdon0 | HB-donor moment of order 0 (a) |
| #14 | MHBacc0 | HB-acceptor moment of order 0 |
| #15 | MHBdon1 | HB-donor moment of order 1 |
| #16 | MHBacc1 | HB-acceptor moment of order 1 |
| #17 | MSig2 | Sigma-moment of order 2 |
| #18 | MHBdon2 | HB-donor moment of order 2 |
| #19 | MHBacc2 | HB-acceptor moment of order 2 |
| #20 | MSig3 | Sigma-moment of order 3 |
| #21 | MHBdon3 | HB-donor moment of order 3 |
| #22 | MHBacc3 | HB-acceptor moment of order 3 |
| #23 | SNegAREA | Negative portion of SAS |
| #24 | SNeuAREA | Neutral portion of SAS |
| #25 | SPosAREA | Positive portion of SAS |
| #26 | SChgAREA | Charged portion of SAS |
| #27 | Sigma2Neg | $\sigma_{-S}^{2}$ (GIPF-like) (b) |
| #28 | Sigma2Pos | $\sigma_{+S}^{2}$ (GIPF-like) |
| #29 | SigmaTot | $\sigma_{tS}$ (GIPF-like) |
| #30 | Bal | $\mathrm{Bal}_S$ (GIPF-like) |
| #31 | Mwt | Molar weight |
| 32–59 | – | Descriptors 2 and 5–31, volume-specific |
| 60–86 | – | Descriptors 5–31, surface-specific |

(a) COSMOment-type descriptors were calculated from PM6/COSMO as implemented in MOPAC2009 (see main text for details).
(b) “GIPF-like” descriptors were calculated analogously to those defined by Murray and Politzer [59–61], but taken at the SAS rather than at the electronically determined molecular surface (see main text for details).

than evaluating the molecular electrostatic potential as the GIPF method does, our approach takes the distribution of the screening potential at the SAS. To show this distinction in their designation, and to avoid confusion with the meaning intended by Murray and Politzer [59–61], we add the subscript S to these descriptors: $\sigma_{-S}^{2}$, $\sigma_{+S}^{2}$ and $\sigma_{tS}^{2}$. One should be careful to avoid confusion due to the different uses of the letter σ; since they occur in distinct contexts, regarding either the COSMOment-type or the GIPF-like descriptors, they are discernible, and so we kept the original notation of the authors in this discussion. We did, however, prepend the letter M to the symbols of the COSMOment-type descriptors, signifying that they are moments of the sigma-profiles.

Together with the molar weight, a total of 29 primary descriptors were generated. As in our prior work [14], values divided by either the volume or the surface were included as well, making 86 descriptors altogether. The inverse volume and surface were also considered separately, to allow for a possibly better fit with them than with their non-reciprocal values.

After the first pass of analysis it was observed that many top-ranking models contained either MHBdon0 or MHBacc0, or both. Note that both are differently charged fractions of the total SAS, and as such have the same unit, so their difference is also a physically meaningful quantity. These two variables have an intercorrelation coefficient of 0.56, which can be reduced to near zero (0.08) simply by substituting the difference MHBdon0 − MHBacc0 for the descriptor MHBdon0. This reduced interdependence of the explanatory variables had a beneficial effect on the model selection; therefore the substituted descriptor set was used for the final MLR analysis. Whereas in the first pass several of the top models, ranked by their R², were contaminated by very high intercorrelation of descriptors, this was not the case for the best models from the re-submitted run.

During the calculations we checked whether including squares and cross-products of the descriptors could improve the fit for r1. The slightly increased R² did not justify using these second-order variables, so the descriptor set was not expanded.

2.3. MLR calculations

Calculations were performed with in-house developed tools (see footnote 1): descriptors were generated with scripts written in the Awk language, and MLR analysis was carried out with C-language programs using the GNU Scientific Library (GSL) [62,63]. The GSL routines have been designed to cope with (near-)singular matrices that occur when processing multicollinear data, using singular value decomposition [64,65]. Post-processing of results was done with the spreadsheet programs Gnumeric [66] and Microsoft Excel [67]. The statistical evaluation of the data (MLR including Mallows’ Cp [68] and Akaike’s AIC [69], cluster analysis (CA) and PCA [70]) was performed with the PROSTAT package [43].

Severe multicollinearity between numerous descriptors hinders variable selection with traditional methods [4] when a large number of near-equivalent possibilities are to be considered, as in this work. Herein, we report results from an exhaustive search for discovering alternative models.

Inspection of the dependent variables revealed that, in the set of r1 data, two items are distant (by more than seven times the standard deviation) from the rest of the points, as illustrated in Fig. 1. These two atypical phases (namely diglycerol and sorbitol) would therefore have a controlling influence, unduly distorting the least-squares fits. Thus they were excluded, and the following calculations were done with the reduced set of 23 points only. It is noteworthy that diglycerol had been flagged early on as an outlier from the McReynolds set [71], due to very high surface adsorption controlling its retention behavior.
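The screening of remote points described above can be illustrated with a short sketch. It is a stand-in rather than the procedure used for Fig. 1: instead of Studentized values it uses a robust z-score built from the median and the median absolute deviation (MAD), so that two remote points cannot mask each other by inflating the standard deviation. The numbers are invented for the demonstration and are not the r1 values of Table 1; the threshold of 7 merely echoes the "seven times the standard deviation" quoted in the text.

```python
import numpy as np

def robust_z(values):
    """Robust z-scores based on the median and the median absolute deviation."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    # The factor 0.6745 rescales the MAD so that it is comparable to the
    # standard deviation for normally distributed data.
    return 0.6745 * (values - med) / mad

# Invented r1-like values: 23 ordinary entries followed by two remote ones.
r1_demo = [0.05, 0.12, 0.07, 0.08, 0.10, 0.03, 0.14, 0.05, 0.11, 0.15,
           0.02, 0.06, 0.09, 0.01, 0.16, 0.02, 0.07, 0.18, 0.16, 0.13,
           0.09, 0.12, 0.10, 1.20, 1.35]

scores = robust_z(r1_demo)
remote = [i for i, z in enumerate(scores) if abs(z) > 7.0]
print(remote)  # indices of the flagged points: [23, 24]
```

Points flagged this way are candidates for exclusion; whether to drop them remains a judgment that should rest on chemical reasoning as well as on the statistics.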
Previous analysis of McReynolds constants also found diglycerol statistically remote from the rest of the data [14]. Others reported analogous findings too [72,73]. It is reasonable to surmise that sorbitol, having a similarly high density of hydroxyl groups on its molecular surface as diglycerol, would behave anomalously for similar reasons.

2.4. Model building and variable selection

A large-scale best subset selection (BSS) search was carried out, with a brute-force method implemented in our own software, for locating optimal models. Preliminary tests showed that when all five dependent variables were considered simultaneously, no satisfactory MLR models restricted to four or fewer descriptors could be built. Therefore, all possible combinations from the list of 86 prospective descriptors were submitted to MLR fitting with five independent variables (3.5 × $10^7$ possibilities per dependent variable). Parsing the output from these runs showed that three of the Abraham parameters, namely s1, a1 and l1, can be well fitted with several different combinations; no such well-correlating model could be found for either r1 or c1, however. At this stage most raw models had some fitted coefficients with large estimated standard errors compared to their absolute values. Therefore they were pruned, to leave only terms that are highly significant. In a stepwise manner, the parameter with the highest relative error was deleted if that error exceeded the limit, set at 50%. (But at most two such removals were done, retaining at least three

descriptors in each individual equation). These reduced candidate models were then the subjects of the final selection procedure. Our aim has been to choose a model, from among the several alternatives with similar goodness-of-fit measures, whose descriptors do not correlate too much among themselves. Guided by quantitative analyses of the modeling, this selection was a multi-stage process (which necessarily involved some subjective decisions, considering the variety of statistics available). To simplify the physical picture emerging from the interpretation, models that had ratio descriptors containing both the surface and the volume as denominators were discarded (except for Mwt/Vol and Alpha/Vol, ratios that are physically justifiable quantities on their own).

The final decision for selecting a best model is based on combining a number of criteria. To begin with, models were only allowed if their R² for all five individual fits were within chosen limits from the best of each group: 10% for s1, a1, l1 and r1, and 20% for c1. Further, combined limits were set for the root-mean-square (RMS) average R² of the three equations for s1, a1 and l1, and of the four equations for r1, s1, a1 and l1 (both within 2%). The overall R² of all five equations was required to be within 5% of the best such value, as well. From among the models remaining on this filtered list of alternatives, the one with the lowest RMS of the inter-descriptor multiple correlation coefficients was finally selected. Notably, this same choice also had the minimal value for the maximum of the multiple correlation coefficients, as well as the lowest RMS average relative error of the MLR coefficients.

Fig. 1. Histogram of Studentized data of Abraham’s r1 coefficients for the 25 stationary phases initially considered; the remote rightmost bin marks diglycerol and sorbitol as extreme outliers.

1 Selected utilities are made available as supplementary material, and up-to-date versions are accessible at the author’s webpages <…> and <…szeged.hu/∼fekete/GSL-progs/>.

2.5. Statistical methods

After the optimal set of descriptors was selected, Principal Component Analysis (PCA) was carried out to determine its dimensionality. PCA was done on the dependent variables as well. Furthermore, we made and analyzed a set of descriptors analogous to that used in ZAKEW [20] (where it was applied to the analysis of solute parameters), for comparison (see Section 3.4). The selected optimal set of descriptors was also subjected to cluster analysis, to test whether the groupings achievable could be used for classification of stationary phases.

3. Results and discussion

3.1. The model

The proposed model, based on the selection procedure outlined above, uses five descriptors: Alpha/Vol, MSig2, MHBacc3/CSA, HOMO/CSA and MaxPosH/CSA, with only moderate intercorrelations (0.63 RMS and 0.81 maximum of the multiple correlation coefficients). It has an overall R² = 0.920 for all five fits including c1 (average R² = 0.961 for the best three fits, s1, a1 and l1, and average R² = 0.953 for the four fits including r1). See also Section 3.4 for a further discussion of these variables, in comparison with other sets.

Finally, reduced models were built for each of the five Abraham parameters after considering several statistical measures of significance for the inclusion of variables in the ultimate equations. These tests were first carried out with mean-centered variables, so that no effect of intercepts on the active variables confounded the statistics. To achieve a robust selection, at this stage the decision was not based on R², but rather on five different selection methods applied simultaneously: best subset selection (BSS) employing three possible criteria (RMS error, Mallows’ Cp coefficient [68] or Akaike’s Information Criterion [69]), backward elimination, and stepwise selection (with the probability threshold set to 0.01 for entry and 0.05 for removal in each of the latter two methods). In all cases at least three of these procedures agreed on an optimal model, making for an unambiguous decision. The calculated leave-one-out cross-validation statistics (predicted residual error sum of squares, PRESS) also confirmed these selections, even when they differed from those based on R² or RMSE. It is also interesting to note that both BSS (with Akaike’s criterion, which was found the most successful of the three investigated here) and backward elimination were part of the winning majority for all five dependent variables considered.
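To make the scale and mechanics of an exhaustive best subset search concrete, here is a small sketch. It is not the authors' C/GSL software: it uses Python with NumPy, a synthetic data set of 8 made-up descriptors for 23 phases, and a common Gaussian-error form of AIC. Note that when all candidate subsets have the same size, ranking by AIC is equivalent to ranking by residual sum of squares; the criterion only starts to discriminate when models of different sizes are compared.

```python
import itertools
import math
import numpy as np

# Number of five-descriptor subsets that can be drawn from 86 candidates.
print(math.comb(86, 5))  # 34826302, i.e. about 3.5e7

def aic(y, X):
    """AIC of an OLS fit with intercept (Gaussian errors, up to a constant)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * math.log(rss / n) + 2 * A.shape[1]

def best_subset(y, X, size):
    """Score every column subset of the given size and return the best one."""
    candidates = itertools.combinations(range(X.shape[1]), size)
    return min(candidates, key=lambda cols: aic(y, X[:, list(cols)]))

# Synthetic example: the response depends on columns 1, 4 and 6 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(23, 8))
y = 0.8 * X[:, 1] - 0.5 * X[:, 4] + 0.3 * X[:, 6] + rng.normal(scale=0.05, size=23)
best = best_subset(y, X, 3)
print(best)
```

A brute-force loop like this is only practical in pure Python for small descriptor pools; for tens of millions of candidate models a compiled implementation, as used in the paper, is the more realistic choice.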
On the other hand, stepwise selection was the poorest performer, which voted with the majority in only two cases (of which one, that of s1 , was decided unanimously). Having had the sets of explanatory variables for each model finalized, the remaining question was whether intercepts should be included or not. This was tested with the variables selected in the previous stage, but in their un-normalized form. In two cases, those of r1 and a1 , Student’s t-test indicated non-significance of the intercepts (with probabilities 0.73 and 0.41, respectively), therefore we recommend omitting them for these parameters. All the final models developed are briefly discussed below individually, showing their standardized form so that relative importance of their coefficients is easy to judge. Full statistics are shown for these forms in Table 3; for reference, the non-reduced equations are listed in Supplementary Table S2, while the reduced equations with non-standardized coefficients are given in Supplementary Table S3. A cumulative histogram for residuals from all five fits, shown in Fig. 2, indicates no deviation from the normal distribution of errors. The regression for the intercept term in Abraham’s LSER for GLC (Eq. (2) in Ref. [19]) contains only three descriptors: c1 = −0.75 Alpha/Vol − 0.52 MSig2 − 0.59 MHBacc3/CSA

(4)
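The variable selection described above (best subset selection scored by an information criterion and cross-checked by leave-one-out PRESS) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes ordinary least squares with an intercept, one common Gaussian form of Akaike's criterion, AIC = n ln(RSS/n) + 2k, and the standard hat-matrix shortcut for leave-one-out residuals. The function names are hypothetical and the data are synthetic; only the descriptor labels are taken from the text.

```python
import itertools
import numpy as np

def ols_rss_press(X, y):
    """Fit y on X (with intercept); return (RSS, leave-one-out PRESS)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverages (hat-matrix diagonal) from the thin QR factorization of A.
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q**2, axis=1)
    loo_resid = resid / (1.0 - leverage)  # exact leave-one-out residuals for OLS
    return float(resid @ resid), float(loo_resid @ loo_resid)

def best_subsets(X, y, names):
    """Rank every non-empty subset of columns by AIC = n*ln(RSS/n) + 2k."""
    n, p = X.shape
    ranked = []
    for size in range(1, p + 1):
        for cols in itertools.combinations(range(p), size):
            rss, press = ols_rss_press(X[:, list(cols)], y)
            k = size + 1  # slopes plus intercept
            aic = n * np.log(rss / n) + 2 * k
            ranked.append((aic, press, tuple(names[j] for j in cols)))
    return sorted(ranked, key=lambda item: item[0])

# Synthetic stand-in data (assumption): 23 "phases", 5 descriptors,
# with only the first two descriptors carrying signal.
rng = np.random.default_rng(0)
names = ["Alpha/Vol", "MSig2", "MHBacc3/CSA", "HOMO/CSA", "MaxPosH/CSA"]
X = rng.normal(size=(23, 5))
y = 0.5 * X[:, 0] + 0.9 * X[:, 1] + 0.05 * rng.normal(size=23)

ranked = best_subsets(X, y, names)
best_aic, best_press, best_vars = ranked[0]
print(best_vars, round(best_press, 4))
```

With five candidate descriptors the exhaustive search covers only 31 subsets, so the PRESS value of each candidate can be inspected alongside its AIC at negligible cost.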

Table 3
Reduced correlation equations and statistics, on standardized variables.

| Parameter | Statistic | Alpha/Vol | MSig2 | MHBacc3/CSA | HOMO/CSA | MaxPosH/CSA |
|---|---|---|---|---|---|---|
| c1 | Coefficient | −0.749 | −0.519 | −0.591 | – | – |
| | Std. err. | 0.114 | 0.112 | 0.113 | – | – |
| | t-Statistic | −6.573 | −4.654 | −5.225 | – | – |
| | Pr > \|t\| | <0.0001 | 0.000 | <0.0001 | – | – |
| r1 | Coefficient | 0.569 | −0.461 | 0.739 | – | −0.784 |
| | Std. err. | 0.135 | 0.142 | 0.175 | – | 0.163 |
| | t-Statistic | 4.213 | −3.250 | 4.229 | – | −4.810 |
| | Pr > \|t\| | 0.000 | 0.004 | 0.000 | – | 0.000 |
| s1 | Coefficient | 0.544 | 0.868 | 0.457 | −0.516 | – |
| | Std. err. | 0.041 | 0.048 | 0.044 | 0.045 | – |
| | t-Statistic | 13.352 | 18.177 | 10.432 | −11.389 | – |
| | Pr > \|t\| | <0.0001 | <0.0001 | <0.0001 | <0.0001 | – |
| a1 | Coefficient | – | 0.313 | 0.814 | −0.139 | – |
| | Std. err. | – | 0.066 | 0.061 | 0.064 | – |
| | t-Statistic | – | 4.735 | 13.439 | −2.153 | – |
| | Pr > \|t\| | – | 0.000 | <0.0001 | 0.044 | – |
| l1 | Coefficient | −0.379 | −0.913 | −0.400 | 0.327 | – |
| | Std. err. | 0.073 | 0.085 | 0.078 | 0.081 | – |
| | t-Statistic | −5.205 | −10.712 | −5.111 | 4.036 | – |
| | Pr > \|t\| | <0.0001 | <0.0001 | <0.0001 | 0.001 | – |

The r1 parameter is regressed with four descriptors:

r1 = 0.57 Alpha/Vol − 0.46 MSig2 + 0.74 MHBacc3/CSA − 0.78 MaxPosH/CSA    (5)

The absolute magnitudes of the four contributions are broadly similar, indicating that a complicated interaction is described; according to Abraham et al., this quantity encodes the propensity of solutes to interact via π- and n-electron pairs. The equation for the s1 parameter (a measure of dipolarity/polarizability) contains four descriptors:

s1 = 0.54 Alpha/Vol + 0.87 MSig2 + 0.46 MHBacc3/CSA − 0.52 HOMO/CSA    (6)

Here again all four contributions are considerable. We found in this study, just as in our prior work [14], that the volume-specific descriptor Alpha/Vol is a better regressor than polarizability itself for QSPR on GLC data. A notable feature of Eq. (6) is the role played by the other three descriptors, relating to various aspects of charge distribution over CSA. This is in accord with the observation already made by Arey et al. [30], who noted the electrostatic origin of π₂ᴴ (the S solute parameter in the author's revised notation) and pointed out the possibility of some mixing in the Abraham scheme with the hydrogen-bonding terms (just like our MHBacc3/CSA variable entering here). It may also be instructive to compare this model with that of Lamarche et al. [26], who also developed a QC based equation for π₂ᴴ. Their four descriptors were (in the order of importance according to the scaled regression coefficients): the dipole moment, the polarizability, the HOMO–LUMO gap and the average of absolute atomic charges. Lamarche et al. reported R² = 0.840 [26], while our model with Eq. (6) has R² = 0.987. The regression for a1 (Abraham's hydrogen-bond basicity) uses only three descriptors:

a1 = 0.31 MSig2 + 0.81 MHBacc3/CSA − 0.14 HOMO/CSA    (7)

The dominant term here is a COSMOments-type hydrogen bond acceptor moment [20], divided by CSA. For comparison, Abraham's basicity term for solutes was very recently described with a QC based regression by Devereux et al. [31], with four descriptors. Their three most important variables are the minimum, median and mean values of the electrostatic potential at the molecular surface, while the least influential is the local ionization energy evaluated on the surface (whereas our third descriptor, shown above, uses the global ionization energy divided by the surface area). The model by Devereux et al. has R² = 0.848 [31]; our model with Eq. (7) has R² = 0.974. The equation for the l1 parameter contains four descriptors:

l1 = −0.38 Alpha/Vol − 0.91 MSig2 − 0.40 MHBacc3/CSA + 0.33 HOMO/CSA    (8)

The l1 coefficient represents the ability of a GLC solvent to separate members of a homologous series. It is also a measure of lipophilicity; in other words, how near the solvent lipophilicity is to that of hexadecane (for which l1 = 1 by definition) [19]. In our model, the largest absolute magnitude contribution is due to a COSMOments-type sigma-moment, which has been characterized as a measure of the overall electrostatic polarity in solute/solvent interactions [20].

Fig. 2. Cumulative histogram of Studentized residuals from all five models (Eqs. (4)–(8)).

3.2. Physicochemical background of the selected independent variables

All of the descriptors entering the models are readily calculated from semiempirical QC calculations, and carry straightforward physicochemical relevance. Alpha/Vol is the volume-specific polarizability; the inclusion of the denominator can be rationalized by considering that, for similar molecules of different sizes, the polarizabilities would scale similarly to the volumes. MSig2, the second moment of the COSMOments-type sigma-potential, has already been identified as a measure of the overall electrostatic polarity of molecules [20]. The COSMOments-type hydrogen bond acceptor moment, MHBacc3, is a quantitative measure of the acceptor capacity [20]. The role of HOMO as a surrogate for covalent Lewis basicity (“molecular orbital basicity”) was reported early on [74–77]. It represents the global electron donor capacity, and as such is often used as a QSPR descriptor, e.g. to model molecular binding to surfaces [78]. The largest positive charge on any H atom, MaxPosH, is an often used theoretical LSER descriptor for “electrostatic acidity” [74–77]. Our comprehensive statistical search showed that surface-specific forms of these latter three variables serve as more suitable regressors than either their raw versions or volume-specific forms.

3.3. PCA calculations: analysis of dimensionality

Principal Component Analysis (PCA) was carried out separately for the set of independent variables and for the set of dependent variables. PCA provides a measure of the dimensionality of the spaces they span. Moreover, as pointed out by Zissimos et al. [20], the percentage of variance accounted for gives an indication of the stability of a component, while examination of the loadings can indicate its interpretation. The summaries of PCA for the two sets are given in Tables 4 and 5. Descriptors were standardized prior to this analysis by subtracting their means and then dividing by their standard deviations, so each descriptor then had zero mean and unit variance. This procedure, often called autoscaling, is equivalent to carrying out the PCA on the correlation matrix rather than on the covariance matrix. For each variable set, the means and standard deviations used for autoscaling are listed in the tables, followed by the standard deviations of the principal components themselves (i.e. the square roots of their eigenvalues) with the variance accounted for by each component, as a percentage of the total. The last parts of Tables 4 and 5 are the loadings, which apply to the autoscaled descriptors and are equivalently the eigenvectors of the correlation matrix in each case. The so-called biplot representation [79] of the observations and variables is shown in Fig. 3.

Table 4
Summary of Principal Component Analysis: Abraham descriptors.

| Statistics used for autoscaling | c1 | r1 | s1 | a1 | l1 |
|---|---|---|---|---|---|
| Mean | −5.070 × 10⁻¹ | 8.826 × 10⁻² | 8.365 × 10⁻¹ | 1.191 × 10⁰ | 5.541 × 10⁻¹ |
| Std. dev. | 1.807 × 10⁻¹ | 5.280 × 10⁻² | 3.331 × 10⁻¹ | 5.314 × 10⁻¹ | 5.552 × 10⁻² |

| PCA statistics | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| Std. dev. | 1.80 | 1.05 | 0.65 | 0.38 | 0.27 |
| Variance% | 65 | 22 | 8 | 3 | 1 |

| Loadings (eigenvectors) | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| c1 | −8.514 × 10⁻¹ | −3.515 × 10⁻¹ | 2.971 × 10⁻¹ | 2.486 × 10⁻¹ | 4.030 × 10⁻² |
| r1 | −8.048 × 10⁻² | 9.745 × 10⁻¹ | 1.899 × 10⁻¹ | 8.784 × 10⁻² | 1.903 × 10⁻³ |
| s1 | 9.680 × 10⁻¹ | −1.297 × 10⁻² | −8.247 × 10⁻² | 1.087 × 10⁻¹ | 2.104 × 10⁻¹ |
| a1 | 8.230 × 10⁻¹ | −1.671 × 10⁻¹ | 5.301 × 10⁻¹ | −1.158 × 10⁻¹ | −1.862 × 10⁻² |
| l1 | −9.551 × 10⁻¹ | 7.403 × 10⁻² | 9.240 × 10⁻² | −2.185 × 10⁻¹ | 1.611 × 10⁻¹ |

3.4. Comparison of the selected model with a ZAKEW-analogue alternative set

In a comparative analysis of solute parameters, ZAKEW [20] used a set of five COSMOment descriptors: MSig2, MSig3, MHBdon3, MHBacc3 and CSA (which is identical to MSig0). In order to make a comparison with their study, we also examined an analogous set for the solvents studied here. It is emphasized, however, that the descriptors calculated in our work are not identical with the original COSMOments. The latter are provided exclusively by the proprietary software COSMO-RS [37–39]; moreover, they typically utilize higher level ab initio QC calculations that require higher computational cost. Our implementation is based on the earlier, simpler version of COSMO rather than COSMO-RS. It uses the readily available MOPAC2009 program, which is free for academic application. Being a lower level theory, this method may have inherently lower accuracy for important quantities like dipole moments. On the other hand, the success of QSPR modeling indicates that these less expensively calculated descriptors are sufficient for their task. The principal difference between ZAKEW [20] and this work is that the former dealt with solute properties, while the latter focuses on solvents. Nevertheless it is instructive to make a comparison, as solvent and solute properties are complementary in solvation modeling, and the Abraham LSER formalism embodied in Eqs. (1)–(3) makes their roles explicitly symmetrical.

Fig. 4 summarizes graphically the eigenvalues obtained for the ZAKEW-analogue alternative set, along with the two other sets whose PCA was discussed in Section 3.3. The Abraham set practically collapsed into three dimensions: its last two principal components account for a mere 4% of the total variance. The descriptors selected in our work have the slowest decay of eigenvalue magnitudes, compared to either of the other sets: the fourth and fifth principal components account for 7% and 4% of the total variance, respectively. As seen in Fig. 4, the ZAKEW-analogue set occupies an intermediate position with regard to reduction of dimensionality: its last two principal components account for 6% of the total variance.

All pair-wise and multiple inter-correlations within our set of descriptors, as well as their cross-correlations with those of the Abraham coefficient set, are listed in Table 6; Fig. 5 depicts the correlations graphically. Also included in this table is the ZAKEW-analogue set of five descriptors.
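As a rough illustration of the autoscaled PCA of Section 3.3 and of the eigenvalue comparison above, the following sketch performs PCA on the correlation matrix of a data table. It is not the authors' code: the function name and the synthetic data are hypothetical. Scaling the loadings as eigenvector times component standard deviation is an assumption, chosen because the squared PC1 loadings tabulated in Tables 4 and 5 sum approximately to the PC1 variance. The last lines use only the published component standard deviations of Table 4; for autoscaled data the total variance equals the number of variables.

```python
import numpy as np

def autoscaled_pca(X):
    """PCA on autoscaled data (rows: observations, columns: variables)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # zero mean, unit variance
    R = (Z.T @ Z) / (len(Z) - 1)                      # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    eigval = np.clip(eigval[order], 0.0, None)
    eigvec = eigvec[:, order]
    pc_sd = np.sqrt(eigval)                           # std. dev. of each component
    var_pct = 100.0 * eigval / eigval.sum()           # variance accounted for, in %
    loadings = eigvec * pc_sd                         # variable-component correlations
    return pc_sd, var_pct, loadings

# Synthetic stand-in data (assumption): 23 observations of 5 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(23, 5))
X[:, 1] += 0.8 * X[:, 0]                              # introduce some correlation
pc_sd, var_pct, loadings = autoscaled_pca(X)
print(np.round(pc_sd, 2), np.round(var_pct, 1))

# Variance percentages recomputed from the component std. devs. of Table 4.
table4_sd = np.array([1.80, 1.05, 0.65, 0.38, 0.27])
table4_pct = np.round(100.0 * table4_sd**2 / len(table4_sd))
print(table4_pct.tolist())  # [65.0, 22.0, 8.0, 3.0, 1.0], as listed in Table 4
```

A slow decay of `var_pct` corresponds to the situation described for the selected descriptor set, in which the trailing components still carry a non-negligible share of the variance.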
Table 5
Summary of Principal Component Analysis: QC-based descriptors from the current investigation.

| Statistics used for autoscaling | Alpha/Vol | MSig2 | MHBacc3/CSA | HOMO/CSA | MaxPosH/CSA |
|---|---|---|---|---|---|
| Mean | 9.725 × 10⁻² | 1.971 × 10⁻² | 7.127 × 10⁻⁹ | −2.135 × 10⁻² | 5.710 × 10⁻⁴ |
| Std. dev. | 1.289 × 10⁻² | 8.701 × 10⁻³ | 8.919 × 10⁻⁹ | 5.132 × 10⁻³ | 1.701 × 10⁻⁴ |

| PCA statistics | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| Std. dev. | 1.45 | 1.30 | 0.82 | 0.59 | 0.45 |
| Variance% | 42 | 34 | 14 | 7 | 4 |

| Loadings (eigenvectors) | PC1 | PC2 | PC3 | PC4 | PC5 |
|---|---|---|---|---|---|
| Alpha/Vol | −9.586 × 10⁻² | 8.747 × 10⁻¹ | 2.854 × 10⁻¹ | 3.765 × 10⁻¹ | −4.953 × 10⁻² |
| MSig2 | −3.385 × 10⁻¹ | −6.476 × 10⁻¹ | 6.811 × 10⁻¹ | 2.017 × 10⁻² | −4.138 × 10⁻² |
| MHBacc3/CSA | 9.200 × 10⁻¹ | −1.042 × 10⁻¹ | 1.684 × 10⁻¹ | 9.595 × 10⁻² | 3.242 × 10⁻¹ |
| HOMO/CSA | −7.568 × 10⁻¹ | 5.164 × 10⁻¹ | 1.421 × 10⁻¹ | −2.759 × 10⁻¹ | 2.536 × 10⁻¹ |
| MaxPosH/CSA | 7.383 × 10⁻¹ | 4.758 × 10⁻¹ | 2.851 × 10⁻¹ | −3.443 × 10⁻¹ | −1.695 × 10⁻¹ |

Fig. 3. Biplot representation of the observations and variables, from PCA on the selected five descriptors.

For the selected set, most inter-correlations are low: out of the 10 values, only 3 differ from zero statistically (at the α = 0.05 significance level). In contrast, 6 inter-correlations are high among Abraham's solvent parameters (all pairs except those involving r1). For the ZAKEW-analogue set, all but two inter-correlations differ from zero significantly. In the bottom row of Table 6 the squared multiple correlation coefficients, calculated for each variable from within its respective set, are listed. These values indicate the fractional variance of the variable already explained by the rest of the variables in the set; the lower this value, the higher the partial information content achieved by including the variable in the set. Among the five descriptors selected in our study, even the most correlated variable carries 34% (= 1 − 0.66) of its variance as extra information. In the Abraham set, the lowest extra information is 11%. Only 5% of the variance of the least informative variable (MSig3) in the ZAKEW-analogue set is unexplained by the remaining descriptors.
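The within-set squared multiple correlations discussed above can be obtained directly from a correlation matrix through the standard identity R²ᵢ = 1 − 1/(R⁻¹)ᵢᵢ. The sketch below applies it to the pairwise correlations of the five selected descriptors as transcribed from Table 6. Because those correlations are rounded to two decimals, the results can differ slightly from the tabulated bottom row (0.19, 0.41, 0.52, 0.60, 0.66). The function name is hypothetical, and this is not necessarily how the authors computed the values.

```python
import numpy as np

names = ["Alpha/Vol", "MSig2", "MHBacc3/CSA", "HOMO/CSA", "MaxPosH/CSA"]

# Pairwise correlations of the selected descriptors (Table 6, two decimals).
R = np.array([
    [ 1.00, -0.33, -0.36,  0.00, -0.14],
    [-0.33,  1.00,  0.31,  0.45, -0.11],
    [-0.36,  0.31,  1.00, -0.22,  0.59],
    [ 0.00,  0.45, -0.22,  1.00, -0.67],
    [-0.14, -0.11,  0.59, -0.67,  1.00],
])

def within_set_r2(corr):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))

r2 = within_set_r2(R)
for name, value in zip(names, r2):
    # 1 - R^2 is the share of variance not explained by the other descriptors.
    print(f"{name:12s} R2 = {value:.2f}  extra information = {1 - value:.2f}")
```

For two variables the identity reduces to the squared pairwise correlation, which gives a quick sanity check of the function.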

3.5. Structure-driven classification of stationary phases

The Abraham solvation parameter system has often been used as a means for establishing a chemometric classification of stationary phases (as opposed to merely predicting retention data) [80,81]. This role can now be fulfilled by the QC based descriptor set, which was shown to model the Abraham parameters. Therefore, a structure-driven (that is, molecularly based) classification is achieved using these descriptors. It becomes possible, then, to deduce molecular properties for new stationary phases promising to extend, rather than duplicate, the selectivity space already available. A dissimilarity metric was established taking the Euclidean distances in the five-dimensional vector space spanned by our descriptors (scaled by their standard deviations taken from the sample investigated); larger distances correspond to lower similarities. Grouping was achieved with an agglomerative hierarchical clustering algorithm [82,83]. Ward's linkage [84] was chosen as the agglomeration method, whose criterion for joining groups is to increase within-group inertia as little as possible, in order to keep the clusters homogeneous.
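The clustering step can be sketched as follows. This is a deliberately naive illustration of Ward's criterion, not the software used in the study: at each step it merges the two groups whose union increases the within-group sum of squares the least, after scaling each descriptor by its sample standard deviation. The function name and the synthetic three-blob data are hypothetical.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    X has shape (n_samples, n_features); returns a list of index lists.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # scale by sample std. dev.
    clusters = [[i] for i in range(len(Z))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = Z[clusters[a]], Z[clusters[b]]
                na, nb = len(ca), len(cb)
                gap = ca.mean(axis=0) - cb.mean(axis=0)
                # Increase in within-group sum of squares if a and b are merged.
                delta = na * nb / (na + nb) * float(gap @ gap)
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Synthetic stand-in data (assumption): three well-separated groups in 5-D.
rng = np.random.default_rng(2)
centers = np.array([[0, 0, 0, 0, 0], [5, 5, 5, 5, 5], [-5, 5, -5, 5, -5]], dtype=float)
sizes = [8, 8, 7]
X = np.vstack([c + 0.1 * rng.normal(size=(m, 5)) for c, m in zip(centers, sizes)])

groups = ward_clusters(X, n_clusters=3)
print(sorted(sorted(g) for g in groups))
```

The cubic-time pair search is acceptable for a couple of dozen phases; for larger sets a library routine such as `scipy.cluster.hierarchy.linkage` with `method="ward"` would be the usual choice.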

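Table 6
Correlations for a combined list of variables from three sets: descriptors selected in this work (first group), Abraham's solvent parameters (middle group) and Klamt-type descriptors (last group); significant values (at α = 0.05 level) are indicated in bold.

| Variables | Alpha/Vol | MSig2 | MHBacc3/CSA | HOMO/CSA | MaxPosH/CSA | r1 | a1 | c1 | l1 | s1 | CSA | MHBdon3 | MHBacc3 | MSig3 | MSig2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alpha/Vol | | | | | | | | | | | | | | | |
| MSig2 | −0.33 | | | | | | | | | | | | | | |
| MHBacc3/CSA | −0.36 | 0.31 | | | | | | | | | | | | | |
| HOMO/CSA | 0.00 | 0.45 | −0.22 | | | | | | | | | | | | |
| MaxPosH/CSA | −0.14 | −0.11 | 0.59 | −0.67 | | | | | | | | | | | |
| r1 | 0.56 | −0.34 | −0.07 | 0.09 | −0.38 | | | | | | | | | | |
| a1 | −0.42 | 0.50 | 0.94 | −0.18 | 0.50 | −0.14 | | | | | | | | | |
| c1 | −0.36 | −0.45 | −0.48 | −0.03 | −0.26 | −0.20 | −0.51 | | | | | | | | |
| l1 | 0.07 | −0.76 | −0.61 | 0.00 | −0.34 | 0.15 | −0.73 | 0.77 | | | | | | | |
| s1 | 0.09 | 0.60 | 0.64 | −0.23 | 0.45 | −0.10 | 0.74 | −0.81 | −0.92 | | | | | | |
| CSA | −0.19 | 0.56 | −0.23 | 0.91 | −0.64 | −0.08 | −0.15 | 0.05 | −0.01 | −0.22 | | | | | |
| MHBdon3 | −0.30 | 0.50 | 0.65 | −0.03 | 0.69 | −0.45 | 0.67 | −0.47 | −0.65 | 0.58 | 0.03 | | | | |
| MHBacc3 | −0.39 | 0.51 | 0.91 | 0.06 | 0.35 | −0.06 | 0.90 | −0.49 | −0.66 | 0.60 | 0.04 | 0.61 | | | |
| MSig3 | −0.56 | 0.89 | 0.58 | 0.30 | −0.02 | −0.30 | 0.71 | −0.36 | −0.71 | 0.56 | 0.42 | 0.48 | 0.77 | | |
| MSig2 | −0.33 | 1.00 | 0.31 | 0.45 | −0.11 | −0.34 | 0.50 | −0.45 | −0.76 | 0.60 | 0.56 | 0.50 | 0.51 | 0.89 | |
| Within-set R² | 0.19 | 0.41 | 0.52 | 0.60 | 0.66 | 0.26 | 0.58 | 0.75 | 0.87 | 0.89 | 0.42 | 0.61 | 0.86 | 0.95 | 0.92 |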


Fig. 6. Dendrogram from agglomerative hierarchical clustering, with Ward's linkage; inset shows the group linkages above cut-off.

Fig. 4. Eigenvalues obtained for the three sets of variables compared. Left-side bars and top curve: descriptors selected in this work; middle bars and middle curve: ZAKEW-analogue descriptors; right-side bars and bottom curve: Abraham’s solvent parameters.

Fig. 5. Visualization of correlations for a combined list of variables from three sets: descriptors selected in this work (first group), Abraham's solvent parameters (middle group) and ZAKEW-analogue descriptors (last group); significant values (at α = 0.05 level) are indicated with black, non-significant ones with white squares.

Results from the procedure are conveniently visualized in a tree diagram (called a dendrogram), shown in Fig. 6. Stationary phases most like each other are connected at the left hand side, and differentiated from their neighbors with increasingly different properties by longer connections along the x-axis. Those stationary phases connected at the far right hand side of the dendrogram are the least like those phases connected at the left hand side. Long linkages identify singular stationary phases whose properties, in large part, cannot be duplicated by other phases included in the analysis. The resulting five-group classification is depicted in Fig. 7, drawn onto the plane of MSig2 and Alpha/Vol. Although it is difficult to see in the planar projections, the clusters are well separated in higher dimensions (as can be visualized, for example, in the subspace spanned by the first three PCs). For example, the closest approach between groups 1 and 2, which appear partially overlapped in this figure, is as high as 1.0 distance units (in the Euclidean space of standardized variables), which is comparable to the size of the clusters. This analysis demonstrated that our set of descriptors offers a viable basis for classification.

Fig. 7. Plot of classification from agglomerative hierarchical clustering, with Ward's linkage. Group 1: slanted crosses; Group 2: straight crosses (symbols unlabeled, except for the point closest to Group 1); Group 3: diamonds; Group 4 (single point): circle; Group 5: squares.

4. Conclusions and outlook

This work demonstrated that, based on QC calculations, useful structure-driven MLR models for Abraham solvent descriptors of GLC stationary phases can be built. Applying these models, Abraham parameters for stationary phases of arbitrary structure can be calculated. We selected an optimal set of five descriptors, subsets of which constitute three- or four-variable equations for the individual fits. Cluster analysis, in the five-dimensional vector space spanned, showed that these descriptors are also useful for classification of stationary phases. The scope of the current investigation has been limited to molecular (non-polymeric and non-ionic) materials. This was a practical restriction rather than one of principle. The methodology explored here can be extended to other types of stationary phases as well, and we plan to develop such models in the future.

Acknowledgements

Z.A.F. is grateful to Prof. M.H. Abraham, Prof. A. Dallos, Prof. P. Politzer and Prof. C.F. Poole for providing reprints of their papers. We thank the anonymous reviewers whose suggestions helped improve the manuscript. This work was partially supported by the Hungarian Research Fund (OTKA K61577 and T067679). Computational services were provided by the HPC Group at the University of Szeged.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.chroma.2009.09.074.
References

[1] R. Kaliszan, Chem. Rev. 107 (2007) 3212. [2] K. Heberger, J. Chromatogr. A 1158 (2007) 273. [3] A.C. Duarte, S. Capelo, J. Liq. Chromatogr. Rel. Technol. 29 (2006) 1143. [4] A. Dallos, H.S. Ngo, R. Kresz, K. Heberger, J. Chromatogr. A 1177 (2008) 175. [5] E.S. Souza, C.A. Kuhnen, B.D. Junkes, R.A. Yunesa, V.E.F. Heinzen, J. Chemometr. 22 (2008) 378. [6] Y. Ren, H. Liu, X. Yao, M. Liu, J. Chromatogr. A 1155 (2007) 105. [7] Y.Z. Song, J.F. Zhou, Y. Song, J.M. Xie, Y. Ye, Comput. Biol. Med. 37 (2007) 315. [8] T. Djakovic-Sekulic, N. Perisic-Janjic, C. Sarbu, Z. Lozanov-Crvenkovic, J. Planar Chromatogr. Mod. TLC 20 (2007) 251. [9] R. Rajko, T. Kortvelyesi, K. Sebok-Nagy, M. Gorgenyi, Anal. Chim. Acta 554 (2005) 163. [10] T. Hanai, Anal. Bioanal. Chem. 382 (2005) 708. [11] T. Kortvelyesi, M. Gorgenyi, K. Heberger, Anal. Chim. Acta 428 (2001) 73. [12] J.M. Santiuste, J.A. Garcia-Dominguez, Anal. Chim. Acta 405 (2000) 335. [13] Z. Kiraly, T. Kortvelyesi, L. Seres, M. Gorgenyi, Chromatographia 42 (1996) 653. [14] E.A. Hoffmann, Z.A. Fekete, R. Rajkó, I. Pálinkó, T. Körtvélyesi, J. Chromatogr. A 1216 (2009) 2540. [15] M. Vitha, P.W. Carr, J. Chromatogr. A 1126 (2006) 143. [16] M.H. Abraham, G.S. Whiting, R.M. Doherty, W.J. Shuely, J. Chem. Soc., Perkin Trans. 2 (1990) 1451. [17] M.H. Abraham, G.S. Whiting, R.M. Doherty, W.J. Shuely, J. Chromatogr. 587 (1991) 229. [18] M.H. Abraham, G.S. Whiting, R.M. Doherty, W.J. Shuely, J. Chromatogr. 587 (1991) 213. [19] M.H. Abraham, D.S. Ballantine, B.K. Callihan, J. Chromatogr. A 878 (2000) 115. [20] A.M. Zissimos, M.H. Abraham, A. Klamt, F. Eckert, J. Wood, J. Chem. Inf. Comput. Sci. 42 (2002) 1320.

[21] C.F. Poole, S.N. Atapattu, S.K. Poole, A.K. Bell, Anal. Chim. Acta 652 (2009) 32. [22] J.A. Platts, D. Butina, M.H. Abraham, A. Hersey, J. Chem. Inf. Comp. Sci. 39 (1999) 835. [23] J.A. Platts, M.H. Abraham, D. Butina, A. Hersey, J. Chem. Inf. Comp. Sci. 40 (2000) 71. [24] J.A. Platts, Phys. Chem. Chem. Phys. 2 (2000) 3115. [25] J.A. Platts, Phys. Chem. Chem. Phys. 2 (2000) 973. [26] O. Lamarche, J.A. Platts, A. Hersey, Phys. Chem. Chem. Phys. 3 (2001) 2747. [27] O. Lamarche, J.A. Platts, Chem. Phys. Lett. 367 (2003) 123. [28] O. Lamarche, J.A. Platts, Phys. Chem. Chem. Phys. 5 (2003) 677. [29] O. Lamarche, J.A. Platts, A. Hersey, J. Chem. Inf. Comput. Sci. 44 (2004) 848. [30] J.S. Arey, W.H. Green, P.M. Gschwend, J. Phys. Chem. B 109 (2005) 7564. [31] M. Devereux, P.L.A. Popelier, I.M. Mclay, Phys. Chem. Chem. Phys. 11 (2009) 1595. [32] P. Havelec, J.G.K. Sevcik, J. Phys. Chem. Ref. Data 25 (1996) 1483. [33] P. Havelec, J.G.K. Sevcik, J. Chromatogr. A 677 (1994) 319. [34] D. Svozil, J.G. Sevcik, V. Kvasnicka, J. Chem. Inf. Comp. Sci. 37 (1997) 338. [35] T. Ghafourian, J.C. Dearden, J. Pharm. Pharmacol. 52 (2000) 603. [36] J.C. Dearden, T. Ghafourian, J. Chem. Inf. Comput. Sci. 39 (1999) 231. [37] A. Klamt, J. Phys. Chem-Us 99 (1995) 2224. [38] C. Mehler, A. Klamt, W. Peukert, AIChE J. 48 (2002) 1093. [39] A. Klamt, COSMO-RS: From Quantum Chemistry to Fluid Phase Thermodynamics and Drug Design, Elsevier, Amsterdam, 2005. [40] ChemSketch, Version 11. 01, Advanced Chemistry Development, Inc., Toronto, ON, Canada, 2007, http://www.acdlabs.com. [41] B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4 (1983) 187. [42] J.C. Smith, M. Karplus, J. Am. Chem. Soc. 114 (1992) 801. [43] PubChem Project, 2008; http://pubchem.ncbi.nlm.nih.gov. [44] R. Guha, M.T. Howard, G.R. Hutchison, P. Murray-Rust, H. Rzepa, C. Steinbeck, J. Wegner, E.L. Willighagen, J. Chem. Inf. Model. 46 (2006) 991. 
[45] The Open Babel team, 2009; http://sourceforge.net/openbabel. [46] T.A. Halgren, J. Comput. Chem. 17 (1996) 490. [47] T.A. Halgren, J. Comput. Chem. 20 (1999) 730. [48] A. Klamt, Fluid Phase Equilibr. 206 (2003) 223. [49] Lexichem, v1.8, Openeye Scientific Software, Santa Fe, NM 87508, USA, 2008, http://www.eyesopen.com/. [50] Ogham, v1.5, Openeye Scientific Software, Santa Fe, NM 87508, USA, 2008, http://www.eyesopen.com/. [51] J.J.P. Stewart, J. Mol. Model. 13 (2007) 1173. [52] MOPAC2009, Version 9.0*, J.J.P. Stewart, 2009, http://OpenMOPAC.net. [53] J.J.P. Stewart, J. Mol. Model. 15 (2009) 765. [54] A. Klamt, G. Schüürmann, J. Chem. Soc., Perkin Trans. 2 (1993) 799. [55] M. Hornig, A. Klamt, J. Chem. Inf. Model. 45 (2005) 1169. [56] A. Klamt, F. Eckert, M. Hornig, J. Comput. -Aided Mol. Des. 15 (2001) 355. [57] A. Klamt, F. Eckert, M. Diedenhofen, Environ. Toxicol. Chem. 21 (2002) 2562. [58] R. Kresz, Ph.D. thesis, Department of Physical Chemistry, University of Pannonia, Veszprem, 2006. [59] P. Politzer, J.S. Murray, Fluid Phase Equilibr. 185 (2001) 129. [60] J.S. Murray, P. Politzer, G.R. Famini, J. Mol. Struct. (Theochem) 454 (1998) 299. [61] P. Politzer, J.S. Murray, F. Abu-Awwad, Int. J. Quantum Chem. 76 (2000) 643. [62] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, P. Alken, M. Booth, F. Rossi, GNU Scientific Library Reference Manual—Third Edition (v1.12), Network Theory Limited, United Kingdom, 2009. [63] GSL - GNU Scientific Library, G.S.L. Team, Free Software Foundation, Inc., 2009; http://www.gnu.org/software/gsl/. [64] J.C. Nash, S. Shlien, Comput. J. 30 (1987) 268. [65] G.H. Golub, C.F.V. Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 1996. [66] Gnome Office Spreadsheet, Version 1.9, Gnumeric Team, 2008; http://www.gnome.org/projects/gnumeric/. [67] MS Excel, Version 2003, Microsoft Corp, Redmond, WA, 2003. [68] C.L. Mallows, Technometrics 15 (1973) 661. [69] H. Akaike, in: F.C.V.N. 
Petrov (Ed.), Second International Symposium on Information Theory, Akademiai Kiadó, Budapest, Hungary, 1973, p. 267. [70] I.T. Jolliffe, Principal Component Analysis, Springer, New York, NY, USA, 2002. [71] W.O. McReynolds, J. Chromatogr. Sci. 8 (1970) 685. [72] I.G. Zenkevich, A.A. Makarov, J. Anal. Chem. +60 (2005) 845. [73] J.J. Li, Y. Zhang, P.W. Carr, Anal. Chem. 64 (1992) 210. [74] G.R. Famini, L.Y. Wilson, Rev. Comp. Chem. (2002) 211. [75] C.J. Cramer, G.R. Famini, A.H. Lowrey, Accounts Chem. Res. 26 (1993) 599. [76] L.Y. Wilson, G.R. Famini, J. Med. Chem. 34 (1991) 1668. [77] Using Theoretical Descriptors in Quantitative Structure–Activity Relationships 5. A review of theoretical parameters, CRDEC-TR-085, G.R. Famini, U.S. Army Chemical Research, Development and Engineering Center, Aberdeen Proving Ground, MD, USA, 1989, http://www.dtic.mil/cgibin/GetTRDoc?AD=ADA213580&Location=U2&doc=GetTRDoc.pdf. [78] G. Schüürmann, S. Funar-Timofei, J. Chem. Inf. Comp. Sci. 43 (2003) 1502. [79] J.C. Gower, D.J. Hand, Biplots, Chapman and Hall, London, 1996. [80] M.H. Abraham, C.F. Poole, S.K. Poole, J. Chromatogr. A 842 (1999) 79. [81] C.F. Poole, S.K. Poole, J. Chromatogr. A 1184 (2008) 254. [82] B.S. Everitt, S. Landau, M. Leese, Cluster Analysis, Arnold, London, 2001. [83] P. Legendre, L. Legendre, Numerical Ecology, Elsevier, Amsterdam, 1998. [84] J.H. Ward, J. Am. Stat. Assoc. 58 (1963) 238.