Chemometrics and Intelligent Laboratory Systems 47 (1999) 41–49
Evaluation of polarity indicators and stationary phases by principal component analysis in gas–liquid chromatography
Károly Héberger
Central Research Institute for Chemistry of the Hungarian Academy of Sciences, P.O. Box 17, H-1525 Budapest, Hungary
Received 10 November 1995; accepted 24 August 1998
Abstract
Principal component analysis (PCA) was performed on a data matrix consisting of eight polarity indicators (scales) and 30 stationary phases. Calculations were carried out on the correlation matrices of values characterizing the polarity/selectivity of liquid phases (columns). It was found that three principal components account for more than 99% of the total variance in the data, indicating that no single polarity variable is applicable on its own. The plots of component loadings and component scores showed significant groupings of polarity indicators and of stationary phases, respectively. The polarity indicators can be divided into two groups: (i) the sum of the first five McReynolds constants, Kováts' coefficients, Castello's DC values; (ii) Snyder's selectivity parameters. The latter differ considerably from each other and from the other polarity indicators. The physical meanings attributed to the most influential abstract factors are as follows: Factor 1, polarity (according to McReynolds, Kováts and Castello); Factor 2, hydrogen donating and accepting ability; Factor 3, dipole interactions. Sequential analyses showed that polarity represents a nonlinear function among the phases. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Polarity indicator; Stationary phase; Principal component analysis; Gas–liquid chromatography
Contents
1. Introduction
2. Method of calculations
3. Results and discussion
4. Conclusions
Acknowledgements
References
Fax: +36-1-325-75-54; E-mail:
[email protected]
0169-7439/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0169-7439(98)00153-1
1. Introduction
Multivariate data analysis methods, namely principal component analysis (PCA), factor analysis [1–6] and cluster analysis [7–11], have become popular in chemometrics mainly because they can provide information not otherwise accessible. The early applications of multivariate analyses to chromatography [1–11] involved mainly gas–liquid chromatography data [1]. The use of multivariate techniques has recently been extended to the field of reversed-phase HPLC [12–17]. The main goals of these investigations were to select proper stationary phases, to recommend preferred solvents for a given separation, to predict retention data, etc. It should be noted that the results/conclusions of the investigations mentioned here are often contradictory. The examinations are done mainly on the data set of McReynolds [18] (retention indices of 10 solutes and 226 stationary phases) and are frequently limited to a special compound class. The use of retention indices is understandable because their reproducibility surpasses that of any other retention data. Despite the different methods used to characterize the liquid phases, all techniques showed a single dominant factor in the data, which was assigned to 'polarity' [19]. There is, however, no single method to characterize columns (stationary phases) unambiguously in terms of polarity. Similarly, it is unknown which variable characterizes the polarity (selectivity) best. The reason is that the term polarity cannot be defined unequivocally. For this reason, it is impossible to compare a particular measure of polarity to some 'true' value. Generally, the McReynolds polarity is measured by the sum (or average) of the first five McReynolds constants. Moreover, selectivity and polarity are often used interchangeably and are therefore vague in many discussions. Some authors think that the more polar the solvent, the later a given solute elutes from it.
In others' opinion, polarity means polarizability, which is a dynamic characteristic. Therefore, our aim was to compare and evaluate the different polarity indicators defined in the literature. The aim of characterizing stationary phases is not given up either. Another aim is to attribute physical meaning to the abstract factors resulting from PCA, that is, to determine, if possible, the important effects and interactions influencing the polarity/selectivity.
2. Method of calculations
The following polarity indicators are known from the literature: (1) McReynolds polarity (MR) is the sum of the first five McReynolds constants [18], i.e., the sum of the retention index differences between a particular liquid phase and squalane for benzene, n-butanol, 2-pentanone, nitropropane and pyridine; (2) Kováts' coefficients (K_c), defined as (−100) times the intercept-to-slope ratio of the logarithm of retention volume vs. carbon number lines for n-alkanes [20]; (3) retention polarity (RP) as defined by Tarján et al. [21]; (4) Snyder's selectivity parameters x_b, x_n and x_d for butanol, nitropropane and dioxane, respectively [22–24]; (5) Castello's ΔC (or DC) values, defined by analogy to Kováts' coefficients but using n-alkanols instead of n-alkanes, and y_b values, defined similarly to Snyder's parameters as y_b = x_b/(x_b + x_d) [25].
In this paper PCA calculations are reported. They were carried out on Castello's data matrix of eight polarity indicators (columns) and 30 stationary phases (rows) [25]. The 8 × 30 matrix provides a reliable basis for a general investigation by PCA. It should be noted that the 30 stationary phases cover a wide range of polarity. To the best of our knowledge, no PCA results have been published on polarity indicators so far, although some measures of polarity have already been compared to each other on the basis of correlation coefficients [21].
PCA was used as formulated by Harman [26]. The aim of PCA is to represent the variables investigated in terms of several underlying components. PCA uses a linear model. The principal components are calculated such that they are uncorrelated and account for the total variance of the original variables.
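As an illustration of PCA carried out on a correlation matrix, a minimal sketch is given here. It is not the program used in this work: the function name pca_correlation and the random 30 × 8 data matrix are assumptions introduced only for the example, and the eigendecomposition route is one common way of obtaining the components.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows: objects, columns: variables)."""
    # Standardize every column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores: coordinates of the objects on the new axes.
    scores = Z @ eigvecs
    # Loadings: correlations between original variables and components.
    loadings = eigvecs * np.sqrt(eigvals)
    return eigvals, loadings, scores

# Hypothetical data (random numbers, NOT the data matrix of Ref. [25]):
# 30 objects described by 8 variables, two of which are nearly collinear.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=30)

eigvals, loadings, scores = pca_correlation(X)
print("proportion of variance:", np.round(eigvals / eigvals.sum(), 3))
```

In this sketch the eigenvalues sum to the number of variables, the score columns are mutually uncorrelated, and each loading equals the correlation coefficient between an original variable and a component, which is the sense in which loadings are reported in this paper.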
More specifically, the first principal component should account for a maximum of the total variance, the second principal component should be uncorrelated with the first one and should account for a maximum of the residual variance, and so on until the total variance is accounted for. For a practical problem, it is sufficient to retain only a few components
accounting for a large percentage of the total variance. Rotation of the principal component axes can help in the identification of the abstract factors. PCA decomposes the original matrix into a sum of products of loading (polarity indicator) and score (stationary phase) vectors. PCA will show which polarity scales (and which stationary phases) are similar, i.e., carry comparable information, and which ones are unique. One assumption was made during the analysis, namely, that all polarity scales express important features of column polarity/selectivity.
3. Results and discussion
The popular criterion of using factors with eigenvalues larger than one leads here to a solution consisting of two principal components. This solution, however, explains only 89% of the total variance in the data. A 'scree plot' suggests the involvement of three principal components. Moreover, in the field of chromatography, when analyzing Kováts retention indices, a 98% description is required and generally accepted [1]. In that case three principal components are retained, with a 99% goodness of description. The first principal component explains 54%, the second 35% and the third 10% of the total variance, revealing that no single polarity parameter can be applied on its own. This finding is in contrast to the earlier analyses [3–6], where one dominant factor was recognized. The reason is that the present input matrix also involves some measures characterizing selectivity. Coefficients of correlation larger than 0.7 in absolute value are indicated in Table 1 by asterisks. The best correlations suggest the following assignment: Factor 1, polarity according to McReynolds, Kováts, etc.; Factor 2, hydrogen donating and accepting ability with opposite signs (butanol and dioxane model compounds); Factor 3, dipole interactions (nitropropane model compound). Similarly, Wold and Andersson [2] found three main components. Their description is only 93%, probably because they needed a factor explaining the interactions on squalane. Their interpretation is again similar: the first factor is the polarity, the second depends almost solely on the solute and the third is related to interactions with hydroxylic groups in the solute.
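The arithmetic behind the choice of the number of components can be retraced with a short sketch. The three eigenvalues are the explained variances reported in Table 1; the variable names are chosen here for illustration, and the total variance is taken as 8 because the analysis is carried out on the correlation matrix of eight variables.

```python
# Explained variances (eigenvalues) of the first three components, from Table 1.
eigenvalues = [4.313341, 2.805656, 0.821766]
n_variables = 8  # correlation-matrix PCA: total variance equals the number of variables

proportions = [ev / n_variables for ev in eigenvalues]

# Cumulative proportion of the total variance.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# "Eigenvalue larger than one" criterion.
n_kaiser = sum(1 for ev in eigenvalues if ev > 1.0)

# Smallest number of components that reaches the 98% level quoted in the text.
n_98 = next(k + 1 for k, c in enumerate(cumulative) if c >= 0.98)

for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"PC{k}: {100 * p:.1f}% of total variance, cumulative {100 * c:.1f}%")
print("components kept by the eigenvalue > 1 rule:", n_kaiser)
print("components needed to reach 98%:", n_98)
```

The eigenvalue criterion stops at two components because the third eigenvalue is below one, whereas the 98% requirement can only be met by including the third component.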
Table 1
Component loading values (i.e., correlation coefficients between the old and new variables, polarity indicators and abstract factors, respectively) for the first three principal components: unrotated case (upper part) and using normalized varimax rotation (lower part)

Variable              Factor 1      Factor 2      Factor 3

Factor loadings
DC                    0.948517*     0.268866     −0.127913
MR                    0.968198*     0.123877     −0.201804
x_b                  −0.103258      0.985132*     0.135037
x_n                  −0.437261     −0.557189     −0.705624*
x_d                   0.513583     −0.738848*     0.435891
K_c                   0.982044*     0.048803     −0.049055
RP                    0.967500*     0.128806     −0.201490
y_b                  −0.331894      0.933920*    −0.124637
Explained variance    4.313341      2.805656      0.821766
Proportion            0.539168      0.350707      0.102721

Rotated factor loadings
DC                    0.972862*    −0.017733      0.203863
MR                    0.988517*     0.094737      0.065649
x_b                   0.036789     −0.874809*     0.482430
x_n                  −0.286578      0.151117     −0.945832*
x_d                   0.211424      0.953145*     0.215592
K_c                   0.940323*     0.216031      0.195763
RP                    0.988645*     0.090192      0.087700
y_b                  −0.104478     −0.977157*     0.179287
Explained variance    3.924350      2.715541      1.300871
Proportion            0.490544      0.339443      0.162609
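The entries of Table 1 are linked by simple relations that can be checked with a short sketch: the sum of squared loadings in a column gives the variance explained by that factor, and rotation only redistributes the explained variance among the factors. The unrotated loadings and the explained variances are copied from Table 1, read in the variable order DC, MR, x_b, x_n, x_d, K_c, RP, y_b; the helper name column_ssq is an assumption made for this example.

```python
# Unrotated loadings from Table 1 (rows: DC, MR, x_b, x_n, x_d, K_c, RP, y_b;
# columns: Factor 1, Factor 2, Factor 3).
unrotated = [
    [0.948517, 0.268866, -0.127913],
    [0.968198, 0.123877, -0.201804],
    [-0.103258, 0.985132, 0.135037],
    [-0.437261, -0.557189, -0.705624],
    [0.513583, -0.738848, 0.435891],
    [0.982044, 0.048803, -0.049055],
    [0.967500, 0.128806, -0.201490],
    [-0.331894, 0.933920, -0.124637],
]

def column_ssq(matrix):
    """Sum of squared loadings per factor, i.e. the variance it explains."""
    return [sum(row[k] ** 2 for row in matrix) for k in range(len(matrix[0]))]

explained = column_ssq(unrotated)
print("explained variance from loadings:", [round(v, 6) for v in explained])

# Loadings with absolute value above 0.7 (marked by asterisks in Table 1).
names = ["DC", "MR", "x_b", "x_n", "x_d", "K_c", "RP", "y_b"]
flagged = [(names[i], k + 1) for i, row in enumerate(unrotated)
           for k, v in enumerate(row) if abs(v) > 0.7]
print("high loadings (variable, factor):", flagged)

# Explained variances as reported in Table 1, before and after varimax rotation.
table_unrotated = [4.313341, 2.805656, 0.821766]
table_rotated = [3.924350, 2.715541, 1.300871]
print("total before rotation:", round(sum(table_unrotated), 6))
print("total after rotation: ", round(sum(table_rotated), 6))
```

The totals before and after rotation agree, so the larger share of the third factor after rotation is gained entirely at the cost of the first two factors.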
The most informative plot (Fig. 1) identifies a small cluster. McReynolds polarity, Kováts' coefficients, the retention polarity and Castello's DC values carry very similar information. Any of them can be used for the characterization of polarity. It should be emphasized that this is an uncommon result: only n-alkanes are needed to define K_c, whereas MR uses the retention data of five different solutes (benzene, n-butanol, 2-pentanone, nitropropane and pyridine). Still, the two polarity indicators closely resemble each other. Moreover, the McReynolds polarity and the retention polarity are identical or, more precisely, not linearly independent. Although this follows from their definitions, it must be emphasized here because the two quantities are used separately and independently. The two polarity scales consist of very different numbers, hence each has had a career of its own. The role of the two
Fig. 1. Plot of component loadings in all three dimensions (Fig. 1a, b, c: unrotated; d: with varimax rotation, normalized).
Fig. 2. Plot of component scores in all three dimensions (Fig. 2a, b, c: unrotated; d: with varimax rotation, normalized). The numbers correspond to the stationary phases: (1) squalane; (2) Apiezon L; (3) SE-52; (4) SE-30; (5) OV-101; (6) DC-200; (7) DC-550; (8) di-2-ethylhexyl sebacate; (9) dioctyl sebacate; (10) diisodecyl phthalate; (11) didecyl phthalate; (12) dioctyl phthalate; (13) di-2-ethylhexyl adipate; (14) QF-1; (15) Castorwax; (16) Hallcomid M-18; (17) Pluronic L-81; (18) tricresyl phosphate; (19) neopentyl glycol adipate; (20) Pluronic P 85; (21) Pluronic P 65; (22) neopentyl glycol succinate; (23) Pluronic F 68; (24) Carbowax 20 M; (25) Pluronic F 88; (26) Carbowax 6000; (27) ethylene glycol adipate; (28) diethylene glycol adipate; (29) Carbowax 1000; (30) diethylene glycol succinate.
identical indicators in this work is to illustrate the correctness of the calculations (proper data transfer, error-free analysis) rather than to provide a new and striking conclusion. In all three dimensions the observed pattern remains unchanged. Not only can the small polarity cluster be observed (Fig. 1a,b,c), but the points for x_b and y_b are also close to each other. These indicators carry basically the same information, hence the use of y_b is not necessary. Varimax rotation of the principal component axes left the pattern unchanged. With this data manipulation one can increase the variance explained by the third factor only at the expense of the other factors. The similarities among stationary phases can be seen from the score plots (Fig. 2). Factor 1 arranges the stationary phases in a series of increasing polarity. It should be mentioned that the pattern in the scores is very similar to the pattern observed by McCloskey and Hawkes [3], although the number of liquid phases in this study was appreciably smaller. That is, the location of the phases on a polarity scale is more or less uniform. This confirms
Fig. 3. Plot of the variables characterizing the polarity best (MR, K_c, DC) vs. rank number of stationary phases.
Table 2
Comparison of linear and exponential fits

                           MR                          DC                      K_c
                           Linear       Exponential    Linear    Exponential   Linear       Exponential
SSQ                        1.819×10^6   1.672×10^6     5.701     3.156         8.155×10^4   5.097×10^4
R                          0.9616       0.9648         0.9779    0.9878        0.8907       0.9332
Explained variance in %    92.463       93.075         95.623    97.576        79.336       87.085

SSQ = sum of squared residuals. R = correlation coefficient. Explained variance: R^2 = 1 − (residual sum of squares/total sum of squares).
the validity of the conclusions. Moreover, the outlying phases can be identified: (i) from Fig. 2a, Apiezon L (2), QF-1 (14) and Hallcomid M-18 (16), which is very different in hydrogen donating and accepting ability, are outliers; (ii) from Fig. 2b, Apiezon L (2), Castorwax (15) and QF-1 (14), which is very different in dipole interactions, are outliers; (iii) from Fig. 2c the same outliers can be detected, the most significant deviations being (2)–(16) and (14)–(15). According to polarity, one more phase can be considered an outlier: diethylene glycol succinate (30). The most polar phase (30) does not diverge considerably from the others in terms of hydrogen donating and accepting ability and dipole interactions. Such plots can help in choosing stationary phases for a given separation task. If a separation cannot be achieved on a given column, it is expedient to try another phase whose point (in Fig. 2) is relatively far away from the point of the former phase. In all three dimensions varimax rotation of the principal component axes left the pattern unchanged, as illustrated by Fig. 2d. As mentioned in Section 1, there is no 'true' value for polarity. It is therefore no wonder that there is no unambiguous sequence among the phases according to any of the polarity indicators, even if they are highly correlated. If the phases are ordered by increasing polarity according to Castello et al. [25], the polarity descriptors of McReynolds and Kováts will not be monotonically increasing but rather jagged (Fig. 3a,b). Almost the same zigzag pattern is seen if MR and K_c are plotted against the rank number of increasing polarity of Castello et al. [25]. The three polarity scales (the small cluster in Fig. 1) are so similar that they can be used interchangeably. Still, MR and K_c are more 'equal' than DC. Apparently, Castello's DC damps the individual (special) differences which exist in polarity according to MR and K_c. Moreover, if any of the three polarity indicators is fitted vs. the rank numbers mentioned, an exponential function provides a better, even if only slightly better, description (Table 2). There is no reason to assume that the ordering of the phases according to any of the polarity scales must be linear. This behavior is not unique; inherent nonlinearity of retention plots was found earlier [27,28]. Which dependent variable should be used to discover this nonlinearity is not a simple question. No doubt a variable can be chosen so that a linear dependence is observed, possibly by a proper transformation of the independent variables. The results involve some far-reaching conclusions. For example, there is no use in defining other polarity scales by analogy to the scales investigated here. Even these are highly redundant.
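The comparison of linear and exponential descriptions reported in Table 2 can be illustrated by the following sketch. It rests on assumptions stated plainly: the polarity values are invented for the example (they are not the data behind Table 2), and the exponential model is fitted as a straight line in the logarithm of the polarity, which is only one possible fitting procedure and not necessarily the one used for Table 2. SSQ and the explained variance are computed as defined in the footnote of Table 2.

```python
import math

def linear_fit(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_quality(y, y_hat):
    """Return (SSQ, R^2) with R^2 = 1 - SSQ / total sum of squares."""
    my = sum(y) / len(y)
    ssq = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - my) ** 2 for yi in y)
    return ssq, 1.0 - ssq / sst

# Hypothetical polarity values for 30 ranked phases (invented, jagged on purpose).
rank = list(range(1, 31))
polarity = [100.0 * math.exp(0.1 * r) * (1.0 + 0.05 * math.sin(r)) for r in rank]

# Linear model: polarity = a + b * rank.
a, b = linear_fit(rank, polarity)
ssq_lin, r2_lin = fit_quality(polarity, [a + b * r for r in rank])

# Exponential model: polarity = A * exp(k * rank), fitted via ln(polarity).
ln_a, k = linear_fit(rank, [math.log(p) for p in polarity])
ssq_exp, r2_exp = fit_quality(polarity, [math.exp(ln_a + k * r) for r in rank])

print(f"linear:      SSQ = {ssq_lin:.4g}, explained variance = {100 * r2_lin:.2f}%")
print(f"exponential: SSQ = {ssq_exp:.4g}, explained variance = {100 * r2_exp:.2f}%")
```

Both residual sums of squares are evaluated in the original units of the polarity scale, so that the two models are compared on the same footing even though the exponential parameters are estimated on a logarithmic scale.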
4. Conclusions
PCA is able to classify the polarity scales and stationary phases. The plots of component loadings and component scores showed significant groupings of the variables describing the polarity and of the stationary phases, respectively. The eight polarity indicators are redundant: the total variance in the data can be explained almost completely (~99%) with three variables.
There is no polarity scale which characterizes the stationary phases well enough on its own; thus at least two polarity indicators (say, polarity and selectivity) are needed. Three, however, are definitely enough, i.e., there is no use in applying more than three. The polarity indicators can be divided into two groups: (i) McReynolds polarity (the sum of the first five McReynolds constants), Kováts' coefficients and Castello's DC values carry very similar information; (ii) Snyder's selectivity parameters differ considerably from each other and from the other polarity indicators. The physical meanings attributable to the abstract factors are as follows: Factor 1, polarity (according to McReynolds, Kováts and Castello); Factor 2, hydrogen accepting and donating ability; Factor 3, dipole interactions. The results/conclusions do not vary with the rotation of axes. There is no unique sequence among the stationary phases according to any of the polarity indicators. Some of the phases are unique (outlier phases), such as Apiezon L, QF-1, diethylene glycol succinate and Hallcomid M-18. The use of Castello's DC is justified, although it slightly disregards the individual (special) differences, whereas the use of retention polarity and y_b is to be avoided. Sequential analyses showed that the polarity represents a nonlinear function among the phases.
Acknowledgements
This work was supported by the Hungarian Science Foundation (OTKA) under project number T 016231. Dr. Judit Jakus's kind help in correcting the manuscript is also acknowledged.
References
[1] E.R. Malinowski, Factor Analysis in Chemistry, 2nd edn., Chap. 9, Chromatography, Wiley-Interscience, New York, 1991, pp. 266–291 and references therein.
[2] S. Wold, K. Andersson, Major components influencing retention indices in gas chromatography, Journal of Chromatography 80 (1973) 43–59.
[3] D.H. McCloskey, S.J. Hawkes, Choosing standard stationary phases for gas chromatography, Journal of Chromatographic Science 13 (1975) 1–5.
[4] M. Chastrette, Factor analysis of solute-stationary phase interactions in gas–liquid chromatography, Journal of Chromatographic Science 14 (1976) 357–359.
[5] R. Fellous, L. Lizzani-Cuvelier, R. Luft, D. Lafaye de Micheaux, Data analysis in gas–liquid chromatography of benzene derivatives, Analytica Chimica Acta 154 (1983) 191–201.
[6] Z. Juvancz, T. Cserháti, K.E. Markides, J.S. Bradshaw, M.L. Lee, Characterization of some new polysiloxane stationary phases by principal component analysis, Chromatographia 38 (1994) 227–231.
[7] D.L. Massart, P. Lenders, M. Lauwereys, The selection of preferred liquid phases after classification by numerical taxonomy techniques, Journal of Chromatographic Science 12 (1974) 617–625.
[8] S. Wold, Analysis of similarities and dissimilarities between chromatographic liquid phases by means of pattern cognition, Journal of Chromatographic Science 13 (1975) 525–532.
[9] J.F.K. Huber, G. Reich, Characterization and selection of stationary phases for gas liquid chromatography by pattern recognition methods, Journal of Chromatography 294 (1984) 15–29.
[10] J.A. Garcia-Dominguez, J. Garcia-Munoz, V. Menéndez, M.J. Molera, J.M. Santiuste, Method for the classification and selection of stationary phases in gas chromatography, Journal of Chromatography 393 (1987) 209–219.
[11] M.H. Abraham, G.S. Whiting, R.M. Doherty, W.J. Shuley, Hydrogen bonding: XV. A new characterization of the McReynolds 77-stationary phase set, Journal of Chromatography 518 (1990) 329–348.
[12] B. Walczak, M. Dreux, J.R. Chrétien, R. Szymoniak, M. Lafosse, L. Morin-Allory, J.P. Doucet, Factor analysis and experimental design in high-performance liquid chromatography: I. Trends in selectivity of 53 chalcones in reversed-phase high-performance liquid chromatography on alkyl- or phenyl-bonded stationary phases, Journal of Chromatography 353 (1986) 109–121.
[13] T. Cserháti, Use of principal component and cluster analysis for the comparison of reversed-phase HPLC columns, Analytical Letters 27 (1994) 2615–2637.
[14] E. Forgács, Comparison of reversed-phase chromatographic systems with principal component and cluster analysis, Analytica Chimica Acta 296 (1994) 235–241.
[15] C.E. Reese, L. Huang, S.-H. Hsu, S. Tripathy, C.H. Lochmüller, Universal retention indices and their prediction in reversed-phase liquid chromatography based on principal component analysis and target testing, Journal of Chromatographic Science 34 (1996) 101–110.
[16] M. Turowski, R. Kaliszan, C. Lüllmann, H.G. Genieser, B. Jastorff, New stationary phases for the high-performance liquid chromatographic separation of nucleosides and cyclic nucleosides synthesis and chemometric analysis of retention data, Journal of Chromatography A 728 (1996) 201–211.
[17] P. Hindmarch, K. Kavianpour, R.G. Brereton, Evaluation of parallel factor analysis for the resolution of kinetic data by diode-array high-performance liquid chromatography, Analyst 122 (1997) 871–877.
[18] W.O. McReynolds, Characterization of some liquid phases, Journal of Chromatographic Science 8 (1970) 685–691.
[19] S.R. Lowry, H.B. Woodruff, T.L. Isenhour, Comparing measures of polarity, Journal of Chromatographic Science 14 (1976) 129–131.
[20] E. Fernández-Sánchez, A. Fernández-Torres, J.A. García-Domínguez, J.M. Santiuste, Kováts' coefficients for predicting polarities in silicone stationary phases and their mixtures, Chromatographia 31 (1991) 75–79.
[21] G. Tarján, Á. Kiss, G. Kocsis, S. Mészáros, J.M. Takács, General contribution to the theory of retention index systems in gas–liquid chromatography, III. Contribution to the polarity of gas chromatographic stationary phases expressed by retention indices, Journal of Chromatography 119 (1976) 327–332.
[22] L.R. Snyder, Classification of the solvent properties of common liquids, Journal of Chromatography 92 (1974) 223–230.
[23] L.R. Snyder, Classification of the solvent properties of common liquids, Journal of Chromatographic Science 16 (1978) 223–234.
[24] M.S. Klee, M.A. Kaiser, K.B. Laughlin, Systematic approach to stationary phase selection in gas chromatography, Journal of Chromatography 279 (1983) 681–688.
[25] G. Castello, G. D'Amato, S. Vezzani, Evaluation of the polarity of packed and capillary columns by different classification methods, Journal of Chromatography 646 (1993) 361–368.
[26] H.H. Harman, Modern Factor Analysis, 3rd edn., Revised, University of Chicago Press, Chicago, 1967.
[27] K. Héberger, Empirical correlations between gas-chromatographic retention data and physical or topological properties of solute molecules, Analytica Chimica Acta 223 (1989) 161–174.
[28] K. Héberger, Discrimination between linear and non-linear models describing retention data of alkylbenzenes in gas-chromatography, Chromatographia 29 (1990) 375–384.