Journal of Chromatography A, 1216 (2009) 6224–6235
Quantitative structure enantioselective retention relationship for high-performance liquid chromatography chiral separation of 1-phenylethanol derivatives
Maciej Szaleniec a,∗, Agnieszka Dudzik a, Marzena Pawul a, Bartłomiej Kozik b
a Institute of Catalysis and Surface Chemistry, Polish Academy of Sciences, Niezapominajek 8, 30-239 Kraków, Poland
b Department of Organic Chemistry, Jagiellonian University, Ingardena 3, 30-060 Kraków, Poland
Article info
Article history: Received 27 January 2009; received in revised form 28 June 2009; accepted 1 July 2009; available online 7 July 2009
Keywords: QSRR; QSERR; DFT; Chiral chromatography; Ethylbenzene dehydrogenase; Genetic algorithm
Abstract
Quantitative Structure Retention Relationship (QSRR) modeling techniques are employed to predict the retention behavior of chiral secondary alkylaromatic and alkylheterocyclic alcohols, derivatives of 1-phenylethanol, separated on a Chiralcel OB-H column. Genetic algorithms and neural networks are used to obtain models predicting the Retention Order Index (ROI) (R² = 0.99), the selectivity ROI log α (R² = 0.93), and the retention factors (log k) for two mobile phases (90/10 and 85/15 n-hexane/isopropanol; R² = 0.97 and 0.95). Additionally, a model that predicts log k for both mobile phases as a function of i-PrOH concentration is developed (R² = 0.97). HOMO energy turns out to be the most important parameter in the description of log k, while mixed steric-electrostatic interactions with the chiral OH group and the furan ring are responsible for chiral recognition. The models are used to assess the stereoselectivity of ethylbenzene dehydrogenase (EBDH), which catalyzes stereospecific syntheses of the investigated compounds. The high stereoselectivity of the enzyme is confirmed, but a reversal of EBDH enantioselectivity is predicted to take place in the biosynthesis of 1-[1,1′-biphenyl]-4-ylethanol. © 2009 Elsevier B.V. All rights reserved.
1. Introduction
The identification of chiral isomers of biologically active compounds is a crucial issue in modern pharmacology, toxicology and fine chemical sciences. The standard approach to identifying the optical isomer present in a sample is to perform separations on a chiral stationary phase (CSP) with the help of chiral standards. Such an analysis usually provides an unequivocal answer to the question of which isomer is present in the reaction mixture. In practice, however, the analyst is very often confronted with a lack of appropriate standard compounds. In such a case one can resort to modeling of the retention behavior. With knowledge from chromatographic experiments on a group of standards, it is possible to predict the retention of new compounds. A well-established Quantitative Structure Retention Relationship (QSRR) model can easily be used to assess the retention of analytes prepared in silico, which not only provides valuable information but can also save much of the effort and resources otherwise spent on custom synthesis and method development. For non-chiral chromatographic separations one can resort to the QSRR approach [1–4]. Fortunately, chemical interactions
∗ Corresponding author: M. Szaleniec. Tel.: +48 126395155. doi:10.1016/j.chroma.2009.07.002
in most commonly used reversed-phase systems are fairly well understood. Many retention prediction models have been constructed using various sets of chemical and topological descriptors, such as the classical Linear Solvation Energy Relationship (LSER) theory developed by Taft and co-workers [5,6]. These approaches can usually rely on 2D descriptors that do not discriminate the spatial configuration of the molecule. Investigating chiral separations is much more complicated than investigating achiral systems, as it requires parameters that address the stereoisomerism of the compounds. Nevertheless, LSER theory has been used to describe the separation factor (α) and to explain, on the basis of combined achiral and chiral chromatographic results, which types of interactions are involved in the chiral recognition mechanism [7,8]. Moreover, the chiral retention factor was also correlated with non-chiral chromatography-derived lipophilicity parameters by Roussel et al. [9,10]. Usually, however, describing chiral separations on the basis of QSRR theory requires methodologies specially developed for such systems. This is because both non-enantiospecific interactions (named by Guiochon as non-selective interactions with type-I sites) and chiral interactions (with type-II sites) occur between the analyte and the CSP [11,12]. Based on studies of the chiral separation of alkylaromatic carboxylic acids, Wainer et al. suggested that chiral recognition proceeds via a two-step mechanism. The first step involves the distribution of the solute from the mobile phase to
M. Szaleniec et al. / J. Chromatogr. A 1216 (2009) 6224–6235
CSP and is responsible for the general retention process. Once the solute-CSP complex is formed, the chiral determination occurs. Owing to spatially specific interactions of the analyte, the two isomers differ slightly in the strength of interaction because of the different configuration at the chiral center [13]. Chiral recognition can be explained on the basis of the standard 'three-point' interaction model of Pirkle and House [14] (also called the geometric phenomenon) or of the conformationally driven chiral recognition process postulated by Wainer et al. (the molecular phenomenon). The latter assumes that both solute and CSP adjust conformationally to each other and that the difference in the conformational change between isomers results in chiral recognition. Consequently, QSERR (quantitative structure enantioselectivity retention relationship) studies can address either the variation in the non-enantiospecific strength of solute-CSP interactions, which arises from differences in chemical structure and properties (such as polarizability or the number of hydrogen atoms) and increases the overall retention of both isomers, or the forces that lead to chiral discrimination. In this paper the issue is addressed by separately modeling the purely chiral separation (i.e. the retention order together with the selectivity) and the retention factor (log k), which describes both chiral and non-chiral solute-CSP interactions. Various methodological approaches have been developed to describe 3D interactions of a solute with a chiral selector [15]. One can resort to 3D quantitative structure activity relationship (QSAR) methods such as CoMFA (Comparative Molecular Field Analysis) [16]. Indeed, this methodology has been successfully applied in the modeling of many chiral systems [17–20] and is used in this paper.
However, other approaches have also been developed, such as the adaptation of 2D connectivity indexes or of EVA descriptors based on normal-mode eigenvalues to correlate solute chirality with either pharmacological activity [21] or retention and selectivity parameters [20]. Another successful alternative was proposed by Aires-de-Sousa and Gasteiger, i.e. conformation-independent and conformation-dependent chirality codes (CICC and CDCC, respectively), which together with classification trees or Kohonen neural networks were successfully applied to prediction of the retention order [22–26]. Another approach, complementary to CoMFA, is the enantiophore concept [27,28], based on the pharmacophore idea used in drug development. The enantiophore supplies a useful and simple method for database searching (as potential solutes can easily be fitted to obtain a 3- or 4-point enantiophore) and provides insight into the chiral recognition mechanism in a manner similar to the way drug–receptor interactions are rationalized. The enantiophore and CICC concepts were recently applied together by Del Rio and Gasteiger to predict the retention order of molecules separated on the Whelk-O1 CSP [25]. Finally, the issue of solute-CSP interactions has also been treated with direct ab initio, molecular mechanics and molecular dynamics calculations [29,30]. These approaches, described in more detail in the excellent review of Lipkowitz [30], provide valuable insight into the mechanism of enantiodiscrimination and into solvent effects, and are able to address both selectivity and, to some extent, retention order. For well-defined CSP-solute contacts (such as Whelk-O1) they can provide very precise and quantitative information on the chiral recognition mechanism [29]. However, owing to the complexity of the models, calculations for polymeric CSPs (like Chiralcel OB) still lack the speed and robustness of QSRR techniques [15]. Particularly interesting to this work are the studies of Del Rio et al.
[28], which describe enantiophore development for the Chiralcel OB stationary phase in 90:10 n-hexane/isopropanol. According to the authors, Chiralcel OB seems to be exceptional among modified cellulose-based CSPs, as it exhibits a well-defined enantioselective mechanism, unlike other polymeric CSPs that have multiple binding sites. It is proposed that the Chiralcel OB enantiophore is comprised
of a triangular arrangement of three points: an H-bond donor, an H-bond acceptor and a lipophilic or aromatic moiety. The problem studied here, the prediction of retention factors, selectivity and elution order of a particular optical isomer on a cellulose tribenzoate CSP, is addressed using the example of 1-phenylethanol derivatives. This issue was already investigated by Wainer et al. over 20 years ago [13]. That study attributed chiral recognition to the formation of diastereomeric solute-CSP complexes through hydrogen bond interactions between the solute's alcoholic hydrogen atom and a carbonyl oxygen atom of an ester group on the CSP, and to stabilization of the aromatic ring in the chiral ravine of the cellulose. The paper also proved the importance of the phenyl substituents in the Chiralcel OB CSP, without which the stationary phase lost most of its chiral recognition capabilities. This indirectly suggests the importance of interactions between the CSP phenyl groups and the aryl group of the solute. Therefore, the results of Wainer et al. suggest that only two of the three points of the Chiralcel OB enantiophore of Del Rio et al. are crucial for chiral recognition of 1-phenylethanol derivatives: the aromatic moiety and the H-donor hydroxyl group. Such a discrepancy might result from the different training sets, as Del Rio et al. used a structurally diverse set of 52 compounds, while the paper of Wainer et al. and our present report used only congeners of 1-phenylethanol (18 and 10 compounds, respectively). In this paper the problem is approached from a practical point of view, in which a number of chiral standards are available and the objective is to determine the elution order in the analyzed racemic mixtures. Seventeen alcohol racemates were analyzed by HPLC on a Chiralcel OB-H column, but chiral standards were available for only ten compounds (see Fig. 1). In order to identify the elution order for the rest of the enantiomers (Fig. 2), theoretical QSERR models were constructed.
The results were applied to a practical analytical problem, i.e. determination of the stereospecificity of ethylbenzene dehydrogenase [31], the enzyme that catalyzes the stereospecific synthesis of 1-phenylethanol derivatives. Owing to the relatively low structural diversity of the examined compounds, the authors do not attempt a universal determination of the interactions responsible for chiral recognition of secondary alcohols on the Chiralcel OB-H CSP. The applied approach relies on 3D alignment of the carbon skeletons of the solutes and as such is not suitable for the analysis of compounds with very different molecular structures (especially those without an aromatic ring, which defines the superposition plane). Nevertheless, the statistically significant relations are pointed out. The collected experimental data
Fig. 1. Training compounds (1–10) for which both isomer standards were available.
Fig. 2. Validation compounds (11–15), for which only racemate standards were available and enzymatic synthesis of one of the isomers was conducted, and compounds (16–17), for which only separation of the racemic mixture was conducted or the synthesis by EBDH was ambiguous.
seem to be sufficient to describe the chromatographic interactions of the Chiralcel OB-H stationary phase with alkylaromatic and alkylheterocyclic secondary alcohols. Therefore, the obtained models are not only valuable as practical prediction tools but also provide insight into the nature of the solute-stationary phase interactions for this particular group of compounds.
2. Experimental
2.1. Data set up
The available set comprised retention factors of seventeen racemic mixtures of alkylaromatic and alkylheterocyclic secondary alcohols. For ten compounds in this group (Fig. 1, Group I) the retention order was determined with chiral standards (Table 1). In each case, with the exception of 1-(2-furyl)ethanol (9), the first eluted fraction was of S configuration. For the rest of the compounds (Group II) only the retention factors of both isomers were available (Fig. 2, Table 2). Therefore, it was decided that the theoretical QSRR model should be trained on Group I, while the compounds from Group II should be treated as an evaluation set.
With the exception of 8 and 16, EBDH was used as a catalyst in the synthesis of the secondary alkylaromatic alcohols. For all compounds of Group I the S isomer was obtained as the predominant product. For Group II, the EBDH-synthesized alcohols eluted as the first fraction in all cases with the exception of 1-[1,1′-biphenyl]-4-ylethanol (17). This suggested that the first fractions might be of S configuration, by analogy with the above-mentioned analytical results for the EBDH reaction mixtures of Group I. Therefore, these results were treated as additional premises for an ad hoc assignment of the earlier elution times to the S isomers (and their field point descriptors) of Group II. This allowed calculation of correlation factors between experimental and predicted values. However, as the premises from the enzymatic tests were confusing or unavailable for 16 and 17, it was decided that these compounds should not be used in the validation of the developed models. This means that during model development the discrepancies between predicted and experimental retention parameters for 16 and 17 were not treated as a warning signal, because of the higher probability of an inversion of the elution order.
2.2. Organic syntheses
The chiral standards of 1, 2 and 7–10 were commercially available (Aldrich, Fluka, Alfa Aesar, purity at least 95%; a full list of standards is available in the Supplementary Materials). The racemic compounds 3–6, as well as their enantiopure stereoisomers, were prepared by reduction of the appropriate carbonyl compounds. 4-Hydroxy-substituted propiophenone and acetophenone were reduced to the corresponding racemic alcohols 3 and 4 with lithium aluminium hydride, according to the published procedure [32]. Reductions of 3- and 2-hydroxyacetophenone were carried out under milder conditions, with sodium borohydride (2.2 equiv) in anhydrous ethanol at 0 °C, to give racemates 5 and 6, respectively.
Chiral standards (S)-5 and (R)-6 were obtained from the appropriate hydroxyacetophenones by enantioselective reduction with (−)-B-chlorodiisopinocampheylborane [(−)-DIP-Chloride™]. In the case of (R)-6 we strictly followed the procedure described by Ramachandran et al. [33], and (S)-5 was synthesized in accordance with the method reported by Everhart and Craig [34]. Unexpected difficulties during the syntheses of the optically active 1-(4-hydroxyphenyl)alkanols 3 and 4 encouraged us to modify existing methods. We worked out a three-step synthesis involving
Table 1
Retention data for Group I compounds (training and validation groups) with identification of each isomer. Log k is given for both mobile phases (90:10 and 85:15 n-hexane/isopropanol); ROI (Retention Order Index) is −1 for the first fraction and 1 for the second fraction; ROI log α is the logarithm of the selectivity α multiplied by ROI.

Number   Compound name                         Log k    Log k    ROI   ROI log α   ROI log α
                                               90:10    85:15          90:10       85:15
(S)-1    (S)-1-phenylethanol                   0.761    0.626    −1    −0.225      −0.234
(R)-1    (R)-1-phenylethanol                   0.986    0.860     1     0.225       0.234
(S)-2    (S)-1-phenyl-1-propanol               0.750    0.624    −1    −0.149      −0.159
(R)-2    (R)-1-phenyl-1-propanol               0.600    0.465     1     0.149       0.159
(S)-3    (S)-1-(4-hydroxyphenyl)-1-propanol    1.155    0.867    −1    −0.054      −0.054
(R)-3    (R)-1-(4-hydroxyphenyl)-1-propanol    1.208    0.921     1     0.054       0.054
(S)-4    (S)-1-(4-hydroxyphenyl)ethanol        1.331    1.032    −1    −0.072      −0.073
(R)-4    (R)-1-(4-hydroxyphenyl)ethanol        1.403    1.105     1     0.072       0.073
(S)-5    (S)-1-(3-hydroxyphenyl)ethanol        1.161    0.893    −1    −0.224      −0.211
(R)-5    (R)-1-(3-hydroxyphenyl)ethanol        1.386    1.105     1     0.224       0.211
(S)-6    (S)-1-(2-hydroxyphenyl)ethanol        0.989    0.778    −1    −0.221      −0.232
(R)-6    (R)-1-(2-hydroxyphenyl)ethanol        1.210    1.011     1     0.221       0.232
(S)-7    (S)-1-(4-fluorophenyl)ethanol         0.687    0.518    −1    −0.065      −0.067
(R)-7    (R)-1-(4-fluorophenyl)ethanol         0.752    0.585     1     0.065       0.067
(S)-8    (S)-1-(4-bromophenyl)ethanol          0.717    0.544    −1    −0.111      −0.117
(R)-8    (R)-1-(4-bromophenyl)ethanol          0.828    0.661     1     0.111       0.117
(S)-9    (S)-1-(2-furyl)ethanol                0.956    0.809     1     0.096       0.104
(R)-9    (R)-1-(2-furyl)ethanol                0.860    0.705    −1    −0.096      −0.104
(S)-10   (S)-1-(2-thienyl)ethanol              1.010    0.863    −1    −0.086      −0.079
(R)-10   (R)-1-(2-thienyl)ethanol              1.096    0.943     1     0.086       0.079
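As an illustration of how the ROI log α column relates to the log k columns, the following minimal sketch recomputes it for two enantiomer pairs of Table 1 (90:10 mobile phase). It assumes that log α is the difference between the log k values of the later and the earlier eluting isomer and that the isomer with the smaller log k elutes first; the function and variable names are hypothetical and not part of the original work.

```python
# log k values copied from Table 1 (90:10 n-hexane/isopropanol).
LOG_K_90_10 = {
    "1": {"S": 0.761, "R": 0.986},  # 1-phenylethanol
    "9": {"S": 0.956, "R": 0.860},  # 1-(2-furyl)ethanol
}


def roi_log_alpha(log_k_by_isomer):
    """Return {isomer: ROI * log(alpha)} for one enantiomer pair.

    Assumption: the isomer with the smaller log k elutes first (ROI = -1),
    and log(alpha) = log k(second fraction) - log k(first fraction).
    """
    first, second = sorted(log_k_by_isomer, key=log_k_by_isomer.get)
    log_alpha = log_k_by_isomer[second] - log_k_by_isomer[first]
    return {first: -log_alpha, second: +log_alpha}


for number, pair in LOG_K_90_10.items():
    values = roi_log_alpha(pair)
    print(number, {isomer: round(v, 3) for isomer, v in values.items()})
```

For compound 9 the sign pattern is reversed with respect to compound 1, which is how the index encodes the inversion of the elution order.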
Table 2
Retention data for the racemic mixtures of the evaluation group; data on compounds from the external test group were not supported with enzyme tests.

Number   Compound name                     Log k    Log k     ROI   ROI log α   ROI log α
                                           90:10    85:15           90:10       85:15
11       1-(4-Methylphenyl)ethanol         0.809    0.669     −1    −0.129      −0.139
11       1-(4-Methylphenyl)ethanol         0.938    0.808      1     0.129       0.139
12       1-(2-Methylphenyl)ethanol         0.586    0.433     −1    −0.307      −0.316
12       1-(2-Methylphenyl)ethanol         0.893    0.749      1     0.307       0.316
13       1-(4-Methoxyphenyl)ethanol        1.298    1.127     −1    −0.168      −0.171
13       1-(4-Methoxyphenyl)ethanol        1.465    1.299      1     0.168       0.171
14       1-(4-Aminophenyl)ethanol          1.956    1.704     −1    −0.070      −0.072
14       1-(4-Aminophenyl)ethanol          2.026    1.776      1     0.070       0.072
15       1-(2-Naphthyl)ethanol             1.073    0.923     −1    −0.080      −0.085
15       1-(2-Naphthyl)ethanol             1.154    1.008      1     0.080       0.085
External test compounds
16       1-(4-Chlorophenyl)ethanol         0.679    0.508     −1    −0.096      −0.101
16       1-(4-Chlorophenyl)ethanol         0.775    0.609      1     0.096       0.101
17       1-[1,1′-Biphenyl]-4-ylethanol     1.093    0.935a    −1    −0.013      −0.009a
17       1-[1,1′-Biphenyl]-4-ylethanol     1.106    0.944a     1     0.013       0.009a

a 1-[1,1′-Biphenyl]-4-ylethanol (17) was not resolved in the 85:15 mobile phase. The retention parameters were estimated from the shoulder-type peak distortion.
simple, asymmetric, enzymatic reduction of the protected phenolic alkylaromatic ketones, which finally allowed us to synthesize the missing chiral standards (S)-3 and (S)-4 with satisfactory optical purity. This procedure is still being examined and will be published elsewhere. Enantiomeric excesses of the compounds (S)-3 (80% ee), (S)-4 (65% ee), (S)-5 (95% ee) and (R)-6 (89% ee) were determined by chiral HPLC analysis. The absolute configurations of the optically active alcohols 3–5 with protected phenolic groups were assigned from ¹H NMR spectra of the appropriate MTPA (α-methoxy-α-trifluoromethylphenylacetic acid) esters. The R configuration of compound 6 was taken as previously evaluated [33].
2.3. Enzyme synthesis
Ethylbenzene dehydrogenase is the enzyme that natively catalyzes the oxidation of ethylbenzene to (S)-1-phenylethanol with 100% stereoselectivity [35]. It was recently shown to exhibit very high stereoselectivity with other alkylaromatic compounds [36]. The oxidation products of alkylaromatic and alkylheterocyclic compounds were identified by GC-MS and LC-MS as secondary alcohols and were shown either to be predominantly of S configuration (by comparison with available chiral standards) or to be the earlier eluting peak in comparison with the elution order of the racemate. The enzyme proved to be highly S-stereospecific, producing in most cases solely S isomers. The exceptions were 1-(4-hydroxyphenyl)-1-propanol (3) and 1-(4-hydroxyphenyl)ethanol (4) (Group I) and 1-(4-aminophenyl)ethanol (14) (Group II), where both isomers were present but the earlier eluting form (S for the phenols) was always the predominant one. Only in the case of the last compound, 1-[1,1′-biphenyl]-4-ylethanol (17), did the enzyme produce a mixture of both enantiomers with the later eluting fraction being predominant [36]. The enzyme proved to be inactive in the synthesis of 1-(4-bromophenyl)ethanol (8) and 1-(4-chlorophenyl)ethanol (16).
Therefore, these results can be used as weak premises for determining which of the two peaks in a separated racemic mixture is of S configuration.
2.4. HPLC separation
The chiral HPLC separations were performed on an Agilent 1100 system with a DAD detector in a normal-phase system on a cellulose tribenzoate polysaccharide chiral stationary phase (Daicel Chiralcel OB-H column, 250 mm × 4.6 mm). Two types of uniform isocratic
programs were used, with n-hexane/isopropanol 90:10 and 85:15 (v/v) mobile phases at 25 °C and a 0.5 ml/min flow rate, with 205 nm as the selected detection wavelength. The temperature of the separations was controlled by a thermostated column compartment and by air-conditioning of the laboratory (ensuring a solvent temperature close to 25 °C). Each sample concentration was close to 1 mg/ml (depending on the purity of the standard). Most of the samples (1, 2, 7–13, 15, 16) were dissolved in pure n-hexane. Some (3–6, 17) were initially dissolved in a small quantity of i-PrOH and then diluted with n-hexane to obtain a final solvent composition of 90/10 n-hexane/isopropanol. Finally, 1-(4-aminophenyl)ethanol (14) was dissolved in pure i-PrOH because of its low solubility in hydrophobic solvents. In order to avoid changes in sample concentration during the analysis due to the high volatility of n-hexane, the autosampler was thermostated at 4 °C. Retention times were obtained as the average of two 1 µl sample injections. The hold-up time t0 equalled 6.43 min and was determined with 10 µl air injections for both mobile phases. The reaction mixtures in which ethylbenzene dehydrogenase acted as a catalyst were analyzed from isopropanol solutions according to the protocols optimized for each compound [36], adjusting the content of i-PrOH from 5% to 30% depending on the polarity of the studied compound. However, it was proven that the variation in isopropanol content does not influence the order of elution, and therefore the conclusions drawn from these separations could be applied to the analyses of the samples separated under standard conditions (90:10 and 85:15 n-hexane/isopropanol).
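The step from measured retention times to the log k values used in the modeling can be sketched as follows. This is a minimal example assuming the usual definition of the retention factor, k = (tR − t0)/t0, and a base-10 logarithm; neither is spelled out in the text. The injection times in the example are hypothetical numbers chosen for illustration, not measured data.

```python
import math

T0_MIN = 6.43  # hold-up time quoted in the text (minutes)


def mean_retention_time(injections_min):
    """Average the retention times of replicate injections (two in the text)."""
    return sum(injections_min) / len(injections_min)


def log_k(t_r_min, t0_min=T0_MIN):
    """log10 of the retention factor, assuming k = (tR - t0) / t0."""
    if t_r_min <= t0_min:
        raise ValueError("retention time must exceed the hold-up time")
    return math.log10((t_r_min - t0_min) / t0_min)


def retention_time_from_log_k(log_k_value, t0_min=T0_MIN):
    """Inverse relation: retention time implied by a given log k."""
    return t0_min * (1.0 + 10.0 ** log_k_value)


# Hypothetical duplicate injections (illustrative numbers only).
t_r = mean_retention_time([12.80, 12.92])
print("log k for the hypothetical peak:", log_k(t_r))
print("tR implied by log k = 0.761:", retention_time_from_log_k(0.761), "min")
```

The inverse function is convenient for translating a predicted log k back into an expected retention time under the same assumptions.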
2.5. DFT modeling
Initial geometries of all studied compounds were built in the Cache Pro program [37] and optimized with the semi-empirical PM3 method [38]. In order to obtain a reliable lowest-energy conformer of each studied compound, quantum chemical modeling at the Density Functional Theory (DFT) level was performed with the Gaussian 03 package [39]. The initial geometric structures of the alcohols were fully optimized first in vacuum at the B3LYP [40] level of theory with the 6-31G(d,p) basis set and then reoptimized in a PCM [41] model solvent medium. As the n-hexane model solvent is not implemented in the Gaussian 03 package, n-heptane was used as a fair approximation of the mobile phase. At the end of each optimization a vibrational analysis was conducted to ensure that a minimum-energy geometry had been obtained. The conformational analyses, scanning the dihedral angle between the alkyl group and the aromatic or
heterocyclic ring, were performed for 1-(4-hydroxyphenyl)ethanol (4), 1-(4-methoxyphenyl)ethanol (13), 1-(2-naphthyl)ethanol (15), 1-[1,1′-biphenyl]-4-ylethanol (17), all 2- and 3-substituted aromatic compounds (5, 6, 12) and both heterocyclic compounds (9, 10). Moreover, the conformation of the alcoholic group was examined in each compound. Separate optimizations were performed for each of the localized deep minima, and the structure of the lowest energy was used in further analysis.
2.6. 2D QSRR descriptors
The following descriptors were used in the prediction of log k: AlogP98 and the numbers of H-bond donors and H-bond acceptors, generated by Cerius2, as well as HOMO and LUMO energies calculated at the DFT (PCM) level in the Gaussian 03 package.
2.7. CoMFA
All models were aligned with Materials Studio 4.22 [42] using the rigid-body, least-squares, template fitting method, with the heavy-atom core of (S)-1-phenylethanol as the reference structure (Fig. 3). The carbon atoms of the ring and the first two carbon atoms of the alkyl group were used as tethers. In the case of compounds with a 5-membered ring, all ring atoms were superimposed on the nearest five carbon atoms in such a way as to ensure the best overlay. Comparative Molecular Field Analysis was performed within the Cerius2 molecular modeling package [43]. Partial atomic charges were computed by the Gasteiger algorithm [44]. The energies of steric and electrostatic interactions were calculated in the universal force field (UFF) [45] with H+, CH3, CH3+, CH3− and OH− probe molecules, in a rectangular grid of 270 points with a 2 Å step size. For modeling of the global retention behavior the logarithm of the retention factor, log k, was used as the dependent y-variable. As a measure of retention order, the Retention Order Index (ROI) as well as ROI log α (i.e. ROI · log(k2/k1)) were applied. ROI was defined as a classification descriptor that takes the value −1 for the first eluting fraction and +1 for the second one. As a result it provides simple recognition of the retention order and enables analysis of the selectivity (log α) along with the elution succession.
Fig. 3. The overlay of the structures of all investigated compounds.
Field points included in the subsequent analysis were selected with genetic algorithms (population of 500 equations, 15,000 generations, a non-fixed number of parameters and a limit of 2–15 terms per equation), usually based on the Group I compounds. The only exception was the modeling of ROI log α, where the input variables had to be selected for the whole data set. Usually, from the population of parameters used in the 500 equations, descriptors with a frequency of occurrence of up to 4% in the equation population or, alternatively, variables from the top 10 models were selected and, after elimination of internal co-linearity, were used for development of the neural models.
2.8. Neural networks
For construction of the artificial neural network (ANN) models, the commercially available software package Statistica Neural Networks 7.1 was applied [46]. The selection of input parameters was conducted with a genetic algorithm (GA) in the Materials Studio package. The regression ANNs were constructed for prediction of log k, ROI log α and ROI based on the 20 stereoisomers of Group I, which were further subdivided into training (16) and validation (4) sets. The training cases were used for optimization of the neural weights and selection of the input vector. The validation cases were used to stop the training procedure when over-fitting occurred and to select the best model from the population of investigated networks. The models trained on such data were used to estimate retention parameters in the test set (Group II). The optimal network architecture was determined experimentally with the Intelligent Problem Solver (IPS), which built and selected the best models from linear networks (LIN), multilayer perceptrons with a linear output neuron (MLP) and generalized regression neural networks (GRNN). The IPS optimized both the input vector and the number of neurons in the hidden layers (in the case of MLP and GRNN). The architecture of a network is presented further on by the following syntax:
[network type] [no. of input variables]:[no. of input neurons]-[no. of hidden layer neurons]-[no. of output neurons]:[no. of output variables]
For example, 'MLP 3:3-2-1:1' denotes a multilayer perceptron with three input variables and three input neurons, two neurons in the hidden layer and one output neuron encoding one output variable. The equations obtained for linear networks are presented as standardized multiple linear regression equations, with β coefficients in place of the linear constants. The β coefficients indicate the relative importance of each variable in the model.
2.9. Cross-validation
Leave-one-out (LOO) and 5-fold cross-validation of the models were performed in the Materials Studio package.
For the linear models (models 2–5) the LOO and 5-fold algorithms were used to obtain the cross-validated R² and PRESS parameters. The cross-validations were performed for MLR models (corresponding to the linear networks developed in the Statistica package) over the training set (16 cases), thus providing a direct comparison of robustness between the cross-validated MLR models and the linear neural networks. For model 1 (MLP neural network) the LOO and 5-fold cross-validation was performed over randomly partitioned (16:4 ratio) training and validation cases. The validation cases were involved in the MLP cross-validation because they determine the end of the network training and therefore have some influence on the model parameters. After successful prediction of the retention order with model 1, those evaluation cases (Group II) for which additional premises existed (see Sections 2.1 and 2.3) were additionally treated as an external validation group for models 2–4, i.e. the consistency of the ROI prediction was checked. However, as this additional information was unavailable for some compounds of Group II, compounds 16 and 17 were excluded from this set. Models that exhibited large discrepancies in ROI prediction with respect to the estimate of model 1 were discarded from further analysis.
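The leave-one-out procedure for a multiple linear regression model can be sketched in a few lines. The example below is a generic, self-contained illustration and not the Materials Studio implementation: it assumes ordinary least squares with an intercept and the common definition of the cross-validated coefficient, 1 − PRESS/SS_total, which may differ from the definition used by the software. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares with intercept, via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear(ata, aty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_press_q2(rows, y):
    """Return (PRESS, cross-validated R2) from leave-one-out refits."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    mean_y = sum(y) / len(y)
    ss_total = sum((t - mean_y) ** 2 for t in y)
    return press, 1.0 - press / ss_total
```

Each case is left out once, the model is refitted on the remaining cases, and the squared prediction errors of the left-out cases are accumulated into PRESS. `rows` is expected to be a list of descriptor lists and `y` a list of responses.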
3. Results and discussion
3.1. Chiral chromatography
The retention data of all analyzed compounds are presented in Table 1 (training and validation cases) and Table 2 (evaluation group, i.e. test cases). For most of the cases from Group I the S isomers eluted first. The only reversal of the elution order occurred in the case of 1-(2-furyl)ethanol (9). In order to simplify prediction of the retention order, the Retention Order Index is introduced. ROI takes the value −1 for the first eluting fraction of the racemic alcohol and +1 for the stereoisomer eluting second. In order to address the selectivity of the separations together with the issue of retention order, the combined index ROI log α is used. This parameter yields information not only on the magnitude of the separation of the stereoisomers but also on the retention order. As such it can be associated with a particular compound (whereas α alone describes a pair of isomers and cannot be used to develop a QSERR for each stereoisomer). The log scale for selectivity is used in order to retain a uniform scale with the models predicting log k.
3.2. Prediction of elution order
The simplest model that performs flawless classification of the Group I compounds into R and S isomers is the MLP 2:2-1-1:1 neural network (model 1). It utilizes two input parameters, the CH3/80 and CH3+/170 field points, requires one neuron in the hidden layer and provides predicted ROI values at its output. It was trained with 100 epochs of back propagation, 20 epochs of conjugate gradient
and 33 epochs of the conjugate gradient algorithm with momentum. The training, validation and test errors are 7.6 × 10⁻⁴, 1.2 × 10⁻⁴ and 1.2 × 10⁻³, respectively. For all S isomers in Group II model 1 predicts an ROI value of −1, while correctly predicting the inversion of retention order for 1-(2-furyl)ethanol (9) (see Table 3). The overall R² between predicted and experimental values is 0.99 (training R² = 0.99, validation R² = 1.00). The 5-fold cross-validation yields lower R² values (training R² = 0.70, validation R² = 0.99), but all classifications are consistent with model 1. The sensitivity analysis shows that the most important parameter is CH3/80. This field point lies in the vicinity of the alcoholic group in the S isomers and attains a high energy due to steric interaction with the hydroxyl. It also takes different energies for 9: positive for the R isomer, whereas the R isomers of the other compounds have negative values close to 0, and positive for the S isomer but smaller than for the S isomers of the other compounds, which exhibit the maximum interaction energy. The other point is responsible for discrimination between the stereoisomers of 1-(2-furyl)ethanol (9). The R isomer has a smaller interaction energy with CH3+/170 than the rest of the compounds, which allows the inversion of elution order to be encoded. The spatial distribution of the points is presented in Fig. S1 in the Supplementary Materials. The model predicts the usual retention order for compounds 16 and 17, indicating that their retention orders are classical (which was expected in the case of 16 because of the similar behavior of 7 and 8). Therefore, this result also suggests reversed stereoselectivity in the enzymatic synthesis of 17. This simple model shows that Chiralcel OB-H discriminates between secondary alkylaromatic alcohols based on the position of the chiral OH group, with the exception of 9, where oxygen from
Table 3
Values of residuals obtained for all developed models. The Set column states whether the case was used for model development (training), validation or evaluation (test) with the particular model.

Number  Set         Model 1  Model 2–10%  Model 2–15%  Model 3–10%  Model 4–15%
(S)-1   Validation  3E−04    0.044        0.050        0.083        0.098
(R)-1   Training    −1E−05   −0.054       −0.062       0.015        −0.032
(S)-2   Training    3E−04    −0.025       −0.019       0.008        0.017
(R)-2   Validation  −1E−05   −0.032       −0.043       0.032        −0.002
(S)-3   Training    3E−04    0.016        0.013        −0.036       −0.011
(R)-3   Training    −1E−05   0.013        0.017        0.046        0.024
(S)-4   Validation  3E−04    0.033        0.032        −0.046       0.005
(R)-4   Training    −1E−05   −0.011       −0.008       −0.022       −0.035
(S)-5   Training    3E−04    0.012        −0.004       0.051        0.064
(R)-5   Training    −1E−05   0.002        0.011        −0.033       −0.087
(S)-6   Training    3E−04    0.029        0.037        0.028        0.013
(R)-6   Training    −1E−05   0.009        0.002        −0.043       0.028
(S)-7   Training    3E−04    −0.010       −0.008       0.012        −0.037
(R)-7   Validation  −1E−05   0.013        0.014        0.060        −0.034
(S)-8   Training    3E−04    −0.017       −0.013       −0.025       0.039
(R)-8   Training    −1E−05   0.032        0.025        −0.023       −0.011
(S)-9   Training    −6E−03   0.004        0.007        −0.026       −0.015
(R)-9   Training    −8E−05   −0.012       −0.014       0.036        −0.011
(S)-10  Training    3E−04    0.000        −0.006       −0.024       0.011
(R)-10  Training    −1E−05   0.015        0.021        0.036        0.044
11      Test        3E−04    −0.058       −0.052       0.117        0.181
11      Test        −1E−05   −0.011       −0.021       0.102        0.111
12      Test        3E−04    0.132        0.138        0.065        −0.087
12      Test        −1E−05   0.050        0.058        −0.129       −0.052
13      Test        3E−04    −0.013       −0.008       −0.049       −0.118
13      Test        −1E−05   −0.055       −0.056       0.013        −0.126
14      Test        3E−04    −0.030       −0.026       0.034        −0.154
14      Test        −1E−05   0.039        0.042        0.075        −0.162
15      Test        3E−04    0.058        0.057        0.081        0.016
15      Test        −1E−05   −0.011       −0.014       0.106        0.053
16      Test        3E−04    −0.019       −0.015       −0.062       −0.016
16      Test        −1E−05   −0.001       −0.005       −0.031       −0.035
17      Test        3E−04    −0.045       −0.054       −0.078       −0.009
17      Test        −1E−05   0.069        0.076        0.047        0.063
heterocyclic ring is involved in the chiral recognition. For all other compounds the unfavorable interaction between the S–OH group and the stationary phase leads to decreased retention of the S isomers. CoMFA models are a complementary way to probe ligand–receptor interactions when compared to the Chiracel OB enantiophore proposed by Del Rio et al. [28], i.e. both the enantiophore and the CoMFA models should describe corresponding molecular interactions. As all analyzed solutes possess a ring, which was used as the superimposition template, we can assume that the presence of an aromatic moiety is required by the chiral site for all investigated 1-phenylethanol derivatives separated on Chiracel OB-H. Besides this, two points are involved in enantiodiscrimination: CH3/80, which is involved in strong destabilization of the CSP–S complexes of all compounds but (S)-9 (while (R)-9 is slightly stabilized), and CH3+/170, which is involved in destabilization of (R)-9. This difference between 9 and the rest of the solutes also comes, besides sheer molecular dissimilarity, from the conformational arrangement imposed by intramolecular interactions between the furan ring and the hydroxyl group. This results in a different OH position in comparison to the rest of the compounds (see Fig. 3) and consequently in a different interaction with the field points. Nevertheless, one can see that the S–OH group in the diastereomeric complex encounters steric hindrance. However, CH3/80 correlates highly (R > 0.9) with acceptor/donor (OH−) interactions localized around both R and S hydroxyl groups. Therefore, the complex-destabilizing steric effect of the interaction with (S)–OH can be replaced by a complementary stabilizing H-bond acceptor interaction with (R)–OH. A similar classifying model can be obtained by replacing CH3/80 with HO−/224. However, such a model cannot correctly describe the interactions of (S)-9, which leads to an incorrect description of the reversal of elution order (i.e. both isomers receive the negative 'S-like' code).
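The functional form of model 1, described above as an MLP 2:2-1-1:1 network fed with the CH3/80 and CH3+/170 field points, is small enough to write out. The following is only a minimal sketch of such a forward pass: the tanh activation, the linear output unit and all weight values are assumptions made for illustration and are not the fitted parameters of model 1.

```python
import math

def model1_forward(ch3_80, ch3p_170, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 2-input, 1-hidden-neuron, 1-output perceptron.

    The tanh hidden activation and linear output are assumptions;
    the source does not state which activation functions were used.
    """
    hidden = math.tanh(w_hidden[0] * ch3_80 + w_hidden[1] * ch3p_170 + b_hidden)
    return w_out * hidden + b_out

def roi_class(output):
    """Map the continuous network output onto the ROI code (-1 or +1)."""
    return -1 if output < 0 else 1

# Hypothetical weights chosen only to make the example run,
# NOT the fitted values of model 1.
W_HIDDEN, B_HIDDEN, W_OUT, B_OUT = (1.0, -1.0), 0.0, -1.0, 0.0
```

With these illustrative weights a high CH3/80 energy drives the output towards the −1 (first-eluting) code, which mirrors the qualitative role the text assigns to that field point.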
Summing up, the enantiophore of the secondary alkylaromatic alcohols is indeed composed of three points, but only two are involved in the interactions with a particular stereoisomer (as observed by Wainer et al. [13]). In each case the aromatic ring is involved (see below) along with either a stabilizing H-bond acceptor interaction with (R)–OH groups or a destabilizing (steric) interaction with (S)–OH groups. One can also speculate that the complex of the CSP with (R)-9 is destabilized by unfavorable interactions of an H-bond acceptor (usually an electronegative atom) with the oxygen of the furan ring, which is reflected by CH3+/170. Of course, the third enantiophore point of Del Rio et al., the H-bond donor, could not be observed in these studies, as the analyzed solutes have no H-bond acceptor other than the OH group in the vicinity of the chiral center. Therefore, for 1-phenylethanol derivatives with an additional H-bond acceptor substituent in the alkyl group, the three-point binding mode proposed by Del Rio et al. may indeed be seen. Apparently, the selection of the training compounds in QSRR modeling can, to some extent, influence the result of the research.

3.3. Prediction of selectivity (ROI log α)

The modeling of the selectivity α was a natural next step in the modeling of the chiral separation mechanism of 1-phenylethanol derivatives. As mentioned above, selectivity is modeled with a combined descriptor, i.e. ROI multiplied by log α. As a result, log α for the first-eluting stereoisomer assumes a negative value, while for the later-eluting fraction it is positive. Unexpectedly, to select good input parameters, the GA analysis had to be conducted over all cases (34 isomers) for 15,000 generations, with a population of 500 equations (5–10 terms per equation). The resulting equations exhibited very high correlation and cross-validation R2 (in the range of 0.98).
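The combined descriptor defined above can be computed directly from the retention factors of the two fractions. The sketch below assumes the usual definition α = k2/k1 (later-eluting over first-eluting fraction) and a base-10 logarithm; the function name and the numerical k values in the usage note are hypothetical.

```python
import math

def roi_log_alpha(k_first, k_second):
    """Return ROI*log(alpha) for the first- and second-eluting isomer.

    k_first, k_second: retention factors of the first and second fraction.
    Assumes alpha = k_second / k_first and a base-10 logarithm.
    """
    if k_first <= 0 or k_second < k_first:
        raise ValueError("expected 0 < k_first <= k_second")
    log_alpha = math.log10(k_second / k_first)
    # ROI is -1 for the first-eluting fraction and +1 for the second one.
    return -1 * log_alpha, +1 * log_alpha
```

For example, `roi_log_alpha(2.0, 3.0)` returns a pair of equal magnitude and opposite sign, so each stereoisomer carries its own value of the dependent variable.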
As a next step, the 8 input parameters of the best equation were transferred to ISP and used to select the best architecture of the neural model. Compounds were divided into training (16), validation (4) and test (14) groups. The best model obtained
Fig. 4. Correlation plot between the experimental and predicted ROI log α for (a) 90/10 mobile phase, model 2–10% (LIN 6:6–1:1 neural network, R2 = 0.93, R = 0.97) and (b) 85/15 mobile phase, model 2–15% (LIN 6:6–1:1 neural network, R2 = 0.93, R = 0.96). S-2-CH3 stands for (S)-1-(2-methylphenyl)ethanol (12) while (S)-4-Ph and (R)-4-Ph stand for (S)- and (R)-1-[1,1′-biphenyl]-4-ylethanol (17). The dashed lines mark the 95% confidence interval of the predicted dependent variable (see Supplementary Materials for more information).
is a linear network LIN 6:6–1:1 (model 2–10%), which has training, validation and test errors equal to 0.0465, 0.0727 and 0.1185, respectively. Model 2–10% correctly predicts the retention order in Group I. Moreover, its predictions for compounds from Group II are consistent with the retention order calculated by model 1. The R2 between experimental and predicted values (Fig. 4a) is 0.93 (training R2 = 0.98, validation R2 = 0.95, LOO c–v R2 = 0.95, 5-fold c–v R2 = 0.93). The ROI log α values obtained for both mobile phases (90/10 and 85/15) correlate with each other at an R2 level of 0.99. Therefore, based on the same descriptors one can easily construct a model of selectivity for the mobile phase containing 15% of i-PrOH (model 2–15%). For such a model the training, validation and test errors are 0.04727, 0.0799 and 0.1196, respectively, while the R2 between experimental and predicted values (Fig. 4b) is 0.93 (training R2 = 0.98, validation R2 = 0.99, LOO c–v R2 = 0.93, 5-fold c–v R2 = 0.94). The only significant deviation from the experimental values is found for 1-(2-methylphenyl)ethanol (12), for which both models predict smaller selectivity. Interestingly, the log α values predicted for 17 are 5–8 times bigger than those found in experiment, but the retention order is predicted to proceed without any inversion.
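Because a LIN 6:6–1:1 network is a linear combination of its six inputs, model 2–10% can be evaluated as a plain weighted sum, using the coefficients of the MLR equation reported for it in this section (ROI log α10% = 0.42 CH3−/34 + 0.40 HO−/58 − 0.80 HO−/99 − 0.16 HO−/128 − 1.13 CH3+/80 + 0.17 CH3+/159 + 0.25). The sketch below is illustrative only: the dictionary keys are ad hoc labels, and whether the published coefficients apply to raw or scaled field energies is not stated in the text, so the input scale is an assumption.

```python
# Coefficients of the model 2-10% equation as reported in the text.
MODEL_2_10 = {
    "CH3-/34": 0.42,
    "HO-/58": 0.40,
    "HO-/99": -0.80,
    "HO-/128": -0.16,
    "CH3+/80": -1.13,
    "CH3+/159": 0.17,
}
INTERCEPT_2_10 = 0.25

def predict_roi_log_alpha(descriptors):
    """Weighted sum of the six field-point values plus the intercept.

    descriptors: dict with one value per key of MODEL_2_10
    (input scaling is an assumption, see the lead-in).
    """
    return INTERCEPT_2_10 + sum(
        coef * descriptors[name] for name, coef in MODEL_2_10.items()
    )

def predicted_fraction(value):
    """Negative ROI*log(alpha) codes the first-eluting isomer."""
    return "first-eluting" if value < 0 else "second-eluting"
```

The sign of the returned value gives the predicted elution order and its magnitude the predicted log α, which is why a single model can serve both purposes.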
Two MLR equations corresponding to the LIN 6:6–1:1 networks are presented below:

model 2–10%
\[ \mathrm{ROI}\,\log\alpha_{10\%} = 0.42\,\mathrm{CH_3^{-}}/34 + 0.40\,\mathrm{HO^{-}}/58 - 0.80\,\mathrm{HO^{-}}/99 - 0.16\,\mathrm{HO^{-}}/128 - 1.13\,\mathrm{CH_3^{+}}/80 + 0.17\,\mathrm{CH_3^{+}}/159 + 0.25 \]
n = 16, R = 0.99, R2 = 0.98, corr. R2 = 0.97, F(6, 9) = 72.32, LOO c–v R2 = 0.95, PRESS = 0.019

model 2–15%
\[ \mathrm{ROI}\,\log\alpha_{15\%} = 0.45\,\mathrm{CH_3^{-}}/34 + 0.40\,\mathrm{HO^{-}}/58 - 0.79\,\mathrm{HO^{-}}/99 - 0.16\,\mathrm{HO^{-}}/128 - 1.14\,\mathrm{CH_3^{+}}/80 + 0.16\,\mathrm{CH_3^{+}}/159 + 0.26 \]
n = 16, R = 0.99, R2 = 0.98, corr. R2 = 0.96, F(6, 9) = 66.80, LOO c–v R2 = 0.93, PRESS = 0.025

The networks use CH3+/80 as a parameter corresponding to CH3/80, which was used by model 1 to distinguish between S and R isomers. However, for correct determination of selectivity more field points have to be used, namely CH3−/34 and CH3+/159 for steric and mixed electrostatic–steric interactions and HO−/58, HO−/99 and HO−/128 for acceptor–donor interactions (see Fig. 5). The analysis of the β coefficients in the model 2 equations and the network sensitivity analyses suggest that the electrostatic–steric interaction at CH3+/80 has the strongest influence on ROI log α. As in the case of model 1, this field point is responsible for discrimination between S and R isomers. For most S isomers (but not for 9), high positive energy values of CH3+/80 yield a negative value of ROI log α. The remaining points seem to be involved in modulation of selectivity (Fig. 5) and are localized far from the stereogenic center, near the para position and below all molecules. The exception is HO−/128, which is in close contact with both S and R hydroxyl groups and assumes small values only for (R)-1, (R)-6, (S)-6, (R)-10 and (R)-12. Next in importance is HO−/99, which detects polar substituents in the para position, followed by HO−/58 and CH3−/34. The HO−/128 descriptor is the least important in both models.

Fig. 5. 3D localization of field points involved in model 2–10%. The grey and black crosses indicate localization of the field points that increase or decrease ROI log α, respectively. Crosses were scaled according to relative descriptor importance in the model.

3.4. Prediction of retention

Among all studied 2D descriptors only the number of H-bond donor and acceptor atoms, the HOMO energy and the energy difference between the HOMO and LUMO orbitals exhibited a statistically significant correlation with log k. The strongest correlation existed between the retention factors and the HOMO energy. The variation of the HOMO energy described 82% and 78% of the variation in log k (10% and 15% of isopropanol, respectively). However, this parameter assumes the same (or very close) values for the S and R isomers and, in consequence, cannot be used as a chiral discrimination descriptor. Therefore, to obtain a model that correctly predicts both chiral separation and overall retention one must also use the 3D descriptors. As in the previous cases, genetic algorithms were used to select the initial population of input parameters. The top seven variables were used by ISP, which scanned 2000 neural networks for the best model. The best results are obtained for the linear model LIN 5:5–1:1, which uses, besides the HOMO energy, four field point parameters: H+/117, CH3/73, CH3/227 and CH3−/62. The training, validation and test errors are 0.039, 0.073 and 0.097, respectively. The R2 between predicted and experimental data (Fig. 6a) is 0.97 (training R2 = 0.98, validation R2 = 0.99, LOO c–v R2 = 0.94, 5-fold c–v R2 = 0.95).
Fig. 6. Correlation plot between the experimental and predicted log k for: (a) model 3–10%, LIN 5:5–1:1 neural network (R = 0.99, R2 = 0.97) and (b) model 4–15%, LIN 4:4–1:1 neural network (R2 = 0.95, R = 0.97). The dashed lines mark the 95% confidence interval of the predicted dependent variable.
As log k10% correlates with log k15% at an R2 level of 0.98, the same neural model can be retrained to predict retention for the mobile phase containing 15% of i-PrOH with R2 = 0.95. However, based on the same population of descriptors one can obtain an even simpler model (4 parameters) whose performance is no worse than that of the 5-parameter model. In the LIN 4:4–1:1 neural network (training error 0.057, validation error 0.081, test error 0.158) CH3/227 is replaced by the highly correlated CH3/274 (R = 0.85), and CH3−/62 is not introduced into the model because of its low statistical significance and a ratio below 1.0 in the sensitivity analysis. As a result, model 4–15% exhibits an R2 of 0.95 (training R2 = 0.96, validation R2 = 0.99, LOO c–v R2 = 0.93, 5-fold c–v R2 = 0.92, see Fig. 6b). Both models, besides accurately predicting retention, correctly predict the elution order in Group II and are 100% consistent with the predictions of model 1 (see Fig. 6 and Table 3). Moreover, they predict the selectivity fairly well, with R2 between experimental and predicted ROI log α of 0.75 and 0.85 for model 3–10% and model 4–15%, respectively. Model 3 and model 4 correspond to the MLR equations below, given with normalized β coefficients that describe relative variable importance.

model 3–10%
\[ \log k_{10\%} = 0.87\,\mathrm{HOMO} + 0.49\,\mathrm{CH_3}/73 + 0.48\,\mathrm{CH_3^{-}}/62 + 0.42\,\mathrm{CH_3}/227 - 0.27\,\mathrm{H^{+}}/117 + 9.68 \]
n = 16 (training), R = 0.99 (training), R2 = 0.98, corr. R2 = 0.97, LOO c–v R2 = 0.94, PRESS = 0.049, F = 105.46

model 4–15%
\[ \log k_{15\%} = 0.91\,\mathrm{HOMO} + 0.44\,\mathrm{CH_3}/73 + 0.49\,\mathrm{CH_3}/274 - 0.59\,\mathrm{H^{+}}/117 + 7.64 \]
n = 16 (training), R = 0.98 (training), R2 = 0.96, corr. R2 = 0.92, LOO c–v R2 = 0.93, PRESS = 0.039, F = 70.49

The analysis of β coefficients and the sensitivity analysis of model 3–10% suggest that the HOMO descriptor has the strongest influence on the value of log k, while H+/117 has the weakest. Both the β coefficients and the sensitivity analysis point to the CH3/73 field point as second in importance.
The field point is localized far below all molecules and its energy oscillates around −0.3 kcal (see Fig. 7a). It does not correlate precisely with log k, but it assumes different values for almost every compound, which allows modulation of the overall retention described by the HOMO parameter. CH3/227 is involved in encoding the inversion of elution order of 9, as it attains a less negative value for the (S)-9 isomer (higher retention). Moreover, the energies of the remaining compounds are narrowly distributed and separated from the energies of 1-(2-furyl)ethanol (9). Third in importance is the CH3−/62 descriptor, which correlates with log k at the R = 0.6 level and assumes positive energy values for more polar compounds. Moreover, it discriminates 6 and 9, attaining low energy values most probably due to direct interactions with the negatively charged oxygen atoms of the substituted ring. If that point is removed, the inversion of elution order for 9 is no longer predicted correctly. In the case of model 4–15%, the analysis of β coefficients marks the HOMO energy as the most important parameter, followed by H+/117 and finally the almost equally important CH3/274 and CH3/73. The sensitivity analysis points to the H+/117 field point as the most crucial, followed by the HOMO energy. The other two points are equally important. In model 4 the CH3/274 descriptor plays the same role as CH3/227 in model 3, i.e. it is involved in the recognition of (S)-9 and, in consequence, in the simulation of its different chromatographic behavior (Fig. 7b). As to the role of the HOMO, it seems that the higher its energy for a particular compound, the longer the retention time. In alkylaromatic alcohols the HOMO describes the delocalized electron density of the aromatic ring, the free electron pairs of the oxygen atom and some electron density of the alkyl group. As retention on the tribenzoate cellulose CSP originates from π–π interactions, dispersive interactions of the alkyl chain and hydrogen bonding between the hydroxyl group of the alcohol and the carbohydrate, the electrons described by the HOMO would naturally play a vital role.
The occurrence of frontier orbitals in QSRR equations is frequently explained in terms of solute–stationary phase donor (HOMO) and acceptor (LUMO) interactions (e.g. Fabian and co-workers [47] and Wainer and co-workers [48]). This may suggest that upon formation of the solute–CSP complex that is involved in overall retention (but not directly in chiral recognition), some form of charge transfer from the analyte toward the tribenzoate cellulose moiety takes place. The higher the HOMO lies on the energy scale, the less energy such a process would require. However, a statistically significant correlation does not automatically imply a cause–consequence relationship. The introduction of heteroatoms into the studied alcohols not only increases the HOMO energy but also allows binding to the stationary phase through additional dipole–dipole or hydrogen bond interactions. Indeed, log k correlates with the number of H-donor atoms at the 0.78 (10% IPA) and 0.71 (15% IPA) level. Therefore, one cannot determine the type of interactions involved based on the statistical analysis alone. Nevertheless, it seems apparent that the overall polarity of the compound (which is also described by the energy difference between the frontier orbitals) is responsible for retention on the polar CSP. This is additionally confirmed by model S1, developed for all cases, which uses AlogP98. The model indicates that for high log P values (high hydrophobicity) short retention is observed. This result is in accordance with the correlation of log k with the Hansch fragment constant [49] found by Wainer et al. [13]. Moreover, a relation between log k and solute hydrophobicity, expressed in terms of lipophilicity, was previously observed for the polysaccharide CSPs [9,10].
4. Quality of models
Fig. 7. 3D localization of field points involved in (a) model 3–10% (b) model 4–15%. The grey and black crosses indicate localization of the field points that increase or decrease log k, respectively. Crosses were scaled according to relative descriptor importance in the model.
Figs. 4 and 6 and Figs. S2 and S4 show, marked with dashed lines, the 95% confidence intervals of the estimated values (either log k or ROI log α). These intervals describe the confidence range of the estimation of a particular dependent variable (see Supplementary Materials). Table 3 provides the list of residuals for all investigated
Table 4
Statistical parameters of prediction models. Training R2: determined for 16 training cases; validation R2: determined for 4 validation cases; LOO c–v R2: leave-one-out R2 for training cases; 5-fold c–v R2: 5-fold cross-validation R2 for training and validation cases in the case of model 1 and for training cases for models 2–5; overall R2: for correlation with all data, presuming classical order of elution as predicted by model 1; PRESS: predictive sum of squares.

Model        Training R2  Validation R2  LOO c–v R2   5-Fold c–v R2  Overall R2  PRESS
Model 1      0.99         1.00           0.87, 0.88a  0.70, 0.99a    0.99        2.480
Model 2–10%  0.98         0.95           0.95         0.93           0.93        0.019
Model 2–15%  0.98         0.99           0.93         0.94           0.93        0.025
Model 3–10%  0.98         0.99           0.94         0.95           0.97        0.049
Model 4–15%  0.96         0.99           0.93         0.92           0.95        0.039
Model 5      0.97         0.95           0.94         0.95           0.96        0.096

a Cross-validation R2 for validation set.
models while Table 4 sums up their statistical parameters. Both leave-one-out and 5-fold cross-validations yielded c–v R2 values in the same range as those obtained for the models. Only for the training set of model 1 did 5-fold cross-validation show lower model performance, which, however, did not influence the result of the classification. Moreover, both absolute and normalized residuals are presented graphically in Fig. S5 (Supplementary Materials). As can be expected, the smallest deviations and best R2 parameters are found for the simplest model 1, predicting ROI. Fig. S5a shows that residuals are clearly bigger in the evaluation group than in the training and validation groups. This is natural, as the predictions are made for compounds with substituents different from those present in the training group. For the models predicting selectivity the biggest absolute deviations are encountered for 1-(2-methylphenyl)ethanol (12), with a systematic shift of both isomers towards more positive values. However, the analysis of the normalized plot (Fig. S5b) suggests that the biggest errors are introduced into the prediction of ROI log α of the first fraction of 1-(2-naphthyl)ethanol (15) and both fractions of 1-[1,1′-biphenyl]-4-ylethanol (17). The biggest systematic deviation for model 3–10% is found for 1-(4-methylphenyl)ethanol (11) and 15, where both isomers are shifted towards higher retention, while the highest normalized deviations are encountered for 12. Model 4–15% exhibits the biggest systematic residual shifts for 14 (negative) and for 11 (positive). However, the normalized plot shows that the error is not higher than 20% of the predicted log k value. The residual distribution seems to be random and does not exhibit trends that would mark a bad fit of the model. It may seem that, in the case of models predicting retention, the model quality would benefit from extending the training group, especially with methyl-substituted congeners.
However, even if the network is obtained with the whole data set as the training group, the residuals for 12 are still high. This indicates an apparent inability of the network to predict the retention of 12 precisely. It might be associated with the lack of a crucial molecular interaction in the model. This in turn might have its basis in the GA, which selected independent variables based on the training group. This can be avoided if the variables are selected for the whole data set with both GA and stepwise MLR (to exclude auto-correlations). In the case of log k10% prediction this leads to a 6-term equation (model S1) using some different field points and AlogP98 (R2 = 0.98, see Supplementary Material, Figs. S2 and S3). However, such an approach exploits knowledge that is restricted to the evaluation group, which is generally not good practice in model development. Such a model can nevertheless be used to investigate the molecular interactions crucial for the retention behavior of the whole data set, provided that more rigorous models were previously developed for prediction of the retention order.
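The leave-one-out statistics quoted in this section (LOO c–v R2 and PRESS) can be illustrated on the simplest possible case, a single-descriptor regression such as log k against HOMO energy. The sketch below is a simplification under stated assumptions: it uses one descriptor instead of the multi-term models of the paper, and it takes the common definition c–v R2 = 1 − PRESS/SS_tot, which the text does not spell out. All function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_press_q2(x, y):
    """Leave-one-out PRESS and cross-validated R2 (1 - PRESS/SS_tot)."""
    press = 0.0
    for i in range(len(x)):
        # Refit the line without case i, then predict the left-out case.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return press, 1.0 - press / ss_tot
```

Called with two equal-length lists of descriptor values and log k values, the function returns PRESS and the cross-validated R2; a c–v R2 close to the fitted R2, as reported in Table 4, indicates that no single case dominates the fit.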
5. The effect of the mobile phase

As expected, an increase in the concentration of the polar organic solvent (isopropanol) in the mobile phase leads to a decrease in the retention time of all investigated compounds. This effect is a result of competitive interactions of the polar organic solvent and solute molecules with the stationary phase [50] and is, from the practical point of view, equivalent to increasing the organic solvent content in RPLC. The retention of the investigated compounds in both mobile phases turns out to be linearly correlated (R2 = 0.98), which enables application of prediction models developed for one phase to the other. Moreover, one can easily recalculate log k from 10% i-PrOH to 15% i-PrOH conditions with the following formula:

\[ \log k_{15\%} = -0.0784 + 0.8984 \times \log k_{10\%} \]

However, as was shown by Wainer et al. [51], the competition for the CSP binding sites is in fact a saturable process and the maximum effect can be reached at a certain concentration of polar modifier. As a result, the observed linear correlation holds only locally, for similar i-PrOH concentrations. The relation over a wider concentration range usually turns out to be non-linear. This was exemplified by Armstrong and Berthod in the non-linear dependence of log k on ethanol concentration found for the separation of 5-methyl-5-phenylhydantoin in the NP mode [8] and by Wainer et al. for the separation of 1-phenylethanol derivatives on Chiracel OB [13]. Establishing this relation for 1-phenylethanol derivatives would require a broader investigation over a wider range of i-PrOH concentrations. However, such a study for the selected data set is difficult because of the increasingly high retention of 14 (approx. 2 h for 10% isopropanol) on the one hand and the decrease in resolution with higher isopropanol content for 17 (solute virtually not resolved in 15% isopropanol) on the other. Nevertheless, the observed linear relation enables formulation of a simple model describing retention in both mobile phases (LIN 6:6–1:1; training, validation and test errors: 0.04659, 0.0704 and 0.0839).
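The recalculation between the two mobile phases given above (log k15% = −0.0784 + 0.8984 × log k10%) is a one-line transformation. The sketch below wraps it in two helper functions whose names are hypothetical; a base-10 logarithm is assumed, and, as the text stresses, the relation should only be trusted locally, for i-PrOH contents close to 10% and 15%.

```python
import math

def log_k15_from_log_k10(log_k10):
    """Convert log k measured in 90/10 n-hexane/i-PrOH to the 85/15 scale."""
    return -0.0784 + 0.8984 * log_k10

def k15_from_k10(k10):
    """Same conversion expressed on retention factors (base-10 log assumed)."""
    return 10 ** log_k15_from_log_k10(math.log10(k10))
```

Since the slope is below 1 and the intercept is negative, every compound with log k10% > 0 is predicted to elute earlier at 15% i-PrOH, in line with the competitive-interaction picture described in this section.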
Model 5 exhibits very good statistical parameters (overall R2 = 0.97, training R2 = 0.97, validation R2 = 0.95, LOO c–v R2 = 0.94, 5-fold c–v R2 = 0.95, see Fig. S4) and corresponds to the following MLR equation:

model 5
\[ \log k = -0.43\,C\% + 0.76\,\mathrm{HOMO} + 0.48\,\mathrm{CH_3}/73 + 0.39\,\mathrm{CH_3^{-}}/62 + 0.44\,\mathrm{CH_3}/227 - 0.29\,\mathrm{H^{+}}/117 + 9.64 \]
n = 32, R = 0.98, R2 = 0.97, corr. R2 = 0.96, F = 117.53, LOO c–v R2 = 0.95, PRESS = 0.096

As can be seen from the equation, model 5 uses the same variables as model 3. The only new parameter is the concentration of i-PrOH, C%. The negative β at C% means that an increase in isopropanol concentration leads to a decrease of the retention factor. Such a model could not be built for selectivity, as the influence of i-PrOH on log α is not uniform. Although one normally expects a decrease of α upon an increase of the polar solvent concentration, quite the reverse effect was observed for several investigated
solutes (1, 6, 11–13). However, these changes were in fact relatively small and in our opinion do not provide enough data for a sound analysis. Similar inconsistent behavior of α, i.e. no change or no systematic change, for 1-phenylethanol, 1-phenylpropanol and 2-phenylpropan-1-ol over a much wider isopropanol range (1–25%) was also observed by Wainer et al. [13].

6. Stereospecificity of ethylbenzene dehydrogenase

The modeling of retention order in Group II proved that EBDH is indeed a highly S-stereospecific enzyme. For almost all its substrates (12 compounds), EBDH catalyzes oxidation to S secondary alcohols. The exceptions are found for compounds substituted in the para position of the aromatic ring with polar substituents (3 ee% = 90%, 4 ee% = 60% and 14 ee% = 51%). Moreover, the synthesis of 17 seems to proceed predominantly towards the R isomer, although the S isomer is also present among the reaction products. However, it is too early to speculate on the possible causes of the observed effects. A more detailed investigation of the enzyme stereospecificity is being conducted with the help of both LC-MS methods and quantum chemical modeling of the reaction mechanism.

7. Conclusions

The problem of predicting elution order and chiral separation is analyzed based on experimental results, premises from chromatographic analysis of enzymatic reaction mixtures and computer modeling. Various approaches are used to determine the chromatographic behavior of the training and evaluation cases. The simplest model, model 1, describes elution order by means of the Retention Order Index, predicting classical behavior for all solutes in the evaluation group. Interaction with the alcoholic hydroxyl group and the heterocyclic oxygen atom of the furan ring is identified as crucial for determination of the retention order. Model 2 is developed for both mobile phases and predicts both ROI and selectivity (log α) with high quality (R2 = 0.93).
Both models correctly predict the retention order and are able to model log α with high accuracy based on field point descriptors. Crucial interactions with the chiral center and ring substituents are identified. Three models describing log k are developed that are able to predict retention order and log k with excellent accuracy (R2 from 0.94 to 0.97). Selectivity log α is also described, alas only with moderate precision (R2 in the range of 0.8). Apparently, different molecular interactions are responsible for retention and for selectivity of the CSP. The HOMO energy turns out to be the most crucial parameter in log k determination. It can be associated either with a solute–CSP charge transfer interaction or with the correlation of HOMO level shifts with increased solute polarity due to the introduction of heteroatomic substituents. Our study confirms the conclusion drawn by Wainer et al. [13], who attributed chiral recognition of alkylaromatic alcohols on the tribenzoate cellulose CSP to an H-bond interaction with the hydroxyl group. Stabilization of the aromatic ring seems to be responsible for the selectivity (i.e. the extent of resolution) rather than for the chiral recognition. The aromatic ring is apparently involved in the stabilization of the CSP–solute complex and may be responsible for positioning the solute in the CSP chiral cavity, thus indirectly influencing the efficiency of the chiral recognition. Additionally, different CSP–ring interactions seem to be involved in determination of retention and selectivity. Finally, the obtained models are applied to the estimation of EBDH stereoselectivity. High stereoselectivity of the catalyzed reaction is confirmed for most of the investigated cases. A reversal of the usual S stereoselectivity is observed in the case of 1-[1,1′-biphenyl]-4-ylethanol (17).
Acknowledgements

The authors acknowledge the computational grant KBN/SGI2800/PAN/037/2003 and financial support of the scientific network EKO-KAT. The authors gratefully thank anonymous referees for the thoughtful discussion and constructive remarks.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the on-line version, at doi:10.1016/j.chroma.2009.07.002.

References

[1] R. Kaliszan, Chem. Rev. 107 (2007) 3212.
[2] S. Schefzick, Ch. Kibbey, M.P. Bradley, J. Comb. Chem. 6 (2004) 916.
[3] R. Puta, Y. Vander Heyden, Anal. Chim. Acta 602 (2007) 164.
[4] K. Heberger, J. Chromatogr. A 1158 (2007) 273.
[5] P.C. Sadek, P.W. Carr, R.M. Doherty, M.J. Kamlet, R.W. Taft, M.H. Abraham, Anal. Chem. 57 (1985) 2971.
[6] P.W. Carr, R.M. Doherty, M.J. Kamlet, R.W. Taft, W. Melander, C. Horvath, Anal. Chem. 58 (1986) 2674.
[7] A. Berthod, C.R. Mitchell, D.W. Armstrong, J. Chromatogr. A 1166 (2007) 61.
[8] C.R. Mitchell, D.W. Armstrong, A. Berthod, J. Chromatogr. A 1166 (2007) 70.
[9] C. Roussel, C. Popescu, Chirality 6 (1994) 251.
[10] C. Roussel, B. Bonnet, A. Piederriere, C. Suteu, Chirality 13 (2001) 56 (ref. therein).
[11] T.D. Booth, I.W. Wainer, J. Chromatogr. A 737 (1996) 157.
[12] G. Götmar, T. Fornstedt, G. Guiochon, Chirality 12 (2000) 558.
[13] I.W. Wainer, R.M. Stiffin, T. Shibata, J. Chromatogr. 411 (1987) 139.
[14] W.H. Pirkle, D.W. House, J. Org. Chem. 44 (1979) 1957.
[15] A. Del Rio, J. Sep. Sci. 32 (2009) 1566.
[16] R.D. Cramer III, D.E. Patterson, J.D. Bunce, J. Am. Chem. Soc. 110 (1988) 5959.
[17] A. Del Rio, P. Piras, Ch. Roussel, Chirality 18 (2006) 498.
[18] S. Schefzick, M. Lämmerhofer, W. Lindner, K.B. Lipkowitz, M. Jalaie, Chirality 12 (2000) 742.
[19] C. Altomare, A. Carotti, S. Cellamare, F. Fanelli, F. Gasparrini, C. Villani, P.-A. Carrupt, B. Testa, Chirality 5 (1993) 527.
[20] W.M.F. Fabian, W. Stampfer, M. Mazur, G. Uray, Chirality 15 (2003) 271.
[21] A. Golbraikh, D. Bonchev, A. Tropsha, J. Chem. Inf. Comput. Sci. 41 (2001) 147.
[22] J. Aires-de-Sousa, J. Gasteiger, J. Chem. Inf. Comput. Sci. 41 (2001) 369.
[23] J. Aires-de-Sousa, J. Gasteiger, J. Mol. Graph. Model 20 (2002) 373.
[24] J. Aires-de-Sousa, J. Gasteiger, I. Gutman, D. Vidović, J. Chem. Inf. Comput. Sci. 44 (2004) 831.
[25] A. Del Rio, J. Gasteiger, QSAR Comb. Sci. 27 (2008) 1326.
[26] S. Caetano, J. Aires-de-Sousa, M. Daszykowski, Y. Vander Heyden, Anal. Chim. Acta 544 (2005) 315.
[27] C. Roussel, A. Del Rio, J. Pierrot-Sanders, P. Piras, N. Vanthuyne, J. Chromatogr. A 1037 (2004) 311.
[28] A. Del Rio, P. Piras, C. Roussel, Chirality 17 (2005) S74.
[29] C. Zhao, N.M. Cann, J. Chromatogr. A 1149 (2007) 197.
[30] K.B. Lipkowitz, J. Chromatogr. A 906 (2001) 417.
[31] M. Szaleniec, C. Hagel, M. Menke, P. Nowak, M. Witko, J. Heider, Biochemistry 46 (2007) 7637.
[32] T. Saito, Y. Nishimoto, M. Yasuda, A. Baba, J. Org. Chem. 71 (2006) 8516.
[33] P.V. Ramachandran, B. Gong, H.C. Brown, Tetrahedron Lett. 35 (1994) 2141.
[34] E.T. Everhart, J.C. Craig, J. Chem. Soc. Perkin Trans. 1 (1991) 1701.
[35] O. Kniemeyer, J. Heider, J. Biol. Chem. 276 (2001) 21381.
[36] A. Dudzik, M. Pawul, M. Szaleniec, B. Kozik, M. Witko, in: B. Sulikowski (Ed.), Proceedings of Catalysis for Society Conference, Kraków, 2008, p. 236.
[37] CAChe WS Pro v 7.5, Fujitsu Limited, Tokyo, Japan.
[38] J.J.P. Stewart, J. Comp. Chem. 10 (1989) 221.
[39] Gaussian 03, Revision D.01, M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, J.A. Montgomery, Jr., T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G.A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J.E. Knox, H.P. Hratchian, J.B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev, A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K. Morokuma, G.A. Voth, P. Salvador, J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels, M.C. Strain, O. Farkas, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J.V. Ortiz, Q. Cui, A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, C. Gonzalez, J.A. Pople, Gaussian, Inc., Wallingford CT (2004).
[40] A.D. Becke, J. Chem. Phys. 98 (1993) 5648.
[41] M. Cossi, G. Scalmani, N. Rega, V. Barone, J. Chem. Phys. 117 (2002) 43.
[42] Accelrys, Inc., Accelrys Material Studio v4.2.2, San Diego: Accelrys Software Inc. (2007).
[43] Accelrys, Inc., Cerius2 Modeling Environment, Release 4.8, San Diego: Accelrys Software Inc. (2005).
[44] J. Gasteiger, M. Marsili, Tetrahedron 36 (1980) 3219.
[45] A.K. Rappe, C.J. Casewit, K.S. Colwell, W.A. Goddard, W.M. Skiff, J. Am. Chem. Soc. 114 (1992) 10024.
[46] StatSoft, Inc. STATISTICA (data analysis software system), version 7.1. http://www.statsoft.com (2006).
[47] T. Suzuki, S. Timofei, B.E. Iuoras, G. Uray, P. Verdino, W.M.F. Fabian, J. Chromatogr. A 922 (2001) 13.
[48] T.D. Booth, K. Azzaoui, I.W. Wainer, Anal. Chem. 69 (1997) 3879.
[49] C. Hansch, A. Hansch, Substituent Constants for Correlation Analysis in Chemistry and Biology, John Wiley & Sons, New York, 1979.
[50] X. Zhu, Y. Cai, W. Zhang, L. Chen, Y. Li, J. Chromatogr. A 1002 (2003) 231.
[51] I.W. Wainer, M.C. Alembik, E. Smith, J. Chromatogr. 388 (1987) 65.