Accepted Manuscript
Title: Using Quantitative Structure Activity Relationship Models to Predict an Appropriate Solvent System from a Common Solvent System Family for Countercurrent Chromatography Separation
Authors: Siân Marsden-Jones, Nicola Colclough, Ian Garrard, Neil Sumner, Svetlana Ignatova
PII: S0021-9673(15)00563-4
DOI: http://dx.doi.org/doi:10.1016/j.chroma.2015.04.020
Reference: CHROMA 356440
To appear in: Journal of Chromatography A
Received date: 10-12-2014
Revised date: 26-3-2015
Accepted date: 8-4-2015
Please cite this article as: S. Marsden-Jones, N. Colclough, I. Garrard, N. Sumner, S. Ignatova, Using Quantitative Structure Activity Relationship Models to Predict an Appropriate Solvent System from a Common Solvent System Family for Countercurrent Chromatography Separation, Journal of Chromatography A (2015), http://dx.doi.org/10.1016/j.chroma.2015.04.020 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Using Quantitative Structure Activity Relationship Models to Predict an Appropriate Solvent System from a Common Solvent System Family for Countercurrent Chromatography Separation
Siân Marsden-Jones (a), Nicola Colclough (b), Ian Garrard (a), Neil Sumner (b), Svetlana Ignatova (a)
(a) Advanced Bioprocessing Centre, Institute of Environment, Health and Societies, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
(b) AstraZeneca UK Limited, Alderley Park, Cheshire, SK10 4TG, UK
Abstract
Countercurrent Chromatography (CCC) is a form of liquid-liquid chromatography. It works by running one immiscible solvent (mobile phase) over another solvent (stationary phase) that is held in a CCC column by centrifugal force. The distribution of a compound between the two phases is characterised by the partition coefficient (Kd), which is the concentration in the stationary phase divided by the concentration in the mobile phase. Optimal separation is most likely to be achieved when Kd is between approximately 0.2 and 2. A Kd in this range gives the compound enough time in the column to be separated without resulting in a broad peak and a long run time. In this paper we report the development of Quantitative Structure Activity Relationship (QSAR) models to predict logKd. The QSAR models use only the molecule’s 2D structure to predict the molecular property logKd.
Key words: countercurrent chromatography, centrifugal partition chromatography, solvent system selection, method development, Quantitative Structure Activity Relationships (QSAR)
Highlights
Quantitative Structure Activity Relationship (QSAR) models have been developed.
The QSAR models have been built for the six HEMWat solvent systems.
A data set of the logKd values of 54 compounds in six HEMWat systems was generated.
1. Introduction
Countercurrent Chromatography (CCC) was invented in 1966 by Ito [1]. In CCC, compounds partition between two immiscible liquids (phases). One phase (stationary) is retained inside the column, which is spun in planetary motion, while the other (mobile phase) is pumped through the column. Separation is achieved because compounds that spend more time in the stationary phase take longer to pass through the column than compounds that spend more time in the mobile phase. CCC has many advantages over traditional liquid-solid chromatography, including total recovery of compound, the ability to separate crude samples containing particulates, and tolerance of higher loading capacities [2,3]. CCC is also reproducible and scalable. A disadvantage of CCC is that the choice of the solvent system is currently based on an analyst’s past experience, trial and error or literature analysis. This may mean that systems that would give very well defined chromatography are missed, or that large quantities of time, solvents and samples are used to select an appropriate solvent system. Being able to predict the Kd values of target compounds would speed up the method development phase of the CCC process by avoiding time-consuming, solvent-intensive experiments.
There have been previous attempts to computationally predict the partition coefficients of compounds. Hopmann et al. [4] used the software COSMO-RS to predict Kd values using activity coefficients of the upper and lower phases. The conformation of the molecule plays a very important role in the calculation and is computationally expensive to calculate, taking up to 9 hours for large molecules.
Another method, developed by Li et al. [5], used the UNIFAC (Universal quasichemical functional-group activity coefficients) model. This model uses thermodynamics to calculate Kd. A potential disadvantage of this approach is its dependency on group interaction parameters, which are often limited.
Ren et al. [6] used NRTL-SAC (Non-Random Two Liquid – Segment Activity Coefficient) in combination with UNIFAC and GA (Genetic Algorithm) to predict partition coefficients. This method is not purely computational as some experimental Kd values are needed for the prediction. This is a disadvantage if the compound to be separated is expensive or supply is limited.
The modelling approach investigated in this work is Quantitative Structure Activity Relationships (QSARs). QSARs are relationships that are used to predict a molecular property, in this case logKd, from a molecule’s structure. The logKd is predicted instead of the Kd value as this normalises the experimental values. QSARs offer fast computational predictions. They rely on molecular descriptors, which can be calculated manually (for example the number of hydrogen bond acceptors) or with a number of widely available software packages (e.g. ACD Labs logP/D, Daylight/Biobyte ClogP). As long as a complete set of descriptors is available, QSAR models of the type explored in this work can be run in Microsoft Excel or an equivalent spreadsheet. Such software is widely available, so using the model would be convenient. This could allow more automation of CCC, increasing the technique’s appeal, especially to industry.
The QSAR models built in this study work with the HEMWat solvent systems, which contain heptane (or hexane), ethyl acetate, methanol and water in varying proportions. HEMWat is a very versatile family, as changing the proportions of each solvent changes the polarity of the overall system as well as the polarity difference between the two phases. This control over the polarity allows the system to be adjusted to optimise the partitioning of many different compounds. In the Brunel University CCC literature database containing 2322 papers, 1121 (48%) use HEMWat based solvent systems. The next most commonly used solvent system is based on butanol, which was used in 542 papers (23%). This implies that HEMWat is the most commonly used solvent system, making it ideal for this proof of concept study [7]. Garrard [8] adopted a numeric labelling scale from 6 to 28 to denote polarity, within which HEMWat6 was the most polar and HEMWat28 was the least polar. The HEMWat systems denoted 1-6 contain butanol and not always all four of the other solvents. The QSAR approach can be applied to any solvent system; in this work we have chosen to focus on the HEMWat solvent system. Traditionally, QSARs have been developed for much simpler two-solvent systems such as octanol/water and cyclohexane/water [9]. Therefore, successfully applying the QSAR methodology to the much more complex HEMWat systems would by analogy show that QSARs are likely to be applicable to all solvent system families. Liquid handling robots are commonly used in industry for logP measurements, so they could easily be used for fast, accurate partition coefficient measurement. Therefore, measuring Kd values for other solvent system families for QSAR generation would not be too time consuming. This work attempts to develop QSAR models to increase the speed and efficiency of solvent selection in CCC. By using a diverse data set to train each HEMWat QSAR, the aim is for the models to predict logKd values accurately for a large range of molecules.
2. Experimental
2.1 Materials and Chemicals
The solvents used were HPLC grade heptane, ethyl acetate, methanol, acetonitrile and ethanol purchased from Fisher Chemicals (Loughborough, UK). The water used was deionised in house using a Purite Select Fusion purification system (Thame, UK). All compounds, including the quality control compound 2-ethylanthraquinone, were purchased from Sigma Aldrich (Gillingham, UK) with a minimum purity of 95%. Ammonia solution (35%) and TFA (99%) were purchased from Fisher Scientific (Loughborough, UK).
2.2 Apparatus
HPLC analysis was conducted on a HP1100 Agilent system (Stockport, UK) with detection at 254, 260, 275, 295 and 310 nm and a Symmetry C18 column (75 × 4.6 mm I.D., 3.5 μm) (Waters, USA). An Eppendorf Concentrator 5301 (Hamburg, Germany) was used as a centrifuge at 1400 rpm (240 g) at room temperature. The balance was a Sartorius Mechatronics analytical balance 1601A MP8-1 (Epsom, UK) with a range from 0.1 mg to 110 g.
2.3 Preparation of two phase solvent system and determination of logKd
The predictive ability of the QSAR is dependent on the accuracy of the experimentally determined partition coefficient values used to train the model. Therefore physical factors were controlled to minimise the experimental error. These included temperature and pH, which were held constant while the compound was in the two phase system. Once each phase had been sampled, it was diluted in ethanol to remove any matrix effect from the solvent system.
To avoid volume variations due to possible temperature fluctuations when preparing the HEMWat solvent systems, six HEMWat solvent systems were made up by mass according to Table 1 [8] using thermostated solvents (held at 20°C for 20 minutes in a water bath), and the mixtures were left overnight to equilibrate at room temperature. Before sampling, the solvent systems were placed in a 20°C water bath for 20 minutes. As these solvent systems were made up by mass as opposed to volume, the final percentage composition is slightly different from the conventional HEMWat systems described by Garrard [8]. Therefore, they have been distinguished by the addition of the letter “m” to the HEMWat numbers.
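As a small worked example of making a system up by mass, the sketch below scales the Table 1 mass ratios to a chosen batch size. The ratios are those read from Table 1; the function name `batch_masses` and the batch size `total_g` are illustrative assumptions and are not part of the published method.

```python
# Mass ratios (heptane, ethyl acetate, methanol, water) as read from Table 1.
HEMWAT_MASS_RATIOS = {
    "8m": (1, 9, 1, 9),
    "14m": (3, 6, 3, 6),
    "17m": (4, 4, 4, 4),
    "20m": (6, 3, 6, 3),
    "22m": (6, 2, 6, 2),
    "26m": (9, 1, 9, 1),
}
SOLVENTS = ("heptane", "ethyl acetate", "methanol", "water")


def batch_masses(system, total_g):
    """Return the mass of each solvent (g) for a batch of total_g grams.

    total_g is a hypothetical batch size chosen by the analyst.
    """
    parts = HEMWAT_MASS_RATIOS[system]
    whole = sum(parts)
    return {name: total_g * part / whole for name, part in zip(SOLVENTS, parts)}


if __name__ == "__main__":
    for name, grams in batch_masses("8m", 200.0).items():
        print(f"{name}: {grams:.1f} g")
```

Weighing each solvent to these masses keeps the composition independent of temperature-driven volume changes, which is the reason given above for preparing the systems by mass.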
The six HEMWat systems were chosen as they gave a large polarity range across the whole series. To remove the effect of pH on ionisable compounds, such molecules were converted to their neutral form: acidic compounds were run in acidified HEMWat (0.1% TFA in water, replacing pure water) and basic compounds were run in basified HEMWat (1% ammonia solution in water, replacing pure water).
Compounds with a negative ClogP (octanol/water partition coefficient from Biobyte, Inc. of Claremont, CA, USA and Daylight, Laguna Niguel, CA, USA) were dissolved in 1.5 ml of the lower phase of HEMWat until the phase was saturated. Compounds with a positive ClogP were dissolved in the upper phase of the HEMWat system until the phase was saturated. This ensured that the maximum amount of compound was dissolved in the HEMWat system. The solutions were centrifuged (1400 rpm for 30 seconds) to remove all particulates from the supernatant. An aliquot of 400 μl of supernatant was mixed and centrifuged with 1400 μl of the alternative phase (1400 rpm for 30 seconds).
An aliquot of 80 μl of the 1400 μl volume phase and 320 μl of the 400 μl phase were separately diluted using 1 ml of ethanol. To avoid cross contamination, before the lower phase was sampled the remaining upper phase was removed by pipette until no upper phase could be seen on visual inspection. The samples were run on a 10 minute gradient method on the HPLC using the Symmetry C18 column (4.6 × 75 mm, 3.5 μm) at 1 ml/min and 40 °C. The mobile phase consisted of 0.1% aqueous trifluoroacetic acid (solvent A) and acetonitrile (solvent B). The gradient elution program was as follows: 0-6 min, 10% B; 2-8 min, 80% B.
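The arithmetic that turns the two HPLC measurements into a logKd can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing script: it assumes that peak area is proportional to concentration with the same response factor in both diluted samples, that the aliquot and ethanol volumes are additive, and that the caller knows which phase is treated as stationary. The function names are hypothetical.

```python
import math


def dilution_factor(aliquot_ul, diluent_ul=1000.0):
    """Factor by which an aliquot is diluted when made up with ethanol."""
    return (aliquot_ul + diluent_ul) / aliquot_ul


def log_kd(area_stationary, area_mobile,
           aliquot_stationary_ul, aliquot_mobile_ul, diluent_ul=1000.0):
    """log10 of (stationary phase concentration / mobile phase concentration).

    Peak areas are corrected back to the undiluted phase before the ratio
    is taken, because the two phases are diluted by different factors.
    """
    c_stationary = area_stationary * dilution_factor(aliquot_stationary_ul, diluent_ul)
    c_mobile = area_mobile * dilution_factor(aliquot_mobile_ul, diluent_ul)
    return math.log10(c_stationary / c_mobile)


def in_target_window(logkd, kd_low=0.2, kd_high=2.0):
    """True when Kd lies in the approximate 0.2 to 2 range quoted in the abstract."""
    return math.log10(kd_low) <= logkd <= math.log10(kd_high)


if __name__ == "__main__":
    # 80 ul into 1 ml gives a factor of 13.5; 320 ul into 1 ml gives 4.125.
    print(dilution_factor(80.0), dilution_factor(320.0))
    # Hypothetical peak areas, for illustration only.
    value = log_kd(1500.0, 900.0, aliquot_stationary_ul=80.0, aliquot_mobile_ul=320.0)
    print(round(value, 3), in_target_window(value))
```

Without the dilution correction the ratio of raw peak areas would be biased by a constant factor of 13.5/4.125, so the correction matters even though it cancels in comparisons between compounds measured the same way.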
The logKd value of a quality control (QC) compound, 2-ethylanthraquinone, was measured in each of the six HEMWat systems alongside the other compounds. The mean and standard deviation for 2-ethylanthraquinone over 15 runs can be found in Table 2.
The compound concentration dependence of the Kd measurement was evaluated using three structurally diverse compounds, including a carboxylic acid, a compound class known to dimerise at high concentrations in non-polar media. These three compounds were 3-bromobenzoic acid (16-249 mM), warfarin (5-81 mM) and phenol (30-531 mM). Throughout the range of tested concentrations, the measured Kd value of each of the three compounds remained the same. It was therefore concluded that the concentrations used in this method did not affect the Kd value (see Supplementary data S1, S2 and S3 for Kd results).
2.4 Data sets
A data set of the logKd values of 54 compounds in six HEMWat systems was measured. Each logKd value was measured in triplicate and averaged. The set of 54 compounds contained 38 neutral compounds, 10 acidic compounds and 6 basic compounds. From this data set, 50 compounds were selected as a training set to build the QSAR and 4 compounds were selected as a test set. Figure 1 shows a principal component analysis (PCA) carried out on all 54 compounds. The first principal component is chosen to account for the maximum amount of variance and therefore describes most of the variability in the data. The second component is fixed as orthogonal to the first component and then made to cover the maximum variance possible under this constraint. The PCA was used to ensure a spread of the training set compounds across property space and also to select the test set for the models: 4 compounds that were well spread across parameter space were selected to ensure the model was tested on a diverse range of compounds. The four compounds chosen were Biphenyl, Tolbutamide, Quinine and Benzoquinone, as they represent distinct areas of parameter space.
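The two-component projection described above can be illustrated with a short, dependency-free sketch. It is not the software used in the study: it finds the first two principal components of a small descriptor table by power iteration on the covariance matrix, with deflation to enforce orthogonality. The descriptor values are invented for illustration and all function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the column mean from every descriptor column."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def covariance(centered):
    n, m = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
            for a in range(m)]


def leading_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: direction of maximum variance and its variance."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    return v, dot(v, [dot(row, v) for row in matrix])


def first_two_components(rows):
    centered = center(rows)
    cov = covariance(centered)
    pc1, var1 = leading_eigenvector(cov)
    size = len(cov)
    # Deflation removes the variance along pc1, so the next leading
    # direction is the best one orthogonal to pc1.
    deflated = [[cov[a][b] - var1 * pc1[a] * pc1[b] for b in range(size)]
                for a in range(size)]
    pc2, var2 = leading_eigenvector(deflated)
    scores = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return pc1, pc2, var1, var2, scores


# Invented descriptor rows (three descriptors per compound), illustration only.
descriptors = [
    [4.0, 0.0, 1.5],
    [2.4, 2.0, 2.7],
    [3.4, 1.0, 3.2],
    [0.2, 0.0, 1.1],
    [1.5, 1.0, 0.9],
    [2.8, 3.0, 2.0],
]
pc1, pc2, var1, var2, scores = first_two_components(descriptors)

if __name__ == "__main__":
    for t1, t2 in scores:
        print(f"{t1:7.3f} {t2:7.3f}")
```

Plotting the score pairs gives a map like Figure 1, from which compounds lying in distinct regions can be picked as a test set. In practice the descriptors would usually be scaled before PCA; that step is omitted here for brevity.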
2.5 Generating QSAR models
The QSAR models were developed using two dimensional molecular descriptors from CLab, an in-house AstraZeneca software package which generates 196 descriptors for each compound (see supplementary data table S4). The descriptors are generated from SMILES (Simplified Molecular-Input Line-Entry System) strings and fall into seven main categories: lipophilicity, hydrogen bonding, size and shape, charge and polarity, atom counts, topology and druggability (i.e. Lipinski rule of five [10]). In addition to these 196 parameters, five Abraham parameters (hydrogen bonding acidity, A; hydrogen bonding basicity, B; dipolarity/polarisability, S; excess molar refraction, E; and McGowan volume, V) [11] were investigated as they have been used extensively for modelling partitioning [12]. Partial Least Squares (PLS) regression was used to explore the ability to generate predictive models for logKd.
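For orientation, a PLS model of this kind is applied as a linear equation in the descriptors, which matches the description given in Section 3 (coefficients multiplied by compound-specific descriptor values, plus a residual coefficient). The symbols $b_0$, $b_j$, $x_j$ and $p$ are notation introduced here for illustration only; the fitted coefficients themselves are in the supplementary data and are not reproduced.

```latex
\log K_d \;\approx\; b_0 + \sum_{j=1}^{p} b_j \, x_j
```

Here $x_j$ is the value of descriptor $j$ for the compound, $b_j$ is its fitted coefficient, $b_0$ is the residual (constant) coefficient and $p$ is the number of descriptors retained in the model. A form like this is what makes evaluation in a spreadsheet practical.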
SIMCA version 13 (Umetrics, Umeå, Sweden) was used to perform the PLS regression. The initial QSAR models were generated using the automated fit tool within the software with the 196 2D descriptors. Any descriptors with a Variable Importance in the Projection (VIP) value of less than 1 were removed from the model and the PLS models rebuilt. This was applied a second time to the resulting model, leading to three PLS models for each HEMWat system. The models were then compared using the Root Mean Square Error (RMSE) and R2 values for the training and test sets. The R2 term quantifies how much of the variation in the response, logKd, is explained by the model. An R2 value of 1 is indicative of a perfect model and a value of 0 suggests the model is very poor. We considered an R2 value above 0.80 as acceptable. RMSE is calculated using Equation 1. The smaller the RMSE the better the model, and we considered an RMSE value of less than 0.5 as acceptable.
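The two comparison statistics can be sketched in a few lines. This is a minimal illustration, assuming the usual definitions (RMSE as the root of the mean squared difference between predicted and observed values, R2 as one minus the ratio of residual to total sum of squares); the values reported by SIMCA may be computed with software-specific conventions. Function names and the example numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Fraction of the variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented logKd values for four compounds, illustration only.
    observed = [1.20, 0.35, -0.40, -1.10]
    predicted = [1.05, 0.50, -0.20, -1.30]
    print(round(rmse(observed, predicted), 3), round(r_squared(observed, predicted), 3))
```

Because RMSE is in log units it can be compared directly with the 0.5 threshold, whereas R2 is dimensionless and depends on the spread of the observed values.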
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
Equation 1 - Root Mean Square Error (RMSE), where $\hat{y}_i$ is the predicted value, $y_i$ is the observed value and $n$ is the number of values.
The R2 values of the training sets were calculated. The predictive performance of the models was assessed according to the cross validated coefficient of correlation Q2. In addition, an external validation of the models was undertaken by predicting the logKd of the external test set of 4 compounds and calculating the RMSE and R2. One model was chosen for each HEMWat system on the basis of these statistics.
3. Results and Discussion
The best PLS model for each of the HEMWat systems was selected on the basis of the R2 and Q2 values. The QSAR equations are explicitly described in the supplementary data S5-S10. The sum of these coefficients, each multiplied by its corresponding compound-specific descriptor value, plus the residual coefficient, gives the predicted logKd value for the compound. A summary of the statistics for these models can be found in Table 3. The best models for HEMWat8m, 14m, 17m and 20m were obtained after all the descriptors with a VIP value of less than 1 were removed once. The best performing models for HEMWat22m and HEMWat26m were achieved after the descriptors with a VIP value of less than 1 were removed twice.
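The descriptor pruning that produces the three candidate models per HEMWat system (all descriptors, pruned once, pruned twice) can be sketched as a loop. This is only an outline of the procedure: the study refitted PLS models in SIMCA at each stage, whereas here the refit is replaced by a stand-in function `toy_vip` that rescales fixed importances so that the mean squared score is 1, mimicking how VIP values are renormalised whenever a model is rebuilt. The descriptor names and importances are invented.

```python
import math
from typing import Callable, Dict, List


def prune_by_vip(descriptors: List[str],
                 vip_of: Callable[[List[str]], Dict[str, float]],
                 rounds: int = 2,
                 threshold: float = 1.0) -> List[List[str]]:
    """Return the descriptor set of each model: initial, then one per pruning round."""
    stages = [list(descriptors)]
    current = list(descriptors)
    for _ in range(rounds):
        vip = vip_of(current)  # stands in for refitting the PLS model
        current = [d for d in current if vip[d] >= threshold]
        stages.append(current)
    return stages


# Invented raw importances, illustration only.
RAW = {"ClogP": 3.0, "HBD": 2.0, "HBA": 1.5, "PSA": 1.0, "RotBonds": 0.2}


def toy_vip(current: List[str]) -> Dict[str, float]:
    """Rescale so the mean of the squared scores is 1 for the current set."""
    scale = math.sqrt(len(current) / sum(RAW[d] ** 2 for d in current))
    return {d: RAW[d] * scale for d in current}


stages = prune_by_vip(list(RAW), toy_vip)

if __name__ == "__main__":
    for i, stage in enumerate(stages):
        print(i, stage)
```

Because the scores are renormalised after each refit, a descriptor that survives the first pruning can fall below 1 in the second, which is why the once-pruned and twice-pruned models differ.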
The models for five of the six HEMWat systems produced an R2 value for the training set within our acceptance criterion of greater than 0.8. HEMWat8m was the exception, with an R2 value for the training set of 0.69. This suggests that the model for HEMWat8m may not produce predictions as accurate as those of the other five models. When the RMSE for the test set is analysed, the models for HEMWat17m, 20m, 22m and 26m are all within our target acceptance criterion of less than 0.5.
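The acceptance check described in this paragraph can be reproduced directly from the Table 3 statistics. The sketch below uses the training set R2 and test set RMSE values from Table 3 and the two thresholds stated in Section 2.5; the dictionary layout and function names are illustrative choices.

```python
# Training-set R2 and test-set RMSE for each model, taken from Table 3.
TABLE_3 = {
    "8m": {"r2_train": 0.69, "rmse_test": 0.66},
    "14m": {"r2_train": 0.83, "rmse_test": 0.60},
    "17m": {"r2_train": 0.81, "rmse_test": 0.43},
    "20m": {"r2_train": 0.85, "rmse_test": 0.44},
    "22m": {"r2_train": 0.89, "rmse_test": 0.27},
    "26m": {"r2_train": 0.92, "rmse_test": 0.47},
}


def meets_r2(stats, minimum=0.80):
    """Training-set R2 above the acceptance threshold."""
    return stats["r2_train"] > minimum


def meets_rmse(stats, maximum=0.5):
    """Test-set RMSE below the acceptance threshold."""
    return stats["rmse_test"] < maximum


r2_ok = [name for name, stats in TABLE_3.items() if meets_r2(stats)]
rmse_ok = [name for name, stats in TABLE_3.items() if meets_rmse(stats)]

if __name__ == "__main__":
    print("R2 criterion met:", r2_ok)
    print("RMSE criterion met:", rmse_ok)
```

Only the 17m, 20m, 22m and 26m models satisfy both criteria, which is consistent with the observation that the higher HEMWat numbers give the more accurate predictions.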
Figure 2 shows the difference between the measured and predicted values of the test set (see Supplementary data S11 for experimental and predicted values). Of the 24 predictions, 71% are within +/- 0.5 log units and 79% are within +/- 0.52 log units. Interestingly, it is the systems with higher HEMWat numbers that produce the most accurate predictions.
Figure 3 shows the coefficient plots for the QSARs of the six HEMWat systems. From each HEMWat model’s coefficient plot, the descriptors contributing to the model can be observed. Interestingly, the octanol/water based lipophilicity descriptors dominate the lower HEMWat numbers whilst hydrogen bonding descriptors are more prevalent in the models of the higher HEMWat numbers. This possibly reflects the change in the organic phase from mostly ethyl acetate to mostly heptane as the HEMWat number increases. As heptane is unable to hydrogen bond to solutes, hydrogen bonding descriptors become important and show negative coefficients, since hydrogen bonding favours solutes partitioning into the aqueous phase.
3.1 Predicting a solvent system for optimal separation conditions
The QSAR models generated using the PLS regression method were used to predict the logKd values of the 4 test set compounds in the six HEMWat systems. By applying a linear fit between the six HEMWat system numbers and the predicted logKd for each compound, the six individual predicted logKd values can be used to suggest the HEMWat system in which the compound will have a logKd of zero (Kd equal to one). This system should provide optimal separation conditions.
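The linear-fit step can be sketched as an ordinary least-squares line through the six (HEMWat number, logKd) points, solved for the HEMWat number at which logKd is zero. The logKd values below are invented and deliberately lie on a straight line so that the result is easy to check by hand; the function names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hemwat_at_zero_logkd(hemwat_numbers, logkd_values):
    """HEMWat number at which the fitted line crosses logKd = 0 (Kd = 1).

    Raises ZeroDivisionError if the fitted line is flat.
    """
    slope, intercept = linear_fit(hemwat_numbers, logkd_values)
    return -intercept / slope


HEMWAT_NUMBERS = [8, 14, 17, 20, 22, 26]
# Invented predicted logKd values on the line logKd = 1.7 - 0.1 * number.
predicted_logkd = [0.9, 0.3, 0.0, -0.3, -0.5, -0.9]
crossing = hemwat_at_zero_logkd(HEMWAT_NUMBERS, predicted_logkd)

if __name__ == "__main__":
    print(f"Suggested system: HEMWat {round(crossing)}")
```

The crossing point is generally not an integer, so it is rounded to the nearest HEMWat number; a crossing outside the 8 to 26 range is an extrapolation and should be treated with more caution.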
Table 4 shows good agreement, for three of the four test compounds, between the HEMWat systems predicted to give optimal separation by extrapolating the experimentally determined logKd results and by extrapolating the QSAR predicted logKd values. The linear relationships used to predict this optimal system provide high R2 values and low RMSE values, indicating a strong linear fit for both the predicted logKd values and the experimentally determined logKd values.
4. Conclusion
In this work, for the first time, QSAR models were generated to predict the logKd values of compounds in HEMWat systems from their molecular structure alone. Of the 24 predictions made by the PLS QSARs for the 4 test compounds, 71% were within +/- 0.5 log units. These QSARs will be developed further as they have the potential to speed up the solvent system selection for CCC.
5. Acknowledgements
The first author would especially like to thank AstraZeneca and the Engineering and Physical Sciences Research Council (EPSRC) for funding this project as part of her PhD. Furthermore, the authors are grateful to Dr Jonathan Huddleston for support and advice at the beginning of this work. Thanks are also extended to Professor Michael Abraham and Dr Joelle Gola for their time and assistance.
6. Bibliography
[1] Y. Ito, M. Weinstein, I. Aoki, R. Harada, E. Kimura, K. Nunogaki, Nature, The Coil Planet Centrifuge, 1966, Vol. 212, pp. 985-987.
[2] A. Marston, K. Hostettmann, Journal of Chromatography A, Developments in the Application of Counter-Current Chromatography to Plant Analysis, 2006, Vol. 1112, pp. 181-194.
[3] L. Chen, Q. Zhang, G. Yang, L. Fan, J. Tang, I. Garrard, S. Ignatova, D. Fisher, I. A. Sutherland, Journal of Chromatography A, Rapid purification and Scale-up of Honokiol and Magnolol using High-Capacity High-Speed Counter-Current Chromatography, 2007, Vol. 1142, pp. 115-122.
[4] E. Hopmann, W. Arlt, M. Minceva, Journal of Chromatography A, Solvent system selection in counter-current chromatography using the conductor-like screening model for real solvents, 2011, Vol. 1218, pp. 242-250.
[5] Z. Li, Y. Zhou, F. Chen, L. Zhang, Y. Yang, Journal of Liquid Chromatography and Related Technologies, Property Calculation and Prediction for Selecting Solvent Systems in CCC, 2003, Vol. 26, pp. 1397-1415.
[6] D-B Ren, Z-H Yang, Y-Z Liang, Q. Ding, C. Chen, M-L Ouyang, Journal of Chromatography A, Correlation and prediction of partition coefficients using non-random two-liquid segment activity coefficient model for solvent system selection in counter-current chromatography separation, 2013, Vol. 1301, pp. 10-18.
[7] Krystyna Skalicka, Internal Report, Medical University of Lublin.
[8] I. J. Garrard, Journal of Liquid Chromatography & Related Technologies, Simple Approach to the Development of a CCC Solvent Selection Protocol Suitable for Automation, 2005, Vol. 28, pp. 1923-1935.
[9] M. H. Abraham, H. S. Chadha, Applications of a solvation equation to drug transport properties, in Lipophilicity in Drug Action and Toxicology, edited by V. Pliska, B. Testa, H. Van der Waterbeemd, 1996, VCH, Weinheim.
[10] C. A. Lipinski, F. Lombardo, B. W. Dominy, P. J. Feeney, Advanced Drug Delivery Reviews, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, 1997, Vol. 23, pp. 3-25.
[11] M. H. Abraham, Chemical Society Reviews, Scales of solute Hydrogen-bonding: Their Construction and Application to Physicochemical and Biochemical Processes, 1993, Vol. 22, pp. 73-83.
[12] M. H. Abraham, J. M. R. Gola, A. Ibrahim, W. E. Acree Jr, X. Liu, Pest Management Science, The prediction of blood-tissue partitions, water-skin partitions and skin permeation for agrochemicals, 2014, Vol. 70, pp. 1130-1137.
Figure Legends
Figure 1 - Principal Component Analysis (PCA) used to select Tolbutamide, Quinine, Biphenyl and Benzoquinone.
Figure 2 - The difference between the test set experimentally determined logKd values and the predicted logKd values.
Figure 3 - The coefficient plot for each of the PLS models built for (a) HEMWat8m (b) HEMWat14m (c) HEMWat17m (d) HEMWat20m (e) HEMWat22m and (f) HEMWat26m (see supplementary information S4 for descriptor definitions).
Table 1 - Ratios of solvents to make up the selected HEMWat solvent systems (heptane, ethyl acetate, methanol and water)
HEMWat system number [7]    Heptane (g)    Ethyl Acetate (g)    Methanol (g)    Water (g)
8m                          1              9                    1               9
14m                         3              6                    3               6
17m                         4              4                    4               4
20m                         6              3                    6               3
22m                         6              2                    6               2
26m                         9              1                    9               1
Table 2 - The average and standard deviation for the experimentally measured logKd value for the QC compound, 2-ethylanthraquinone across 15 HPLC runs in triplicate.
HEMWat System    Average    Standard Deviation
8m               3.19       0.36
14m              2.07       0.13
17m              1.18       0.07
20m              0.70       0.10
22m              0.52       0.04
26m              0.19       0.02
Table 3 - Statistics for the best performing QSARs generated using Partial Least Squares.
HEMWat number    R2 training set    Q2 training set    R2 test set    RMSE test set
8m               0.69               0.66               0.86           0.66
14m              0.83               0.69               0.67           0.60
17m              0.81               0.65               0.81           0.43
20m              0.85               0.68               0.83           0.44
22m              0.89               0.80               0.93           0.27
26m              0.92               0.87               0.82           0.47
Table 4 - The HEMWat system most likely to provide optimised chromatography
                QSAR predicted                     Experimentally determined
Compound        HEMWat system    R2      RMSE      HEMWat system    R2      RMSE
Biphenyl        25               0.99    0.29      27               0.98    0.20
Quinine         16               0.95    0.34      16               0.98    0.25
Tolbutamide     14               0.88    0.39      16               0.95    0.31
Benzoquinone    HEMWat systems 13 and 2; R2/RMSE pairs 0.92/0.15 and 0.98/0.19 (column alignment not recoverable)
[Figure 1: scatter plot of Principal Component 2 against Principal Component 1, both axes from -15 to 15, with separate markers for the Training set and the Test Set]
[Figure 2: difference between experimentally determined logKd and predicted logKd (y axis, -1.50 to 1.00) against HEMWat number (x axis: 8, 14, 17, 20, 22, 26) for Benzoquinone, Biphenyl, Tolbutamide and Quinine]
[Figure 3: coefficient plots, panels a to f]