Prediction of physicochemical properties based on neural network modelling


Advanced Drug Delivery Reviews 55 (2003) 1163–1183

Jyrki Taskinen a,b,*, Jouko Yliruusi a,c

a Viikki Drug Discovery Technology Center, Department of Pharmacy, University of Helsinki, Helsinki, Finland
b Pharmaceutical Chemistry Division, Helsinki, Finland
c Pharmaceutical Technology Division, Helsinki, Finland

Received 17 February 2003; accepted 16 May 2003

Abstract

The literature describing neural network modelling to predict physicochemical properties of organic compounds from the molecular structure is reviewed from the perspective of pharmaceutical research. The standard three-layer, feed-forward neural network is the technique most frequently used, although the use of other techniques is increasing. Various approaches to describing the molecular structure have been used successfully, including molecular fragments, topological indices, and descriptors calculated by semi-empirical quantum chemical methods. Some physicochemical properties, such as the octanol–water partition coefficient, water solubility, boiling point and vapour pressure, have been modelled by several research groups over the years using different approaches and structurally diverse large training sets. The prediction accuracy of most models seems to be rather close to the performance of the experimental measurements when the accuracy is assessed with a test set from the working database. Results with independent test sets have been less satisfactory. Implications of this problem are discussed.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Quantitative structure–property relationships; Octanol–water partition coefficient; Aqueous solubility; Boiling point; Vapour pressure; Flash point; Drug design; Drug development

Contents

1. Introduction
2. Methods published for prediction of physicochemical properties by neural networks
   2.1. Octanol–water partition coefficient
   2.2. Water solubility
   2.3. Boiling point
   2.4. Vapour pressure
   2.5. Henry's law constant
   2.6. Critical temperature and critical pressure
   2.7. Crystal packing
   2.8. Density and refractive index
   2.9. Dielectric constant and Kirkwood function
   2.10. Flash point and autoignition temperature
   2.11. Viscosity, surface tension and thermal conductivity
   2.12. Other properties
3. Discussion
   3.1. Comparison of the neural network methods with the established methods for logPoct prediction
   3.2. Status of logSw prediction
   3.3. Neural network models compared with linear models
   3.4. Design of training set
   3.5. Model validation
   3.6. Selection of descriptors
   3.7. Alternative neural network techniques in QSPR
4. Conclusions
References

1. Introduction

Interest in quantitative structure–property relationship (QSPR) studies has grown during the last 10 years. An increasing number of neural network (NN) models are being published for predicting various physicochemical properties from the molecular structure. Drug development is often cited as a motivation for the work. What is the actual need for prediction of physicochemical properties in pharmaceutical research? A prediction method would be valuable if the value of the property is seriously needed but its experimental determination is difficult or impossible. In the development of optimised pharmaceutical products it is necessary to know all the basic physicochemical parameters of the substances to be included in the formulation. Above all, it is important to know the solubility, partition coefficient and pH of the drug substance, preferably the former two as a function of pH. This information is needed in the preformulation stage. Most physicochemical properties of a drug substance needed in product development are rather easy to measure. In practice, the experimental values are determined anyway for regulatory reasons. The properties of solvents and other industrial chemicals are usually known. Therefore, the need for computational prediction in this area may be questioned. Still, reliable methods for prediction of all relevant properties would undoubtedly save experimental chemists from unnecessary work.


Experimental studies of certain properties are demanding and laborious. An important case is the study of polymorphism of the drug substance. The crystal form may affect the processability, stability and bioavailability of a pharmaceutical product. It would be of great value to be able to explore the possible polymorphism at an early stage of product development. In the drug discovery phase, it would be valuable if certain physicochemical properties could be calculated before synthesising or purchasing a screening library. Lipophilicity and water solubility are properties that can be used as rough early ADME screens to reject probable development failures as early as possible. Two phases of drug discovery, screening and lead optimisation, set somewhat different requirements for a prediction method. Virtual screening of computer-designed combinatorial libraries, or databases of available compounds, requires methods that are computationally inexpensive and can be applied to huge collections of molecular structures. The accuracy of the prediction would be adequate if it were possible to eliminate compounds that are likely to possess very unfavourable physicochemical properties. On the other hand, during lead optimisation a desirable model would give fairly accurate predictions and would be interpretable enough to suggest structural modifications. Prediction of certain physical properties is relevant for fields other than pharmacy and drug research.


Prediction of dielectric constants may be useful in the design of new materials. Vapour pressures and Henry's law constants are important for assessing the distribution of chemicals in the environment. The critical values are needed in chemical engineering. Nevertheless, probably the principal motivation for working on the prediction of many properties was stated by Hall and Story [1] in their paper demonstrating the use of a new descriptor type: "Boiling point and critical temperature are significant properties in revealing the intermolecular aspects of molecules. Further, they are useful for testing development of QSAR models." The first report of neural network modelling in QSPR was the work of Bodor and co-workers [2] on estimation of aqueous solubility in 1991. Since then, neural network modelling has been applied to most physicochemical properties for which suitable experimental data can be found in the literature. Two notable exceptions are the melting point and the acid dissociation constant, pKa. Table 1 lists the 27 properties whose neural network modelling is reviewed in this paper.

Table 1
Physical–chemical properties predicted from the molecular structure by neural network methods

Partition coefficient (logPoct) [14–21,24,27,31–33,64]
Water solubility (logSw) [2,8,23,25,26,28,30,37–40,44,87,88]
Aqueous activity coefficient [75]
Solubility in solvents [77]
Boiling point [1,45–56,89,90]
Critical temperature [1,53,56,65]
Critical pressure [56,65]
Vapor pressure [17,59,60,62,80]
Henry's law constant [63]
Heat capacity, ΔG of formation, ΔH of formation [46]
Enthalpy of sublimation [91]
Flash point and autoignition temperature [54,69–71]
Heat of vaporization [50]
Density [46,48]
Refractive index [46,48]
Surface tension [74]
Viscosity [50,73,74]
Thermal conductivity [74]
Dielectric constant [67,68]
Glass transition temperature [78]
Solvatochromic parameter [76]
Crystal packing [92]
Pitzer's acentric factor [50]
Kirkwood function [68]


By the middle of the 1990s, a standard procedure for applying neural networks in QSPR had evolved, and the same basic approach was used for the whole decade in practically all studies on property prediction. The basic method involves a feed-forward neural network containing three layers: the input layer, one hidden layer and the output layer with one node. Occasionally, network configurations with more than one output have been used. The variability of the networks has been taken into account by training an ensemble of networks and averaging their predictions. The standard procedure also involves the use of validation data sets to control the network training and to evaluate the prediction accuracy. During the last couple of years, there has been growing interest in applying techniques other than the standard feed-forward neural network in QSPR modelling. These techniques include Bayesian, radial basis function, general regression, Fuzzy ARTMAP and Kohonen neural networks. In the first section of this article we review the results published on neural network prediction of physicochemical properties. Excluded are papers that deal with analytical properties, powder properties, processability of pharmaceutical materials or properties of mixtures. In the second section we review the discussion that the original authors have carried out about some critical aspects of predicting physicochemical properties. An effort is made to draw together results and conclusions from various studies to make some points that might be of interest to researchers in the field. We have looked at the topic from the perspective of pharmaceutical research, with emphasis on properties important for drug delivery. Reviews on related topics have recently been published, for instance, by Agatonovic-Kustrin and Beresford [3], Katritzky et al. [4], Taskinen [5], Grover et al. [6,7] and Huuskonen [8].
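The standard procedure described above can be illustrated with a minimal sketch in Python using scikit-learn; this is our illustration, not the code of any of the reviewed studies, and the descriptor matrix X, property vector y, hidden-layer size and ensemble size are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def train_qspr_ensemble(X, y, n_hidden=10, n_nets=10, seed=0):
    """Train an ensemble of three-layer feed-forward networks (input layer,
    one hidden layer, single output node) and average their predictions,
    in the spirit of the standard QSPR recipe reviewed here."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    ensemble = []
    for i in range(n_nets):
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                           activation="logistic",   # sigmoidal hidden units
                           solver="adam",
                           early_stopping=True,     # held-out data controls training
                           max_iter=5000,
                           random_state=seed + i)
        net.fit(X_train, y_train)
        ensemble.append(net)
    # external validation set gives an estimate of prediction accuracy
    val_pred = np.mean([m.predict(X_val) for m in ensemble], axis=0)
    val_rmse = float(np.sqrt(np.mean((val_pred - y_val) ** 2)))
    return ensemble, val_rmse

def predict(ensemble, X_new):
    """Ensemble prediction = mean over the individual networks."""
    return np.mean([m.predict(X_new) for m in ensemble], axis=0)
```

Ensembles of 4 to 50 such networks, with predictions averaged, match the configurations reported in the studies reviewed below.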

2. Methods published for prediction of physicochemical properties by neural networks

2.1. Octanol–water partition coefficient

Lipophilicity was the first physicochemical property, and is still the only one, for which prediction methods were developed and widely accepted in pharmaceutical research.


The octanol–water partition coefficient, or its logarithm (logPoct), has become the standard scale for lipophilicity, largely as a result of the work of Hansch and Leo [9,10]. Their CLOGP program is the most widely used method for estimation of logPoct. The CLOGP program breaks molecules into fragments, for which its database contains fragment constants and correction factors accounting for fragment interactions. The logPoct value is estimated by summing up these constants. Several other methods have been published based on the group contribution approach or on calculated molecular properties. Another widely used method for logPoct prediction in pharmaceutical research is the atom/fragment contribution method KOWWIN by Meylan and Howard [11]. KOWWIN is based on fragment values determined for 130 simple fragments and 235 correction factors by linear regression fitting of a training set of 2351 compounds. The results were validated with a test set of 6055 compounds.

In the methods discussed above, molecules are described by a collection of atom types or substructures identified in the molecular structure. An alternative approach is to calculate various molecular properties assumed to be relevant to the partitioning process. Properties can be calculated from a two- or three-dimensional molecular structure. This represents a more general approach to describing molecular interaction properties, and can therefore be argued to lead to more general models. The first examples of this approach, combining neural network modelling and descriptors calculated by the semi-empirical AM1 method, are the works of Bodor et al. [12], Cense et al. [13] and Grunenberg and Herges [14]. These early examples with small training and validation data sets serve to demonstrate the principle rather than work as applicable models. Later, Duprat et al. [15] used the compound set and the structure description approach of Bodor to show the improvements achieved by using an advanced methodology for designing neural networks. Clark and co-workers used a data set containing 1085 compounds for developing an NN model for logPoct prediction from the results of semi-empirical AM1 calculations [16]. Subsequently, they proposed a refined model based on the same set [17]. The 16 input descriptors selected in the final model (16-10-1) included atomic charges, electrostatic potential values and geometric descriptors.

The training process and prediction ability were controlled by cross-validation, dividing the data set into 11 roughly equal, randomly selected portions and giving a cross-validated s = 0.56. An external test set of 41 nucleosides and nucleoside bases was predicted with s = 0.39.

Erös et al. developed NN models for logPoct calculation using a database of 625 molecules, 98% of which are registered drugs showing high structural diversity [18]. The data set was divided into a work set of 325 molecules and an external validation set of 300 molecules, trying to produce subsets that are homogeneously distributed in the descriptor space. The inputs to the NN model with 37-5-1 architecture contained constitutional, topological, charge, histogram-type and molecular property descriptors. The standard deviations of the fitting and prediction errors were s = 0.48 and s = 0.72, respectively.

Descriptors derived from optimised 3D structures using semi-empirical quantum chemical calculations have been applied only to rather small data sets. Molecular properties calculated from 2D structures are comparable to the substructure methods regarding computational cost, and can be applied to large databases. An example is the work of Devillers et al., who developed a feed-forward neural network model from a training set of 7200 chemicals. Autocorrelation vectors encoding hydrophobicity, molar refractivity, H-bonding acceptor ability and H-bonding donor ability, calculated from 2D structures, were used to describe the molecular properties [19,20]. Networks with 35-32-1 configuration were found optimal using a validation set of 200 compounds to control the training process. An ensemble of four networks was selected as the final model, giving s = 0.37 for the training set and s = 0.39 for the test set (n = 519). The authors concluded that the model is particularly powerful for simulating the lipophilicity of compounds presenting a high degree of structural diversity, such as pesticides and industrial chemicals. On the other hand, they noted that logPoct values for salts, amino acids, nucleosides, nucleobases and compounds with formal charges cannot be calculated, because these compound classes were not represented in the training set.

The neural network models developed for prediction of logPoct include approaches related to the group contribution methods.


Schaper and Samitier trained a standard three-layer, feed-forward NN using a rather small training set (n = 268) and indicator variables for atom and bond types [21]. The most successful group contribution type NN methods are based on the atom/bond type electrotopological state (E-state) indices introduced by Hall and Kier [22]. Since the demonstration of the usefulness of this approach by Hall and Story for prediction of boiling point and critical temperature [1], these indices have been used extensively in QSPR modelling, especially to develop models for predicting logPoct and water solubility. Many papers have been published describing basically the same approach for model building [23–32]. In addition, commercial programs, like logP (Interactive Analysis) and ScilogP Ultra (SCIVISION), have been developed. Of the several methods using E-state indices, the logPoct prediction module of the ALOGPS program by Tetko et al. [31–33] is based on the largest database and is the most thoroughly evaluated. The method was developed on the basis of a neural network ensemble analysis of 12908 organic compounds from the PHYSPROP database of the Syracuse Research Corporation. The input parameters were 75 atom/bond type E-state indices. An ensemble of 50 networks with 75-10-1 architecture was used to calculate the predictions. To improve prediction accuracy, a method called the associative neural network (ASNN) was used. In ASNN the results are corrected for bias on the basis of the errors of the k nearest neighbours in the output space. The standard error of prediction estimated by the leave-one-out technique was reported to be 0.45 for all 12908 compounds and 0.39 after removing 131 outliers. However, the results were less encouraging when ALOGPS was tested with an independent data set of 6100 compounds from BASF. The prediction error was larger than 0.5 log units for half of the compounds and larger than 1 log unit for one-fifth.
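The ASNN bias correction can be sketched roughly as follows. This is our reading of the idea described above, not Tetko's ALOGPS implementation; the neighbour count k, the distance measure, and the use of the vector of individual-network outputs as the "output space" are assumptions.

```python
import numpy as np

def asnn_correct(ensemble_train, y_train, ensemble_query, k=5):
    """Correct an ensemble prediction for local bias, in the spirit of the
    associative neural network (ASNN): find the k training compounds whose
    vectors of individual-network outputs are closest to those of the query
    compound, and add their mean residual to the averaged prediction.

    ensemble_train : (n_train, n_nets) individual-network outputs, training set
    y_train        : (n_train,) experimental values
    ensemble_query : (n_nets,) individual-network outputs for one query compound
    """
    mean_train = ensemble_train.mean(axis=1)           # plain ensemble predictions
    residuals = y_train - mean_train                    # training-set errors
    # nearest neighbours in the output space (Euclidean distance assumed here)
    dist = np.linalg.norm(ensemble_train - ensemble_query, axis=1)
    nearest = np.argsort(dist)[:k]
    raw = ensemble_query.mean()
    return raw + residuals[nearest].mean()              # bias-corrected prediction
```

The same mechanism explains why feeding the model with new experimental data for related structures improves its predictions, a point discussed in Section 3.1.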

2.2. Water solubility

Water solubility is the other physicochemical property for which prediction methods would be really useful in pharmaceutical research. It is usually expressed as logSw, where Sw is the solubility in moles per litre.


Unlike in the case of logPoct, there are no generally accepted and widely used methods for prediction of logSw. This may be partly due to the fact that experimental data sets of similar size, diversity and quality have not been publicly available. Another reason is that the water solubility of drugs poses a greater challenge for empirical modelling than partition coefficients. The solubility of liquid compounds depends on similar intermolecular interactions and entropy effects as partitioning between two liquid phases. The empirical model of Hansch [34] and the semi-empirical model of Yalkowsky [35] suggested decades ago that logSw for organic liquids can be calculated by regression equations with logPoct as the only variable. Therefore, it can be expected that all the approaches reviewed above for prediction of logPoct would give comparable results if trained for prediction of logSw of liquids. Most drugs are solids. The solubility of solid compounds depends on the free energy changes involved in going from the solid state to the liquid state, in addition to the interactions in the liquid phase. Yalkowsky [35,36] has introduced a model for the solubility of non-electrolytes, called the general solubility equation (GSE), based on thermodynamic grounds. The model contains only two parameters, logPoct for liquid phase effects and the melting point for solid phase effects. The coefficients of the model are not fitted parameters, but are based on thermodynamic approximations. The GSE has been shown to give reasonable predictions for diverse organic compounds [37], but it requires an experimental parameter, the melting point, which is as difficult a problem for prediction as solubility. For prediction of water solubility, the GSE suggests semi-empirical type models based on calculated logPoct or on a descriptor set for its calculation.
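For orientation, the GSE mentioned above is usually quoted in the solubility literature with fixed, non-fitted coefficients; the minimal sketch below uses that commonly cited form (not reproduced from this review), with the melting point in degrees Celsius.

```python
def gse_log_sw(log_p_oct, melting_point_c):
    """General solubility equation (GSE) as commonly quoted:
    logSw = 0.5 - 0.01*(MP - 25) - logPoct,
    where MP is the melting point in degrees Celsius; for liquids
    (MP <= 25 C) the melting-point term is set to zero."""
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p_oct

# Illustrative numbers only: a solid with logPoct = 2.0 melting at 150 C
# gives logSw = 0.5 - 1.25 - 2.0 = -2.75.
print(gse_log_sw(2.0, 150.0))   # -2.75
```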


The group contribution type approaches for prediction of logSw are actually very similar to the corresponding methods for logPoct prediction. Livingstone et al. [29] have even trained a neural network for simultaneous prediction of logPoct and logSw with the same 30 atom type E-state indices as input parameters. However, the first neural network method for prediction of logSw, by Bodor et al. [2], was based on molecular properties calculated using the semi-empirical AM1 method for a training set of 331 compounds. Jurs and his co-workers have developed an approach in which a small set of topological, electronic, geometric and combination descriptors is selected as input variables [38–40]. In their most recent studies a set of 399 diverse compounds with a molecular weight range of 53–959 was used [40,41]. A standard feed-forward neural network model (11-5-1) and a generalised regression neural network (GRNN) model were developed. The two models resulted in similar prediction accuracy evaluated with a test set of 50 compounds (s = 0.69 and 0.70, respectively), but only five descriptors were necessary in the GRNN model. These descriptors comprised one topological, one E-state, one geometric and two geometric/electronic hybrid descriptors.

Five recent models for the prediction of logSw are based on somewhat larger data sets. Two of the models, those of Huuskonen [25] and Tetko et al. [30], are closely related. They are based on a set of 1297 compounds compiled by Huuskonen from the AQUASOL and PHYSPROP databases and use similar descriptor sets. Huuskonen used E-state and other topological indices as the input for the 30-12-1 neural network and achieved s = 0.60 for the randomly chosen test set (n = 413). Tetko's model with 33-4-1 architecture, using only E-state indices as inputs, provided s = 0.62 for the test set (n = 412). Two other groups used the same compound set, but a different approach for structure description. Liu and So [42] chose seven molecular descriptors that were assumed to be physically meaningful for a network of 7-2-1 architecture and obtained s = 0.71 for the randomly chosen test set (n = 258). The descriptors, calculated from the 2D structure, were logPoct as a hydrophobicity measure, molecular polar surface area (PSA) as a hydrophilicity descriptor, molecular weight as a size descriptor and four topological indices selected by a genetic algorithm. Yan and Gasteiger [43] described the molecules by a set of 32 values of a radial distribution function code representing the 3D structure, and eight other descriptors including the highest hydrogen bond donor and acceptor potentials. The network of 40-7-1 architecture produced s = 0.59 for the test set (n = 496) chosen by a Kohonen neural network from the initial data set. When the model was applied to a completely independent data set (n = 1587) from Merck KGaA, the performance was disappointing (s = 0.93).
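The radial distribution function (RDF) code used by Yan and Gasteiger condenses the 3D geometry into a fixed-length vector. The sketch below shows the commonly used Gaussian-smoothed form of such a code; the smoothing parameter, the radial grid and the atomic weights are placeholders, not the published settings.

```python
import numpy as np

def rdf_code(coords, weights, n_points=32, r_max=10.0, beta=25.0):
    """Radial distribution function (RDF) code of a 3D structure:
    g(r) = sum over atom pairs i<j of w_i * w_j * exp(-beta * (r - r_ij)**2),
    sampled at n_points radii up to r_max (angstroms).

    coords  : (n_atoms, 3) Cartesian coordinates
    weights : (n_atoms,) atomic property used as weight (e.g. partial charge)
    """
    r_grid = np.linspace(0.5, r_max, n_points)
    g = np.zeros(n_points)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = np.linalg.norm(coords[i] - coords[j])
            g += weights[i] * weights[j] * np.exp(-beta * (r_grid - r_ij) ** 2)
    return g   # a fixed-length vector usable as network input
```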

Bruneau [44] used a data set of 1560 compounds containing data from the literature and a proprietary data set of 522 compounds. An independent test set (n = 934) was similarly composed of literature data (n = 673) and proprietary data (n = 261) from AstraZeneca. About 100 topological, geometrical and electronic descriptors were calculated. Bayesian learning of neural nets with an automatic relevance determination procedure was used to select 16 descriptors and to train a network with 16-8-1 architecture. The descriptors in the best model included CLOGP, the number of hydrogen bond donors and the number of potential positive charges. The standard errors for the training set and the test set were s = 0.53 and s = 0.81, respectively. The disappointing difference between the standard errors was analysed more closely by calculating the descriptor distance from the test compounds to the training set, using two parameters designed for that purpose. It was found that the error was similar to that of the training phase for compounds with a small descriptor distance, and that as the compounds become more dissimilar the standard errors increase smoothly.
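The descriptor-distance analysis can be mimicked with a simple nearest-neighbour distance in standardised descriptor space; the sketch below illustrates the general idea only, not Bruneau's two published distance parameters.

```python
import numpy as np

def distance_to_training_set(X_train, X_test, k=1):
    """For each test compound, the mean Euclidean distance to its k nearest
    training compounds in autoscaled descriptor space. Large distances flag
    compounds outside the model's domain, where larger errors are expected."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    Zt, Zq = (X_train - mu) / sigma, (X_test - mu) / sigma
    dists = np.linalg.norm(Zq[:, None, :] - Zt[None, :, :], axis=2)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)
```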

2.3. Boiling point

The boiling point of a compound is determined by intermolecular interactions in the liquid and by the difference in the molecular internal partition function between the gas phase and the liquid phase at the boiling point. The boiling point can be measured easily, but its prediction has only limited practical value in drug research. Nevertheless, it has been the most popular physicochemical property for neural network modelling exercises. Several research groups have modelled the normal boiling point of hydrocarbons. Predictive NN models have been published for alkanes [45–47], alkenes [47,48] and diverse hydrocarbons [47,49,50]. As expected, the models typically show good fitting and prediction statistics with fewer than ten simple descriptors. In the most recent work, Espinosa et al. [47] applied a Fuzzy ARTMAP NN to model a set of 327 hydrocarbons including alkanes, alkenes and alkynes. Six topological indices were used as structure descriptors. The dipole moment was added to the descriptor set to separate geometric isomers of alkenes. The absolute average error for a test set of 67 compounds was 1.2 K.


Wessel and Jurs [49] modelled a more diverse set of hydrocarbons (n = 356), including aromatic compounds. The boiling point range was from 169 to 770 K. The best 6-5-1 network with quasi-Newton optimisation produced a standard error of 7.1 K for a small test set (n = 15). The six inputs were topological, electronic and geometric descriptors. Another example of NN models built for a specific chemical series is the model of Bunz et al. for prediction of physical properties, including boiling point, of chlorosilanes [51,52].

Several boiling point models have also been developed using data sets with more structural variation [1,53–57]. Jurs and his co-workers modelled a set of 298 compounds with an experimental boiling point range from 225 to 648 K in two studies [53,57]. The more recent NN model with 8-3-1 architecture gave a standard error of 8.7 K for a test set of 30 members. The same data set was modelled by Hall and Story [1]. They trained a 19-4-1 model with atom type E-state indices as input descriptors. A mean absolute error of 5.6 K was obtained for the same test set. Tetteh et al. [54] applied a radial basis function (RBF) network in predicting the boiling point of 400 organic compounds representing multifunctional compounds in addition to 20 monofunctional classes. A molecular connectivity index and counts of 25 molecular fragments were used as descriptors. The average absolute error was 14.7 K for a test set (n = 133). The same accuracy was obtained with a double-output network capable of simultaneous estimation of boiling point and flash point. Espinosa et al. [56] trained both Fuzzy ARTMAP and feed-forward networks for predicting normal boiling points of diverse organic compounds. The descriptors were topological indices and the dipole moment. The absolute mean errors of the 8-12-1 feed-forward model were 27.7 K for the training set (n = 1168) and 20.8 K for the test set (n = 153). The corresponding errors in the case of the Fuzzy ARTMAP model were 2.0 and 13.5 K. The most general boiling point model has been developed by Clark and his co-workers [55]. They used a data set of 6629 compounds with very diverse functionality, containing the elements H, B, C, N, O, F, Al, Si, P, S, Cl, Zn, Ge, Br, Sn, I, and Hg. The boiling points ranged from 112 to 824 K.


A prediction set of 629 molecules was separated in such a manner that the boiling points spanned the entire range. The 18 descriptors, calculated from AM1 or PM3 optimised 3D structures, included descriptors based on atomic charges and electrostatic potentials, counts of hydrogen bond donors and acceptors, and geometric descriptors. An ensemble of 10 networks with 18-10-1 configuration was trained. The predictions of the model were given by the mean result of the 10 networks. The best model used AM1 descriptors. The standard error for the training set was 16.5 K (n = 6000) and for the prediction set 19 K (n = 629). In the whole data set, 35 molecules displayed fitting or prediction errors larger than 55 K. The main reasons for large errors were analysed to be the following: (i) experimental boiling points measured at reduced pressures; (ii) poor description of certain unusual structures by AM1; and (iii) descriptors calculated for the wrong tautomeric structure.

2.4. Vapour pressure

Vapour pressure determines the volatility of a chemical compound. It is one of the important properties determining the environmental fate of chemicals. In pharmaceutical research, vapour pressure is a relevant parameter, for example, in the development of pressurised aerosols and perfumes. Vapour pressure shows a pronounced dependence on temperature. Both single-temperature and temperature-dependent QSPR models have been developed for vapour pressure prediction. Goll and Jurs [58] modelled vapour pressure at 25 °C for a set of 353 hydrocarbons and halogenated hydrocarbons with a vapour pressure range from −1.00 to 6.65 log (Pa) units. The NN model with 7-3-1 architecture predicted the test set with a root mean square error of 0.209 (n = 52). Kuhne et al. [59] used a data set of 8148 experimental vapour pressures of 1838 hydrocarbons and halogenated hydrocarbons. The training set consisted of 1200 compounds and the rest were used as the validation set. The descriptors of the 25-14-1 model included 20 molecular fragments, temperature and melting point. The overall absolute errors were 0.08 and 0.13 log (Pa) units in the training and prediction sets, respectively.


Yaffe and Cohen [60] developed temperature-dependent QSPRs for predicting the vapour pressure of hydrocarbons. Their data set contained 274 compounds. The optimal NN model was based on a 7-29-1 architecture. The descriptors used were three valence connectivity indices, molecular weight, and temperature. The average absolute error in vapour pressure prediction was 0.039 log (Pa) units. The vapour pressure of diverse organic compounds has been modelled by McClelland and Jurs [61] and Clark et al. [17,62]. McClelland and Jurs modelled vapour pressure at 25 °C using a data set of 420 diverse organic compounds including molecules with oxygen- and nitrogen-containing functional groups. The molecular weight range was 26–260. Vapour pressures varied from −1.34 to 6.68 log (Pa) units. The optimal 10-descriptor set was selected from more than 200 topological, geometric, electronic and combination descriptors. The NN model with 10-4-1 architecture gave rms errors of 0.19 for the training set (n = 290) and 0.33 for the test set (n = 65). Clark and his co-workers modelled similar vapour pressure data (n = 551) at 25 °C using descriptors based on AM1 calculations [17]. A cross-validated standard deviation of 0.37 was obtained with the NN model with 10-8-1 architecture. An external test set of 192 compounds gave higher errors (s = 0.68). Subsequently, Clark et al. [62] developed a temperature-dependent NN model using a larger data set of 2349 molecules and 8542 vapour pressure–temperature data points taken from Beilstein. The data set included O-, N-, Si-, B-, P-, S- and halogen-containing compounds. Vapour pressure values varied from −8.63 to 5.47 log (torr) units and temperatures ranged from 76 to 800 K. The input descriptors of the 27-15-1 model included the absolute temperature and descriptors encoding bulk electrostatic interactions, weak interactions, geometrical properties and electrotopological states. The model resulted in a standard deviation of error of 0.322 log units for the training set (n = 7681) and 0.326 log units for the validation set (n = 861).

2.5. Henry's law constant

Henry's Law relates the equilibrium liquid and vapour phase concentrations of a solute.

For low solute concentrations, Henry's Law can be written as w = kp, where w is the mass of gas dissolved by unit volume of solvent at an equilibrium pressure p, and the proportionality constant k is the Henry's Law constant (HLC). The HLC is important in a variety of engineering applications and in assessing the distribution of trace organic compounds in the environment. In drug research, there are a few fields in which the HLC is a valuable parameter. Two important examples are the design of chemical synthesis processes for drug substances on the industrial scale and the optimisation of production processes for certain parenterals, or other formulations, which should be protected from soluble gases like oxygen. In QSPR studies, the dimensionless partition coefficient H, the air/water equilibrium concentration ratio, has been used as the measure of the HLC. The logH values at 25 °C vary from −8 to 4 for usual organic compounds. At least two QSPR/NN studies have been published on the prediction of Henry's Law constant [63,64]. English and Carroll [63] used a data set of 357 organic compounds composed of various chemical classes and diverse structural features. The descriptors used contained information on various properties of the materials (bulk, connectivity, gross structure, charge, charged surface area, bonding, atomic contributions, hydrogen bonding and group contributions). The best NN model with 10-3-1 configuration produced a standard error of 0.24 for a test set of 54 compounds. Yaffe et al. [64] modelled Henry's law constant using both Fuzzy ARTMAP and feed-forward NNs. The heterogeneous data set (n = 495) included compounds with oxygen-, sulphur- and nitrogen-containing functional groups and halogens. The logH values ranged from −6.72 to 2.87. Topological descriptors were used as input parameters. The average absolute errors for the test set of 74 members were 0.13 and 0.27 logH units for Fuzzy ARTMAP and the feed-forward network (7-17-1), respectively.
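The dimensionless H used in these studies can be related to the HLC expressed in pressure-per-concentration units; the small sketch below uses the standard ideal-gas conversion (our addition, not taken from the papers reviewed), with an illustrative input value.

```python
R_ATM_M3 = 8.205736e-5   # gas constant, m3·atm/(K·mol)

def dimensionless_henry(k_h_atm_m3_per_mol, temp_k=298.15):
    """Convert a Henry's Law constant expressed as atm·m3/mol into the
    dimensionless air/water concentration ratio H = C_air / C_water,
    assuming ideal-gas behaviour: H = k_H / (R*T)."""
    return k_h_atm_m3_per_mol / (R_ATM_M3 * temp_k)

# Illustrative value only: k_H = 1e-3 atm·m3/mol at 25 °C
# gives H ≈ 0.041, i.e. logH ≈ -1.39.
print(dimensionless_henry(1e-3))
```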

2.6. Critical temperature and critical pressure

The critical temperature is the temperature above which a gas cannot be liquefied, which also means that above the critical temperature a substance cannot exist as distinct gas and liquid phases.


Similarly, the critical pressure is defined as the highest pressure at which a species can coexist as a liquid and a vapour. Critical temperatures and pressures are needed in thermodynamic calculations and they are important for industrial chemical engineers. In pharmaceutical research, critical temperatures and pressures are important in many fields, for example in crystal engineering, when supercritical fluids are used. Jurs and his co-workers modelled critical temperature [53] and, subsequently, both critical temperature and critical pressure [65] using a set of 165 small organic compounds (hydrocarbons, oxygen-containing compounds, nitrogen-containing compounds, and halogen-containing compounds). The range of the critical temperatures was from 365 to 782 K and of the pressures from 11.94 to 55.47 atm. In the latter work, simulated annealing and genetic algorithms were used to select the descriptors. A different optimal set of eight descriptors and best network architecture was found for each model. The root mean square errors calculated for a small test set (n = 18) were 9.85 K and 2.39 atm. Hall and Story [1] used the same data set of critical temperatures to develop a model based on atom type E-state indices. Their model of 19-4-1 architecture gave a mean absolute error of 5.59 K for the test set. Espinosa et al. [56] applied a Fuzzy ARTMAP NN in modelling the critical temperature and critical pressure of a large set of heterogeneous organic compounds (hydrocarbons, anilines, pyridines, aldehydes, amines, ketones and aromatic compounds). The critical temperatures of 530 compounds ranged from 190.5 to 926 K and the critical pressures of 465 compounds from 1.02 to 8.95 MPa. Topological indices and the dipole moment were used as input descriptors. The absolute mean prediction errors were 8.1 K for critical temperature and 0.13 MPa for critical pressure (n = 50).

2.7. Crystal packing

Crystallisation and crystal formation is a complex process. Most drug substances are crystalline solids and many of them have various polymorphic forms. An understanding of crystallisation and crystal packing is important in pharmacy, because many physical properties of the various polymorphic forms may differ significantly from each other.


From the classical crystallographic point of view, the molecular crystal packing is described by the position and orientation of the molecule in the unit cell plus the crystallographic space group, which generates the equivalent molecules. So far, only one publication has been found dealing with the QSPR-based prediction of crystal packing. Fayos and Cano [66] have proposed an NN method by which they predicted the crystal packing of small organic molecules from the 3D structure of the molecules. The data set was composed of 31 molecules of quite different chemical character and with known crystal structures. These molecules, encoded by the 1D Fourier transform of their 3D point charge distributions, were used as input to a Kohonen neural network. The self-classification of the encoded packings on the Kohonen map correlated well with the classification found for the isolated molecules, and both also correlated well with the visually observed types of packing in the crystal structures. Finally, the same neural network, trained with 16 of the 31 molecules and supervised with their crystal packing, was used to predict the encoded packing of the 15 molecules not included in the training, in order to classify them into a mode of packing. The results showed that the molecular Fourier transform encoding included enough information to allow some approach to the prediction of the mode of packing from the isolated molecule.

2.8. Density and refractive index

Density is a basic physical property of a material, and is closely related to the state of the material. It is well known that the densities of gases, liquids and solids vary quite differently with temperature and pressure. In drug research, the density of gases is important, for example, in the design of inhalation aerosols. The density of liquids is needed in developing the moist heat sterilisation process, where thermal expansion of the products in closed containers should be considered. The density of solids reflects the molecular arrangement in the crystalline lattice (polymorphic form) as well as the crystallinity of the substance. The refractive index (n) of a medium is the ratio of the speed of light in vacuum to its velocity in the medium.


By definition, the refractive index of a vacuum is 1; for air the value is 1.0008. In pharmaceutical research the refractive index is a property of interest in the formulation of eye preparations. Two articles describe neural networks for the prediction of the density and refractive index of hydrocarbons, along with some other properties. Gakh et al. [46] trained a 7-8-6 network using a set of 106 saturated hydrocarbons and topological descriptors. An average error of 1–2% for density, and of 0.2% for the refractive index, was reached with the test set (n = 25). Liu et al. [48] used five topological indices as input descriptors for a neural network with 5-5-3 architecture in predicting the refractive index, density and boiling point of alkenes. The training set contained 49 members. A standard error of 0.13% was found for the refractive index and 0.4% for density using a test set of 16 alkenes.

2.9. Dielectric constant and Kirkwood function

The dielectric constant (or relative permittivity, ε) is a macroscopic property of matter and represents the ability of a substance to separate charge and to orient its molecular dipoles in an external electric field. In many cases, the use of a function of the dielectric constant, for example the Kirkwood function (ε − 1)/(2ε + 1), is more practical than the constant itself. The dielectric constant can be easily measured with a condenser, but the ability to predict it could be valuable in the molecular design of new materials. The dielectric constant has a very significant effect on the strength of interactions between ions in solution and has relevance in pharmaceutical research in, for example, studying particle–particle and ion–ion interactions in liquids. Two neural network studies on the prediction of dielectric constants have been published recently. Schweitzer et al. [67] studied a large number of models based on a data set of 497 organic compounds with a wide variety of functional groups. The dielectric constants of the compounds spanned from 1 to 40. The molecular descriptors used included the dipole moment, polarisability, counts of elemental types, an indicator of hydrogen bonding capability, charged partial surface area descriptors, and molecular connectivity descriptors.

The best models had a standard error of less than 2.0 units for the test set (n = 97) and less than 3.0 units for the training set (n = 350). Sild and Karelson [68] developed NN models for predicting the dielectric constant and the Kirkwood function using a data set of 155 organic liquids with extensive structural diversity and a range of 1.87–46.5 for the dielectric constant. Separate models with 5-5-1 configuration were developed for the dielectric constant and the Kirkwood function. It was concluded that at least two different types of descriptors are necessary to predict the constants: descriptors related to molecular electronic properties and descriptors describing intermolecular interactions. The average prediction error for the dielectric constant was 27.0% and for the Kirkwood function 4.1%.

2.10. Flash point and autoignition temperature

The flash point can be defined as the lowest temperature at which a liquid produces enough vapour to ignite in the presence of a source of ignition. Like the boiling point of a substance, its flash point depends on the ambient pressure: the more volatile a solvent is at normal temperature, the greater the fire hazard it poses. The autoignition temperature (AIT) of a chemical is the lowest temperature at which a material will ignite without an external source of ignition. Knowledge of the flash point and AIT of materials is essential in many pharmaceutical and chemical unit operations. Important activities are chemical synthesis and, for example, processes where powdered materials are handled at low relative humidity (weighing, mixing, fluidizing, drying, storage, transportation) and in which frictional electricity may be generated. Two groups have developed NN models for the prediction of the AIT with moderate success. Mitchell and Jurs [69] modelled the AIT using a data set of 327 compounds including hydrocarbons, halogenated hydrocarbons, and compounds containing oxygen, sulphur, and nitrogen. The AIT values varied from 140 to 705 °C. Attempts to develop a model for the entire data set were unsuccessful. More reasonable prediction results were obtained when separate models were optimised for five subsets: low-temperature hydrocarbons, high-temperature hydrocarbons, nitrogen-containing compounds, oxygen- and sulfur-containing compounds, and alcohol and ether compounds.


An optimal set of 5–7 descriptors was selected for each model by genetic algorithm and simulated annealing routines from a large pool of topological, electronic and geometric descriptors. The prediction errors varied from 5 to 33 °C depending on the subset, and were in the range of the experimental error. However, the test sets were small (n = 4–15). Tetteh et al. [70,71] have used both the standard NN technique and the radial basis function NN for modelling the AIT. The more recent work was based on a data set of 232 organic compounds with 13 different functional groups [71]. The RBFNN was optimised by biharmonic spline interpolation. The average error for the test set was 33 °C (n = 77). Tetteh et al. [54] have also developed an RBFNN for simultaneous prediction of flash point and boiling point. The database contained 400 organic compounds with flash points between −60 °C and 200 °C. The structures were described simply with the first-order molecular connectivity index ¹χ and counts of the 25 functional groups present in the molecules. The average absolute error for the test set in flash point prediction was 11.9 °C using an RBFNN with a 26-36-2 configuration.
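For readers unfamiliar with the descriptor, the first-order molecular connectivity index mentioned above is a simple graph invariant; the sketch below shows the standard Randić-type formula as an illustration, not the descriptor code used in Ref. [54].

```python
import numpy as np

def first_order_connectivity_index(bonds, degrees):
    """First-order molecular connectivity index:
    chi-1 = sum over bonds of (delta_i * delta_j)**-0.5,
    where delta is the number of heavy-atom neighbours of each atom
    (hydrogens excluded).

    bonds   : list of (i, j) pairs of heavy-atom indices
    degrees : sequence mapping atom index -> heavy-atom degree
    """
    return sum(1.0 / np.sqrt(degrees[i] * degrees[j]) for i, j in bonds)

# Example: n-butane (C-C-C-C) has degrees [1, 2, 2, 1] and three C-C bonds,
# giving chi-1 = 2/sqrt(2) + 1/2 ≈ 1.914.
print(first_order_connectivity_index([(0, 1), (1, 2), (2, 3)], [1, 2, 2, 1]))
```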

2.11. Viscosity, surface tension and thermal conductivity

Surface and transport properties of liquids, such as viscosity, surface tension and thermal conductivity, are important in chemical engineering and are relevant for pharmaceutics in drug formulation and administration. The work of Homer et al. [50] on modelling hydrocarbons by NN included liquid viscosity as one of the predicted properties. Suzuki et al. developed an NN model for predicting liquid viscosity at a standard temperature of 20 °C [72] and, subsequently, a temperature-dependent model [73]. The latter work was based on experimental values for 1229 data points from 440 diverse compounds covering the temperature range from −120 to 160 °C and viscosity values from −0.928 to 5.127 log (mPa s) units. The input descriptors included indicator variables for functional groups and several physicochemical properties. The best model showed a root mean square error of 0.148 log units for the test set of 79 compounds and 133 data points.


Kauffman and Jurs [74] reported predictive NN models for the viscosity, surface tension and thermal conductivity of 213 common organic solvents. Eight-descriptor models were developed for surface tension and viscosity, while the thermal conductivity model contained nine descriptors. The root mean square errors of the test sets were 2.89 mN/m, 0.150 mPa s, and 0.0236 W/m K for surface tension, viscosity and thermal conductivity, respectively. A general nine-descriptor model for the three properties was also developed. The results were comparable with those of the individual models.

2.12. Other properties

In addition to the works described above, single reports have been published on NN modelling of a number of other physicochemical properties, usually with a rather limited set of compounds. The 7-8-6 model of Gakh et al. [46] included the heat capacity, Gibbs energy of formation and enthalpy of formation for hydrocarbons. Chow et al. estimated aqueous activity coefficients [75]. Svozil et al. [76] predicted the solvatochromic polarity/polarisability parameter for diverse organic compounds. Bunz et al. [52] modelled the enthalpy of fusion for 41 esters. Homer et al. [50] applied their approach for liquid viscosity modelling also to the heat of vaporisation and Pitzer's acentric factor for hydrocarbons. Jurs and co-workers have modelled fullerene solubility in 96 solvents [77] and the glass transition temperatures of polymers [78].

3. Discussion

3.1. Comparison of the neural network methods with the established methods for logPoct prediction

A critical test for the success of neural network methods is to compare their performance for logPoct prediction with the methods that are widely used in pharmaceutical research. Two neural network methods have been extensively compared with CLOGP and KOWWIN. These two methods are the ALOGPS program of Tetko et al. [31] and the AutoQSAR/NN of Erös et al. [18]. The former is based on E-state indices and a large general database (n = 12908), the latter is based on molecular properties and was trained with a specialised database (n = 625) containing mainly drug compounds.


For comparing the performance, Tetko et al. [31] divided the data set into two subsets. Subset 1 comprised the 9429 molecules for which the CLOGP database has an experimental value. Subset 2 comprised the remaining 3479 compounds. A comparison of the prediction results is shown in Table 2. KOWWIN predicts both subsets equally well. ALOGPS predicts subset 1 with comparable accuracy, but for subset 2 there are more prediction failures, although the validation scheme is favourable for ALOGPS, because subset 2 is included in its training set. CLOGP showed reduced accuracy and a large number of outliers for subset 2, but in most cases it gave a warning of missing parameters. The ability of KOWWIN to generalise is impressive considering that its training set contained 2351 compounds [11]. ALOGPS was trained with a training set that was five times larger, but it may be more biased due to unbalanced coverage of the structure space. The inadequate ability of ALOGPS to generalise is clearly demonstrated in the prediction results for a proprietary compound set from BASF [79]. The performance of ALOGPS can be improved by feeding it with new experimental data belonging to the same structural category as the unknown compound. This is, however, not the most wanted feature in drug discovery, which searches for new structures. Two commercial programs, Interactive Analysis/logP and ScilogP/Ultra, are based on practically the same data and similar parametrisation as ALOGPS.

According to the evaluation of Tetko et al. [30], their performance is also similar to that of ALOGPS.

Erös et al. [18] compared the prediction ability of their AutoQSAR logP models with CLOGP and KOWWIN using three different test sets (Table 3). All methods, including SciLogP Ultra, performed equally well in predicting a test set of 300 drug-like compounds. Because these compounds are mostly known to CLOGP and KOWWIN, Erös et al. also used a small test set of unknown molecules. The authors hypothesised that AutoQSAR models might be superior to fragment methods in predicting new molecules. This could not be demonstrated with respect to CLOGP. The AutoQSAR modules and CLOGP predicted this set with similar accuracy as the other test set. KOWWIN failed with three compounds, which all contained the same pyridazinone fragment. A third comparison was made with a set of difficult compounds described as 'heavy outliers' in the literature of logPoct modelling. CLOGP, KOWWIN and the linear AutoQSAR models predicted this set with the same accuracy as the other test sets, but the neural network module of AutoQSAR gave many poor predictions. The results seem to suggest that the linear models of AutoQSAR are comparable with CLOGP and KOWWIN in predicting complex and difficult molecules, but the neural network model may be over-fitted.

Table 2
Comparison of the prediction of logPoct by the ALOGPS, CLOGP and KOWWIN methods for 12908 compounds from the PHYSPROP database (results from Ref. [30])

Method              Subset 1 (9429 molecules)    Subset 2 (3479 molecules)
                    s (a)      Outliers (b)      s          Outliers
CLOGP               0.36       74                0.62       558
KOWWIN              0.40       56                0.46       55
ALOGPS LOO (c)      0.37       68                0.44       63
ALOGPS random (d)   0.41       81                0.48       107

(a) Standard error of prediction.
(b) Prediction error larger than 1.5 log units.
(c) Predictions by the leave-one-out technique.
(d) Data divided randomly between a training set and a test set of equal size.

Table 3
Comparison of the accuracy of logPoct prediction by AutoQSAR and other methods (results from Ref. [81])

Method          Test set 1 (a), s (d)   Test set 2 (b), s   'Heavy outliers' (c), s   Outliers (e)
AutoQSAR/NN     0.72                    0.64                0.79                      17
AutoQSAR/MLR    0.72                    0.56                0.60                      1
AutoQSAR/PLS    0.69                    0.65                0.60                      0
CLOGP           0.62                    0.63                0.66                      2
KOWWIN          0.65                    1.11                0.61                      2
SciLogP Ultra   0.68                    –                   –                         –

(a) External test set (n = 300) of Erös et al. [18].
(b) Compounds unknown to all models (n = 18).
(c) Compounds referred to as 'heavy outliers' in the literature (n = 78).
(d) Standard error of prediction.
(e) Prediction error more than 1.5 log units.

3.2. Status of logSw prediction

Table 4
Comparison of the prediction accuracy of logSw by neural network and multiple linear regression models

Model                          Training set        Test set
                               s (a)     n         s         n
Huuskonen [25], NN             0.47      884       0.60      413
Huuskonen [25], MLR            0.67      884       0.71      413
Tetko et al. [30], NN          0.47      879       0.60      412
Tetko et al. [30], MLR         0.75      879       0.81      412
Yan and Gasteiger [43], NN     0.50      797       0.59      496
                                                   0.93      1587 (b)
Yan and Gasteiger [43], MLR    0.93      797       0.79      496
Liu and So [42], NN            0.70      1033      0.71      258
Bruneau [44], NN               0.53      1560      0.84      934

(a) Standard error of prediction.
(b) An independent test set from Merck KGaA.

Table 4 shows the comparison of logSw prediction accuracy for the five neural network methods that are based on the largest and most diverse data sets. Four of the methods show standard errors of prediction of about 0.6 log units. This prediction accuracy is about the same as the accuracy of the experimental values [80]. Does this mean that there is a choice of adequate methods for prediction of logSw? On the basis of the current literature the answer is no. The picture was broken by the work of Yan and Gasteiger [43]. Their model was based on the same working set (n = 1297) as three of the other models shown in Table 4. With a test set selected from the working set, the prediction error was similar to that of the other models. Yan and Gasteiger also tested their model with an independent set from Merck KGaA. The prediction results for these data were disappointing (s = 0.93, n = 1587).

The authors concluded that the reason is not the modelling approach used, but the lack of adequate diversity in the training set. This conclusion means that the other methods based on the same compound set suffer from the same limitations of prediction ability. The fifth method, developed by Bruneau [44], is based on a different database, which contained, in addition to literature data, 522 research compounds from AstraZeneca. The latter data set was reported to represent restricted diversity. The test set used by Bruneau was not related to his training set, which is seen in the standard error (s = 0.83). Still, half of the test set was taken from the data used in the four methods discussed above. Therefore, it seems likely that the error estimate is optimistic for more diverse data.
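The summary statistics used throughout Tables 2–4 are straightforward to compute; a minimal sketch is given below, with the standard error of prediction computed here as the root mean squared prediction error and the 1.5 log-unit outlier threshold taken from the table footnotes.

```python
import numpy as np

def prediction_summary(y_obs, y_pred, outlier_threshold=1.5):
    """Standard error of prediction (computed here as the root mean squared
    prediction error) and the number of outliers, i.e. compounds whose
    absolute error exceeds the threshold (in log units)."""
    errors = np.asarray(y_pred) - np.asarray(y_obs)
    s = float(np.sqrt(np.mean(errors ** 2)))
    outliers = int(np.sum(np.abs(errors) > outlier_threshold))
    return s, outliers
```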

3.3. Neural network models compared with linear models

The main advantage of neural network modelling is that complex, non-linear relationships can be modelled without any assumptions about the form of the model. It may be questioned whether the flexibility of neural networks is really needed in QSPR modelling. Many papers describe NN and MLR models trained in parallel using the same descriptors and compound sets, so a lot of data are available for direct comparison of the two methods. For instance, Sild and Karelson [68] report that their NN models for prediction of the dielectric constant did not give a noticeable improvement over the respective MLR models.


The authors suggest that MLR should be preferred, because it is faster and simpler. On the other hand, the NN models show the smaller standard error of prediction in all cases listed in Table 4 for logSw prediction. Many other comparisons of NN and MLR have been published, for instance for logPoct [16,24,27,29], Henry's law constant [63], boiling point [49], and critical temperature and critical pressure [65]. The overwhelming majority of comparisons show a smaller prediction error for neural networks than for MLR. This has usually been taken to show that the underlying relationships are non-linear and require neural networks for adequate approximation. However, the background of the comparison is often the selection of input descriptors for the neural network by MLR. Genuine interest in optimising the MLR method may have been lacking, because none of the comparisons evaluated the effect of higher-order terms in the MLR models. Lucic et al. [81] studied MLR models containing 2- and 3-fold cross-products of the initial descriptors. The authors conclude that higher-order polynomial regression outperforms neural networks, giving better predictions with simpler models. They modelled the data of Cherqaoui and Villemin [45] for predicting the boiling points of 150 alkanes. With a 17-term regression model, a cross-validated standard error of 2.88 K was obtained, compared with the 3.60 K reported by Cherqaoui and Villemin for a neural network with a 10-6-1 configuration.

The underlying assumptions of MLR about the independence, accuracy and relevance of the independent variables may be problematic in QSPR. This problem could be circumvented using a method like PLS. There are only a few comparisons of neural networks with PLS modelling in the literature reviewed. The results of Tettech et al. [54] favoured neural networks over PLS. They developed a radial basis function neural network model for predicting boiling point and flash point, and compared the results with PLS models trained on the same data. The neural network models gave lower absolute average errors for both properties. Significant curvature, more pronounced in the case of flash point, was observed at lower temperatures when comparing experimental and predicted values. It was concluded that PLS is not able to model efficiently the non-linear relationships in the data. However, the effect of second-order terms in the PLS models was not studied.
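Returning to the comparison made by Lucic et al., the idea of extending MLR with higher-order and cross-product terms can be illustrated with a minimal sketch; this is not their implementation, and the synthetic descriptors, the scikit-learn pipeline and the use of all polynomial terms up to third order are assumptions made purely for illustration:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Placeholder descriptor matrix X (rows = compounds) and property vector y
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=150)

plain_mlr = make_pipeline(StandardScaler(), LinearRegression())
poly_mlr = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=3, include_bias=False),  # squares, cubes and cross-products
    LinearRegression(),
)

for name, model in [("MLR", plain_mlr), ("MLR + higher-order terms", poly_mlr)]:
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: cross-validated RMSE = {rmse:.3f}")

The polynomial model remains linear in its parameters, so it can be fitted and cross-validated as cheaply as plain MLR, which is the point made by Lucic et al.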

Eros et al. [18] report slightly better results for PLS and MLR than for neural networks. In parallel with their neural network method for logPoct prediction, they also optimised MLR and PLS models using the same data. An optimal descriptor set was selected for each method from a large pool. The results of the validation (Table 3), as discussed above, showed that the performance of the three methods is otherwise similar, but the neural network model produced many more outliers for a set of difficult compounds. This seems to imply that, due to their flexibility, neural networks are more prone to lose the ability to generalise by over-fitting noisy data. Duprat et al. [15] have pointed out that neural networks should be able to approximate a non-linear response surface with fewer parameters than linear models, based on mathematical results [15,82]. They demonstrated this numerically for a logPoct prediction model with the data of Bodor et al. [83]. Accordingly, Duprat et al. suggest that neural network modelling should be preferred over MLR based on the scientific principle of parsimony, that things are usually connected in the simplest way [15]. Interestingly, the same principle was used by Lucic et al. [81] and by Yalkowsky [37] to favour MLR and the general solubility equation (GSE), respectively, over neural networks. Occam's razor is a weapon for all seasons. Although the GSE is simple, with only two parameters, many parameters must be embedded in the logPoct and melting point calculations before the GSE can be used for prediction. On the other hand, even with simple neural networks the relationship between the molecular descriptors and the dependent variable is very complex and difficult to use in molecular design.
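For reference, the GSE mentioned above is usually quoted in the following form, where MP is the melting point in degrees Celsius and the melting point term is set to zero for liquids; the coefficients are reproduced here from the general literature rather than from Refs. [35-37], so they should be checked against the original sources:

\log S_w = 0.5 - 0.01\,(\mathrm{MP} - 25) - \log P_{\mathrm{oct}}

The melting point and logPoct entering this equation must themselves be measured or predicted, which is where the hidden parameters referred to above come in.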

3.4. Design of training set

The QSPR models reviewed seem to have been developed with minimal experimental design of the training set. The usual design principles have been to take the data used by another researcher, to take all available data, to select the most reliable experimental data, or to narrow the diversity to a specific class of compounds.


One of the design problems is the diversity of the data. Very accurate models can be built on homogeneous compound sets, for example the vapour pressure model for hydrocarbons and halogenated hydrocarbons by Goll and Jurs [58]. The problem with narrow models is narrow applicability: all environmentally interesting homologues might already be in the training set, and no one is searching for a new analogue. The other extreme is the 'universal model' applicable to the whole organic chemical universe. All the models that do not specify any structure class, in or out, implicitly claim to be generally applicable; this covers most published methods. A third approach may be to aim for maximal structural diversity in a specified subspace, such as 'drug-like' compounds. The work of Eros et al. [18] may be an example of this approach. The pursuit of the utopian universal model leads to training sets covering both structural features and property values that have no relevance for pharmaceutical research. Such a training set may also include the compounds available for a 'drug-like' training set, but the resulting neural networks may nevertheless make less accurate predictions for these compounds than a specific model would, owing to the characteristics of a global model. The other facet of the diversity problem is that the publicly available databases are not diverse enough for the development of methods that would be reliable in the drug discovery environment. This is evident from the recent studies on logSw and logPoct prediction [43,79]. Tetko et al. [79] introduced the concept of the 'public optimal prediction space' to recognise this situation.

Another aspect of training set design is the size. Increasing the number of training set members does not necessarily improve the quality of the model. Tetko et al. [31] studied the effect of size on the accuracy of logPoct prediction. Training sets of increasing size were selected randomly from a database of 12908 molecules. The prediction error decreased steeply with increasing size up to about 1000 molecules; with larger sets the prediction accuracy improved only slowly. Apparently, random selection does not effectively increase the amount of new information in the training set, because compounds from the largest clusters of similar structures are selected with high probability. Increasing the number of examples with practically the same information content does not improve the model, but may reduce the ability to generalise.


Interestingly, KOWWIN was trained with only 2351 compounds, and still its prediction accuracy and ability to generalise are at least as good as those of ALOGPS and the other neural network methods trained with 9000–12000 compounds, according to the results of Tetko et al. [31].

The third problem is the distribution of the compounds in the structure space. The training sets are typically composed of variably sized clusters of hierarchically related compounds representing homologue and analogue series and various chemical classes. Such training sets, however large, may lead to biased models. An optimal training set should cover the structure space of interest evenly. However, the literature reviewed contained no report of a systematic study on the distribution of the compound set in the descriptor space.
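A minimal sketch of how a more even coverage of the descriptor space could be pursued is given below; MaxMin-type selection is offered here only as an illustration and was not used in the papers reviewed. Instead of sampling at random, it repeatedly adds the compound that is farthest from those already selected:

import numpy as np

def maxmin_select(X, n_select, seed_index=0):
    """Greedy MaxMin diversity selection on a descriptor matrix X (rows = compounds)."""
    selected = [seed_index]
    # distance of every compound to its nearest already-selected compound
    d_min = np.linalg.norm(X - X[seed_index], axis=1)
    while len(selected) < n_select:
        next_idx = int(np.argmax(d_min))      # farthest from the current selection
        selected.append(next_idx)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[next_idx], axis=1))
    return selected

# Example: pick 50 maximally spread compounds from 1000 random descriptor vectors
X = np.random.default_rng(1).normal(size=(1000, 5))
print(maxmin_select(X, 50)[:10])

With clustered data such a procedure picks representatives of the small clusters early, which is exactly what random selection fails to do.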

3.5. Model validation

The validation of a QSPR model should contain at least two items: (i) the definition of the area of competence; and (ii) the estimation of the prediction accuracy within this area. In the publications reviewed, both of these validation items are usually accounted for by calculating an overall error estimate for a test set selected from the working data set. This validation practice implicitly suggests that: (i) the working set covers the whole structure space of organic chemistry; and (ii) the errors are randomly distributed. One may question whether this is an appropriate validation practice. Review of the literature reveals problems. As discussed above, even the largest databases used for the development of the logPoct and logSw models have turned out to represent inadequate diversity when challenged with a large independent data set [43,79]. Good validation results with a test set from the working database do not prove general prediction ability. The popular beauty contests with an external test set of 20 compounds are obviously useless. Good validation results with a large independent test set would increase confidence in a model, but only temporarily.


In addition to the frustrating search for the final proof of general prediction ability, one would like to see serious efforts to understand in structural terms what is covered and what is not. Experimental design of the training set is the basis of what is covered. Analysis of a considerable set of outliers in structural terms would be of great value for assessing the limits; the careful examination of the outliers of a boiling point model by Clark et al. [55] serves as an example.

Another problem is the practice of using the overall error estimate for the test set as a measure of prediction accuracy. Compelling evidence shows that the errors are not distributed randomly. In a few papers an error analysis was carried out for specific chemical groups. For instance, Yaffe et al. [84] report the errors for 25 chemical classes in Henry's law constant modelling; the average absolute errors observed with the feed-forward model varied from 0.1 to 0.6 logH units for the whole data set, which represented rather simple molecules. Tetko et al. [31], analysing a large logPoct data set, found that the mean absolute error was linearly related to the molecular size expressed as the number of non-hydrogen atoms. Simple structures tend to show smaller errors than complex structures. For instance, one of the logSw prediction methods compared in Table 4 gives a test set s=0.60, but the standard error for a subset of 43 monofunctional aliphatic compounds is only s=0.20. Non-random distribution may also be seen in the diagrams showing the correlation between predicted and experimental values: the point swarm often has a fat centre and slim ends. There are many reasons why the fitting and prediction errors vary in different parts of the response surface. For instance, the extreme values of the dependent variable may represent clusters of similar compounds, some areas of the structure space are covered better than others, the descriptor set is not equally good for all structure classes, simple compounds are more easily described than complex ones, etc. These effects tend to result in an overall error estimate that is too optimistic for complex compounds representing 'drug-like' chemistry. A better balance of the training set would partially cure the problem. Bruneau [44] introduced another solution to the problem: he showed that the prediction error for the members of an external test set varied smoothly with two parameters, which indicate the proximity of the test compound to the training set.

One of the parameters is based on the smallest distance of a test compound to the training set in the descriptor space. The other parameter takes into account the characteristics of the Bayesian NN used by Bruneau. The parameters of Bruneau can be used to quantitate the similarity or dissimilarity of a test set with the training set, and to estimate the prediction error for a new compound. This approach also provides a new angle for evaluating the quality of a model. For instance, small prediction errors at small descriptor distances mean good interpolation capability. Outliers at small descriptor distances may mean inadequate descriptors. A slow increase of the prediction error with increasing descriptor distance means a good ability to generalise. Moderate prediction accuracy at higher descriptor distances may mean some ability to extrapolate.
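The distance-based part of this idea can be sketched in a few lines; this is a simplified illustration, not Bruneau's implementation, and it assumes that the descriptors have already been scaled and that a plain Euclidean distance is meaningful in that space:

import numpy as np

def min_distance_to_training(X_train, X_test):
    """Smallest Euclidean distance of each test compound to the training set,
    used here as a crude applicability-domain indicator."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return dists.min(axis=1)

# Example: flag test compounds that lie unusually far from the training set
rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 8))
X_test = rng.normal(size=(50, 8))
d = min_distance_to_training(X_train, X_test)
cutoff = np.percentile(d, 90)                 # arbitrary illustrative cut-off
print("possible extrapolation for test compounds:", np.where(d > cutoff)[0])

Plotting the prediction error against such a distance is then a direct way to see how quickly a model degrades when it is asked to extrapolate.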

3.6. Selection of descriptors

Different approaches have been used for the structure parametrisation in QSPR modelling, the level of sophistication varying from counting atoms to quantum mechanical calculations. A question arises whether the literature gives any clue about the winning strategy. Three basic strategies have been used: (i) in the group contribution methods, the molecules are decomposed into fragments, and all fragments that are present with adequate frequency in the database are counted and used in the model; (ii) in empirical models, many molecular properties are calculated, but no assumptions are made about their relevance; instead, the model-building process selects the best set of descriptors; and (iii) in hypothesis-based models, a relatively small number of descriptors are used, based on theoretical models or hypotheses about the basic dependencies.

Recent group contribution type methods seem to favour E-state indices instead of molecular fragments. The atom-type E-state indices are assumed to contain more information than mere counts of the corresponding atoms. A direct comparison has not been made in the QSPR studies reviewed. However, a surprisingly good logPoct prediction model (s=0.87) was trained by Tetko [79] using only the count of elements.


It would be interesting to know the result using the same atom and bond types as in the calculation of the E-state indices. The works of Jurs et al. [40,57,65,74], for example, represent the second strategy of model building. They typically calculate more than 200 topological, geometric, electronic and hybrid descriptors. An optimal set of about 10 descriptors is selected in a process which may include simulated annealing and genetic algorithm routines. Independent selection of descriptors for MLR and NN models leads to different optimal descriptor sets. With this approach there is an increased risk of selecting training-set-specific descriptors and building over-fitted NN models. Over-fitting may be difficult to reveal if validation is based on subsets of the same data. Goll and Jurs [58] actually noticed that changing two compounds in a training set of 260 members resulted in the change of six descriptors in their 7-3-1 model for vapour pressure prediction. The logSw models of Liu and So [42] and of Yan and Gasteiger [43] can be considered examples of hypothesis-based modelling. These authors selected a set of relevant descriptors at the very beginning of the model-building process. Although not stated explicitly, their choice of descriptors is loosely in accordance with the GSE of Yalkowsky [35] and the thermodynamic model of Ruelle [85].

It appears that there is no simple answer to the question of the best strategy or the best type of descriptors. A comparison of three recent models for the prediction of boiling point is enlightening. Tettech et al. [54] modelled a diverse set of 400 compounds using simply the count of 25 atoms and groups as descriptors, in addition to a connectivity index. The absolute average error was 14.7 °C for a test set of 133 members. Goll and Jurs [57] modelled a set of 298 compounds using eight descriptors, which included three topological, one electronic and four charged partial surface area descriptors; the standard error of prediction was 8.69 K (n=30). Hall and Story [1] used the same compound set and 19 atom-type E-state indices; the mean absolute error was 4.57 K (n=30). The E-state indices appear to give superior performance. However, considering the experimental error, which is supposed to be about 10 °C [54], the model of Goll and Jurs with only eight parameters gives excellent results, the model of Tettech et al. gives surprisingly good results, and the model of Hall and Story seems to be over-fitted.


On the other hand, the size and nature of the test sets call into question all conclusions about the ability to generalise.

Molecular property descriptors represent a more general way to describe the interaction capability of molecules than counting fragments. Consequently, models based on relevant properties are potentially more general than fragment-based models. This potential has perhaps been partially realised in certain QSPR models. The work of Eros et al. [18] showed that a model based on 27 molecular property descriptors has the same ability to predict logPoct of complex unknown structures as fragment-based methods built on hundreds of terms. The NN model of Clark et al. [17] with 16 descriptors predicted logPoct for 41 nucleosides and nucleoside bases with high accuracy (s=0.39), although compounds of that type were specifically excluded from the method development. The logSw model of Liu and So [42] contains only seven pre-selected descriptors derived from the two-dimensional molecular structure. The reported prediction error (Table 4) is only slightly higher than the errors reported for the group contribution type models [28,30] based on 30 descriptors.
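To make the simplest end of this descriptor spectrum concrete, the element-count representation used in the model of Tetko [79] mentioned at the beginning of this subsection can be generated in a few lines; this is a sketch only, and RDKit and the SMILES input are assumptions, not the tools used in that work:

from collections import Counter
from rdkit import Chem

def element_counts(smiles):
    """Return the element counts (including hydrogens) of a molecule given as SMILES."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    return Counter(atom.GetSymbol() for atom in mol.GetAtoms())

print(element_counts("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin, C9H8O4

Each molecule is thereby reduced to a short vector of counts, which is the entire descriptor set of such a model.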

3.7. Alternative neural network techniques in QSPR

The three-layer, feed-forward neural network with a sigmoidal transfer function has been the standard technique used in QSPR modelling. Although it has been rather successful, some problematic features are associated with this technique. The slow iterative training and the validation procedures used do not guarantee that the optimal architecture is found, or that over-training and over-fitting are avoided. The global optimisation may accentuate the variation of local bias, and the models are difficult to interpret. Now that most suitable data have been modelled using the standard method, the next round, using alternative NN techniques, has begun. Radial basis function neural networks (RBFNN) were employed by Tettech et al. [54] in the simultaneous prediction of boiling point and flash point. Bayesian neural nets were used by Homer et al. [50] for the prediction of liquid viscosity and by Bruneau [44] for logSw prediction.


Tetko et al. [79] developed a method named the associative neural network (ASNN) to address the global/local problem in logPoct prediction. Generalised regression neural networks have been used by Mosier and Jurs to predict logSw [86]. Cohen and co-workers have applied the fuzzy ARTMAP neural network for estimating boiling point, critical properties, water solubility, logPoct and Henry's law constant [47,56,64,84,87]. A Kohonen neural network was used by Fayos and Cano for predicting crystal packing [66]. The alternative NN techniques may alleviate some of the difficulties with the standard method. However, it is too early to evaluate their impact: so far, only one or two research groups have applied any one of these techniques in QSPR modelling, using rather limited compound sets. Only Cohen et al. systematically compared the performance of standard feed-forward networks with fuzzy ARTMAP results. Very low errors were reported for the training set in all cases, while the errors for the test sets were closer to those of the feed-forward model. However, the ability of the fuzzy ARTMAP technique to generalise in QSPR was not tested with a large, independent set of complex compounds.
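As an example of how simple some of these alternatives are at their core, a generalised regression neural network of the kind used by Mosier and Jurs [86] reduces, in its basic form, to a Gaussian-kernel-weighted average over the training examples; the sketch below is a minimal illustration under that interpretation, not their implementation, and the smoothing parameter sigma is the only quantity to tune:

import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=0.5):
    """Generalised regression neural network (Specht type): each prediction is a
    Gaussian-kernel-weighted average of the training targets."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Example with synthetic descriptor data
rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 6))
y_train = X_train[:, 0] - 0.5 * X_train[:, 1]
X_query = rng.normal(size=(5, 6))
print(grnn_predict(X_train, y_train, X_query))

Because there is no iterative training, such a network avoids some of the convergence problems of back-propagation, at the price of keeping the whole training set in memory.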

4. Conclusions

1. This review of the literature shows that most physicochemical properties can be predicted from the molecular structure using neural network modelling within the structure space defined by the training set.
2. It has not been shown that neural networks are superior in QSPR modelling compared with other methods.
3. The best neural network models seem to be comparable with the established methods in their ability to estimate logPoct for unknown compounds, but any advantages remain to be shown.
4. The reviewed methods for predicting water solubility are based on data that are too limited to make them reliable in a drug discovery environment.
5. After the first decade of QSPR by neural networks, much of the published work can still be characterised as exploration of the area and demonstration of the potential of a particular approach rather than development of serious models with properly validated performance.

6. Future work in the field should develop Good QSPR Practices for the validation of models. Particular attention is needed in the design of training sets, the development of practices to evaluate the confidence of a prediction, the definition of the limitations of a model, and careful outlier analysis.
7. Techniques other than the standard neural network methods may prove advantageous with respect to accuracy, interpretation of the models, and ability to generalise.

References [1] L.H. Hall, C.T. Story, Boiling point and critical temperature of a heterogenous data set: QSAR with atom type electrotopological state indices using artificial neural networks, J. Chem. Inf. Comput. Sci. 36 (1996) 1004–1014. [2] N. Bodor, A. Harget, M. Huang, Neural network studies. 1. Estimation of the aqueous solubility of organic compounds, J. Am. Chem. Soc. 113 (1991) 9480–9483. [3] S. Agatonovic-Kustrin, R. Beresford, Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research, J. Pharm. Biomed. Anal. 22 (2000) 717–727. [4] A.R. Katritzky, U. Maran, V.S. Lobanov, M. Karelson, Structurally diverse quantitative structure–property relationship correlations of technologically relevant physical properties, J. Chem. Inf. Comput. Sci. 40 (2000) 1–18. [5] J. Taskinen, Prediction of aqueous solubility in drug design, Curr. Opin. Drug Disc. Dev. 3 (2000) 102–106. [6] M. Grover, B. Singh, M. Bakshi, S. Singh, Quantitative structure–property relationships in pharmaceutical research—Part 1, Pharm. Sci. Technol. Today 3 (2000) 28–35. [7] M. Grover, B. Singh, M. Bakshi, S. Singh, Quantitative structure–property relationships in pharmaceutical research—Part 2, Pharm. Sci. Technol. Today 3 (2000) 50–57. [8] J. Huuskonen, Estimation of aqueous solubility in drug design, Comb. Chem. High Throughput Screen. 4 (2001) 311–316. [9] C. Hansch, A.J. Leo, Substituent constants for correlation analysis in chemistry and biology, John Wiley, New York, 1979. [10] A.J. Leo, Calculating logPoct from structures, Chem. Rev. 93 (1993) 1281–1306. [11] W.M. Meylan, P.H. Howard, Atom / fragment contribution method for estimating octanol–water partition coefficients, J. Pharm. Sci. 84 (1995) 83–92. [12] N. Bodor, M.-J. Huang, A. Harget, Neural network studies.

3. Prediction of partition coefficients, J. Mol. Struct. (Theochem.) 309 (1994) 259–266. [13] J.M. Cense, B. Diawara, J.J. Legendre, G. Roullet, Neural networks prediction of partition coefficients, Chem. Intel. Lab. Syst. 23 (1994) 301–308. [14] J. Grunenberg, R. Herges, Prediction of chromatographic retention values (RM) and partition coefficients (logPoct) using a combination of semiempirical self-consistent reaction field calculations and neural networks, J. Chem. Inf. Comput. Sci. 35 (1995) 905–911. [15] A.F. Duprat, T. Huynh, G. Dreyfus, Toward a principled methodology for neural network design and performance evaluation in QSAR. Application to the prediction of logP, J. Chem. Inf. Comput. Sci. 38 (1998) 586–594. [16] A. Breindl, B. Beck, T. Clark, R.C. Glen, Prediction of the n-octanol / water partition coefficient, logP, using a combination of semiempirical MO-calculations and a neural network, J. Mol. Model. 3 (1997) 142–155. [17] B. Beck, A. Breindl, T. Clark, QM / NN QSPR models with error estimation: vapor pressure and logP, J. Chem. Inf. Comput. Sci. 40 (2000) 1046–1051. [18] D. Eros, I. Kovesdi, L. Orfi, K. Takacs-Novak, G. Acsady, G. Keri, Reliability of logP predictions based on calculated molecular descriptors: a critical review, Curr. Med. Chem. 9 (2002) 1819–1829. [19] J. Devillers, D. Domine, C. Guillon, W. Karcher, Simulating lipophilicity of organic molecules with a back-propagation neural network, J. Pharm. Sci. 87 (1998) 1086–1090. [20] J. Devillers, D. Domine, C. Guillon, Autocorrelation modeling of lipophilicity with a back-propagation neural network, Eur. J. Med. Chem. 33 (1998) 659–664. [21] K.J. Schaper, M.L.R. Samitier, Calculation of octanol / water partition coefficient (logP) using artificial neural networks and connection matrices, Quant. Struct. Act. Relat. 16 (1997) 224–230. [22] L.H. Hall, L.B. Kier, Electrotopological state indices for atom types: a novel combination of electronic, topological and valence state information, J. Chem. Inf. Comput. Sci. 35 (1995) 1039–1045. [23] J. Huuskonen, M. Salo, J. Taskinen, Aqueous solubility prediction of drugs based on molecular topology and neural network modeling, J. Chem. Inf. Comput. Sci. 38 (1998) 450–456. [24] J. Huuskonen, A.E. Villa, I.V. Tetko, Prediction of partition coefficient based on atom-type electrotopological state indices, J. Pharm. Sci. 88 (1999) 229–233. [25] J. Huuskonen, Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology, J. Chem. Inf. Comput. Sci. 40 (2000) 773–777. [26] J. Huuskonen, J. Rantanen, D. Livingstone, Prediction of aqueous solubility for a diverse set of organic compounds based on atom-type electrotopological state indices, Eur. J. Med. Chem. 35 (2000) 1081–1088. [27] J. Huuskonen, D. Livingstone, I.V. Tetko, Neural network modeling for estimation of partition coefficient based on atom-type electrotopological state indices, J. Chem. Inf. Comput. Sci. 40 (2000) 947–955.


[28] J. Huuskonen, Estimation of water solubility from atom-type electrotopological state indices, Environ. Tox. Chem. 20 (2001) 491–497. [29] D.J. Livingstone, M.G. Ford, J.J. Huuskonen, D.W. Salt, Simultaneous prediction of aqueous solubility and octanol / water partition coefficient based on descriptors derived from molecular structure, J. Comput.-Aided Mol. Des. 15 (2001) 741–752. [30] I.V. Tetko, V.Y. Tanchuk, T.N. Kasheva, A.E. Villa, Estimation of aqueous solubility of chemical compounds using E-state indices, J. Chem. Inf. Comput. Sci. 41 (2001) 1488– 1493. [31] I.V. Tetko, V.Y. Tanchuk, A.E. Villa, Prediction of n-octanol / water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices, J. Chem. Inf. Comput. Sci. 41 (2001) 1407–1421. [32] I.V. Tetko, V.Y. Tanchuk, Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program, J. Chem. Inf. Comput. Sci. 42 (2002) 1136–1145. [33] I.V. Tetko, V.Y. Tanchuk, T.N. Kasheva, A.E. Villa, Internet software for the calculation of the lipophilicity and aqueous solubility of chemical compounds, J. Chem. Inf. Comput. Sci. 41 (2001) 246–252. [34] C. Hansch, J.E. Quinlan, G.I. Lawrence, The linear free energy relationship between partition coefficients and the aqueous solubility of organic liquids, J. Org. Chem. 33 (1968) 347–350. [35] S.H. Yalkowsky, S.C. Valvani, Solubility and partitioning I: solubility of nonelectrolytes in water, J. Pharm. Sci. 69 (1980) 912–922. [36] N. Jain, S.H. Yalkowsky, Estimation of the aqueous solubility I: Application to organic non-electrolytes, J. Pharm. Sci. 90 (2000) 234–252. [37] Y. Ran, N. Jain, S.H. Yalkowsky, Prediction of aqueous solubility of organic compounds by the general solubility equation (GSE), J. Chem. Inf. Comput. Sci. 41 (2001) 1208–1217. [38] J.M. Sutter, P.C. Jurs, Prediction of aqueous solubility for a diverse set of heteroatom-containing organic compounds using a quantitative structure–property relationships, J. Chem. Inf. Comput. Sci. 36 (1996) 100–107. [39] B.E. Mitchell, P.C. Jurs, Prediction of aqueous solubility of organic compounds from molecular structure, J. Chem. Inf. Comput. Sci. 38 (1998) 489–496. [40] N.R. McElroy, P.C. Jurs, Prediction of aqueous solubility of heteroatom-containing organic compounds from molecular structure, J. Chem. Inf. Comput. Sci. 41 (2001) 1237–1247. [41] P.D. Mosier, P.C. Jurs, QSAR / QSPR studies using probabilistic neural networks and generalized regression neural networks, J. Chem. Inf. Comput. Sci. 42 (2002) 1460–1470. [42] R. Liu, S.S. So, Development of quantitative structure– property relationship models for early ADME evaluation in drug discovery. 1. Aqueous solubility, J. Chem. Inf. Comput. Sci. 41 (2001) 1633–1639. [43] A. Yan, J. Gasteiger, Prediction of aqueous solubility of organic compounds based on a 3D structure representation, J. Chem. Inf. Comput. Sci. 43 (2003) 429–434.


[44] P. Bruneau, Search for predictive generic model of aqueous solubility using Bayesian neural nets, J. Chem. Inf. Comput. Sci. 41 (2001) 1605–1616. [45] D. Cherqaoui, D. Villemin, Use of a neural network to determine the boiling point of alkanes, J. Chem. Soc., Faraday Trans. 90 (1994) 97–102. [46] A.A. Gakh, E.G. Gakh, B.G. Sumpter, D.W. Noid, Neural network–graph theory approach to the prediction of the physical properties of organic compounds, J. Chem. Inf. Comput. Sci. 34 (1994) 832–839. [47] G. Espinosa, D. Yaffe, Y. Cohen, A. Arenas, F. Giralt, Neural network based quantitative property relations (QSPRs) for predicting boiling points for aliphatic hydrocarbons, J. Chem. Inf. Comput. Sci. 40 (2000) 859–879. [48] S. Liu, R. Zhang, M. Liu, Z. Hu, Neural network–topological indices approach to the prediction of properties of alkenes, J. Chem. Inf. Comput. Sci. 37 (1997) 1146–1151. [49] M.D. Wessel, P.C. Jurs, Prediction of normal boiling points of hydrocarbons from molecular structure, J. Chem. Inf. Comput. Sci. 35 (1995) 68–76. [50] J. Homer, S.C. Generalis, J.H. Robson, Artificial neural networks for the prediction of liquid viscosity, density, heat of vaporization, boiling point and Pitzer’s acentric factor. Part I. Hydrocarbons, Phys. Chem. Chem. Phys. 1 (1999) 4075–4081. [51] A.P. Bunz, B. Braun, R. Janowsky, Application of quantitative structure–performance relationship and neural network models for the prediction of physical properties from molecular structure, Ind. Eng. Chem. Res. 37 (1998) 3043–3051. [52] A.P. Bunz, B. Braun, R. Janowsky, Quantitative structure– property relationships and neural networks: correlation and prediction of physical properties of pure components and mixtures from molecular structure, Fluid Phase Equilib. 158– 160 (1999) 367–374. [53] L.M. Egolf, M.D. Wessel, P.C. Jurs, Prediction of boiling points and critical temperatures of industrially important organic compounds from molecular structure, J. Chem. Inf. Comput. Sci. 34 (1994) 947–956. [54] J. Tettech, T. Suzuki, E. Metcalfe, S. Howells, Quantitative structure–property relationships for the estimation of boiling point and flash point using a radial basis function neural network, J. Chem. Inf. Comput. Sci. 39 (1999) 491–507. [55] A.J. Chalk, B. Beck, T. Clark, A quantum mechanical / neural net model for boiling points with error estimation, J. Chem. Inf. Comput. Sci. 41 (2001) 457–462. [56] G. Espinosa, D. Yaffe, A. Arenas, Y. Cohen, F. Giralt, A fuzzy ARTMAP-based quantitative structure–property relationship (QSPR) for predicting physical properties of organic compounds, Ind. Eng. Chem. Res. 40 (2001) 2757–2766. [57] E.S. Goll, P.C. Jurs, Prediction of the normal boiling points of organic compounds from molecular structures with a computational neural network model, J. Chem. Inf. Comput. Sci. 39 (1999) 974–983. [58] E.S. Goll, P.C. Jurs, Prediction of vapor pressures of hydrocarbons and halohydrocarbons from molecular structure with a computational neural network, J. Chem. Inf. Comp. Sci. 39 (1999) 1081–1089.

[59] R. Kuhne, R.-U. Ebert, G. Schuurmann, Estimation of vapour pressures for hydrocarbons and halogenated hydrocarbons from chemical structure by a neural network, Chemosphere 34 (1997) 671–686. [60] D. Yaffe, Y. Cohen, Neural network based temperature-dependent quantitative structure property relations (QSPRs) for predicting vapor pressure of hydrocarbons, J. Chem. Inf. Comput. Sci. 41 (2001) 463–477. [61] H.E. McClelland, P.C. Jurs, Quantitative structure–property relationships for the prediction of vapor pressures of organic compounds from molecular structure, J. Chem. Inf. Comp. Sci. 40 (2000) 967–975. [62] A.J. Chalk, B. Beck, T. Clark, A temperature-dependent quantum mechanical / neural net model for vapor pressure, J. Chem. Inf. Comput. Sci. 41 (2001) 1053–1059. [63] N.J. English, D.G. Carrol, Prediction of Henry's law constants by a quantitative structure property relationship and neural network, J. Chem. Inf. Comput. Sci. 41 (2001) 1150–1161. [64] D. Yaffe, Y. Cohen, G. Espinosa, A. Arenas, F. Giralt, Fuzzy ARTMAP and back-propagation neural networks based quantitative structure–property relationships (QSPRs) for octanol–water partition coefficient of organic compounds, J. Chem. Inf. Comput. Sci. 42 (2002) 162–183. [65] B.E. Turner, C.L. Costello, P.C. Jurs, Prediction of critical temperature and pressures of industrially important organic compounds from molecular structure, J. Chem. Inf. Comput. Sci. 38 (1998) 639–645. [66] J. Fayos, F.H. Cano, Crystal-packing prediction by neural networks, Cryst. Growth Des. 2 (2002) 591–599. [67] R.C. Schweitzer, J.B. Morris, The development of a quantitative structure property relationship (QSPR) for the prediction of dielectric constants using neural networks, Anal. Chim. Acta 384 (1999) 285–303. [68] S. Sild, M. Karelson, A general QSPR treatment for dielectric constants of organic compounds, J. Chem. Inf. Comput. Sci. 42 (2002) 360–367. [69] B.E. Mitchell, P.C. Jurs, Prediction of autoignition temperatures of organic compounds from molecular structure, J. Chem. Inf. Comput. Sci. 37 (1997) 538–547. [70] J. Tettech, E. Metcalfe, S. Howells, Optimization of radial basis and backpropagation neural networks for modeling auto-ignition temperature by quantitative structure–property relationships, Chemometr. Intell. Lab. Syst. 32 (1996) 177–191. [71] J. Tettech, S. Howells, E. Metcalfe, T. Suzuki, Optimization of radial basis function neural networks using biharmonic spline interpolation, Chemometr. Intell. Lab. Syst. 41 (1998) 17–29. [72] T. Suzuki, R.-U. Ebert, G. Schuurmann, Development of both linear and nonlinear methods to predict the liquid viscosity at 20 °C of organic compounds, J. Chem. Inf. Comp. Sci. 37 (1997) 1122–1128. [73] T. Suzuki, R.-U. Ebert, G. Schuurmann, Application of neural networks to modeling and estimating temperature-dependent liquid viscosity of organic compounds, J. Chem. Inf. Comput. Sci. 41 (2001) 776–790.

[74] G.W. Kauffman, P.C. Jurs, Prediction of surface tension, viscosity, and thermal conductivity for common organic solvents using quantitative structure–property relationships, J. Chem. Inf. Comput. Sci. 41 (2001) 408–418. [75] H. Chow, H. Chen, T. Ng, P. Myrdal, S.H. Yalkowsky, Using backpropagation networks for the estimation of aqueous activity coefficients of aromatic organic compounds, J. Chem. Inf. Comput. Sci. 35 (1995) 723–728. [76] D. Svozil, J.G.K. Sevcik, V. Kvasnicka, Neural network prediction of the solvatochromic polarity / polarizability parameter π2H, J. Chem. Inf. Comput. Sci. 37 (1997) 338–342. [77] S.M. Danauskas, P.C. Jurs, Prediction of C60 solubilities from solvent molecular structures, J. Chem. Inf. Comput. Sci. 41 (2001) 419–424. [78] B.E. Mattioni, P.C. Jurs, Prediction of glass transition temperature from monomer and repeat unit structure using computational neural networks, J. Chem. Inf. Comput. Sci. 42 (2002) 232–240. [79] I.V. Tetko, Neural network studies. 4. Introduction to associative neural networks, J. Chem. Inf. Comput. Sci. 42 (2002) 717–728. [80] A.R. Katritzky, Y. Wang, S. Sild, T. Tamm, M. Karelson, QSPR studies on vapor pressure, aqueous solubility, and the prediction of water–air partition coefficients, J. Chem. Inf. Comput. Sci. 38 (1998) 720–725. [81] B. Lucic, D. Amic, N. Trinajstic, Nonlinear multivariate regression outperforms several concisely designed neural networks on three QSAR data sets, J. Chem. Inf. Comput. Sci. 40 (2000) 403–413. [82] K. Hornik, M. Stinchcombe, H. White, P. Auer, Degree of approximation results for feedforward networks approximating unknown mapping and their derivatives, Neural Comp. 6 (1994) 1262–1275. [83] N. Bodor, M.-J. Huang, An extended version of a novel method for the estimation of partition coefficients, J. Pharm. Sci. 81 (1992) 272–281.


[84] D. Yaffe, Y. Cohen, G. Espinosa, A. Arenas, F. Giralt, A fuzzy ARTMAP-based quantitative structure–property relationship (QSPR) for the Henry's law constant of organic compounds, J. Chem. Inf. Comput. Sci. 43 (2003) 85–112. [85] P. Ruelle, C. Rey-Mermet, M. Buchmann, H. Nam-Tran, U.W. Kesselring, P.L. Huyskens, A new predictive equation for the solubility of drugs based on the thermodynamics of mobile disorder, Pharm. Res. 8 (1991) 840–850. [86] P.D. Mosier, P.C. Jurs, QSAR / QSPR studies using probabilistic neural networks and generalized regression neural networks, J. Chem. Inf. Comput. Sci. 42 (2002) 1460–1470. [87] D. Yaffe, Y. Cohen, G. Espinosa, A. Arenas, F. Giralt, A fuzzy ARTMAP based on quantitative structure–property relationships (QSPRs) for predicting aqueous solubility of organic compounds, J. Chem. Inf. Comput. Sci. 41 (2001) 1177–1207. [88] J. Huuskonen, M. Salo, J. Taskinen, Neural network modeling for estimation of the aqueous solubility of structurally related drugs, J. Pharm. Sci. 86 (1997) 450–454. [89] D. Cherqaoui, D. Villemin, A. Mesbah, J.-M. Cense, V. Kvasnicka, Use of neural network to determine the boiling point of acyclic ethers, peroxides, acetals and their sulfur analogues, J. Chem. Soc., Faraday Trans. 90 (1994) 2015–2019. [90] L.M. Egolf, P.C. Jurs, Prediction of boiling points of organic heterocyclic compounds using regression and neural network techniques, J. Chem. Inf. Comput. Sci. 33 (1993) 616–625. [91] M.H. Charlton, R.Y. Docherty, M.G. Hutchings, Quantitative structure–sublimation enthalpy relationship studied by neural networks, theoretical crystal packing calculations and multilinear regression analysis, J. Chem. Soc., Perkin. Trans. 2 (1995) 2023–2030. [92] J. Fayos, F.H. Cano, Crystal-packing prediction by neural networks, Cryst. Growth Des. 2 (2002) 591–599.