Toxic. in Vitro Vol. 1, No. 3, pp. 143--171, 1987 Printed in Great Britain. All rights reserved
0887-2333/87 $3.00+0.00 Copyright © 1987 Pergamon Journals Ltd
STRUCTURE-ACTIVITY RELATIONSHIPS IN TOXICOLOGY AND ECOTOXICOLOGY: AN ASSESSMENT
L. TURNER*, F. CHOPLIN, P. DUGARD, J. HERMENS, R. JAECKH, M. MARSMANN and D. ROBERTS
Task Force of the European Chemical Industry Ecology and Toxicology Centre (ECETOC*), Avenue Louise 250, Bte 63, 1050 Brussels, Belgium
(Received 30 October 1986)
Summary: A critical assessment of the scope, applicability and limitations of structure-activity relationships (QSARs) in toxicology and ecotoxicology opens with a general explanation of QSARs and a description of the components of a QSAR (chemical descriptors, biological descriptors and the techniques used to seek a relationship between them). The main statistical terms used to assess the validity of certain types of QSAR are briefly explained and attention is drawn to a number of common errors in the statistical assessments. This is followed by a detailed analysis of 18 typical QSAR publications, which were chosen to represent the main types of chemical and biological descriptors that have been studied and a range of techniques for deriving the structure-activity relationships. A discussion of the strengths, weaknesses, applicability and limitations of QSARs is based on the above analysis, and some conclusions and recommendations are offered. The most important recommendation is that, at present, QSARs should not be used in isolation for making decisions that affect the health of humans or other species.
INTRODUCTION
The activity of a chemical towards living organisms depends upon a physical or chemical action on biological tissues, and the nature of such action will depend ultimately on the molecular structure of the chemical. This was recognized over 100 years ago, and since then, but especially in the last two decades, many attempts have been made to relate biological activity to molecular structure in a qualitative or quantitative way. Such structure-activity relationships are designated by the abbreviations SARs or QSARs (Q standing for quantitative). These two abbreviations are used interchangeably in the literature.

A QSAR is developed by applying various techniques to seek a relationship between a biological activity of a series of chemicals (often but not necessarily closely related in structure) and one or more parameters, the chemical descriptors, that represent their structure. The biological activity of interest may be a toxic effect or a specific action, usually biochemical, related to a toxic effect. The molecular structure can be expressed either directly, by considering the substituent groups or structural fragments in a molecule, or indirectly, by considering one or more physical or chemical properties which, of course, depend on the molecular structure.

A QSAR is first developed from toxicological data and chemical descriptors on a limited series of chemicals, and may then be used to predict the biological activity of further chemicals if the corresponding descriptors expressing their structure are known. Whether the prediction is sound depends on whether the QSAR is valid for the chemicals and for the toxicological endpoint to which it is applied.

During the period in which QSAR techniques have developed rapidly, the toxicity of a large number of chemicals has come under scrutiny, and because data on many of these chemicals are inadequate, techniques for predicting their biological activity would be attractive. Thus it has been proposed that QSARs should be used for predicting and resolving various problems relating to the toxicity and ecotoxicity of chemicals. It is therefore opportune to make a critical assessment of the present status of QSARs, and such an assessment is presented in this paper.

*Enquiries should be addressed to ECETOC at the above address, not to individual authors, who are members of an ECETOC Task Force.
Abbreviations: ADAPT = Automated Data Analysis using Pattern recognition Techniques; BCF = bioconcentration factor; DOF = degrees of freedom; HB = hydrogen bonding; I = Iball index; LOT = Level of Triviality; NBP = 4-nitrobenzylpyridine; PAH = polycyclic aromatic hydrocarbon; QSAR = quantitative structure-activity relationship. [See p. 144 for a list of descriptors and symbols.]

THE COMPONENTS OF A QSAR
In any QSAR there are three components: the chemical structure descriptors (the data on certain properties of the molecule or on physical or chemical properties of the substance), the biological activity, and the technique used to establish the relationship.

Chemical structure descriptors
Descriptors used
Many parameters have been chosen as descriptors to represent or express indirectly some aspect of chemical structure in QSARs. The main ones are as follows (symbols being included for the more commonly used):
Physico-chemical descriptors
General: melting point; boiling point; vapour pressure; dissociation constant ($pK_a$); activation energy; heat of reaction; reaction rate constant ($k$); reduction potential.
Hydrophobicity: partition coefficient ($P$); $R_M$ coefficient from reverse-phase chromatography; solubility in water ($S$); parachor.
Electronic: Hammett constant ($\sigma$); Taft polar substituent constant ($\sigma^*$); ionization potential; dielectric constant; dipole moments; H bonding (HB).
Quantum chemical (including indices derived from molecular orbital calculations): molecular orbital indices such as atomic charge, bond energy, bond indices, resonance energy, electron-donating character (energy of highest occupied molecular orbital), electron-accepting character (energy of lowest unoccupied molecular orbital), electrostatic potential distribution and Dewar numbers; electron density; $\pi$-bond reactivity; electron polarizability.

Steric descriptors
Molecular volume; molecular shape; molecular surface area; molar refractivity (MR); substructure shape; Taft steric substituent constant ($E_s$); Verloop STERIMOL constants ($L$; $B_1$-$B_5$).

Structural descriptors
Atom and bond fragments; substructures; substructure environment; number of atoms in a given structural element; number of rings (in polycyclic compounds); molecular connectivity (extent of branching).
It is important to verify that the chemical descriptors are truly independent. If any two are correlated, as partition coefficient and water solubility are in many cases, the simultaneous use of both in a QSAR will give a spurious relationship.

Descriptors derived from molecular orbital calculations are subject to the approximations inherent in the fact that the basic Schrödinger equation has only approximate solutions. The solution involving the least approximation (ab initio calculation) is costly and can take hours of computer time. Alternative semi-empirical methods are simpler but even more approximate, and are likely to be valuable only for families of chemicals of closely-related structure, since the errors are more probably constant within such families.

Biological activity
Types of activity studied
These vary from primary effects involving well-defined biochemical endpoints to secondary effects in which the endpoint is an observation of a toxicological effect. The biological data may have been determined in a standard series of experiments or may have been drawn from a variety of authors who used different techniques. Almost any biological activity can be studied, and the list of those for which QSARs have been sought is too long to give here. Whereas chemical structure descriptors can, in principle, be determined or expressed with good precision, the qualitative and/or quantitative determination of biological activity is usually much less precise. A detailed account of databases on biological activity is given in a book edited by Golberg (1983; chapter 3).
Criteria for acceptability of biological data used in QSARs
Sufficient biological data, ideally generated via standard protocols and evaluated according to comparable criteria, must be available if a valid QSAR is to be developed. The important variables should be identical or comparable throughout any set of data used, but this requirement cannot always be met. Where such variables differ within the data set, they should be critically evaluated for their effect on the QSAR, as such differences may greatly affect the outcome of biological assays.
Criteria for acceptability of chemical descriptors used in QSARs
It is highly desirable to select descriptors of chemical structure that relate to the mechanism of the biological activity under consideration, since in that case they should give a more credible and valid QSAR. When the mechanism is not known, as is often the case, the descriptors have to be chosen by trial and error based on judgement. They must be known with adequate accuracy from experimental determinations or theoretical calculation.

Techniques for establishing QSARs
A wide variety of techniques has been used to develop QSARs. Many of these are illustrated in the review edited by Golberg (1983). The most commonly used methods are described here.

Simple graphical plots or equations
When a single chemical descriptor is used, its relationship to the biological activity is often expressed in a simple graph or the equation derived from it.
Regression analysis
Hansch & Fujita (1964) and Free & Wilson (1964) developed two different techniques for deriving QSARs, each essentially based on regression analysis. The Hansch method has been particularly widely used.

Hansch techniques. The starting point for the Hansch approach is the postulate that the biological response to a chemical is a function of its hydrophobic, electronic and steric properties.

Hydrophobicity (or lipophilicity): octanol-water partition coefficients ($P_{ow}$) are generally used as a measure of hydrophobicity, the influence of which on biological activity is probably related to the transport of a compound into and through lipid membranes to its site of action. This property may also be important for hydrophobic interactions between a chemical and a receptor molecule in an organism (a hydrophobic interaction being the association of non-polar regions in molecules with each other rather than with water). Fujita et al. (1964) showed that within various series of aromatic molecules, the contribution of a particular substituent to the $P_{ow}$ was a characteristic of that substituent. Within a homologous series of compounds, the $P_{ow}$ could be predicted from 'substituent constants'. Fujita et al. (1964) defined the hydrophobic parameter for a substituent X as:

$$\pi_X = \log P_{ow}(\mathrm{RX}) - \log P_{ow}(\mathrm{RH})$$

in which RX and RH represent, respectively, the substituted and unsubstituted parent aromatic molecule. For example:

$$\pi_{\mathrm{methyl}} = \log P_{ow}(\mathrm{toluene}) - \log P_{ow}(\mathrm{benzene})$$

Numerous substituent constants are now known and the predictability of $P_{ow}$ has been extended further to aliphatic compounds. In addition, other techniques have been developed for calculating $P_{ow}$ from molecular-fragment values (Rekker, 1977). A survey of these methods is given by Hansch & Leo (1979). Values for $P_{ow}$, both experimental and predicted with reasonable reliability, are available for a large number of chemicals, and this parameter has been widely used in QSARs.

Electronic properties: these play a part in the specific interactions between a compound and a biological receptor, such as electrostatic interactions between centres of opposite charge.
The effect of a substituent on the electronic properties of a molecule has been characterized by two constants: (i) the Hammett constant, $\sigma_X = \log K_{RX} - \log K_{RH}$, derived from the dissociation constants ($K$) of the substituted and unsubstituted benzoic acid, RX and RH respectively (Hammett, 1970); if $\sigma_X$ is positive, then the group X is more electron-attracting than is H, and vice versa; and (ii) the constant $\sigma^*$, similarly derived from substituted acetic acids.

Steric properties: these influence the approach or binding of a molecule to a biological site. As a measure of the steric effect of a substituent in a molecule, Taft (1956) proposed a constant $E_s$ determined from the rate constant ($k$) of the hydrolysis of the ester XCOOC2H5 according to

$$(E_s)_X = \log k_{RX} - \log k_{RH}$$

in which $k_{RX}$ and $k_{RH}$ represent, respectively, the rate constants of the hydrolysis of the substituted and unsubstituted esters RX and RH. Values of $\sigma$ and $E_s$ for many substituents have been tabulated by Hansch & Leo (1979).

The important contribution of Hansch to the development of QSARs was that he combined these three physico-chemical properties into one equation in order to describe the biological activity of compounds. The 'Hansch equation' can take several forms (Hansch & Fujita, 1964) but is typically:

$$\log \mathrm{BA} = a\pi^2 + b\pi + c\sigma + dE_s + e$$

where BA is the biological activity of a member of a series of compounds RX, with varying substituents X, for a given biological endpoint; $\pi$, $\sigma$ and $E_s$ represent, respectively, the effects of substituents on the hydrophobic, electronic and steric properties within the series of compounds RX; and $a$ to $e$ are coefficients determined by regression analysis. A $\pi^2$ term was included in this equation because there may be a parabolic dependence of biological activity on hydrophobicity, allowance for which often results in a better fit. A detailed discussion of the descriptors used in the Hansch technique is given by Hansch & Leo (1979) and Rekker (1977). In this type of equation the three parameters $\pi$, $\sigma$ and $E_s$ are not always significant or essential and one or more may be omitted. In principle any molecular parameter may be tried in this empirical approach.

The development of a Hansch type of equation proceeds as follows. Values for the biological activity and the chemical descriptors are obtained for each of an initial series of compounds. These values are put into the Hansch equation and a general equation fitting all compounds (the QSAR) is sought by regression analysis. This may eventually give a QSAR based on those chemicals (called the 'training set') that remain after 'outliers' are omitted.
The goodness of fit is expressed by statistical quality criteria such as the correlation coefficient (r), the standard error (s), and the F ratio (see below). Values for the biological activity of structurally-related compounds not used in the training set can then be predicted from their chemical descriptors.
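The fitting step described above can be sketched in a few lines. This is a minimal illustration only, assuming ordinary least squares on the typical form of the Hansch equation; the descriptor values, the noise and the function name `fit_hansch` are hypothetical and are not taken from any of the studies discussed. The quality measures follow the usual regression definitions, with m descriptor terms and n compounds.

```python
import numpy as np

# Hypothetical training set: substituent constants for 10 compounds.
# All numbers are invented for illustration and carry no chemical meaning.
pi = np.array([0.00, 0.56, 0.71, 0.86, 1.02, 1.12, -0.28, -0.67, 0.14, 1.98])
sigma = np.array([0.00, -0.17, 0.23, 0.23, -0.15, 0.18, 0.78, -0.37, 0.06, -0.20])
es = np.array([0.00, -1.24, -0.97, -1.16, -1.31, -1.40, -2.52, -0.55, -0.46, -2.78])

# Synthetic "observed" log BA: an assumed parabolic Hansch model plus small fixed noise.
noise = np.array([0.03, -0.04, 0.02, -0.01, 0.05, -0.03, 0.01, 0.04, -0.02, -0.05])
log_ba = -0.5 * pi**2 + 1.2 * pi + 0.8 * sigma + 0.3 * es + 2.0 + noise


def fit_hansch(pi, sigma, es, log_ba):
    """Least-squares fit of log BA = a*pi^2 + b*pi + c*sigma + d*Es + e."""
    design = np.column_stack([pi**2, pi, sigma, es, np.ones_like(pi)])
    coef, *_ = np.linalg.lstsq(design, log_ba, rcond=None)
    predicted = design @ coef
    n = design.shape[0]          # number of compounds
    m = design.shape[1] - 1      # number of descriptor terms (constant excluded)
    rss = float(np.sum((log_ba - predicted) ** 2))        # residual sum of squares
    tss = float(np.sum((log_ba - log_ba.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - rss / tss                                  # explained variance
    s = (rss / (n - m - 1)) ** 0.5                        # standard error of estimate
    f_ratio = (r2 / m) / ((1.0 - r2) / (n - m - 1))       # overall F ratio
    return coef, r2, s, f_ratio


coef, r2, s, f_ratio = fit_hansch(pi, sigma, es, log_ba)
print("coefficients a..e:", np.round(coef, 2))
print(f"R^2 = {r2:.3f}, s = {s:.3f}, F = {f_ratio:.1f}")
```

With only ten compounds and five fitted coefficients, the sketch leaves few degrees of freedom, so a high R here says little about predictive value for compounds outside the training set.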
Free-Wilson and related group-contribution techniques. As originally described (Free & Wilson, 1964), this technique is based on the assumption that in a series of related chemicals each particular substituent (group) adds a constant contribution to the biological activity (BA) of the parent molecule. Thus:

$$f(\mathrm{BA}) = S_A + S_B + \cdots + \mu$$

where $\mu$ is the contribution of a hypothetical parent compound to the biological activity and $S_A$, $S_B$, ... are the contributions added by groups A, B etc. Each compound yields an equation of the type shown above and the group contributions are found by solving a set of multiple simultaneous equations (Purcell et al. 1973). The 'goodness of fit' or 'correlation' can be determined by regression analysis as in other QSAR methods.

An advantage of the Free-Wilson approach is that physico-chemical or other properties need not be determined; they are contained within the additive group contributions. Predictions are limited to compounds comprising the parent molecule and the substituents in the training set. The numerical values of the group contribution will depend upon the method of expressing the biological activity. Because specific physico-chemical properties are not ascribed to the substituent groups, information on the mechanism of biological action is not usually obtained by this approach.

In an approach that can be regarded as an extension of that of Free & Wilson (1964), Enslein & Craig (1978) have developed a statistically-based type of QSAR for predicting a variety of toxic properties. Each molecule is divided into substructures and the contribution of the individual substructures to the biological activity is calculated by regression and discriminant analysis. The current scope of the Craig-Enslein technique can be seen from a publication by Health Designs Incorporated (HDI, 1982), which offers the prediction of rat oral LD50 values, mutagenicity, carcinogenicity, teratogenicity and irritancy. The examples of rat oral LD50 values and teratogenicity are used to illustrate this approach on p. 161.

Pattern recognition
Pattern recognition techniques are used to seek qualitative correlations between a set of descriptors of atomic or molecular structure and the presence or absence of a specific biological activity in a series of structurally-related chemicals. The techniques are wholly empirical and predictive; that is, they have, a priori, no biological basis. Consideration of the chemical descriptors eventually found to give a good SAR, however, may yield some indication of the mode of biological action. Most variants of the technique have the following common features:
(a) The training set of chemicals is rather large compared to that of other QSAR techniques.
(b) A training set of atomic or molecular descriptors expressing the structure is chosen.
(c) Each chemical structure is represented by a single point in n-dimensional space (where n is the number of chemical descriptors), the position of each point being defined by the value of the descriptors; n is rarely a small number but, for purposes of illustration, if n = 3 the chemical would be represented by a point on the familiar three-dimensional graph with the x, y and z axes representing the three descriptors.
(d) If the structure-descriptors have been correctly chosen, chemicals possessing the biological activity will cluster in one part of the n-dimensional space and those lacking activity will cluster in a separate part or may not cluster at all.
(e) If at first there is no adequate separation of active and inactive compounds, different sets of structural descriptors are tested until a valid SAR is found or the attempt is abandoned.
(f) A valid SAR can then be used to predict whether a structurally-related chemical possesses the biological activity in question by processing the appropriate set of structure descriptors and determining whether the compound clusters with the active or inactive groups.

In pattern recognition techniques, a variety of statistical methods for setting up and analysing the data can be used, the most common being factor analysis (e.g. principal component analysis and discriminant analysis), linear learning machine and K nearest neighbour procedures. A brief description of the methods commonly used for pattern recognition follows.

Factor analysis. M molecules, each represented by a set of n structural components (a component is an atomic or molecular descriptor), are considered. The goal of factor analysis is to reduce the number n and to find a new set of mathematically-derived descriptors (called factors) which represent the M molecules almost as well. Once these factors have been found they provide the best two- or three-dimensional graphical representation of the 'cloud' of the M molecules in space. The two-dimensional graphs, for example, are obtained by projecting the M points, each point representing a molecule, on to planes defined, respectively, by two factors. This is illustrated in Fig. 1 for a successful discrimination between active and inactive chemicals. Principal component analysis serves to produce the factors and discriminant analysis to find the best separation into clusters.

Linear learning machine (Jurs et al. 1981). A 'hyperplane' in n-dimensional space that separates active and inactive chemicals is defined by trial and error. The use of this technique is limited by its technical complexity.

K nearest neighbour (Kowalski & Bender, 1973). The training set of M molecules is distributed in n-dimensional space and the maximum distance between neighbouring molecules is noted. A molecule is classified as active or inactive according to the activity or inactivity of its nearest neighbours.
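The nearest-neighbour rule just described can be sketched as follows. This is a minimal illustration, assuming Euclidean distance on descriptors that are already on comparable scales and a simple majority vote among the K nearest training compounds; the two-descriptor training set and the function name `knn_classify` are hypothetical.

```python
import numpy as np

# Hypothetical training set: 8 compounds described by 2 scaled descriptors.
train_x = np.array([
    [1.0, 1.1], [0.8, 1.3], [1.2, 0.9], [1.1, 0.7],   # active compounds
    [4.0, 3.9], [4.2, 4.1], [3.8, 4.3], [4.1, 3.6],   # inactive compounds
])
train_active = np.array([1, 1, 1, 1, 0, 0, 0, 0])     # 1 = active, 0 = inactive


def knn_classify(train_x, train_active, query, k=3):
    """Return True if most of the k nearest training compounds are active."""
    distances = np.linalg.norm(train_x - query, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]                   # indices of k closest points
    votes = int(train_active[nearest].sum())              # number of active neighbours
    return 2 * votes > k                                  # strict majority vote


print(knn_classify(train_x, train_active, np.array([1.0, 1.0])))   # query near the active cluster
print(knn_classify(train_x, train_active, np.array([3.9, 4.0])))   # query near the inactive cluster
```

An odd value of K avoids tied votes in a two-class problem. A query lying far from every training compound is still assigned to a class, which is one reason the result should be read with the spread of the training set in mind.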
Fig. 1. Graph of the cloud of M molecules, in a plane defined by two factors (F1 and F2). Each point represents a molecule in two-dimensional space. [Plot labels: cluster of active compounds; cluster of inactive compounds.]
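A projection of the kind shown in Fig. 1 can be sketched with principal component analysis alone. The example below is an assumption-laden illustration: it autoscales a hypothetical matrix of M = 8 molecules by n = 4 descriptors, extracts the factors by singular value decomposition, and returns the coordinates of each molecule in the F1-F2 plane. It does not include the discriminant analysis step, and the data and the function name `principal_factors` are invented.

```python
import numpy as np

# Hypothetical descriptor matrix: rows are molecules, columns are descriptors.
descriptors = np.array([
    [5.0, 3.1, 7.9, 2.0],   # active
    [5.2, 2.9, 8.1, 2.2],   # active
    [4.8, 3.0, 8.0, 1.9],   # active
    [5.1, 3.2, 7.8, 2.1],   # active
    [1.0, 0.9, 2.1, 0.5],   # inactive
    [1.2, 1.1, 1.9, 0.4],   # inactive
    [0.9, 1.0, 2.0, 0.6],   # inactive
    [1.1, 0.8, 2.2, 0.5],   # inactive
])


def principal_factors(descriptors, n_factors=2):
    """Autoscale the descriptors and return scores on the leading factors."""
    centred = descriptors - descriptors.mean(axis=0)
    scaled = centred / centred.std(axis=0, ddof=1)           # unit variance per descriptor
    u, sing, _ = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_factors] * sing[:n_factors]             # coordinates in factor space
    explained = sing**2 / np.sum(sing**2)                    # fraction of variance per factor
    return scores, explained[:n_factors]


scores, explained = principal_factors(descriptors)
for label, (f1, f2) in zip(["active"] * 4 + ["inactive"] * 4, scores):
    print(f"{label:9s} F1 = {f1:6.2f}  F2 = {f2:6.2f}")
print("variance explained by F1, F2:", np.round(explained, 3))
```

The sign of each factor returned by the decomposition is arbitrary, so only the relative positions of the two clusters along F1 are meaningful, not which cluster lies on the positive side.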
SIMCA techniques. This statistical method was developed by Wold and co-workers (Norden et al. 1978) and provides a mathematical model of each cluster and a quantification of the similarities between molecules in each cluster.

ADAPT (Automated Data Analysis using Pattern recognition Techniques). This is a completely integrated software system for performing data storage and manipulation, automatic construction of the molecular representation, and statistical analysis. The molecular representation may include structural fragments, steric and physico-chemical parameters, partition coefficients and atomic volumes. Factor analysis, linear learning machine and K nearest neighbour statistical techniques can be used. ADAPT was first described by Stuper & Jurs (1976), and numerous papers in which the method has been applied to various SARs have subsequently been published.

Explanation of some statistical terms
Many QSARs are expressed as equations, obtained by linear or multiple regression analysis, of the form:

$$f(\mathrm{BA}) = a x_1 + b x_2 + c x_3 + \cdots + k$$

where $f(\mathrm{BA})$ is a function of the biological activity (BA), $x_1$, $x_2$ etc. are the chemical/structural descriptors, $a$, $b$ etc. are numerical coefficients obtained by regression analysis and $k$ is a constant. The following statistical terms are commonly used to describe the quality of such equations:

n, the number of data points: the significance of n is that one might have more confidence in a 20-point QSAR with modest values of r and s (see below) than in an equation with good r and s values but few points. The distribution of the n points should also be taken into account (see p. 148).

R, the multiple correlation coefficient: this is a measure of how well the equation fits the data. A perfect fit has R = 1. For QSARs involving only one descriptor, $x_1$, r (by convention written in lower case) is obtained from the plot of f(BA) against $x_1$, and r will be +1 or -1 for a perfect straight line. Sometimes $R^2$ is quoted. This corresponds to the fraction of the variance that is explained by the regression, referred to in some papers as 'explained variance'. The more the data points are scattered, the lower the value of R will be. For the QSARs reviewed in this paper, R is usually in the range 0.75 to 0.99. We note that R is frequently given to three significant figures, a degree of accuracy that is almost certainly spurious, given the inherent inaccuracy in the data used. This has led some authors to argue (erroneously) that one QSAR is more valid than another on the basis of comparing, for example, R values of 0.987 and 0.992.

s, the standard error of estimate: this is a measure of how well the f(BA) values are predicted by the QSAR, and is expressed as the standard deviation of the 'residuals':

$$s = \sqrt{\frac{\sum (y - y_c)^2}{n - m - 1}}$$

where $y$ = log(BA) observed, $y_c$ = log(BA) calculated from the QSAR, n = number of observations and m = number of descriptors. The smaller s is, the better is the QSAR. The value s can be used to calculate the confidence limits (see below) of the predicted values. If s exceeds 3 times the limit of error of the f(BA) values, the QSAR is considered to be poor. If the predicted value of an f(BA) differs from its actual value by more than 3s it may be considered as an outlier. As s is in the same units as f(BA), it should be noted that when the validity of two QSAR equations is compared the fact that s is larger for one of them has to be related to the units of the f(BA). The term s is also very useful when the effect of introducing an extra descriptor into the QSAR equation is considered. The descriptor has no relevance for the QSAR if its introduction does not lead to a significant lowering of s.

Confidence interval: this is a measure of the probability (usually 95 or 99% is taken) that the predicted value of log(BA) lies within a given range.

The F test: this is a method of assessing the significance of a QSAR equation; the larger the value of F, the greater the significance. F is calculated from the number of data points and chemical descriptors, and the values of the actual and predicted biological activities. Its significance can be taken from standard tables. The F test is very useful in comparing the validity of QSAR equations in which different numbers of descriptors are used in an attempt to correlate the same set of biological data. It is also valuable in deciding whether the removal or addition of a subset of biological data improves the significance of a QSAR.

When the QSAR equation is not that of a straight line, r is not relevant, but s can still be calculated and the F test applied to express the predictive ability and significance, respectively, of the curve.
An R value can, however, be calculated for the plot of the log (observed BA) against the log (predicted BA) read from the curve, since for a valid QSAR this should be linear.

n and R (or $R^2$) are nearly always quoted in QSAR publications, s is less frequently given and the F test is rarely quoted.

Sources of error in statistical techniques
Wold & Dunn (1983) have provided some useful rules for the correct application of multivariate statistical techniques in developing a QSAR. Their main emphasis is on assessing the Level of Triviality (LOT) in a QSAR, as follows. When dealing with a set of biological data for N compounds, each represented by m descriptors, two factors must be considered: DOF, the number of degrees of freedom, which is related to the number of truly independent data points and cannot be greater than N; and P, the number of coefficients necessary to express the relationship. This equals (m + 1) in multiple regression. It is a well-known property of empirical models that they can reproduce exactly a given data set when the number of parameters equals or exceeds the number of degrees of freedom. Thus for such models to be significant it is required that P<
Fig. 2. By deleting points that do not fit in (a), an apparent agreement between the data and the regression model is reached in (b).
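The effect described in the caption of Fig. 2 can be reproduced numerically. The sketch below is an illustration only: the eight data points are invented, and the helper names `pearson_r` and `drop_worst_fitting` are hypothetical. It fits a straight line, discards the two points with the largest residuals, and compares the correlation coefficient before and after.

```python
import numpy as np


def pearson_r(x, y):
    """Correlation coefficient r between descriptor x and activity y."""
    return float(np.corrcoef(x, y)[0, 1])


def drop_worst_fitting(x, y, n_drop):
    """Remove the n_drop points lying furthest from the least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = np.abs(y - (slope * x + intercept))
    keep = np.argsort(residuals)[: len(x) - n_drop]   # indices of best-fitting points
    return x[keep], y[keep]


# Hypothetical data: descriptor values and log(BA) for 8 compounds.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.0, 2.0, 9.0, 4.0, 5.0, 6.0, 1.0, 8.0])

r_all = pearson_r(x, y)
x_kept, y_kept = drop_worst_fitting(x, y, n_drop=2)
r_kept = pearson_r(x_kept, y_kept)

print(f"r with all {len(x)} points:      {r_all:.2f}")
print(f"r after deleting 2 points: {r_kept:.2f}")
```

For these invented values r rises from about 0.34 to 1.0 once two of the eight compounds are removed. The improved statistic describes only the six compounds retained and gives no grounds for predicting the behaviour of compounds resembling the two that were deleted.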
Fig. 3. Plot of log (BA) against values of chemical descriptors (CD) for 23 compounds.

the reader is referred to p. 10 of the Wold & Dunn (1983) paper.

(d) Sampling artefacts may not be recognized. For example, in pattern recognition techniques when a QSAR is based on the statistical occurrence of substructures in active and inactive molecules, bias in the substructure definition and the data set can lead to wrong conclusions if substructures present in active molecules are over-represented because the corresponding compounds have been more thoroughly studied.

Wold & Dunn (1983) noted that in a review of about 40 papers on pattern recognition analyses, 48% were reporting results that were completely trivial because the LOT criteria were met.

ANALYSIS AND ASSESSMENT OF ILLUSTRATIVE EXAMPLES
In order to assess the present scope and limitations of QSARs, 45 papers representing the use of the techniques most commonly found in the extensive literature were examined initially. Of these, 18 were selected for analysis, on the basis that they illustrate important aspects of the techniques and problems and represent the use of the main types of QSAR approaches applied to a variety of chemical types and biological activities.

The 27 papers not selected for analysis are listed in Appendix I. The majority were omitted because they dealt with chemical types, biological activities, chemical descriptors or techniques adequately illustrated in the papers chosen for analysis and, indeed, some of them broadly confirmed the findings in the latter. A number of papers gave such limited detail about the development of the QSAR, or based it on so few compounds, that no analysis was possible. A few were based on such highly sophisticated chemical descriptors that their use would be restricted to a few specialists. Finally, one paper contained apparent errors in the tabulated data, such that key points in the authors' argument could not be understood.
No paper was omitted solely on the grounds that it was an example of a valid or invalid QSAR, and indeed examples of both types were included for analysis to aid understanding of the current scope and limitations of these techniques. The examples analysed, in the order in which they appear in the text, are listed in Table 1, which also identifies the biological activity, chemical descriptors, techniques and chemicals used. Each example is analysed in a standard way to bring out the important aspects and to enable any general observ-
binding
Toxicot~
A]kyl- and halogcn-substitued epoxides
Tabulation of chem. descript. and biol. activity
Po~, bp, by vibrational frequency of keto group, o, E, Part. coeff., tool wt, mol refractivity, water solubility g, 0', hydrogen bonding
Po,,, charge distribution Et (Hansch approach); presence/absence of Me or a Free-Wilson approach) Various
R , TLC values, tool refractivity, H-bonding Spectroscopic extinction codftcknt
Halogenated ethylenes
Graphical plot
Bond energy of C - - O in corresponding epoxides Electrostatic potential near epoalde O atom
Eder et al. 1982a (p. 159)~/
Substituted allylic compounds
Graphical plot
Lien & Tong, 1973 (p. 164) B a n d i e r a e t al. 1983 (p. 165)
Very large number of various types Ketones, alcohols, acetates, benzene derivatives Six classes of chemicals of very varied type Substituted tetrachlorobiphenyls
Combined Hansch and Free-Wilson Regression analysis
Regression analysis
Regression analysis
Craig & Enslein (HDI, 1982), (p. 161) Mfiller & Greff, 1984 (p. 163)
Me- and halogen-substituted alcohols
Hausch and Free-Wilson
Dillingham et al. 1973 (p. 160)
Biagi et al. 1983 (p. 159)
5-Nitroimidazoles
Jones & Mackrodt, 1982 & 1983 (p. 156) Po]~tzer & L a t u ~ , 1984 (p. 157)
Yuta & Jurs, 1981 (p. 155) Klopman, 1984 (p. 155)
Dunn & Wold, 1978 (p. 154)
Edelman e t a / . 1980 (p. 152) yon Szentpaly, 1984 (p. 153)
Saarikoski & Viluksela, 1982 (p. 151) Veith et al. 1979 (p. 152)
Hermens et al. 1985a,b (p. 150)
K6nemann, 1981a, (p. 150)
Referencet
Regression analysis
Substituted quinolineI-oxides Aromatic amines PAH; N-nitrosamines; ketoxime carbamates
Pattern recognition, ADAPT Pattern recognition
Topological and geometrical Molecular fragments
N-nitrosamines PAH
Pattern recognition, SIMCA
Regression analysis Regression analysis
Various 'Non-reactive' non-ionized organic compounds C l-substituted 'reactive' organic compounds Substituted phenols 62, of varied type
Chemicals used*
Various derived from molecular orbital calculations x, o, STERIMOL
R, O e, E:
Po.
Regression analysis
Po,,, Ist-order alkylation rate constant P ~ , dissociation constant Regression analysis Regression analysis
Regression analysis
geotoxk'eloly
Table I. QSAR papers analysed Technique
Po., water solubility
Chemical descriptors*
*For symbols/abbreviations see text or footnote to title page. tThe number in brackets indicates the page in this issue on which the analysis and assessment of the study appears. :[See also Jones & Mackrodt, 1982 & 1983 (p. 156).
Skin absorption, human and rabbit
Oral LD50, rat; teratogenicity Irritancy to respiratory tract, mouse
inhibition tissue-culture growth
LD50, mouse;
Other effects
Mutagenicity S. typhimurium
Sc injection & topical appln, mouse; tumour rate; LC50 to spider mite Variously (and mutagenicity) Variously expressed
In various species, strains etc.
Liver, BD rat Skin, mouse
Carcinogenicity
Bioconcentration, fish
LC50, guppy
Biological activity
ations regarding the validity and applicability of the QSARs to be made on a comparable basis. The author's aim, the chemical descriptors, the biological activity and the technique for deriving the QSAR are described. The resulting equations and graphical plots that express the QSAR are then given, together with any information from the papers on their validity (e.g. regression coefficients and standard deviations). Detailed comments on the suitability, accuracy and/or comparability of the data, on chemical descriptors and on biological activity are made only when these seem to be in doubt. In the sections headed 'General comments', the phrase "There are no comments on the data used" means that we believed that the data were sufficiently accurate, suitable and compatible. Finally, an assessment of the overall validity and applicability of the QSAR is made, and any limitations on its practical application (i.e. for the prediction of the biological activity of chemicals not in the training set) are noted.

Some readers may find it more useful at first to study the analyses that are relevant to their own interests and to refer to this chapter, as necessary, when reading the subsequent chapters.

Ecotoxicology

Könemann (1981a), LC50 to fish

The QSAR
With the aim of correlating certain physico-chemical properties of a range of chemicals of 'limited chemical reactivity' with their LC50 for the guppy, the author used 61 non-ionized organic chemicals, including aliphatic and aromatic hydrocarbons and chlorohydrocarbons, alcohols, ethers, glycol derivatives and some related compounds (see Tables I and II of the original paper). The biological activity used was the 14-day, or in some cases the 7-day, LC50 (µmol/litre) to the guppy (Poecilia reticulata). Numerous chemical descriptors were studied, the best correlations being provided by the octanol-water partition coefficient (Pow) and water solubility (S). The Hansch linear regression technique was used and for Pow gave the following equation:

$$\log(1/\mathrm{LC}_{50}) = 0.871 \log P_{ow} - 4.87 \qquad (n = 50,\ r = 0.988,\ s = 0.237) \qquad (1)$$

where n = no. of compounds tested, r = correlation coefficient and s = standard error of estimate.

General comments
The author's basis for using Pow or water solubility (which is collinear with it) is that both descriptors relate to the ability of a chemical to penetrate fish membranes. Such penetration leads to a general 'membrane perturbation' effect and sometimes causes death, for example, by anaesthesia. Water solubilities were either determined experimentally or were obtained from the literature, and their accuracy and comparability are not certain.

Validity, applicability and limitations
The QSAR of equation (1) gave a good correlation of Pow with LC50 after the elimination of 11 chemicals whose effect appeared to be based on modes of action other than the supposed anaesthetic one. The remaining 50 chemicals, which fitted the QSAR very well, comprised a wide range of structures corresponding to low chemical reactivity. It may be presumed that they have a common (narcotic) mechanism of biological action. The QSAR is invalid for chemicals of log Pow greater than about 6 (i.e. of low water solubility) as in these circumstances the LC50 may well exceed the solubility in water. This example of a cut-off has often been described (see, for example, Toman & Stota, 1959).

Water solubility gave only a moderately good correlation, possibly because the solubility data were taken from various sources and/or because a correction for melting point (found to be useful in similar cases) should have been applied (Mackay et al. 1980).

The wide applicability of this type of QSAR relating Pow or water solubility to LC50 and certain sublethal effects has been further demonstrated for the inhibition of reproduction in Daphnia magna (Hermens et al. 1984), LC50 for the fathead minnow (Broderius & Kahl, 1985; Veith et al. 1983), toxicity to Daphnia magna, algae and two fish species (Calamari et al. 1983), LC50 to 14 aquatic species (Slooff et al. 1983), LC50 for the golden orfe (Lipnick & Dunn, 1983), toxicity to Daphnia magna (Bobra et al. 1985) and subchronic toxicity to the fathead minnow (Call et al. 1985). Its application should be limited to compounds of low chemical reactivity with biological tissues and having a log Pow below 6.

It is essential in predicting LC50 values from this type of QSAR to have a basis for deciding which chemicals are 'reactive' or 'non-reactive'. This will not always be easy.

The ability of this type of QSAR to distinguish groups of chemicals that probably have a common mechanism of biological action is important because the toxic effects of chemicals in such groups are expected to be additive and thus the toxic concentration of mixtures should be predictable. Additivity has, indeed, been proven for the LC50 of a mixture of the 50 non-reactive organic chemicals mentioned above (Könemann, 1981b), the sublethal effects, in species other than the guppy, of mixtures of some of the 50 chemicals (Hermens et al. 1985b), the LC50 of mixtures of chlorophenols (Könemann, 1981b), the LC50 of mixtures of chloroanilines and 'reactive' organic halides (Hermens et al. 1985b) and the LC50 of mixtures of between two and 21 non-reactive organic compounds (Broderius & Kahl, 1985).
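As a minimal sketch of how equation (1) of the Könemann analysis might be applied, the Python fragment below evaluates log 1/LC50 from log Pow and converts the result back to an LC50 in µmol/litre. The function names and the hard limit at log Pow = 6 are illustrative assumptions (the text gives the limit only as "about 6"), and the sketch does not decide whether a chemical is 'non-reactive'; that judgement still has to be made separately.

```python
# Sketch only: evaluates equation (1) of the Konemann (1981a) analysis.
# Function names and the fixed cut-off are illustrative assumptions.

LOG_POW_LIMIT = 6.0  # the text says "about 6"; treated here as a fixed limit


def log_inv_lc50(log_pow: float) -> float:
    """Return log10(1/LC50), with LC50 in umol/litre, from equation (1)."""
    if log_pow > LOG_POW_LIMIT:
        raise ValueError("log Pow above about 6: outside the range of equation (1)")
    return 0.871 * log_pow - 4.87


def lc50_umol_per_litre(log_pow: float) -> float:
    """Convert the predicted log10(1/LC50) back to an LC50 in umol/litre."""
    return 10.0 ** (-log_inv_lc50(log_pow))


if __name__ == "__main__":
    for log_pow in (1.0, 3.0, 5.0):
        print(log_pow, round(log_inv_lc50(log_pow), 3),
              round(lc50_umol_per_litre(log_pow), 1))
```

Raising an error above the limit, rather than extrapolating, mirrors the caution in the text that the predicted LC50 may exceed the water solubility for very lipophilic chemicals.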
Hermens et al. (1985a), LC50 to fish

The QSAR
The authors' objective was to develop a QSAR between the LC50 (to the guppy) of a variety of halogen-substituted organic compounds and their alkylating ability and Pow. This work represents an attempt to develop a QSAR for chemically reactive organic compounds, in contrast to the QSAR described above for chemically unreactive substances. In total, 22 halogenated substances were considered initially (listed below with percentage purity in brackets), and they included seven of the 11 eliminated by Könemann (1981a) as 'reactive' (see previous section):

(1) 2,3-Dichloro-1-propene (>99)
(2) α,α'-Dichloro-m-xylene (>98)
(3) 2,2'-Dichlorodiethyl ether (>99)
(4) 2,4,α-Trichlorotoluene (>95)
(5) Hexachlorobutadiene (>98)
(6) Allylchloride (>99)
(7) Benzylchloride (>99)
(8) 1,2-Dichloroethane (>99.5)
(9) 1-Chloro-2,4-dinitrobenzene (>99)
(10) 1,3-Dichloropropane (>98)
(11) 1,5-Dichloropentane (>98)
(12) Chloroacetone (>95)
(13) 4-Nitrobenzyl bromide (>97)
(14) trans-1,4-Dichloro-2-butene (>95)
(15) cis-trans-1,3-Dichloropropene (>75)
(16) 1-Chloro-2-butene (>98)
(17) 3-Chloro-1-butene (>98)
(18) 1,3-Dichlorobenzene (>95)
(19) 2,4-Dichloroaniline (>99)
(20) 3,5-Dichloronitrobenzene (>98)
(21) Trichloroethene (>99.5)
(22) 1,2-Dichloropropane (>99.5)
(23) Formaldehyde
Fig. 4. The 14-day log 1/LC50 to the guppy against log k (reaction rate constants with 4-nitrobenzylpyridine) for 15 compounds: plot of equation (3); line drawn by Task Force.
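Fig. 4 is described as a plot of equation (3) of this analysis, log 1/LC50 = -1.30 log(1604 + 1/k) + 4.35, which the discussion contrasts with the straight-line equation (2), log 1/LC50 = 0.832 log k + 2.15. The short Python sketch below simply evaluates both expressions so that the levelling-off of equation (3) at high alkylation rates can be seen numerically. The function names are assumptions introduced for illustration, and k is taken to be in whatever units the original authors used, which the assessment does not state.

```python
import math

# Sketch only: evaluates equations (2) and (3) of the Hermens et al. (1985a)
# analysis as quoted in the assessment. Names are illustrative assumptions.


def eq2_log_inv_lc50(log_k: float) -> float:
    """Straight-line relationship, equation (2)."""
    return 0.832 * log_k + 2.15


def eq3_log_inv_lc50(log_k: float) -> float:
    """Relationship with a plateau at high k, equation (3)."""
    k = 10.0 ** log_k
    return -1.30 * math.log10(1604.0 + 1.0 / k) + 4.35


# Limiting value of equation (3) as k becomes very large (1/k -> 0).
PLATEAU = -1.30 * math.log10(1604.0) + 4.35


if __name__ == "__main__":
    for log_k in (-5.0, -4.0, -3.0, -2.0, -1.0):
        print(log_k, round(eq2_log_inv_lc50(log_k), 2),
              round(eq3_log_inv_lc50(log_k), 2))
    print("plateau", round(PLATEAU, 2))
```

At low log k the two expressions give similar values, while above roughly log k = -3 equation (3) changes very little, which is the plateau behaviour that the straight line cannot reproduce.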
The 14-day LC50 (µmol/litre) for the guppy, Poecilia reticulata, was taken as the biological activity. The Pow and the pseudo-first-order rate constants (k) of reaction with 4-nitrobenzylpyridine (NBP) were used as chemical descriptors. Linear regression was used in developing the QSAR. These equations were developed for 15 of the above compounds (nos 16-22 were excluded because five did not alkylate NBP and two were readily hydrolysed):

$$\log(1/\mathrm{LC}_{50}) = 0.474 \log P_{ow} - 1.98 \qquad (r = 0.41,\ s = 1.11) \qquad (1)$$

$$\log(1/\mathrm{LC}_{50}) = 0.832 \log k + 2.15 \qquad (r = 0.89,\ s = 0.56) \qquad (2)$$

$$\log(1/\mathrm{LC}_{50}) = -1.30 \log(1604 + k^{-1}) + 4.35 \qquad (r = 0.94,\ s = 0.04) \qquad (3)$$

General comments
Alkylating ability was chosen as a chemical descriptor because it is assumed to reflect the capacity of a chemical to react with nucleophilic centres in biological macromolecules, a reaction that can cause, for example, a loss of activity in various enzyme systems.

Validity, applicability and limitations
Equation (1) reveals a poor correlation between log Pow and log 1/LC50 for these chemically reactive substances (the good correlation found by Könemann (1981a) was for chemically inert compounds). The use of the reaction-rate constants, k, gave a markedly better QSAR, as is seen from the correlation coefficient of equation (2) and from the authors' plot of log 1/LC50 against log k in Fig. 4. The seven chloro compounds (nos 1, 2, 4, 6, 7, 14 and 15) in which the Cl atom is activated by allyl or benzyl groups lay on a straight line. The four saturated di-primary dichloro compounds (nos 3, 8, 10 and 11) had rather low alkylation rates and, apart from 8, did not fit very well on the straight line. The toxic activity of these rather unreactive compounds may depend more on Pow, as in the Könemann (1981a) QSAR (above). The three chemicals (nos 5, 9 and 13) with the highest alkylation rates lay on what appeared to be a plateau, and their LC50 values remained constant with increasing alkylation rate, possibly because they underwent competing side reactions. Therefore equation (3), which generates a plateau on the LC50 axis, was developed. It had a much better correlation coefficient than did equation (2).

We conclude that the QSAR of equation (3) may be useful for predicting LC50 values for fish when log k of the chemical is above about -4.5. At values below this the correlation is weaker. This QSAR results from one of the first attempts to use chemical reactivity in studying structure-activity relationships in aquatic toxicology, but more work is needed before its usefulness can be established.

Saarikoski & Viluksela (1982), LC50 to fish
The QSAR
The aim in this paper was to correlate the partition coefficients and dissociation constants of a series of phenols with their bioconcentration potential (not considered here) and LC50 to the guppy; 21 phenols, variously substituted with alkyl, chlorine, methoxy and nitro groups, were studied. The biological activity was the 96-hr LC50 (mmol/litre) for the guppy (Poecilia reticulata), determined at pH 6, 7 and 8. The chemical descriptors were the octanol-water partition coefficient (Pow) and the difference between the dissociation constants of phenol and the substituted phenol (ΔpKa). Linear regression was used to establish QSARs, and 42 Hansch-type equations were examined, of the form:

$$\log(1/\mathrm{LC}_{50}) = a \log P_{ow} + b$$

or

$$\log(1/\mathrm{LC}_{50}) = a \log P_{ow} + b\,\Delta\mathrm{p}K_a + c$$
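To make the one-descriptor Hansch form concrete, the following Python sketch fits log 1/LC50 = a log Pow + b by ordinary least squares and reports the correlation coefficient r and the standard error of estimate s, the two statistics quoted for the equations in this assessment. The function name and the four data points are assumptions: the numbers are arbitrary values chosen to keep the arithmetic easy to follow and are not measurements from any of the papers. The two-descriptor form with ΔpKa would need multiple regression, which is not shown.

```python
import math

# Sketch only: simple least-squares fit of y = a*x + b with r and s.
# The data below are arbitrary illustration values, not experimental results.


def fit_line(x, y):
    """Return (a, b, r, s) for the least-squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))       # standard error of estimate
    return a, b, r, s


if __name__ == "__main__":
    log_pow = [1.0, 2.0, 3.0, 4.0]        # hypothetical descriptor values
    log_inv_lc50 = [1.0, 2.0, 2.0, 3.0]   # hypothetical activities
    print(fit_line(log_pow, log_inv_lc50))
```

Dividing the residual sum of squares by n - 2 reflects the two fitted parameters; with more descriptors the divisor shrinks further, which is one reason equations with many descriptors and few compounds are statistically weak.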
General comments
The biological basis for using Pow is that it relates to the lipophilicity of the phenols, i.e. their ability to penetrate fish membranes and so exert a toxic effect. pKa is a measure of their degree of ionization at a given pH, and its inclusion tends to correct for the changes in Pow at different pH values.

Validity, applicability and limitations
For 19 substituted phenols at pH 7.0 the equation

$$\log(1/\mathrm{LC}_{50}) = 0.59 \log P_{ow} - 0.34 \qquad (n = 19,\ r = 0.976,\ s = 0.14)$$

gave a good QSAR which was not improved by incorporating ΔpKa. At pH 7, therefore, hydrophobicity appears to be the primary factor determining toxicity. With the above equation, two dinitrophenols (2,5-dinitrophenol and 2-sec-butyl-4,6-dinitrophenol) were outliers and were therefore omitted from the calculations. The authors gave no explanation for this.

LC50 data at pH 6 and 8 were also available on 15 of the phenols. With Pow only, and Pow and ΔpKa together, excellent QSARs were developed with the results at pH 6, 7 and 8. At pH 6 and 8, incorporating ΔpKa improved the QSAR. The ΔpKa coefficient changed from positive at pH 6 to negative at pH 8, suggesting that the toxicity is higher the more acidic is the phenol, and/or that the ionized form of the phenol, increasingly present at pH 8, bioconcentrates less than does the un-ionized form and is therefore less toxic. The change in the coefficients of ΔpKa at different pH values was observed earlier by Könemann & Musch (1981) in a similar study with chlorophenols.

To summarize: good QSARs have been developed relating Pow, ΔpKa and LC50 for a series of alkyl-, chlorine-, alkoxy- and mononitro-phenols at pH 6, 7 and 8. For reasons that are not understood, two dinitrophenols did not fit. Without knowing why this was so, and in particular without a better understanding of the mechanism of the biological action, the applicability of this type of QSAR to other substituted phenols is limited.

Veith et al. (1979), bioconcentration factors in fish

The QSAR
The aim of the authors was to correlate the Pow of a variety of chemicals with their ability to bioconcentrate (expressed as the bioconcentration factor, BCF) in several species of fish. Altogether 62 chemicals, too diverse to group into classes and including many pesticides, were studied (see original paper). Linear regression was used to derive the equation:

$$\log \mathrm{BCF} = 0.85 \log P_{ow} - 0.70 \qquad (n = 59,\ r = 0.947)$$

General comments
The authors determined BCFs of 30 of the 62 chemicals in the fathead minnow by a standard method and the values should be internally consistent. Of the remaining 32 chemicals, BCFs for the fathead minnow, bluegill, rainbow trout and mosquito fish were taken from various papers in which the exposure periods were not always the same as those of Veith et al. The comparability of this set with that of the authors is therefore in some doubt. The authors provided limited experimental evidence that BCFs varied between species by at most only a factor of 3. The values for Pow were in part taken from the literature and in part determined by reverse-phase liquid chromatography. Their degree of intercomparability is thus not certain.

Validity, applicability and limitations
The above equation expressed a satisfactory QSAR from which, according to the authors, "log BCF can be estimated to within an order of magnitude" (the authors presumably mean BCF) for chemicals of log Pow ranging from 1 to nearly 7. Of the 62 chemicals used initially, three had BCFs much lower than was predicted from their Pow and were omitted. No reason for the discrepancy was given. These three outliers were all chemicals whose BCF had been determined by Veith et al. (i.e. the data from other laboratories were compatible with the remaining data of the authors). We note that the log BCF of seven chemicals is expressed as <0.32 or <0.90, and of NTS-1 (chemical structure not given) as 0.32-1.00. Strictly speaking values expressed thus are not usable, and the QSAR is in fact valid for 52 not 59 chemicals. This does not, however, alter the comments on its overall validity. This paper is one example of many in which the correlation of BCF and Pow has been established, and such QSARs have found frequent use in estimating BCFs for practical purposes.

Toxicology

Edelman et al. (1980), liver carcinogenicity in the rat

The QSAR
The authors aimed to correlate certain physicochemical properties of structurally related N-nitrosamines with their organ selectivity (liver tumours) in the BD rat. The following members of the series of N-nitrosodialkylamines of general formula R1-N(NO)-R2 were used: dimethyl, diethyl, dipropyl, diisopropyl, dibutyl, diamyl, methyl-ethyl, methyl-cyclohexyl, methyl-heptyl, methyl-phenyl, methyl-benzyl, methyl-(2-phenylethyl), ethyl-vinyl, ethyl-isopropyl, methyl-vinyl, methyl-allyl, methyl-amyl, ethyl-butyl and butyl-amyl. The primary metabolites formed by enzymatic cleavage of the parent molecule on either side of the central nitrogen atom, and containing either the R1-N- or R2-N- fragment, were also taken into consideration, but other possible metabolites were not. The biological activity was expressed as L = A/(A + B), where L is the proportion of animals with liver tumours in the total number of animals with tumours, A is the number of animals with liver tumours and B is the number of animals with other tumours.

As chemical descriptors the following were taken:
- hydrophobic parameters (derived from partition coefficients), i.e. π for the parent compound, πl for the longest-chain metabolite, and πs for the shorter-chain metabolite;
- σ*, the Taft electronic factors for R1 and R2;
- Es, the Taft steric factors for R1 and R2.

The biological activity (L) was taken to be a function of π, σ* and Es, and was expressed as

$$L = \frac{1}{1 + e^{-f}}$$

where f was determined by regression analysis giving the following equations:

$$f = 0.996\,\pi^2 - 4.95\,\pi + 4.04 \qquad (n = 19,\ R = 0.580,\ s = 0.39) \qquad (1)$$

$$f = 14.61\,\pi^2 - 78.2\,\pi - 2.6\,\sigma^* - 20.98\,E_s + 76.4 \qquad (n = 19,\ R = 0.861,\ s = 0.26) \qquad (2)$$

$$f = 12.13\,\pi^2 - 35.11\,\pi + 30.61\,\sigma^* - 31.17\,\pi_s^2 + 165.51\,\pi_s + 50.12\,\pi_l^2 - 316.86\,\pi_l + 230.54 \qquad (n = 19,\ R = 0.997,\ s = 0.048) \qquad (3)$$

$$f = 10.64\,\pi^2 - 34.11\,\pi - 62.82\,\sigma^* + 34.35\,E_s - 10.25\,\pi_s^2 + 42.52\,\pi_s + 8.04\,\pi_l^2 - 61.65\,\pi_l + 53.98 \qquad (n = 19,\ R = 0.999,\ s = 0.028) \qquad (4)$$

General comments
There are no comments on the data used.

Validity, applicability and limitations
Equations (3) and (4) had high values of R, but are weak statistically because there are, respectively, seven and eight chemical descriptors and 19 chemicals (see Sources of error..., p. 147). Correlation was poor when equations (1) and (2) were used. Edelman et al. conclude that "QSARs derived from larger sets of well-chosen examples may be of use as extremely rapid methods for non-biological evaluation of the carcinogenic potential of new compounds". They pointed out, however, that the present exercise "was done with a fairly small and well-defined sample of compounds, and that only selectivity towards liver with respect to other organs (albeit mainly oesophagus) was considered". We agree with this caution and emphasize, in addition, that this QSAR must be restricted to the BD rat and the liver since organ specificity varies with species and strain, and often with the dose level.

The authors rationalize their findings by suggesting that the electronic and steric factors represent effects near the reaction centre in the molecules (i.e. the amine N atom). Equations (1) and (2), in which the transport properties of only the parent molecules were taken into account, gave poor correlations. When the partition coefficients of metabolites (πl and πs) were introduced the correlation became very good (equations 3 and 4). These observations are consistent with the concept that organ selectivity involves the initial diffusion of the intact parent molecule to a target cell, followed by the release of one or more metabolites.

von Szentpály (1984), Iball indices for carcinogenicity

The QSAR
The aim of this author was to establish QSARs relating the carcinogenic potency of a series of polycyclic aromatic hydrocarbons to chemical descriptors related to the mechanism of action. He used 26 unsubstituted polycyclic aromatic hydrocarbons containing between three and eight fused rings to establish QSARs, and studied their application to a further ten such compounds. The biological activity was expressed as the Iball index (I), a measure of potency derived from skin-painting studies in mice, where

$$I = \frac{\%\ \text{of papilloma-bearing mice*}}{\text{average latency period of affected mice}}$$

(*surviving beyond the shortest latency period). The Iball index is not much used nowadays, but was a convenient parameter for this QSAR study.

Three chemical descriptors were used. The first, M, expresses the preference for epoxidation of the molecule in the M region relative to reactions at other competitive centres (Fig. 5) and is based on Dewar numbers, which express reactivity and can easily be calculated by simple molecular orbital theory. It is defined as (Nm - Nc)^2, where Nm is the smaller of the two Dewar numbers for the C atoms (*) at the M region and Nc is the smallest Dewar number corresponding to competitive reactions at other positions. The epoxidation in step 1 is believed to be followed rapidly by steps 2 and 3 (Fig. 5), the conversion of the initial epoxide (I) into the dihydrodiol epoxide (II). The second descriptor, (ED + EC), which again is easily calculable from MO coefficients, quantifies the ease of formation of a carbo-cation by the opening of the epoxide ring in compounds I and II. ED is the Hückel delocalization energy of this delocalized carbo-cation, and EC is the charge dispersal energy, which corrects for the inability of the Hückel term to distinguish between the formation of a radical and a cation. The third descriptor, A = (n - 20)^3, where n is the number of C atoms in the molecule, is an empirical term that takes account of the optimum size of the molecule (between 20 and 24 atoms) for carcinogenic activity.

Fig. 5. Epoxidation in the M region in preference to reaction at other competitive centres (step 1) and conversion of the initial epoxide to the dihydrodiol epoxide (steps 2 and 3).

The QSARs were established by regression analysis, giving:

$$I = -80.47\,M + 8.244\,(E_D + E_C) - 0.073\,A - 331.7 \qquad (n = 26,\ R = 0.961,\ s = \pm 6.8) \qquad (1)$$

$$I = -146.33\,M + 38.91\,E_D - 0.0569\,A - 614.17 \qquad (n = 26,\ R = 0.824,\ s = \pm 14.1) \qquad (2)$$

General comments
The Iball indices were taken from data in six different papers and the author stated that they "can be trusted to within a range of ±10". (The values range from negative or zero for non-carcinogens, to up to 80 for carcinogens.)

Validity, applicability and limitations
The QSAR represented by equation (1) gave a good correlation of the Iball index with the three descriptors. When only the Hückel term ED was used, the much poorer QSAR of equation (2) resulted. This means that it is necessary to distinguish between the formation of a carbo-cation and a radical in step 3. Overall, the author has skilfully used a knowledge of the mechanism of the carcinogenic activity to choose appropriate descriptors leading to a valid QSAR. The fact that the very simply calculated M, ED and EC terms proved adequate for the QSAR raises the question as to whether other types of descriptors of greater accuracy, derived by more sophisticated molecular orbital calculations, are really necessary for QSARs, given the limited accuracy of the biological data.

The QSAR of equation (1) was based on 26 hydrocarbons of varying structural types and should thus be useful for prediction. The author, on the basis of the QSAR, suggested that the reported Iball index of dibenzo[a,i]pyrene was much too high. Subsequent examination of the original papers showed that the index was indeed in error, but the correct value has not yet been determined by experiment.
Dunn & Wold (1978), carcinogenicity

The SAR
The authors aimed to develop an SAR between the carcinogenic activity of a series of 4-nitro- and 4-hydroxyaminoquinoline-1-oxides and certain of their physico-chemical and steric parameters. Thirty-three variously substituted quinoline-1-oxides (Fig. 6) of parent structure (I) or (II) were considered. The biological activity was the presence or absence of carcinogenic potential. No attempt was made to rank the compounds by potency. The chemical descriptors examined were: π, the substituent contribution to hydrophobicity (related to Pow); σm and σp, Hammett electronic parameters; molar refractivity; Verloop's STERIMOL constants, i.e. L, related to the length of the molecule along its main axis, and B1, B2, B3 and B4 expressing the thickness along the main axis. Some of the descriptors related to the six possible positions of substitution in molecules (I) and (II), and in total 43 were initially considered, of which 35 were retained for use. The SIMCA pattern recognition technique was used to establish the SAR.

Fig. 6. Parent structures of the substituted quinoline-1-oxides: (I) 4-nitro (NO2) and (II) 4-hydroxyamino (NHOH).

General comments
The carcinogenicity data, drawn from four papers, were derived from a variety of methods, strains of animals and doses. They seem unlikely to be very self-consistent. Of the 33 compounds considered, 18 had consistently given positive results in animal studies, ten had given negative results, and five had given inconsistent results and were excluded from the training set.

Validity, applicability and limitations
On the basis of 35 chemical descriptors used in a four-component model, 16 out of 18 (89%) of the carcinogens and seven out of ten (70%) of the non-carcinogens were correctly classified. The correctly classified carcinogens fell into a well-defined grouping, but the non-carcinogens were diffusely spread outside this. A better correlation was obtained when only the fourteen 6-substituted quinoline oxides were considered in a two-component model based on only ten chemical descriptors. The SARs proved to be robust (i.e. independent of the data set chosen) when checked by the 'leave out' technique: the repeated removal of one quarter of the compounds in the training set and classification of the remaining three quarters.

Data were available on the ability of seven of the 6-substituted quinoline oxides to initiate unscheduled DNA synthesis. When a term expressing this ability was plotted graphically against a parameter representing the position of each chemical in the grouping, a "significant relationship" was revealed except for the compound with a -COOH substituent, which may have been in the anionic form rather than in the neutral form to which the chemical descriptors referred. Dunn & Wold suggested that this indicates that the 6-substituted compounds cluster according to carcinogenic potency, which should be related to their ability to initiate unscheduled DNA synthesis. While the validity of the above SARs for classifying the 16 carcinogens and seven non-carcinogens is beyond doubt, their applicability to other compounds is limited by our ignorance of why five related chemicals were incorrectly classified.
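The 'leave out' check mentioned in this analysis can be sketched in a few lines of Python. The sketch below is not SIMCA: it swaps in a much simpler nearest-centroid classifier purely to show the bookkeeping of removing one quarter of the compounds at a time. It also assumes one common reading of the procedure, in which the model is rebuilt from the remaining three quarters and then used to classify the removed quarter. All function names and the eight two-descriptor data points are hypothetical and invented for illustration.

```python
import random

# Sketch only: a leave-one-quarter-out consistency check.
# A nearest-centroid rule stands in for SIMCA; data are invented.


def leave_out_groups(n_items, n_groups=4, seed=0):
    """Partition indices 0..n_items-1 into n_groups disjoint held-out groups."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[g::n_groups]) for g in range(n_groups)]


def _centroid(rows):
    """Column-wise mean of a list of descriptor vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _nearest_centroid(train_x, train_y, point):
    """Label of the class whose centroid lies closest to the point."""
    centroids = {
        label: _centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in sorted(set(train_y))
    }

    def dist2(c):
        return sum((ci - pi) ** 2 for ci, pi in zip(c, point))

    return min(centroids, key=lambda label: dist2(centroids[label]))


def leave_out_check(xs, ys, n_groups=4, seed=0):
    """Fraction classified correctly when each group is held out in turn."""
    correct = 0
    for held_out in leave_out_groups(len(xs), n_groups, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        correct += sum(
            _nearest_centroid(train_x, train_y, xs[i]) == ys[i] for i in held_out
        )
    return correct / len(xs)


if __name__ == "__main__":
    # Hypothetical descriptor vectors for eight compounds.
    xs = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.4, 0.3],
          [2.1, 2.2], [2.3, 2.0], [2.2, 2.4], [2.4, 2.1]]
    ys = ["active"] * 4 + ["inactive"] * 4
    print(leave_out_check(xs, ys))
```

A classification rate that stays close to the full-set figure across the held-out groups is what "robust" means here; it says nothing about compounds that differ structurally from the training set.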
Yuta & Jurs (1981), carcinogenicity

The SAR
The object in this paper was to develop an SAR to discriminate, on the basis of numerous structural descriptors, between aromatic amines that are carcinogenic and those that are non-carcinogenic to the rat. The chemicals comprised 157 aromatic amines, subdivided into six classes, i.e. substituted biphenyl, stilbene, fluorene, diphenylmethylene, diphenylazo and some miscellaneous amines. The biological activity was taken as the presence or absence of carcinogenic activity in various organs of the rat, expressed separately by route of administration and tumour site. The chemical descriptors used were of two types, topological and geometric. Topological descriptors included: fragment descriptors (number of atoms and bonds of various types, molecular weight, number of rings, number of atoms in the rings); substructure descriptors relating to an assumed mechanism or metabolic activation, and chosen on the basis of experience; molecular connectivity descriptors, expressing branching in the molecule; environment descriptors, indicating how substructures are interconnected and what their immediate surroundings in the molecule are. Geometric descriptors (the three principal moments of inertia of the molecule and their ratios, and molecular volumes) represented the shape of the molecule in three-dimensional space.

A pattern recognition technique with the ADAPT software system was used as follows. Each chemical was represented by a point in n-dimensional space corresponding to its n chemical descriptors. Various techniques were used to search for clusters. Once discriminants were found that separate active and inactive chemicals, the internal consistency was checked by a repeated process of removing some chemicals in the data set and reclassifying them in the SAR developed from the remainder. This constituted a partial validation of the SAR. Complete validation would require its application to carcinogenic and non-carcinogenic amines outside the training set.
General comments The biological data were taken from two compilations of carcinogens. Aromatic amines have been studied intensively since the 1950s and in many cases carcinogenicity has been confirmed in more than one study. In order to be classified as a carcinogen a chemical was required to cause cancer in the breast (a rather uncommon site for the action of aromatic amines), and in at least two other sites in the rat. Only chemicals inactive at any site in the rat were classified as non-carcinogens. The biological data thus seem to be particularly well founded. In developing the SARs, the number of descriptors was limited to less than one third of the number of chemicals, to avoid chance separations. To avoid bias in the data set, a descriptor was used only if it occurred in at least 10% of the compounds.
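The two screening rules described for this study (fewer descriptors than one third of the number of chemicals, and each descriptor present in at least 10% of the compounds) are simple enough to express as a short Python sketch. The function names and the presence/absence encoding of a descriptor are assumptions made for illustration; the original ADAPT software is not reproduced here.

```python
# Sketch only: the two descriptor-screening rules described for the
# Yuta & Jurs study. Names and data layout are illustrative assumptions.


def max_descriptors(n_chemicals: int) -> int:
    """Largest descriptor count strictly below one third of n_chemicals."""
    return max((n_chemicals - 1) // 3, 0)


def frequent_enough(flags, min_percent: int = 10) -> bool:
    """True if a descriptor occurs in at least min_percent of the compounds.

    flags holds one truthy/falsy entry per compound (descriptor present or not).
    """
    flags = list(flags)
    if not flags:
        raise ValueError("no compounds supplied")
    count = sum(1 for f in flags if f)
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return 100 * count >= min_percent * len(flags)


if __name__ == "__main__":
    print(max_descriptors(157))                 # full set of aromatic amines
    print(frequent_enough([1, 1] + [0] * 18))   # present in 2 of 20 compounds
```

Under the same rule of thumb a set of 19 compounds would allow at most six descriptors, which is consistent with the statistical concern raised elsewhere in this assessment about equations using seven or eight descriptors on 19 chemicals.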
Validity, applicability and limitations
Specific SARs that enabled between 85 and 90% of the amines to be correctly classified were developed for the six subclasses. The authors add the caution that "These results are specific to the data sets used. They will be general to the extent that this data set mimics the universe of aromatic amino compounds". The results were not greatly improved by considering the route of administration and target organ, and the authors cite this as evidence that in SAR studies "it is not necessary to limit a study to one site, one route of administration, etc. in order to proceed". In our opinion this is (quite correctly) not meant to be a general statement about all carcinogenicity SAR studies, but may be true of studies, such as that of these authors, in which a closely related group of chemicals of similar carcinogenic profile is considered.

To obtain the above results, between 6 and 15% of the chemicals had to be excluded from the various subclasses. Yuta & Jurs noted that aromatic amines require metabolic activation before manifesting carcinogenic activity, and that no consideration of metabolites was introduced into their study. Without a clear understanding of why some of the amines did not fit the SAR, its applicability to amines outside the training set is questionable.

Molecular descriptors relating to size and shape (e.g. number of rings and the principal moments of inertia) appeared consistently in the SARs and may therefore be related to the mechanism of carcinogenic action. The various pattern recognition techniques used were markedly different in their ability to classify the amines correctly. In general, depending on the quality of the data and the nature of the problem under study, one or other pattern recognition technique usually proves to be more suitable than the others.

Klopman (1984), carcinogenicity and pesticidal activity
The QSAR
The author describes an "artificial intelligence" method for establishing a qualitative SAR between topological substructures of molecules and their carcinogenic or pesticidal activity. Three groups of chemicals were studied: benzene (classed as non-carcinogenic) and 37 polycyclic aromatic hydrocarbons (PAHs) of which 15 were classed as carcinogenic, 39 cyclic N-nitrosamines of which 27 were classed as carcinogenic, and 39 ketoxime carbamates, of which 23 possessed pesticidal activity. The carcinogenicity of the PAHs was based either on subcutaneous injection or on topical application studies in mice. Data on the carcinogenicity of the nitrosamines were taken from a single paper describing feeding studies in rats. If more than 50% of the rats were still alive after 100 weeks the compound was rated as 'inactive'; it was rated as 'active' if more than 50% had died from cancer before 100 weeks. The activity of the ketoxime carbamates was expressed as their LC50 to two species of spider mites. Sixteen were classed as 'inactive' (LC50 above 500 ppm), 4 as 'marginal' (LC50 300-500 ppm) and 19 as 'active' (LC50 below 300 ppm).
The technique used was to make a computer analysis of each structure, and so recognize the fragments formed by breaking it up into linear subunits of between 3 and 12 atoms other than hydrogen. All subunits belonging to active molecules were labelled 'active' and vice versa. A statistical analysis was carried out to eliminate fragments common to the active and inactive molecules. General comments
The set of substructures used as chemical descriptors was developed from the training-set molecules, thus avoiding the problems of bias noted in the Craig-Enslein approach (see p. 161). Data on the carcinogenic activity of the PAHs were taken from a review (Dipple, 1976) of 12 papers describing studies of subcutaneous injection giving rise to sarcomas and of topical application leading to benign papillomas or malignant epitheliomas, in mice. Although Dipple (1976) gave no clear basis for his carcinogenicity classification, Klopman adopted this uncritically. In fact, the data are very inhomogeneous; for example, Klopman classified dibenz[a,c]anthracene as carcinogenic because it was so by topical application, but it was inactive by subcutaneous injection. It is also noted that 13 of the 23 PAHs classified as 'inactive' had been subjected to only "limited testing". From a toxicological and practical standpoint, the classification as 'inactive' of nitrosamines which could have caused up to 49% of the experimental animals to die from tumours is simply unacceptable. Although this does not mean that the QSAR is, of itself, invalid, it could mean that its application could result in some dangerously wrong classifications of compounds as 'inactive'.
Validity, applicability and limitations
PAHs. Of the 2876 structural subunits identified
by the computer program, structure I existed in six active and two inactive molecules, II was present in five molecules all of which were active, and III was found in ten inactive and four active molecules (see Fig. 7 for structures I-III). The carcinogenic activity of active PAHs was subclassified as +, ++ or +++, and substructure II was present in all four compounds in the +++ class.
N-Nitrosamines. Structural fragments IV and V were consistently associated with carcinogenically active compounds, and VI with the inactive ones:
(IV) -CH2-N-NO    (V) -CH2-N-CH2-    (VI) -CH-COOH
The presence of IV or V and the absence of VI led to the correct identification of all 27 active compounds. Of the 12 classed as inactive, ten lacked IV and V or contained VI.
Ketoxime carbamates. Of the 4155 structural fragments identified, ten appeared to be relevant to the pesticidal activity and three were particularly so. The use of these three, with some of the other relevant fragments, enabled all of the 16 inactive compounds to be correctly identified. Of the remaining 23 compounds, four were marginally active (two identified as active and two as inactive) and 19 were active (15 identified as active, four as inactive). When the activity of four carbamates not used in the training set was predicted, the results were correct for the single inactive compound and for two of the three active ones. The author emphasized that because the chemical descriptors are only topological, the SAR can be used only when the biological activity does not depend on geometric or physico-chemical properties. This is a severe limitation since it implies that a considerable knowledge of the biological mechanisms is necessary before recognizing whether the approach is applicable. Although this interesting approach could be further validated by applying it to wider classes of chemicals, the limitations noted above make this difficult.
Jones & Mackrodt (1982 & 1983), oncogenicity and mutagenicity
The QSAR
The aim in these publications was to correlate the mutagenic and oncogenic activity of a series of halogenated ethylenes with the stability of the weakest C-O bond in the ring of the corresponding epoxides, and to use the resulting relationship to predict the mutagenicity and oncogenicity of other alkene epoxides. The chemicals used in the training set were CCl2=CCl2, CCl2=CHCl, CCl2=CH2, cis-CHCl=CHCl, CHCl=CH2, CH2=CHF and CH2=CF2. There were no mutagenicity data for the last two, and the mutagenicity training set therefore comprised only five compounds. The oncogenicity training set consisted of six substances, there being no data for cis-CHCl=CHCl. The oncogenicity of seven chloro- and fluoro-substituted ethylene oxides and of ethylene oxide itself was predicted but not compared with any experimental data. The mutagenicity of a further 25 alkene oxides containing various substituent groups was predicted and in 16 cases the mutagenicity predictions could be compared with experimental findings.
Fig. 7. Three structural subunits (I), (II) and (III) identified in polycyclic aromatic hydrocarbons, (I) and (II) being activating fragments and (III) a deactivating fragment.
The biological activities used were, for mutagenicity, the % increase in the spontaneous mutation rate for the arginine operon of E. coli K12 in the presence of a metabolic activating system, and for oncogenicity, the percentage of preneoplastic foci deficient in nucleoside-5'-triphosphatase in rats exposed to the chemical (expressed as the theoretical activity at a concentration of 1 mol/kg body weight). The two-centre bond energies of the weaker of the two C-O epoxide bonds, E_C-O, calculated by a molecular orbital method and expressed in electron volts (eV), were taken as chemical descriptors. The QSAR was expressed as a graphical plot of E_C-O against the mutagenic and oncogenic activities of each compound in the training sets.
General comments
Mutagenicity data for the training-set compounds were from a single paper (Greim et al. 1975) in which the authors emphasize that the degree of mutagenicity is dose dependent, which means that quantitative comparisons cannot be made at different dose levels. Jones & Mackrodt, however, did use data at different dose levels. The actual mutagenicity of compounds in the prediction set was taken from several papers by different authors and may not have been internally consistent or quantitatively comparable. We also note that the occurrence of preneoplastic foci deficient in nucleoside-5'-triphosphatase is not necessarily a good descriptor for oncogenicity because not all such foci progress to neoplasms; their formation may be reversible, depending for example on the chemical.
Validity, applicability and limitations
The plot of oncogenic and mutagenic activity against the values of E_C-O for the training set is shown in Fig. 8. Crucial parts of the figure were based on rather few points: the left-hand line for oncogenicity and the right-hand line for mutagenicity were established by one point only. This severely limits the
Fig. 8. Spontaneous mutation rate for the arginine operon of E. coli K12 and the foci theoretically produced by 1 mol metabolites/kg body weight for fluoro- and chloroethylenes as a function of the calculated two-centre energy E_C-O (eV) of the corresponding epoxides. [After Jones & Mackrodt, 1982].
credibility of the QSAR. It is also noted that the two points at the peaks in Fig. 8 were from experiments at the highest concentrations used, and would thus be expected to lie at high values of mutagenic activity. On the basis of Fig. 8, the authors noted that the parallel between oncogenic and mutagenic activity is quite marked and that only those halogenated olefins whose corresponding epoxides have E_C-O values between about -14.5 and -12.8 eV are oncogens or mutagens. They were unable to find any examples of "simple genotoxic aliphatic epoxides" whose calculated E_C-O was not within the range -14.5 to -12.8 eV. We note, however, that the E_C-O values of the training set and oncogenicity-prediction set of epoxides do not correlate with published data on their hydrolytic or thermal stability. This is not surprising since E_C-O relates either to the ease of removal of an electron from the σ-orbital of the C-O bond or to its homolytic cleavage (it is not clear which from the papers), whereas neither process is involved in the reaction of the epoxides with water or nucleophiles, and probably not in their thermal isomerization either. As the theoretical basis assumed in this SAR seems to be weak, its predictive ability must be suspect. Nonetheless, the agreement between the predicted and actual mutagenicities of the epoxides in the mutagenicity prediction set is high (14 out of 16 correct; see next paragraph) although, because the E_C-O values are not reported, it is impossible to assess the degree of agreement in detail. A possible reason for the good agreement is that the strength of the C-O bond depends on the electronegativity of the C atom. Electron-attracting or -donating substituents on this atom will make E_C-O more negative or less negative, respectively. When the C atom carries two H atoms it will be a good centre for SN2 reactions and the corresponding epoxides may have E_C-O values in the range assumed to correlate with mutagenicity.
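The window criterion quoted above can be written as a one-line test. The sketch below is illustrative only: the function and constant names are hypothetical, the inclusive treatment of the limits is an assumption, and the limits themselves were given by the authors only approximately ("between about" the two values).

```python
E_CO_LOW_EV = -14.5   # approximate lower limit quoted by Jones & Mackrodt
E_CO_HIGH_EV = -12.8  # approximate upper limit quoted by Jones & Mackrodt


def in_genotoxic_window(e_co_ev):
    """True if a calculated C-O two-centre bond energy (eV) lies in the window."""
    return E_CO_LOW_EV <= e_co_ev <= E_CO_HIGH_EV


# Hypothetical energies, chosen only to exercise the rule.
for energy in (-15.2, -13.6, -12.0):
    print(energy, in_genotoxic_window(energy))
```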
In the absence of the E_C-O data one can only speculate. For comparison with the predicted mutagenicity, the measured mutagenicities were taken from several different publications and were expressed qualitatively, i.e. as + or -. Fourteen of the predictions for the 16 epoxides in the prediction set were correct. Of the remaining two, the experimental mutagenicity results for 3-epoxybutane seemed to be ambiguous, and 2-trichloromethylpropylene oxide was incorrectly predicted to be non-mutagenic. The predicted oncogenicities were not compared with any experimental assessments, and their validity cannot therefore be evaluated. The authors noted that the results relate specifically to the epoxides, the assumption being that they are formed by metabolism of the alkenes. The mutagenicity of the alkenes per se may be different. They add that stereochemical and lipophilic effects and the consequences of epoxide metabolism have not been considered. In our opinion, the above considerations remove the basis for a QSAR related directly to the alkenes.
Politzer & Laurence (1984), carcinogenicity
The QSAR
The aim of this study was to correlate qualitatively the carcinogenicity of a series of alkyl- and halogen-substituted epoxides with the most negative electrostatic potential in the neighbourhood of the epoxide oxygen atom. Thirteen substituted ethylene oxides were used (Fig. 9). The biological activity was expressed by the phrases 'carcinogen', 'weak carcinogen', 'insignificant activity' or 'considered inactive (little evidence)'. The chemical descriptor used was the most negative electrostatic potential V_min (kcal/mol) in the neighbourhood of the oxygen atom, calculated by a molecular orbital procedure. The SAR was expressed by simply comparing V_min with the carcinogenicity categories listed above.
Fig. 9. Identity of the 13 substituted ethylene oxides used in the Politzer & Laurence (1984) study. [General structure: ethylene oxide ring bearing substituents X, Y, X' and Y' on the ring carbons, the substituents being drawn from H, CH3, Cl and CH2Cl, with cis and trans isomers distinguished.]
General comments
The carcinogenicity assignments are based on very inhomogeneous data drawn from six literature sources in which studies on various species by various routes of administration are described. On the basis of such inhomogeneous data, few toxicologists would claim to be capable of making the subtle distinctions of potency expressed by the authors in the terms 'carcinogens', 'weak carcinogens' and 'insignificant activity'. cis-1,2-Dichloroethylene oxide was classified as 'considered inactive (little evidence)' on the basis of one study of its metabolism and mutagenicity, in which, however, no carcinogenicity data were given. The trans isomer was also classified on the basis of little evidence. This means that the classification of two of the five epoxides that are not 'carcinogens' was somewhat doubtful. The description of propylene oxide as a 'carcinogen' was based on a limited study by Walpole (1958) in which 12 rats were given a subcutaneous injection. Significantly, however, Walpole found that ethylene oxide produced no tumours under his experimental conditions, and yet on the basis of a different (inhalation) study, Politzer & Laurence (1984) classified it as a 'carcinogen'. Furthermore, propylene oxide was apparently a much weaker carcinogen than was ethylene oxide, both in an inhalation study (NIOSH, 1981) and in a later subcutaneous injection study (Dunkelberg, 1979), despite the fact that the V_min values of the two epoxides are rather similar (-53.4 and -51.3 kcal/mol, respectively). These facts throw much doubt on the consistency of the carcinogenicity allocations.
Validity, applicability and limitations
There was a good qualitative correlation between V_min and the authors' carcinogenicity ranking. Thus eight epoxides of V_min ranging from -28.4 to -53.4 kcal/mol were 'carcinogenic', three epoxides of V_min between -23.1 and -27.7 kcal/mol were 'considered inactive' or a 'weak carcinogen' and two epoxides of V_min between -9.2 and -17.1 kcal/mol were of 'insignificant activity'. The authors concluded that "a halogenated hydrocarbon epoxide is reasonably likely to be a carcinogen if its electrostatic potential near the oxygen reaches values more negative than about -30 kcal/mol". They "do not claim that the carcinogenicity of epoxides correlates directly with the electrostatic potentials near their oxygen atoms, nor that V_min permits an ordering of the relative potencies of several carcinogens", but suggest that other factors need to be combined with V_min in eventually establishing a true correlation and that V_min provides a useful basis for a preliminary assessment of the likelihood that the halogenated hydrocarbon epoxide is a carcinogen. This is an exemplary model of cautious appraisal, but casts much doubt on the "ordering of the relative potencies of several carcinogens" which they in fact claim to make. In many cases, epoxides produced by metabolic oxidation of alkenes are believed to be the reactive species responsible for toxic effects arising from electrophilic attack of the epoxide on nucleophilic cellular components. V_min should correlate with the ease of protonation of the epoxide-O atom, such protonation activating the molecule towards ring opening by biological nucleophiles (Nu-):
[Scheme: the O-protonated epoxide is attacked by Nu- at a ring carbon, opening the ring to give HO-C-C-Nu.]
This assumption is an oversimplification since although protonation will undoubtedly facilitate ring opening, it may not be necessary for epoxides with electron-withdrawing substituents. Another important issue is whether the epoxide ring opens by an SN1 or SN2 (as above) pathway. V_min would not distinguish between these two pathways since protonation would facilitate both. Furthermore, if an epoxide contained a substituent group
itself capable of reacting with a biological nucleophile, e.g. the Cl atom in epichlorohydrin which is in the training set, a reaction that does not require protonation could occur. In view of the above criticisms, the predictive value of the QSAR is expected to be low, and great care would be needed in any attempt to extend it to other substituted epoxides.
Biagi et al. (1983), mutagenicity
The QSAR
The authors' aim was to develop a QSAR between the mutagenic activity of a series of 5-nitroimidazoles and their chromatographic behaviour (related to lipophilicity), hydrogen bonding, molar refractivity and redox potential. Twenty substituted 5-nitroimidazoles were used (Fig. 10) with widely varying groups at R1 and R2. As biological activity, the mutagenicity of each substance in the Ames (Salmonella typhimurium TA100) test was expressed as log 1/C, where C = the molar concentration that increased the number of revertants by a factor of 5. The chemical descriptors were: R_M values of the substances from thin-layer chromatography (these correlate with the partition coefficient, i.e. with lipophilicity); a composite term MR2 x HB, where MR2 is the calculated molar refractivity of the R2 groups, and HB is their hydrogen bonding ability expressed as 1 or 0, so that if the chemical has H-bonding capacity (HB = 1) the molar refractivity (a polarizability factor) is taken into account; and the redox potential, Ecp, of the nitro group. The QSAR was established by regression analysis which led to the following equations:
$\log(1/C) = 3.805 + 0.680\,(MR_2 \times HB) + 0.548\,R_M - 0.749\,R_M^2$   (R = 0.926, s = 0.466)   ... (1)
$\log(1/C) = 3.585 + 0.615\,(MR_2 \times HB)$   (r = 0.861, s = 0.593)   ... (2)
$\log(1/C) = 7.293 + 0.057\,E_{cp}$   (r = 0.357, s = 1.088)   ... (3)
General comments There are no comments on the data used.
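As a worked illustration of the statistics quoted for equations (1)-(3), the squared correlation coefficient gives the fraction of the variance in log 1/C accounted for by each equation. The Python sketch below computes it from the reported coefficients and also evaluates equation (2) for a hypothetical R2 group; the function names and the example molar refractivity are assumptions for illustration, not values from Biagi et al.

```python
REPORTED_R = {"(1)": 0.926, "(2)": 0.861, "(3)": 0.357}


def variance_explained(r):
    """Fraction of variance accounted for by a regression with correlation r."""
    return r * r


def log_inv_c_eq2(mr2, hb):
    """Equation (2): log 1/C = 3.585 + 0.615 (MR2 x HB), with HB equal to 0 or 1."""
    if hb not in (0, 1):
        raise ValueError("HB is an indicator variable: 0 or 1")
    return 3.585 + 0.615 * (mr2 * hb)


for label, r in REPORTED_R.items():
    print(f"equation {label}: r^2 = {variance_explained(r):.3f}")

# Hypothetical R2 group with MR2 = 1.0, with and without H-bonding ability.
print(log_inv_c_eq2(1.0, 1), log_inv_c_eq2(1.0, 0))
```

On the reported coefficients, equation (3) accounts for roughly 13% of the variance, against roughly 86% for equation (1). The sketch also makes visible that, whenever HB = 0, equation (2) returns the constant term regardless of the size of the R2 group.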
Fig. 10. General structure of the substituted 5-nitroimidazoles used by Biagi et al. (1983). R1 and R2 were widely varying groups.
Validity, applicability and limitations
The correlation of MR2 x HB with log 1/C in equation (2) was rather modest, but nevertheless said by the authors to show that bulkier R2 groups capable of H-bonding increased the mutagenic activity. Introduction of the R_M and R_M^2 terms gave the rather better correlation as in equation (1), indicating that lipophilicity is important. Although it is well known that nitro-aromatic compounds have to undergo reduction in order to exert mutagenic activity, there was only a very poor correlation (equation 3) between log 1/C and the redox potential of the 20 substances. The authors suggested that all compounds may have been reduced under their experimental conditions so that differences in Ecp played no role in determining the extent of mutagenic action. Although the MR and HB descriptors referred only to the R2 substituents, R1 was not kept constant; in fact there were 11 different R1 groups of very widely differing structure. The authors give no reason for ignoring any possible influence of variations in R1. The authors' use of a fivefold increase in the number of revertants as the biological activity meant that other parts of the dose-response relationship were not taken into account. If some other fixed point for the number of revertants had been used, a different mutagenicity ranking might have resulted. See also the comment in the later analysis of Bandiera et al. (1983), p. 165, on the use of values of 0 or 1 for HB.
Eder et al. (1982a), mutagenicity
The QSAR
The aim in this paper was to develop a qualitative and quantitative SAR between the mutagenic properties of a series of allylic compounds and their alkylating ability. Four groups of variously substituted compounds were considered: five allylic compounds with different leaving groups, i.e. groups involved in the alkylation and eliminated from the molecule; 14 allylic compounds lacking alkylating ability because they possessed no appropriate leaving group; eight structurally related, non-allylic chloro compounds lacking alkylating ability; seven Me- or Cl-substituted allylic compounds plus allyl chloride from the first group. The allylic compounds were of general formula R1CH=CH-CH(R2)X, where X is the potential leaving group. Mutagenic activity was expressed as the number of revertants/µmol in a Salmonella typhimurium assay in the presence or absence of S-9 mix. The chemical descriptor was ΔE (560 nm), i.e. the change in the spectroscopic extinction coefficient of 4-(p-nitrobenzyl)pyridine, at wavelength 560 nm, after a 10-min reaction with the allylic compound. The QSAR was expressed as a graphical plot of ΔE against mutagenic activity.
General comments
The biological basis for choosing alkylating ability
as the chemical descriptor is the assumption that mutagenicity follows from a direct reaction of the allylic compound with nucleophilic sites of cell macromolecules, i.e. that such allylic compounds have genotoxic potential without the need of metabolic activation by the S-9 mix.
Validity, applicability and limitations
All of the chemicals in the second and third groups above were non-mutagenic, corresponding to their lack of alkylating ability. All of those in the first and fourth groups, possessing measurable alkylating activity, were mutagenic. Thus a good qualitative SAR exists for a total of 34 allylic and non-allylic substances. The QSAR for the 12 allylic compounds in the first and fourth groups above is represented in Fig. 11. Seven allylic compounds (×) lay on a straight line representing the relationship between log (mutagenic activity without metabolic activation) and log (ΔE). The three compounds (□) with the highest alkylating activity were much less mutagenic than was predicted from the straight-line relationship, and for these the QSAR is probably best represented by the authors' curve shown as the broken line in Fig. 11, although it could be drawn lower down to represent the three points better. The authors could suggest no clear-cut explanation as to why the two compounds (○) lay off the straight line. Overall, the straight-line portion of the graph represents a QSAR which is valid for the chloro compounds but not for the three allylic compounds with other substituents. Eder et al. (1982b) have given three reasons why compounds of high alkylating ability may not fit such a QSAR: toxic actions other than those leading to mutagenicity (e.g. the destruction of cell walls) may predominate, the compounds may be inactivated by
reaction with nucleophiles in bacterial cells or the suspension medium before reaching DNA or RNA, or they may react with nucleophiles faster than they can diffuse towards them (i.e. the rate of reaction is determined by diffusion rather than reactivity). The authors conclude that the presence of an allyl moiety and a leaving group in the allylic position are prerequisites for mutagenic activity in the types of compounds studied. There is a good qualitative SAR between mutagenicity and alkylating ability. The quantitative SAR is of more limited applicability. From the practical point of view, a quantitative rating by mutagenic potency is not often necessary, and the qualitative SAR could be used simply to predict whether an allylic compound is mutagenic.
Dillingham et al. (1973), LD50 to mice and inhibition of cell growth
The QSAR
The authors' aim was to derive a QSAR between the tissue-culture toxicity or mouse LD50 of a series of methyl- and halogen-substituted aliphatic alcohols and either three simple physico-chemical properties (Hansch approach), or the presence or absence of methyl or halogen substituents (Free-Wilson approach). The following 15 chemicals were used: ethanol, 1-propanol, 2-propanol, 2-methyl-1-propanol, 2,2-dimethyl-1-propanol, 2-butanol, 3-methyl-2-butanol, 3,3-dimethyl-2-butanol, 2-chloroethanol, 2,2-dichloroethanol, 2,2,2-trichloroethanol, 2-bromoethanol, 2-fluoroethanol, 1,1,1-trifluoro-2-propanol and 1-bromo-2-propanol. Three biological activities were studied: the ID50 (mol/kg) for inhibition of the growth of mouse fibroblast cells in tissue culture, the 7-day LD50
Fig. 11. Correlation of mutagenicity and alkylating properties of a series of allylic compounds. [Log-log plot; abscissa: alkylating activity (extinction E in NBP test), 0.1 to 1000.] [After Eder et al. 1982a].
(mol/kg) for male albino mice, and haemolytic activity (not considered further in this report). Standard deviations and 95% confidence limits were given for the ID50 and LD50 values. The chemical descriptors used in the Free-Wilson approach were the presence or absence of Me- or halogen-substituents in the molecule, and in the Hansch approach, the experimentally-determined P, the charge distribution parameters Qc (charge on hydroxyl carbon) and Qo (charge on hydroxyl oxygen), and the Taft steric parameter Es. In the Free-Wilson approach, the standard type of equation as described on p. 145 of this paper (Free-Wilson and related group-contribution techniques, para 1) was used. In the Hansch approach, many equations of the general form
$-\log BA = k(\log P)^2 + k_1 \log P + k_2 Q_X + k_3 (E_s)_X + k_4$
were explored for various combinations of substituted alcohols and some or all of the parameters, where BA = biological activity (ID50 or LD50), P = octanol-water partition coefficient of the alcohol, Q_X = charge distribution parameter for substituent X and (E_s)_X = Taft steric parameter for substituent X.
General comments There are no comments on the data used.
Validity, applicability and limitations
Free-Wilson approach. This yielded a good correlation of structure with the ID50 when the experimental values of 11 of the alcohols were plotted against calculated values, but gave a poor correlation with the LD50. When used to predict the ID50 of 2-butanol, the ID50 correlation gave a value within 18% of the experimentally determined figure. From the data in the paper it seemed that the LOT (see Sources of error p. 147) was approached in the QSAR based on the ID50. The authors stated that the poor LD50 correlation "was not surprising and indicated secondary reaction shown by Hansch analysis to be associated with P, charge parameters and steric parameters".
Hansch approach. When log P was plotted against log ID50 the correlation was good for the Me-substituted, but not for the halogen-substituted alcohols. Several equations were examined in attempts to correlate various combinations of the parameters with ID50 for all alcohols but none was found to be valid. Better correlations with the ID50 were obtained when certain subgroups of alcohols, e.g. all eight non-halogenated alcohols or the four non-halogenated primary alcohols, were taken, but the six subgroups contained only four, four, five, five, six and eight compounds, respectively. In these cases, the number of alcohols was too low to establish the validity of the approach, and the LOT was certainly reached. The LD50 correlations were poor, in general. The causes of death are probably so varied and can embrace such a wide variety of toxicological mechanisms that the mammalian LD50 cannot be correlated in a QSAR of this type.
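The concern about small subgroups can be made concrete by counting residual degrees of freedom, i.e. the number of compounds minus the number of fitted coefficients. The sketch below assumes, purely for illustration, that the full five-coefficient form of the Hansch equation (k, k1, k2, k3 and the constant k4) is fitted to each subgroup; the equations actually explored used some or all of these terms, so the real counts may differ. The function name is hypothetical.

```python
def residual_dof(n_compounds, n_coefficients):
    """Residual degrees of freedom of a least-squares fit."""
    return n_compounds - n_coefficients


SUBGROUP_SIZES = [4, 4, 5, 5, 6, 8]   # subgroup sizes reported in the text
FULL_HANSCH_COEFFICIENTS = 5          # assumed: k, k1, k2, k3 and constant k4

for n in SUBGROUP_SIZES:
    dof = residual_dof(n, FULL_HANSCH_COEFFICIENTS)
    note = "no residual information" if dof <= 0 else "few residual points"
    print(n, dof, note)
```

With zero or negative residual degrees of freedom a fit can reproduce the data whether or not the descriptors are meaningful, so a high correlation coefficient for such a subgroup carries little evidential weight.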
Craig-Enslein technique
A commercial prediction service (HDI, 1982), based upon a statistical QSAR developed by Enslein and Craig (e.g. Enslein & Craig, 1978) is available. The approach can be regarded as an extension of the Free-Wilson technique of ascribing activity to individual chemical groups (fragments or substructures) in combination with regression analysis. The Craig-Enslein statistical procedure can be applied to a continuous variable (e.g. rat oral LD50) by regression, or to a discrete property (e.g. the presence or absence of mutagenicity) by discriminant analysis. In the statistical approach there are many similarities between regression and discriminant analysis. The latter is performed in a 'stepwise' manner to ensure that important descriptors for the QSAR are recognized. The current status of the Craig-Enslein technique is apparent from the brochure of Health Designs Inc. (HDI, 1982) offering the prediction of rat oral LD50, mutagenicity, carcinogenicity, teratogenicity and irritancy. QSARs for the prediction of ecotoxic effects are under development. The examples of rat oral LD50 and teratogenicity are used here to illustrate the approach.
Predictions of rat oral LD50 (Enslein et al. 1983; HDI, 1982)
Chemicals for the 'learning set' or 'testing set' of compounds were drawn from a wide variety of sources in the Registry of Toxic Effects of Chemical Substances (RTECS), which quotes the rat oral LD50 values of 3600 compounds. A random selection of 2000 chemicals was first made and by the elimination of duplicates and 'outliers' the number was reduced to 1851. These were then used to develop the regression equation. The structures used to describe molecular components were developed from the CROSSBOW system (Eakin et al. 1974) and the following regression equation was used:
$\log(1/C) = a_1 p_1 + a_2 p_2 + \dots - a_n \log MW + k$
where C = rat oral LD50 in mol/kg, a_1 etc. = coefficients for substructures, p_1 etc. = 1 if the substructure is present in the molecule and 0 if it is absent, MW = molecular weight and k = a constant. The R^2 term for the regression equation was 0.449, which shows that less than half of the variance (see Explanation of some statistical terms, p. 147) was accounted for by the equation. The predictions are, however, still better than a random allocation of log 1/C, although it is a cause for concern that this situation was achieved only after the elimination of outliers. From the 3600 compounds in RTECS, 567 compounds not in the training set were randomly selected to test the predictive ability of the regression equation. Slightly more than 80% of the predicted values were within a factor of 10 greater or lower than the measured values, i.e. the predicted values of almost one in five compounds differed from the actual LD50 by an order of magnitude or more. Even allowing this limited
degree of success in predicting the LD50, the net result is inadequate for regulatory decision making or considerations of safety. With this technique a preliminary selection of chemicals from a candidate list for LD50 testing could be attempted, but following the selection, the LD50 should be determined experimentally.
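The factor-of-10 criterion used to judge the LD50 predictions can be stated precisely on a logarithmic scale. The following Python sketch is a minimal illustration; the function names and the (predicted, measured) pairs are hypothetical and are not data from the validation set of 567 compounds.

```python
import math


def within_factor_of_10(predicted, measured):
    """True if predicted and measured LD50 differ by no more than tenfold."""
    return abs(math.log10(predicted) - math.log10(measured)) <= 1.0


def fraction_within_factor_of_10(pairs):
    """Share of (predicted, measured) pairs that satisfy the criterion."""
    hits = sum(within_factor_of_10(p, m) for p, m in pairs)
    return hits / len(pairs)


# Hypothetical (predicted, measured) LD50 pairs in mol/kg.
example_pairs = [
    (0.010, 0.020),
    (0.050, 0.004),
    (0.300, 0.100),
    (0.002, 0.030),
    (0.080, 0.060),
]
print(fraction_within_factor_of_10(example_pairs))
```

A criterion this wide spans two orders of magnitude in total (tenfold above to tenfold below the measured value), which is worth bearing in mind when a success rate of about 80% is quoted.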
Predictions of teratogenicity (HDI, 1982)
A discriminant equation was generated from post-1969 teratogenicity data taken from three reviews. Before a chemical was accepted in the training set it had to have been tested in two species of mammals and in two separate laboratories. Initially, 670 compounds were considered. Each was scored for teratogenic potential between 0 and 1 and the scores were reviewed by four teratologists. Chemicals with a score of 0-0.25 were classified as 'non-teratogens', and those scoring between 0.75 and 1.0 were classified as 'teratogens'. Compounds scoring 0.26 to 0.74 were 'indefinite' and were not used to generate the equation. The equation itself was finally developed from 430 chemicals comprising 195 non-teratogens and 235 teratogens. The equation contained 69 terms, i.e. 68 'difference coefficients' (teratogenic character minus non-teratogenic character) of the chemical substructures, and one constant. The teratogenicity of a chemical predicted from the equation was expressed as a probability on a scale of 0 to 1, a value of 0-0.3 leading to classification as a non-teratogen, 0.3-0.7 as indeterminate and 0.7-1 as a teratogen. Unfortunately there were too few compounds of known teratogenicity for an independent set of chemicals to be tested in the discriminant model. When each chemical in the training set was tested individually, 69% were correctly classified, 9% were incorrectly classified (false negatives or false positives) and 22% were indeterminate. The success rate was similar for chemicals known to be teratogenic or non-teratogenic but no statistical indication of confidence for the predictions was given. A further criticism is that there was a bias in the system since some substructural fragments appeared too frequently or infrequently in the training set for their properties to be correctly represented in the equation. The removal of indefinites from the training set may introduce further bias.
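The two banding schemes described above, one for admitting chemicals to the training set and one for interpreting the predicted probability, are easy to confuse. A minimal Python sketch follows; the function names are hypothetical, the indeterminate prediction band is taken as the range lying between the two stated classes (0.3 to 0.7), and the handling of values falling exactly on a limit is an assumption.

```python
def training_label(score):
    """Label from the expert score used when assembling the training set."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score <= 0.25:
        return "non-teratogen"
    if score >= 0.75:
        return "teratogen"
    return "indefinite"        # excluded from the training set


def predicted_label(probability):
    """Label from the probability returned by the discriminant equation."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability <= 0.3:     # limit assumed inclusive
        return "non-teratogen"
    if probability >= 0.7:     # limit assumed inclusive
        return "teratogen"
    return "indeterminate"


# The same numerical value falls in different bands under the two schemes.
print(training_label(0.28), predicted_label(0.28))
```

The example value 0.28 would have been too uncertain for a chemical to enter the training set, yet as a predicted probability it leads to classification as a non-teratogen.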
It is possible, even probable, that predictions for chemicals outside the training set would be less successful. The stepwise discriminant approach, where variables are excluded to improve the data fit, has been criticised by Wold & Dunn (1983) because, in effect, it increases the number of parameters relative to the number of degrees of freedom, and so makes the relationships in the equation insecure.
Comments
An IUPAC Working Party on QSAR has criticized the techniques used by Craig and Enslein to predict toxicity, in particular the LD50 (Rekker, 1980 & 1984). The Working Party view is that a successful QSAR requires that there is a well-defined mode of interaction, a congeneric series of chemical structures and an active site involved in the biological action. The toxicological phenomenon of lethality (LD50) in
mammalian species does not involve a "well-defined mode of action" and the requirement for a specific active site is obviously not met. A very wide range of chemical classes was used by Craig and Enslein, and because the chemicals were not congeneric a large number of descriptors was necessary. Even then, the usual constraint associated with the Free-Wilson technique applies: predictions are restricted to those chemical substructures contained in the learning set. The IUPAC team warns that the Craig-Enslein prediction service is based on a statistical estimate liable to large errors and that it cannot replace appropriate experiments. Although the IUPAC views were directed towards predictions of LDso values, similar criticisms can be levelled at those of teratogenicity and irritancy, and even of mutagenicity and carcinogenicity where there may be fewer modes of interaction. The assumption underlying the Craig-Enslein approach, that a panicular substructure always makes the same contribution to lethality, teratogenicity and so on regardless of the remainder of the molecule, is unlikely to hold for a diverse series of chemicals. The reason that the approach has some capacity for predicting the LDs0 is that the average range of 'chemistry' of the substructures in the training set is similar to that of those in the prediction set. Some of the concerns about this type of approach have been eloquently expressed by Wold & Dunn (1983). 
We note that: the development of new chemicals with novel properties, a prime objective of the chemical industry, is likely to yield compounds whose properties fall outside those used in the training set, and the approach would thus give poorer predictions; the CROSSBOW system and similar compilations comprise lists of substructures predefined for a variety of purposes but not necessarily able to describe structures in terms suitable for developing a QSAR; very large data banks such as RTECS contain toxicological data whose validity is not assessed critically before incorporation.
In a well-conceived study, Adamson et al. (1984) applied the Free-Wilson approach and a statistical approach similar to that of Craig and Enslein to develop a QSAR for rat oral LD50 values of 129 herbicidal trifluoromethylbenzimidazoles. Although the predictive ability of the resulting QSAR was slightly better than that of the above LD50 QSAR developed by Craig and Enslein, it was less than the authors would have expected for a closely congeneric series of chemicals whose LD50 values were determined in a single laboratory. They concluded that LD50 predictions based on a large number of chemicals are unlikely to be more successful than those from their QSAR, and that a greater accuracy of prediction could be expected only for smaller sets of chemicals (presumably with a common mode of action).
We believe that Craig-Enslein procedures for predicting toxicity should not be used for regulatory decisions or in considerations of the safety of chemicals for human use.
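The additivity assumption criticized in these comments, and the restriction of predictions to substructures present in the learning set, can be illustrated with a minimal sketch. The fragment names, contribution values and constant below are entirely hypothetical and are not taken from Craig and Enslein or from any other study; only the additive form of the model reflects the text.

```python
# Hypothetical substructure contributions (not taken from any study).
CONTRIBUTIONS = {"phenyl": 0.40, "chloro": 0.25, "hydroxyl": -0.10}
CONSTANT = 2.00  # hypothetical intercept of the additive model


def predict_activity(fragments: list[str]) -> float:
    """Additive estimate: constant plus one fixed term per fragment.

    Raises KeyError for a fragment that was absent from the training set,
    since no contribution could have been estimated for it.
    """
    total = CONSTANT
    for fragment in fragments:
        total += CONTRIBUTIONS[fragment]  # same value whatever the neighbours
    return total


print(predict_activity(["phenyl", "chloro"]))
try:
    predict_activity(["phenyl", "nitro"])
except KeyError as missing:
    print(f"no contribution available for {missing}")
```

The lookup makes both limitations explicit: each fragment contributes the same amount whatever the rest of the molecule looks like, and a chemical containing an unseen fragment cannot be scored at all.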
Müller & Greff (1984), upper respiratory tract irritancy in the mouse
The QSAR
The aim of this paper was to establish QSARs between certain physico-chemical properties of a number of chemicals and their irritant effect on the upper respiratory tract in mice. Four sets of related chemicals were considered, namely 18 ketones, comprising 14 saturated non-cyclic aliphatic ketones, acetophenone, cyclohexanone, methyl vinyl ketone and mesityl oxide, 13 saturated non-cyclic aliphatic primary and secondary alcohols, with two unsaturated (allyl and crotyl) alcohols, 12 acetates, CH3COOR, where R was primary, secondary or tertiary alkyl in ten cases and -CH2CH2OCH3 and -CH2CH2OC2H5 in the other two cases, and 15 benzene derivatives with various substituents in the ring.
The biological activity was expressed as log 1/FRD50, where FRD50 is the concentration in air (mg/m3) that lowers the respiratory frequency in mice by 50%.
The chemical descriptors were as follows. Log Pow and the boiling point at atmospheric pressure (Teb) were taken for all compounds. For the ketones, the vibrational frequency of the CO group (νCO) was also used. The Hammett electronic constant, σ, and the Taft steric parameter, Es, were used in the case of the R group in the acetates.
The QSAR equations below were established by linear regression.
For the saturated non-cyclic aliphatic ketones only:
$$\log(1/\mathrm{FRD}_{50}) = 0.588 \log P - 4.740 \quad (n = 14,\ r = 0.992) \tag{1}$$
For all ketones:
$$\log(1/\mathrm{FRD}_{50}) = 0.459 \log P - 0.069\,\nu_{CO} + 115.11 \quad (n = 18,\ R = 0.854) \tag{2}$$
For all except the two unsaturated ketones:
$$\log(1/\mathrm{FRD}_{50}) = 0.014\,T_{eb} - 5.700 \quad (n = 16,\ r = 0.987) \tag{3}$$
For saturated alcohols:
$$\log(1/\mathrm{FRD}_{50}) = 0.654 \log P - 4.192 \quad (n = 13,\ r = 0.996) \tag{4}$$
$$\log(1/\mathrm{FRD}_{50}) = 0.017\,T_{eb} - 5.652 \quad (n = 13,\ r = 0.994) \tag{5}$$
For acetates:
$$\log(1/\mathrm{FRD}_{50}) = 0.097 \log P + 1.832\,\sigma + 0.653\,E_s - 2.545 \quad (n = 12,\ R = 0.874) \tag{6}$$
For aromatic compounds:
$$\log(1/\mathrm{FRD}_{50}) = 0.017\,T_{eb} - 6.010 \quad (n = 14,\ r = 0.973) \tag{7}$$
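As an illustration of how such regressions are applied, the sketch below evaluates the two log P relationships (equations 1 and 4) and converts the result back to an FRD50 in mg/m3. The coefficients are those quoted above; the function names and the log P inputs are hypothetical and serve only to show the arithmetic.

```python
# Sketch: evaluate two of the log P regressions quoted above.
# Coefficients come from equations 1 and 4; the log P inputs are
# hypothetical and chosen only to show how the equations are applied.

def log_inv_frd50_ketone(log_p: float) -> float:
    """Equation 1 (saturated non-cyclic aliphatic ketones)."""
    return 0.588 * log_p - 4.740


def log_inv_frd50_alcohol(log_p: float) -> float:
    """Equation 4 (saturated aliphatic alcohols)."""
    return 0.654 * log_p - 4.192


def frd50_mg_per_m3(log_inv_frd50: float) -> float:
    """Invert log10(1/FRD50) to recover FRD50 in mg/m3."""
    return 10.0 ** (-log_inv_frd50)


if __name__ == "__main__":
    for log_p in (0.0, 1.0, 2.0):  # hypothetical log P values
        ketone = log_inv_frd50_ketone(log_p)
        alcohol = log_inv_frd50_alcohol(log_p)
        print(
            f"log P = {log_p:.1f}: "
            f"ketone FRD50 ~ {frd50_mg_per_m3(ketone):.0f} mg/m3, "
            f"alcohol FRD50 ~ {frd50_mg_per_m3(alcohol):.0f} mg/m3"
        )
```

Such predictions are meaningful only for chemicals of the class, and within the log P range, covered by the corresponding training set.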
General comments
The biological activity was given in mg/m3. This is a serious mistake because all types of activity that are related to a number of molecular events should be expressed on a molar basis when used in a QSAR.
Validity, applicability and limitations
Equations 1, 3, 4, 5 and 7 represented good QSARs with correlation coefficients of 0.97 or more.
Ketones. Both log P and Teb (which for the non-cyclic saturated and unsaturated ketones correlate well with each other) give good QSARs (equations 1 and 3, respectively). The QSAR given by Teb seems to be somewhat superior to that given by log P, since equation 3 includes all of the ketones in equation 1 plus cyclohexanone and acetophenone, which do not fit in the latter equation. If irritation of the upper respiratory tract depends upon partition of a chemical between the vapour phase and a biological membrane, it would not be surprising if Teb were a somewhat better descriptor than log P. The QSAR represented by equation 2 (Pow + νCO) was poorer than that of equation 1 (Pow only). This may have been because νCO is a poor descriptor, or because the two unsaturated ketones were included in equation 2 but not equation 1. The irritancy of these may be due not to reaction of the >CO group but to a Michael-type addition of a biological nucleophile to the C=C activated by the α-carbonyl group.
Alcohols. For the 13 saturated aliphatic alcohols, where again log P and Teb correlate well, both descriptors gave a good QSAR (equations 4 and 5).
Acetates. The typical Hansch equation (6) combining log P, σ and Es gave a rather poor correlation, probably because there was very little spread between the values of biological activity of the acetates. When the activity is recalculated on a molar basis as log MW/FRD50, the values of ten acetates lie between -1.4 and -1.8, the remaining two being -2.2 and -2.8 (see also below).
Benzene derivatives. Despite the fact that these contained a variety of substituents (alkyl, halogen, CH3CO- and some other groups), a good QSAR based on Teb (equation 7) was obtained when vinyltoluene was excluded. The authors stated that only Teb correlated well with log 1/FRD50.
It may be significant that the good correlation of log P with irritancy for the ketones and alcohols (equations 1 and 4) parallels findings in aquatic toxicology where a linear relationship between log P and log 1/LC50 for fish has often been found for chemicals with a wide range of structures; see the discussion on the paper by Könemann (1981a) on p. 150 of this paper, for example. The toxicity of 'chemically unreactive' compounds has been attributed to reversible non-specific binding to cell membranes (in effect, to dissolution in the lipid bilayer). A similar rationale may explain the results with upper respiratory tract irritation.
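The general comment that activity should be expressed on a molar basis amounts to a simple shift of each value by log MW. A minimal sketch follows, assuming FRD50 is in mg/m3 so that FRD50/MW is a concentration in mmol/m3; the function name and the two example compounds are hypothetical.

```python
import math


def to_molar_basis(log_inv_frd50_mg: float, mol_weight: float) -> float:
    """Convert log10(1/FRD50), with FRD50 in mg/m3, to log10(MW/FRD50).

    FRD50/MW is the concentration in mmol/m3, so
    log(MW/FRD50) = log(1/FRD50) + log(MW).
    """
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return log_inv_frd50_mg + math.log10(mol_weight)


# Two hypothetical compounds with the same weight-based activity
# but different molecular weights rank differently on a molar basis.
light = to_molar_basis(-3.0, 60.0)
heavy = to_molar_basis(-3.0, 150.0)
print(f"MW 60:  log MW/FRD50 = {light:.2f}")
print(f"MW 150: log MW/FRD50 = {heavy:.2f}")
```

Because the shift differs from compound to compound, a regression fitted to weight-based values will generally not have the same slope, intercept or correlation coefficient as one fitted to molar values, unless all compounds have nearly the same molecular weight.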
Extension of the QSAR
One of the authors of this paper (Roberts, 1986) has found that by expressing the biological activity on a molar rather than weight basis, and modifying the Teb data to take into account differences in the Trouton constant (relating molar heats of vaporization to boiling points), the equations 3, 5 and 7 can be combined into one QSAR based on the modified Teb (written T'eb below):
$$\log(\mathrm{MW}/\mathrm{FRD}_{50}) = 0.0173\,T'_{eb} - 4.090 \quad (n = 42,\ r = 0.987)$$
(MW = molecular weight)
This was valid for the saturated ketones, saturated alcohols and the benzene derivatives.
Lien & Tong (1973), skin absorption
The QSAR
The authors' objective was to correlate certain physico-chemical properties of various classes of chemicals with skin absorption in vitro and in situ and thus to establish quantitative guidelines for predicting skin absorption. The chemicals considered were of widely-varying structure and fell into six groups, consisting of eight phenylboronic acids (group i), seven miscellaneous 'polar non-electrolytes' (EtI, MeOH, EtOH, CO(NH2)2, CS(NH2)2, glycerol and glucose) (group ii), eight alkanols and water (group iii), 14 steroids (group iv), nicotinic acid, its hydrochloride and six esters (group v) and 11 corticosteroids (group vi). The absorption of groups i, iii and iv was measured on human skin in vitro and that of group ii on rabbit skin in vitro. Groups v and vi were applied to human skin in situ.
The biological activity was expressed for in vitro experiments as the molar concentration (C) needed to produce a standard response in a standard time, or as a permeability constant, Kp, which is a concentration-independent indicator of absorption rate. For in situ experiments, the biological activity was expressed as the applied molar concentration (C) giving a standard degree of erythema (group v) or blanching (group vi).
The following chemical descriptors were used:
for group i: log Pow, log Pb (partition coefficient benzene-water).
for group ii: log MW (mol wt), log MR (molar refractivity), log Pow.
for group iii: log Ko (partition coefficient olive oil-water), log Pow, log Km (partition coefficient stratum corneum-water).
for group iv: log Khex (partition coefficient hexadecane-water), log Kac (partition coefficient amyl caproate-water), log Pe (partition coefficient ether-water).
for group v: log Pe, log S (solubility in water).
for group vi: log S, log Pow, σ* (Taft polar constant for the 6-position).
Multiple regression was used to establish QSAR equations based on:
$$\log \mathrm{BA} = -k_1 (\log P)^2 + k_2 \log P + k_3\,\sigma^* + k_4 \log \mathrm{MW}\ (\text{or } \log \mathrm{MR} \text{ or } \log S) + k_5$$
In most of the equations finally developed only one or two of the descriptors were used.
General comments
Within each chemical series the biological data were generated by the same technique in a single laboratory and the results from different series were not combined. The data in each series are therefore comparable. The results of the in situ experiments are less suitable for QSAR development than are those from in vitro techniques because in the former the intrinsic potential to produce erythema and blanching, and the distribution within the skin, are also components of the response used to assess skin absorption.
Validity, applicability and limitations
For each chemical group, at least one equation giving a reasonable fit to the data was found. The authors concluded that the "lipophilic character of the compound, as measured by partition coefficients, plays the most important role in determining percutaneous absorption". In addition, very low solubility in water may be a limiting factor, whilst electronic and steric terms were of minor significance. The most successful equations for each series of compounds are given below.
Group i. The skin absorption of all phenylboronic acids correlated well with log Pow:
$$\log C = 0.573 \log P_{ow} - 3.749 \quad (r = 0.907,\ n = 8)$$
or
$$\log C = 0.212 (\log P_{ow})^2 + 1.333 \log P_{ow} - 3.999 \quad (R = 0.919,\ n = 8)$$
The following QSAR based on log Pb was obtained when one acid was omitted (reason not given):
$$\log C = 0.417 \log P_b - 2.463 \quad (r = 0.954,\ n = 7)$$
Group ii. For these polar non-electrolytes, absorption through the intact skin of the rabbit appeared to be influenced by lipophilicity and a steric property of the molecule, the latter being represented by molar refractivity or molecular weight:
$$\log K_p = 0.360 \log P_{ow} - 0.964 \log \mathrm{MR} - 1.599 \quad (R = 0.963,\ n = 7)$$
$$\log K_p = 0.385 \log P_{ow} - 0.856 \log \mathrm{MW} - 1.151 \quad (R = 0.975,\ n = 7)$$
Low MW and high lipophilicity favour absorption.
Group iii. Any of the partition coefficients of the n-alkanols correlated well with absorption but the physiologically relevant stratum corneum/water partition coefficient (Km) appeared to be best. Two typical equations are:
$$\log K_p = 0.544 \log P_{ow} - 2.884 \quad (r = 0.979,\ n = 8)$$
$$\log K_p = 0.934 \log K_m - 2.891 \quad (r = 0.986,\ n = 8)$$
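The statement that low molecular weight and high lipophilicity favour absorption follows directly from the signs of the coefficients in the Group ii relationship with log MW. The sketch below evaluates that relationship, reading its constant as 1.151; the descriptor values are hypothetical and are not data from Lien & Tong.

```python
import math


def log_kp_group_ii(log_pow: float, mol_weight: float) -> float:
    """Group ii relationship: log Kp = 0.385 log Pow - 0.856 log MW - 1.151."""
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return 0.385 * log_pow - 0.856 * math.log10(mol_weight) - 1.151


# Hypothetical descriptor values, for illustration only.
base = log_kp_group_ii(log_pow=-0.5, mol_weight=60.0)
more_lipophilic = log_kp_group_ii(log_pow=0.5, mol_weight=60.0)
heavier = log_kp_group_ii(log_pow=-0.5, mol_weight=180.0)

print(f"reference compound:      log Kp = {base:.2f}")
print(f"higher log Pow, same MW: log Kp = {more_lipophilic:.2f}")
print(f"same log Pow, higher MW: log Kp = {heavier:.2f}")
```

With only seven compounds and two descriptors, the equation describes a trend within the series rather than a rule that can be extended safely to other chemicals.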
Group iv. With all of the steroids, the best correlations were obtained with the amyl caproate partition coefficients:
$$\log K_p = 1.262 \log K_{ac} - 5.211 \quad (r = 0.933,\ n = 13)$$
An equally good correlation based on Khex was obtained when one steroid was omitted (no reasons given):
$$\log K_p = 2.626 \log K_{hex} - 7.537 \quad (r = 0.931,\ n = 14)$$
With only ten out of the 14 steroids (reason for omissions not given) a correlation with log Pe was obtained:
$$\log K_p = 0.207 (\log P_e)^2 + 1.494 \log P_e - 5.425 \quad (R = 0.978,\ n = 10)$$
Group v. No adequate correlation could be obtained with a single descriptor for nicotinic acid esters. Only by excluding two compounds and using log Pe and the water solubility was a good correlation achieved:
$$\log(1/C) = 1.008 \log P_e + 1.230 \log S + 6.604 \quad (R = 0.967,\ n = 6)$$
The observed erythema is a function of the amount absorbed and the biological response. The authors linked the log Pe term with absorption and the log S term with the production of erythema, although there is no basis for doing so and, in fact, S and Pe are not independent variables.
Group vi. A reasonably good correlation was achieved for all corticosteroids by including water solubility and the ether/water partition coefficient in the equation:
$$\log(1/C) = 2.553 \log P_e + 1.139 \log S + 6.101 \quad (R = 0.924,\ n = 11)$$
The authors again claimed that the log S term is related to the biological response following absorption but not to the absorption itself. By excluding the three acetate derivatives, a relationship with log Pe alone was found:
$$\log(1/C) = 2.166 \log P_e - 2.611 \quad (r = 0.943,\ n = 8)$$
Correlation was improved by including σ*, which (the authors believed) may relate to the drug receptor component of the vasoconstriction response:
$$\log(1/C) = 2.007 \log P_e + 1.831\,\sigma^* + 1.914 \quad (R = 0.985,\ n = 8)$$
Comment. Each correlation was reasonably successful for a specific series but even then only for the special circumstances of absorption from water. The correlations did not relate to the intrinsic skin permeability of the chemicals, and the predictions were of limited use for the following reasons.
The permeability constant, as a concentration-independent index of absorption rate, is proportional both to the diffusion coefficient of the molecule within the stratum corneum and to the partition coefficient between the stratum corneum and the external vehicle, i.e. the solvent system (Dugard, 1983). The stratum corneum is slightly more lipophilic than is water. Thus for an aqueous vehicle and a series of chemicals possessing similar diffusion coefficients, the stratum corneum/water partition coefficient and hence permeability constant will rise with increasing lipophilicity. This is
exactly what happens with the n-alkanols (Scheuplein & Blank, 1973). However, the impression is left that n-octanol is, in general, more readily absorbed through skin than is ethanol, but this is not the case: ethanol is absorbed almost 200 times more rapidly from undiluted liquids than is n-octanol (Scheuplein & Blank, 1973). Octanol absorption is limited by its low solubility in the stratum corneum. It is interesting to speculate that for a vehicle more lipophilic than the stratum corneum, permeability constants for the chemical series considered by Lien & Tong would have fallen with rising log Pow and other partition coefficients (but not log K~). Absorption through skin is a passive process which obeys simple physico-chemical rules and should therefore be amenable to QSAR treatment. For success, the diffusion coefficient (probably dependent upon steric, electronic and hydrogen-bonding factors) and the solubility of molecules in the stratum corneum have to be taken into account.
Roberts & Williams (1982) have described quantitative correlations between the skin-sensitizing action, the dose and certain physico-chemical parameters of alkyl-substituted sultones. The sensitization responses to various challenge doses of a number of saturated and unsaturated sultones were plotted against their 'relative alkylation indices' (RAI, a composite term comprising the dose, the partition coefficient and the second-order rate constant for alkylation of n-butylamine). The resulting curves revealed the existence of a sensitization region, a plateau and a region of tolerance, according to the magnitude of the RAI. This interesting work in a little studied area is at an early stage of development, but has been extended by Roberts et al. (1983), who successfully correlated RAI with the sensitization scores of substituted p-nitrobenzyl compounds tested at identical induction doses.
Bandiera et al. (1983), binding to cytosolic receptor protein
The QSAR
The aim in this paper was to correlate the hydrophobic, electronic and hydrogen-bonding constants of 4'-substituted 2,3,4,5-tetrachlorobiphenyls with their ability to bind reversibly with a cytosolic receptor protein. The structures of the 18 compounds studied are given in Fig. 12.
Biological activity: the polychlorinated biphenyls are believed to exert their various toxic actions by an initial, obligatory step comprising reversible binding to a cytoplasmic receptor protein. Thus, relative binding affinity should correlate closely with toxic potency. The relative binding affinity was expressed as an EC50, i.e. the effective molar concentration of a non-labelled competing chemical that reduced the binding of 3H-labelled 2,3,7,8-tetrachlorodibenzo-p-dioxin to rat hepatic cytosol to 50% of the maximum value obtained in the absence of the competitor.
Fig. 12. The 4'-substituted 2,3,4,5-tetrachlorobiphenyls studied by Bandiera et al. (1983). R = F, Cl, Br, I, H, Me, Et, isoPr, nBu, Ph, CF3, CN, OH, OMe, COMe, NO2 or NHCOMe.
The following chemical descriptors were used for each substituent, R: π, the hydrophobic parameter based on log Pow; σ, the Hammett electronic constant; HB, which is 1 or 0 for hydrogen-bonding and non-hydrogen-bonding substituents, respectively.
QSAR equations were established by multiple regression analysis:
$$\log(1/\mathrm{EC}_{50}) = 1.39\,\sigma + 1.31\,\pi + 1.2\,\mathrm{HB} + 4.20 \quad (n = 15,\ R = 0.916 \text{ and } s = 0.31) \tag{1}$$
$$\log(1/\mathrm{EC}_{50}) = 1.53\,\sigma + 1.47\,\pi + 1.09\,\mathrm{HB} + 4.68 \quad (n = 13,\ R = 0.978 \text{ and } s = 0.18) \tag{2}$$
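Since only R and s were quoted for these two equations, it may help to see how the overall F ratio follows from R, the number of compounds n and the number of descriptors k. The sketch below uses the standard relationship for a least-squares regression with an intercept; it is an illustration under that assumption, not a calculation reported by Bandiera et al., and the function name is hypothetical.

```python
def f_ratio(r_multiple: float, n: int, k: int) -> float:
    """Overall F statistic for a regression with k descriptors and n compounds.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with (k, n - k - 1) DOF.
    Assumes an ordinary least-squares fit that includes an intercept.
    """
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("need more compounds than fitted parameters")
    if not 0.0 <= r_multiple < 1.0:
        raise ValueError("R must lie in [0, 1)")
    r2 = r_multiple ** 2
    return (r2 / k) / ((1.0 - r2) / dof)


# n and R as quoted for equations 1 and 2; k = 3 descriptors (sigma, pi, HB).
for label, r, n in (("equation 1", 0.916, 15), ("equation 2", 0.978, 13)):
    print(f"{label}: F(3, {n - 4}) = {f_ratio(r, n, 3):.1f}")
```

The improvement from equation 1 to equation 2 was obtained by removing compounds, so the two F values refer to different training sets and should not be read as a like-for-like comparison of descriptors.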
General comments
There are no comments on the data used.
Validity, applicability and limitations
Equation 1 was reasonably valid (R = 0.916) for 15 of the compounds studied. The nBu, tertBu and Ph compounds did not fit the equation, their log 1/EC50 values being much higher than those predicted. The authors noted that the van der Waals volumes of these three substituents were among the highest of all 18, but that, nevertheless, the use of various types of steric parameters (not specified) failed to produce good correlations for the series of compounds. The NHCOMe compound gave the poorest fit in equation 1 but no explanation for this was given. The NO2 derivative was also an outlier when HB was given the value of 0, and the situation was worsened by putting HB equal to 1. The authors suggested that the receptor binding of the NO2 derivative is related to the planarity of the NO2 group with respect to the benzene ring (it can lie in or out of plane). This explanation is not convincing, since other groups can also be planar or non-planar (e.g. isoPr, nBu, COMe and NHCOMe). Equation 2 was derived by eliminating the NHCOMe and NO2 compounds, and gave a better correlation (R = 0.978) than did equation 1. Both equations 1 and 2 indicate that increasing electron-withdrawing capacity, hydrophobicity (lipophilicity) and H-bonding ability favour receptor binding. The use of only two discrete values (0 or 1) for HB seems difficult to justify theoretically. If hydrogen bonding is significant, the strength of bonding should play a role (i.e. HB should be varied between 0 and 1). For most substituents a good QSAR (equation 2) was developed, but the existence of a number of outliers, with no clear understanding of why they did not fit, limits the applicability of the QSAR.
DISCUSSION
The status and usefulness of QSARs can be assessed from the purely academic point of view or from the standpoint of their application to resolving practical problems in toxicology. In this chapter emphasis is on the latter.
The role of QSARs in toxicology
QSARs may be envisaged as having actual or potential use in improving our understanding of the mode of biological action of chemicals and in predicting the toxic action of a chemical or series of chemicals. The latter use has four practical applications: in product development, to aid the choice of which of a series of chemicals should be developed, when it is important to know possible toxic effects; in the choice of which chemicals in a series have the highest priority for toxicity studies; in the choice of which toxicity studies are most likely to be necessary on a given chemical; in the safety clearance of chemicals in a legislatory context.
We suggest that, in addition, well-founded QSARs could be used instead of range-finding studies to set concentration ranges in certain acute toxicity tests on aquatic species, although no example of such a use was found in the literature.
A QSAR is valuable when it provides indications about the mode of action of a chemical, but it might still be of interest even when it does not. The use of QSARs for predicting toxicity as in the first three applications listed above may, at worst, lead to a waste of scarce resources should the predictions be wrong. Their use for safety clearance, on the other hand, could have dangerous consequences for the health of humans or environmental species if the predictions were in error.
How well have QSARs fulfilled their role?
Understanding the mode of biological action of chemicals
Certain QSARs have, indeed, proved useful in improving our understanding of the mechanisms by which chemicals interact with biological systems. This is especially the case when chemical descriptors relevant to one or more of the key steps in the biological process are chosen. In general, techniques that are based on physico-chemical properties as descriptors yield more information on mechanism than do those based on substructure descriptors, although if the latter are geometrical descriptors or can be associated with a specific chemical/biological action they may also be useful in this respect. The consideration of why some chemicals fit a QSAR while others, sometimes structurally related, do not, can confirm a theory about the biological mode of action or suggest a need to modify it. Although it is often stated that pattern-recognition techniques can throw no light on mechanisms, there seems to be no inherent reason why this should be so. The finding that the use of certain substructures leads to a valid QSAR, while use of others does not, seems to have implications regarding mechanism.
The prediction of toxicity
There is no doubt that some QSARs valid for predicting toxicity have been developed, although in most cases their applicability is severely limited. Among the most successful are those of the Könemann (1981a, b) type (see p. 150) relating the LC50 of certain classes of 'non-reactive' chemicals in a number of aquatic species to their hydrophobicity as expressed by partition coefficients. Such chemicals are believed to exert their effect by a common biological mechanism (narcosis) and the descriptor is relevant to it. The use of Pow, for which data on many chemicals are available, enables a large number of chemicals to be incorporated in the training set, and the assumed mechanism gives some basis for deciding to which chemicals (non-reactive) the QSAR should be applicable. QSARs for chemicals which cannot be classed as 'non-reactive' are under study, and the paper of Hermens et al. (1985a; see p. 150) tentatively indicates that alkylating ability may correlate with LC50 to fish. In other areas of toxicology there is no example of a QSAR that has been as extensively validated and used as have those of the Könemann type.
At this early stage in the development of QSARs, it would be unrealistic to expect that they have been widely applied to resolving practical problems. Albert (1983) has reported that the EPA use the Könemann QSAR for predicting LC50 values to fish. To the best of our knowledge there is no published information on the practical outcome of these or any other predictions, and thus no possibility of assessing their correctness.
Strengths and weaknesses of QSARs
A QSAR should be used for a practical purpose only if it is credible and has been adequately validated, so that it is likely to yield correct predictions most of the time. One of the main purposes of the critical analyses of QSAR publications in the previous section was to identify factors leading to the development of a successful QSAR and weaknesses that led to the conclusion that a QSAR was of low validity or applicability.
Factors leading to a successful QSAR
It would be expected that there is more chance of finding a valid QSAR when the biological activity under consideration involves a single simple mechanism and a well-defined endpoint and when the compounds studied are closely related structurally. The papers reviewed seem to provide little support for these expectations; see, for example, the papers by Könemann (1981a) and by Müller & Greff (1984) discussed on pp. 150 and 163 of this paper. At most, valid QSARs are probably more difficult to establish for the results of in vivo or whole-animal experiments because of the many and varied steps involved in the biological action (e.g. distribution, metabolism and protein binding of the chemicals under study). In particular, in vivo studies on mammals using single or repeated doses contrast with toxicity studies on aquatic species where exposure is continuous and the metabolism is, in general, slower (Franklin et al. 1980). Obviously, another major factor necessary for a valid QSAR is the correct choice of chemical descriptor(s).
Common weaknesses that lower the validity of a QSAR
In most QSAR publications the authors claim a marked or modest success in developing a valid relationship. There are fewer papers in which the failure to find a QSAR is reported. Therefore, in this part of the report are described some common weaknesses that cast doubts on the validity of some of the "successful" QSARs.
Weaknesses related to descriptors used. These are of four types. Several cases were noted in which the biological data were taken from various papers and were probably not comparable because they had been determined by experimental methods in which important parameters (e.g. species, route of exposure) were not constant. This is especially significant for certain biological/toxicological measurements, such as those of LC50 and LD50 values, which are notoriously variable. A related source of error is the use of data banks in which toxicological data from the literature have been uncritically incorporated. The compilation will contain both accurate and inaccurate information, the latter being derived, for example, from outmoded experimental techniques now known to be at fault. It is always possible, of course, that by taking data on a large enough number of chemicals, the error is averaged out in the QSAR as a whole, but it is also possible that the error is not evenly distributed and has introduced bias into the QSAR. Structural descriptors for use in QSARs have sometimes been taken from compilations derived for other purposes, such as the CROSSBOW system developed for chemical documentation. Such compilations may lack some or all of the substructural groups that relate to the biological activity under consideration and may therefore be inappropriate for a QSAR. Some QSARs are based on the use of substructural elements and rely on the assumption that a particular structural fragment always makes the same contribution to a biological action irrespective of the remainder of the molecule. As a proposition in toxicology this cannot be sustained for chemicals whose structures are not closely related, and it is by no means always true of chemicals whose structures are.
Weaknesses related to the improper use of techniques in deriving a QSAR. Most authors using regression techniques quote values of R or r and s for the derived equations but do not always use them correctly in assessing the validity of the QSAR. The F ratio, often essential for this, and especially for comparing the validity of QSARs based on different chemical descriptors, is rarely given. In at least one paper (not analysed in this report) one descriptor was said to yield a better QSAR than another on the basis of the differences between the following R values of four regression equations: 0.978, 0.984, 0.986 and 0.987. It is highly doubtful whether these differences are significant, and indeed whether R values should be quoted to the third decimal figure. There are some examples of "successful" QSARs in which the ratio of the number of chemical descriptors to the number of chemicals in the training set seems to indicate that the Wold & Dunn (1983) LOT (see Sources of error, p. 147) has been reached or surpassed. When, as is not uncommon, a QSAR is derived by omitting outliers from the training set by trial and error, the credibility of the QSAR is weakened unless good reasons can be given as to why the outliers do not fit.
Limitations on the applicability of the QSAR. A QSAR can be of academic interest or it may be considered for use in one of the practical applications noted above. There are, however, numerous features in the QSARs analysed by the Task Force that limit or nullify their usefulness in practice. Some QSARs have been based on as few as four to six chemicals, and their wider applicability must therefore be taken as unproven.
Some biological descriptors are of dubious practical value or may be misleading. This is especially true in several carcinogenicity QSARs. In one paper (Politzer & Laurence, 1984, this issue, p. 157) claims are made that chemicals can be classified according to quite subtle differences in carcinogenic potency, a concept that most toxicologists would not accept in the present state of knowledge (ECETOC, 1982). In another publication (Jones & Mackrodt, 1982 & 1983; this issue p. 156) the occurrence of preneoplastic foci deficient in nucleoside-5'-triphosphatase was taken as equivalent to oncogenicity, whereas such foci by no means always progress to neoplasms. This endpoint is thus unsuitable for a QSAR meant to predict oncogenicity. In general, the mechanism of carcinogenicity is so complex that the choice, as biological descriptor, of an endpoint that represents only one stage in the whole process of the development of the final neoplasm should be very strictly justified on the basis of a good knowledge of the mechanism.
When it is claimed that a QSAR can predict whether a chemical is a carcinogen or not, the significance of the precise biological endpoint used as the descriptor must be carefully evaluated. The QSAR described by Klopman (1984; this issue p. 155) rates benzene as a non-carcinogen: correctly in the light of the endpoints used (the development of tumours after subcutaneous injection or topical application in mice) but incorrectly if the QSAR were used for a more generic purpose. The same comment applies to the nitrosamines QSAR in the same study, in which nitrosamines causing up to 49% of experimental animals to die after up to 100 weeks of exposure are classed as "inactive".
Perhaps the most common weakness in applying QSARs between chemicals of closely-related structure is in defining the term 'closely-related'. Many such QSARs have been derived only after eliminating from the training set some chemicals of 'closely-related structure' that did not fit. Thus when a QSAR is used to predict the toxicity of a 'closely-related' chemical not in the training set, a number of questions immediately arise. For example, does the chemical belong to the class of those that fit or to the class of outliers, and on what basis can this be decided? Moreover, although reasons (often speculative) may have been given to explain why the known outliers do not fit, there may well be other, unrecognized, reasons why the QSAR is not applicable.
These questions can be adequately answered only when there is a well-established understanding of the mode of action of the biological effect under consideration. In the absence of this, the QSAR-based prediction is, at best, the expression of a probability.

There seems little value in using descriptors that have to be calculated, at great expense in computer time, by very sophisticated molecular orbital techniques, particularly when the biological endpoint cannot be quantified with great precision. The papers by Jones & Mackrodt (1982 & 1983) and by Politzer & Laurence (1984) provide good examples of this type of over-elaborate and costly calculation. They should be contrasted with the approach used by von Szentpaly (1984; this issue p. 153) in which the molecular orbital descriptors, although somewhat complex, can be calculated by hand. In general, there is nothing to be gained by calculating descriptors to a precision much greater than that with which the biological endpoint can be determined.

It is often suggested that a major use for QSARs is to predict the toxicity of chemicals for which little or no toxicity information is available. In practice, this information is often wanted so that adequate control can be established, or regulatory actions can be taken, to ensure safety in production and use. For this purpose a toxicity profile of the chemical is needed so that its key toxic effects can be identified. Each QSAR, of course, yields a prediction of only one specific biological endpoint and it is highly unlikely that, for a given chemical, enough applicable QSARs will be found to cover the desired profile. It is unrealistic to expect that QSARs can reach this goal. While it may be that QSARs that predict one or two biological endpoints are found, it cannot be known whether these are the important ones. This is a severe limitation on the applicability of QSARs for predicting toxicity profiles, and is likely to remain so in the foreseeable future. 
In our opinion there is as yet no QSAR of sufficient applicability to replace experimental testing for control or regulatory purposes.

General comment. The development of QSARs calls for an exceptionally wide range of expertise (in toxicology, biological and chemical reaction mechanisms and statistics) in most cases, and for quite sophisticated techniques in pattern recognition and similar methods in others. Some of the papers published on QSARs contain fundamental errors or deficiencies, a situation that could be avoided in future by ensuring that manuscripts are refereed by experts who, between them, cover all the necessary areas of expertise.

CONCLUSIONS AND RECOMMENDATIONS

(1) The development of QSARs in toxicology and ecotoxicology is still the subject of considerable research and, at present, the possibilities of applying QSAR techniques to practical problems are limited. It is recommended that research into the development of QSARs is continued and extended.
(2) QSARs have given useful indications about the mode of biological action of chemicals. It is recommended that, in future work, chemical descriptors that are as relevant as possible to the mechanism of biological action should be chosen. In the absence of such descriptors, new ones should be developed.
(3) It is not uncommon to find QSARs in the literature that are of doubtful validity because, although they are of a type that could have been assessed statistically, this was not adequately done. In future, such an assessment should always be made.
(4) A QSAR should be considered to be of low validity if during its development chemicals that do not fit are eliminated without an understanding of the reason(s) why they fail to fit. In this situation, the applicability of the QSAR to further chemicals is doubtful.
(5) Even with a QSAR of undoubted validity for the chemicals used in developing it, and for those that may subsequently have been used to extend the validation, its applicability to further chemicals is difficult to define. The main problem is to decide which further chemicals are 'closely-related' to those used originally, although a knowledge of the mechanism of biological action may make this definition easier.
(6) QSARs between the hydrophobic parameters of 'non-reactive' chemicals and their LC50 to certain aquatic species have been particularly successful and are being used in practice. Well-founded QSARs of this type could be used instead of range-finding tests for setting concentration ranges in certain acute toxicity tests on aquatic species. They could also be used for grouping chemicals with a common mode of action in aquatic species to assist in calculating the (additive) toxicity of mixtures of chemicals that are in the same group.
(7) QSARs between the partition coefficients of chemicals and their bioconcentration factors in aquatic species have been widely used to predict these factors for practical purposes.
(8) Where a specific toxic effect is of interest or concern in choosing chemicals for product development, certain QSARs may be useful for ranking the candidate chemicals. QSARs may also be useful in choosing from a series of chemicals those that should have priority for toxicity studies, and/or for deciding which toxic endpoint should be studied.
(9) The use of a QSAR for any of the above-mentioned practical purposes should be preceded by a careful and critical assessment of its validity and applicability.
(10) It is recommended that QSARs are not used in isolation for making decisions that affect human health or the health of other species.
(11) QSAR papers submitted for publication in a scientific journal should be reviewed by referees whose joint expertise covers the necessary fields.

Appendix I. Papers examined but not analysed
Bandiera S., Safe S. & Okey A. B. (1982). Binding of polychlorinated biphenyls classified as either phenobarbitone-, 3-methylcholanthrene- or mixed-type inducers to cytosolic Ah receptor. Chemico-Biol. Interactions 39, 259-277.
Bobra A. M., Shiu W. Y. & Mackay D. (1983). A predictive correlation for the acute toxicity of hydrocarbons and chlorinated hydrocarbons to the water flea (Daphnia magna). Chemosphere 12, 1121-1129.
Calamari D., Galassi S., Setti F. & Vighi M. (1983). Toxicity of selected chlorobenzenes to aquatic organisms. Chemosphere 12, 253-262.
Cullen J. M. & Kaiser K. L. E. (1984). An examination of the role of rotational barriers in the toxicology of PCB's. In QSAR in Environmental Toxicology. Edited by K. L. E. Kaiser, pp. 39-66. D. Reidel Publishing Co.
Durkin P. R. (1978). Biological impact of various chlorinated phenolics and related compounds on Daphnia magna. TAPPI Envir. Conf. Proc. 165-169.
Enslein K. & Craig P. N. (1978). A toxicity estimation model. J. envir. Path. Toxic. 2, 115-121.
Enslein K., Lander T. R., Tomb M. E. & Landis W. G. (1983). Mutagenicity (Ames): a structure-activity model. Teratogen. Carcinogen. Mutagen. 3, 503-513.
Filov V. A. & Ivin B. A. (1985). QSAR: carcinogenic effect of xenobiotics. Proc. Symp. "QSAR in Toxicology and Xenobiochemistry". Prague, 12-14 September, 1984.
Henry D. R., Jurs P. C. & Denny W. A. (1982). Structure-antitumour activity relationships of 9-anilinoacridines using pattern recognition. J. medl Chem. 25, 899-908.
Hermens J., Leeuwangh P. & Musch A. (1984). Quantitative structure-activity relationships and mixture toxicity studies of chloro- and alkyl-anilines at an acute lethal toxicity level to the guppy (Poecilia reticulata). Ecotox. envir. Saf. 8, 388-394.
Höfer-Bosse T. & Kroker R. (1985). In QSAR in Toxicology and Xenobiochemistry. Edited by M. Tichy. Elsevier Pharmacology Library. Vol. 8, pp. 65-72.
Jurs P. C., Chou J. T. & Yuan M. (1979). Computer-assisted structure-activity studies of chemical carcinogens. A heterogeneous data set. J. medl Chem. 22, 476-483.
Lipnick R. L., Pritzker C. S. & Bentley D. L. (1984). A QSAR study of the LD50 for alcohols. Paper presented at 5th European Symposium on Chemical Structure-Biological Activity, Bad Segeberg, FRG, 17-21 September.
Liu D. & Thomson K. (1983). Toxicity assessment of chlorobenzenes using bacteria. Bull. envir. Contam. Toxicol. 31, 105-111.
Liu D., Thomson K. 
& Kaiser K. L. E. (1982). Quantitative structure-toxicity relationship of halogenated phenols on bacteria. Bull. envir. Contam. Toxicol. 29, 130-136.
McKinney J. D. (1984). PCB and dioxin binding to cytosol receptors: a theoretical model based on molecular parameters. Quant. Struct.-Act. Relat. 3, 99-105.
McKinney J. D., Gottschalk K. E. & Pedersen L. (1983). The polarizability of planar aromatic systems. An application to polychlorinated biphenyls (PCB's), dioxins and polyaromatic hydrocarbons. J. Molec. Struct. 105, 427-438.
McLeese D. W., Zitko V. & Peterson M. R. (1979). Structure-lethality relationships for phenols, anilines and other aromatic compounds in shrimp and clams. Chemosphere 8, 53-57.
Mager P. P. (1982). Structure-neurotoxicity relationships applied to organophosphorus pesticides. Toxicology Lett. 11, 67-71.
Niculescu-Duvaz I., Craescu T., Tugulea M., Croisy A. & Jacquignon P. C. (1981). A quantitative structure-activity analysis of the mutagenic and carcinogenic action of 43 structurally related heterocyclic compounds. Carcinogenesis 4, 269-275.
Nordén B., Edlund U. & Wold S. (1978). Carcinogenicity of polycyclic aromatic hydrocarbons studied by SIMCA pattern recognition. Acta chem. scand. 32, 602-608.
Schultz T. W. & Cajina-Quezada M. (1982). Structure-toxicity relationships of selected nitrogenous heterocyclic compounds. II. Dinitrogen molecules. Archs envir. Contam. Toxicol. 11, 353-361.
Schultz T. W., Kier L. B. & Hall L. H. (1982). Structure-toxicity relationships of selected nitrogenous
heterocyclic compounds. III. Relations using molecular connectivity. Bull. envir. Contam. Toxicol. 28, 373-378.
Veith G. D., Call D. J. & Brooke L. T. (1983). Structure-toxicity relationships for the fathead minnow, Pimephales promelas: narcotic industrial chemicals. Can. J. Fish Aquat. Sci. 40, 743-748.
Weinstein H., Namboodiri K., Osman R., Liebman M. N. & Rabinowitz J. (1985). Use of molecular reactivity criteria to predict toxicity of xenobiotics. In QSAR in Toxicology and Xenobiochemistry. Proceedings of a Symposium in Prague, 12-14 September 1984. Edited by M. Tichy. Elsevier, Amsterdam.
Zitko V., McLeese D. W., Carson W. G. & Welch H. E. (1976). Toxicity of alkyldinitrophenols to some aquatic organisms. Bull. envir. Contam. Toxicol. 16, 508-515.

REFERENCES
Adamson G. W., Bawden D. & Saggers D. T. (1984). QSAR studies of acute toxicity (LD50) in a large series of herbicidal benzimidazoles. Pestic. Sci. 15, 31-39.
Albert A. (1983). SAR Program Evaluation Report, D-50717-1A, for ICAIR Life Systems Inc., Cleveland OH, 10 January.
Bandiera S., Sawyer T. W., Campbell M. A., Fujita T. & Safe S. (1983). Competitive binding to the cytosolic 2,3,7,8-tetrachlorodibenzo-p-dioxin receptor. Biochem. Pharmac. 32, 3803-3813.
Biagi G. L., Barbaro A. M., Guerra M. C., Cantelli-Forti G., Aicardi G. & Borea P. A. (1983). Quantitative relationship between structure and mutagenic activity in a series of 5-nitro-imidazoles. Teratogen. Carcinogen. Mutagen. 3, 429-438.
Bobra A., Shiu W. Y. & Mackay D. (1985). Quantitative structure-activity relationships for the acute toxicity of chlorobenzenes to Daphnia magna. Envir. Toxicol. Chem. 4, 297-305.
Broderius S. & Kahl M. (1985). Acute toxicity of organic chemical mixtures to the fathead minnow. Aquat. Toxicol. 6, 307-322.
Calamari D., Galassi S., Setti F. & Vighi M. (1983). Toxicity of selected chlorobenzenes to aquatic organisms. Chemosphere 12, 253-262.
Call D. J., Brooke L. T., Knuth M. L., Poirier S. H. & Hoglund M. D. (1985). Fish subchronic toxicity prediction model for industrial chemicals that produce narcosis. Envir. Toxicol. Chem. 4, 335-341.
Dillingham E. O., Mast R. W., Bass G. E. & Autian J. (1973). Toxicity of methyl- and halogenated-substituted alcohols in tissue culture relative to structure-activity models and acute toxicity in mice. J. Pharm. Sci. 62, 22-30.
Dipple A. (1976). ACS Monograph, 173, 245.
Dugard P. H. (1983). Skin permeability theory in relation to measurements of percutaneous absorption in toxicology. In Dermatotoxicology. Edited by F. N. Marzulli & H. I. Maibach. 2nd Ed., pp. 91-116. Hemisphere Press, Washington, DC.
Dunkelberg H. (1979). On the oncogenic activity of ethylene oxide and propylene oxide in mice. Br. J. Cancer 39, 588.
Dunn W. J. & Wold S. (1978). 
A structure-carcinogenicity study of 4-nitroquinoline-1-oxides using SIMCA method of pattern recognition. J. medl Chem. 21, 1001-1007.
Eakin D. L., Hyde E. & Parker G. (1974). The CROSSBOW system. Pestic. Sci. 5, 319-326.
ECETOC (1982). Risk Assessment of Occupational Chemical Carcinogens, Monograph No. 3, p. 20.
Edelmann A. S., Kraft P. L., Rand W. M. & Wishnok J. S. (1980). Nitrosamine carcinogenicity: quantitative relationship between molecular structure and organ selectivity for a series of acyclic N-nitroso compounds. Chemico-Biol. Interactions 31, 81-92.
Eder E., Henschler D. & Neudecker T. (1982a). Mutagenic
properties of allylic and α,β-unsaturated compounds. Xenobiotica 12, 831-848.
Eder E., Neudecker T., Lutz D. & Henschler D. (1982b). Correlation of alkylating and mutagenic activities of allyl and allylic compounds: standard alkylation test vs. kinetic investigation. Chemico-Biol. Interactions 38, 303-315.
Enslein K. & Craig P. N. (1978). A toxicity estimation model. J. envir. Path. Toxicol. 2, 115-132 (Suppl.).
Enslein K., Lander T. R., Tomb M. E. & Craig P. N. (1983). A predictive model for estimating rat oral LD50 values. In Benchmark Papers in Toxicology. Princeton Scientific Publishers Inc., Princeton, NJ.
Franklin R. B. et al. (1980). Comparative aspects of disposition and metabolism of xenobiotics in fish and mammals. Fedn Proc. Fedn Am. Socs exp. Biol. 39, 3144-3149.
Free S. M. & Wilson J. W. (1964). A mathematical contribution to structure-activity studies. J. medl Chem. 7, 395-399.
Fujita T., Iwasa J. & Hansch C. (1964). A new substituent constant, pi, derived from partition coefficients. J. Am. Chem. Soc. 86, 5175-5180.
Golberg L. (Editor) (1983). Structure-Activity Correlation as a Predictive Tool in Toxicology. Hemisphere, Washington, DC.
Greim H., Bonse G., Radwan Z., Reichert D. & Henschler D. (1975). Mutagenicity in vitro and potential carcinogenicity of chlorinated ethylenes as a function of metabolic oxirane formation. Biochem. Pharmac. 24, 2013-2017.
Hammett L. P. (1970). Physical Organic Chemistry. McGraw-Hill, New York.
Hansch C. & Fujita T. (1964). ρ-σ-π analysis. A method for the correlation of biological activity and chemical structure. J. Am. Chem. Soc. 86, 1616-1626.
Hansch C. & Leo A. (1979). Substituent Constants for Correlation Analysis in Chemistry and Biology. John Wiley & Sons, New York.
HDI (1982). The HDI Toxicity Predictive Service. Health Designs Inc., Rochester, New York.
Hermens J., Busser F., Leeuwangh P. & Musch A. (1985a). 
Quantitative correlation studies between acute lethal toxicity of 15 organic halides to the guppy and chemical reactivity towards 4-nitrobenzylpyridine. Toxic. envir. Chem. 9, 219-236.
Hermens J., Canton J. H., Janssen P. & de Jong R. (1984). QSARs and mixture toxicity studies of chemicals with anaesthetic potency: acute lethal and sub-lethal toxicity to Daphnia magna. Aquat. Toxicol. 5, 143-154.
Hermens J., Könemann H., Leeuwangh P. & Musch A. (1985b). QSARs in aquatic toxicity studies of chemicals and complex mixtures of chemicals. Envir. Toxicol. Chem. 4, 273-279.
Jones R. B. & Mackrodt W. C. (1982). Structure-mutagenicity relationships for chlorinated ethylenes: a model based on stability of metabolically derived epoxides. Biochem. Pharmac. 31, 3710-3713.
Jones R. B. & Mackrodt W. C. (1983). Structure-genotoxicity relationship for aliphatic oxides. Biochem. Pharmac. 32, 2359-2362.
Jurs P. C., Ham C. L. & Breugger W. E. (1981). Computer-assisted studies of chemical structures and olfactory quality using pattern recognition techniques. ACS Symp. Ser. 148, 143.
Klopman G. (1984). Artificial intelligence approach to structure-activity studies. J. Am. Chem. Soc. 106, 7315-7321.
Könemann H. (1981a). QSARs in fish toxicity studies. Part 1: Relationship for 50 industrial pollutants. Toxicology 19, 209-221.
Könemann H. (1981b). Fish toxicity tests with mixtures of more than two chemicals: a proposal for a quantitative approach and experimental results. Toxicology 19, 229-238.
Könemann H. & Musch A. (1981). QSAR in fish toxicity studies. Part 2: The influence of pH on the QSAR of chlorophenols. Toxicology 19, 223-228.
Kowalski B. R. & Bender C. F. (1973). Pattern recognition II: linear and non-linear methods for displaying chemical data. J. Am. Chem. Soc. 95, 686.
Lien E. J. & Tong G. L. (1973). Physicochemical properties and percutaneous absorption of drugs. J. Soc. Cosmet. Chem. 24, 371-384.
Lipnick R. L. & Dunn W. J. (1983). A MLAB study of aquatic structure-toxicity relationships. In Quantitative Approaches to Drug Design. Edited by J. C. Dearden. pp. 265-266. Elsevier, Amsterdam.
Mackay D., Bobra A., Shiu W. Y. & Yalkowsky S. H. (1980). Relationships between aqueous solubility and octanol-water partition coefficient. Chemosphere 9, 701-711.
Muller J. & Greff G. (1984). Recherche de relations entre toxicité de molécules d'intérêt industriel et propriétés physico-chimiques: test d'irritation des voies aériennes supérieures appliqué à quatre familles chimiques. Fd Chem. Toxic. 22, 661-664.
NIOSH (1981). Current Intelligence Bull. no. 35: Ethylene oxide, evidence of carcinogenicity.
Nordén B., Edlund U. & Wold S. (1978). Carcinogenicity of PAHs studied by SIMCA pattern recognition. Acta chem. scand. B 32, 602.
Politzer P. & Laurence P. R. (1984). Relationships between electrostatic potential, epoxide hydratase inhibition and carcinogenicity for some hydrocarbon and halogenated hydrocarbon epoxides. Carcinogenesis 5, 845-848.
Purcell W. P., Bass G. E. & Clayton J. M. (1973). Strategy of Drug Design: a Guide to Biological Activity. Wiley Interscience, New York.
Rekker R. F. (1977). The Hydrophobic Fragmental Constant. Elsevier, Amsterdam.
Rekker R. F. (1980). LD50 values: are they about to become predictable? TIPS October, 383-384.
Rekker R. F. (1984). 5th European QSAR Conf., Bad Segeberg, FRG, September.
Roberts D. W. (1986). QSAR for upper respiratory tract irritation. Chemico-Biol. 
Interactions 57, 325-345.
Roberts D. W., Goodwin B. F. J., Williams D. L., Jones K., Johnson A. W. & Alderson J. C. E. (1983). Correlations between skin sensitization potential and chemical reactivity for p-nitrobenzyl compounds. Fd Chem. Toxic. 21, 811-813.
Roberts D. W. & Williams D. L. (1982). Derivation of quantitative correlations between skin sensitization and physico-chemical parameters for alkylating agents, and their application to experimental data for sultones. J. theor. Biol. 99, 807-825.
Saarikoski J. & Viluksela M. (1982). Relation between physico-chemical properties of phenols and their toxicity and accumulation in fish. Ecotoxic. envir. Saf. 6, 501-512.
Scheuplein R. J. & Blank I. H. (1973). Mechanism of percutaneous absorption. IV: penetration of nonelectrolytes (alcohols) from aqueous solutions and from pure liquids. J. invest. Derm. 60, 286-296.
Slooff W., Canton J. H. & Hermens J. (1983). Comparison of the susceptibility of 22 fresh water-species to 15 chemical compounds I: (sub)-acute toxicity tests. Aquat. Toxicol. 4, 113-128.
Stolzberg L. J. & Wilkins C. L. (1977). Molecular transforms: a potential tool for structure-activity studies. J. Am. Chem. Soc. 99, 439.
Stuper A. J. & Jurs P. C. (1976). ADAPT: a computer system for Automated Data Analysis using Pattern Recognition Techniques. J. Chem. Inf. Comput. Sci. 16, 99.
Taft R. W. (1956). The separation of polar, steric and resonance effects in rates of normal ester hydrolysis. In Steric Effects in Organic Chemistry. Edited by M. S. Newman. John Wiley & Sons, New York.
Toman M. & Stota Z. (1959). Über die Toxizität des Benzols und seiner chlorsubstitutions-derivate gegenüber Fischen. Biológia, Bratisl. 14, 674-679.
Veith G. D., Call D. J. & Brooke L. T. (1983). Structure-toxicity relationships for the fathead minnow: narcotic industrial chemicals. Can. J. Fish Aquat. Sci. 40, 743-748.
Veith G. D., DeFoe D. L. & Bergstedt B. V. (1979). Measuring and estimating the bioconcentration factor for chemicals in fish. J. Fish. Res. Bd Can. 36, 1040-1048.
von Szentpaly L. V. (1984). Carcinogenesis by PAHs: a multilinear regression on new type PMO indices. J. Am. Chem. Soc. 106, 6021-6028.
Walpole A. L. (1958). Carcinogenic action of alkylating agents. Ann. 
N.Y. Acad. Sci. 68, 750-761.
Wold S. & Dunn W. J. (1983). Multivariate QSARs: conditions for their applicability. J. Chem. Inf. Comput. Sci. 23, 6-13.
Yuta K. & Jurs P. C. (1981). Computer-assisted structure-activity studies of chemical carcinogens: Aromatic amines. J. medl Chem. 24, 241-251.