Application of chemometrics to the screening of hazardous chemicals. A case study *


Original Research Paper

Chemometrics and Intelligent Laboratory Systems, 16 (1992) 155-167
Elsevier Science Publishers B.V., Amsterdam

Maria Livia Tosato
Structure Activity Research Group, Istituto Superiore di Sanità, Viale Regina Elena 299, Roma (Italy)

Rossano Piazza and Claudio Chiorboli
Centro di Fotochimica del CNR, Università di Ferrara, Via Borsari 46, Ferrara (Italy)

Laura Passerini and Anna Pino
Structure Activity Research Group, Istituto Superiore di Sanità, Viale Regina Elena 299, Roma (Italy)

Gabriele Cruciani and Sergio Clementi
Chemometrics Laboratory, Dipartimento di Chimica, Università di Perugia, Via Elce di Sotto 10, Perugia (Italy)

(Received 12 September 1991; accepted 20 February 1992)

Abstract

Tosato, M.L., Piazza, R., Chiorboli, C., Passerini, L., Pino, A., Cruciani, G. and Clementi, S., 1992. Application of chemometrics to the screening of hazardous chemicals. A case study. Chemometrics and Intelligent Laboratory Systems, 16: 155-167.

The key aspects of a priority setting process for risk assessment of chemicals are reviewed. The process is based on the assumption that chemical and biological properties can be modelled within series of closely similar compounds. It rests heavily on multivariate methodologies for the development of predictive models. Statistical design, in particular, is essential for optimizing the selection of the most informative objects and variables to be used in the calibration of the models. A case study is presented where the process is applied to all (EEC-wide) commercially available halo-methanes, -ethanes and -propanes with the objective of modelling their tropospheric reactivity and, hence, estimating their lifetimes. A training set of just nine, properly informative haloalkanes was first selected from a series of 142; their reactivity was expressed in terms of the rate constant of their reaction with

Correspondence to: Dr. M.L. Tosato, Structure Activity Research Group, Istituto Superiore di Sanità, Viale Regina Elena 299, Roma, Italy and to Dr. C. Chiorboli, Centro di Fotochimica del CNR, Università di Ferrara, Via Borsari 46, Ferrara, Italy.

* Paper presented at the 2nd Scandinavian Symposium on Chemometrics (SSC2), Bergen, Norway, 28-31 May 1991.

0169-7439/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved


the OH radical, k(OH); their structure was parametrized by a selected set of variables that, by using a new statistical method, were found to optimize the k(OH) model. The experimentally validated model provided k(OH) estimates, and hence allowed the formulation of a persistency ranking for the non-tested compounds in the series, which also included several hydro-chlorofluorocarbons, H-CFCs, presently considered as possible replacement chemicals for the extremely stable, environmentally incompatible ozone-depleting CFCs.

INTRODUCTION

Background: need for priorities

Under the pressure posed by emerging socioeconomic issues, chemical risk assessment is a rapidly developing, multifaceted discipline. Its far from trivial goal is to achieve some understanding, and provide some kind of measure, of how and how much a chemical may be capable of damaging humans and/or the environment. This entails the collection, development, and combination, for any potentially hazardous chemical, of information regarding its likely exposure levels and its intrinsic capability to harm the biotic and abiotic targets at risk. Among the countless and complex problems to be addressed in this area, a crucial one at the present time derives from the fact that what is currently known about thousands of chemicals in use is, in most cases, just their chemical formula. Therefore, considering how high the costs of testing are, we need to address the non-trivial question of ‘where to start from’. In order to find appropriate answers, it is essential to define some kind of strategy for systematically filling the gaps in at least basic data, and to use such data for screening chemicals and setting priorities for risk assessment studies.

A priority setting framework

As a contribution in this area, a stepwise strategy has been formulated [1-5] which provides a general framework for a transparent, systematic, and cost-effective screening of toxic substances. The approach strives to incorporate all available information about chemicals (their structures first of all), as well as state-of-the-art chemometric methods for sound modelling with minimal testing effort. It was suggested that, as a first step, inventories listing chemicals in use should be scrutinized in order to subdivide chemicals by classes or, more pragmatically, to sort all chemicals belonging to already recognized hazardous classes (e.g., aromatic amines, halo-aliphatics, nitro-aromatics, etc.), or to any class of specific interest. This will provide complete series of chemicals for further processing, series by series, through the subsequent steps 2-6. Briefly: following the parametrization of the structures (step 2), a training set for the series is identified by a statistical design (step 3), is then tested (step 4), and used for calibrating multivariate quantitative structure-activity relationship (QSAR) models (step 5). Finally, the predictive ability and the range of the models are assessed (step 6). Validated models will permit us to estimate missing data for the compounds that belong within the range of the model, and to formulate ‘hazard’ rankings, thus providing the basis for identifying priorities for additional studies. So much for the framework. Validated models will, however, also provide several other items of information, including, for example, model parameters highlighting the importance of each descriptor with respect to the investigated endpoints and some aspects of the reaction mechanisms.

Present study

In the following, the first case study is reported. This starts from scratch and goes through each of the above steps. The study addresses, and provides reasonable solutions to, problems faced in the real world. In particular, the problem deriving from the absence, in practice, of an adequate and consistent set of data for the parametrization of the structures is given a new, possible solution.


The objective of the study was to formulate a persistency ranking for a series of volatile haloalkanes for which the persistency data are essential for hazard assessment. The series includes already recognized carcinogens, mutagens, neurotoxicants, etc. It also includes chlorofluoroalkanes, CFCs, and hydrochlorofluoroalkanes, H-CFCs. The former, owing to their extreme stability, are considered to act as stratospheric ozone depleters and, therefore, are to be phased out of industrial production and use. The latter, owing to their lower stability, are expected to include ‘environmentally compatible’ replacement compounds. In view of the objective mentioned above, it was considered that the most important degradation pathway for these chemicals is their gas-phase reaction with the naturally occurring hydroxyl radical (OH•), the rate-determining step of which is the hydrogen abstraction

$$\mathrm{C}_n\mathrm{H}_m\mathrm{X}_p + \mathrm{OH}^{\bullet} \xrightarrow{\;k\;} \mathrm{H_2O} + \mathrm{C}_n\mathrm{H}_{m-1}\mathrm{X}_p^{\bullet}$$

(X = halogen) followed by a rapid decay of the organic radical into degradation products, down to mineralization [6]. The information of interest for formulating a persistency ranking of haloalkanes is therefore their reaction rate constant (k), hereinafter k(OH). Accordingly, the study focused on the construction of a k(OH) model.

EXPERIMENTAL

Chemicals

All the halogenated alkanes that are commercially available in the EEC countries were initially sorted from the European Inventory of Existing Chemical Substances, EINECS. Their subdivision on the basis of the number of carbons and the halogenation pattern is shown in Table 1. For a number of practical reasons (especially the limited availability of adequate experimental data for the rate constant to be modelled) the compounds considered for the present investigation were the halo-methanes, -ethanes, and -propanes which, after exclusion of eighteen iodinated compounds, numbered 142 (see Appendix).

TABLE 1
The ensemble of industrial haloalkanes

C-chain     No. of       No. of compounds halogenated with
length      compounds    Br     Cl     F      I      X*
C1           34           4      4      4      4     18
C2           63           8     10      8      4     33
C3           63           6     16      2      1     38
C4           63          18     21      3      6     15
C5           38          13     10      3      5      7
C6           26           7      6      2      1     10
C7           22           7      4      2      2      6
C8           23           6      5      3      3      6
C9            8           2      1      1      2      2
C10-C20      62          16     12      9      7     18
Total       402          87     89     38     35    153

* With two or more different halogens.

Structural and reactivity data

Structural data

Fourteen variables were initially considered for the structural parametrization of the compounds. These were physicochemical properties (namely, molecular weight, boiling and melting points, density, refractive index, Van der Waals volume, octanol-water partition coefficient, and ionization potential) as well as indicator variables such as the counts of each type of atom (carbons, chlorines, fluorines, and hydrogens) and

of all the halogen atoms. This set of descriptors had, in fact, already been used in previous studies with small series of haloalkanes [7,8]. However, so many variables were not available for many of the compounds investigated in the present study. It was therefore necessary to select smaller sets of variables, yet ones which are capable of providing an adequate multivariate description of the compounds. Two different approaches were taken, according to the two different purposes of the parametrization.
(i) For the purpose of carrying out a statistical design, a subset of five variables was selected on the basis of pragmatic considerations. These variables were: the number of carbons, fluorines, chlorines, bromines and hydrogens.
(ii) For the purpose of constructing a k(OH) model, a subset of six was selected by a newly developed statistical method (see below). It turns out that these were the same as those above, plus the total number of halogens.

Reactivity data

Experimental values for the rate constants under tropospheric conditions were available for 33 compounds. A small number had been measured in our laboratory according to an agreed standard method, and have already been reported [7]. The majority of the k(OH) values have been taken from a review paper [6] that lists all the values reported in the literature up to 1986. The review, in some cases, provides ‘recommended’, and in some other cases, indicates ‘non-recommended’ k(OH) data. Prior to data analysis, the rate constants, expressed in cm³ molecule⁻¹ s⁻¹, were transformed into the corresponding decimal logarithms.
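The two count-based parametrizations in (i) and (ii) can be derived directly from a condensed formula. The following is a minimal sketch, assuming formulas are written as in the Appendix (e.g. `CF3-CHCl-CH2Cl`); the function names `count_atoms` and `descriptors` are hypothetical and are not taken from the paper.

```python
import re

HALOGENS = ("F", "Cl", "Br")

def count_atoms(formula):
    """Count C, H, F, Cl and Br in a condensed formula such as 'CF3-CHCl-CH2Cl'."""
    counts = {"C": 0, "H": 0, "F": 0, "Cl": 0, "Br": 0}
    # Two-letter symbols (Cl, Br) are listed first so that 'Cl' is not read as 'C'.
    for symbol, digits in re.findall(r"(Cl|Br|C|H|F)(\d*)", formula):
        counts[symbol] += int(digits) if digits else 1
    return counts

def descriptors(formula):
    """Return (design variables, modelling variables) for one compound.

    Design variables:    numbers of C, F, Cl, Br, H           (five, case i)
    Modelling variables: the same five plus total halogens X  (six, case ii)
    """
    c = count_atoms(formula)
    design = [c["C"], c["F"], c["Cl"], c["Br"], c["H"]]
    total_halogens = sum(c[x] for x in HALOGENS)
    return design, design + [total_halogens]

if __name__ == "__main__":
    for name in ("CH3Br", "CHFCl2", "CF3-CHCl-CH2Cl"):
        print(name, descriptors(name))
```

Because only atom counts are used, isomers such as CF3-CHCl-CH2Cl and CF3-CH2-CHCl2 receive identical descriptors.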

STATISTICAL METHODS

Principal component analysis, PCA [9], and partial least squares in latent variables, PLS [10,11], were the multivariate data analytic methods used for deriving design variables for the statistical selection of the training set, and for constructing the k(OH) model, respectively.

A two-level (+/-) fractional factorial design, FFD [12], was used for selecting a training set out of 142 compounds on the basis of four PCA-derived design variables.

A modified PLS version was used for computing the standard deviation of error of prediction, SDEP [13], and to check the model dimensionality with the highest predictive capacity. The method is based on the leave-N-out concept. The data set is subdivided into N groups, and N PLS models are run so that each group is excluded once and used for predictions for each model dimension; using such predictions the corresponding standard deviations are computed and stored. The process is repeated several times, each time following a random permutation of the N groups. The final SDEP values (obtained by averaging the stored results) allow the identification of the PLS component with the highest predictive capacity as the one with the lowest SDEP value.

A newly developed method [14] has been used for the selection of the modelling variables that optimize the predictive capacity of the model. In regard to our present application, the process (a)-(d) described below was applied to the training compounds. Briefly: (a) the compounds were parametrized by the complete set of fourteen variables (see Experimental Section); (b) a number of subsets of variables (including one to fourteen variables) were selected by a statistical design (here FFD); (c) the SDEP analysis was carried out with each of these subsets; and (d) the optimal subset of variables was identified as the one providing the k(OH) model with the lowest SDEP value. (The resulting optimal subset was reported in the previous section.)

RESULTS AND DISCUSSION

Selection of a training set

Parametrization of structures

As has already been mentioned, in the absence of a rich and consistent set of descriptors for each of them, our 142 compounds were parametrized by simply counting their carbons, fluorines, chlorines, bromines and hydrogens. In fact, for the identification of a representative training set, it was considered to be sufficient to span a few structural variables that are relevant to the problem being investigated. In regard to the present application, it is well known that the atmospheric degradation of haloalkanes is strongly influenced by factors such as the carbon chain length, the number of hydrogens, and the number and type of halogens.

Design variables

In order to further reduce the dimensionality of the problem and, hence, to keep the size of the training set as small as possible, a PCA of the data table (142 objects × 5 variables) was carried out. The analysis contracted the five variables to four PCs, all together explaining (as expected) practically all of, and each one explaining equivalent fractions of, the variance in the data. The first component received the highest contribution from the number of hydrogens and carbons, the second from the number of chlorines and bromines, the third from the number of fluorines, and the fourth from all variables to practically the same extent. The score plots (see, for example, Figs. 1 and 2) indicated a rather homogeneous distribution of the investigated compounds in the latent variables space. The four PCs therefore seemed adequate variables to be spanned in the selection of a training set for the whole ensemble of the investigated compounds.

Statistical design

Application of the two-level FFD in four design variables provided eight ‘experiments’ (here compounds). They are listed in Table 2 with their

sign combinations and rate constants. In order to ensure a better mapping of the structural space, the training set was expanded to include one additional, centre-point compound. As may be observed from Table 2, the set well reflects the structural composition of the series, as it includes compounds with 1-3 carbon atoms and varied halogenation patterns. It also spans the range of variation of the rate constants rather well, as, in fact, the log k(OH) values for our haloalkanes range from about -15 to about -11. However, it should also be noted that some borderline compounds (see score plots) may not be well represented by the identified training compounds. A better selection was not easy to work out. The present one resulted, in fact, from a compromise between two constraints: an adequate location of compounds according to the FFD rules, and the need for high quality experimental data for the training of a QSAR model. In our application the selection was therefore driven towards compounds for which ‘recommended’ k(OH) data were available [6].

Fig. 1. Score plot from the PCA showing the second against the first principal component. FFD selected training compounds are in brackets.

TABLE 2
FFD design points and training set

No.    Compound               log k(OH)
16     CH3Br                  -13.41
10     CHFCl2                 -13.52
87     CH2Br-CHBr-CH2Cl       -12.36
30     CHCl2-CHCl2            -12.60
35     CH2Br-CH2Br            -12.60
71     CF3-CHF2               -14.60
127    CH3-CHCl-CH3           -11.80
78     CF3-CH2Cl              -13.79
31     CHCl2-CH2Cl            -12.48

Sign combination (+/-, four design variables): not recoverable from the source for the eight FFD points; 0 0 0 0 for the centre point (No. 31).

Fig. 2. Score plot from the PCA showing the fourth against the third principal component. FFD selected training compounds are in brackets.
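The design-based selection described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the generator D = ABC for the half-fraction, the scaling of each PC by its largest absolute score, and the nearest-object rule in `pick_training_set` are all illustrative choices (the paper does not state them, and its actual selection also weighed the availability of ‘recommended’ k(OH) data). The function names are hypothetical.

```python
from itertools import product

def ffd_2_4_1():
    """Eight runs of a two-level 2^(4-1) fractional factorial design.

    Columns A, B, C form a full factorial; the fourth column is generated
    as D = A*B*C (an assumed generator).
    """
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

def pick_training_set(scores, design):
    """Pick one object per design point, plus one centre-point object.

    scores: dict mapping compound number -> tuple of four PC scores.
    Each PC is scaled by its largest absolute score so that the design
    corners (+/-1) and the scores are on a comparable scale.
    """
    ids = list(scores)
    n_pc = len(design[0])
    span = [max(abs(scores[i][k]) for i in ids) or 1.0 for k in range(n_pc)]
    scaled = {i: [scores[i][k] / span[k] for k in range(n_pc)] for i in ids}
    chosen = []
    for target in list(design) + [(0,) * n_pc]:   # eight corners, then centre
        best = min(
            (i for i in ids if i not in chosen),
            key=lambda i: sum((s - t) ** 2 for s, t in zip(scaled[i], target)),
        )
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    for run in ffd_2_4_1():
        print(" ".join("+" if v > 0 else "-" for v in run))
```

Each column of the design contains four plus and four minus signs, so every design variable is spanned at both levels with only eight compounds instead of the sixteen a full factorial would need.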

Model development

Modelling variables

By using the selection method described above, the modelling variables that maximize the predictive ability of the model turned out to be just six out of the fourteen originally considered, namely: the number of each of the five atoms in the series and the sum of halogens. This was not very surprising as a substantial collinearity among several of the fourteen descriptors (particularly among the physicochemical properties) had in fact already been noted [8]. This finding had a positive practical impact on our application. It showed, in fact, that it was feasible to construct a k(OH) model by simply using descriptors drawn from the chemical structure: i.e., a model that will be applicable to all the compounds considered, as well as to any similar one, existing or not yet synthesized. In more general terms, this finding showed that the use of several descriptors, no matter how collinear they are, does not necessarily improve the model.

PLS analysis

The PLS method was applied to relate the log k(OH) to the above mentioned structural data. The results of the analysis are summarized in Table 3 and in the loading plot in Fig. 3. A model with two highly significant dimensions, according to cross-validation [15], could explain as much as 98% of the variability of k(OH). According to the first and most important dimension, k(OH) is

lowered by an increasing number of fluorines and number of halogens, whereas it is increased by increasing the number of carbons, hydrogens, and bromines. According to the second dimension, the number of chlorines has a positive correlation, whereas the number of fluorines and bromines are negatively correlated to k(OH). The fit of the training compounds to the model was quite good, since R² and the standard deviation, SD, were 0.98 and 0.11, respectively. This result was promising. However, the reliability of the model had to be checked experimentally. Particular attention was, possibly, to be paid to those structural areas where a somewhat high residual standard deviation was noted for a number of compounds. These were the fully halogenated bromo-fluoro-propanes (Nos. 104, 120, and 132, which can be seen in the score plots at the very borders of the X range of the series), pentabromoethane (No. 75) and the group of fully halogenated chloro-fluoro-propanes.

Fig. 3. Loading plot with vector 2 against vector 1. The symbols C, H, Br, Cl, and F indicate the number of carbons, hydrogens, bromines, chlorines and fluorines, respectively; X indicates the total number of halogens.

TABLE 3
PLS model parameters

dim      Variance explained (%)
         X block    Y
1        23         89
2        16          9
Total    39         98

Model validation

External validation

Experimental values of k(OH) were available for 24 compounds that were not included in the training set. The comparison between experimental and predicted rate constants is shown in the correlation plot in Fig. 4. A group of four outliers can be noted whose predicted log k(OH) is, in all cases, about one order of magnitude higher than the actual one. An interpretation of this finding, already noted in previous studies [7,8], may be attempted, if we consider that the outliers are all those haloalkanes in our series that are characterized by the highly asymmetric halogenation pattern CH3-CX3 (with X = halogen). Their low actual rate constant may be caused by an unusually high C-H bond energy [16], which is likely to

reduce the frequency of their active collisions with the OH radicals. Hence, the failure to correctly estimate their rate constants may depend on the absence of bond energy descriptors in the parametrization. This point needs additional investigation. However, for the time being, this well-identified type of structures is outside the range of application of the model. The predictive ability of the model was therefore evaluated on the remaining 20 compounds for which Q² (= R² calculated on the validation set compounds) is 0.88 and the corresponding SD is 0.22. No experimental k(OH) values were available for the groups of haloalkanes with a somewhat large residual standard deviation. Their predicted k(OH) values should therefore be regarded with caution. A warning is also necessary in respect of the predictions for the fully halogenated alkanes in general, since no experimental k(OH) was available for comparison, other than a ‘non-recommended’ one [6].

Fig. 4. Correlation plot of experimental against calculated/predicted log k(OH) values. The training set compounds are in brackets.

SDEP analysis

The PLS model dimensionality and its predictive ability were further checked by applying the SDEP analysis (see Methods Section). All the

compounds with experimental k(OH) data, except the outliers, were included in this analysis. The results are summarized in Fig. 5, which shows that the ‘best’ model dimension is the second one, where the SDEP parameter reaches its minimum value (0.21).

Fig. 5. Variation of the SDEP parameter with the number of PLS components.
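The leave-N-out SDEP procedure outlined in the Methods Section can be sketched as below. This is an illustrative reimplementation, not the authors' program: the PLS step is a plain single-response NIPALS algorithm on mean-centred, unscaled counts, and the number of groups, number of repetitions, and random seed are assumptions. The data rows are the 29 compounds with experimental log k(OH) (training and validation, outliers excluded), with atom counts read off the formulas in Table A1. Since scaling and grouping may differ from those used in the paper, the numbers printed need not match Fig. 5.

```python
import math
import random

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns what is needed to predict."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # X residuals
    f = [v - y_mean for v in y]                                 # y residuals
    comps = []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:            # nothing left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x, n_comp):
    """Predictions for one object using 1, 2, ..., n_comp components."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat, out = y_mean, []
    for a in range(n_comp):
        if a < len(comps):
            w, load, q = comps[a]
            t = sum(e[j] * w[j] for j in range(len(e)))
            y_hat += q * t
            e = [e[j] - t * load[j] for j in range(len(e))]
        out.append(y_hat)
    return out

def sdep(X, y, n_groups, n_comp, n_repeats=20, seed=0):
    """Average leave-N-out standard deviation of error of prediction per dimension."""
    rng = random.Random(seed)
    n = len(y)
    totals = [0.0] * n_comp
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)                      # new random grouping each time
        press = [0.0] * n_comp
        for g in range(n_groups):
            test = order[g::n_groups]
            held = set(test)
            train = [i for i in range(n) if i not in held]
            model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_comp)
            for i in test:
                for a, y_hat in enumerate(pls1_predict(model, X[i], n_comp)):
                    press[a] += (y[i] - y_hat) ** 2
        for a in range(n_comp):
            totals[a] += math.sqrt(press[a] / n)
    return [v / n_repeats for v in totals]

# (C, H, F, Cl, Br, observed log k(OH)); counts derived from the Table A1 formulas.
DATA = [
    (1, 1, 1, 2, 0, -13.52), (1, 3, 0, 0, 1, -13.41), (2, 2, 0, 4, 0, -12.60),
    (2, 3, 0, 3, 0, -12.48), (2, 4, 0, 0, 2, -12.60), (2, 1, 5, 0, 0, -14.60),
    (2, 2, 3, 1, 0, -13.79), (3, 5, 0, 1, 2, -12.36), (3, 7, 0, 1, 0, -11.80),
    (1, 2, 0, 2, 0, -12.85), (1, 3, 0, 1, 0, -13.36), (1, 1, 0, 3, 0, -12.99),
    (1, 1, 2, 1, 0, -14.33), (1, 2, 1, 1, 0, -13.36), (1, 3, 1, 0, 0, -13.77),
    (1, 2, 2, 0, 0, -13.96), (2, 4, 0, 2, 0, -12.66), (2, 5, 0, 1, 0, -12.40),
    (2, 4, 0, 2, 0, -12.59), (2, 5, 0, 0, 1, -12.46), (2, 4, 2, 0, 0, -13.47),
    (2, 1, 3, 2, 0, -13.47), (2, 4, 0, 0, 2, -12.60), (2, 5, 1, 0, 0, -12.64),
    (2, 2, 4, 0, 0, -14.07), (2, 1, 4, 1, 0, -13.99), (2, 4, 2, 0, 0, -12.96),
    (2, 3, 3, 0, 0, -13.74), (2, 2, 4, 0, 0, -14.28),
]
# Six modelling variables: the five counts plus the total number of halogens.
X = [[c, h, f, cl, br, f + cl + br] for c, h, f, cl, br, _ in DATA]
Y = [row[5] for row in DATA]

for dim, value in enumerate(sdep(X, Y, n_groups=5, n_comp=3), start=1):
    print(f"PLS components: {dim}  SDEP: {value:.3f}")
```

The model dimension to retain is the one at which the printed SDEP is lowest; repeating the random grouping several times makes that choice less dependent on any single split.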

Fig. 6. Correlation plot of the log k(OH) values predicted by Model I and Model II, respectively, for fourteen haloalkanes.

These findings perfectly match those previously obtained in the traditional PLS analysis and in the external validation process, respectively. It is worth noting that the ‘small’ statistically designed training set is capable of giving results that are as good as those from the much larger one used for the SDEP analysis. Moreover, examples have been reported where statistically designed training sets give a much better model than very large, non-designed ones [17].

Comparison of k(OH) estimates from different models

The predictive ability of the described k(OH) model based on six indicator variables (call this Model I) was compared with that of a previously developed model [8] where the training set (slightly different from the present one) was parametrized by the whole set of 14 variables (call this Model II). The statistical parameters R² and Q² were the same or practically the same (0.98 and 0.88 for Model I; 0.97 and 0.90 for Model II, respectively). Furthermore, the k(OH) values predicted by Models I and II for the non-tested compounds were closely similar, as is shown in Fig. 6. The plot shows the fourteen haloalkanes for which predictions from both Model I and Model II were available. It is therefore confirmed that the small set of indicator variables is at least as adequate for k(OH) modelling as the much larger set which includes physicochemical variables: variables which are most often not available.

Persistency ranking

The data analysis and the validation process permitted a good assessment of the range of the model and its reliability. A persistency ranking can therefore be formulated with confidence for the relevant haloalkanes. This can be done by using, for instance, their half-times, i.e., $\tau = (k(\mathrm{OH}) \times [\mathrm{OH}^{\bullet}])^{-1}$, and assuming an adequate value for the OH radical concentration in the troposphere (e.g. 5 × 10⁵ molecules/cm³ is an agreed global diurnal mean). As an example, Table 4 provides the persistency ranking of the 28 H-CFCs in our series, i.e., the type of hydrogenated CFCs that are candidate replacement chemicals for the fully halogenated CFCs. As can be seen, none is short-lived. However, their half-times range from well below one year (for a few compounds) up to around thirty years (for two of them). Whereas the former are likely to undergo degradation before reaching the stratosphere, the latter are quite unlikely to degrade and may therefore well have an impact on the ozone layer.

TABLE 4
Ranking of hydro-CFCs on the basis of their tropospheric half-life

No.    Compound              τ* (days)
71     CF3-CHF2              9534
101    CF3-CHCl-CF3          7236
81     CHClF-CF3             3831
86     CHF2-CHF2             3387
80     CH2F-CF3              3387
105    CF2Cl-CF2-CH2F        2570
106    CHFCl-CF2-CHF2        2570
9      CHF2Cl                2029
29     CH2F2                 1793
52     CF3-CHCl2             1540
55     CF2Cl-CHFCl           1540
78     CF3-CH2Cl             1361
85     CH2F-CHF2             1203
97     CCl3-CF2-CHF2         1168
110    CHCl2-CF2-CHF2        1033
115    CHF2-CCl2-CHF2        1033
116    CHF2-CHCl-CHF2         913
73     CHFCl-CHFCl            547
79     CH2Cl-CClF2            547
83     CH2F-CH2F              427
96     CF3-CHCl-CH2Cl         367
102    CF3-CH2-CHCl2          367
89     CF3-CH2-CH2Cl          324
140    CF3-CH2-CH3            287
107    CF2Cl-CH2-CH2Cl        130
112    CF2Cl-CHCl-CH3         130
98     CHF2-CHCl-CH3          115
114    CFCl2-CH2-CH3           46

* Assuming an OH radical concentration of 5 × 10⁵ molecules cm⁻³.
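The conversion from a predicted log k(OH) to a tropospheric half-time follows directly from the expression for τ given above. A minimal sketch, assuming k in cm³ molecule⁻¹ s⁻¹, the OH concentration of 5 × 10⁵ molecules cm⁻³ quoted in the text, and two-decimal predicted log k(OH) values taken from Table A1; the function name `lifetime_days` is hypothetical.

```python
SECONDS_PER_DAY = 86400.0
OH_CONCENTRATION = 5e5   # molecules cm^-3, global diurnal mean quoted in the text

def lifetime_days(log_k, oh=OH_CONCENTRATION):
    """tau = 1 / (k(OH) * [OH]) converted from seconds to days."""
    k = 10.0 ** log_k                      # cm^3 molecule^-1 s^-1
    return 1.0 / (k * oh) / SECONDS_PER_DAY

# Predicted log k(OH) for three H-CFCs, as listed in Table A1.
predicted = {
    "CHClF-CF3": -14.22,
    "CF3-CH2-CH3": -13.09,
    "CHF2-CHCl-CH3": -12.70,
}

# Most persistent compound first.
ranking = sorted(predicted, key=lambda name: lifetime_days(predicted[name]), reverse=True)
for name in ranking:
    print(f"{name:15s} {lifetime_days(predicted[name]):8.0f} days")
```

Because the tabulated log k(OH) values are rounded to two decimals, the half-times obtained this way agree with Table 4 only to within about one per cent.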

CONCLUSIONS

The study has shown the remarkable effectiveness of a chemometrics-based strategy for dealing


with the screening of chemicals and priority setting. In particular, statistical designs offer unique tools for optimizing the selection of the smallest possible number of highly informative compounds. Statistical designs also provided the basis for the new method described here for selecting the most informative variables, thus allowing the testing effort to be minimized while maximizing the predictive capacity of a model. Because of this, the new method provides a means for substantially improving the screening strategy; it therefore deserves to become an integral part of the process. As for the developed k(OH) model, it can certainly be improved; for example, additional investigations are necessary in order to: (i) examine its validity within poorly explored structural areas; (ii) understand the reasons why the CH3-CX3 compounds are outliers; and (iii) distinguish isomeric structures. However, the model seems to work well and, most important within the scope of this work, it uses simple descriptors that are drawn directly from the chemical formula. This is the kind of model that regulators seek for expediting the management of poorly tested existing and new chemicals but which, as discussed, cannot be used straight away without carefully considering the limits of application.

ACKNOWLEDGEMENTS

This study has been supported by the CNR-ENEL (National Research Council-National Agency for Electric Energy) Project “Interaction of Energy Systems with Human Health and the Environment”. The provision from HIMONT S.p.A. of a grant to R.P. is also gratefully acknowledged.

APPENDIX

Table A1 lists the investigated haloalkanes and their observed and calculated/predicted log k(OH). For the training (marked with T) and the validation (marked with V) compounds, their respective observed and calculated or predicted log k(OH) values are reported. An asterisk marks the outlier structures CH3-CX3. Italicized compounds are those (among the non-tested haloalkanes) that were found to deviate somewhat from the X range of the model and whose predictions are uncertain. The reader should also remember (see text) that the predicted log k(OH) values for the fully halogenated alkanes should in general be regarded with caution.

TABLE A1

No.   Name                       (obs.)    (pred.)
1     CCl4                                 -13.20
2     CH2Cl2 (V)                 -12.85    -13.10
3     CHBr3                                -13.69
4     CH3Cl (V)                  -13.36    -13.04
5     CH2ClBr                              -13.28
6     CBr4                                 -13.92
7     CHCl3 (V)                  -12.99    -13.15
8     CF2Cl2                               -14.00
9     CHF2Cl (V)                 -14.33    -13.94
10    CHFCl2 (T)                 -13.52    -13.55
11    CF2ClBr                              -14.18
12    CF3Br                                -14.57
13    CFCl3                                -13.60
14    CHBrCl2                              -13.33
15    CHBr2Cl                              -13.51
16    CH3Br (T)                  -13.41    -13.22
17    CF3Cl                                -14.39
18    CF4                                  -14.79
19    CHF3                                 -14.34
20    CF2Br2                               -14.36
21    CBr3F                                -14.14
22    CFBrCl2                              -13.78
23    CH2ClF (V)                 -13.36    -13.49
24    CBrCl3                               -13.38
25    CH2Br2                               -13.46
26    CBr2Cl2                              -13.56
27    CH3F (V)                   -13.77    -13.44
28    CHF2Br                               -14.12
29    CH2F2 (V)                  -13.96    -13.89
30    CHCl2CHCl2 (T)             -12.60    -12.58
31    CHCl2CH2Cl (T)             -12.48    -12.53
32    CH2ClCH2Cl (V)             -12.66    -12.47
33    CH2ClCH3 (V)               -12.40    -12.42
34    CH2BrCH2Cl                           -12.65
35    CH2BrCH2Br (T)             -12.60    -12.83
36    CHCl2CH3 (V)               -12.59    -12.47
37    CHBr2CHBr2                           -13.30
38    CCl3-CHCl2                           -12.64
39    CCl3-CH3 *                 -13.92    -12.53
40    CCl3-CCl3                            -12.69
41    CH2Br-CH3 (V)              -12.46    -12.60
42    CF2Cl-CF2Cl                          -14.27
43    CFCl2-CF2Cl                          -13.88
44    CF3-CF2Cl                            -14.67
45    CFCl2-CFCl2                          -13.48
46    CCl3-CF2Cl                           -13.48
47    CCl3-CF3                             -13.88
48    CF2Cl-CH3 *                -14.45    -13.32
49    CHF2-CH3 (V)               -13.47    -13.27
50    CF2Br-CH2Br                          -13.73
51    CF3-CHClBr                           -14.00
52    CF3-CHCl2 (V)              -13.47    -13.82
53    CF2Br-CHFCl                          -14.00
54    CF2Cl-CHCl2                          -13.43
55    CF2Cl-CHFCl                          -13.82
56    CHBr2-CH3 (V)              -12.60    -12.83
57    CH2Cl-CH2F                           -12.87
58    CFCl2-CF3                            -14.27
59    CCl3-CH2Cl                           -12.58
60    CFBr2-CF3                            -14.63
61    CHBr2-CHF2                           -13.73
62    CClBr2-CF3                           -14.24
63    CF3-CH2Br                            -13.95
64    CBr3-CH2Br                           -13.30
65    CCl3-CH2Cl                           -12.58
66    CHBr2-CHBr2                          -13.30
67    CH2F-CH3 (V)               -12.64    -12.82
68    CF2Br-CHFBr                          -14.18
69    CCl3-CHFCl                           -13.03
70    CHCl2-CFCl2                          -13.03
71    CF3-CHF2 (T)               -14.60    -14.61
72    CFCl2-CF3                            -14.27
73    CHFCl-CHFCl                          -13.37
74    CCl2Br-CH2Br                         -12.94
75    CBr3-CHBr2                           -13.53
76    CF2Br-CF2Br                          -14.63
77    CHBr2-CH2Br                          -13.07
78    CF3-CH2Cl (T)              -13.79    -13.77
79    CH2Cl-CClF2                          -13.37
80    CH2F-CF3 (V)               -14.07    -14.17
81    CHClF-CF3 (V)              -13.99    -14.22
82    CH3-CCl2F *                -14.14    -12.92
83    CH2F-CH2F (V)              -12.96    -13.27
84    CF3-CH3 *                  -14.70    -13.72
85    CH2F-CHF2 (V)              -13.74    -13.72
86    CHF2-CHF2 (V)              -14.28    -14.17
87    CH2Br-CHBr-CH2Cl (T)       -12.36    -12.26
88    CH2Br-CHBr-CH2Br                     -12.44
89    CF3-CH2-CH2Cl                        -13.15
90    CH2Br-CH2-CH2Br                      -12.21
91    CH2Br-CHBr-CH3                       -12.21
92    CH2Cl-CHCl-CH3                       -11.85
93    CHCl2-CH2-CH3                        -11.85
94    CH2Cl-CH2-CH3                        -11.80
95    CH2Cl-CH2-CH2Cl                      -11.85
96    CF3-CHCl-CH2Cl                       -13.20
97    CCl3-CF2-CHF2                        -13.70
98    CHF2-CHCl-CH3                        -12.70
99    CF3-CHBr-CH2Br                       -13.56
100   CFCl2-CHCl-CF2Cl                     -13.31
101   CF3-CHCl-CF3                         -14.49
102   CF3-CH2-CHCl2                        -13.20
103   CHCl2-CHCl-CH3                       -11.90
104   CF2Br-CFBr-CF3                       -14.91
105   CF2Cl-CF2-CH2F                       -14.05
106   CHFCl-CF2-CHF2                       -14.05
107   CF2Cl-CH2-CH2Cl                      -12.75
108   CF2Cl-CCl2-CF3                       -14.15
109   CCl3-CF2-CCl3                        -12.96
110   CHCl2-CF2-CHF2                       -13.65
111   CF3-CCl2-CH3                         -13.20
112   CF2Cl-CHCl-CH3                       -12.75
113   CCl3-CH2-CH3                         -11.90
114   CFCl2-CH2-CH3                        -12.30
115   CHF2-CCl2-CHF2                       -13.65
116   CHF2-CHCl-CHF2                       -13.60
117   CCl3-CFCl-CF3                        -13.76
118   CF2Br-CF2-CHF2                       -14.67
119   CH3-CFBr-CH3                         -12.43
120   CF2Br-CF2-CF3                        -15.12
121   CF2Cl-CF2-CF3                        -14.94
122   CCl3-CHCl-CH3                        -11.96
123   CCl3-CH2-CH2Cl                       -11.96
124   CCl3-CF2-CF2Cl                       -13.76
125   CH2Br-CBr2-CH3                       -12.44
126   CH2Br-CH2-CH2Cl                      -12.03
127   CH3-CHCl-CH3 (T)           -11.80    -11.80
128   CF3-CF2-CF3                          -15.34
129   CH3-CBr2-CH3                         -12.21
130   CH3-CCl2-CH3                         -11.85
131   CCl3-CCl2-CCl3                       -12.17
132   CBr2F-CF2-CFBr2                      -14.48
133   CF3-CCl2-CF3                         -14.55
134   CCl3-CF2-CF3                         -14.15
135   CHCl2-CCl2-CHCl2                     -12.07
136   CCl3-CCl2-CHCl2                      -12.12
137   CH2Br-CH2-CH3                        -11.98
138   CH2Cl-CHCl-CH2Cl                     -11.90
139   CH2Cl-CCl2-CH3                       -11.90
140   CF3-CH2-CH3                          -13.09
141   CHBr2-CHBr-CH3                       -12.44
142   CHCl2-CHCl-CH2Cl                     -11.96
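The external-validation statistics quoted in the text can be recomputed from the rows of Table A1 marked (V). The sketch below assumes Q² = 1 − PRESS/SS (with SS taken about the mean of the observed validation values) and SD = sqrt(PRESS/n); the paper only states that Q² is R² calculated on the validation set, so these exact definitions are an assumption, and the function name `external_stats` is hypothetical.

```python
import math

# (No., observed log k(OH), predicted log k(OH)) for the 20 validation compounds.
VALIDATION = [
    (2, -12.85, -13.10), (4, -13.36, -13.04), (7, -12.99, -13.15),
    (9, -14.33, -13.94), (23, -13.36, -13.49), (27, -13.77, -13.44),
    (29, -13.96, -13.89), (32, -12.66, -12.47), (33, -12.40, -12.42),
    (36, -12.59, -12.47), (41, -12.46, -12.60), (49, -13.47, -13.27),
    (52, -13.47, -13.82), (56, -12.60, -12.83), (67, -12.64, -12.82),
    (80, -14.07, -14.17), (81, -13.99, -14.22), (83, -12.96, -13.27),
    (85, -13.74, -13.72), (86, -14.28, -14.17),
]

def external_stats(rows):
    """Return (Q2, SD) for a list of (id, observed, predicted) rows."""
    obs = [o for _, o, _ in rows]
    press = sum((o - p) ** 2 for _, o, p in rows)        # prediction error sum of squares
    mean_obs = sum(obs) / len(obs)
    ss_total = sum((o - mean_obs) ** 2 for o in obs)
    q2 = 1.0 - press / ss_total
    sd = math.sqrt(press / len(rows))
    return q2, sd

q2, sd = external_stats(VALIDATION)
print(f"Q2 = {q2:.2f}, SD = {sd:.2f}")
```

With the values as tabulated, this gives Q² of about 0.88 and SD of about 0.22, in line with the figures reported for the 20 validation compounds.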

REFERENCES

1 M.L. Tosato, D. Cesareo, S. Galassi, L. Vigano, G. Cruciani, S. Clementi and B. Skagerberg, Multivariate design in toxicological evaluation of chemical data, Chimica Oggi, 3 (1988) 41-45.
2 J. Jonsson, L. Eriksson, M. Sjöström, S. Wold and M.L. Tosato, A strategy for ranking environmentally occurring chemicals, Chemometrics and Intelligent Laboratory Systems, 5 (1989) 169-186.
3 M.L. Tosato, L. Vigano, B. Skagerberg and S. Clementi, A strategy for ranking chemical hazards: framework and application, Environmental Science and Technology, 25 (1991) 695-702.
4 L. Eriksson, A strategy for ranking environmentally occurring chemicals, Thesis, Umeå University, Umeå, Sweden, 1991.
5 L. Eriksson, M. Sjöström and S. Wold, Rational ranking of chemicals according to environmental risks, Chemometrics and Intelligent Laboratory Systems, 14 (1992) 245-252.
6 R. Atkinson, Kinetics and mechanisms of the gas-phase reaction of the hydroxyl radical with organic compounds, Chemical Reviews, 86 (1986) 69-201.
7 M.L. Tosato, C. Chiorboli, L. Eriksson and J. Jonsson, Multivariate modelling of the rate constant of the gas-phase reaction of halo-alkanes with the hydroxyl radical, Science of the Total Environment, 109/110 (1991) 307-325.
8 M.L. Tosato, C. Chiorboli, R. Piazza, L. Passerini and V. Carassiti, Estimation of the tropospheric reactivity of organics with the hydroxyl radical. A QSAR model for a group of haloalkanes, Quantitative Structure-Activity Relations, submitted for publication.
9 S. Wold, K. Esbensen and P. Geladi, Principal components analysis, Chemometrics and Intelligent Laboratory Systems, 2 (1987) 37-52.
10 S. Wold, A. Ruhe, H. Wold and W.J. Dunn III, The collinearity problem in linear regression. The partial least square approach to generalized inverses, SIAM Journal of Scientific Statistics and Computation, 5 (1984) 735-743.
11 A. Höskuldsson, PLS regression methods, Journal of Chemometrics, 2 (1988) 211-228.
12 G.E.P. Box, W.G. Hunter and J.S. Hunter, Statistics for Experimenters, Wiley, New York, 1978.
13 G. Cruciani, M. Baroni, D. Bonelli, S. Clementi, C. Ebert and B. Skagerberg, Comparison of chemometric methods for QSAR, Quantitative Structure-Activity Relations, 9 (1990) 101-107.
14 M. Baroni, G. Cruciani, G. Costantino, D. Reganelli and S. Clementi, Generating optimal linear PLS estimation, GOLPE: an advanced chemometric tool for handling 3-D QSAR programmes, Quantitative Structure-Activity Relations, submitted for publication.
15 S. Wold, Cross-validatory estimation of the number of components in factor and principal components models, Technometrics, 20 (1978) 397-405.
16 L. Silver and R. Rudman, Polymorphism of the crystalline methylchloromethane compounds. IV: The crystal and molecular structure of methylchloroform, Journal of Chemical Physics, 37 (1972) 210-216.
17 S. Hellberg, L. Eriksson, J. Jonsson, F. Lindgren, M. Sjöström, B. Skagerberg, S. Wold and P. Andrews, Minimum analog peptide sets (MAPS) for quantitative structure-activity relationships, International Journal of Peptide and Protein Research, 37 (1991) 414-424.